DOTLESS ARABIC: HOW SOCIAL MEDIA USERS ARE ROLLING BACK LINGUISTIC DEVELOPMENT TO FOOL AI TRAWLING TECHNOLOGIES

By Max Wrey

The most recent round of violent clashes between Israel and Palestine has triggered the emergence of an Arabic social media trend with potentially serious implications. The use of dotless Arabic, in which the diacritical markings are removed from Arabic text to evade trawling and translation technology, has risen sharply among social media users in response to perceived discriminatory censorship by the major platforms.

The role of social media in manifesting Arab thought

The Palestine issue has shown no sign of being displaced from the front and centre of the Arab psyche. In a region where social media penetration is relatively high, a significant majority of young Arabs turn to social media, rather than traditional sources, for their news. As the situation in the Holy Land has grown increasingly fraught, the conversation in the streets has been dominated by Jerusalem and Palestine – and a huge volume of related activity across social media platforms has ensued. Amid the torrent of horror, sadness and vitriol, Instagram and Twitter have reasserted themselves as the most significant outlets for voices wanting to be heard and for the co-ordinators of solidarity events. Social media has become an unrivalled vehicle for manifesting Arab thought.

Social media algorithms and accusations of discrimination

On social media, opposing beliefs converge on our screens. The major social media companies uphold freedom of speech where possible and aim to adopt a politically neutral stance, but content that breaches their policies on a range of criteria is nevertheless limited or removed.

The algorithms that are used to trawl the platforms’ content, identifying and removing posts that breach policy, have recently come under fire, with the social media companies accused of censoring pro-Palestine voices and removing or limiting the reach of posts containing certain hashtags. The Haifa-based non-profit social media observatory Arab Centre for the Advancement of Social Media (known colloquially as 7amleh) has been active in documenting the alleged censorship, and has accused Facebook, Instagram and Twitter of using discriminatory algorithms. In May 2021, amid rising tensions in the final week of Ramadan, 7amleh reported receiving hundreds of complaints of deleted posts and suspended accounts. Some of the affected users claimed to have received messages about “violating community standards” from Instagram.

In response to the furore, Twitter and Instagram issued official statements claiming accounts were suspended “in error” by automated systems. Media reports of alleged internal communications at Instagram suggest that posts containing the Arabic language hashtag #AlAqsa (الأقصى#) were blocked when the site’s content moderation system confused the term with the similarly, though not identically, named Al Aqsa Martyrs Brigades, a US-designated terror group.

Regardless of the plausibility of this explanation, high-profile social media figures have also highlighted what they regard as censorship. In May 2021 Tunisian model Azza Slimene complained to her 1.4 million Instagram followers that she could not post anything on her account after uploading a lengthy conversation with a Gaza public figure. She also claimed that she was unable to open an account on Twitter or post anything on TikTok. During the same period, US-based civil rights activist Khaled Beydoun claimed that Twitter removed his Gaza-related video. The hundreds of comments under these users’ protests suggest that the censorship was not confined to them. Israeli Defence Minister Benny Gantz’s May 2021 Zoom meeting with Facebook and TikTok executives only served to fuel some Arab users’ perceptions of unfairness.

The adoption of dotless Arabic to evade censorship

In response, users have rolled back centuries of linguistic development and turned to Arabic without dots to fool artificial intelligence trawling technologies and evade censorship. The dots – diacritical markings that distinguish consonants sharing the same base letter shape – have been in place in Arabic script since the seventh century, but an eye familiar with the language is still able to make sense of words and sentences in Arabic with the dots removed.

Dotless Arabic poses challenges for digital monitoring: targeted word searches and automated translators struggle to decipher text correctly when the dots are removed from Arabic script. The trend is being facilitated by the widespread sharing of free and easy-to-use software tools that convert normal Arabic to the dotless script.
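To make the mechanism concrete, the following is a minimal Python sketch of how such a converter might work and why an exact keyword search then misses the text. It is not the code of any actual tool: the mapping table `DOTLESS_MAP` and the function `to_dotless` are hypothetical names, and the table is a simplification that replaces each dotted letter with one dotless shape regardless of its position in the word, whereas real converters may choose different shapes for initial, medial and final letters.

```python
# Hypothetical, simplified table: each dotted letter is replaced by a
# dotless Unicode letter with the same base shape. Written with escapes
# so the code points are unambiguous.
DOTLESS_MAP = str.maketrans({
    "\u0628": "\u066E",  # beh         -> dotless beh
    "\u062A": "\u066E",  # teh         -> dotless beh
    "\u062B": "\u066E",  # theh        -> dotless beh
    "\u062C": "\u062D",  # jeem        -> hah
    "\u062E": "\u062D",  # khah        -> hah
    "\u0630": "\u062F",  # thal        -> dal
    "\u0632": "\u0631",  # zain        -> reh
    "\u0634": "\u0633",  # sheen       -> seen
    "\u0636": "\u0635",  # dad         -> sad
    "\u0638": "\u0637",  # zah         -> tah
    "\u063A": "\u0639",  # ghain       -> ain
    "\u0641": "\u06A1",  # feh         -> dotless feh
    "\u0642": "\u066F",  # qaf         -> dotless qaf
    "\u0646": "\u06BA",  # noon        -> noon ghunna (dotless noon)
    "\u064A": "\u0649",  # yeh         -> alef maksura (dotless yeh)
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def to_dotless(text: str) -> str:
    """Return text with dotted Arabic letters swapped for dotless shapes."""
    return text.translate(DOTLESS_MAP)


# The Arabic word "al-Aqsa": alef, lam, alef-with-hamza, qaf, sad, alef maksura.
keyword = "\u0627\u0644\u0623\u0642\u0635\u0649"
post = "#" + keyword
disguised = to_dotless(post)

print(keyword in post)       # True: an exact-match filter finds the keyword
print(keyword in disguised)  # False: the qaf is now a different code point
```

Only one letter of the keyword changes here, yet that is enough to defeat an exact string comparison, because the dotted and dotless letters are distinct Unicode characters even though a human reader treats them as the same word.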

The title of a widely distributed and entirely dotless article in an Egyptian publication illustrates the phenomenon:

Google’s attempt at translating the dotless title produces the following: “Al-Asa ں Sadd Al-God: ٮ Al-Arat Al-Ariyah”. But people familiar with the Arabic language make light work of the text, reading it correctly as: الانسان ضد الالة: ثورة النقاط العربية. Given this dotted version, Google has no problem translating it more accurately as “Man against the Machine: The Revolution of Arabic Points”.

Managing the risk and looking ahead

The existence of online posts and articles in a script that can be read accurately by the human eye, but with which automated trawling and translating technology struggles, is clearly a concern. While many of those employing the dotless script claim to do so to avoid unfair censorship by the social media companies, the same technique could be used for a variety of other purposes, including publishing content that genuinely breaches the platforms’ rules and policies.

While it is unlikely that the Arabic script will return to its early dotless form in widespread and general usage, its growing use on social media is likely to continue. Shorouk News, an Egyptian-managed Facebook page with 5.6 million followers, used the script in May 2021 to post a message calling for a general strike in all Palestine “from the river to the sea”. How effective this mode of communication will be at reaching mass audiences, or facilitating the organisation of events, remains to be seen.

Nevertheless, those trying to manage the risks posed by viral digital campaigns must now consider another potential threat. Until algorithmic trawling technology can handle the dotless Arabic script, the best means of preparedness is awareness of its existence and careful human-led analysis of heavily circulated text.
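As an illustration of what “handling” the dotless script could involve, the Python sketch below folds both a search term and the text being searched down to the same dotless letter shapes before comparing them. This is an assumed approach for illustration only, not a description of how any platform’s systems work; the names `SKELETON_GROUPS`, `fold` and `skeleton_match` are hypothetical, and the letter table is simplified and ignores positional letter forms.

```python
# Hypothetical groups: (dotted letters, shared dotless shape). Illustrative
# and incomplete; a converter that picks other dotless code points would
# need extra entries.
SKELETON_GROUPS = [
    ("\u0628\u062A\u062B", "\u066E"),  # beh, teh, theh -> dotless beh
    ("\u062C\u062E", "\u062D"),        # jeem, khah     -> hah
    ("\u0630", "\u062F"),              # thal           -> dal
    ("\u0632", "\u0631"),              # zain           -> reh
    ("\u0634", "\u0633"),              # sheen          -> seen
    ("\u0636", "\u0635"),              # dad            -> sad
    ("\u0638", "\u0637"),              # zah            -> tah
    ("\u063A", "\u0639"),              # ghain          -> ain
    ("\u0641", "\u06A1"),              # feh            -> dotless feh
    ("\u0642", "\u066F"),              # qaf            -> dotless qaf
    ("\u0646", "\u06BA"),              # noon           -> dotless noon
    ("\u064A", "\u0649"),              # yeh            -> alef maksura
    ("\u0629", "\u0647"),              # teh marbuta    -> heh
]

# Translation table keyed by code point, as str.translate expects.
FOLD = {ord(ch): shape for letters, shape in SKELETON_GROUPS for ch in letters}


def fold(text: str) -> str:
    """Reduce Arabic text to its dotless letter shapes."""
    return text.translate(FOLD)


def skeleton_match(keyword: str, text: str) -> bool:
    """Compare keyword and text after folding both to dotless shapes."""
    return fold(keyword) in fold(text)


dotted = "\u0627\u0644\u0623\u0642\u0635\u0649"   # "al-Aqsa" with dots
dotless = "\u0627\u0644\u0623\u066F\u0635\u0649"  # same word, dotless qaf
print(dotted in dotless)                # False: exact match fails
print(skeleton_match(dotted, dotless))  # True: folded forms agree

# The cost: words that differ only by dots now collide.
arabs = "\u0639\u0631\u0628"  # the word for "Arabs"
west = "\u063A\u0631\u0628"   # the word for "west"
print(fold(arabs) == fold(west))        # True: a likely false positive
```

The last line shows why this is no substitute for human review: folding away the dots discards exactly the information that separates many unrelated words, so matches found this way still need a reader who can resolve the ambiguity from context.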