FB’s Language Gaps Weaken Screening Of Hate, Terrorism

Internal company documents from the former Facebook product manager-turned-whistleblower Frances Haugen show the problems are far more systemic than just a few innocent mistakes, and that Facebook has understood the depth of these failings for years while doing little about it

As the Gaza war raged and tensions surged across the Middle East last May, Instagram briefly banned the hashtag #AlAqsa, a reference to the Al-Aqsa Mosque in Jerusalem’s Old City, a flashpoint in the conflict.

Facebook, which owns Instagram, later apologised, explaining its algorithms had mistaken the third-holiest site in Islam for the militant group Al-Aqsa Martyrs’ Brigade, an armed offshoot of the secular Fatah party.

For many Arabic-speaking users, it was just the latest potent example of how the social media giant muzzles political speech in the region. Arabic is among the most common languages on Facebook’s platforms, and the company issues frequent public apologies after similarly botched content removals.

Now, internal company documents from the former Facebook product manager-turned-whistleblower Frances Haugen show the problems are far more systemic than just a few innocent mistakes, and that Facebook has understood the depth of these failings for years while doing little about it.

Such errors are not limited to Arabic. An examination of the files reveals that in some of the world’s most volatile regions, terrorist content and hate speech proliferate because the company remains short on moderators who speak local languages and understand cultural contexts. And its platforms have failed to develop artificial intelligence solutions that can catch harmful content in different languages.

In countries like Afghanistan and Myanmar, these loopholes have allowed inflammatory language to flourish on the platform, while in Syria and the Palestinian territories, Facebook suppresses ordinary speech, imposing blanket bans on common words.

“The root problem is that the platform was never built with the intention it would one day mediate the political speech of everyone in the world,” said Eliza Campbell, director of the Middle East Institute’s Cyber Program. “But for the amount of political importance and resources that Facebook has, moderation is a bafflingly under-resourced project.”

This story is based on Haugen’s disclosures to the Securities and Exchange Commission, which were also provided to the US Congress in redacted form by her legal team. The redacted versions were reviewed by a consortium of news organizations, including The Associated Press.

In a statement to the AP, a Facebook spokesperson said that over the last two years, the company has invested in recruiting more staff with local dialect and topic expertise to bolster its review capacity around the world.

But when it comes to Arabic content moderation, the company said, “We still have more work to do… We conduct research to better understand this complexity and identify how we can improve.”

In Myanmar, where Facebook-based misinformation has been linked repeatedly to ethnic and religious violence, the company acknowledged in its internal reports that it had failed to stop the spread of hate speech targeting the minority Rohingya Muslim population.

The Rohingya’s persecution, which the US has described as ethnic cleansing, led Facebook to publicly pledge in 2018 that it would recruit 100 native Myanmar language speakers to police its platforms. But the company never disclosed how many content moderators it ultimately hired or revealed which of the nation’s many dialects they covered.

Despite Facebook’s public promises and many internal reports on the problems, the rights group Global Witness said the company’s recommendation algorithm continued to amplify army propaganda and other content that breaches the company’s Myanmar policies following a military coup in February.

Arabic poses particular challenges to Facebook’s automated systems and human moderators, each of which struggles to understand spoken dialects unique to each country and region, their vocabularies salted with different historical influences and cultural contexts.

Moroccan colloquial Arabic, for instance, includes French and Berber words and is spoken with short vowels. Egyptian Arabic, on the other hand, includes some Turkish from the Ottoman conquest. Other dialects are closer to the official version found in the Quran. In some cases, these dialects are not mutually comprehensible, and there is no standard way of transcribing colloquial Arabic.

Facebook first developed a massive following in the Middle East during the 2011 Arab Spring uprisings, and users credited the platform with providing a rare opportunity for free expression and a critical source of news in a region where autocratic governments exert tight controls over both. But in recent years, that reputation has changed.

Scores of Palestinian journalists and activists have had their accounts deleted. Archives of the Syrian civil war have disappeared. And a vast vocabulary of everyday words has become off-limits to speakers of Arabic, Facebook’s third-most-common language with millions of users worldwide.

Criticism, satire and even simple mentions of groups on the company’s Dangerous Individuals and Organizations list, a docket modelled on the US government equivalent, are grounds for a takedown.

“We were incorrectly enforcing counter-terrorism content in Arabic,” one document reads, noting the current system “limits users from participating in political speech, impeding their right to freedom of expression.”

In response to questions from the AP, Facebook said it consults independent experts to develop its moderation policies and goes “to great lengths to ensure they are agnostic to religion, region, political outlook or ideology.”

“We know our systems are not perfect,” it added.

The company’s language gaps and biases have led to the widespread perception that its reviewers skew in favour of governments and against minority groups.

Facebook said in a statement that it fields takedown requests from governments no differently than those from rights organizations or community members, although it may restrict access to content based on local laws. “Any suggestion that we remove content solely under pressure from the Israeli government is completely inaccurate,” it said.

In Afghanistan, many users literally cannot understand Facebook’s rules. According to an internal report in January, Facebook did not translate the site’s hate speech and misinformation pages into Dari and Pashto, the two most common languages in Afghanistan, where English is not widely understood. (PTI)