However, if you try to talk to your phone in Yoruba, Igbo, or any other widely spoken African language, you will run into problems that can cut you off from information, trade, personal communication, and customer service, as well as from the benefits of the global technology economy.
“We are approaching the point where if a machine doesn’t comprehend your language, it will be as if it never existed,” Vukosi Marivate, chief of data science at the University of Pretoria in South Africa, said in a call to action ahead of a December virtual gathering of artificial intelligence researchers from around the world.
American tech giants haven’t had a good track record of making their language technology work outside the richest markets, a shortcoming that has also made it harder for them to spot dangerous misinformation on their platforms.
Marivate is one of a growing number of African researchers working together to address this problem. One of their projects tested whether machine translation tools could accurately translate online COVID-19 surveys from English into a number of African languages.
“Most people want to be in a position to interact with other information highways in their native language,” Marivate said in an interview. He is a member of Masakhane, a pan-African research group that aims to improve the representation of dozens of African languages in natural language processing, the branch of AI that deals with human language. It is the largest of many grassroots language technology projects that have sprung up from the Andes to Sri Lanka.
Tech companies offer their products in many languages but don’t always pay attention to the nuances that make those apps work in real life. Part of the problem is that there isn’t enough online data in these languages, including medical and scientific terms, for AI systems to learn from.
Google, for example, offended members of the Yoruba community several years ago when its translation app rendered Esu, a benevolent trickster god, as “the devil.” Facebook’s language failures have been tied to political strife around the world and to its inability to stop harmful misinformation about COVID-19 vaccines. More mundane translation errors have become fodder for jokes and online memes.
Omolewa Adedipe is frustrated that she can’t share her thoughts on Twitter in Yoruba, because automatic translations of her tweets often change their meaning.
The 25-year-old content designer tweeted “T’Iluo ba dun T’Iluo ba t’oro. Eyin bi ese,” which means “If the country (or land, in this context) is not peaceful or merry, you’re responsible.” Twitter translated the tweet as: “If your not happy, if are you not happy.”
Accent marks, which are used to indicate tone, make a big difference in tonal Nigerian languages such as Yoruba. For example, the Yoruba word ‘Ogun’ can mean war, but depending on how it is marked it can also refer to a state in Nigeria (Ogun), the god of iron, God of Sta, or property.
Marivate said that some of these biases are the deliberate product of history. He has devoted much of his AI research to the southern African languages Xitsonga and Setswana, and to “code-switching,” the common conversational practice of switching between languages.
He said, “The history of Africa and colonized countries is that language was only translated in very specific ways.” Colonizing countries may have worried that people would communicate with each other and write books about revolution or insurrection, so writing general texts in local languages was forbidden. Religious texts, however, were accepted.
Google and Microsoft are among the companies that say they are working to improve technology for “low-resource languages,” those for which AI systems lack sufficient data. Meta, the company formerly known as Facebook, has announced a major breakthrough in its quest for a universal translator that can translate multiple languages simultaneously and works better with low-resource languages such as Hausa and Icelandic.
This is a significant step, but for now only large tech firms and AI labs in developed nations can build these models, said David Ifeoluwa Adelani, a researcher at Saarland University in Germany and another member of Masakhane. Masakhane’s mission is to support and encourage African-led research that tackles technology “that doesn’t understand our names and cultures, our places and our history.”
It takes more than just data. The work also requires review by native speakers, who are often underrepresented in the global tech workforce, and independent researchers may not have access to the necessary computing power.
Kola Tubosun, a writer and linguist, created a multimedia dictionary and a text-to-speech system for Yoruba. He is now working on speech recognition technology for Nigeria’s two other major languages, Hausa and Igbo, to help people who want to dictate short sentences or passages.
“We are financing ourselves,” he said. “The goal is to prove that these things can be financially profitable.”
Tubosun led the team that developed Google’s “Nigerian English” voice and accent for tools such as Maps. He said it is still difficult to raise enough money to build technology that would let farmers use voice-based tools to track weather and market trends.
Remy Muhire, a Rwandan software engineer, is working to create a new open-source speech database for Kinyarwanda. The project relies on many volunteers recording themselves reading Kinyarwanda newspaper articles and other texts.
“They are native speakers. They speak the language,” said Muhire, a fellow at Mozilla, the maker of the Firefox browser. The project has also partnered with a government-supported smartphone app that answers questions about COVID-19. Masakhane researchers also draw on news sources from across Africa, including Voice of America’s Hausa broadcasts and BBC Igbo, to improve AI systems for different African languages.
More communities are developing their own language technology rather than waiting for elite institutions to solve their problems, a growing trend according to Damian Blasi, who studies linguistic diversity at the Harvard Data Science Initiative.
Blasi co-authored a new study that examined uneven development in language technology across the world’s more than 6,000 languages. It found, for example, that although Swahili and Dutch each have millions of speakers, there are only about 20 scientific papers on natural language processing from East Africa, where Swahili is widely spoken.