Telecom company Veon, mobile operator Beeline Kazakhstan, the Barcelona Supercomputing Center and the GSMA lobby group said on Wednesday they would work together to bridge an “AI language gap” for under-represented languages.
Large language models powering ‘bots’ like ChatGPT often rely on swathes of online data, such as digital books, websites, articles and blogs, to learn how to generate human-like responses. But data and resources in some languages are limited.
“Out of nearly 7000 languages spoken around the globe, only seven are considered high-resource languages in the digital world: English, Spanish, French, Mandarin, Arabic, German and Japanese,” the groups said in a joint statement.
They will collaborate on developing tools and language model documentation in under-represented languages, including those spoken in the countries where Veon operates: Pakistan, Ukraine, Bangladesh, Kazakhstan, Uzbekistan and Kyrgyzstan.
Catalan, which is spoken by around 10 million people, is also among the languages covered, the statement said.
“The lack of resources in other languages results in an AI language gap which leads to sub-optimal user experience in AI applications, deepens the bias in AI models and risks deepening the digital divide in AI technologies,” they added.