23 May 2023

Facebook owner Meta Platforms has built an artificial intelligence (AI) language model that it says can recognise over 4,000 spoken languages and “speak” in over 1,100.

Meta said it is open-sourcing the model, developed under its Massively Multilingual Speech (MMS) project, to help preserve language diversity.

“Today, we are publicly sharing our models and code so that others in the research community can build upon our work,” Meta stated.

“Through this work, we hope to make a small contribution to preserve the incredible language diversity of the world.”

Meta said its MMS project would also give researchers a foundation to build on when working with languages in danger of becoming extinct.

These languages presented a challenge: they aren’t widely spoken in industrialised nations, so very little data is available to train a language model on them.

To overcome this, Meta tapped into audio recordings of translated religious texts.

“We turned to religious texts, such as the Bible, that have been translated in many different languages and whose translations have been widely studied for text-based language translation research,” it said.

“These translations have publicly available audio recordings of people reading these texts in different languages.”

Meta noted that while the language model is trained mainly on religious content, its analysis shows that this does not bias the model.

“We believe this is because we use a connectionist temporal classification (CTC) approach, which is far more constrained compared with large language models (LLMs) or sequence-to-sequence models for speech recognition,” it added.
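To illustrate why a CTC output is considered constrained, the short Python sketch below shows the standard CTC decoding rule: the model emits one label (or a blank) per audio frame, repeated labels are merged, and blanks are dropped. It is a generic illustration, not Meta's code; the function name, the `_` blank symbol, and the sample frame labels are assumptions made for the example.

```python
def ctc_greedy_collapse(frame_labels, blank="_"):
    """Collapse per-frame CTC labels into text.

    Rule: merge consecutive repeats, then drop blank symbols.
    `frame_labels` stands in for the most likely label at each audio frame
    (hypothetical data; a real model would produce these from audio).
    """
    output = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return "".join(output)


if __name__ == "__main__":
    # The blank between the two "l" runs is what allows a double letter.
    print(ctc_greedy_collapse(list("hh_e_ll_lo")))
```

Because every output symbol is tied to a specific stretch of audio, a decoder of this kind has far less freedom to generate text that was not actually spoken than a free-running text generator does.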

It also noted that, despite most recordings featuring male speakers, the language model performs equally well in producing male and female voices.

Meta said MMS outperforms existing models such as Whisper, OpenAI’s speech recognition system.

“We found that models trained on the Massively Multilingual Speech data achieve half the word error rate, but Massively Multilingual Speech covers 11 times more languages,” it said.
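For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below computes it with a word-level edit distance. It is a minimal illustration, not Meta's evaluation code; the function name and the sample sentences are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # Hypothetical transcripts: one wrong word out of four.
    print(word_error_rate("a b c d", "a x c d"))
```

Under this definition, halving the word error rate means the system makes half as many word-level mistakes per reference word.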