Scientists Determine: Majority of Internet Text is Poorly Translated AI Trash

A new study reveals that a significant portion of online content has been badly translated, mostly by machine, in languages spoken in Africa and the Global South.

25 January 2024

Image by CyberBeat

Researchers have found that a large share of online content is poor-quality machine translation. The problem is most pronounced in languages spoken in Africa and the Global South.

The study, conducted by researchers at the Amazon Web Services AI lab, raises serious concerns about the training of large language models, which learn from text scraped from the web. The researchers found that more than 57% of the sentences they examined had been translated into two or more other languages, often poorly, leaving the web with a large volume of machine-translated garbage.

To conduct the study, the researchers collected 6.38 billion sentences from the web and looked for multi-way parallelism: sets of sentences that are translations of each other in three or more languages. They found that 57.1% of the sentences in the corpus were multi-way parallel, which suggests that a majority of the text they sampled is translated content.
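As a rough illustration of how such a share could be tallied, here is a toy sketch. It is not the researchers' pipeline: the function name `multiway_parallel_share`, the `min_langs` parameter, and the three example clusters are all made up for this illustration, and it assumes sentences have already been grouped into sets of mutual translations.

```python
from typing import Dict, List

# Hypothetical toy data model: each cluster maps a language code to one
# sentence, and sentences in the same cluster are assumed to be
# translations of each other.
Cluster = Dict[str, str]


def multiway_parallel_share(clusters: List[Cluster], min_langs: int = 3) -> float:
    """Return the fraction of sentences that sit in clusters spanning
    at least `min_langs` languages."""
    total = sum(len(c) for c in clusters)
    if total == 0:
        return 0.0
    multiway = sum(len(c) for c in clusters if len(c) >= min_langs)
    return multiway / total


clusters = [
    # Three languages: counts as multi-way parallel.
    {"en": "Good morning.", "fr": "Bonjour.", "sw": "Habari ya asubuhi."},
    # Two languages: parallel, but not multi-way.
    {"en": "Thank you.", "de": "Danke."},
    # One language: no translation found.
    {"en": "This sentence has no translation."},
]

share = multiway_parallel_share(clusters)
print(f"{share:.1%}")  # prints "50.0%": 3 of the 6 toy sentences
```

The study's 57.1% figure would correspond to this kind of ratio computed over the full 6.38 billion sentence corpus, although the hard part in practice is finding the translation sets, which this sketch takes as given.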

Translation quality varies greatly. Machine learning systems reflect human biases and tend to favor languages spoken in the Western world and the Global North. This poses a challenge for "low-resource" languages, such as many of those spoken in Africa, for which too little training data exists to produce accurate translations.

- CyberBeat

Latest News

08 May 2025

2025 Australian Federal Election - Digital Sovereignty and Human Rights
24 April 2025

Protect Your Digital Rights: Secure Your Data from Overreach Today
17 April 2025

Unveiling the Mask: 'Careless People' Exposes the Hidden World of Facebook's Power Struggles
10 April 2025

Reference

https://www.vice.com/en/article/y3w4gw/a-shocking-amount-of-the-web-is-already-ai-translated-trash-scientists-determine?
