Crowdsourcing as Human-Machine Translation
The use of crowdsourcing and text corpora in human-machine translation (HMT) has become predominant in the field within the last few years,[1] in comparison to solely using machine translation (MT). A few recent academic papers have looked into the benefits that crowdsourcing could bring to translation as a technique, and how it could help improve the tools currently available to the public and make them more efficient.
Crowdsourcing in translation
Technologies translated through crowdsourcing
The translation of this open-source system, a distribution of Linux, is often carried out by individual users who want to use it in their mother tongue.
The service "Google in your own language" (GIYL)[2] was a project which was translated by both users and translation volunteers.[3]
In March 2008, through crowdsourcing, the entire site was translated into French within 24 hours, reportedly by over 4,000 native French speakers.[4]
This language institute, based on the fictional language heard in Star Trek, provides a forum in which its members can discuss and interact with fellow enthusiasts.
As the technology partner of TED (conference), dotSUB developed a "browser based, one-stop, self contained system for creating and viewing subtitles for videos in multiple languages across all platforms, including web based, mobile devices, and transcription and video editing systems".[5]
- Worldwide Lexicon
An open-source collaborative translation platform, Worldwide Lexicon (WWL) describes itself as "a translation memory, essentially a giant database of translations, which can be embedded in almost any website or web application"[6] through the use of a browser plugin.
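The idea of a translation memory as "a giant database of translations" can be illustrated with a minimal Python sketch. This is not WWL's implementation: the class name `TranslationMemory`, its methods, and the 0.8 similarity threshold are all hypothetical, chosen only to show how stored translations might be reused for identical or near-identical segments.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory (hypothetical design, not WWL's actual API)."""

    def __init__(self):
        # Maps (source_lang, target_lang) -> {source segment: stored translation}
        self._entries = {}

    def add(self, source, target, src_lang, tgt_lang):
        """Store a human translation for later reuse."""
        self._entries.setdefault((src_lang, tgt_lang), {})[source] = target

    def lookup(self, source, src_lang, tgt_lang, threshold=0.8):
        """Return (translation, score); translation is None below the threshold."""
        segments = self._entries.get((src_lang, tgt_lang), {})
        if source in segments:
            return segments[source], 1.0  # exact match
        best, best_score = None, 0.0
        for stored, translation in segments.items():
            # Fuzzy match: character-level similarity between 0.0 and 1.0
            score = SequenceMatcher(None, source.lower(), stored.lower()).ratio()
            if score > best_score:
                best, best_score = translation, score
        if best_score >= threshold:
            return best, best_score
        return None, best_score


tm = TranslationMemory()
tm.add("Hello world", "Bonjour le monde", "en", "fr")
exact = tm.lookup("Hello world", "en", "fr")    # exact hit
fuzzy = tm.lookup("hello world!", "en", "fr")   # near match, reuses the stored entry
miss = tm.lookup("Goodbye", "en", "fr")         # nothing similar enough
```

A real system would segment text into sentences and index the entries for speed; the linear scan here only keeps the example short.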
Advantages
Crowdsourced translation is considered highly efficient, thanks to three advantages:
- Multilingual support
Through human or manual translation, there are no boundaries or limitations to the languages or dialects the source text can be translated into. Through crowdsourcing, the creation of a large base of translators with a large variety of native tongues provides the possibility for the original text to be translated into many different languages.
- Quick solution
If the text is released openly to the crowd, it can be translated within only a few minutes (provided the text in question is relatively short). This is due to the large number of people who have access to the task. Despite varying levels of competency among the users, an accurate translation is usually reached because of the sheer number of participants who are able to correct and overrule mistakes. However, communication between large numbers of people can be difficult to co-ordinate effectively.
- Monetary benefits
The company which implements the crowdsourcing is considered to be the main beneficiary, due to the low cost of maintaining a crowdsourcing platform once it has been set up. Translators on open platforms are generally not freelancers or professional translators, but rather hobbyists who are willing to translate for free.
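The way a large number of participants can "correct and overrule mistakes", as described under Quick solution, can be sketched as a simple majority vote over submitted translations. The source does not specify any particular mechanism, so the function name `consensus_translation`, the normalisation step, and the 50% agreement threshold are assumptions made purely for illustration.

```python
from collections import Counter


def normalise(text):
    """Lower-case and collapse whitespace so trivial variants count as one answer."""
    return " ".join(text.lower().split())


def consensus_translation(submissions, min_share=0.5):
    """Return the translation backed by more than min_share of submissions.

    Hypothetical aggregation rule: returns None when no candidate
    reaches the required share, signalling that review is still needed.
    """
    if not submissions:
        return None
    counts = Counter(normalise(s) for s in submissions)
    winner, votes = counts.most_common(1)[0]
    if votes / len(submissions) > min_share:
        return winner
    return None


agreed = consensus_translation(
    ["Bonjour le monde", "bonjour le  monde", "Salut monde"]
)
undecided = consensus_translation(["Bonjour", "Salut"])
```

In this sketch two of the three contributors agree, so their version overrules the outlier, while an even split yields no consensus.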
Challenges
- Technological boundaries
Crowdsourcing tends to be fully effective only when employed on the internet. This leaves groups of people who are not internet-savvy, or who lack free, reliable access to the internet, under-represented in crowdsourcing. Valid and perhaps important dialects could therefore be omitted from the results. Time zone differences can also delay delivery of the final product, and should be taken into consideration.
- Quality
Research specifically into the quality of Wikipedia[7] "concluded that adding more editors to an article improved article quality only when they used appropriate coordination techniques and was harmful when they did not".[3] The most important issue to take into account is that, as mentioned above, the volunteers to whom the text is released are not professional translators, which leads to somewhat variable results.
- Motivation
Without a source of motivation, obtaining usable results from a crowdsourcing project is almost impossible. It is vital to create interest and enthusiasm within the group of people in order to maintain their commitment to the project. Therefore, rewards are often offered for the best contributions.
- Control
As the following for a project grows, the ability to control and manage the group decreases. This can lead to an unorganised and chaotic result, making the project very time-consuming and costly.
Crowdsourcing vs. Machine Translation
The main difference between these two techniques is that crowdsourcing produces human-generated translation, whilst MT is automated by a computer, although the two share similarities. A concise table is available in Crowdsourcing as Human-Machine Translation by Anastasiou and Gupta:[3]
Criterion | Crowdsourcing Translation | Machine Translation |
---|---|---|
Start | 2006 | 1955 |
Output "engine" | Humans | Computer software |
Human involvement | Always | At revision |
Control | No | Yes |
Terminological consistency | No | Yes |
Source text | Uncontrolled | Controlled |
Speed | Less than MT | High |
Cost | Low implementation cost | Acquisition cost of commercial systems |
Quality | High | Low |
Profit | Company profits | MT user (single person or company) profits |
Future Prospects
Anastasiou and Gupta believe that in the future, the advantages of both crowdsourcing and machine translation will merge to form an efficient, cost-effective and high-quality translation service.[3]
References
- ↑ (citation text missing in source)
- ↑ (citation text missing in source)
- ↑ 3.0 3.1 3.2 3.3 Anastasiou and Gupta, Crowdsourcing as Human-Machine Translation.
- ↑ (citation text missing in source)
- ↑ (citation text missing in source)
- ↑ (citation text missing in source)
- ↑ (citation text missing in source)