Adsotrans

Adso is a Chinese-to-English dictionary and natural language processing engine for Chinese text. The Adso project started in 2001. Its gist-translation and dictionary interfaces are available online at the Adsotrans website,[1] where its software and database can also be downloaded.[2] The downloads include a copy of the "Adsotrans Attribution-NonCommercial License 1.1"[3] and a separate README stating that "Free use may be made of the software for machine translation, hanzi-to-pinyin conversion and text segmentation purposes, provided that attribution is given, including a link to our project."

Content

With over 195,000 entries, Adso is the largest free Chinese–English dictionary compilation on the Internet. It differs from other projects in providing part-of-speech and ontological data for word entries and in reviewing user contributions. Project data is generated collaboratively by users through an online dictionary hosted at Popup Chinese.
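
To make this entry structure concrete, the following minimal Python sketch stores part-of-speech and ontology tags on each entry and keeps user contributions hidden until they are reviewed. The `Entry` and `Dictionary` classes, their field names, and the sample tags are hypothetical illustrations and do not reflect Adso's actual database schema or review workflow.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One dictionary entry (hypothetical schema, not Adso's actual format)."""
    headword: str           # Chinese word, e.g. "中国"
    pinyin: str             # tone-numbered pinyin, e.g. "zhong1 guo2"
    pos: str                # part-of-speech tag, e.g. "noun"
    ontology: list[str]     # hypothetical semantic categories, e.g. ["place"]
    glosses: list[str]      # English translations
    approved: bool = False  # user contributions start out unreviewed


class Dictionary:
    """In-memory store that only exposes entries after review."""

    def __init__(self) -> None:
        self._entries: dict[str, list[Entry]] = {}

    def contribute(self, entry: Entry) -> None:
        """Record a user-submitted entry as pending review."""
        entry.approved = False
        self._entries.setdefault(entry.headword, []).append(entry)

    def approve(self, headword: str, pos: str) -> None:
        """Mark a headword's entries with the given part of speech as reviewed."""
        for entry in self._entries.get(headword, []):
            if entry.pos == pos:
                entry.approved = True

    def lookup(self, headword: str, pos: Optional[str] = None) -> list[Entry]:
        """Return reviewed entries, optionally filtered by part of speech."""
        return [
            entry
            for entry in self._entries.get(headword, [])
            if entry.approved and (pos is None or entry.pos == pos)
        ]


if __name__ == "__main__":
    d = Dictionary()
    d.contribute(Entry("中国", "zhong1 guo2", "noun", ["place", "country"], ["China"]))
    print(d.lookup("中国"))  # [] because the entry is still pending review
    d.approve("中国", "noun")
    print([e.glosses for e in d.lookup("中国", pos="noun")])  # [['China']]
```

Filtering lookups by part of speech is one way such tags can be put to use: a translation engine could prefer the noun sense of a word when the surrounding grammar calls for a noun.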

The Adso software engine provides text segmentation, hanzi-to-pinyin conversion, gist translation, annotation, gist extraction and semantic analysis services. It is heavily used as an aid for Chinese–English translation. Adso also supports a specially defined XML language that customizes the software's output, which has made it useful as a preprocessor for statistical machine translation software such as GIZA++ and for inverted-index search engines such as Lucene.
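
Segmentation matters because written Chinese does not mark word boundaries with spaces, while alignment tools such as GIZA++ read whitespace-separated tokens. The Python sketch below uses forward maximum matching, a common baseline segmenter, and then maps each token to pinyin. It is an illustration of the general idea, not Adso's own algorithm; the `LEXICON` table and the `segment` and `to_pinyin` functions are hypothetical.

```python
# Hypothetical toy lexicon: word -> tone-numbered pinyin.
LEXICON = {
    "我": "wo3",
    "喜": "xi3",
    "喜欢": "xi3 huan5",
    "学习": "xue2 xi2",
    "中": "zhong1",
    "中文": "zhong1 wen2",
}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def segment(text: str) -> list[str]:
    """Forward maximum matching: at each position take the longest lexicon
    word; fall back to a single character when nothing longer matches."""
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in LEXICON:
                tokens.append(candidate)
                i += length
                break
    return tokens


def to_pinyin(tokens: list[str]) -> str:
    """Replace each token with its pinyin, leaving unknown tokens unchanged."""
    return " ".join(LEXICON.get(token, token) for token in tokens)


if __name__ == "__main__":
    tokens = segment("我喜欢学习中文")
    print(" ".join(tokens))   # 我 喜欢 学习 中文
    print(to_pinyin(tokens))  # wo3 xi3 huan5 xue2 xi2 zhong1 wen2
```

Forward maximum matching is greedy, so it can mis-split ambiguous strings; practical segmenters usually add word-frequency or grammatical information to choose between competing splits.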

References

  1. http://adsotrans.com
  2. http://adsotrans.com/downloads
  3. https://github.com/wtanaka/adso/blob/master/license/ADSO.license.html