WWWJDIC

WWWJDIC is an online Japanese dictionary based on the electronic dictionaries compiled and collected by Australian academic Jim Breen. The main Japanese–English dictionary file (EDICT) contains over 180,000 entries, [1] and the ENAMDICT dictionary contains over 720,000 Japanese surnames, first names, place names and product names. [1] WWWJDIC also contains several specialized dictionaries covering topics such as life sciences, law, computing and engineering.

For example sentences containing Japanese words, WWWJDIC uses a sentence database from the Tatoeba project, [2] which is largely based on the Tanaka Corpus. Unlike the original Tanaka Corpus, the sentences from the Tatoeba project are not in the public domain, but they are available under the non-restrictive CC-BY license. The sentence collection contains over 150,000 sentence pairs in Japanese and English.

In addition to Japanese–English, the dictionary has Japanese paired with German, French, Russian, Hungarian, Swedish, Spanish and Dutch. However, currently there are no example sentences for these languages.

The dictionary is free to use and may be copied under its own licence arrangements.

Several mirror sites of the main WWWJDIC also exist around the world. These sites update daily from the home site at the Electronic Dictionary Research and Development Group (EDRDG).

Related Research Articles

Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another.

A translation memory (TM) is a database that stores "segments", which can be sentences, paragraphs or sentence-like units that have previously been translated, in order to aid human translators. The translation memory stores the source text and its corresponding translation in language pairs called “translation units”. Individual words are handled by terminology bases and are not within the domain of TM.
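As a rough illustration of the idea described above, the following minimal sketch stores translation units as source/target pairs and retrieves the closest stored segment for a new one. The class name `TranslationMemory`, the `threshold` parameter and the sample sentences are assumptions made for this example; real translation memory tools use their own storage formats and matching algorithms.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory holding (source, target) translation units."""

    def __init__(self):
        # Maps a source-language segment to its stored translation.
        self.units = {}

    def add(self, source, target):
        """Store one translation unit."""
        self.units[source] = target

    def lookup(self, segment, threshold=0.75):
        """Return (target, score) for the most similar stored source
        segment, or None if no stored segment reaches the threshold."""
        best_source, best_score = None, 0.0
        for source in self.units:
            score = SequenceMatcher(None, segment, source).ratio()
            if score > best_score:
                best_source, best_score = source, score
        if best_source is None or best_score < threshold:
            return None
        return self.units[best_source], best_score


# Hypothetical sample data for the sketch.
tm = TranslationMemory()
tm.add("I am a student.", "私は学生です。")
tm.add("Where is the station?", "駅はどこですか。")

exact = tm.lookup("I am a student.")            # identical segment
fuzzy = tm.lookup("Where is the bus station?")  # similar segment
miss = tm.lookup("Good morning.")               # nothing close enough
```

An exact match returns the stored translation with a score of 1.0, a similar segment returns the nearest stored translation for a human translator to adapt, and an unrelated segment returns `None`.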

James William Breen is a Research Fellow at Monash University in Australia, where he was a professor in the area of IT and telecommunications before his retirement in 2003. He holds a BSc in mathematics, an MBA and a PhD in computational linguistics, all from the University of Melbourne. He is well known for his involvement in several popular free Japanese-related projects: the EDICT and JMDict Japanese–English dictionaries, the KANJIDIC kanji dictionary, and the WWWJDIC portal which provides an interface to search them.

Parallel text

A parallel text is a text placed alongside its translation or translations. Parallel text alignment is the identification of the corresponding sentences in both halves of the parallel text. The Loeb Classical Library and the Clay Sanskrit Library are two examples of dual-language series of texts. Reference Bibles may contain the original languages and a translation, or several translations by themselves, for ease of comparison and study; Origen's Hexapla placed six versions of the Old Testament side by side. A famous example is the Rosetta Stone, whose discovery allowed the Ancient Egyptian language to begin being deciphered.

In corpus linguistics, part-of-speech tagging, also called grammatical tagging, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.

Wenlin Software for Learning Chinese is a software application designed by Tom Bishop, who is also president of the Wenlin Institute. It is based on his experience of the needs of learners of the Chinese language, predominantly Mandarin. It contains a dictionary function, a corpus of Chinese texts, a function for reading and creating Chinese text files, and a flashcard function. When the user points the cursor at a Chinese character, the software looks up an English word, and vice versa, working like a dictionary. The software recognizes files in Unicode, GB 2312, Big5, and HZ format.

FrameNet is a research and resource development project based at the International Computer Science Institute (ICSI) in Berkeley, California, which has produced an electronic resource based on a theory of meaning called frame semantics. The data that FrameNet has analyzed show that the sentence "John sold a car to Mary" essentially describes the same basic situation as "Mary bought a car from John", just from a different perspective. A semantic frame is a conceptual structure describing an event, relation, or object along with its participants. The FrameNet lexical database contains over 1,200 semantic frames, 13,000 lexical units and 202,000 example sentences. Charles J. Fillmore, who developed the theory of frame semantics that serves as the theoretical basis of FrameNet, founded the project in 1997 and continued to lead the effort until his death in 2014. Frame semantic theory and FrameNet have been influential in linguistics and natural language processing, where they led to the task of automatic semantic role labeling.

Electronic dictionary

An electronic dictionary is a dictionary whose data exists in digital form and can be accessed through a number of different media. Electronic dictionaries can be found in several forms, including software installed on tablet or desktop computers, mobile apps, web applications, and as a built-in function of E-readers. They may be free or require payment.

Japanese dictionaries have a history that began over 1300 years ago when Japanese Buddhist priests, who wanted to understand Chinese sutras, adapted Chinese character dictionaries. Present-day Japanese lexicographers are exploring computerized editing and electronic dictionaries. According to Nakao Keisuke (中尾啓介):

It has often been said that dictionary publishing in Japan is active and prosperous, that Japanese people are well provided for with reference tools, and that lexicography here, in practice as well as in research, has produced a number of valuable reference books together with voluminous academic studies. (1998:35)

Search engine indexing is the collecting, parsing, and storing of data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science. An alternate name for the process, in the context of search engines designed to find web pages on the Internet, is web indexing, or simply indexing.
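One common structure for this kind of retrieval is an inverted index, which maps each term to the documents that contain it. The sketch below is only an illustration under simple assumptions (whitespace tokenization, lowercase matching, AND semantics); the function names `build_index` and `search` and the sample documents are invented for the example and do not describe any particular search engine.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for raw in text.lower().split():
            token = raw.strip('.,;:!?"')  # drop surrounding punctuation
            if token:
                index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents for the sketch.
docs = {
    1: "WWWJDIC is an online Japanese dictionary.",
    2: "Tatoeba is a database of example sentences.",
    3: "JMdict is a multilingual Japanese dictionary.",
}
index = build_index(docs)

both = search(index, "japanese dictionary")
one = search(index, "example")
none = search(index, "chinese")
```

Because the index is built once in advance, each query only touches the entries for its own terms rather than scanning every document.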

JWPce

JWPce is a simple Japanese-language text editor that runs on the Windows 95, ME, 2000, XP, NT, and CE platforms. It is designed for non-native speakers of Japanese who want to produce Japanese-language documents. Distributed under the terms of the GNU General Public License, JWPce is free software.

The Pennsylvania Sumerian Dictionary (PSD) is a project to compile a comprehensive dictionary of the Sumerian language. It is run out of the University of Pennsylvania's Museum of Archaeology and Anthropology and funded by both private donors and the National Endowment for the Humanities. The project began under the direction of Åke W. Sjöberg (1924–2014) and Erle Leichty in 1974 and was modeled on the Chicago Assyrian Dictionary, itself begun in 1921. In 1976 it received its first federal funds from the National Endowment for the Humanities, and in 1984 it published its first section, for the letter B. Only 750 copies were originally printed, but more were soon published because the first batch sold out surprisingly quickly at US$40 apiece. As of 1989 Sjöberg was still project director, and despite retiring in 1996 he continued to contribute.

Oxford Dictionary of English

The Oxford Dictionary of English (ODE) is a single-volume English dictionary published by Oxford University Press, first published in 1998 as The New Oxford Dictionary of English (NODE). The word "new" was dropped from the title with the Second Edition in 2003. This dictionary is not based on the Oxford English Dictionary (OED) and should not be mistaken for a new or updated version of the OED. It is a completely new dictionary which strives to represent as faithfully as possible the current usage of English words. The Revised Second Edition contains 355,000 words, phrases, and definitions, including biographical references and thousands of encyclopaedic entries. The Third Edition was published in August 2010, with some new words, including "vuvuzela".

Macmillan English Dictionary for Advanced Learners, also known as MEDAL, was first published in 2002 by Macmillan Education. MEDAL is an advanced learner’s dictionary and shares most of the features of this type of dictionary: it provides definitions in simple language, using a controlled defining vocabulary; most words have example sentences to illustrate how they are typically used; and information is given about how words combine grammatically or in collocations. MEDAL also introduced a number of innovations.

Tatoeba

Tatoeba is a free collaborative online database of example sentences geared towards foreign language learners. Its name comes from the Japanese term "tatoeba" (例えば), meaning "for example". Unlike other online dictionaries, which focus on words, Tatoeba focuses on translation of complete sentences. In addition, the structure of the database and interface emphasizes one-to-many relationships. Not only can a sentence have multiple translations within a single language, but its translations into all languages are readily visible, as are indirect translations that involve a chain of stepwise links from one language to another.

JMdict is a large machine-readable multilingual Japanese dictionary. As of February 2021, it contained Japanese–English translations for around 191,000 entries, representing 267,000 unique headword-reading combinations. The dictionary files are free to use with attribution and have been widely adopted on the Internet and are used in many computer and smartphone applications. The project is considered a standard Japanese–English reference on the Internet and is used by the Unihan Database and several other Japanese–English projects.

Example-based machine translation (EBMT) is a method of machine translation often characterized by its use of a bilingual corpus with parallel texts as its main knowledge base at run-time. It is essentially a translation by analogy and can be viewed as an implementation of a case-based reasoning approach to machine learning.

Akiho is a Japanese given name and surname. According to WWWJDIC, there are more than a hundred different ways this name might be written in kanji.

References

  1. Breen, Jim. "The EDICT Dictionary File". Retrieved 18 June 2015.
  2. Example Sentence Management System