Center for the Evaluation of Language and Communication Technologies

The Center for the Evaluation of Language and Communication Technologies (CELCT) was an organisation devoted to the evaluation of language technologies, located in Povo, Trento (Italy).

CELCT was established in 2003 by FBK (Fondazione Bruno Kessler) and DFKI (Deutsches Forschungszentrum für Künstliche Intelligenz), and was funded by the Autonomous Province of Trento. The goals of CELCT were "to set up infrastructures and develop skills in order to operate successfully in the field of the evaluation of language and communication technologies, becoming a reference point in the field at the national and European levels." [1] CELCT pursued its mission through several activities in the field of human language technology (HLT) evaluation, focusing mainly on the organization of national and international evaluation campaigns and on the creation of speech and text corpora in different languages and at different levels of linguistic annotation. [1]

CELCT ceased its activities on December 31, 2013. The staff members working at CELCT at the time of its closure have continued their research activities within FBK. [2]

European projects

Other projects

Evaluation campaigns

CELCT was involved in the following initiatives devoted to the evaluation of Natural Language Processing tools, collaborating with various organizations and networks of excellence both at the national and international level:

Publications

CELCT produced a number of scientific publications across all its fields of activity. [27] [28] [29]

References

  1. "CELCT web site". Archived from the original on 18 February 2014. Retrieved 10 January 2014.
  2. "Fondazione Bruno Kessler website". Retrieved 10 January 2014.
  3. "TOSCA-MP European project website". Retrieved 18 January 2014.
  4. "Deliverable D2.1 of TOSCA-MP project" (PDF). Retrieved 10 January 2014.
  5. "Deliverable D2.2 of TOSCA-MP project" (PDF). Retrieved 10 January 2014.
  6. "EXCITEMENT European project website". Retrieved 18 January 2014.
  7. "PROMISE website". Retrieved 18 January 2014.
  8. "Partners of PROMISE". Retrieved 13 January 2014.
  9. "Euromatrix European project website". Archived from the original on 5 September 2016. Retrieved 17 January 2014.
  10. "LiveMemories website". Retrieved 26 February 2014.
  11. "Ontotext project website". Retrieved 18 January 2014.
  12. "Law Making Environment project website". Archived from the original on 4 March 2016. Retrieved 18 January 2014.
  13. "DUC 2005 website".
  14. "CLEF web site". Archived from the original on 31 January 2012. Retrieved 10 January 2014.
  15. "RTE challenges website". Retrieved 18 January 2014.
  16. "IWSLT2006 website". Archived from the original on 2 February 2013. Retrieved 18 January 2014.
  17. "Organizers of IWSLT 2006". Archived from the original on 4 February 2013. Retrieved 10 January 2014.
  18. "IWSLT2007 website". Retrieved 18 January 2014.
  19. "Organizers of IWSLT 2007". Retrieved 10 January 2014.
  20. "IWSLT2011 website". Retrieved 18 January 2014.
  21. "Organizers of IWSLT 2011 evaluation campaign". Archived from the original on 30 January 2014. Retrieved 10 January 2014.
  22. "IWSLT2012 website". Retrieved 18 January 2014.
  23. "MT Track organizers of IWSLT 2012". Retrieved 10 January 2014.
  24. "Evalita website". Retrieved 18 January 2014.
  25. "CLTE@Semeval-2012 website". Retrieved 18 January 2014.
  26. "CLTE@Semeval-2013 website". Retrieved 22 January 2014.
  27. "Complete list of CELCT publications". Archived from the original on 27 June 2014. Retrieved 18 January 2014.
  28. "List of CELCT publications in the ACL Anthology Network". Archived from the original on 4 March 2016. Retrieved 18 January 2014.
  29. "List of CELCT publications in the ACL Anthology Searchbench". Retrieved 18 January 2014.