Information extraction


Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources. Typically, this involves processing human language texts by means of natural language processing (NLP). [1] Recent activities in multimedia document processing, such as automatic annotation and content extraction from images, audio, video, and documents, can be seen as information extraction.


Recent advances in NLP techniques have significantly improved performance compared to earlier approaches. [2] An example is the extraction from newswire reports of corporate mergers, such as denoted by the formal relation:

MergerBetween(company1, company2, date),

from an online news sentence such as:

"Yesterday, New York based Foo Inc. announced their acquisition of Bar Corp."

A broad goal of IE is to allow computation to be done on the previously unstructured data. A more specific goal is to allow automated reasoning about the logical form of the input data. Structured data is semantically well-defined data from a chosen target domain, interpreted with respect to category and context.
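As a minimal illustration of these goals using the merger example above, the following Python sketch (a hand-written pattern, not any particular system's method) pulls the acquirer and the acquired company out of the sample sentence and produces a structured record; the pattern, the field names, and the handling of the date are assumptions made only for this sketch.

import re

# Sample newswire sentence from the example above.
sentence = "Yesterday, New York based Foo Inc. announced their acquisition of Bar Corp."

# A hand-written pattern of the form "<Acquirer> announced their acquisition of <Acquired>".
# The crude company-name sub-pattern (runs of capitalized words) is a simplifying assumption.
pattern = re.compile(
    r"(?P<acquirer>[A-Z][\w.]*(?: [A-Z][\w.]*)*) announced (?:their|its) acquisition of "
    r"(?P<acquired>[A-Z][\w.]*(?: [A-Z][\w.]*)*)"
)

match = pattern.search(sentence)
if match:
    # A structured record corresponding to the MergerBetween(company1, company2, date) relation.
    record = {
        "company1": match.group("acquirer"),
        "company2": match.group("acquired"),
        "date": "yesterday",  # a real system would resolve this relative date to a calendar date
    }
    print(record)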

Information extraction is part of a greater puzzle which deals with the problem of devising automatic methods for text management, beyond its transmission, storage and display. The discipline of information retrieval (IR) [3] has developed automatic methods, typically of a statistical flavor, for indexing large document collections and classifying documents. Another complementary approach is that of natural language processing (NLP), which has modelled human language processing with considerable success when taking into account the magnitude of the task. In terms of both difficulty and emphasis, IE deals with tasks in between those of IR and NLP.

In terms of input, IE assumes the existence of a set of documents in which each document follows a template, i.e. describes one or more entities or events in a manner that is similar to those in other documents but differs in the details. As an example, consider a group of newswire articles on Latin American terrorism, with each article presumed to be based upon one or more terrorist acts. For any given IE task, we also define a template, which is a case frame (or a set of case frames) to hold the information contained in a single document. For the terrorism example, a template would have slots corresponding to the perpetrator, victim, and weapon of the terrorist act, and the date on which the event happened. An IE system for this problem is required to "understand" an attack article only enough to find data corresponding to the slots in this template.
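Such a template can be sketched as a simple case frame; the Python dataclass below only illustrates the slot structure described above, with field names taken from the terrorism example and no claim about how the slots get filled.

from dataclasses import dataclass
from typing import Optional

# Case frame ("template") for the terrorism example: one instance per extracted event.
@dataclass
class TerrorismEvent:
    perpetrator: Optional[str] = None
    victim: Optional[str] = None
    weapon: Optional[str] = None
    date: Optional[str] = None

# An IE system reading one attack article would fill whichever slots the text supports
# and leave the remaining slots empty.
event = TerrorismEvent()
print(event)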

History

Information extraction dates back to the late 1970s in the early days of NLP. [4] An early commercial system from the mid-1980s was JASPER, built for Reuters by the Carnegie Group Inc. with the aim of providing real-time financial news to financial traders. [5]

Beginning in 1987, IE was spurred by a series of Message Understanding Conferences (MUC), a competition-based conference series [6] that focused on a succession of specific extraction domains.

Considerable support came from the U.S. Defense Advanced Research Projects Agency (DARPA), which wished to automate mundane tasks performed by government analysts, such as scanning newspapers for possible links to terrorism.[citation needed]

Present significance

The present significance of IE pertains to the growing amount of information available in unstructured form. Tim Berners-Lee, inventor of the World Wide Web, refers to the existing Internet as the web of documents [7] and advocates that more of the content be made available as a web of data. [8] Until this transpires, the web largely consists of unstructured documents lacking semantic metadata. Knowledge contained within these documents can be made more accessible for machine processing by transforming it into relational form, or by marking it up with XML tags. An intelligent agent monitoring a news data feed requires IE to transform unstructured data into something that can be reasoned with. A typical application of IE is to scan a set of documents written in a natural language and populate a database with the information extracted. [9]
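The "scan documents and populate a database" application can be sketched with the Python standard library alone; the example below reuses the hand-written merger pattern from the earlier sketch, and the table and column names are assumptions made for illustration.

import re
import sqlite3

documents = [
    "Yesterday, New York based Foo Inc. announced their acquisition of Bar Corp.",
    # further documents would come from files or a news feed
]

pattern = re.compile(
    r"(?P<acquirer>[A-Z][\w.]*(?: [A-Z][\w.]*)*) announced (?:their|its) acquisition of "
    r"(?P<acquired>[A-Z][\w.]*(?: [A-Z][\w.]*)*)"
)

conn = sqlite3.connect(":memory:")  # an on-disk path would be used in practice
conn.execute("CREATE TABLE mergers (acquirer TEXT, acquired TEXT, source TEXT)")

# Scan every document and store each extracted relation as one database row.
for doc in documents:
    for m in pattern.finditer(doc):
        conn.execute(
            "INSERT INTO mergers VALUES (?, ?, ?)",
            (m.group("acquirer"), m.group("acquired"), doc),
        )
conn.commit()

print(conn.execute("SELECT acquirer, acquired FROM mergers").fetchall())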

Tasks and subtasks

Applying information extraction to text is linked to the problem of text simplification in order to create a structured view of the information present in free text. The overall goal is to create text that is more easily machine-readable for processing the sentences. Typical IE tasks and subtasks include:

This list is not exhaustive, the exact scope of IE activities is not commonly agreed upon, and many approaches combine multiple IE sub-tasks in order to achieve a wider goal. Machine learning, statistical analysis and/or natural language processing are often used in IE.
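For instance, named-entity recognition, one of the most common IE sub-tasks, can be run with an off-the-shelf NLP library. The sketch below assumes spaCy and its small English model (en_core_web_sm) are installed; it is meant only to show the kind of structured output such a sub-task produces, not to endorse a particular tool.

import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Yesterday, New York based Foo Inc. announced their acquisition of Bar Corp.")

# Each recognized entity comes with its character span and a predicted type (ORG, GPE, DATE, ...).
for ent in doc.ents:
    print(ent.text, ent.label_, ent.start_char, ent.end_char)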

IE on non-text documents is becoming an increasingly interesting topic[when?] in research, and information extracted from multimedia documents can now[when?] be expressed in a high-level structure, as is done for text. This naturally leads to the fusion of extracted information from multiple kinds of documents and sources.

World Wide Web applications

IE has been the focus of the MUC conferences. The proliferation of the Web, however, intensified the need for IE systems that help people cope with the enormous amount of data available online. Systems that perform IE from online text should meet the requirements of low cost, flexibility in development and easy adaptation to new domains. MUC systems fail to meet those criteria. Moreover, linguistic analysis performed for unstructured text does not exploit the HTML/XML tags and the layout formats available in online texts. As a result, less linguistically intensive approaches have been developed for IE on the Web using wrappers, which are sets of highly accurate rules that extract a particular page's content. Manually developing wrappers has proved to be a time-consuming task, requiring a high level of expertise. Machine learning techniques, either supervised or unsupervised, have been used to induce such rules automatically.

Wrappers typically handle highly structured collections of web pages, such as product catalogs and telephone directories. They fail, however, when the text type is less structured, which is also common on the Web. Recent effort on adaptive information extraction motivates the development of IE systems that can handle different types of text, from well-structured to almost free text (where common wrappers fail), including mixed types. Such systems can exploit shallow natural language knowledge and thus can also be applied to less structured texts.
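As a sketch of what a wrapper amounts to, the rule below targets one assumed page layout: a product catalog whose items carry fixed markup around each field. The HTML snippet, class names, and fields are invented for illustration; the rule is accurate on this layout and useless on any other, which is exactly the brittleness described above.

import re

# One page from an assumed product catalog; the layout is invented for this sketch.
page = """
<ul>
  <li class="product"><span class="name">Widget A</span><span class="price">$9.99</span></li>
  <li class="product"><span class="name">Widget B</span><span class="price">$14.50</span></li>
</ul>
"""

# A wrapper rule in the classic left/right-delimiter style: take the text that sits
# between fixed markup strings.
rule = re.compile(
    r'<span class="name">(?P<name>.*?)</span><span class="price">(?P<price>.*?)</span>'
)

for m in rule.finditer(page):
    print({"name": m.group("name"), "price": m.group("price")})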

A recent[when?] development is Visual Information Extraction, [16] [17] which relies on rendering a webpage in a browser and creating rules based on the proximity of regions in the rendered page. This helps in extracting entities from complex web pages that may exhibit a visual pattern but lack a discernible pattern in the HTML source code.
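The proximity idea can be sketched without a browser. Given text regions with their rendered positions (hard-coded below; a real system would obtain them by rendering the page, e.g. in a headless browser), regions that sit close together vertically are grouped into one record. The coordinates, threshold, and region texts are assumptions made for illustration.

# Each region: (text, x, y), approximating where the text lands on the rendered page.
regions = [
    ("Foo Inc.", 100, 210),
    ("$120M", 320, 212),
    ("Bar Corp.", 100, 260),
    ("$85M", 320, 258),
]

Y_THRESHOLD = 10  # regions within 10 pixels vertically are treated as the same visual row

rows = []
for text, x, y in sorted(regions, key=lambda r: r[2]):
    if rows and abs(rows[-1][-1][2] - y) <= Y_THRESHOLD:
        rows[-1].append((text, x, y))
    else:
        rows.append([(text, x, y)])

# Each visual row becomes one extracted record, read left to right.
for row in rows:
    print([text for text, x, y in sorted(row, key=lambda r: r[1])])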

Approaches

The following standard approaches are now widely accepted:

Numerous other approaches exist for IE, including hybrid approaches that combine some of the standard approaches previously listed.
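One common family of approaches, in contrast to the hand-written pattern shown earlier, casts extraction as token classification: each token gets a few simple features and a label saying whether it belongs to an entity of interest. The toy training data, the features, and the use of scikit-learn below are assumptions made for illustration, not a description of any particular system.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(tokens, i):
    # Minimal per-token features; real systems use much richer ones (context windows,
    # gazetteers, embeddings) and structured sequence models rather than per-token classifiers.
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
    }

# Toy training data: each token is labeled 1 if it is part of a company name, else 0.
train_sents = [
    (["Foo", "Inc.", "bought", "Bar", "Corp.", "yesterday"], [1, 1, 0, 1, 1, 0]),
    (["Shares", "of", "Baz", "Ltd.", "fell"], [0, 0, 1, 1, 0]),
]

X, y = [], []
for tokens, labels in train_sents:
    for i, label in enumerate(labels):
        X.append(features(tokens, i))
        y.append(label)

model = make_pipeline(DictVectorizer(), LogisticRegression())
model.fit(X, y)

test = ["Qux", "Inc.", "announced", "a", "merger"]
preds = model.predict([features(test, i) for i in range(len(test))])
print([(tok, int(label)) for tok, label in zip(test, preds)])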

Free or open source software and services

See also


Related Research Articles

<span class="mw-page-title-main">Natural language processing</span> Field of linguistics and computer science

Natural language processing (NLP) is an interdisciplinary subfield of computer science and linguistics. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Text mining, text data mining (TDM) or text analytics is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources." Written resources may include websites, books, emails, reviews, and articles. High-quality information is typically obtained by devising patterns and trends by means such as statistical pattern learning. According to Hotho et al. (2005) we can distinguish between three different perspectives of text mining: information extraction, data mining, and a knowledge discovery in databases (KDD) process. Text mining usually involves the process of structuring the input text, deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interest. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling.

Automatic summarization is the process of shortening a set of data computationally, to create a subset that represents the most important or relevant information within the original content. Artificial intelligence algorithms are commonly developed and employed to achieve this, specialized for different types of data.

Named-entity recognition (NER) (also known as (named) entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.

Unstructured data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional programs as compared to data stored in fielded form in databases or annotated in documents.

Biomedical text mining refers to the methods and study of how text mining may be applied to texts and literature of the biomedical domain. As a field of research, biomedical text mining incorporates ideas from natural language processing, bioinformatics, medical informatics and computational linguistics. The strategies in this field have been applied to the biomedical literature available through services such as PubMed.

Noisy text analytics is a process of information extraction whose goal is to automatically extract structured or semistructured information from noisy unstructured text data. While text analytics is a growing and mature field that has great value because of the huge amounts of data being produced, processing of noisy text is gaining in importance because many common applications produce noisy text data. Noisy unstructured text data is found in informal settings such as online chat, text messages, e-mails, message boards, newsgroups, blogs, wikis and web pages. Also, text produced by processing spontaneous speech using automatic speech recognition and printed or handwritten text using optical character recognition contains processing noise. Text produced under such circumstances is typically highly noisy, containing spelling errors, abbreviations, non-standard words, false starts, repetitions, missing punctuation, missing letter case information, pause-filling words such as “um” and “uh”, and other texting and speech disfluencies. Such text can be seen in large amounts in contact centers, chat rooms, optical character recognition (OCR) of text documents, short message service (SMS) text, etc. Documents with historical language can also be considered noisy with respect to today's knowledge about the language. Such text contains important historical, religious and ancient medical knowledge that is useful. The nature of the noisy text produced in all these contexts warrants moving beyond traditional text analysis techniques.

<span class="mw-page-title-main">General Architecture for Text Engineering</span>

General Architecture for Text Engineering or GATE is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for many natural language processing tasks, including information extraction in many languages.

A relationship extraction task requires the detection and classification of semantic relationship mentions within a set of artifacts, typically from text or XML documents. The task is very similar to that of information extraction (IE), but IE additionally requires the removal of repeated relations (disambiguation) and generally refers to the extraction of many different relationships.

Data extraction is the act or process of retrieving data out of data sources for further data processing or data storage. The import into the intermediate extracting system is thus usually followed by data transformation and possibly the addition of metadata prior to export to another stage in the data workflow.

In information extraction, a named entity is a real-world object, such as a person, location, organization, product, etc., that can be denoted with a proper name. It can be abstract or have a physical existence. Examples of named entities include Barack Obama, New York City, Volkswagen Golf, or anything else that can be named. Named entities can simply be viewed as entity instances.

<span class="mw-page-title-main">Apache OpenNLP</span> Open-source natural language processing library

The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as language detection, tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing and coreference resolution. These tasks are usually required to build more advanced text processing services.

Knowledge extraction is the creation of knowledge from structured and unstructured sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL, the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge or the generation of a schema based on the source data.

<span class="mw-page-title-main">Apache cTAKES</span> Natural language processing system

Apache cTAKES: clinical Text Analysis and Knowledge Extraction System is an open-source Natural Language Processing (NLP) system that extracts clinical information from electronic health record unstructured text. It processes clinical notes, identifying types of clinical named entities — drugs, diseases/disorders, signs/symptoms, anatomical sites and procedures. Each named entity has attributes for the text span, the ontology mapping code, context, and negated/not negated.

The outline of natural language processing is provided as an overview of and topical guide to natural-language processing.

<span class="mw-page-title-main">Entity linking</span> Concept in Natural Language Processing

In natural language processing, entity linking, also referred to as named-entity linking (NEL), named-entity disambiguation (NED), named-entity recognition and disambiguation (NERD) or named-entity normalization (NEN), is the task of assigning a unique identity to entities mentioned in text. For example, given the sentence "Paris is the capital of France", the idea is to determine that "Paris" refers to the city of Paris and not to Paris Hilton or any other entity that could be referred to as "Paris". Entity linking is different from named-entity recognition (NER) in that NER identifies the occurrence of a named entity in text but does not identify which specific entity it is.

NetOwl is a suite of multilingual text and identity analytics products that analyze big data in the form of text data – reports, web, social media, etc. – as well as structured entity data about people, organizations, places, and things.

Spark NLP is an open-source text processing library for advanced natural language processing for the Python, Java and Scala programming languages. The library is built on top of Apache Spark and its Spark ML library.

Document AI or Document Intelligence is a technology that uses natural language processing (NLP) and machine learning (ML) to train computer models to simulate a human review of documents.

References

  1. Kariampuzha, William; Alyea, Gioconda; Qu, Sue; Sanjak, Jaleal; Mathé, Ewy; Sid, Eric; Chatelaine, Haley; Yadaw, Arjun; Xu, Yanji; Zhu, Qian (2023). "Precision information extraction for rare disease epidemiology at scale". Journal of Translational Medicine. 21 (1): 157. doi:10.1186/s12967-023-04011-y. PMC 9972634. PMID 36855134.
  2. Christina Niklaus, Matthias Cetto, André Freitas, and Siegfried Handschuh. 2018. A Survey on Open Information Extraction. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3866–3878, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
  3. Freitag, Dayne (2000). "Machine Learning for Information Extraction in Informal Domains" (PDF). Kluwer Academic Publishers. Printed in the Netherlands.
  4. Cowie, Jim; Wilks, Yorick (1996). Information Extraction (PDF). p. 3. CiteSeerX 10.1.1.61.6480. S2CID 10237124. Archived from the original (PDF) on 2019-02-20.
  5. Andersen, Peggy M.; Hayes, Philip J.; Huettner, Alison K.; Schmandt, Linda M.; Nirenburg, Irene B.; Weinstein, Steven P. (1992). "Automatic Extraction of Facts from Press Releases to Generate News Stories". Proceedings of the Third Conference on Applied Natural Language Processing. pp. 170–177. CiteSeerX 10.1.1.14.7943. doi:10.3115/974499.974531. S2CID 14746386.
  6. Marco Costantino, Paolo Coletti, Information Extraction in Finance, Wit Press, 2008. ISBN 978-1-84564-146-7.
  7. "Linked Data - The Story So Far" (PDF).
  8. "Tim Berners-Lee on the next Web". Archived from the original on 2011-04-10. Retrieved 2010-03-27.
  9. R. K. Srihari, W. Li, C. Niu and T. Cornell, "InfoXtract: A Customizable Intermediate Level Information Extraction Engine", Journal of Natural Language Engineering, Cambridge U. Press, 14(1), 2008, pp. 33–69.[dead link]
  10. Dat Quoc Nguyen and Karin Verspoor (2019). "End-to-end neural relation extraction using deep biaffine attention". Proceedings of the 41st European Conference on Information Retrieval (ECIR). arXiv:1812.11275. doi:10.1007/978-3-030-15712-8_47.
  11. Milosevic N, Gregson C, Hernandez R, Nenadic G (February 2019). "A framework for information extraction from tables in biomedical literature". International Journal on Document Analysis and Recognition. 22 (1): 55–78. arXiv:1902.10031. Bibcode:2019arXiv190210031M. doi:10.1007/s10032-019-00317-0. S2CID 62880746.
  12. Milosevic, Nikola (2018). A multi-layered approach to information extraction from tables in biomedical documents (PDF) (PhD). University of Manchester.
  13. Milosevic N, Gregson C, Hernandez R, Nenadic G (June 2016). "Disentangling the Structure of Tables in Scientific Literature" (PDF). Natural Language Processing and Information Systems. Lecture Notes in Computer Science. Vol. 21. pp. 162–174. doi:10.1007/978-3-319-41754-7_14. ISBN 978-3-319-41753-0. S2CID 19538141.
  14. Milosevic, Nikola (2018). A multi-layered approach to information extraction from tables in biomedical documents (PDF) (PhD). University of Manchester.
  15. Zils, A.; Pachet, F.; Delerue, O.; Gouyon, F. (2002). "Automatic Extraction of Drum Tracks from Polyphonic Music Signals". Proceedings of WedelMusic, Darmstadt, Germany. Archived 2017-08-29 at the Wayback Machine.
  16. Chenthamarakshan, Vijil; Desphande, Prasad M; Krishnapuram, Raghu; Varadarajan, Ramakrishnan; Stolze, Knut (2015). "WYSIWYE: An Algebra for Expressing Spatial and Textual Rules for Information Extraction". arXiv:1506.08454 [cs.CL].
  17. Baumgartner, Robert; Flesca, Sergio; Gottlob, Georg (2001). "Visual Web Information Extraction with Lixto". pp. 119–128. CiteSeerX 10.1.1.21.8236.
  18. Peng, F.; McCallum, A. (2006). "Information extraction from research papers using conditional random fields". Information Processing & Management. 42 (4): 963. doi:10.1016/j.ipm.2005.09.002.
  19. Shimizu, Nobuyuki; Hass, Andrew (2006). "Extracting Frame-based Knowledge Representation from Route Instructions" (PDF). Archived from the original (PDF) on 2006-09-01. Retrieved 2010-03-27.