Annotation

Last updated August 12, 2024

An annotation is extra information associated with a particular point in a document or other piece of information. It can be a note that includes a comment or explanation.^[1] Annotations are sometimes presented in the margin of book pages. For annotations of different digital media, see web annotation and text annotation.

Literature, grammar and educational purposes

Practising visually

Annotation practices include highlighting a phrase or sentence and adding a comment, circling a word that needs defining, posing a question when something is not fully understood, and writing a short summary of a key section.^[3] Annotation also invites students to "(re)construct a history through material engagement and exciting DIY (Do-It-Yourself) annotation practices."^[4] The annotation tools available today let students work in a more collaborative, connected way than was previously possible.^[5]

Text and film annotation

Text and film annotation is a technique that involves adding comments and text within a film. Analyzing videos is an undertaking that is never entirely free of preconceived notions, and the first step for researchers is to find their bearings within the field of possible research approaches and thus reflect on their own basic assumptions.^[6] Annotations can be placed within the video and can be used when the video data is recorded. Annotation is used as a tool in text and film to write one's thoughts and emotions into the markings.^[3] At any step of the analysis, the material can be supplemented with further annotations. The anthropologist Clifford Geertz calls this a "thick description." This gives a sense of how useful annotation is, especially when it adds a description of how it can be implemented in film.^[6]

Medieval marginalia

Marginalia refers to writing or decoration in the margins of a manuscript. Medieval marginalia is so well known that amusing or disconcerting instances of it are fodder for viral aggregators such as Buzzfeed and Brainpickings, and the fascination with other readers’ reading is manifest in sites such as Melville's Marginalia Online or Harvard's online exhibit of marginalia from six personal libraries.^[5] It can also be a part of other websites such as Pinterest, or even meme generators and GIF tools.

Textual scholarship

Textual scholarship is a discipline that often uses annotation to describe texts and physical documents, or to add historical context to them, so that they are easier to understand.^[7]

Student uses

Students often highlight passages in books in order to actively engage with the text. Students can use annotations to refer back to key phrases easily, or add marginalia to aid studying and finding connections between the text and prior knowledge or running themes.^[8]

Annotated bibliographies add commentary on the relevance or quality of each source, in addition to the usual bibliographic information that merely identifies the source.

Students use annotation not only for academic purposes but also to interpret their own thoughts, feelings, and emotions.^[3] Students use sites such as Scalar and Omeka for this. Annotation spans multiple genres, such as math, film, linguistics, and literary theory, in which students find it most helpful. Most students reported the annotation process as helpful for improving overall writing ability, grammar, and academic vocabulary knowledge.

Mathematical expression annotation

Mathematical expressions (symbols and formulae) can be annotated with their natural language meaning. This is essential for disambiguation, since symbols may have different meanings (e.g., "E" can be "energy" or "expectation value", etc.).^[9]^[10] The annotation process can be facilitated and accelerated through recommendation, e.g., using the "AnnoMathTeX" system that is hosted by Wikimedia.^[11]^[12]^[13]

Learning and instruction

From a cognitive perspective, annotation has an important role in learning and instruction. As part of guided noticing, it involves highlighting, naming or labelling, and commenting on aspects of visual representations to help focus learners' attention on specific visual aspects. In other words, it means the assignment of typological representations (culturally meaningful categories) to topological representations (e.g. images).^[14] This is especially important when experts, such as medical doctors, interpret visualizations in detail and explain their interpretations to others, for example by means of digital technology.^[15] Here, annotation can be a way to establish common ground between interactants with different levels of knowledge.^[16] The value of annotation has been empirically confirmed, for example, in a study which shows that in computer-based teleconsultations the integration of image annotation and speech leads to significantly improved knowledge exchange compared with the use of images and speech without annotation.^[17]

On YouTube

Annotations were removed on January 15, 2019, from YouTube after around a decade of service.^[18] They had allowed users to provide information that popped up during videos, but YouTube indicated they did not work well on small mobile screens, and were being abused.

Software and engineering

Text documents

Markup languages like XML and HTML annotate text in a way that is syntactically distinguishable from that text. They can be used to add information about the desired visual presentation, or machine-readable semantic information, as in the semantic web.^[19]

Tabular data

Tabular data includes formats such as CSV and XLS. The process of assigning annotations from ontologies to tabular data is referred to as semantic labelling,^[20]^[21]^[22]^[23] or semantic annotation.^[24]^[23] Semantic labelling is often done in a (semi-)automatic fashion. Semantic labelling techniques work on entity columns,^[23] numeric columns,^[20]^[22]^[25]^[26] coordinates,^[27] and more.^[27]^[26]

Semantic labelling techniques

There are several semantic labelling techniques that utilise machine learning. These techniques can be categorised following the work of Flach^[28]^[29] as follows: geometric (using lines and planes, such as support-vector machines and linear regression), probabilistic (e.g., conditional random fields), logical (e.g., decision tree learning), and non-ML techniques (e.g., balancing coverage and specificity^[23]). Note that the geometric, probabilistic, and logical machine learning models are not mutually exclusive.^[28]

Geometric techniques

Pham et al.^[30] use the Jaccard index and TF-IDF similarity for textual data and the Kolmogorov–Smirnov test for numeric data. Alobaid and Corcho^[22] use fuzzy clustering (c-means^[31]^[32]) to label numeric columns.
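Two of the similarity measures named above can be sketched in a few lines of Python. This is a minimal illustration of the measures themselves, not the implementation used in the cited work. The function names and sample values are assumptions, and the second function returns only the two-sample Kolmogorov–Smirnov statistic, not a p-value.

```python
from bisect import bisect_right


def jaccard_index(a, b):
    """Jaccard index of two token collections: |A & B| / |A | B|."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(set_a | set_b)


def ks_statistic(sample_x, sample_y):
    """Largest gap between the empirical CDFs of two numeric samples."""
    xs, ys = sorted(sample_x), sorted(sample_y)
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for value in xs + ys:
        cdf_x = bisect_right(xs, value) / len(xs)
        cdf_y = bisect_right(ys, value) / len(ys)
        largest_gap = max(largest_gap, abs(cdf_x - cdf_y))
    return largest_gap


# Textual column compared with a labelled reference column.
print(jaccard_index(["madrid", "paris", "rome"], ["paris", "rome", "berlin"]))

# Numeric column compared with a labelled reference column.
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

A higher Jaccard index or a smaller Kolmogorov–Smirnov statistic suggests that an unlabelled column resembles a reference column that already carries a label.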

Probabilistic techniques

Limaye et al.^[33] use TF-IDF similarity and graphical models. They also use a support-vector machine to compute the weights. Venetis et al.^[34] construct an isA database which consists of the pairs (instance, class) and then compute maximum likelihood using these pairs. Alobaid and Corcho^[35] approximated the q-q plot for predicting the properties of numeric columns.

Logical techniques

Syed et al.^[36] built Wikitology, which is "a hybrid knowledge base of structured and unstructured information extracted from Wikipedia augmented by RDF data from DBpedia and other Linked Data resources."^[36] For the Wikitology index, they use PageRank for entity linking, which is one of the tasks often used in semantic labelling. Since they were not able to query Google for all Wikipedia articles to get the PageRank, they used a decision tree to approximate it.^[36]

Non-ML techniques

Alobaid and Corcho^[23] presented an approach to annotate entity columns. The technique starts by annotating the cells in the entity column with the entities from the reference knowledge graph (e.g., DBpedia). The classes of these entities are then gathered, and each class is scored using several formulas that take into account the frequency of the class and its depth in the subClass hierarchy.^[37]

Semantic labelling common tasks

Here are some of the common semantic labelling tasks presented in the literature:

Entity linking and disambiguation

This is the most common task in semantic labelling. Given the text of a cell and a data source, the approach predicts the entity and links it to the one identified in the given data source. For example, if the input to the approach were the text "Richard Feynman" and a URL to the SPARQL endpoint of DBpedia, the approach would return "http://dbpedia.org/resource/Richard_Feynman", which is the entity from DBpedia. Some approaches use exact match,^[23] while others use similarity metrics such as cosine similarity.^[33]
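The two matching strategies mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the in-memory dictionary KNOWN_ENTITIES stands in for a real data source such as a SPARQL endpoint, the example.org entry is a placeholder, and the character-trigram representation, the 0.5 threshold, and all function names are choices made for this example rather than details of any cited approach.

```python
import math
from collections import Counter

# Hypothetical in-memory stand-in for a knowledge graph lookup.
KNOWN_ENTITIES = {
    "Richard Feynman": "http://dbpedia.org/resource/Richard_Feynman",
    "Marie Curie": "http://example.org/resource/Marie_Curie",  # placeholder URI
}


def trigram_counts(text):
    """Bag of character trigrams of the lower-cased, padded text."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[gram] * b[gram] for gram in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_entity(cell_text, entities=KNOWN_ENTITIES, threshold=0.5):
    """Return the URI for a cell: exact match first, then closest label."""
    label = cell_text.strip()
    if label in entities:
        return entities[label]
    cell_vector = trigram_counts(label)
    scored = [
        (cosine_similarity(cell_vector, trigram_counts(name)), name)
        for name in entities
    ]
    if not scored:
        return None
    best_score, best_name = max(scored)
    return entities[best_name] if best_score >= threshold else None


print(link_entity("Richard Feynman"))   # exact match
print(link_entity("Richard Feynmann"))  # misspelling, resolved by similarity
print(link_entity("Quantum widget"))    # no sufficiently similar entity
```

The threshold trades recall for precision: a lower value links more misspelled cells but also risks linking unrelated text.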

Subject column identification

The subject column of a table is the column that contains the main subjects/entities in the table.^[20]^[29]^[34]^[38]^[39] Some approaches expect the subject column as an input,^[23] while others, such as TableMiner+, predict the subject column.^[39]

Column data-type detection

Column types are divided differently by different approaches.^[29] Some divide them into strings/text and numbers,^[22]^[30]^[40]^[26] while others divide them further^[29] (e.g., Number Typology,^[20] Date,^[36]^[34] coordinates^[41]).
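The coarsest of these divisions, text versus numbers, can be sketched as below. This is a simplified illustration rather than any cited approach: the function names, the "empty" label, the handling of thousands separators, and the 90% cut-off are all assumptions made for this example.

```python
def is_number(cell):
    """True if the cell parses as a number once commas are removed."""
    try:
        float(cell.replace(",", ""))
        return True
    except ValueError:
        return False


def detect_column_type(cells, numeric_share=0.9):
    """Label a column "numeric", "text", or "empty" from its cell strings."""
    values = [cell.strip() for cell in cells if cell and cell.strip()]
    if not values:
        return "empty"
    numeric = sum(is_number(value) for value in values)
    return "numeric" if numeric / len(values) >= numeric_share else "text"


print(detect_column_type(["1.5", "2", "3,000"]))
print(detect_column_type(["Madrid", "Paris", "3"]))
```

Note that Python's float also accepts strings such as "nan" and "inf", so a production detector would likely need stricter parsing, and finer categories such as dates or coordinates would need their own checks.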

Relation prediction

The relation between Madrid and Spain is "capitalOf".^[42] Such relations can easily be found in ontologies, such as DBpedia. Venetis et al.^[34] use TextRunner^[43] to extract the relation between two columns. Syed et al.^[36] look up the relations between the entities of the two columns and select the most frequent one.
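Selecting the most frequent relation between two columns can be sketched as follows. The triples below are hypothetical sample data standing in for a knowledge graph, and the function name predict_relation is an assumption for this example; the sketch illustrates the voting idea only, not the cited system.

```python
from collections import Counter

# Hypothetical (subject, relation, object) triples standing in for a knowledge graph.
TRIPLES = [
    ("Madrid", "capitalOf", "Spain"),
    ("Paris", "capitalOf", "France"),
    ("Rome", "capitalOf", "Italy"),
    ("Madrid", "locatedIn", "Spain"),
]


def predict_relation(rows, triples=TRIPLES):
    """Return the relation that holds for the most (subject, object) rows."""
    pairs = set(rows)
    counts = Counter(
        relation for subject, relation, obj in triples if (subject, obj) in pairs
    )
    return counts.most_common(1)[0][0] if counts else None


rows = [("Madrid", "Spain"), ("Paris", "France"), ("Rome", "Italy")]
print(predict_relation(rows))
```

Here "capitalOf" is found for three rows and "locatedIn" for one, so "capitalOf" is chosen as the relation between the two columns.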

Gold standards

T2D^[44] is the most common gold standard for semantic labelling. Two versions of T2D exist: T2Dv1 (sometimes also referred to simply as T2D) and T2Dv2.^[44] Other known benchmarks are published with the SemTab Challenge.^[45]

Source control

The "annotate" function (also known as "blame" or "praise") used in source control systems such as Git, Team Foundation Server and Subversion determines who committed changes to the source code into the repository. This outputs a copy of the source code where each line is annotated with the name of the last contributor to edit that line (and possibly a revision number). This can help establish blame in the event a change caused a malfunction, or identify the author of brilliant code.
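The idea of line-by-line attribution can be sketched in Python with the standard library's difflib. This is a simplified model under stated assumptions, not how Git or Subversion implement the feature: revisions are given as an in-memory list of (author, lines) pairs from oldest to newest, and the function name annotate and the sample revisions are hypothetical.

```python
from difflib import SequenceMatcher


def annotate(revisions):
    """Attribute each line of the newest revision to its last editor.

    revisions: list of (author, lines) pairs ordered oldest to newest.
    Returns a list of (author, line) pairs for the newest revision.
    """
    annotated = []
    for author, lines in revisions:
        old_lines = [line for _, line in annotated]
        matcher = SequenceMatcher(a=old_lines, b=lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep their earlier attribution.
                updated.extend(annotated[i1:i2])
            elif tag in ("replace", "insert"):
                # New or rewritten lines belong to this revision's author.
                updated.extend((author, line) for line in lines[j1:j2])
            # "delete": removed lines simply drop out of the result.
        annotated = updated
    return annotated


history = [
    ("alice", ["def area(r):", "    return 3.14 * r * r"]),
    ("bob", ["import math", "def area(r):", "    return math.pi * r * r"]),
]
for author, line in annotate(history):
    print(f"{author:>6} | {line}")
```

Real source control systems also record a revision identifier and timestamp per line and use their own diff algorithms, so their attribution can differ from this sketch when lines are moved or heavily rewritten.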

Java annotations

A special case is the Java programming language, where annotations can be used as a special form of syntactic metadata in the source code.^[46] Classes, methods, variables, parameters and packages may be annotated. The annotations can be embedded in class files generated by the compiler and may be retained by the Java virtual machine and thus influence the run-time behaviour of an application. It is possible to create meta-annotations out of the existing ones in Java.^[47]

Image annotation

Automatic image annotation is used to classify images for image retrieval systems.^[48]

Computational biology

Since the 1980s, molecular biology and bioinformatics have created the need for DNA annotation. DNA annotation or genome annotation is the process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do. An annotation (irrespective of the context) is a note added by way of explanation or commentary. Once a genome is sequenced, it needs to be annotated to make sense of it.^[49]

Digital imaging

In the digital imaging community the term annotation is commonly used for visible metadata superimposed on an image without changing the underlying master image, such as sticky notes, virtual laser pointers, circles, arrows, and black-outs (cf. redaction).^[50]

In the medical imaging community, an annotation is often referred to as a region of interest and is encoded in DICOM format.

Other uses

Law

In the United States, legal publishers such as Thomson West and Lexis Nexis publish annotated versions of statutes, providing information about court cases that have interpreted the statutes. Both the federal United States Code and state statutes are subject to interpretation by the courts, and the annotated statutes are valuable tools in legal research.^[51]

Linguistics

One purpose of annotation is to transform the data into a form suitable for computer-aided analysis. Prior to annotation, an annotation scheme is defined that typically consists of tags. During tagging, transcriptionists use an annotation editor to manually add tags to transcripts wherever the required linguistic features are identified. The annotation scheme ensures that the tags are added consistently across the data set and allows for verification of previously tagged data.^[52] Aside from tags, more complex forms of linguistic annotation include the annotation of phrases and relations, e.g., in treebanks. Many different forms of linguistic annotation have been developed, as well as different formats and tools for creating and managing linguistic annotations, as described, for example, in the Linguistic Annotation Wiki.^[53]
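How an annotation scheme supports verification of tagged data can be sketched as follows. The tag set and the function name find_invalid_tags are hypothetical and chosen only for this example; real schemes are usually richer and tool-specific.

```python
# Hypothetical part-of-speech annotation scheme (a closed set of tags).
SCHEME = {"DET", "NOUN", "VERB", "ADJ"}


def find_invalid_tags(tagged_tokens, scheme=SCHEME):
    """Return (position, token, tag) for every tag not defined in the scheme."""
    return [
        (position, token, tag)
        for position, (token, tag) in enumerate(tagged_tokens)
        if tag not in scheme
    ]


transcript = [("The", "DET"), ("cat", "NOUN"), ("sleeps", "VRB")]
print(find_invalid_tags(transcript))
```

Checking every tag against the scheme catches typing slips such as "VRB" before the data set is used for analysis.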

References

↑ "Definition of Data Annotation". macgence.com/. 17 August 2023.
↑ "Types of Data Annotation". maanz-ai.com/. 30 March 2024.
1 2 3 Crosthwaite, Peter; Sanhueza, Alicia Gazmuri; Schweinberger, Martin (September 2021). "Training disciplinary genre awareness through blended learning: An exploration into EAP students' perceptions of online annotation of genres across disciplines". Journal of English for Academic Purposes. 53: 101021. doi:10.1016/j.jeap.2021.101021. S2CID 236238505.
↑ Jocius, Robin (March 2018). "Becoming Entangled: An Analysis of 5th Grade Students Collaborative Multimodal Composing Practices". Computers and Composition. 47: 14–30. doi:10.1016/j.compcom.2017.12.008. ISSN 8755-4615.
1 2 Dougherty, Jack; O'Donnell, Tennyson, eds. (2015-04-21). Web Writing. University of Michigan Press. doi:10.2307/j.ctv65sxgk. ISBN 978-0-472-90012-1.
1 2 Lösel, Gunter; Zimper, Martin, eds. (2021-05-26). Filming, Researching, Annotating: Research Video Handbook. doi:10.1515/9783035623079. ISBN 9783035623079. S2CID 238919442.
↑ Greetham, David C. (28 October 2015) [1992]. Textual Scholarship: An Introduction. Garland Reference Library of the Humanities. Vol. 1417. Routledge. ISBN 978-1-136-75579-8.
↑ Wallen, Erik; Plass, Jan L.; Brünken, Roland (September 2005). "The function of annotations in the comprehension of scientific texts: Cognitive load effects and the impact of verbal ability". Educational Technology Research and Development. 53 (3): 59–71. doi:10.1007/BF02504798. ISSN 1042-1629. S2CID 17846801.
↑ Moritz Schubotz; Philipp Scharpf; et al. (12 September 2018). "Introducing MathQA: a Math-Aware question answering system". Information Discovery and Delivery. 46 (4). Emerald Publishing Limited: 214–224. arXiv: 1907.01642 . doi:10.1108/IDD-06-2018-0022. S2CID 49484035.
↑ Scharpf, P.; Schubotz, M.; et al. (2018). Representing Mathematical Formulae in Content MathML using Wikidata. ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018).
↑ "AnnoMathTeX Formula/Identifier Annotation Recommender System".
↑ Philipp Scharpf; Ian Mackerracher; et al. (17 September 2019). "AnnoMathTeX - a formula identifier annotation recommender system for STEM documents". Proceedings of the 13th ACM Conference on Recommender Systems (PDF). pp. 532–3. doi:10.1145/3298689.3347042. ISBN 9781450362436. S2CID 202639987.
↑ Philipp Scharpf; Moritz Schubotz; Bela Gipp (14 April 2021). "Fast Linking of Mathematical Wikidata Entities in Wikipedia Articles Using Annotation Recommendation". Companion Proceedings of the Web Conference 2021 (PDF). pp. 602–9. arXiv: 2104.05111 . doi:10.1145/3442442.3452348. ISBN 9781450383134. S2CID 233210264.
↑ Pea, R.D. (2006). "Video-as-Data and Digital Video Manipulation Techniques for Transforming Learning Sciences Research, Education, and Other Cultural Practices". The International Handbook of Virtual Learning Environments (PDF). Springer. pp. 1321–93. doi:10.1007/978-1-4020-3803-7_55. ISBN 978-1-4020-3803-7.
↑ Coiera, E. (2014). "Communication spaces". J Am Med Inform Assoc. 21 (3): 414–422. doi:10.1136/amiajnl-2012-001520. PMC 3994845 . PMID 24005797.
↑ Clark, Herbert H. (1996). Using Language. Cambridge University Press. ISBN 978-0-521-56745-9.
↑ Pimmer, C.; Mateescu, M.; Zahn, C.; Genewein, U. (2013). "Smartphones as multimodal communication devices to facilitate clinical knowledge processes — a randomized controlled trial". Journal of Medical Internet Research. 15 (11): e263. doi: 10.2196/jmir.2758 . PMC 3868983 . PMID 24284080.
↑ "YouTube annotations will disappear for good in January". engadget. 2018-11-27. Retrieved 2019-01-19.
↑ "Web Annotation Data Model". World Wide Web Consortium. 11 December 2014. Retrieved 25 August 2015.
1 2 3 4 Alobaid, Ahmad; Kacprzak, Emilia; Corcho, Oscar (January 1, 2021). "Typology-based semantic labeling of numeric tabular data". Semantic Web. 12 (1): 5–20. doi:10.3233/SW-200397. S2CID 224853014 – via content.iospress.com.
↑ Taheriyan, Mohsen; Knoblock, Craig A.; Szekely, Pedro; Ambite, José Luis (March 1, 2016). "Learning the semantics of structured data sources". Web Semantics: Science, Services and Agents on the World Wide Web. 37 (C): 152–169. arXiv: 1601.04105 . doi:10.1016/j.websem.2015.12.003. S2CID 7409058.
1 2 3 4 Alobaid, Ahmad; Corcho, Oscar (2018). "Fuzzy Semantic Labeling of Semi-structured Numerical Datasets". In Faron Zucker, Catherine; Ghidini, Chiara; Napoli, Amedeo; Toussaint, Yannick (eds.). Knowledge Engineering and Knowledge Management. Lecture Notes in Computer Science. Vol. 11313. Cham: Springer International Publishing. pp. 19–33. doi:10.1007/978-3-030-03667-6_2. ISBN 978-3-030-03667-6.
1 2 3 4 5 6 7 Alobaid, Ahmad; Corcho, Oscar (2022-03-15). "Balancing coverage and specificity for semantic labelling of subject columns". Knowledge-Based Systems. 240: 108092. doi:10.1016/j.knosys.2021.108092. ISSN 0950-7051. S2CID 245971543.
↑ Hassanzadeh, O.; Ward, Michael J.; Rodriguez-Muro, Mariano; Srinivas, Kavitha (December 17, 2015). "Understanding a large corpus of web tables through matching with knowledge bases: an empirical study". S2CID 442374.
↑ Neumaier, Sebastian; Umbrich, Jürgen; Parreira, Josiane Xavier; Polleres, Axel (2016). "Multi-level Semantic Labelling of Numerical Values". In Groth, Paul; Simperl, Elena; Gray, Alasdair; Sabou, Marta; Krötzsch, Markus; Lecue, Freddy; Flöck, Fabian; Gil, Yolanda (eds.). The Semantic Web – ISWC 2016. Lecture Notes in Computer Science. Vol. 9981. Cham: Springer International Publishing. pp. 428–445. doi:10.1007/978-3-319-46523-4_26. ISBN 978-3-319-46523-4.
1 2 3 Zhang, Meihui; Chakrabarti, Kaushik (2013-06-22). "InfoGather+". Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. SIGMOD '13. New York, NY, USA: Association for Computing Machinery. pp. 145–156. doi:10.1145/2463676.2465276. ISBN 978-1-4503-2037-5. S2CID 15540847.
1 2 Ritze, Dominique; Lehmberg, Oliver; Bizer, Christian (July 13, 2015). "Matching HTML Tables to DBpedia". Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics. Association for Computing Machinery. pp. 1–6. doi:10.1145/2797115.2797118. ISBN 9781450332934. S2CID 207228254 – via ACM Digital Library.
1 2 Flach, Peter (2012). Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Cambridge: Cambridge University Press. doi:10.1017/cbo9780511973000. ISBN 978-1-107-09639-4.
1 2 3 4 Alobaid, Ahmad (c. 2020). Knowledge-Graph-Based Semantic Labeling of Tabular Data (phd thesis). E.T.S. de Ingenieros Informáticos (UPM). doi:10.20868/upm.thesis.64068.
1 2 Pham, Minh; Alse, Suresh; Knoblock, Craig A.; Szekely, Pedro (2016). "Semantic Labeling: A Domain-Independent Approach". In Groth, Paul; Simperl, Elena; Gray, Alasdair; Sabou, Marta; Krötzsch, Markus; Lecue, Freddy; Flöck, Fabian; Gil, Yolanda (eds.). The Semantic Web – ISWC 2016. Lecture Notes in Computer Science. Vol. 9981. Cham: Springer International Publishing. pp. 446–462. doi:10.1007/978-3-319-46523-4_27. ISBN 978-3-319-46523-4. S2CID 37873758.
↑ Fuzzy c-Means Library, Ontology Engineering Group (UPM), 2022-01-29, retrieved 2023-01-04
↑ fuzzy-c-means, Ontology Engineering Group (UPM), 2022-12-12, retrieved 2023-01-04
Limaye, Girija; Sarawagi, Sunita; Chakrabarti, Soumen (2010-09-01). "Annotating and searching web tables using entities, types and relationships". Proceedings of the VLDB Endowment. 3 (1–2): 1338–1347. doi:10.14778/1920841.1921005. ISSN 2150-8097. S2CID 9262964.
Venetis, Petros; Halevy, Alon; Madhavan, Jayant; Paşca, Marius; Shen, Warren; Wu, Fei; Miao, Gengxin; Wu, Chung (2011-06-01). "Recovering semantics of tables on the web". Proceedings of the VLDB Endowment. 4 (9): 528–538. doi:10.14778/2002938.2002939. ISSN 2150-8097. S2CID 11359711.
Alobaid, Ahmad; Corcho, Oscar (March 2024). "Linear approximation of the quantile–quantile plot for semantic labelling of numeric columns in tabular data". Expert Systems with Applications. 238: 122152. doi:10.1016/j.eswa.2023.122152.
Syed, Zareen; Finin, Tim; Mulwad, Varish; Joshi, Anupam (2010-04-26). "Exploiting a Web of Semantic Data for Interpreting Tables". Proceedings of the Second Web Science Conference.
"OWL Web Ontology Language Reference". www.w3.org. Retrieved 2022-09-22.
Ermilov, Ivan; Ngomo, Axel-Cyrille Ngonga (2016), "TAIPAN: Automatic Property Mapping for Tabular Data", Knowledge Engineering and Knowledge Management, Lecture Notes in Computer Science, vol. 10024, Cham: Springer International Publishing, pp. 163–179, doi:10.1007/978-3-319-49004-5_11, ISBN 978-3-319-49003-8, S2CID 37730677, retrieved 2022-09-22
Zhang, Ziqi (2017-08-07). Hitzler, Pascal; Cruz, Isabel (eds.). "Effective and efficient Semantic Table Interpretation using TableMiner+". Semantic Web. 8 (6): 921–957. doi:10.3233/SW-160242.
Ramnandan, S.K.; Mittal, Amol; Knoblock, Craig A.; Szekely, Pedro (2015). "Assigning Semantic Labels to Data Sources". In Gandon, Fabien; Sabou, Marta; Sack, Harald; d’Amato, Claudia; Cudré-Mauroux, Philippe; Zimmermann, Antoine (eds.). The Semantic Web. Latest Advances and New Domains. Lecture Notes in Computer Science. Vol. 9088. Cham: Springer International Publishing. pp. 403–417. doi:10.1007/978-3-319-18818-8_25. ISBN 978-3-319-18818-8. S2CID 7040223.
Quercini, Gianluca; Reynaud, Chantal (2013). "Entity discovery and annotation in tables". Proceedings of the 16th International Conference on Extending Database Technology (PDF). New York, New York, USA: ACM Press. p. 693. doi:10.1145/2452376.2452457. ISBN 9781450315975. S2CID 8252126.
"About: capital of". dbpedia.org. Retrieved 2022-09-22.
Etzioni, Oren; Banko, Michele; Soderland, Stephen; Weld, Daniel S. (2008-12-01). "Open information extraction from the web". Communications of the ACM. 51 (12): 68–74. doi:10.1145/1409360.1409378. ISSN 0001-0782. S2CID 207169186.
Ritze, Dominique; Lehmberg, Oliver; Bizer, Christian. "Web Data Commons - T2Dv2". webdatacommons.org. Retrieved 2022-07-18.
"Semantic Web Challenge on Tabular Data to Knowledge Graph Matching". www.cs.ox.ac.uk. Retrieved 2022-09-30.
"JDK 5.0 Developer's Guide: Annotations". Sun Microsystems. 2007-12-18. Archived from the original on 6 March 2008. Retrieved 2008-03-05.
"Characterizing the Usage, Evolution and Impact of Java Annotations in Practice".
Zhang, D.; Islam, M.M.; Lu, G. (2012). "A review on automatic image annotation techniques". Pattern Recognition. 45 (1): 346–362. Bibcode:2012PatRe..45..346Z. doi:10.1016/j.patcog.2011.05.013.
"Medical Definition of Genome annotation". MedicineNet. Retrieved 2021-09-09.
Pelka, Obioma; Nensa, Felix; Friedrich, Christoph M. (2018-11-12). "Annotation of enhanced radiographs for medical image retrieval with deep convolutional neural networks". PLOS ONE. 13 (11): e0206229. Bibcode:2018PLoSO..1306229P. doi:10.1371/journal.pone.0206229. ISSN 1932-6203. PMC 6231616. PMID 30419028.
Wyner, Adam; Peters, Wim; Katz, Daniel (2013). "A Case Study on Legal Case Annotation". In Ashley, Kevin D. (ed.). Legal Knowledge and Information Systems. Frontiers in Artificial Intelligence and Applications. Vol. 259. Amsterdam: IOS Press. pp. 165–174. doi:10.3233/978-1-61499-359-9-165. ISBN 978-1-61499-359-9.
"Annotation Schemes | CAWSE" (PDF). General Annotation Conventions. 2017-02-05. Retrieved 2019-01-06.
"LinguisticAnnotation". annotation.exmaralda.org.
