Steven J. DeRose (born 1960) is a computer scientist noted for his contributions to computational linguistics and to key standards related to document processing, mostly around ISO's Standard Generalized Markup Language (SGML) and W3C's Extensible Markup Language (XML).
His contributions include the following:
He served as Chief Scientist of the Scholarly Technology Group, and Adjunct Associate Professor of Computer Science, at Brown University. [2] While there he received NSF and NEH grants [3] [4] and contributed heavily to the Open eBook and Encoded Archival Description standards. Previously, he was co-founder and Chief Scientist at Electronic Book Technologies, Inc., where he designed the first SGML browser (DynaText), which earned 11 US patents and won Seybold [5] and other awards.
His 1987 article with James Coombs and Allen Renear, "Markup Systems and the Future of Scholarly Text Processing", is a seminal source for the theory of markup systems, and has been widely cited and reprinted. [6] [7] [8] [9] [10] [11] [12] The article "What is Text, Really?" [13] has also been widely cited and reprinted, [14] and led to several follow-on articles. [15] In addition, he has published two books (Making Hypermedia Work: A User's Guide to HyTime and The SGML FAQ Book), as well as articles in a variety of journals, magazines, and proceedings.
He has given papers and tutorials at the ACM Hypertext Conference and various SGML and XML conferences, [16] a keynote address at the ACM Conference on Very Large DataBases (VLDB), [17] and a plenary talk at the Text Encoding Initiative 10 Conference. [18]
In computational linguistics, he is known [19] for pioneering the use of dynamic programming methods for part-of-speech tagging (DeRose 1988, 1990).
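To illustrate what dynamic programming means for part-of-speech tagging, here is a minimal Viterbi-style sketch. It is a generic textbook formulation and is not a reconstruction of the algorithm in DeRose's papers. The tag set, the probability tables (`START`, `TRANS`, `EMIT`), the `UNSEEN` floor, and the function name `tag` are all hypothetical, with hand-made toy values rather than corpus estimates.

```python
import math

# Hypothetical toy model: values are hand-made, not estimated from a corpus.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}            # P(first tag)
TRANS = {                                                  # P(next tag | tag)
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {                                                   # P(word | tag)
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "walks": 0.2, "park": 0.3},
    "VERB": {"walks": 0.7, "dog": 0.1},
}
UNSEEN = 1e-6  # small floor for word/tag pairs missing from EMIT


def tag(words):
    """Return the most probable tag sequence for `words` under the toy model."""
    if not words:
        return []
    # best[i][t] = (log-probability of the best path ending in tag t at
    # position i, tag chosen at position i - 1 on that path)
    best = [{
        t: (math.log(START[t]) + math.log(EMIT[t].get(words[0], UNSEEN)), None)
        for t in TAGS
    }]
    for word in words[1:]:
        row = {}
        for t in TAGS:
            emit = math.log(EMIT[t].get(word, UNSEEN))
            # Only the best path into each tag is kept, so the work grows
            # linearly with sentence length instead of exponentially.
            row[t] = max(
                (best[-1][p][0] + math.log(TRANS[p][t]) + emit, p) for p in TAGS
            )
        best.append(row)
    # Follow the stored back-pointers from the best final tag.
    path = [max(TAGS, key=lambda t: best[-1][t][0])]
    for row in reversed(best[1:]):
        path.append(row[path[-1]][1])
    return path[::-1]


print(tag(["the", "dog", "walks"]))  # ['DET', 'NOUN', 'VERB']
```

The word "walks" can be a noun or a verb in this toy lexicon; the table of best partial paths resolves the ambiguity from context without enumerating every possible tag sequence.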
A markup language is a text-encoding system consisting of a set of symbols inserted in a text document to control its structure, formatting, or the relationship between its parts. Markup is often used to control the display of the document or to enrich its content to facilitate automated processing.
The Standard Generalized Markup Language is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 states that generalized markup is "based on two postulates":
Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.
XML Linking Language, or XLink, is an XML markup language and W3C specification that provides methods for creating internal and external links within XML documents, and associating metadata with those links.
The Text Encoding Initiative (TEI) is a text-centric community of practice in the academic field of digital humanities, operating continuously since the 1980s. The community currently runs a mailing list, meetings and conference series, and maintains the TEI technical standard, a journal, a wiki, a GitHub repository and a toolchain.
HyTime is a markup language that is an application of SGML. HyTime defines a set of hypertext-oriented element types that, in effect, supplement SGML and allow SGML document authors to build hypertext and multimedia presentations in a standardized way.
Open Scripture Information Standard (OSIS) is an XML application that defines tags for marking up Bibles, theological commentaries, and other related literature.
C. Michael Sperberg-McQueen is an American markup language specialist. He was co-editor of the Extensible Markup Language (XML) 1.0 spec (1998), and chair of the XML Schema working group.
DynaText is an SGML publishing tool. It was introduced in 1990, and was the first system to handle arbitrarily large SGML documents, and to render them according to multiple style-sheets that could be switched at will.
Extensible HyperText Markup Language (XHTML) is part of the family of XML markup languages. It mirrors or extends versions of the widely used HyperText Markup Language (HTML), the language in which Web pages are formulated.
A structured document is an electronic document where some method of markup is used to identify the whole and parts of the document as having various meanings beyond their formatting. For example, a structured document might identify a certain portion as a "chapter title" rather than as "Helvetica bold 24" or "indented Courier". Such portions in general are commonly called "components" or "elements" of a document.
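As a small illustration of markup that names components by role rather than by appearance, the sketch below parses an XML fragment with Python's standard `xml.etree.ElementTree` module. The element names (`book`, `chapter`, `title`, `para`), the sample text, and the helper `chapter_titles` are hypothetical and are not drawn from any particular schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: elements describe what each part is,
# not how it should be typeset.
DOC = """<book>
  <chapter><title>Markup</title><para>Text has structure.</para></chapter>
  <chapter><title>Overlap</title><para>Some structures cross.</para></chapter>
</book>"""


def chapter_titles(xml_text):
    """Return the text of every chapter title, in document order."""
    root = ET.fromstring(xml_text)
    return [title.text for title in root.findall("./chapter/title")]


print(chapter_titles(DOC))  # ['Markup', 'Overlap']
```

Because the titles are identified as titles, a program can list them, restyle them, or build a table of contents without guessing from fonts or indentation.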
MECS is the Multi-Element Code System, a markup system developed by the Wittgenstein Archives at the University of Bergen. It is very similar to SGML and XML except that it allows elements to overlap.
Hypertext is text displayed on a computer or other electronic device with references (hyperlinks) to other text that the reader can immediately access, usually by a mouse click or keypress sequence. Early conceptions of hypertext defined it as text that could be connected by a linking system to a range of other documents that were stored outside that text. In 1934, the Belgian bibliographer Paul Otlet developed a blueprint for links that telescoped out from hypertext electrically to allow readers to access documents, books, photographs, and so on, stored anywhere in the world.
The European Association for Digital Humanities (EADH), formerly known as the Association for Literary and Linguistic Computing (ALLC), is a digital humanities organisation founded in London in 1973. Its purpose is to promote the advancement of education in the digital humanities through the development and use of computational methods in research and teaching in the Humanities and related disciplines, especially literary and linguistic computing. In 2005, the Association joined the Alliance of Digital Humanities Organizations (ADHO).
The Association for Computers and the Humanities (ACH) is the primary international professional society for digital humanities. ACH was founded in 1978. According to the official website, the organization "support[s] and disseminate[s] research and cultivate[s] a vibrant professional community through conferences, publications, and outreach activities." ACH is based in the United States, and has an international membership. ACH is a founding member of the Alliance of Digital Humanities Organizations (ADHO), a co-originator of the Text Encoding Initiative, and a co-sponsor of an annual conference.
COCOA was an early text file utility and associated file format for digital humanities, then known as humanities computing. It comprised approximately 4000 punched cards of FORTRAN and was created in the late 1960s and early 1970s at University College London and the Atlas Computer Laboratory in Harwell, Oxfordshire. Functionality included word-counting and concordance building.
In markup languages and the digital humanities, overlap occurs when a document has two or more structures that interact in a non-hierarchical manner. A document with overlapping markup cannot be represented as a tree. This is also known as concurrent markup. Overlap happens, for instance, in poetry, where there may be a metrical structure of feet and lines; a linguistic structure of sentences and quotations; and a physical structure of volumes and pages and editorial annotations.
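One way to make the "cannot be represented as a tree" condition concrete is to model each marked-up unit as a character range and test whether two ranges cross without one containing the other. The sketch below assumes half-open `(start, end)` offsets; the span names, offsets, and the functions `overlaps` and `crossing_pairs` are hypothetical illustrations, not part of any markup standard.

```python
def overlaps(a, b):
    """True if half-open spans a and b intersect but neither contains the other."""
    (a_start, a_end), (b_start, b_end) = a, b
    intersect = a_start < b_end and b_start < a_end
    a_in_b = b_start <= a_start and a_end <= b_end
    b_in_a = a_start <= b_start and b_end <= a_end
    return intersect and not (a_in_b or b_in_a)


def crossing_pairs(spans):
    """Return every pair of span names whose ranges cross non-hierarchically."""
    names = sorted(spans)
    return [
        (m, n)
        for i, m in enumerate(names)
        for n in names[i + 1:]
        if overlaps(spans[m], spans[n])
    ]


# Hypothetical verse: a sentence that starts in one metrical line
# and ends in the next.
spans = {"line1": (0, 20), "line2": (20, 40), "sentence": (10, 30)}
print(crossing_pairs(spans))
# [('line1', 'sentence'), ('line2', 'sentence')]
```

If `crossing_pairs` returns an empty list, the spans nest cleanly and can be serialized as a single element hierarchy; any crossing pair means that at least one structure must be encoded some other way.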
Susan Hockey is an Emeritus Professor of Library and Information Studies at University College London. She has written about the history of digital humanities, the development of text analysis applications, electronic textual mark-up, teaching computing in the humanities, and the role of libraries in managing digital resources. In 2014, University College London created a Digital Humanities lecture series in her honour.
Lou Burnard is an internationally recognised expert in digital humanities, particularly in the area of text encoding and digital libraries. He was assistant director of Oxford University Computing Services (OUCS) from 2001 to September 2010, when he officially retired from OUCS. Before that, he was manager of the Humanities Computing Unit at OUCS for five years. He has worked in ICT support for research in the humanities since the 1990s. He was one of the founding editors of the Text Encoding Initiative (TEI) and continues to play an active part in its maintenance and development, as a consultant to the TEI Technical Council and as an elected TEI board member. He has played a key role in the establishment of many other activities and initiatives in this area, such as the UK Arts and Humanities Data Service and the British National Corpus, and has published and lectured widely. Since 2008 he has worked as a Member of the Conseil Scientifique for the CNRS-funded "Adonis" TGE.
Angelo Di Iorio (March 2007), Pattern-based Segmentation of Digital Documents: Model and Implementation, Technical Report UBLCS-2007-5, Department of Computer Science, University of Bologna, Mura Anteo Zamboni 7, 40127 Bologna (Italy).
Susan Schreibman, Raymond George Siemens, and John M. Unsworth (eds.) (2005), A Companion to Digital Humanities, Blackwell Companions to Literature and Culture, ISBN 978-1-4051-0321-3.