Steven DeRose

Steven J. DeRose (born 1960) is a computer scientist noted for his contributions to computational linguistics and to key document-processing standards, chiefly those built around ISO's Standard Generalized Markup Language (SGML) and the W3C's Extensible Markup Language (XML).

His contributions include the following:

He served as Chief Scientist of the Scholarly Technology Group and as Adjunct Associate Professor of Computer Science at Brown University. [2] While there he received NSF and NEH grants [3] [4] and contributed heavily to the Open eBook and Encoded Archival Description standards. Previously, he was co-founder and Chief Scientist at Electronic Book Technologies, Inc., where he designed the first SGML browser (DynaText), which earned 11 US patents and won Seybold [5] and other awards.

His 1987 article with James Coombs and Allen Renear, "Markup Systems and the Future of Scholarly Text Processing", is a seminal source for the theory of markup systems and has been widely cited and reprinted. [6] [7] [8] [9] [10] [11] [12] The article "What is Text, Really?" [13] has also been widely cited and reprinted, [14] and led to several follow-on articles. [15] In addition, he has published two books (Making Hypermedia Work: A User's Guide to HyTime and The SGML FAQ Book), as well as articles in a variety of journals, magazines, and proceedings.

He has given papers and tutorials at the ACM Hypertext Conference and various SGML and XML conferences, [16] a keynote address at the International Conference on Very Large Data Bases (VLDB), [17] and a plenary talk at the Text Encoding Initiative 10 Conference. [18]

In computational linguistics, he is known [19] for pioneering the use of dynamic programming methods for part-of-speech tagging (DeRose 1988, 1990).

Selected publications

  1. DeRose, Steven J. (1988). "Grammatical category disambiguation by statistical optimization". Computational Linguistics. Vol. 14, no. 1. pp. 31–39.
  2. DeRose, Steven J. (1990). Stochastic Methods for Resolution of Grammatical Category Ambiguity in Inflected and Uninflected Languages (Thesis). Providence, RI: Brown University Department of Cognitive and Linguistic Sciences. Archived from the original on 2018-08-19. Retrieved 2013-09-30.
  3. DeRose, Steven J. & David G. Durand (1994). Making Hypermedia Work: A User's Guide to HyTime. Kluwer Academic Publishers. ISBN 978-0-7923-9432-7.
  4. DeRose, Steven J. (1997). The SGML FAQ Book. Kluwer Academic Publishers. ISBN 978-0-7923-9943-8.

References

  1. "XML Linking Language (XLink) Version 1.0". W3C. 27 June 2001.
  2. "Scholarly Technology Group, Staff Alumni pages". Archived from the original on 2013-10-02.
  3. NEH Grant number: PA-23769-01, 5/1/2001 – 4/30/2004. "Converting Text Encoding Initiative Guidelines and Documentation into the XML Format [TEI]". securegrants.neh.gov. Retrieved 2023-02-18.
  4. John Unsworth. "NEH grant". Posting to tei-council list. 6 January 2002.
  5. Seybold Publications (March 25, 1996). "Seybold Seminars Boston '96 [February 27 - March 1, 1996. Boston, MA]. Part I. Seybold Seminars Boston '96: When Worlds Collide". xml.coverpages.org. Seybold Special Report. 4 (8). ISSN 1069-7217. Retrieved 2023-02-18.
  6. Robin Cover (11 July 1997). "SGML Bibliography 1994".
  7. Tim Bray (11 July 1997). "On Semantics and Markup".
  8. Susan Hockey (chair) (9–13 June 1999). Panel: What is text? A debate on the philosophical and epistemological nature of text in the light of humanities computing research. ACH-ALLC '99 International Humanities Computing Conference. Charlottesville, Virginia.
  9. Toby Burrows (1999). The Text in the Machine: Electronic Texts in the Humanities. Binghamton NY: Haworth Press. p. 4. ISBN 978-0-7890-0424-6.
  10. Raimonda Modiano; Leroy Searle; Peter L. Shillingsburg, eds. (2004). Voice, Text, Hypertext: Emerging Practices in Textual Studies. Seattle: University of Washington Press. p. 375. ISBN 978-0-295-98305-9.
  11. Robin Cover (21 May 1990). "New Reading on Text".
  12. Angelo Di Iorio (March 2007). Pattern-based Segmentation of Digital Documents: Model and Implementation (PDF). Technical Report UBLCS-2007-5. Bologna, Italy: Department of Computer Science, University of Bologna. Archived from the original (PDF) on 2011-07-22. Retrieved 2010-04-13.
  13. Steven J. DeRose; David G. Durand; Elli Mylonas & Allen H. Renear (1990), "What is text, really?", Journal of Computing in Higher Education, 1 (2): 3–26.
  14. Steven J. DeRose; David G. Durand; Elli Mylonas & Allen H. Renear (August 1997), "What is text, really?", SIGDOC, J. Comput. Doc., 21 (3): 1–24, doi:10.1145/264842.264843, S2CID 12419068 (a special issue with multiple articles in response).
  15. Renear, Allen; Durand, David; Mylonas, Elli (1996), Hockey, Susan; Ide, Nancy (eds.), "Refining our Notion of What Text Really Is: The Problem of Overlapping Hierarchies", Research in Humanities Computing 4: Selected Papers from the 1992 ALLC/ACH Conference, Oxford: Oxford University Press: 263–280.
  16. Brown University Library. "STG Publications - 1999". Archived 2013-06-19 at the Wayback Machine.
  17. 1999 Invited talks at VLDB. Malcolm P. Atkinson; Maria E. Orlowska; Patrick Valduriez; Stanley B. Zdonik; Michael L. Brodie, eds. (1999). Proceedings of the 25th International Conference on Very Large Data Bases. San Francisco, CA: Morgan Kaufmann.
  18. The Relation Between TEI and XML, later published as DeRose, Steven J. (1999). "XML and the TEI". Computers and the Humanities. Vol. 33, no. 1–2. pp. 11–30.
  19. Steven Abney. 1997. "Part-of-speech tagging and partial parsing." In Corpus-Based Methods in Language and Speech. Archived from the original (PDF) on 2003-12-09. Retrieved 2010-12-15. Susan Schreibman, Raymond George Siemens, and John M. Unsworth (eds). 2005. A Companion to Digital Humanities. (Blackwell Companions to Literature and Culture). ISBN 978-1-4051-0321-3.