Antonio Zamora

Antonio Zamora is a consultant in the fields of computer programming, chemical information science, and computational linguistics who worked on chemical search systems and automatic spelling correction algorithms.

Career

Zamora studied chemistry at the University of Texas (B.S. 1962) and served in the U.S. Army during the Vietnam Era from 1962 to 1965. He studied medical technology at the Medical Field Service School (MFSS) in Fort Sam Houston and worked in hematology at Brooke Army Medical Center.

After concluding his military service, he worked at Chemical Abstracts Service (CAS) in Columbus, Ohio as an editor of one of the first computer-produced publications in the United States. While working for CAS, he earned a master's degree in computer science from Ohio State University (M.S. 1969) and began working in its programming department; eventually, he transferred to the research department, where he was able to combine his chemical background with programming.

He contributed to the development of a chemical registry system and chemical structure input systems; devised an algorithm for determining the Smallest Set of Smallest Rings (SSSR),[1] the cheminformatics term for a minimal cycle basis of a molecular graph; and worked on experimental automatic abstracting, indexing programs, and spelling-aid algorithms.[2][3]
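For a connected molecular graph, the SSSR contains E − V + 1 rings (the cyclomatic number). A minimal sketch of one common approach, given here for illustration rather than as Zamora's published algorithm: for each edge, find the shortest cycle containing it by breadth-first search, then keep the smallest distinct rings up to the cyclomatic number. The function name and adjacency-dict input format are invented for this example.

```python
from collections import deque

def smallest_rings(adj):
    """Greedy SSSR-style search on an adjacency dict {node: [neighbors]}.
    Assumes a single connected component."""
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    n_rings = len(edges) - len(adj) + 1  # cyclomatic number
    rings = set()
    for u, v in edges:
        # BFS from u to v that avoids the direct edge (u, v); the shortest
        # such path plus the edge itself is the smallest ring through it.
        prev = {u: None}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            if x == v:
                break
            for y in adj[x]:
                if (x, y) in ((u, v), (v, u)) or y in prev:
                    continue
                prev[y] = x
                queue.append(y)
        if v not in prev:
            continue  # the edge is a bridge: it lies on no cycle
        path = []
        node = v
        while node is not None:
            path.append(node)
            node = prev[node]
        rings.add(frozenset(path))
    # keep the smallest distinct rings, up to the cyclomatic number
    return sorted(rings, key=len)[:n_rings]
```

For two triangles sharing an edge (five edges, four vertices), the function returns the two three-membered rings rather than the larger four-membered perimeter cycle.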

In 1982 he joined IBM Corporation as a senior programmer working on spell checkers and multilingual information retrieval tools. After his retirement from IBM in 1996, Zamora established Zamora Consulting, LLC[4] and worked as a consultant for the American Chemical Society (ACS), the National Library of Medicine (NLM), and the US Department of Energy (DOE) to support semantic enhancements for search engines.[5]

Post-retirement

In retirement Zamora has also self-published a science fiction book[6] and several short books investigating the Carolina Bays.[7][8][9] In his 2017 paper "A model for the geomorphology of the Carolina Bays" he proposed that the "Carolina Bays are the remodeled remains of oblique conical craters formed on ground liquefied by the seismic shock waves of secondary impacts of glacier ice boulders ejected by an extraterrestrial impact on the Laurentide Ice Sheet".[10] His research was based on geometrical analysis of the Carolina Bays using Google Earth in combination with LiDAR data.[11] Many other theories have been proposed to account for their formation.[12][13]

SPEEDCOP project

Zamora carried out research on automatic spelling correction in the SPEEDCOP project (SPElling Error Detection COrrection Project). The project, supported by the National Science Foundation (NSF) at Chemical Abstracts Service (CAS), extracted over 50,000 misspellings from approximately 25,000,000 words of text drawn from seven scientific and scholarly databases.[14]

The purpose of the project was to automatically correct spelling errors, predominantly typing errors, in a database of scientific abstracts. For each word in a dictionary, a key is computed consisting of the first letter, followed by the consonant letters in order of occurrence, followed by the vowel letters in order of occurrence, with each letter recorded only once; for example, inoculation produces the key INCLTOUA. The dictionary keys are stored in sorted order. The key of each word in the text is then compared with the dictionary keys; if no exact match is found, the keys on either side are examined to find a probable match. Because only keys near the lookup position need to be examined, the key reduces the portion of the dictionary that has to be searched.[15]
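A minimal sketch of this similarity-key scheme, reimplemented from the description above (the function names and the sorted-list lookup with a small neighborhood window are illustrative assumptions, not the original SPEEDCOP code):

```python
import bisect

def similarity_key(word):
    """SPEEDCOP-style key: first letter, then the remaining consonants in
    order of occurrence, then the vowels in order of occurrence, with each
    letter recorded only once."""
    word = word.upper()
    key = word[0]
    rest = word[1:]
    consonants = [c for c in rest if c not in "AEIOU"]
    vowels = [c for c in rest if c in "AEIOU"]
    for ch in consonants + vowels:
        if ch not in key:
            key += ch
    return key

def probable_matches(word, dictionary, window=2):
    """Look the word's key up in the sorted key list and return the
    dictionary words whose keys fall nearest to it."""
    keyed = sorted((similarity_key(w), w) for w in dictionary)
    keys = [k for k, _ in keyed]
    i = bisect.bisect_left(keys, similarity_key(word))
    lo, hi = max(0, i - window), min(len(keyed), i + window)
    return [w for _, w in keyed[lo:hi]]
```

The key absorbs many common typing errors: the transposition "inoculatoin" produces the same key INCLTOUA as "inoculation", so the misspelling lands beside the intended word in the sorted key list.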

Related Research Articles

<span class="mw-page-title-main">Error detection and correction</span> Techniques that enable reliable delivery of digital data over unreliable communication channels

In information theory and coding theory with applications in computer science and telecommunication, error detection and correction (EDAC) or error control are techniques that enable reliable delivery of digital data over unreliable communication channels. Many communication channels are subject to channel noise, and thus errors may be introduced during transmission from the source to a receiver. Error detection techniques allow detecting such errors, while error correction enables reconstruction of the original data in many cases.

Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.

The data link layer, or layer 2, is the second layer of the seven-layer OSI model of computer networking. This layer is the protocol layer that transfers data between nodes on a network segment across the physical layer. The data link layer provides the functional and procedural means to transfer data between network entities and may also provide the means to detect and possibly correct errors that can occur in the physical layer.

Computer science is the study of the theoretical foundations of information and computation and their implementation and application in computer systems. One well known subject classification system for computer science is the ACM Computing Classification System devised by the Association for Computing Machinery.

<span class="mw-page-title-main">Coding theory</span> Study of the properties of codes and their fitness

Coding theory is the study of the properties of codes and their respective fitness for specific applications. Codes are used for data compression, cryptography, error detection and correction, data transmission and data storage. Codes are studied by various scientific disciplines—such as information theory, electrical engineering, mathematics, linguistics, and computer science—for the purpose of designing efficient and reliable data transmission methods. This typically involves the removal of redundancy and the correction or detection of errors in the transmitted data.

<span class="mw-page-title-main">Theoretical computer science</span> Subfield of computer science and mathematics

Theoretical computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and type theory.

PubMed is a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics. The United States National Library of Medicine (NLM) at the National Institutes of Health maintain the database as part of the Entrez system of information retrieval.

<span class="mw-page-title-main">Spell checker</span> Software to help correct spelling errors

In software, a spell checker is a software feature that checks for misspellings in a text. Spell-checking features are often embedded in software or services, such as a word processor, email client, electronic dictionary, or search engine.

SCIgen is a paper generator that uses context-free grammar to randomly generate nonsense in the form of computer science research papers. Its original data source was a collection of computer science papers downloaded from CiteSeer. All elements of the papers are formed, including graphs, diagrams, and citations. Created by scientists at the Massachusetts Institute of Technology, its stated aim is "to maximize amusement, rather than coherence." Originally created in 2005 to expose the lack of scrutiny of submissions to conferences, the generator subsequently became used, primarily by Chinese academics, to create large numbers of fraudulent conference submissions, leading to the retraction of 122 SCIgen generated papers and the creation of detection software to combat its use.

PubMed Central (PMC) is a free digital repository that archives open access full-text scholarly articles that have been published in biomedical and life sciences journals. As one of the major research databases developed by the National Center for Biotechnology Information (NCBI), PubMed Central is more than a document repository. Submissions to PMC are indexed and formatted for enhanced metadata, medical ontology, and unique identifiers which enrich the XML structured data for each article. Content within PMC can be linked to other NCBI databases and accessed via Entrez search and retrieval systems, further enhancing the public's ability to discover, read and build upon its biomedical knowledge.

<i>Journal of Molecular Biology</i> Academic journal

The Journal of Molecular Biology is a biweekly peer-reviewed scientific journal covering all aspects of molecular biology. It was established in 1959 and is published by Elsevier. The editor-in-chief is Peter Wright.

<i>Journal of Chemical Information and Modeling</i> Academic journal

The Journal of Chemical Information and Modeling is a peer-reviewed scientific journal published by the American Chemical Society. It was established in 1961 as the Journal of Chemical Documentation, renamed in 1975 to Journal of Chemical Information and Computer Sciences, and obtained its current name in 2005. The journal covers the fields of computational chemistry and chemical informatics. The editor-in-chief is Kenneth M. Merz Jr.. The journal supports Open Science approaches.

Dynamic program analysis is the analysis of computer software that is performed by executing programs on a real or virtual processor. For dynamic program analysis to be effective, the target program must be executed with sufficient test inputs to cover almost all possible outputs. Use of software testing measures such as code coverage helps increase the chance that an adequate slice of the program's set of possible behaviors has been observed. Also, care must be taken to minimize the effect that instrumentation has on the execution of the target program. Dynamic analysis is in contrast to static program analysis. Unit tests, integration tests, system tests and acceptance tests use dynamic testing.

<i>Journal of Heredity</i> Academic journal

The Journal of Heredity is a peer-reviewed scientific journal concerned with heredity in a biological sense, covering all aspects of genetics. It is published by Oxford University Press on behalf of the American Genetic Association.

<span class="mw-page-title-main">Ensemble learning</span> Statistics and machine learning technique

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete finite set of alternative models, but typically allows for much more flexible structure to exist among those alternatives.

<i>Archives of Biochemistry and Biophysics</i> Academic journal

Archives of Biochemistry and Biophysics is a biweekly peer-reviewed scientific journal that covers research on all aspects of biochemistry and biophysics. It is published by Elsevier and as of 2012, the editors-in-chief are Paul Fitzpatrick, Helmut Sies, Jian-Ping Jin, and Henry Jay Forman.

Frederick J. Damerau was a pioneer of research on natural language processing and data mining.

In linguistic morphology and information retrieval, stemming is the process of reducing inflected words to their word stem, base or root form—generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root. Algorithms for stemming have been studied in computer science since the 1960s. Many search engines treat words with the same stem as synonyms as a kind of query expansion, a process called conflation.

<i>Radiation Measurements</i> Academic journal

Radiation Measurements is a monthly peer-reviewed scientific journal covering research on nuclear science and radiation physics. It was established in 1994 and is published by Elsevier.

Christopher D Paice was one of the pioneers of research into stemming. The Paice-Husk stemmer was published in 1990 and his method of evaluation of stemmer performance by means of Error Rate with Respect to Truncation (ERRT) was the first direct method of comparing under-stemming and over-stemming errors. Apart from his pioneering work on stemming algorithms and evaluation methods he made other research contributions in the area of Information Retrieval, anaphora resolution and automatic abstracting.

References

  1. Zamora, Antonio (1976). "An Algorithm for Finding the Smallest Set of Smallest Rings". Journal of Chemical Information and Computer Sciences. 16 (1): 40–43. doi:10.1021/ci60005a013.
  2. Zamora, Antonio (1978). "Control of Spelling Errors in Large Data Bases". The Information Age in Perspective, Proc. ASIS. 15: 364–367.
  3. Zamora, Antonio (January 1980). "Automatic detection and correction of spelling errors in a large data base". Journal of the American Society for Information Science. 31 (1): 51–57. doi:10.1002/asi.4630310106. ISSN 0002-8231.
  4. "Zamora Consulting, LLC". zamoraconsulting.com.
  5. "Antonio Zamora". scientificpsychic.com.
  6. Zamora, Antonio (2013). Rise of the Transgenic Queen. Zamora Consulting LLC. ISBN 9780983652359.
  7. Zamora, Antonio (2012). Meteorite Cluster Impacts. Zamora Consulting LLC.
  8. Zamora, Antonio (2014). Killer Comet: What the Carolina Bays tell us. Zamora Consulting LLC. ISBN 9780983652373.
  9. Zamora, Antonio (2015). Solving the Mystery of the Carolina Bays. Zamora Consulting LLC. ISBN 9780983652397.
  10. Zamora, Antonio (2017). "A model for the geomorphology of the Carolina Bays". Geomorphology. 282: 209–216. Bibcode:2017Geomo.282..209Z. doi:10.1016/j.geomorph.2017.01.019.
  11. Orengo, Hector; Petrie, Cameron (2018). "Multi-scale relief model (MSRM): a new algorithm for the visualization of subtle topographic change of variable size in digital elevation models". Earth Surface Processes and Landforms. 43 (6): 1361–1369. Bibcode:2018ESPL...43.1361O. doi:10.1002/esp.4317. PMC 6036439. PMID 30008499.
  12. "The Enigmatic Carolina bays". Cintos Research.
  13. "The Fiery Origins of Carolina Bays". Coastal Review Online. August 23, 2013. Retrieved June 12, 2019.
  14. Pollock, Joseph J.; Zamora, Antonio (1984). "System Design for Detection and Correction of Spelling Errors in Scientific and Scholarly Text". Journal of the American Society for Information Science. 35 (2): 104–109. doi:10.1002/asi.4630350206.
  15. Mitton, Roger. "3. Spellchecking by Computer" (PDF). Journal of the Simplified Spelling Society. Retrieved June 14, 2019.
  16. "Best JASIST Paper Award". Archived from the original on August 13, 2018.
  17. "Antonio Zamora's (AntonioZamora) software portfolio". Devpost.
  18. "NLM Show Off Your Apps: Innovative Uses of NLM Information". National Library of Medicine.