Developer(s) | The Lexical Systems Group |
---|---|
Initial release | 2002 |
Stable release | lvg2014 / December 13, 2013 |
Written in | Java |
Platform | Java SE |
Type | Lexical semantics |
License | NLM copyright and terms of use |
Lexical Variant Generation (lvg) is a suite of command-line tools that perform lexical transformations on text. Its goal is to generate lexical variants of terms for use in natural language processing of clinical patient documents. [1]
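To illustrate what a lexical transformation is, the following minimal Python sketch chains three toy transformations (lowercasing, punctuation stripping, and a naive English pluralization) to produce variants of a term. It is not lvg's implementation or API: the function names and the pluralization rule are hypothetical and were chosen only for illustration.

```python
import string
from typing import List


def lowercase(term: str) -> str:
    """Fold the term to lower case."""
    return term.lower()


def strip_punctuation(term: str) -> str:
    """Remove ASCII punctuation characters from the term."""
    return term.translate(str.maketrans("", "", string.punctuation))


def naive_plural(term: str) -> str:
    """Toy English plural rule (an assumption, not lvg's inflection rules)."""
    if term.endswith(("s", "x", "z", "ch", "sh")):
        return term + "es"
    if len(term) > 1 and term.endswith("y") and term[-2] not in "aeiou":
        return term[:-1] + "ies"
    return term + "s"


def generate_variants(term: str) -> List[str]:
    """Return the normalized base form followed by its toy plural form."""
    base = strip_punctuation(lowercase(term))
    if not base:
        return []
    variants = []
    for candidate in (base, naive_plural(base)):
        # Keep the first occurrence of each variant, preserving order.
        if candidate not in variants:
            variants.append(candidate)
    return variants


if __name__ == "__main__":
    for raw in ("Biopsy,", "Fracture.", "Reflex"):
        print(raw, "->", generate_variants(raw))
```

A real lexical tool relies on a curated lexicon and a much richer rule set; the sketch only shows the idea of composing simple transformations into a pipeline.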