SYBYL line notation

Filename extension: .sln
Type of format: chemical file format

The SYBYL line notation (SLN) is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SLN differs from SMILES in several significant ways. SLN can specify molecules, molecular queries, and reactions in a single line notation, whereas SMILES handles these through language extensions. SLN supports relative stereochemistry and can distinguish mixtures of enantiomers from pure molecules with pure but unresolved stereochemistry. In SMILES, aromaticity is considered a property of both atoms and bonds, whereas in SLN it is a property of bonds.

Description

Like SMILES, SLN is a linear language for describing molecules. The two notations share many similarities despite their differences, so this description compares SLN with SMILES and its extensions throughout.

Attributes

Attributes, bracketed strings carrying additional data such as [key1=value1, key2...], are a core feature of SLN. Attributes can be applied to atoms and bonds. Attributes that are not officially defined are available to users for private extensions.

When searching for molecules, comparisons such as fcharge>-0.125 can be used in place of the usual equals sign. A ! preceding a key/value group inverts the result of the comparison.

Entire molecules or reactions can also have attributes. In that case the square brackets are replaced by a pair of angle brackets (<>).
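
The attribute syntax described above can be illustrated with a small Python sketch. This is not an official SLN parser: the function name parse_attributes, the tuple layout, and the exact set of comparison operators it accepts are assumptions made for illustration, and values that themselves contain commas are not handled.

```python
import re

# One attribute item: optional "!", a key, then an optional operator and value.
# The operator set here is an assumption; the text only shows "=" and ">".
_ITEM = re.compile(r"(!?)\s*(\w+)\s*(?:(<=|>=|=|<|>)\s*(.+))?")


def parse_attributes(block):
    """Split "[...]" or "<...>" into (negated, key, operator, value) tuples."""
    if len(block) < 2 or block[0] + block[-1] not in ("[]", "<>"):
        raise ValueError(f"not an attribute block: {block!r}")
    parsed = []
    for item in block[1:-1].split(","):
        item = item.strip()
        if not item:
            continue
        match = _ITEM.fullmatch(item)
        if match is None:
            raise ValueError(f"cannot parse attribute item: {item!r}")
        bang, key, operator, value = match.groups()
        parsed.append((bang == "!", key, operator, value))
    return parsed


if __name__ == "__main__":
    print(parse_attributes("[key1=value1, key2]"))
    print(parse_attributes("[!I=0, fcharge>-0.125]"))
```

A key without a value, such as key2, is reported with an operator and value of None, and a leading ! is kept as a separate flag so that a caller can invert the comparison result.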

Atoms

Anything that starts with an uppercase letter identifies an atom in SLN. Hydrogens are not added automatically, but single bonds to hydrogen can be abbreviated for organic compounds, giving CH4 instead of C(H)(H)(H)H for methane. The notation's author argues that explicit hydrogens allow for more robust parsing.

Attributes defined for atoms include I= for isotope mass number, charge= for formal charge, fcharge= for partial charge, s= for stereochemistry, and spin= for radicals (s, d, and t for singlet, doublet, and triplet, respectively). A formal charge of charge=2 can be abbreviated as +2, and negative charges can be abbreviated likewise; a bare + or - is also recognized as a charge of +1 or −1, respectively. * is shorthand for spin=d. Stereochemistry on atoms is mostly tetrahedral, with R/S and D/L designations available among others; it can be explicit (E) or relative (R), or specify a mixture (M) of stereoisomers at the atom. A normal/inverted (N/I) notation, equivalent to @@ and @ in SMILES, is also provided. Many additional attributes are provided for searching.
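
The charge and spin shorthands above amount to a small normalization step, sketched here in Python. The function name expand_atom_shorthand and its (key, value) return shape are hypothetical; only the shorthands named in the text (+2, -, +, *) are handled.

```python
def expand_atom_shorthand(item):
    """Return (key, value) for one atom attribute item, expanding shorthands."""
    item = item.strip()
    if item == "*":
        # "*" is shorthand for spin=d (a doublet radical).
        return ("spin", "d")
    if item in ("+", "-"):
        # A bare sign is a charge of +1 or -1.
        return ("charge", 1 if item == "+" else -1)
    if item[:1] in ("+", "-") and item[1:].isdigit():
        # "+2" or "-3" abbreviate charge=2 or charge=-3.
        return ("charge", int(item))
    key, sep, value = item.partition("=")
    if not sep:
        return (key, None)
    if key == "charge":
        return (key, int(value))
    return (key, value)


if __name__ == "__main__":
    for text in ("+2", "-", "*", "charge=2", "I=13"):
        print(text, "->", expand_atom_shorthand(text))
```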

In addition to elemental atoms SLN supports the specification of wild card atoms: Any (match any atom), and Hev (match any heavy atom). It also has an extensive Markush syntax for specifying combinatorial libraries and RGROUP queries. SLN has several query atom types for matching groups of atoms. Each type has the group name, followed by an optional positive integer.

Group   Description
R       Used to match a side chain. Matched atoms must not have any connection to the core.
X       Used to match side chains and rings. Atoms matching an X group can match side chains and rings.
Rx      Matches side chains and rings; a ring closure must match a second Rx group.

A mass number of 0 denotes the usual isotope, so N[I=0] is equivalent to N[I=14] and matches 14N, while N[!I=0] matches every other isotope of nitrogen.
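
The isotope rule can be written as a short check, shown here as a Python sketch. The USUAL_MASS table is an illustrative subset, and treating the "usual" isotope as the most abundant one is an assumption; the function name isotope_matches is hypothetical.

```python
# Illustrative subset only; a real implementation would cover every element.
USUAL_MASS = {"H": 1, "C": 12, "N": 14, "O": 16}


def isotope_matches(element, mass_number, query_mass, negated=False):
    """Evaluate an I=query_mass (or !I=query_mass) test against one atom."""
    # A query mass of 0 stands for the element's usual isotope.
    target = USUAL_MASS[element] if query_mass == 0 else query_mass
    hit = mass_number == target
    return not hit if negated else hit


if __name__ == "__main__":
    print(isotope_matches("N", 14, 0))                # N[I=0] against 14N
    print(isotope_matches("N", 15, 0))                # N[I=0] against 15N
    print(isotope_matches("N", 15, 0, negated=True))  # N[!I=0] against 15N
```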

Bonds

SLN uses largely the same bonding notation as SMILES, with -, =, #, and : for single, double, triple, and aromatic bonds. A period (.) is used for zero-order bonds, as in reaction SMILES, although + is preferred for separating distinct molecules.

Most single bonds are implicit, so CH3CH3 can be used instead of CH3-CH3 for ethane. Explicit single bonds are useful for three-center bonds.

The s= attribute is defined for double bonds to convey stereochemistry in E–Z (E/Z) or cis–trans (c/t) notation. N/I notation is also available and indicates whether the "main" chains are trans or cis to each other.

Rings

SLN writes rings more explicitly than SMILES, with benzene specified as C[1]H:CH:CH:CH:CH:CH:@1. An atom is tagged as a ring anchor with a single numeric attribute, and @1 then refers back to that atom (here, atom number 1) to close the ring with a bond.
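
How implicit single bonds and ring anchors resolve into a bond list can be sketched in Python for a deliberately restricted subset of SLN: unbranched chains of atoms with optional numeric anchors, hydrogen counts, bond symbols, and @n closures. The function name parse_ring_sln and the output layout are assumptions; branches, other attributes, and query atoms are not handled.

```python
import re

# One token: optional bond symbol, then either a ring closure "@n" or an atom
# with an optional numeric anchor "[n]" and optional hydrogen shorthand "Hn".
_TOKEN = re.compile(
    r"(?P<bond>[-=#:])?"
    r"(?:@(?P<close>\d+)"
    r"|(?P<elem>[A-Z][a-z]?)(?:\[(?P<label>\d+)\])?(?:H(?P<hcount>\d*))?)"
)


def parse_ring_sln(sln):
    """Return (atoms, bonds) for an unbranched SLN string.

    atoms is a list of (element, hydrogen_count); bonds is a list of
    (from_index, to_index, bond_symbol), with "-" for implicit single bonds.
    """
    atoms, bonds, anchors = [], [], {}
    previous = None
    position = 0
    while position < len(sln):
        match = _TOKEN.match(sln, position)
        if match is None:
            raise ValueError(f"unsupported syntax at offset {position}")
        bond = match["bond"] or "-"
        if match["close"] is not None:
            # "@n" bonds the previous atom back to the anchored atom.
            bonds.append((previous, anchors[int(match["close"])], bond))
        else:
            hcount = match["hcount"]
            hydrogens = 0 if hcount is None else int(hcount or 1)
            atoms.append((match["elem"], hydrogens))
            index = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, index, bond))
            if match["label"] is not None:
                anchors[int(match["label"])] = index
            previous = index
        position = match.end()
    return atoms, bonds


if __name__ == "__main__":
    print(parse_ring_sln("C[1]H:CH:CH:CH:CH:CH:@1"))
    print(parse_ring_sln("CH3CH3"))
```

For the benzene string the sketch yields six CH atoms and six aromatic bonds, the last of which joins the final atom back to the anchored first atom.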

Branching

SLN branches are identical to SMILES branches, with parentheses specifying them. Propionic acid is CH3CH2C(=O)OH.

Reactions

SLN supports reactions with -> connecting the reactants and the products. Atom mapping is possible with the use of [#num] attributes. The reaction center (rc) attribute can be added to bonds, and the chiral conversion (cc) attribute to atoms.
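
Splitting a reaction string into its reactants and products can be sketched as follows. This Python example assumes that -> appears exactly once and that a + outside square brackets separates molecules while a + inside brackets is a charge attribute; the function names and the example strings are illustrative, not taken from the SLN specification.

```python
def _split_molecules(side):
    """Split one side of a reaction on "+" signs that lie outside [...]."""
    parts, current, depth = [], [], 0
    for char in side:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if char == "+" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(char)
    parts.append("".join(current))
    return [part.strip() for part in parts if part.strip()]


def split_reaction(sln):
    """Return (reactants, products) for a reaction written with "->"."""
    if sln.count("->") != 1:
        raise ValueError("expected exactly one '->' in a reaction")
    left, right = sln.split("->")
    return _split_molecules(left), _split_molecules(right)


if __name__ == "__main__":
    print(split_reaction("CH2=CH2+HH->CH3CH3"))
    print(split_reaction("N[+]H4+O[-]H->NH3+OH2"))
```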

Miscellaneous

Multiple lines can be merged into a syntactical line by writing a \ (backslash) at the end of each line. This allows for breaking a long line into multiple lines, for example in a reaction with each molecule on its own line.
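
The line-continuation rule can be sketched in Python as a preprocessing step that runs before any parsing. The function name join_continued_lines is hypothetical, and stripping trailing whitespace before checking for the backslash is an assumption about how lenient a reader should be.

```python
def join_continued_lines(text):
    """Merge physical lines ending in a backslash into logical lines."""
    logical, pending = [], ""
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            # Drop the backslash and keep accumulating.
            pending += line[:-1]
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        # The input ended with a dangling continuation.
        logical.append(pending)
    return logical


if __name__ == "__main__":
    source = "CH2=CH2+\\\nHH->\\\nCH3CH3\nCH4"
    print(join_continued_lines(source))
```

Here a reaction written with one molecule per physical line becomes a single logical line, and the following unrelated line is left as its own entry.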
