QLattice

Developer(s): Abzu
Initial release: March 4, 2020
Written in: C, Python
Operating system: Linux, macOS, Windows
Type: Machine learning
License: CC BY-NC-ND 4.0
Website: docs.abzu.ai

The QLattice is a software library that provides a framework for symbolic regression in Python. It runs on Linux, Windows, and macOS. The QLattice algorithm is developed by the Danish/Spanish AI research company Abzu. [1] Since its release, the QLattice has attracted attention mainly for the inherent explainability of the models it produces. [2] [3] [4]

At the GECCO conference in Boston, MA in July 2022, the QLattice was announced as the winner of the synthetic track of the SRBench competition. [5]

Features

The QLattice works with categorical and numeric data. It lets the user quickly generate, plot, and inspect mathematical formulae that could explain the process that generated the data. It is designed for interaction with the researcher, who can guide the search using preexisting knowledge. [2] [6]
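
To make the idea of searching for explanatory formulae concrete, the following is a minimal sketch of symbolic regression in plain Python. It is not the QLattice algorithm and does not use the Feyn API: it swaps in a brute-force enumeration over a tiny, hypothetical set of building blocks (`UNARY`, `BINARY`), fits a scale and offset to each candidate by least squares, and ranks candidates by error plus an assumed complexity penalty. All names, the penalty value, and the synthetic data are assumptions made for illustration.

```python
import itertools
import math

# Hypothetical building blocks for this sketch (not taken from the QLattice).
UNARY = {
    "{}": lambda v: v,
    "{}**2": lambda v: v * v,
    "exp({})": math.exp,
}
BINARY = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
}


def candidates(features):
    """Yield (formula_text, row_function, complexity) for each small expression."""
    singles = []
    for name in features:
        for template, f in UNARY.items():
            fn = lambda row, n=name, f=f: f(row[n])
            singles.append((template.format(name), fn, 1))
    yield from singles
    # Combine every pair of single-feature terms with each binary operator.
    for (ta, fa, _), (tb, fb, _) in itertools.combinations(singles, 2):
        for op, g in BINARY.items():
            fn = lambda row, fa=fa, fb=fb, g=g: g(fa(row), fb(row))
            yield (f"({ta} {op} {tb})", fn, 2)


def fit_linear(z, y):
    """Least-squares fit of y = a * z + b; returns (a, b, mse) or None."""
    n = len(z)
    mean_z, mean_y = sum(z) / n, sum(y) / n
    var_z = sum((v - mean_z) ** 2 for v in z)
    if var_z == 0:
        return None  # a constant expression cannot explain the target
    a = sum((v - mean_z) * (w - mean_y) for v, w in zip(z, y)) / var_z
    b = mean_y - a * mean_z
    mse = sum((a * v + b - w) ** 2 for v, w in zip(z, y)) / n
    return a, b, mse


def search(rows, target, penalty=0.01):
    """Return the candidate with the lowest error-plus-complexity score."""
    features = [k for k in rows[0] if k != target]
    y = [r[target] for r in rows]
    best = None
    for text, fn, complexity in candidates(features):
        fit = fit_linear([fn(r) for r in rows], y)
        if fit is None:
            continue
        a, b, mse = fit
        score = mse + penalty * complexity
        if best is None or score < best["score"]:
            best = {"formula": text, "a": a, "b": b, "mse": mse, "score": score}
    return best


# Synthetic data generated from y = 3 * x1 * x2 + 1 (an assumption for the demo).
rows = [
    {"x1": x1, "x2": x2, "y": 3 * x1 * x2 + 1}
    for x1 in (1.0, 2.0, 3.0, 4.0)
    for x2 in (0.5, 1.5, 2.5)
]
best = search(rows, "y")
print(f"y = {best['a']:.3f} * {best['formula']} + {best['b']:.3f}")
```

The complexity penalty is what makes the search prefer short, readable formulae over more elaborate ones with similar error. A practical symbolic regression tool explores a far larger expression space than this exhaustive loop could cover, so the enumeration here should be read only as an illustration of the objective, not of the search strategy.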

Scientific results

The QLattice mainly targets scientists and integrates well with the scientific workflow. [2] [6] It has been used in research on energy consumption in buildings, [3] water potability, [7] heart failure, [8] pre-eclampsia, [4] Alzheimer's disease, [9] hepatocellular carcinoma, [9] and breast cancer. [9]

References

  1. Kevin René Broløs; Meera Vieira Machado; Chris Cave; Jaan Kasak; Valdemar Stentoft-Hansen; Victor Galindo Batanero; Tom Jelen; Casper Wilstrup (2021-04-12). "An Approach to Symbolic Regression Using Feyn". arXiv:2104.05417 [cs.LG].
  2. Abzu (2022-07-22). "What is a QLattice?".
  3. Wenninger, Simon; Kaymakci, Can; Wiethe, Christian (2022). "Explainable long-term building energy consumption prediction using QLattice". Applied Energy. Elsevier BV. 308: 118300. doi:10.1016/j.apenergy.2021.118300. ISSN 0306-2619. S2CID 245428233.
  4. Wilstrup, Casper; Hedley, Paula L.; Rode, Line; Placing, Sophie; Wøjdemann, Karen R.; Shalmi, Anne-Cathrine; Sundberg, Karin; Christiansen, Michael (2022-06-30), Symbolic regression analysis of interactions between first trimester maternal serum adipokines in pregnancies which develop pre-eclampsia, Cold Spring Harbor Laboratory, doi:10.1101/2022.06.29.22277072, S2CID 250331945
  5. Michael Kommenda; William La Cava; Maimuna Majumder; Fabricio Olivetti de França; Marco Virgolin (2022-07-22). "SRBench Competition 2022: Interpretable Symbolic Regression for Data Science".
  6. Bharadi, Vinayak (2021-07-30). "QLattice Environment and Feyn QGraph Models—A New Perspective Toward Deep Learning". Emerging Technologies for Healthcare. Wiley. pp. 69–92. doi:10.1002/9781119792345.ch3. ISBN 9781119792345. S2CID 238793347.
  7. Riyantoko, Prismahardi Aji; Diyasa, I Gede Susrama Mas (2021-10-28). "F.Q.A.M" Feyn-QLattice Automation Modelling: Python Module of Machine Learning for Data Classification in Water Potability. IEEE. pp. 135–141. doi:10.1109/icimcis53775.2021.9699371. ISBN 978-1-6654-2733-3.
  8. Wilstrup, Casper; Cave, Chris (2021-01-15), Combining symbolic regression with the Cox proportional hazards model improves prediction of heart failure deaths, Cold Spring Harbor Laboratory, doi:10.1101/2021.01.15.21249874, S2CID 231609904
  9. Christensen, Niels Johan; Demharter, Samuel; Machado, Meera; Pedersen, Lykke; Salvatore, Marco; Stentoft-Hansen, Valdemar; Iglesias, Miquel Triana (2022-06-22). "Identifying interactions in omics data for clinical biomarker discovery using symbolic regression". Bioinformatics. Oxford University Press (OUP). 38 (15): 3749–3758. doi:10.1093/bioinformatics/btac405. ISSN 1367-4803. PMC 9344843. PMID 35731214.