Logic learning machine

Logic learning machine (LLM) is a machine learning method based on the generation of intelligible rules. LLM is an efficient implementation of the Switching Neural Network (SNN) paradigm,[1] developed by Marco Muselli, Senior Researcher at the Italian National Research Council CNR-IEIIT in Genoa.

LLM has been employed in many different sectors, including medicine (orthopedic patient classification,[2] DNA microarray analysis,[3] and clinical decision support systems[4]), financial services, and supply chain management.

History

The Switching Neural Network approach was developed in the 1990s to overcome the drawbacks of the most commonly used machine learning methods. In particular, black-box methods such as the multilayer perceptron and the support vector machine achieved good accuracy but could not provide deep insight into the studied phenomenon, while decision trees were able to describe the phenomenon but often lacked accuracy. Switching Neural Networks used Boolean algebra to build sets of intelligible rules while retaining good predictive performance. In 2014, an efficient version of the Switching Neural Network was developed and implemented in the Rulex suite under the name Logic Learning Machine.[5] An LLM version devoted to regression problems was also developed.

General

Like other machine learning methods, LLM uses data to build a model that can make predictions about future behavior. LLM starts from a table containing a target variable (the output) and a set of input variables, and generates a set of rules that return the output value corresponding to a given configuration of inputs. A rule is written in the form:

if premise then consequence

where consequence contains the output value, whereas premise includes one or more conditions on the inputs. According to the input type, conditions can take different forms: for categorical variables, a condition requires the input to belong to a subset of values (e.g. x ∈ {A, B, C}); for ordered variables, a condition constrains the input to a threshold or an interval (e.g. x ≤ β, x > α, or α < x ≤ β).

A possible rule is therefore of the form

if x ∈ {A, B} AND y > α AND γ < z ≤ δ then consequence
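As an illustration, here is a minimal Python sketch of how such a rule could be represented and evaluated. This is not the Rulex implementation: the Condition and Rule classes, their field names, and the example values are all illustrative assumptions.

```python
# Minimal sketch of how an LLM-style rule could be represented and evaluated.
# Not the Rulex implementation: classes, field names, and example values are
# illustrative.

from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set, Tuple


@dataclass
class Condition:
    """A single condition on one input variable."""
    variable: str
    allowed: Optional[Set[Any]] = None               # categorical: x in {A, B, ...}
    interval: Optional[Tuple[float, float]] = None   # ordered: low < x <= high

    def holds(self, record: Dict[str, Any]) -> bool:
        value = record[self.variable]
        if self.allowed is not None:
            return value in self.allowed
        low, high = self.interval
        return low < value <= high


@dataclass
class Rule:
    """If all conditions in `premise` hold, then predict `consequence`."""
    premise: List[Condition]
    consequence: Any

    def matches(self, record: Dict[str, Any]) -> bool:
        return all(c.holds(record) for c in self.premise)


# Example: if color in {"red", "blue"} AND 2.0 < size <= 5.0 then "class_1"
rule = Rule(
    premise=[
        Condition("color", allowed={"red", "blue"}),
        Condition("size", interval=(2.0, 5.0)),
    ],
    consequence="class_1",
)

print(rule.matches({"color": "red", "size": 3.4}))    # True  -> rule fires
print(rule.matches({"color": "green", "size": 3.4}))  # False -> rule does not fire
```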

Types

According to the output type, different versions of the Logic Learning Machine have been developed: a classification version, where the output is a categorical variable (a class label), and a regression version, where the output is a numerical value. The sketch below illustrates how the two variants differ in the consequence of their rules.
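The following sketch continues the one above, reusing the illustrative Condition and Rule classes. The conflict-resolution strategies shown here, first matching rule for classification and averaging of matching rules for regression, are assumptions made for the example, not necessarily those used by the actual implementation.

```python
# Continuing the sketch above (reuses the illustrative Condition and Rule
# classes). The resolution strategies below, first matching rule for
# classification and averaging for regression, are assumptions for this example.

def classify(rules, record, default=None):
    """Classification LLM sketch: each consequence is a class label."""
    for rule in rules:
        if rule.matches(record):
            return rule.consequence
    return default


def predict_value(rules, record, default=0.0):
    """Regression LLM sketch: each consequence is a numeric value."""
    values = [r.consequence for r in rules if r.matches(record)]
    return sum(values) / len(values) if values else default


classification_rules = [
    Rule([Condition("size", interval=(5.0, float("inf")))], "large"),
    Rule([Condition("size", interval=(float("-inf"), 5.0))], "small"),
]
regression_rules = [
    Rule([Condition("color", allowed={"red"})], 10.0),
    Rule([Condition("size", interval=(2.0, 8.0))], 6.0),
]

print(classify(classification_rules, {"size": 7.2}))                    # "large"
print(predict_value(regression_rules, {"color": "red", "size": 3.0}))   # 8.0
```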

Related Research Articles

<span class="mw-page-title-main">Supervised learning</span> Paradigm in machine learning

Supervised learning (SL) is a paradigm in machine learning where input objects and a desired output value train a model. The training data is processed, building a function that maps new data to expected output values. An optimal scenario will allow for the algorithm to correctly determine output values for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way. This statistical quality of an algorithm is measured through the so-called generalization error.

<span class="mw-page-title-main">Multiplexer</span> A device that selects between several analog or digital input signals

In electronics, a multiplexer, also known as a data selector, is a device that selects between several analog or digital input signals and forwards the selected input to a single output line. The selection is directed by a separate set of digital inputs known as select lines. A multiplexer of inputs has select lines, which are used to select which input line to send to the output.

Fuzzy logic is a form of many-valued logic in which the truth value of variables may be any real number between 0 and 1. It is employed to handle the concept of partial truth, where the truth value may range between completely true and completely false. By contrast, in Boolean logic, the truth values of variables may only be the integer values 0 or 1.

Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess (PR) capabilities but their primary function is to distinguish and create emergent patterns. PR has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, due to the increased availability of big data and a new abundance of processing power.

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.

Unsupervised learning is a framework in machine learning where, in contrast to supervised learning, algorithms learn patterns exclusively from unlabeled data. Other frameworks in the spectrum of supervisions include weak- or semi-supervision, where a small portion of the data is tagged, and self-supervision. Some researchers consider self-supervised learning a form of unsupervised learning.

<span class="mw-page-title-main">Artificial neuron</span> Mathematical function conceived as a crude model

An artificial neuron is a mathematical function conceived as a model of biological neurons in a neural network. Artificial neurons are the elementary units of artificial neural networks. The artificial neuron is a function that receives one or more inputs, applies weights to these inputs, and sums them to produce an output.

The information bottleneck method is a technique in information theory introduced by Naftali Tishby, Fernando C. Pereira, and William Bialek. It is designed for finding the best tradeoff between accuracy and complexity (compression) when summarizing a random variable X, given a joint probability distribution p(X,Y) between X and an observed relevant variable Y - and self-described as providing "a surprisingly rich framework for discussing a variety of problems in signal processing and learning".

Decision tree learning is a supervised learning approach used in statistics, data mining and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions about a set of observations.

In statistical classification, two main approaches are called the generative approach and the discriminative approach. These compute classifiers by different approaches, differing in the degree of statistical modelling. Terminology is inconsistent, but three major types can be distinguished, following Jebara (2004):

  1. A generative model is a statistical model of the joint probability distribution on a given observable variable X and target variable Y; A generative model can be used to "generate" random instances (outcomes) of an observation x.
  2. A discriminative model is a model of the conditional probability of the target Y, given an observation x. It can be used to "discriminate" the value of the target variable Y, given an observation x.
  3. Classifiers computed without using a probability model are also referred to loosely as "discriminative".

In machine learning, backpropagation is a gradient estimation method commonly used for training neural networks to compute the network parameter updates.

When classification is performed by a computer, statistical methods are normally used to develop the algorithm.

Predictive learning is a machine learning technique where an artificial intelligence model is fed new data to develop an understanding of its environment, capabilities, and limitations. The fields of neuroscience, business, robotics, computer vision, and other fields employ this technique extensively. This concept was developed and expanded by French computer scientist Yann LeCun in 1988 during his career at Bell Labs, where he trained models to detect handwriting so that financial companies could automate check processing.

Gradient boosting is a machine learning technique based on boosting in a functional space, where the target is pseudo-residuals rather than the typical residuals used in traditional boosting. It gives a prediction model in the form of an ensemble of weak prediction models, i.e., models that make very few assumptions about the data, which are typically simple decision trees. When a decision tree is the weak learner, the resulting algorithm is called gradient-boosted trees; it usually outperforms random forest. A gradient-boosted trees model is built in a stage-wise fashion as in other boosting methods, but it generalizes the other methods by allowing optimization of an arbitrary differentiable loss function.

Structured prediction or structured (output) learning is an umbrella term for supervised machine learning techniques that involves predicting structured objects, rather than scalar discrete or real values.

There are many types of artificial neural networks (ANN).

Extension neural network is a pattern recognition method found by M. H. Wang and C. P. Hung in 2003 to classify instances of data sets. Extension neural network is composed of artificial neural network and extension theory concepts. It uses the fast and adaptive learning capability of neural network and correlation estimation property of extension theory by calculating extension distance.
ENN was used in:

In statistics, ordinal regression, also called ordinal classification, is a type of regression analysis used for predicting an ordinal variable, i.e. a variable whose value exists on an arbitrary scale where only the relative ordering between different values is significant. It can be considered an intermediate problem between regression and classification. Examples of ordinal regression are ordered logit and ordered probit. Ordinal regression turns up often in the social sciences, for example in the modeling of human levels of preference, as well as in information retrieval. In machine learning, ordinal regression may also be called ranking learning.

An artificial neural network's learning rule or learning process is a method, mathematical logic or algorithm which improves the network's performance and/or training time. Usually, this rule is applied repeatedly over the network. It is done by updating the weights and bias levels of a network when a network is simulated in a specific data environment. A learning rule may accept existing conditions of the network and will compare the expected result and actual result of the network to give new and improved values for weights and bias. Depending on the complexity of actual model being simulated, the learning rule of the network can be as simple as an XOR gate or mean squared error, or as complex as the result of a system of differential equations.

<span class="mw-page-title-main">Tsetlin machine</span> Artificial intelligence algorithm

A Tsetlin machine is an artificial intelligence algorithm based on propositional logic.

References

  1. Muselli, Marco (2006). "Switching Neural Networks: A new connectionist model for classification" (PDF). WIRN 2005 and NAIS 2005, Lecture Notes in Computer Science. 3931: 23–30.
  2. Mordenti, M.; Ferrari, E.; Pedrini, E.; Fabbri, N.; Campanacci, L.; Muselli, M.; Sangiorgi, L. (2013). "Validation of a New Multiple Osteochondromas Classification Through Switching Neural Networks". American Journal of Medical Genetics Part A. 161 (3): 556–560. doi:10.1002/ajmg.a.35819. PMID 23401177. S2CID 23983960.
  3. Cangelosi, D.; Muselli, M.; Blengio, F.; Becherini, P.; Versteeg, R.; Conte, M.; Varesio, L. (2013). "Use of Attribute Driven Incremental Discretization and Logic Learning Machine to build a prognostic classifier for neuroblastoma patients". BMC Bioinformatics. 15 (Suppl 5): S4. doi:10.1186/1471-2105-15-S5-S4. PMC 4095004. PMID 25078098.
  4. Parodi, S.; Filiberti, R.; Marroni, P.; Montani, E.; Muselli, M. (2014). "Differential diagnosis of pleural mesothelioma using Logic Learning Machine". BMC Bioinformatics. 16 (Suppl 9): S3. doi:10.1186/1471-2105-16-S9-S3. PMC 4464205. PMID 26051106.
  5. "Rulex: a software for knowledge extraction from data". Italian National Research Council. Retrieved 7 March 2015.