Predictive learning

Predictive learning is a machine learning technique in which an artificial intelligence model is fed new data to develop an understanding of its environment, capabilities, and limitations. Neuroscience, business, robotics, computer vision, and other fields employ this technique extensively. The concept was developed and expanded by French computer scientist Yann LeCun in 1988 during his career at Bell Labs, where he trained models to detect handwriting so that financial companies could automate check processing. [1]

History

The mathematical foundation for predictive learning dates back to the 17th century, when the British insurance company Lloyd's used early predictive analytics to turn a profit. [2] What began as a mathematical concept later expanded the possibilities of artificial intelligence. Predictive learning is an attempt to learn with a minimum of pre-existing mental structure, and was inspired by Piaget's account of children constructing knowledge of the world through interaction. Gary Drescher's book Made-up Minds was crucial to the development of this concept. [3]

The idea that the brain uses predictions and unconscious inference to construct a model of the world, within which it can identify the causes of percepts, goes back even further to Hermann von Helmholtz's work on unconscious inference. Those ideas were later picked up in the field of predictive coding. Another related predictive learning theory is Jeff Hawkins' memory-prediction framework, which is laid out in his book On Intelligence.

Mathematical procedures

Training process

As in other forms of supervised machine learning, predictive learning aims to estimate the value of an unknown dependent variable y from independent input data x = (x1, x2, …, xn). The input attributes can be classified into categorical data (qualitative factors such as race, sex, or affiliation) and numerical data (measurable values such as temperature, annual income, and average speed). Each set of input values is fed into a neural network to predict a value y. In order to predict the output accurately, the weights of the neural network (representing how strongly each predictor variable affects the outcome) must be incrementally adjusted, typically by stochastic gradient descent, so that the estimates move closer to the actual data.

Once the model has received enough adjustment and training to predict values close to the actual ones, it should be able to predict outputs for new data with only a small error ε (usually ε < 0.001) relative to the actual data.
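As a rough illustration of this training loop, the sketch below fits a simple linear predictor with stochastic gradient descent on synthetic data; the data, learning rate, and number of epochs are illustrative assumptions rather than values from the source.

```python
import numpy as np

# Synthetic data: 200 samples of 3 numerical predictor variables and a target y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)

# Linear predictor y_hat = X @ w; the weights w play the role of the network weights.
w = np.zeros(3)
learning_rate = 0.01

for epoch in range(50):
    for i in rng.permutation(len(X)):        # visit one sample at a time (stochastic)
        error = X[i] @ w - y[i]              # prediction error for this sample
        w -= learning_rate * error * X[i]    # gradient step on the squared error

print("learned weights:", w)                 # should end up close to true_w
```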

Maximizing accuracy

In order to ensure maximum accuracy for a predictive learning model, the predicted values ŷ = F(x) must not deviate from the actual values y by more than a certain error threshold, as measured by the risk

R(F) = E_{x,y} L(y, F(x)),

where L represents the loss function, y is the actual data, and F(x) is the predicted data. This error function is then used to make incremental adjustments to the model's weights to eventually reach a well-trained prediction of

F* = arg min_F E_{x,y} L(y, F(x)). [4]

Even with continuous training, a machine learning model can never achieve exactly zero error. If the error is negligible, however, the model is said to have converged, and its future predictions will be accurate the vast majority of the time.
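A minimal sketch of such a convergence check, using mean squared error as a stand-in for the loss L and a hypothetical threshold ε:

```python
import numpy as np

def empirical_risk(y, y_pred):
    """Average squared-error loss L(y, F(x)) over a data set."""
    return np.mean((y - y_pred) ** 2)

y = np.array([1.0, 2.0, 3.0])                # actual values
y_pred = np.array([1.02, 1.97, 3.01])        # hypothetical model predictions

epsilon = 1e-3                               # error threshold from the text
if empirical_risk(y, y_pred) < epsilon:
    print("risk below threshold: treat the model as converged")
```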

Ensemble learning

In some cases, a single machine learning approach is not enough to create an accurate estimate for certain data. Ensemble learning combines several machine learning algorithms into one model to create a more accurate estimate. The ensemble model is represented by the function

F(x) = a0 + Σ_{m=1}^{M} am fm(x),

where M is the number of methods used, a0 is the bias, am is the weight of the mth constituent method, and fm(x) is the prediction of the mth constituent method. The weights of the ensemble are estimated by minimizing a penalized loss over the training data,

{âm} = arg min_{am} Σ_{i=1}^{N} L(yi, a0 + Σ_{m=1}^{M} am fm(xi)) + λ Σ_{m=1}^{M} |am|,

where yi is the actual value, the second argument of L is the ensemble prediction for the ith observation, N is the number of training observations, and λ is a regularization coefficient that shrinks the weights am toward zero. [4]
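The sketch below illustrates this weighting step: three hypothetical base learners produce predictions, and scikit-learn's Lasso is used as a stand-in for the penalized arg-min above. The base learners, data, and penalty value are assumptions for illustration, not taken from the cited work.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data with a single input feature and a nonlinear target.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(x[:, 0]) + 0.1 * rng.normal(size=300)

# Hypothetical base learners f_m(x): each produces its own prediction of y.
base_learners = [
    lambda z: z[:, 0],             # f_1: linear term
    lambda z: z[:, 0] ** 3,        # f_2: cubic term
    lambda z: np.tanh(z[:, 0]),    # f_3: saturating term
]
F = np.column_stack([f(x) for f in base_learners])   # one column per f_m(x_i)

# Estimate {a_m}: squared loss plus an L1 penalty lambda * sum |a_m| (lasso).
combiner = Lasso(alpha=0.01)       # alpha plays the role of lambda in the formula
combiner.fit(F, y)
print("bias a0:", combiner.intercept_)
print("weights a_m:", combiner.coef_)
```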

Applications

Cognitive development

Figure: Dr. Yukie Nagai's predictive learning architecture for predicting sensorimotor signals.

Sensorimotor signals are neural impulses sent to the brain upon physical touch. Predictive learning of sensorimotor signals plays a key role in early cognitive development, because the human brain represents sensorimotor signals in a predictive manner: it attempts to minimize the prediction error between incoming sensory signals and its top-down prediction. A predictor has no inherent prediction ability and must therefore be trained through sensorimotor experience. [5] In a recent research paper, Dr. Yukie Nagai suggested a new predictive learning architecture for predicting sensorimotor signals based on a two-module approach: a sensorimotor system that interacts with the environment and a predictor that simulates the sensorimotor system in the brain. [5]
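The toy loop below is only a schematic illustration of that two-module idea, not Nagai's actual architecture: a hypothetical environment supplies sensorimotor feedback, and a one-parameter predictor is updated to reduce the prediction error between the incoming signal and its top-down prediction.

```python
import numpy as np

rng = np.random.default_rng(1)

def environment(action):
    """Sensorimotor system: sensory feedback is a noisy function of the action."""
    return 0.8 * action + rng.normal(scale=0.05)

w = 0.0                       # the predictor starts with no prediction ability
learning_rate = 0.1

for step in range(500):
    action = rng.uniform(-1, 1)
    sensed = environment(action)                       # incoming sensory signal
    predicted = w * action                             # top-down prediction
    prediction_error = sensed - predicted
    w += learning_rate * prediction_error * action     # reduce prediction error

print("learned sensorimotor mapping:", w)              # approaches 0.8
```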

Spatiotemporal memory

Computers use predictive learning in spatiotemporal memory to generate complete images from constituent frames. This implementation uses predictive recurrent neural networks, which are neural networks designed to work with sequential data such as time series. [6] Using predictive learning in conjunction with computer vision enables computers to create images of their own, which can be helpful for modelling sequential phenomena such as DNA replication, face recognition, or the generation of X-ray images.
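As a highly simplified sketch of next-frame prediction with a recurrent model (a single recurrent unit trained with one-step gradients, rather than the predictive RNN architectures used in practice), consider the following; the sequence, learning rate, and network size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
seq = np.sin(np.linspace(0, 8 * np.pi, 400))   # "frames" of a sequence

wx, wh, wy = rng.normal(scale=0.1, size=3)     # input, recurrent, and output weights
b = c = 0.0
lr = 0.01

for epoch in range(200):
    h = 0.0                                    # hidden state carries memory of past frames
    for t in range(len(seq) - 1):
        h_prev = h
        h = np.tanh(wx * seq[t] + wh * h_prev + b)
        pred = wy * h + c                      # predicted next frame
        err = pred - seq[t + 1]
        # One-step gradient updates (treating h_prev as a constant).
        dh = err * wy * (1 - h ** 2)
        wy -= lr * err * h
        c -= lr * err
        wx -= lr * dh * seq[t]
        wh -= lr * dh * h_prev
        b -= lr * dh

print("last predicted frame:", pred, "actual frame:", seq[-1])
```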

Social media consumer behavior

In a recent study, data on consumer behavior was collected from social media platforms such as Facebook, Twitter, LinkedIn, YouTube, Instagram, and Pinterest. Applying predictive learning analytics to this data allowed researchers to uncover various trends in consumer behavior, such as determining how successful a campaign could be, estimating a fair price for a product to attract consumers, assessing how secure consumer data is, and identifying the specific audiences that could be targeted for particular products. [7]

See also

Supervised learning
Pattern recognition
Logistic regression
Forecasting
Regression analysis
Feature selection
Random forest
Coefficient of determination
Granger causality
Recurrent neural network
Generalization error
Group method of data handling
Demand forecasting
Gradient boosting
Structured prediction
Types of artificial neural networks
System identification
Linear regression
Learning curve (machine learning)
Fairness (machine learning)

References

  1. "Yann LeCun "Predictive Learning: The Next Frontier in AI"". Nokia Bell Labs. 2017-02-17. Retrieved 2023-11-04.
  2. Corporation, Predictive Success (2019-05-06). "A Brief History of Predictive Analytics". Medium. Retrieved 2023-10-27.
  3. Drescher, Gary L. (1991). Made-up Minds: A Constructivist Approach to Artificial Intelligence. MIT Press. ISBN   978-0-262-04120-1.
  4. 1 2 Friedman, Jerome H.; Popescu, Bogdan E. (2008-09-17). "Predictive learning via rule ensembles". The Annals of Applied Statistics. 2 (3): 916–954. arXiv: 0811.1679 . doi: 10.1214/07-AOAS148 . ISSN   1932-6157.
  5. 1 2 Nagai, Yukie (2019-04-29). "Predictive learning: its key role in early cognitive development". Philosophical Transactions of the Royal Society B: Biological Sciences. 374 (1771): 20180030. doi:10.1098/rstb.2018.0030. ISSN   0962-8436. PMC   6452246 . PMID   30852990.
  6. Onnen, Heiko (2021-11-01). "Temporal Loops: Intro to Recurrent Neural Networks for Time Series Forecasting in Python". Medium. Retrieved 2023-11-02.
  7. Chaudhary, Kiran; Alam, Mansaf; Al-Rakhami, Mabrook S.; Gumaei, Abdu (2021-05-25). "Machine learning-based mathematical modelling for prediction of social media consumer behavior using big data analytics". Journal of Big Data. 8 (1): 73. doi: 10.1186/s40537-021-00466-2 . ISSN   2196-1115.