Uplift modelling

Uplift modelling, also known as incremental modelling, true lift modelling, or net modelling, is a predictive modelling technique that directly models the incremental impact of a treatment (such as a direct marketing action) on an individual's behaviour.

Uplift modelling has applications in customer relationship management for up-sell, cross-sell and retention modelling. It has also been applied to political elections and personalised medicine. Unlike the related differential prediction concept in psychology, uplift modelling assumes an active agent.

Introduction

Uplift modelling uses a randomised scientific control not only to measure the effectiveness of an action but also to build a predictive model that predicts the incremental response to the action. The response could be a binary variable (for example, a website visit) [1] or a continuous variable (for example, customer revenue). [2] Uplift modelling is a data mining technique that has been applied predominantly in the financial services, telecommunications and retail direct marketing industries to up-sell, cross-sell, churn and retention activities.

Measuring uplift

The uplift of a marketing campaign is usually defined as the difference in response rate between a treated group and a randomized control group. This allows a marketing team to isolate the effect of a marketing action and measure the effectiveness or otherwise of that individual marketing action. Honest marketing teams will only take credit for the incremental effect of their campaign.

However, many marketers define lift (rather than uplift) as the difference in response rate between treatment and control, so uplift modeling can be defined as improving (upping) lift through predictive modeling.

The table below shows the number of responses and the calculated response rate for a hypothetical marketing campaign. This campaign would be defined as having a response rate uplift of 5 percentage points (10% - 5%). It has created 50,000 incremental responses (100,000 - 50,000).

Group     Number of customers   Responses   Response rate
Treated   1,000,000             100,000     10%
Control   1,000,000             50,000      5%
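The arithmetic behind the table can be sketched in a few lines of Python. The function names are illustrative only; the figures are those of the hypothetical campaign above. Incremental responses are scaled to the treated group, which is an assumption that matters only when the two groups differ in size.

```python
def response_rate(responses: int, customers: int) -> float:
    """Fraction of customers in a group who responded."""
    return responses / customers


def campaign_uplift(treated_n, treated_resp, control_n, control_resp):
    """Return (uplift in response rate, incremental responses).

    Uplift is the treated response rate minus the control response rate.
    Incremental responses are that difference applied to the treated group.
    """
    uplift = response_rate(treated_resp, treated_n) - response_rate(control_resp, control_n)
    incremental = uplift * treated_n
    return uplift, incremental


# Figures from the hypothetical campaign in the table.
uplift, incremental = campaign_uplift(1_000_000, 100_000, 1_000_000, 50_000)
print(f"uplift: {uplift:.1%}, incremental responses: {incremental:,.0f}")
```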

Traditional response modelling

Traditional response modelling typically takes a group of treated customers and attempts to build a predictive model that separates the likely responders from the non-responders through the use of one of a number of predictive modelling techniques. Typically this would use decision trees or regression analysis.

Such a model is built using only the treated customers.

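In contrast, uplift modelling uses both the treated and control customers to build a predictive model that focuses on the incremental response. To understand this type of model, it is proposed that there is a fundamental segmentation that separates customers into the following groups (their names were suggested by N. Radcliffe and explained in [3]): the Persuadables, who respond only because they were targeted; the Sure Things, who would have responded whether or not they were targeted; the Lost Causes, who will not respond whether or not they are targeted; and the Do Not Disturbs (also called Sleeping Dogs), who are less likely to respond because they were targeted.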

The only segment that provides true incremental responses is the Persuadables.

Uplift modelling provides a scoring technique that can separate customers into the groups described above.

Traditional response modelling often targets the Sure Things, being unable to distinguish them from the Persuadables.
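The difference between the two approaches can be illustrated with a deliberately simple sketch. It does not fit a decision tree or regression model; instead it estimates response rates per customer segment, which is an assumption made to keep the example short. The segments "A", "B" and "C", the record layout and all counts are hypothetical. A response-style ranking uses only the treated rate, while an uplift-style ranking uses the difference between the treated and control rates.

```python
from collections import defaultdict


def segment_rates(records):
    """Return {segment: (treated_rate, control_rate)}.

    Each record is a (segment, was_treated, responded) tuple.
    """
    # counts[segment][was_treated] = [responses, total]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, responded in records:
        cell = counts[segment][treated]
        cell[0] += int(responded)
        cell[1] += 1
    rates = {}
    for segment, cells in counts.items():
        (t_resp, t_n), (c_resp, c_n) = cells[True], cells[False]
        if t_n == 0 or c_n == 0:
            continue  # uplift is undefined without both groups
        rates[segment] = (t_resp / t_n, c_resp / c_n)
    return rates


def make_records(segment, treated, n, responders):
    """Build n hypothetical records, the first `responders` of which responded."""
    return [(segment, treated, i < responders) for i in range(n)]


records = (
    make_records("A", True, 100, 60) + make_records("A", False, 100, 58)    # high response, little uplift
    + make_records("B", True, 100, 30) + make_records("B", False, 100, 10)  # lower response, large uplift
    + make_records("C", True, 100, 5) + make_records("C", False, 100, 15)   # treatment reduces response
)

rates = segment_rates(records)
# Response-style ranking: treated response rate only.
by_response = sorted(rates, key=lambda s: rates[s][0], reverse=True)
# Uplift-style ranking: treated rate minus control rate.
by_uplift = sorted(rates, key=lambda s: rates[s][0] - rates[s][1], reverse=True)
print("ranked by response:", by_response)
print("ranked by uplift:  ", by_uplift)
```

In this made-up data the response ranking puts segment A first even though most of its responders would have responded anyway, whereas the uplift ranking puts segment B first and shows a negative effect for segment C.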

Return on investment

Because uplift modelling focuses on incremental responses only, it provides very strong return on investment cases when applied to traditional demand generation and retention activities. For example, by targeting only the persuadable customers in an outbound marketing campaign, contact costs can be reduced and hence the return per unit spend dramatically improved.
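A small worked example makes the effect on return per unit spend concrete. Everything here is hypothetical apart from the 5% overall uplift taken from the table earlier in the article: the cost per contact, the value per response, and the assumption that a model isolates 300,000 customers with an uplift of 15% are invented for illustration.

```python
def return_per_unit_spend(n_contacted, uplift, value_per_response, cost_per_contact):
    """Net incremental return divided by contact spend.

    Only incremental responses (n_contacted * uplift) are credited to the campaign.
    """
    incremental_value = n_contacted * uplift * value_per_response
    spend = n_contacted * cost_per_contact
    return (incremental_value - spend) / spend


# Hypothetical economics: each response is worth 20, each contact costs 1.
VALUE, COST = 20.0, 1.0

# Contact the whole base of 1,000,000 customers at the overall uplift of 5%.
everyone = return_per_unit_spend(1_000_000, 0.05, VALUE, COST)

# Contact only a hypothetical high-uplift group: 300,000 customers at 15% uplift.
targeted = return_per_unit_spend(300_000, 0.15, VALUE, COST)

print(f"contact everyone:     {everyone:.2f} net return per unit spend")
print(f"contact targeted set: {targeted:.2f} net return per unit spend")
```

Under these assumptions the targeted campaign keeps 45,000 of the 50,000 incremental responses while paying for less than a third of the contacts.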

Removal of negative effects

One of the most effective uses of uplift modelling is in the removal of negative effects from retention campaigns. In both the telecommunications and financial services industries, retention campaigns can often trigger customers to cancel a contract or policy. Uplift modelling allows these customers, the Do Not Disturbs, to be removed from the campaign.
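Once each customer has a predicted uplift score, removing the Do Not Disturbs amounts to a filter on that score. The sketch below assumes the score is the predicted change in retention probability caused by the contact; the customer ids, the scores and the zero threshold are all hypothetical.

```python
def select_for_contact(predicted_uplift, min_uplift=0.0):
    """Return ids of customers whose predicted uplift exceeds min_uplift."""
    return [cid for cid, uplift in predicted_uplift.items() if uplift > min_uplift]


def expected_net_effect(predicted_uplift, contacted):
    """Sum of predicted uplift over contacted customers (expected extra retentions)."""
    return sum(predicted_uplift[cid] for cid in contacted)


# Hypothetical scores: predicted change in retention probability if contacted.
scores = {"c1": 0.08, "c2": -0.05, "c3": 0.02, "c4": -0.10, "c5": 0.00}

everyone = list(scores)
selected = select_for_contact(scores)

print("contact everyone:       ", round(expected_net_effect(scores, everyone), 2))
print("exclude negative uplift:", round(expected_net_effect(scores, selected), 2))
```

With these scores, contacting everyone is expected to lose customers overall, while contacting only those with positive predicted uplift is expected to retain some.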

Application to A/B and multivariate testing

It is rarely the case that there is a single treatment and control group. Often the "treatment" can be a variety of simple variations of a message or a multi-stage contact strategy that is classed as a single treatment. In the case of A/B or multivariate testing, uplift modelling can help in understanding whether the variations in tests provide any significant uplift compared to other targeting criteria such as behavioural or demographic indicators.
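One simple way to check whether a test variation shows a statistically significant uplift over control is a pooled two-proportion z-test. This is a common choice assumed here for illustration, not a method prescribed by the sources above, and it does not by itself compare variations against behavioural or demographic targeting. The variant names and counts are hypothetical.

```python
import math


def two_proportion_z_test(resp_a, n_a, resp_b, n_b):
    """Return (difference in response rates, two-sided p-value).

    Uses the pooled standard error under the null hypothesis of equal rates.
    """
    p_a, p_b = resp_a / n_a, resp_b / n_b
    pooled = (resp_a + resp_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return p_a - p_b, 1.0  # no variation in the data, nothing to test
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return p_a - p_b, p_value


# Hypothetical test: (responses, customers) for control and two message variants.
control = (500, 10_000)
variants = {"variant_1": (560, 10_000), "variant_2": (650, 10_000)}

results = {
    name: two_proportion_z_test(resp, n, *control)
    for name, (resp, n) in variants.items()
}
for name, (uplift, p_value) in results.items():
    print(f"{name}: uplift {uplift:.2%}, p-value {p_value:.4f}")
```

When many variations are compared against the same control, the p-values would normally need a multiple-comparison correction, which this sketch omits.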

History of uplift modelling

True response modelling appears to have first been described in the work of Radcliffe and Surry. [4]

Victor Lo also published on this topic in The True Lift Model (2002), [5] and later Radcliffe again with Using Control Groups to Target on Predicted Lift: Building and Assessing Uplift Models (2007). [6]

Radcliffe also provides a very useful frequently asked questions (FAQ) section on his web site, Scientific Marketer. [7] Lo (2008) provides a more general framework, from program design to predictive modeling to optimization, along with future research areas. [8]

Uplift modelling has also been studied independently by Piotr Rzepakowski. Together with Szymon Jaroszewicz, he adapted information theory to build multi-class uplift decision trees, publishing the work in 2010. [9] In 2011 they extended the algorithm to the multiple-treatment case. [10]

Similar approaches have been explored in personalised medicine. [11] [12] Szymon Jaroszewicz and Piotr Rzepakowski (2014) designed uplift methodology for survival analysis and applied it to randomized controlled trial analysis. [13] Yong (2015) combined a mathematical optimization algorithm via dynamic programming with machine learning methods to optimally stratify patients. [14]

Uplift modelling is a special case of the older psychology concept of Differential Prediction. [15] In contrast to differential prediction, uplift modelling assumes an active agent, and uses the uplift measure as an optimization metric.

Uplift modelling has recently been extended and incorporated into diverse machine learning algorithms, such as inductive logic programming, [15] Bayesian networks, [16] statistical relational learning, [12] support vector machines, [17] [18] survival analysis [13] and ensemble learning. [19]

Even though uplift modeling is widely applied in marketing practice (along with political elections), it has rarely appeared in marketing literature. Kane, Lo and Zheng (2014) published a thorough analysis of three data sets using multiple methods in a marketing journal and provided evidence that a newer approach (known as the Four Quadrant Method) worked quite well in practice. [20] Lo and Pachamanova (2015) extended uplift modeling to prescriptive analytics for multiple treatment situations and proposed algorithms to solve large deterministic optimization problems and complex stochastic optimization problems where estimates are not exact. [21]

Recent research analyses the performance of various state-of-the-art uplift models in benchmark studies using large data sets. [22] [1]

A detailed description of uplift modelling, its history, the way uplift models are built, differences from classical model building, uplift-specific evaluation techniques, a comparison of various software solutions and an explanation of different economic scenarios can be found in Michel, Schnakenburg and von Martens (2019). [23]

Implementations

In Python

In R

Other languages

Datasets

Notes and references

  1. Devriendt, Floris; Moldovan, Darie; Verbeke, Wouter (2018). "A literature survey and experimental evaluation of the state-of-the-art in uplift modeling: A stepping stone toward the development of prescriptive analytics". Big Data. 6 (1): 13–41. doi:10.1089/big.2017.0104. PMID 29570415.
  2. Gubela, Robin M.; Lessmann, Stefan; Jaroszewicz, Szymon (2020). "Response transformation and profit decomposition for revenue uplift modeling". European Journal of Operational Research. 283 (2): 647–661. arXiv:1911.08729. doi:10.1016/j.ejor.2019.11.030. S2CID 208175716.
  3. N. Radcliffe (2007). Identifying who can be saved and who will be driven away by retention activity. Stochastic Solution Limited
  4. Radcliffe, N. J.; and Surry, P. D. (1999); Differential response analysis: Modelling true response by isolating the effect of a single action, in Proceedings of Credit Scoring and Credit Control VI, Credit Research Centre, University of Edinburgh Management School
  5. Lo, V. S. Y. (2002); The True Lift Model, ACM SIGKDD Explorations Newsletter, Vol. 4, No. 2, 78–86, available at http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=4FD247B4987CBF2E29186DACE0D40C3D?doi=10.1.1.99.7064&rep=rep1&type=pdf
  6. Radcliffe, N. J. (2007); Using Control Groups to Target on Predicted Lift: Building and Assessing Uplift Models, Direct Marketing Analytics Journal, Direct Marketing Association
  7. The Scientific Marketer FAQ on Uplift Modelling
  8. Lo, V. S.Y. (2008) “New Opportunities in Marketing Data Mining.” In Encyclopedia of Data Warehousing and Mining, 2nd edition, edited by Wang (2008), Idea Group Publishing.
  9. Rzepakowski, Piotr; Jaroszewicz, Szymon (2010). "Decision Trees for Uplift Modeling". 2010 IEEE International Conference on Data Mining. Sydney, Australia. pp. 441–450. doi:10.1109/ICDM.2010.62. ISBN 978-1-4244-9131-5. S2CID 14362608.
  10. Rzepakowski, Piotr; Jaroszewicz, Szymon (2011). "Decision trees for uplift modeling with single and multiple treatments". Knowledge and Information Systems. 32 (2): 303–327. doi:10.1007/s10115-011-0434-0.
  11. Cai, T.; Tian, L.; Wong, P. H.; and Wei, L. J. (2009); Analysis of Randomized Comparative Clinical Trial Data for Personalized Treatment Selections, Harvard University Biostatistics Working Paper Series, Paper 97
  12. Nassif, Houssam; Kuusisto, Finn; Burnside, Elizabeth S; Page, David; Shavlik, Jude; Santos Costa, Vitor (2013). "Score as You Lift (SAYL): A Statistical Relational Learning Approach to Uplift Modeling". Advanced Information Systems Engineering. Lecture Notes in Computer Science. Vol. 8190. Prague. pp. 595–611. doi:10.1007/978-3-642-40994-3_38. ISBN 978-3-642-38708-1. PMC 4492311. PMID 26158122.
  13. Jaroszewicz, Szymon; Rzepakowski, Piotr (2014). "Uplift modeling with survival data" (PDF). ACM SIGKDD Workshop on Health Informatics (HI KDD'14). New York, USA.
  14. Yong, F.H. (2015), "Quantitative Methods for Stratified Medicine," PhD Dissertation, Department of Biostatistics, Harvard T.H. Chan School of Public Health, http://dash.harvard.edu/bitstream/handle/1/17463130/YONG-DISSERTATION-2015.pdf?sequence=1
  15. Nassif, Houssam; Santos Costa, Vitor; Burnside, Elizabeth S; Page, David (2012). "Relational Differential Prediction". Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science. Vol. 7523. Bristol, UK. pp. 617–632. doi:10.1007/978-3-642-33460-3_45. ISBN 978-3-642-33459-7.
  16. Nassif, Houssam; Wu, Yirong; Page, David; Burnside, Elizabeth (2012). "Logical Differential Prediction Bayes Net, Improving Breast Cancer Diagnosis for Older Women". American Medical Informatics Association Symposium (AMIA'12). 2012: 1330–1339. PMC 3540455. PMID 23304412.
  17. Kuusisto, Finn; Santos Costa, Vitor; Nassif, Houssam; Burnside, Elizabeth; Page, David; Shavlik, Jude (2014). "Support Vector Machines for Differential Prediction". Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science. Vol. 8725. Nancy, France. pp. 50–65. doi:10.1007/978-3-662-44851-9_4. ISBN 978-3-662-44850-2. PMC 4492338. PMID 26158123.
  18. Zaniewicz, Lukasz; Jaroszewicz, Szymon (2013). "Support Vector Machines for Uplift Modeling". The First IEEE ICDM Workshop on Causal Discovery. Dallas, Texas.
  19. Sołtys, Michał; Jaroszewicz, Szymon; Rzepakowski, Piotr (2015). "Ensemble methods for uplift modeling". Data Mining and Knowledge Discovery. 29 (6): 1531–1559. doi:10.1007/s10618-014-0383-9.
  20. Kane, K.; Lo, V.S.Y.; Zheng, J. (2014). "Mining for the Truly Responsive Customers and Prospects Using True-Lift Modeling: Comparison of New and Existing Methods". Journal of Marketing Analytics. 2 (4): 218–238. doi:10.1057/jma.2014.18. S2CID 256513132.
  21. Lo, V.S.Y.; Pachamanova, D. (2015). "From Predictive Uplift Modeling to Prescriptive Uplift Analytics: A Practical Approach to Treatment Optimization While Accounting for Estimation Risk". Journal of Marketing Analytics. 3 (2): 79–95. doi:10.1057/jma.2015.5. S2CID 256508939.
  22. Gubela, Robin M.; Bequé, Artem; Lessmann, Stefan; Gebert, Fabian (2019). "Conversion Uplift in E-Commerce: A Systematic Benchmark of Modeling Strategies". International Journal of Information Technology & Decision Making. 18 (3): 747–791. doi:10.1142/S0219622019500172. hdl:10419/230773. S2CID 126538764.
  23. R. Michel, I. Schnakenburg, T. von Martens (2019). "Targeting Uplift". Springer, ISBN 978-3-030-22625-1.

See also
