Affinity analysis

Figure: Frequent itemsets

Affinity analysis is a data mining technique that uncovers meaningful correlations between different entities according to their co-occurrence in a data set. In almost all systems and processes, the application of affinity analysis can extract significant knowledge about unexpected trends [citation needed]. Affinity analysis studies attributes that occur together, which helps uncover hidden patterns in large data sets by generating association rules. Association rule mining is a two-step procedure: first, it finds all frequent sets of attributes (frequent itemsets) in a data set; then it generates association rules from those itemsets that satisfy predefined criteria, support and confidence, to identify the most important relationships. The first step counts the co-occurrence of attributes in the data set and keeps the combinations that occur often enough as frequent itemsets; this is repeated with progressively larger combinations until no additional frequent itemsets are found. An association rule takes the form "if a condition or feature (A) is present, then another condition or feature (B) is also present". The first condition or feature (A) is called the antecedent and the second (B) is called the consequent. The Apriori algorithm is used to reduce the search space of the problem. [1]
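The level-wise search for frequent itemsets described above can be sketched in Python. This is a minimal, unoptimized illustration in the spirit of Apriori, not a reference implementation; the function name `frequent_itemsets` and the `min_support` parameter (a fraction of all transactions) are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for b in baskets for item in b}
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}

    result = {}
    k = 1
    while current:  # stop when a level yields no frequent itemsets
        result.update(current)
        k += 1
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a candidate can only be frequent if all its (k-1)-subsets are.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
    return result
```

The prune step is what reduces the search space: any candidate with an infrequent subset is discarded before its support is counted.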


The support of an association rule is the frequency with which the antecedent and the consequent appear together in a data set, usually expressed as a fraction of all records. Confidence expresses the reliability of the rule: it is the ratio of the number of records containing both A and B to the number of records containing A. The minimum thresholds for support and confidence are inputs to the model. With these definitions, affinity analysis can develop rules that predict the occurrence of an event based on the occurrence of other events. This data mining method has been explored in fields including disease diagnosis, market basket analysis, the retail industry, higher education, and financial analysis. In retail, affinity analysis is used to perform market basket analysis, in which retailers seek to understand the purchase behavior of customers. This information can then be used for cross-selling and up-selling, as well as for shaping sales promotions, loyalty programs, store design, and discount plans. [2]
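As a small illustration of the two metrics, the following Python sketch computes support and confidence for a rule A → B over a list of transactions. The helper names `count`, `support`, and `confidence` are hypothetical and chosen only for this example.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions)


def support(transactions, antecedent, consequent):
    """Fraction of all transactions containing both antecedent and consequent."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    n_antecedent = count(transactions, antecedent)
    if n_antecedent == 0:
        return 0.0  # rule never applies; treated as zero confidence in this sketch
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / n_antecedent
```

A rule is typically kept only if both values reach the user-supplied minimum thresholds.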

Application of affinity analysis techniques in retail

Market basket analysis might tell a retailer that customers often purchase shampoo and conditioner together, so putting both items on promotion at the same time would not create a significant increase in revenue, while a promotion involving just one of the items would likely drive sales of the other.

Market basket analysis may provide the retailer with information to understand the purchase behavior of a buyer. This information enables the retailer to understand the buyer's needs and rearrange the store's layout accordingly, develop cross-promotional programs, or even capture new buyers (much like the cross-selling concept). An apocryphal early example tells of a supermarket chain that discovered in its analysis that male customers who bought diapers often bought beer as well, placed the diapers close to the beer coolers, and saw sales increase dramatically. Although this urban legend is only an example that professors use to illustrate the concept to students, the explanation of this imaginary phenomenon might be that fathers who are sent out to buy diapers often buy a beer as well, as a reward. [3] This kind of analysis is supposedly an example of the use of data mining. A widely used example of cross-selling on the web with market basket analysis is Amazon.com's use of "customers who bought book A also bought book B", e.g. "People who read History of Portugal were also interested in Naval History".

Market basket analysis can be used to divide customers into groups. A company could look at what other items people purchase along with eggs, and classify them as baking a cake (if they are buying eggs along with flour and sugar) or making omelets (if they are buying eggs along with bacon and cheese). This identification could then be used to drive other programs. Similarly, it can be used to divide products into natural groups. A company could look at what products are most frequently sold together and align their category management around these cliques. [4]
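Both groupings described above can be sketched in a few lines of Python. The segment definitions below are taken directly from the eggs example in the text; the names `SEGMENTS`, `label_basket`, and `pair_counts` are hypothetical, and a real system would likely derive segments from mined rules instead of hard-coding them.

```python
from collections import Counter
from itertools import combinations

# Marker items per customer segment, following the eggs example in the text.
SEGMENTS = {
    "baking a cake": {"eggs", "flour", "sugar"},
    "making omelets": {"eggs", "bacon", "cheese"},
}


def label_basket(basket):
    """Return the first segment whose marker items all appear in the basket, else None."""
    items = set(basket)
    for label, markers in SEGMENTS.items():
        if markers <= items:
            return label
    return None


def pair_counts(baskets):
    """Count how often each unordered pair of products is sold together."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair a single canonical (a, b) ordering.
        counts.update(combinations(sorted(set(basket)), 2))
    return counts
```

Calling `pair_counts(baskets).most_common(10)` would list the ten most frequently co-purchased product pairs, a possible starting point for grouping products.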

Business use of market basket analysis has significantly increased since the introduction of electronic point of sale. [2] Amazon uses affinity analysis for cross-selling when it recommends products to people based on their purchase history and the purchase history of other people who bought the same item. Family Dollar plans to use market basket analysis to help maintain sales growth while moving towards stocking more low-margin consumable goods. [5]

Application of affinity analysis techniques in clinical diagnosis

Figure: Flow chart representation of the knowledge discovery process

An important clinical application of affinity analysis is the generation of association rules from patients' medical records. The resulting association rules can be further assessed to find conditions and features that coincide across a large body of information. [6] Understanding whether there is an association between the different factors contributing to a condition is crucial for administering effective preventive or therapeutic interventions. In evidence-based medicine, finding the co-occurrence of symptoms associated with developing tumors or cancers can help diagnose the disease at its earliest stage. [7] In addition to exploring the association between different symptoms related to a specific disease in a patient, affinity analysis can also identify possible correlations between various diseases that contribute to another condition. [8]
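As a rough illustration of rule generation over records, the Python sketch below lists single-feature rules A → B that meet minimum support and confidence thresholds. It is not the method used in the cited studies; the function name `pairwise_rules` is hypothetical, and each record is assumed to be a set of feature labels (placeholder names such as `symptom_a`, not clinical data).

```python
from itertools import permutations


def pairwise_rules(records, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for rules A -> B
    between single features that meet both thresholds."""
    records = [set(r) for r in records]
    n = len(records)
    features = sorted(set().union(*records))
    rules = []
    for a, b in permutations(features, 2):
        n_a = sum(a in r for r in records)               # records containing A
        n_ab = sum(a in r and b in r for r in records)   # records containing A and B
        if n_a == 0:
            continue
        sup, conf = n_ab / n, n_ab / n_a
        if sup >= min_support and conf >= min_confidence:
            rules.append((a, b, sup, conf))
    return rules
```

Because confidence is not symmetric, A → B and B → A are evaluated separately, so a rule may hold in one direction only.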

References

  1. Larose, Daniel T.; Larose, Chantal D. (2014-06-23). Discovering Knowledge in Data: An Introduction to Data Mining. Hoboken, NJ, USA: John Wiley & Sons, Inc. doi:10.1002/9781118874059. ISBN 978-1-118-87405-9.
  2. "Demystifying Market Basket Analysis". Retrieved 28 December 2018.
  3. "The parable of the beer and diapers". The Register. Retrieved 3 September 2009.
  4. "Product Network Analysis". Forte Consultancy Group. Archived 2018-11-18 at the Wayback Machine.
  5. "Family Dollar Supports Merchandising with IT". Archived from the original on 6 May 2010. Retrieved 3 November 2009.
  6. Sanida, Theodora; Varlamis, Iraklis (June 2017). "Application of Affinity Analysis Techniques on Diagnosis and Prescription Data". 2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS). Thessaloniki: IEEE. pp. 403–408. doi:10.1109/CBMS.2017.114. ISBN 978-1-5386-1710-6.
  7. Dept. of Biotechnology & Bioinformatics, Jaypee University of Information Technology, Waknaghat, Solan, H.P., India; Sengupta, Dipankar; Sood, Meemansa; Vijayvargia, Poorvika; Hota, Sunil; Naik, Pradeep K (2013-06-29). "Association rule mining based study for identification of clinical parameters akin to occurrence of brain tumor". Bioinformation. 9 (11): 555–559. doi:10.6026/97320630009555. PMC 3717182. PMID 23888095.
  8. Lakshmi, K.S; Vadivu, G. (2017). "Extracting Association Rules from Medical Health Records Using Multi-Criteria Decision Analysis". Procedia Computer Science. 115: 290–295. doi:10.1016/j.procs.2017.09.137.
