Small data

Small data is data that is 'small' enough for human comprehension. [1] It is data in a volume and format that makes it accessible, informative and actionable. [2]

The term "big data" is about machines, and "small data" is about people. [3] That is to say, eyewitness observations or five pieces of related data could be small data. Small data is what we used to think of as data. The only way to comprehend big data is to reduce it to small, visually appealing objects that represent various aspects of large data sets (such as histograms, charts, and scatter plots). Big data is all about finding correlations, whereas small data is all about finding causation, the reason why. [4]
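
The reduction of a large data set to a small, readable object can be illustrated with a short sketch. The example below is only an illustration under stated assumptions: the purchase amounts are simulated, and the function name histogram and the bin width are hypothetical choices, not taken from any cited source.

```python
import random
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each bin of width bin_width."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

random.seed(0)
# Hypothetical "big" data set: 100,000 simulated purchase amounts.
purchases = [random.gauss(50, 15) for _ in range(100_000)]

# The "small" summary: a handful of bins a person can read at a glance.
summary = histogram(purchases, bin_width=10)

for start, count in summary.items():
    bar = "#" * (count // 1000)  # one '#' per 1,000 purchases
    print(f"{start:6.0f} to {start + 10:4.0f}: {bar}")
```

A hundred thousand numbers are reduced here to about a dozen rows, which is the kind of compact, visual summary the paragraph above describes.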

A formal definition of small data has been proposed by Allen Bonde, former vice-president of Innovation at Actuate (now part of OpenText): "Small data connects people with timely, meaningful insights (derived from big data and/or “local” sources), organized and packaged – often visually – to be accessible, understandable, and actionable for everyday tasks." [5]

Another definition of small data is:

According to Martin Lindstrom, “If one takes the top 100 biggest innovations of our time, perhaps around 60% to 65% are really based on Small Data.” [4] Small data includes everything from Snapchat to simple objects such as the Post-it note. Lindstrom believes we have become so focused on big data that we tend to forget about more basic concepts and creativity. He defines small data "as seemingly insignificant observations you identify in consumers’ homes, is everything from how you place your shoes to how you hang your paintings". He thus considers that one should first master the basics (small data) before mining data for correlations.

Uses in business

Marketing

Bonde has written about the topic for Forbes, [7] Direct Marketing News, [8] CMO.com [9] and other publications.

In his book Small Data, Martin Lindstrom writes: "[In customer research, small data is] seemingly insignificant behavioural observations containing very specific attributes pointing towards an unmet customer need. Small data is the foundation for breakthrough ideas or completely new ways to turn around brands." [10] His approach combines the observation of small samples with intuition. [11] Marketers can obtain market insights by gathering small data, engaging with and observing people in their own environments. [11] In comparison to big data, small data has the power to trigger emotions and to provide insights into the reasons behind customers' behaviours. [12] It may uncover detailed information on a person's extroversion or introversion, their self-confidence, whether they are having problems in a relationship, and so on. [12] According to Lindstrom, relationships among people and customer segments are organized around four criteria:

  1. Climate: A person's environment affects, for example, their diet.
  2. Rulership: The power or government in charge.
  3. Religion: The prevalence of religion in a country, depending on its influence, indicates whether a person's decision-making process is affected by their belief system.
  4. Tradition: Cultural norms influence people's behaviors and interactions.

Many companies underestimate the power of small data, relying on samples of millions of consumers instead of recognizing the value of closely observing small samples in their market research. [11] In his book, Lindstrom defines the "7Cs", which companies should consider when trying to derive meaningful customer insights and market trends from small data about their customers: [12]

  1. Collecting: Understanding the manner in which observations are translated inside a home.
  2. Clues: Uncovering other distinctive emotional reflections that can be observed.
  3. Connecting: Identifying the consequences of emotional behaviour.
  4. Causation: Understanding what emotions are being evoked.
  5. Correlation: Identifying the initial date of appearance of the behaviour or emotion.
  6. Compensation: Identifying the unmet or unfulfilled desire.
  7. Concept: Defining the “big idea” compensation for the identified consumer need.

Some of Lindstrom's clients, such as Lowes Foods, looked at data in a different way and actually chose to live with the customer. “As you enter their store, they have now created an amazing community where every staff member acts in a character mood, based on Small Data”. [4] The supermarket did everything it could to make the customer feel at home. All employee behaviours are inspired by customer feedback gathered in interviews conducted in customers' homes.

Healthcare

Researchers at Cornell University have started developing applications that use small data to monitor health problems in patients. This is an initiative of Cornell's Small Data Lab, [13] led by Deborah Estrin, in close cooperation with Weill Cornell Medicine.

The Small Data Lab developed a series of apps that focus not only on gathering data about patients' pain but also on tracking habits in areas such as grocery shopping. In the case of rheumatoid arthritis, for example, which has flares and remissions that do not follow a particular cycle, the app gathers information passively, making it possible to forecast when a flare might be coming based on small changes in behaviour. Other apps monitor online grocery shopping, using each user's purchase information to adapt their groceries to nutritionists' recommendations, or monitor email language to identify patterns that might indicate "fluctuations in cognitive performance, fatigue, side effects of medication or poor sleep, and other conditions and treatments that are typically self-reported and self-medicated". [14]

Postal Service

The United States Postal Service (USPS) uses optical character recognition (OCR) to automatically read and process 98% of all hand-addressed mail and 99.5% of machine-printed mail. By combining this technology with its small data sample of US ZIP Codes, the USPS can now process more than 36,000 pieces of mail per hour. [15]
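
How a small reference set can support character recognition may be sketched as follows. This is not the USPS system, whose internals are not described in the cited source. It is a minimal illustration in which the list KNOWN_ZIP_CODES, the function correct_zip and the similarity cutoff are all hypothetical. An uncertain OCR reading is matched against a short list of valid codes using the standard-library function difflib.get_close_matches.

```python
from difflib import get_close_matches

# Hypothetical small reference set of valid ZIP Codes.
KNOWN_ZIP_CODES = ["10001", "15213", "94105"]

def correct_zip(ocr_text, known=KNOWN_ZIP_CODES, cutoff=0.7):
    """Return the known ZIP Code closest to an OCR reading, or None."""
    if ocr_text in known:
        return ocr_text
    # Fall back to the single most similar known code, if any is close enough.
    matches = get_close_matches(ocr_text, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# "S" misread in place of "5": the small reference set resolves the ambiguity.
print(correct_zip("1S213"))
```

Because the set of valid codes is small and fixed, a reading that is off by one character can usually be mapped back to a single plausible candidate, while a reading that resembles no known code is rejected.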

Aerospace

In 2015, Boeing established an analytics lab for aerospace data in cooperation with Carnegie Mellon University to leverage the university's leadership in machine learning, language technologies and data analytics. [16] One of the initiative's projects aims to standardize maintenance logs using AI in order to dramatically reduce costs.

Currently, there is no standardized procedure for documenting maintenance logs, which leads to small but highly unstructured data sets. As a result, it is difficult for maintenance workers to interpret the variations in these logs within a short period of time. However, with AI and a narrow data set of common aircraft maintenance terminology, it becomes possible to translate these logs dynamically in real time. By using AI to improve the speed and accuracy of the airline maintenance workflow, airlines stand to save billions, according to the Harvard Business Review. [17]
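
The role of a narrow terminology set can be illustrated with a deliberately simple sketch. It uses a plain dictionary lookup rather than the AI approach described above, and it does not represent the method used by Boeing or Carnegie Mellon. The glossary entries and the function normalize_entry are hypothetical.

```python
# Hypothetical glossary mapping shorthand seen in logs to standard wording.
GLOSSARY = {
    "r/r": "removed and replaced",
    "insp": "inspected",
    "hyd": "hydraulic",
    "lh": "left-hand",
    "eng": "engine",
}

def normalize_entry(entry, glossary=GLOSSARY):
    """Expand known shorthand in a log entry; leave other words unchanged."""
    words = []
    for token in entry.lower().split():
        core = token.strip(".,;:")  # drop punctuation at the token edges
        words.append(glossary.get(core, core))
    return " ".join(words)

print(normalize_entry("R/R LH ENG HYD pump."))
```

Two mechanics who write "R/R LH ENG HYD pump" and "removed and replaced left-hand engine hydraulic pump" produce the same normalized text, which is the kind of standardization the project aims for, although a real system would need to handle far more variation than a fixed lookup table can.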

References

  1. Rufus Pollock. "Forget big data, small data is the real revolution". The Guardian. Retrieved 2016-10-02.
  2. "What is small data? - Definition from WhatIs.com". Whatis.techtarget.com. 2016-08-18. Retrieved 2016-10-02.
  3. Eric Lundquist (2013-09-10). "'Small Data' Analysis the Next Big Thing, Advocates Assert". Eweek.com. Retrieved 2016-10-02.
  4. "Why Small Data Is the New Big Data". knowledge.wharton.upenn.edu. Retrieved 2017-05-09.
  5. "Defining Small Data". Small Data Group. Retrieved 2016-10-02.
  6. "Forget Big Data - Small Data Is Driving The Internet Of Things". Forbes.com. Retrieved 2016-10-02.
  7. "These Smart, Social Apps Bring Big Data Down to Size". Forbes.com. Retrieved 2016-10-02.
  8. "Why Small Data Is the Next Big Thing for Marketers - DMN". Dmnews.com. 2013-08-22. Retrieved 2016-10-02.
  9. Bonde, Allen (2013-12-12). "Think Small: Time For Marketers To Move Beyond The Big Data Hype". Cmo.com. Retrieved 2016-10-02.
  10. "Small Data - Martin Lindstrom - Bestselling Author". Martin Lindstrom. Retrieved 2016-10-02.
  11. Dooley, Roger (16 February 2016). "Small Data: The Next Big Thing". Forbes. Retrieved 8 May 2017.
  12. Sarkar, Christian (1 May 2016). ""Small Data, Big Impact!" – An Interview with Martin Lindstrom". The Marketing Journal. Retrieved 8 May 2017.
  13. http://smalldata.io/
  14. "Small Data and Big Health Benefits". research.cornell.edu. Retrieved 2017-05-15.
  15. "Innovative Technologies - Postal Facts". about.usps.com. Retrieved 2017-11-07.
  16. Carnegie Mellon University (October 2015). "Boeing Establishes Analytics Lab For Aerospace Data at Carnegie Mellon". Retrieved 2017-11-07.
  17. "Sometimes "Small Data" Is Enough to Create Smart Products". Harvard Business Review. Retrieved 2017-11-07.