Machine-generated data

Machine-generated data is information automatically generated by a computer process, application, or other mechanism without the active intervention of a human. While the term dates back over fifty years, [1] its precise scope remains unsettled. Monash Research's Curt Monash defines it as "data that was produced entirely by machines OR data that is more about observing humans than recording their choices," [2] while Daniel Abadi, a computer science professor at Yale, proposes a narrower definition: "Machine-generated data is data that is generated as a result of a decision of an independent computational agent or a measurement of an event that is not caused by a human action." [3] Despite their differences, both definitions exclude data entered manually by a person. [4] Machine-generated data is produced across all industry sectors, and people are often, and increasingly, unaware that their actions are generating it. [5]

Relevance

Machine-generated data has no single form; rather, its type, format, metadata, and frequency are determined by the business purpose it serves. Machines typically create it on a defined time schedule or in response to a state change, action, transaction, or other event. Because each record describes an event that has already occurred, the data is rarely updated or modified after it is written. Partly because of this quality, U.S. courts have treated machine-generated data as highly reliable. [6]
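
For example, a monitoring process might append a record to an event log whenever a measured value crosses a threshold, and that record is then never rewritten. The sketch below is purely illustrative; the field names, threshold, and file name are hypothetical rather than drawn from any cited source.

    # Illustrative sketch: a sensor process emits an event record on a state
    # change (a threshold crossing) and appends it to a log. The record is
    # written once and never updated, mirroring the historical, write-once
    # character of machine-generated data described above.
    import json
    import time

    def emit_event(log_path, sensor_id, reading, threshold=75.0):
        """Append an event record if the reading crosses the threshold."""
        if reading >= threshold:
            record = {
                "ts": time.time(),       # when the event was observed
                "sensor": sensor_id,     # which machine produced it
                "reading": reading,      # the measured value
                "event": "threshold_exceeded",
            }
            # Append-only: existing records are never rewritten or modified.
            with open(log_path, "a", encoding="utf-8") as log:
                log.write(json.dumps(record) + "\n")

    emit_event("sensor_events.jsonl", "pump-17", 82.4)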

Machine-generated data has been described as the lifeblood of the Internet of Things (IoT). [7]

Growth

In 2009, Gartner published that data would grow by 650% over the following five years. [8] Most of this growth is a byproduct of machine-generated data. [4] IDC estimated that by 2020 there would be 26 times more connected things than people. [9] Wikibon forecast that $514 billion would be spent on the Industrial Internet in 2020. [10]

Processing

Given the fairly static yet voluminous nature of machine-generated data, data owners rely on highly scalable tools to process and analyze the resulting datasets. Almost all machine-generated data arrives unstructured and is then derived into a common structure. [4] These derived structures typically contain many data points (columns), and the main challenge lies in analyzing the data. Given the high performance requirements and large data sizes, traditional database indexing and partitioning limit the size and history of the dataset that can be processed. Columnar databases offer an alternative, since a given analysis accesses only the particular columns it needs, as sketched below.
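
As a hedged illustration (the log format, field names, and error threshold below are hypothetical, not taken from any cited source), the following sketch shows raw machine-generated log lines being parsed into a common structure and stored column-wise, so that an analysis needing only one field reads only that column.

    # Illustrative sketch: derive a columnar structure from unstructured,
    # machine-generated log lines, then run an analysis that touches only
    # the single column it needs.
    import re

    RAW_LINES = [
        "2024-03-01T12:00:01Z host=web-1 status=200 bytes=512",
        "2024-03-01T12:00:02Z host=web-2 status=500 bytes=0",
        "2024-03-01T12:00:03Z host=web-1 status=200 bytes=2048",
    ]

    PATTERN = re.compile(
        r"(?P<ts>\S+) host=(?P<host>\S+) status=(?P<status>\d+) bytes=(?P<bytes>\d+)"
    )

    # One list per field (a column) instead of one record per row.
    columns = {"ts": [], "host": [], "status": [], "bytes": []}
    for line in RAW_LINES:
        match = PATTERN.match(line)
        if match:  # skip lines that do not fit the expected format
            columns["ts"].append(match.group("ts"))
            columns["host"].append(match.group("host"))
            columns["status"].append(int(match.group("status")))
            columns["bytes"].append(int(match.group("bytes")))

    # Computing the server error rate needs only the "status" column.
    error_rate = sum(s >= 500 for s in columns["status"]) / len(columns["status"])
    print(f"error rate: {error_rate:.1%}")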

Examples

Notes

  1. Chorafas, Dimitris N. (1966). Control Systems Functions and Programming Approaches: Applications. Academic Press. ISBN 978-0-08-095534-6.
  2. Monash, December 30, 2010
  3. Abadi
  4. Monash, Three Broad Categories of Data
  5. Deloach, Machine Generated Data
  6. Federal Evidence Review, Machine Generated Data was Not Statement and Raised no Hearsay
  7. Seth Grimes [@SethGrimes] (March 8, 2016). "Machine-generated data is the lifeblood of the Internet of Things (#IoT): a key but missing point" (Tweet) – via Twitter.
  8. ScienceLogic
  9. Chuck's Blog
  10. Wikibon
  11. Monash, Examples of Machine Generated Data

Bibliography
