Artificial Intelligence for IT Operations

Last updated December 16, 2023

Artificial Intelligence for IT Operations (AIOps) is a term coined by Gartner in 2016 as an industry category for machine learning analytics technology that enhances IT operations analytics.^[1] AIOps^[2]^[3] is short for "Artificial Intelligence Operations".^[4]^[5]^[6] The operations tasks it covers include automation, performance monitoring and event correlation, among others.^[7]^[8]

There are two main aspects of an AIOps platform: machine learning and big data. Observational data and engagement data are collected inside a big data platform, which requires a shift away from IT data segregated by section. A holistic machine learning and analytics strategy is then implemented against the combined IT data.^[9]

The goal is to enable IT transformation^[10] and to receive continuous insights that drive continuous fixes and improvements via automation. This is why AIOps can be viewed as CI/CD for core IT functions.^[11]

Given the inherent nature of IT operations, which is closely tied to cloud deployment and the management of distributed applications, AIOps has increasingly led to the coalescence of machine learning and cloud research.^[12]^[13]

Process

Once normalized, the data can be processed by machine learning algorithms that automatically reduce noise and identify the probable root cause of incidents. The main output of this stage is the detection of any abnormal behavior from users, devices or applications.

Noise reduction can be done by various methods, but most of the research in the field points to the following actions:

Analysis of all incoming alerts;
Removal of duplicates;
Identification of false positives;
Early anomaly, fault and failure (AFF) detection and analysis.^[14]
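
The first three actions above can be illustrated with a minimal Python sketch. It is not taken from any particular AIOps product: the `Alert` fields, the 300-second window and the idea of a precomputed false-positive list are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str       # component that raised the alert (hypothetical field)
    check: str        # name of the failing check (hypothetical field)
    timestamp: float  # seconds since an arbitrary epoch


def reduce_noise(alerts, window=300.0, known_false_positives=frozenset()):
    """Analyze all alerts in time order, dropping known false positives and
    repeats of the same (source, check) raised less than `window` seconds
    after the last alert that was kept for that pair."""
    last_kept = {}
    kept = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.source, alert.check)
        if key in known_false_positives:
            continue  # assumed to have been identified as a false positive
        previous = last_kept.get(key)
        if previous is not None and alert.timestamp - previous < window:
            continue  # duplicate of an alert that was already kept
        last_kept[key] = alert.timestamp
        kept.append(alert)
    return kept
```

In practice the false-positive list would itself be learned or maintained from operator feedback; here it is simply passed in.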

Anomaly detection, another step in any AIOps process, is based on the analysis of the past behavior of users, equipment and applications. Anything that strays from that behavioral baseline is considered unusual and flagged as abnormal.
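
As a minimal sketch of baseline-based detection, the Python function below flags a new measurement when it lies too many standard deviations from the mean of past measurements. The z-score test and the threshold of 3 are assumptions chosen for simplicity; real platforms may use quite different models.

```python
from statistics import mean, stdev


def is_anomalous(history, value, threshold=3.0):
    """Return True if `value` deviates from the baseline formed by `history`
    by more than `threshold` sample standard deviations."""
    if len(history) < 2:
        return False  # not enough past behavior to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly constant history: any different value is unusual.
        return value != baseline
    return abs(value - baseline) / spread > threshold
```

For example, with a history of response times close to 100 ms, a reading of 101 ms would pass while a reading of 150 ms would be flagged.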

Root cause determination is usually done by passing incoming alerts through algorithms that take into account correlated events as well as topology dependencies. The algorithms on which the AI relies can be influenced directly, essentially by "training" them.^[15]
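
The role of topology dependencies can be sketched as follows. This Python example covers only the topology part, not event correlation or training, and it assumes a hypothetical `depends_on` mapping from each component to the components it directly relies on, with no circular dependencies. An alerting component is treated as a probable root cause when nothing it depends on, directly or transitively, is also alerting.

```python
def probable_root_causes(depends_on, alerting):
    """Return the alerting components that have no alerting component
    among their direct or transitive dependencies."""

    def has_alerting_dependency(node):
        seen = set()
        stack = list(depends_on.get(node, ()))
        while stack:
            dep = stack.pop()
            if dep in seen:
                continue
            seen.add(dep)
            if dep in alerting and dep != node:
                return True
            stack.extend(depends_on.get(dep, ()))
        return False

    return sorted(n for n in alerting if not has_alerting_dependency(n))
```

If a web tier depends on an API that depends on a database, and all three raise alerts, only the database is reported, since the other two alerts can be explained by a failure further down the dependency chain.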

Use

An important use of AIOps platforms is the analysis of large and unconnected datasets, such as the Johns Hopkins COVID-19 data published through GitHub.^[16] The data in this example is pulled from a large number of un-normalized databases: aggregated data (10 sources), US regional data (113 sources) and non-US data (37 sources). With traditional analysis models, these sources are unusable within the response time an emergency demands.
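
Bringing un-normalized sources into a common shape is the step that makes such combined analysis possible. The Python sketch below shows one simple way to do it, by renaming source-specific fields to a shared schema. The function names and the field mappings are hypothetical and do not reflect the actual layout of the Johns Hopkins data.

```python
def normalize(record, field_map):
    """Rename source-specific keys to the common schema given by `field_map`
    (source key -> common key); keys without a mapping are dropped."""
    return {common: record[src] for src, common in field_map.items() if src in record}


def merge_sources(sources):
    """Combine records from several sources into one list.

    `sources` is an iterable of (records, field_map) pairs, one per source."""
    merged = []
    for records, field_map in sources:
        merged.extend(normalize(record, field_map) for record in records)
    return merged
```

Each source needs only its own small mapping, so adding another source does not require changing the analysis that runs on the merged records.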

Generally, the main areas of use for AIOps platforms and principles are:^[17]

Automation of tasks (DevOps)
Machine learning platforms
Augmented reality
Agent-based simulations
Internet of things (IoT)
AI Optimized Hardware
Natural language generation
Streaming data platforms
Conversational BI and analytics

Deployment and integration testing^[18]
System configuration^[18]
Service quality monitoring and anomaly detection^[18]
Resource scheduling and optimization^[18]
Capacity/workload management and prediction^[18]
Hardware/software failure prediction^[18]
Auto-diagnosis and problem localization^[18]
Incident management^[18]
Auto service healing^[18]
Data center management^[18]
Customer support^[18]
Security^[18]
Privacy^[18]

References

[1] Jerry Bowles (January 28, 2020). "AIOps and service assurance in the age of digital transformation". Diginomica.
[2] "Best practices for taking a hybrid approach to AIOps". 7 June 2021. Retrieved Nov 11, 2022.
[3] "Algorithmic IT Operations Drives Digital Business: Gartner - CXOtoday.com". Cxotoday.com. Archived from the original on January 28, 2018. Retrieved January 28, 2018.
[4] "Market Guide for AIOps Platforms". Gartner. Retrieved January 28, 2018.
[5] "Improve IT systems management productivity, application performance and operational resiliency with AIOps". IBM. Retrieved Nov 11, 2022.
[6] "ITOA to AIOps: The next generation of network analytics". TechTarget. Retrieved January 28, 2018.
[7] "An Introduction to AIOps". The Register. Retrieved January 28, 2018.
[8] "AIOps - The Type of 'AI' with Nothing Artificial About It - Dataconomy". Dataconomy.com. 31 March 2017. Retrieved January 28, 2018.
[9] "AIOps: Managing the Second Law of IT Ops - DevOps.com". devops.com. 22 September 2017. Retrieved 24 January 2018.
[10] "What is AIOps or Artificial Intelligence for IT Operations. Top 10 Common AIOps Use Cases". Archived from the original on 2021-02-12.
[11] Harris, Richard. "Explaining what AIOps is and why it matters to developers". appdevelopermagazine.com. Retrieved 24 January 2018.
[12] Masood, Adnan; Hashmi, Adnan (2019), Masood, Adnan; Hashmi, Adnan (eds.), "AIOps: Predictive Analytics & Machine Learning in Operations", Cognitive Computing Recipes: Artificial Intelligence Solutions Using Microsoft Cognitive Services and TensorFlow, Apress, pp. 359–382, doi:10.1007/978-1-4842-4106-6_7, ISBN 978-1-4842-4106-6, S2CID 108316737
[13] Duc, Thang Le; Leiva, Rafael García; Casari, Paolo; Östberg, Per-Olov (September 2019). "Machine Learning Methods for Reliable Resource Provisioning in Edge-Cloud Computing: A Survey". ACM Comput. Surv. 52 (5): 94:1–94:39. doi:10.1145/3341145. hdl:11572/253114. ISSN 0360-0300.
[14] WISC.edu - International Conference on Service Oriented Computing
[15] Machine Learning
[16] Importing COVID-19 data into Elasticsearch
[17] UPC.edu - Top 10 Artificial Intelligence Trends in 2019
[18] "Call For Papers". cloudintelligenceworkshop.org. Retrieved 2022-12-31.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
