DataOps

DataOps is a set of practices, processes, and technologies that combines an integrated, process-oriented perspective on data with automation and methods from agile software engineering to improve quality, speed, and collaboration, and to promote a culture of continuous improvement in data analytics. [1] While DataOps began as a set of best practices, it has since matured into a new and independent approach to data analytics. [2] DataOps applies to the entire data lifecycle, [3] from data preparation to reporting, and recognizes the interconnected nature of the data analytics team and information technology operations. [4]

DataOps incorporates the Agile methodology to shorten the cycle time of analytics development in alignment with business goals. [3]

DevOps focuses on continuous delivery by leveraging on-demand IT resources and by automating test and deployment of software. This merging of software development and IT operations has improved velocity, quality, predictability and scale of software engineering and deployment. Borrowing methods from DevOps, DataOps seeks to bring these same improvements to data analytics. [4]
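Applied to data, automating tests usually means running data checks before a change or a new batch is promoted. The following is a minimal sketch under stated assumptions: batches are lists of Python dicts, `id` is a hypothetical key column, and the check names and `deploy` callback are placeholders rather than part of any DataOps product.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]  # one record: column name -> value
Check = Callable[[List[Row]], bool]


def not_empty(rows: List[Row]) -> bool:
    """A batch must contain at least one row."""
    return len(rows) > 0


def no_null_ids(rows: List[Row]) -> bool:
    """Every row must carry a non-null 'id' (hypothetical column)."""
    return all(row.get("id") is not None for row in rows)


def ids_unique(rows: List[Row]) -> bool:
    """No two rows may share the same 'id'."""
    ids = [row.get("id") for row in rows]
    return len(ids) == len(set(ids))


CHECKS: Dict[str, Check] = {
    "not_empty": not_empty,
    "no_null_ids": no_null_ids,
    "ids_unique": ids_unique,
}


def run_checks(rows: List[Row]) -> List[str]:
    """Return the names of the checks that failed, in declaration order."""
    return [name for name, check in CHECKS.items() if not check(rows)]


def deploy_if_green(rows: List[Row], deploy: Callable[[List[Row]], None]) -> bool:
    """Publish the batch only when every check passes; otherwise block it."""
    failures = run_checks(rows)
    if failures:
        print("deployment blocked by:", ", ".join(failures))
        return False
    deploy(rows)
    return True


published: List[Row] = []
deploy_if_green([{"id": 1}, {"id": 2}], published.extend)  # all checks pass
deploy_if_green([{"id": 1}, {"id": 1}, {"id": None}], published.extend)  # blocked
```

Keeping the checks in a named registry means a failed run reports which rule blocked it, which is the kind of fast feedback the DevOps practices above aim for.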

DataOps utilizes statistical process control (SPC) to monitor and control the data analytics pipeline. With SPC in place, the data flowing through an operational system is constantly monitored and verified to be working. If an anomaly occurs, the data analytics team can be notified through an automated alert. [5]
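The monitoring described above can be sketched as a simple Shewhart-style control chart. This is an illustration only, assuming a hypothetical metric (rows loaded per pipeline run), limits of three standard deviations around the mean of an in-control baseline, and a `notify` callback standing in for a real alerting channel. Estimating sigma with the sample standard deviation is a simplification; individuals charts often estimate it from moving ranges instead.

```python
import statistics
from typing import Callable, List, Tuple


def control_limits(baseline: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Lower/upper limits at mean +/- k sample standard deviations of the baseline."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # needs at least two baseline points
    return mu - k * sigma, mu + k * sigma


def monitor(values: List[float], limits: Tuple[float, float],
            notify: Callable[[str], None] = print) -> List[int]:
    """Return indices of out-of-control points and send one alert per point."""
    lower, upper = limits
    flagged = []
    for i, value in enumerate(values):
        if value < lower or value > upper:
            flagged.append(i)
            notify(f"point {i}: {value} outside [{lower:.1f}, {upper:.1f}]")
    return flagged


# Hypothetical metric: rows loaded per pipeline run.
baseline = [1000, 1020, 980, 1010, 990]
limits = control_limits(baseline)  # roughly (952.6, 1047.4)
monitor([1005, 1100, 940], limits)  # alerts on points 1 and 2
```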

DataOps is not tied to a particular technology, architecture, tool, language or framework. Tools that support DataOps promote collaboration, orchestration, quality, security, access and ease of use. [6]

History

DataOps was first introduced by Lenny Liebmann, a contributing editor at InformationWeek, in a blog post on the IBM Big Data & Analytics Hub titled "3 reasons why DataOps is essential for big data success" on June 19, 2014. [7] The term was later popularized by Andy Palmer of Tamr and by Steph Locke. [8] [4] DataOps is short for "data operations." [3] 2017 was a significant year for DataOps, with growth in ecosystem development, analyst coverage, keyword searches, surveys, publications, and open-source projects. [9] Gartner included DataOps in its Hype Cycle for Data Management in 2018. [10]

DataOps heritage from DevOps, Agile, and manufacturing

Goals and philosophy

The volume of data is forecast to grow at a compound annual growth rate (CAGR) of 32% to 180 zettabytes by 2025 (source: IDC). [6] DataOps seeks to provide the tools, processes, and organizational structures to cope with this significant increase in data. [6] Automation streamlines the daily demands of managing large integrated databases, freeing the data team to develop new analytics more efficiently and effectively. [11] [4] DataOps seeks to increase the velocity, reliability, and quality of data analytics. [12] It emphasizes communication, collaboration, integration, automation, measurement, and cooperation between data scientists, analysts, data/ETL (extract, transform, load) engineers, information technology (IT), and quality assurance/governance.

Implementation

Toph Whitmore at Blue Hill Research offers these DataOps leadership principles for the information technology department: [2]

Events

Related Research Articles

Electric Cloud, Inc. was a privately held, DevOps software company based in San Jose, CA. Founded in 2002, Electric Cloud was a provider of application release orchestration (ARO) tools, automating release pipelines and managing application life cycles. Electric Cloud's products included ElectricFlow and ElectricAccelerator.

Parasoft: software testing framework

Parasoft is an independent software vendor specializing in automated software testing and application security with headquarters in Monrovia, California. It was founded in 1987 by four graduates of the California Institute of Technology who planned to commercialize the parallel computing software tools they had been working on for the Caltech Cosmic Cube, which was the first working hypercube computer built.

RTTS is a professional services organization that provides software quality outsourcing, training, and resources for business applications. With offices in New York City, Philadelphia, Atlanta, and Phoenix, RTTS serves mid-sized to large corporations throughout North America. RTTS uses the software quality and test solutions from IBM, Hewlett Packard Enterprise, Microsoft and other vendors and open source tools to perform software performance testing, functional test automation, big data testing, data warehouse/ETL testing, mobile application testing, security testing and service virtualization.

Release management: process of software building

Release management is the process of managing, planning, scheduling and controlling a software build through different stages and environments; it includes testing and deploying software releases.

DevOps is a methodology in the software development and IT industry. Used as a set of practices and tools, DevOps integrates and automates the work of software development (Dev) and IT operations (Ops) as a means for improving and shortening the systems development life cycle.

Continuous testing is the process of executing automated tests as part of the software delivery pipeline to obtain immediate feedback on the business risks associated with a software release candidate. Continuous testing was originally proposed as a way of reducing waiting time for feedback to developers by introducing development environment-triggered tests as well as more traditional developer/tester-triggered tests.

Continuous delivery (CD) is a software engineering approach in which teams produce software in short cycles, ensuring that the software can be reliably released at any time, without manual steps, by following a pipeline through a "production-like environment". It aims at building, testing, and releasing software with greater speed and frequency. The approach helps reduce the cost, time, and risk of delivering changes by allowing for more incremental updates to applications in production. A straightforward and repeatable deployment process is important for continuous delivery.
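A delivery pipeline of this kind can be sketched as an ordered list of stages that halts at the first failure, so a broken build never reaches release. The stage names and the always-succeeding lambdas below are hypothetical placeholders; a real pipeline would invoke build tools, test suites, and deployment scripts.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the stage succeeds


def run_pipeline(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order and stop at the first failure.

    Returns the names of completed stages and the name of the failed
    stage (None when every stage passed and the build is releasable).
    """
    completed: List[str] = []
    for stage in stages:
        if not stage.run():
            return completed, stage.name
        completed.append(stage.name)
    return completed, None


pipeline = [
    Stage("build", lambda: True),
    Stage("unit-tests", lambda: True),
    Stage("deploy-staging", lambda: True),  # the "production-like environment"
    Stage("acceptance-tests", lambda: True),
]
done, failed = run_pipeline(pipeline)
print("releasable" if failed is None else f"stopped at {failed}")
```

Because every change goes through the same stages in the same order, the deployment process stays repeatable, which is the property the paragraph above singles out.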

Marketing automation refers to software platforms and technologies designed for marketing departments and organizations to more effectively market on multiple channels online and automate repetitive tasks.

BuildMaster

BuildMaster is an application release automation (ARA) tool developed by the software company Inedo. It combines build management and ARA capabilities to manage and automate processes primarily related to continuous integration, database change scripts, and production deployments, so that applications are released reliably. The tool is browser-based and can be used "out-of-the-box". Its feature set and scope put it in line with the DevOps movement, and it is marketed as "more than a release automat[…]gs together the people, processes, and practices that allow teams to deliver software rapidly, reliably, and responsibly." It is a tool that embodies incremental DevOps adoption.

Dynatrace: American technology company

Dynatrace, Inc. is a global technology company that provides a software observability platform based on artificial intelligence (AI) and automation. Dynatrace technologies are used to monitor, analyze, and optimize application performance, software development and security practices, IT infrastructure, and user experience for businesses and government agencies throughout the world.

XebiaLabs is an independent software company specializing in DevOps and continuous delivery for large enterprise organizations. XebiaLabs offers a DevOps platform for application release orchestration (ARO), whose components include release orchestration, deployment automation, and DevOps intelligence.

Infrastructure as code (IaC) is the process of managing and provisioning computer data center resources through machine-readable definition files, rather than through physical hardware configuration or interactive configuration tools. The IT infrastructure managed this way comprises both physical equipment, such as bare-metal servers, and virtual machines, along with their associated configuration resources. The definitions may be kept in a version control system rather than maintained through manual processes. The code in the definition files may be either scripts or declarative definitions, but IaC more often employs declarative approaches.
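The declarative style can be illustrated by diffing a declared state against the observed state and deriving the actions needed to converge. This is a simplified sketch, not the algorithm of any particular IaC tool: the resource names and attributes are hypothetical, and `apply_plan` only simulates the changes in memory.

```python
from typing import Dict, List, Tuple

Spec = Dict[str, object]  # attributes of one resource
State = Dict[str, Spec]   # resource name -> attributes


def plan(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Diff declared state against observed state into create/update/delete actions."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired.keys() - actual.keys()):
        actions.append(("create", name))
    for name in sorted(desired.keys() & actual.keys()):
        if desired[name] != actual[name]:
            actions.append(("update", name))
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name))
    return actions


def apply_plan(actions: List[Tuple[str, str]], desired: State, actual: State) -> State:
    """Return the state after executing the plan (simulated; no real infrastructure)."""
    result = {name: dict(spec) for name, spec in actual.items()}
    for action, name in actions:
        if action == "delete":
            del result[name]
        else:  # create and update both converge to the declared spec
            result[name] = dict(desired[name])
    return result


desired = {"web-1": {"cpu": 2, "image": "app:1.1"}, "db-1": {"cpu": 4}}
actual = {"web-1": {"cpu": 2, "image": "app:1.0"}, "old-1": {"cpu": 1}}
steps = plan(desired, actual)
# [('create', 'db-1'), ('update', 'web-1'), ('delete', 'old-1')]
new_state = apply_plan(steps, desired, actual)
print(plan(desired, new_state))  # [] : re-running the plan changes nothing
```

The empty second plan shows why declarative definitions are favored: the files state the end result, and the tool works out the steps.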

DevOps toolchain

A DevOps toolchain is a set or combination of tools that aid in the delivery, development, and management of software applications throughout the systems development life cycle, as coordinated by an organisation that uses DevOps practices.

Tricentis: Austrian software testing company

Tricentis is a software testing company founded in 2007 and headquartered in Austin, Texas. It provides software testing automation and software quality assurance products for enterprise software.

Continuous configuration automation (CCA) is the methodology or process of automating the deployment and configuration of settings and software for both physical and virtual data center equipment.
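The core property of such automation is idempotence: applying the same configuration twice changes nothing the second time. The sketch below assumes a hypothetical plain `key=value` settings file and only edits text in memory; real tools in this category also handle packages, services, and remote execution.

```python
from typing import Dict, List, Tuple


def ensure_settings(text: str, desired: Dict[str, str]) -> Tuple[str, bool]:
    """Converge a key=value config file to the desired settings.

    Existing keys are rewritten in place, missing keys are appended, and all
    other lines (comments, unrelated keys) are kept. Returns the new text and
    whether anything changed, so a second run reports no change.
    """
    out: List[str] = []
    seen = set()
    for line in text.splitlines():
        key, sep, _ = line.partition("=")
        key = key.strip()
        if sep and key in desired:
            out.append(f"{key}={desired[key]}")
            seen.add(key)
        else:
            out.append(line)
    for key, value in desired.items():
        if key not in seen:
            out.append(f"{key}={value}")
    new_text = ("\n".join(out) + "\n") if out else ""
    return new_text, new_text != text


config = "port=8080\nmax_conn=10\n"
wanted = {"max_conn": "20", "log_level": "info"}
config, changed = ensure_settings(config, wanted)  # changed is True
config, changed = ensure_settings(config, wanted)  # changed is False: a no-op
```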

Katalon Studio: automation testing software tool

Katalon Platform is an automation testing software tool developed by Katalon, Inc. The software is built on top of the open-source automation frameworks Selenium and Appium, with a specialized IDE for web, API, mobile, and desktop application testing. Its initial release for internal use was in January 2015, and its first public release was in September 2016. In 2018, the software reached 9% market penetration for UI test automation, according to The State of Testing 2018 Report by SmartBear.

MLOps: approach to machine learning lifecycle management

MLOps or ML Ops is a paradigm that aims to deploy and maintain machine learning models in production reliably and efficiently. The word is a compound of "machine learning" and the continuous development practice of DevOps in the software field. Machine learning models are tested and developed in isolated experimental systems. When an algorithm is ready to be launched, data scientists, DevOps teams, and machine learning engineers practice MLOps together to transition the algorithm to production systems. Similar to DevOps or DataOps approaches, MLOps seeks to increase automation and improve the quality of production models, while also focusing on business and regulatory requirements. While MLOps started as a set of best practices, it is slowly evolving into an independent approach to ML lifecycle management. MLOps applies to the entire lifecycle: from integrating with model generation, orchestration, and deployment, to health, diagnostics, governance, and business metrics. According to Gartner, MLOps is a subset of ModelOps: MLOps is focused on the operationalization of ML models, while ModelOps covers the operationalization of all types of AI models.

Artificial Intelligence for IT Operations (AIOps) is a term coined by Gartner in 2016 as an industry category for machine learning analytics technology that enhances IT operations analytics. AIOps is an acronym for "Artificial Intelligence Operations". These operational tasks include automation, performance monitoring, and event correlation, among others.
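Event correlation, one of the tasks named above, can be illustrated with a plain time-window rule that groups alerts arriving close together into one incident. This is a deliberately simple, rule-based sketch and not machine learning; AIOps products typically learn such groupings from data. The `Event` fields, sources, and the 30-second window are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    ts: float      # seconds since some epoch
    source: str    # emitting host or service
    message: str


def correlate(events: List[Event], window: float) -> List[List[Event]]:
    """Group events into incidents: an event joins the current incident when it
    arrives within `window` seconds of that incident's latest event."""
    incidents: List[List[Event]] = []
    for event in sorted(events, key=lambda e: e.ts):
        if incidents and event.ts - incidents[-1][-1].ts <= window:
            incidents[-1].append(event)
        else:
            incidents.append([event])
    return incidents


events = [
    Event(0, "db-1", "query latency high"),
    Event(5, "api", "HTTP 5xx rate up"),
    Event(12, "web", "page load errors"),
    Event(300, "host-7", "disk 90% full"),
]
for incident in correlate(events, window=30):
    print([e.source for e in incident])
# ['db-1', 'api', 'web']
# ['host-7']
```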

ModelOps

ModelOps, as defined by Gartner, "is focused primarily on the governance and lifecycle management of a wide range of operationalized artificial intelligence (AI) and decision models, including machine learning, knowledge graphs, rules, optimization, linguistic and agent-based models". "ModelOps lies at the heart of any enterprise AI strategy". It orchestrates the lifecycles of all models in production across the entire enterprise: putting a model into production, then evaluating and updating the resulting application according to a set of governance rules that include both technical and business KPIs. It gives business domain experts the ability to evaluate AI models in production independently of data scientists.
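Governance rules of this kind can be kept as data, so that a domain expert can adjust thresholds without touching model code. The sketch below is an assumption-laden illustration, not Gartner's definition or any product's API: the metric names, thresholds, and the "flag for review" outcome are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rule:
    metric: str
    minimum: Optional[float] = None  # violation if the KPI falls below this
    maximum: Optional[float] = None  # violation if the KPI rises above this


def violations(kpis: Dict[str, float], rules: List[Rule]) -> List[str]:
    """Check one model's production KPIs against governance rules."""
    found: List[str] = []
    for rule in rules:
        value = kpis.get(rule.metric)
        if value is None:
            found.append(f"{rule.metric}: not reported")
        elif rule.minimum is not None and value < rule.minimum:
            found.append(f"{rule.metric}: {value} < {rule.minimum}")
        elif rule.maximum is not None and value > rule.maximum:
            found.append(f"{rule.metric}: {value} > {rule.maximum}")
    return found


rules = [
    Rule("accuracy", minimum=0.90),         # technical KPI
    Rule("p95_latency_ms", maximum=200.0),  # technical KPI
    Rule("approval_rate", minimum=0.30),    # business KPI
]
kpis = {"accuracy": 0.93, "p95_latency_ms": 250.0, "approval_rate": 0.35}
problems = violations(kpis, rules)
print("keep in production" if not problems else f"flag for review: {problems}")
```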

TestOps refers to the discipline of managing the operational aspects of testing within the software delivery lifecycle.

References

  1. Ereth, Julian (2018). "DataOps-Towards a Definition" (PDF). Proceedings of LWDA 2018: 109.
  2. "DataOps – It's a Secret". www.datasciencecentral.com. Retrieved 2017-04-05.
  3. "What is DataOps (data operations)? - Definition from WhatIs.com". SearchDataManagement. Retrieved 2017-04-05.
  4. "From DevOps to DataOps, By Andy Palmer - Tamr Inc". Tamr Inc. 2015-05-07. Archived from the original on 2018-07-12. Retrieved 2017-03-21.
  5. DataKitchen (2017-03-07). "Lean Manufacturing Secrets that You Can Apply to Data Analytics". Medium. Retrieved 2017-08-24.
  6. "What is DataOps? | Nexla: Scalable Data Operations Platform for the Machine Learning Age". www.nexla.com. Retrieved 2017-09-07.
  7. "3 reasons why DataOps is essential for big data success". IBM Big Data & Analytics Hub. Retrieved 2018-08-10.
  8. "#DataOps - it's a thing (honest)". Mango Solutions. Retrieved 2021-06-28.
  9. DataKitchen (2017-12-19). "2017: The Year of DataOps". data-ops. Retrieved 2018-01-24.
  10. "Gartner Hype Cycle for Data Management Positions Three Technologies in the Innovation Trigger Phase in 2018". Gartner. Retrieved 2019-07-19.
  11. "5 trends driving Big Data in 2017". CIO Dive. Retrieved 2017-09-07.
  12. "Unravel Data Advances Application Performance Management for Big Data". Database Trends and Applications. 2017-03-10. Retrieved 2017-09-07.
  13. "DataOpticon - YouTube". www.youtube.com. Retrieved 2021-06-28.
  14. "DataOps Summit". www.dataopssummit-sf.com. Archived from the original on 2021-07-02. Retrieved 2021-06-28.
  15. "DataOps Champions Online 2021 | Corinium". Corinium Global Intelligence. dco-dataops.coriniumintelligence.com. Retrieved 2021-06-28.