Continuous analytics

Continuous analytics is a data science process that abandons extract, transform, load (ETL) jobs and complex batch data pipelines in favor of cloud-native and microservices paradigms. Continuous data processing enables real-time interactions and immediate insights with fewer resources.

Defined

Analytics is the application of mathematics and statistics to big data. Data scientists write analytics programs to solve business problems, such as forecasting demand or setting an optimal price. The continuous approach runs multiple stateless engines that concurrently enrich, aggregate, infer and act on the data. Data scientists, dashboards and client applications all access the same raw data or real-time data derivatives, with identity-based security, data masking and versioning applied in real time.
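The enrich, aggregate, infer and act steps can be pictured as small stateless functions. The following Python sketch is illustrative only: the `Event` record, the function names and the revenue threshold are assumptions made for this example and do not come from any particular continuous analytics product. Each stage depends only on its inputs, so many copies could run concurrently against the same stream.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical sales event used only for this illustration."""
    sku: str
    units: int
    unit_price: float
    revenue: Optional[float] = None
    flagged: bool = False


def enrich(event: Event) -> Event:
    """Enrich: add a derived field computed from the event alone."""
    return replace(event, revenue=event.units * event.unit_price)


def infer(event: Event, threshold: float = 100.0) -> Event:
    """Infer: stand-in for a model; flags events with revenue above a threshold."""
    return replace(event, flagged=(event.revenue or 0.0) > threshold)


def aggregate(totals: Dict[str, float], event: Event) -> Dict[str, float]:
    """Aggregate: return new per-SKU totals; the stage itself holds no state."""
    updated = dict(totals)
    updated[event.sku] = updated.get(event.sku, 0.0) + (event.revenue or 0.0)
    return updated


def process(events: Iterable[Event]) -> Tuple[Dict[str, float], List[str]]:
    """Run every event through the stages and collect the results."""
    totals: Dict[str, float] = {}
    flagged: List[str] = []
    for raw in events:
        event = infer(enrich(raw))
        totals = aggregate(totals, event)
        if event.flagged:
            flagged.append(event.sku)  # Act: here, simply record the SKU
    return totals, flagged
```

Because the running totals are passed in and returned rather than stored inside a stage, the same functions could be scaled out or restarted without losing correctness, provided the totals live in a shared store. That store is outside the scope of this sketch.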

Traditionally, data scientists have not been part of IT development teams in the way that, for example, Java programmers are. Their skills in mathematics, statistics and data science have set them apart in a department of their own that is not normally related to IT. It follows that their approach to writing code may not enjoy the same efficiencies as that of a traditional programming team. In particular, traditional programming teams have adopted continuous delivery and the agile methodology, which release software in a continuous cycle of iterations.

Continuous analytics, then, is the extension of the continuous delivery software development model to the big data analytics development team. The goal of the continuous analytics practitioner is to incorporate the writing of analytics code and the installation of big data software into the agile development model, in which unit and functional tests run automatically and the environment is built with automated tools.

Making this work means having data scientists commit their code to the same code repository that other programmers use, so that build tools can pull it from there and run it through the build process. It also means saving the configuration of the big data cluster (sets of virtual machines) in a repository as well. This allows analytics code and big data software and objects to be deployed in the same automated way as in the continuous integration process. [1] [2]
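As a rough illustration of what this could look like, the Python sketch below places a small analytics function, a version-controlled cluster description and the automated tests that a build server would run side by side. Everything in it is hypothetical: the moving-average forecast, the configuration keys (`cluster_name`, `worker_nodes`, `memory_gb_per_node`) and the test names are assumptions chosen for the example, not part of any specific tool.

```python
import json


def moving_average_forecast(demand, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(demand) < window:
        raise ValueError("not enough history for the requested window")
    recent = demand[-window:]
    return sum(recent) / window


# Hypothetical cluster description. In practice this text would live in a
# file committed to the same repository as the analytics code.
CLUSTER_CONFIG_JSON = """
{
  "cluster_name": "analytics-dev",
  "worker_nodes": 3,
  "memory_gb_per_node": 16
}
"""

REQUIRED_KEYS = {"cluster_name", "worker_nodes", "memory_gb_per_node"}


def load_cluster_config(text):
    """Parse and validate the cluster description before any deployment step."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if config["worker_nodes"] < 1:
        raise ValueError("worker_nodes must be at least 1")
    return config


# Tests in the naming style that a runner such as pytest discovers automatically.
def test_forecast_uses_last_window():
    assert moving_average_forecast([10, 12, 14, 16], window=3) == 14.0


def test_forecast_rejects_short_history():
    try:
        moving_average_forecast([10], window=3)
    except ValueError:
        return
    raise AssertionError("expected ValueError for short history")


def test_cluster_config_is_valid():
    assert load_cluster_config(CLUSTER_CONFIG_JSON)["worker_nodes"] == 3
```

With a layout like this, a change to either the forecast logic or the cluster description goes through the same commit, test and build cycle as any other code change.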

References

  1. "Continuous Analytics Defined". Southern Pacific Review. Retrieved 17 May 2016.
  2. Pushkarev, Stepan. "Tear down the Wall between Data Science and DevOps". LinkedIn. Retrieved 17 May 2016.