Software analytics

Software analytics is analytics specific to the domain of software systems. It takes into account source code, static and dynamic characteristics (e.g., software metrics), and the related processes of development and evolution. It aims at describing, monitoring, predicting, and improving the efficiency and effectiveness of software engineering throughout the software lifecycle, in particular during software development and software maintenance. Data is typically collected by mining software repositories, but can also be gathered from user actions or production data.

Definitions

Aims

Software analytics aims at supporting decisions and generating insights, i.e., findings, conclusions, and evaluations about software systems and their implementation, composition, behavior, quality, and evolution, as well as about the activities of the various stakeholders in these processes.

Approach

Methods, techniques, and tools of software analytics typically rely on gathering, measuring, analyzing, and visualizing information found in the manifold data sources stored in software development environments and ecosystems. Software systems are well suited for applying analytics because, on the one hand, mostly formalized and precise data is available and, on the other hand, software systems are extremely difficult to manage. In a nutshell: "software projects are highly measurable, but often unpredictable." [2]

Core data sources include source code, "check-ins, work items, bug reports and test executions [...] recorded in software repositories such as CVS, Subversion, GIT, and Bugzilla." [4] Telemetry data as well as execution traces or logs can also be taken into account.
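As a rough illustration of how check-in data from such a repository might be turned into measurements, the following Python sketch counts commits per author and changed lines per file. It assumes the input text was produced by a command such as `git log --numstat --pretty=format:"@%an"`; the `@` author marker, the function name `summarize_numstat`, and the sample log are illustrative assumptions, not part of any particular software analytics tool.

```python
from collections import Counter


def summarize_numstat(log_text):
    """Return (commits per author, changed lines per file).

    Expects one "@<author>" line per commit, followed by numstat lines
    of the form "<added>\t<deleted>\t<path>" (an assumed log layout).
    """
    commits_per_author = Counter()
    churn_per_file = Counter()
    for line in log_text.splitlines():
        if line.startswith("@"):
            # A new commit starts; the rest of the line is the author name.
            commits_per_author[line[1:].strip()] += 1
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separator or unrecognized line
        added, deleted, path = parts
        # Binary files are reported as "-" instead of line counts; skip them.
        if added.isdigit() and deleted.isdigit():
            churn_per_file[path] += int(added) + int(deleted)
    return commits_per_author, churn_per_file


# Hypothetical sample log, made up for this example.
SAMPLE_LOG = (
    "@alice\n10\t2\tsrc/parser.py\n3\t0\tREADME.md\n\n"
    "@bob\n4\t4\tsrc/parser.py\n-\t-\tlogo.png\n\n"
    "@alice\n1\t1\tsrc/parser.py\n"
)

authors, churn = summarize_numstat(SAMPLE_LOG)
print(authors.most_common())
print(churn.most_common())
```

Parsing plain log text rather than calling a repository library keeps the sketch independent of any installed version control system; a real pipeline would more likely read directly from the repository or an issue tracker's API.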

Automated analysis, massive data, and systematic reasoning support decision-making at almost all levels. In general, key technologies employed by software analytics include analytical technologies such as machine learning, data mining, statistics, pattern recognition, and information visualization, as well as large-scale data computing and processing. For example, software analytics tools allow users to present derived analysis results as software maps, which support interactive exploration of system artifacts and correlated software metrics. Other software analytics tools apply analytical technologies on top of software quality models in agile software development companies; they support assessing software qualities (e.g., reliability) and deriving actions for their improvement. [5]
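To make the statistical side of this concrete, the sketch below computes the Pearson correlation between two software metrics, code churn and reported defects per module. The module names and numbers are hypothetical and chosen only for illustration; the function name `pearson` is likewise an assumption of this example, and real analyses would use measured data and typically an established statistics library.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical metrics per module: (changed lines, reported defects).
modules = {
    "parser": (420, 9),
    "ui": (130, 2),
    "storage": (310, 6),
    "cli": (60, 1),
}
churn = [c for c, _ in modules.values()]
defects = [d for _, d in modules.values()]

r = pearson(churn, defects)
print(f"correlation between churn and defects: {r:.2f}")
```

A coefficient close to 1 on real data would suggest that heavily changed modules also tend to accumulate defects, which is the kind of finding that could then be visualized or used to prioritize reviews; correlation alone does not establish a cause.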

History

The term software analytics was coined in May 2009, when Dongmei Zhang founded the Software Analytics Group (SA) at Microsoft Research Asia (MSRA). It became well known in the software engineering research community after Zhang and her colleagues, in collaboration with Tao Xie from North Carolina State University, gave a series of tutorials and talks on software analytics at software engineering conferences. These included a tutorial at the IEEE/ACM International Conference on Automated Software Engineering (ASE 2011), [6] a talk at the International Workshop on Machine Learning Technologies in Software Engineering (MALETS 2011), [7] a tutorial and a keynote talk given by Zhang at the IEEE-CS Conference on Software Engineering Education and Training, [8] [9] a tutorial at the International Conference on Software Engineering, Software Engineering in Practice Track, [10] and a keynote talk given by Zhang at the Working Conference on Mining Software Repositories. [11]

In November 2010, Software Development Analytics (Software Analytics with a focus on Software Development) was proposed by Thomas Zimmermann and his colleagues at the Empirical Software Engineering Group (ESE) at Microsoft Research Redmond in their FoSER 2010 paper. [12] A goldfish bowl panel on software development analytics was organized by Zimmermann and Tim Menzies from West Virginia University at the International Conference on Software Engineering, Software Engineering in Practice Track. [13]

See also

References

  1. D. Zhang, S. Han, Y. Dang, J.-G. Lou, H. Zhang: "Software Analytics in Practice". IEEE Software, Sept./Oct. 2013, pp. 30-35.
  2. Raymond P. L. Buse and Thomas Zimmermann. "Information Needs for Software Development Analytics." In Proceedings of the 34th International Conference on Software Engineering (ICSE 2012), Software Engineering in Practice, Zurich, Switzerland, June 2012, pp. 987-996.
  3. T. M. Abdellatif, L. F. Capretz, D. Ho. "Software Analytics to Software Practice: A Systematic Literature Review". 1st Int'l Workshop on Big Data Engineering, 2015, pp. 30-36.
  4. Harald Gall, Tim Menzies, Laurie Williams, and Thomas Zimmermann. "Software Development Analytics". Dagstuhl Reports, Vol. 4, Issue 6, pp. 64-83.
  5. Martínez-Fernández, Silverio; Vollmer, Anna Maria; Jedlitschka, Andreas; Franch, Xavier; Lopez, Lidia; Ram, Prabhat; Rodriguez, Pilar; Aaramaa, Sanja; Bagnato, Alessandra (2019). "Continuously assessing and improving software quality with software analytics tools: a case study". IEEE Access. 7: 68219–68239. doi: 10.1109/ACCESS.2019.2917403. ISSN 2169-3536.
  6. Dongmei Zhang and Tao Xie. "xSA: eXtreme Software Analytics - Marriage of eXtreme Computing and Software Analytics." In Proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011), Tutorial, Lawrence, Kansas, November 2011.
  7. Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie. "Software Analytics as a Learning Case in Practice: Approaches and Experiences". In Proceedings of International Workshop on Machine Learning Technologies in Software Engineering (MALETS 2011), Lawrence, Kansas, November 2011.
  8. Dongmei Zhang. "Software Analytics in Practice and Its Implications for Education and Training." Keynote. In Proceedings of the 24th IEEE-CS Conference on Software Engineering Education and Training (CSEE&T 2012), Tutorial, Nanjing, China, April 2012.
  9. Dongmei Zhang, Yingnong Dang, Shi Han, and Tao Xie. "Teaching and Training for Software Analytics." In Proceedings of the 24th IEEE-CS Conference on Software Engineering Education and Training (CSEE&T 2012), Tutorial, Nanjing, China, April 2012.
  10. Dongmei Zhang and Tao Xie. "Software Analytics in Practice: Mini Tutorial." In Proceedings of the 34th International Conference on Software Engineering (ICSE 2012), Software Engineering in Practice, Mini Tutorial, Zurich, Switzerland, June 2012, pp. 997.
  11. Dongmei Zhang. "MSR 2012 keynote: Software Analytics in Practice - Approaches and Experiences." In Proceedings of the 9th Working Conference on Mining Software Repositories (MSR 2012), Zurich, Switzerland, June 2012, pp. 1.
  12. Raymond P. L. Buse and Thomas Zimmermann. "Analytics for Software Development." In Proceedings of the Workshop on Future of Software Engineering Research (FoSER 2010), Santa Fe, NM, USA, November 2010, pp. 77-80.
  13. Tim Menzies and Thomas Zimmermann. "Goldfish Bowl Panel: Software Development Analytics." In Proceedings of the 34th International Conference on Software Engineering (ICSE 2012), Software Engineering in Practice, Zurich, Switzerland, June 2012, pp. 1032-1033.