Software composition analysis

Software composition analysis (SCA) is a practice in the fields of information technology and software engineering for analyzing custom-built software applications to detect embedded open-source software and to determine whether those components are up to date, contain security flaws, or carry licensing requirements. [1]

Background

It is a common software engineering practice to develop software by using different components. [2] Using software components segments the complexity of larger elements into smaller pieces of code and increases flexibility by enabling easier reuse of components to address new requirements. [3] The practice has widely expanded since the late 1990s with the popularization of open-source software (OSS) to help speed up the software development process and reduce time to market. [4]

However, using open-source software introduces many risks to the software applications being developed. These risks can be organized into five categories: [5]

Shortly after the foundation of the Open Source Initiative in February 1998, [6] the risks associated with OSS were raised, [7] and organizations tried to manage them by using spreadsheets and documents to track all the open-source components used by their developers. [8]

Organizations that used open-source components extensively needed a way to automate the analysis and management of open-source risk. This resulted in a new category of software products called software composition analysis (SCA). SCA strives to detect all the third-party components in use within a software application to help reduce the risks associated with security vulnerabilities, IP licensing requirements, and obsolescence of the components being used.
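The three risk areas named above can be illustrated with a minimal Python sketch. It assumes a component inventory is already available and checks each entry against small in-memory tables; the names `Component`, `KNOWN_VULNERABLE`, `LATEST_VERSION`, `RESTRICTED_LICENSES`, and `assess` are hypothetical, and the data is invented for illustration. A real SCA product would rely on curated vulnerability and license databases instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: tuple  # e.g. (1, 2, 0), so that versions compare element-wise
    license: str


# Hypothetical reference data, invented for this sketch.
KNOWN_VULNERABLE = {("examplelib", (1, 2, 0)): ["EXAMPLE-ADVISORY-1"]}
LATEST_VERSION = {"examplelib": (1, 4, 0), "otherlib": (2, 0, 0)}
RESTRICTED_LICENSES = {"GPL-3.0-only"}


def assess(component):
    """Return a list of findings for one component."""
    findings = []
    # Security: look up advisories for this exact name and version.
    for advisory in KNOWN_VULNERABLE.get((component.name, component.version), []):
        findings.append(f"security: {advisory}")
    # Licensing: flag licenses that the (assumed) policy wants reviewed.
    if component.license in RESTRICTED_LICENSES:
        findings.append(f"license: {component.license} needs review")
    # Obsolescence: compare against the newest known release.
    latest = LATEST_VERSION.get(component.name)
    if latest is not None and component.version < latest:
        findings.append("obsolescence: newer release available")
    return findings


inventory = [
    Component("examplelib", (1, 2, 0), "MIT"),
    Component("otherlib", (2, 0, 0), "GPL-3.0-only"),
]
report = {c.name: assess(c) for c in inventory}
```

In this sketch, `report` maps each component name to its findings, so a component with an empty list raised no concerns under the assumed policy.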

Principle of operation

SCA products typically work as follows: [9]

Usage

Because SCA affects different functions within an organization, different teams may use its data, depending on the organization's size and structure. The IT department often uses SCA to implement and operationalize the technology, with common stakeholders including the chief information officer (CIO), the chief technology officer (CTO), and the chief enterprise architect (EA). [12] Security and license data are often used by roles such as the chief information security officer (CISO) for security risks and the chief IP or compliance officer for intellectual property risk management. [13]

Depending on the product's capabilities, SCA can be implemented directly within the integrated development environment (IDE) of a developer who uses and integrates OSS components, or as a dedicated step in the software quality control process. [14] [15]
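As a rough illustration of the second option, a dedicated quality control step can be reduced to a gate that fails the build when a finding reaches a chosen severity. The sketch below is an assumption about how such a gate might look, not a description of any particular product; `SEVERITY_ORDER`, `gate`, and the finding records are hypothetical.

```python
# Hypothetical severity scale, ordered from least to most severe.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def gate(findings, fail_at="high"):
    """Return a process exit code: 1 if any finding reaches the threshold, else 0."""
    threshold = SEVERITY_ORDER.index(fail_at)
    blocking = [
        f for f in findings
        if SEVERITY_ORDER.index(f["severity"]) >= threshold
    ]
    return 1 if blocking else 0


# Invented findings, as an SCA scan might report them.
findings = [
    {"component": "examplelib", "severity": "medium"},
    {"component": "otherlib", "severity": "critical"},
]
exit_code = gate(findings)
```

A pipeline step could pass `exit_code` to `sys.exit` so that the build stops when a blocking finding is present. The threshold is a policy choice that each organization would set for itself.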

SCA products, and particularly their capacity to generate a software bill of materials (SBOM), are required in some countries, such as the United States, to enforce the security of software delivered by a vendor to one of their agencies. [16]
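The idea of a bill of materials can be sketched in a few lines of Python: given the detected components, emit a machine-readable inventory. The function `build_sbom` and the field names below are illustrative assumptions only. Real tools emit standardized formats such as SPDX or CycloneDX, which this sketch does not attempt to reproduce.

```python
import json


def build_sbom(components):
    """Build a minimal SBOM-like document from (name, version) pairs."""
    return {
        "format": "illustrative-sbom",  # placeholder, not a real standard
        "components": [
            {"type": "library", "name": name, "version": version}
            for name, version in sorted(components)  # stable, reproducible order
        ],
    }


# Invented component list for the example.
sbom = build_sbom([("otherlib", "2.0.0"), ("examplelib", "1.2.0")])
sbom_json = json.dumps(sbom, indent=2)
```

Sorting the components keeps the output stable between runs, which makes two inventories of the same application easy to compare.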

Another common use case for SCA is technology due diligence. Prior to a merger and acquisition (M&A) transaction, advisory firms review the risks associated with the software of the target firm. [17]

SCA Strengths

The automatic nature of SCA products is their primary strength. Developers do not have to do extra manual work when using and integrating OSS components. [18] The automation also applies to indirect references to other OSS components within code and artifacts. [19]
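Indirect references can be pictured as a dependency graph in which a component pulls in further components of its own. The following Python sketch walks such a graph breadth-first to collect everything an application depends on, directly or indirectly. The graph `DIRECT_DEPENDENCIES` and the function `transitive_dependencies` are hypothetical and stand in for what a tool would read from package manifests or build artifacts.

```python
from collections import deque

# Hypothetical graph: each component maps to the components it references directly.
DIRECT_DEPENDENCIES = {
    "app": ["web-framework", "json-parser"],
    "web-framework": ["http-client", "json-parser"],
    "http-client": ["tls-lib"],
    "json-parser": [],
    "tls-lib": [],
}


def transitive_dependencies(root, graph):
    """Breadth-first walk returning every component reachable from root."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already visited; also guards against cycles
        seen.add(name)
        queue.extend(graph.get(name, []))
    return seen


all_deps = transitive_dependencies("app", DIRECT_DEPENDENCIES)
# Components the application never names itself but still ships with.
indirect = all_deps - set(DIRECT_DEPENDENCIES["app"])
```

Here `indirect` contains `http-client` and `tls-lib`, which the application never references itself but inherits through `web-framework`.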

SCA Weaknesses

Conversely, some key weaknesses of current SCA products may include:

References

  1. Prana, Gede Artha Azriadi; Sharma, Abhishek; Shar, Lwin Khin; Foo, Darius; Santosa, Andrew E; Sharma, Asankhaya; Lo, David (July 2021). "Out of sight, out of mind? How vulnerable dependencies affect open-source projects". Empirical Software Engineering. 26 (4). Springer: 1–34. doi:10.1007/s10664-021-09959-3. S2CID 197679660.
  2. Nierstrasz, Oscar; Meijler, Theo Dirk (1995). "Research directions in software composition". ACM Computing Surveys. 27 (2). ACM: 262–264. doi:10.1145/210376.210389. S2CID 17612128.
  3. Nierstrasz, Oscar; Dami, Laurent (January 1995). Object-oriented software composition. Prentice Hall International (UK) Ltd. pp. 3–28. CiteSeerX 10.1.1.90.8174.
  4. De Hoon, Michiel JL; Imoto, Seiya; Nolan, John; Miyano, Satoru (February 2004). "Open source clustering software". Bioinformatics. Oxford University Press: 1453–1454. CiteSeerX 10.1.1.114.3335.
  5. Duc Linh, Nguyen; Duy Hung, Phan; Dipe, Vu Thu (2019). "Risk Management in Projects Based on Open-Source Software". Proceedings of the 2019 8th International Conference on Software and Computer Applications. pp. 178–183. doi:10.1145/3316615.3316648. ISBN 9781450365734. S2CID 153314145.
  6. "History of the OSI". Opensource.org. 19 September 2006.
  7. Payne, Christian (2002). "On the security of open source software" (PDF). Information Systems Journal. 12: 61–78. doi:10.1046/j.1365-2575.2002.00118.x. S2CID 8123076.
  8. Kaur, Sumandeep (April 2020). "Security Issues in Open-Source Software" (PDF). International Journal of Computer Science & Communication: 47–51.
  9. Ombredanne, Philippe (October 2020). "Free and Open Source Software License Compliance: Tools for Software Composition Analysis". Computer. 53 (10). IEEE: 262–264. doi:10.1109/MC.2020.3011082. S2CID 222232127.
  10. Duan, Ruian; Bijlani, Ashish; Xu, Meng; Kim, Taesoo; Lee, Wenke (2017). "Identifying Open-Source License Violation and 1-day Security Risk at Large Scale". Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM. pp. 2169–2185. doi:10.1145/3133956.3134048. ISBN 9781450349468. S2CID 7402387.
  11. Arora, Arushi; Wright, Virginia; Garman, Christina (2022). "Strengthening the Security of Operational Technology: Understanding Contemporary Bill of Materials" (PDF). JCIP the Journal of Critical Infrastructure Policy: 111.
  12. Bailey, T.; Greis, J.; Watters, M.; Welle, J. (19 September 2022). "Software bill of materials: Managing software cybersecurity risks". McKinsey & Company. Retrieved 6 January 2024.
  13. Popp, Karl Michael (30 October 2019). Best Practices for commercial use of open source software. BoD – Books on Demand, 2019. p. 10. ISBN 9783750403093.
  14. Imtiaz, Nasif; Thorn, Seaver; Williams, Laurie (October 2021). "A comparative study of vulnerability reporting by software composition analysis tools". Proceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). ACM. pp. 1–11. arXiv:2108.12078. doi:10.1145/3475716.3475769. ISBN 9781450386654. S2CID 237346987.
  15. Sun, Xiaohan; Cheng, Yunchang; Qu, Xiaojie; Li, Hang (June 2021). "Design and Implementation of Security Test Pipeline based on DevSecOps". 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). Vol. 4. IEEE. pp. 532–535. doi:10.1109/IMCEC51613.2021.9482270. ISBN 978-1-7281-8535-4. S2CID 236193144.
  16. "Software Bill of Materials Elements and Considerations". Federal Register. 6 February 2021. Retrieved 6 January 2024.
  17. Serafini, Daniele; Zacchiroli, Stefano (September 2022). "Efficient Prior Publication Identification for Open Source Code". The 18th International Symposium on Open Collaboration. Vol. 4. ACM. pp. 1–8. arXiv:2207.11057. doi:10.1145/3555051.3555068. ISBN 9781450398459. S2CID 251018650.
  18. Chen, Yang; Santosa, Andrew E; Sharma, Asankhaya; Lo, David (September 2020). "Automated identification of libraries from vulnerability data". Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Practice. pp. 90–99. doi:10.1145/3377813.3381360. ISBN 9781450371230. S2CID 211167417.
  19. Kengo Oka, Dennis (2021). "Software Composition Analysis in the Automotive Industry". Building Secure Cars: Assuring the Automotive Software Development Lifecycle. Wiley: 91–110. doi:10.1002/9781119710783. ISBN 9781119710783. S2CID 233582862.
  20. Rajapakse, Roshan Namal; Zahedi, Mansooreh; Babar, Muhammad Ali (2021). "An Empirical Analysis of Practitioners' Perspectives on Security Tool Integration into DevOps". Proceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). pp. 1–12. arXiv:2107.02096. doi:10.1145/3475716.3475776. ISBN 9781450386654. S2CID 235731939.
  21. Imtiaz, Nasif; Thorn, Seaver; Williams, Laurie (2021). "A comparative study of vulnerability reporting by software composition analysis tools". Proceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). pp. 1–11. arXiv:2108.12078. doi:10.1145/3475716.3475769. ISBN 9781450386654. S2CID 237346987.
  22. "Component Analysis". owasp.org.
  23. Foo, Darius; Chua, Hendy; Yeo, Jason; Ang, Ming Yi; Sharma, Asankhaya (2018). "Efficient static checking of library updates". Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. pp. 791–796. doi:10.1145/3236024.3275535. ISBN 9781450355735. S2CID 53079466.
  24. Millar, Stuart (November 2017). "Vulnerability Detection in Open Source Software: The Cure and the Cause" (PDF). Queen's University Belfast.