Software composition analysis

Software composition analysis (SCA) is a practice in the fields of information technology and software engineering for analyzing custom-built software applications to detect embedded open-source software and determine whether those components are up to date, contain security flaws, or have licensing requirements. [1]

Background

It is a common software engineering practice to develop software by using different components. [2] Using software components breaks the complexity of larger elements down into smaller pieces of code and increases flexibility by making it easier to reuse components to address new requirements. [3] The practice has expanded widely since the late 1990s with the popularization of open-source software (OSS), which helps speed up the software development process and reduce time to market. [4]

However, using open-source software introduces many risks for the software applications being developed. These risks can be organized into five categories: [5]

Shortly after the foundation of the Open Source Initiative in February 1998, [6] the risks associated with OSS were raised, [7] and organizations tried to manage them by using spreadsheets and documents to track all the open-source components used by their developers. [8]

For organizations using open-source components extensively, there was a need to automate the analysis and management of open-source risk. This resulted in a new category of software products called software composition analysis (SCA), which helps organizations manage open-source risk. SCA strives to detect all the third-party components in use within a software application to help reduce the risks associated with security vulnerabilities, IP licensing requirements, and obsolescence of the components being used.
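The three kinds of risk named above (security vulnerabilities, licensing requirements, and obsolescence) can be illustrated with a minimal sketch. It is not how any particular SCA product works: the component names, the advisory table, the latest-version table, and the license policy are all hypothetical data invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: tuple  # version as a tuple of ints, e.g. (1, 4, 0)
    license: str    # license identifier declared by the component


# Hypothetical data for illustration only.
# Versions below the listed one are treated as vulnerable.
FIXED_IN = {"examplelib": (2, 0, 0)}
# Newest known release of each component.
LATEST = {"examplelib": (2, 1, 0), "otherlib": (1, 0, 0)}
# Licenses that this imaginary organization wants reviewed before use.
RESTRICTED_LICENSES = {"GPL-3.0-only"}


def assess(component):
    """Return the list of risk categories that apply to one component."""
    findings = []
    fixed_in = FIXED_IN.get(component.name)
    if fixed_in is not None and component.version < fixed_in:
        findings.append("security")
    if component.license in RESTRICTED_LICENSES:
        findings.append("license")
    latest = LATEST.get(component.name)
    if latest is not None and component.version < latest:
        findings.append("outdated")
    return findings


if __name__ == "__main__":
    inventory = [
        Component("examplelib", (1, 4, 0), "MIT"),
        Component("otherlib", (1, 0, 0), "GPL-3.0-only"),
    ]
    for item in inventory:
        print(item.name, assess(item))
```

A real tool would draw this data from vulnerability databases and package registries rather than from hard-coded tables; the sketch only shows how one detected component can be checked against each category of risk.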

Principle of operation

SCA products typically work as follows: [9]

Usage

Because SCA affects different functions within an organization, different teams may use its data, depending on the organization's size and structure. The IT department often implements and operates the technology, with common stakeholders including the chief information officer (CIO), the chief technology officer (CTO), and the chief enterprise architect (EA). [12] Security and license data are often used by roles such as the chief information security officer (CISO) for security risks and the chief IP or compliance officer for intellectual property risk management. [13]

Depending on the product's capabilities, SCA can be implemented directly within the integrated development environment (IDE) of a developer who uses and integrates OSS components, or as a dedicated step in the software quality control process. [14] [15]

SCA products, and particularly their capacity to generate a software bill of materials (SBOM), are required in some countries, such as the United States, to enforce the security of software that a vendor delivers to one of the country's agencies. [16]
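As a rough illustration of what generating an SBOM involves, the sketch below serializes a list of detected components to JSON. The field names (`application`, `components`, `name`, `version`, `license`) and the sample components are assumptions made for the example; they do not follow any specific SBOM standard or any product's output format.

```python
import json


def build_sbom(application, components):
    """Build a minimal, SBOM-like inventory (hypothetical schema).

    `components` is an iterable of (name, version, license) tuples.
    Entries are sorted so that the output is stable between runs.
    """
    return {
        "application": application,
        "components": [
            {"name": name, "version": version, "license": license_id}
            for name, version, license_id in sorted(components)
        ],
    }


# Hypothetical components detected in a hypothetical application.
sbom = build_sbom(
    "example-app",
    [
        ("otherlib", "1.0.0", "Apache-2.0"),
        ("examplelib", "1.4.0", "MIT"),
    ],
)
sbom_json = json.dumps(sbom, indent=2, sort_keys=True)

if __name__ == "__main__":
    print(sbom_json)
```

In practice, a vendor would emit the inventory in whatever standardized format the receiving agency requires, rather than in an ad hoc schema like this one.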

Another common use case for SCA is technology due diligence. Prior to a merger and acquisition (M&A) transaction, advisory firms review the risks associated with the target firm's software. [17]

Strengths

The automatic nature of SCA products is their primary strength. Developers do not have to do extra manual work when using and integrating OSS components. [18] The automation also covers indirect references to other OSS components within code and artifacts. [19]
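Following indirect references amounts to walking a dependency graph. The sketch below is one simple way to do this, assuming the direct dependencies of each component are already known; the graph and the component names in it are hypothetical.

```python
from collections import deque

# Hypothetical dependency graph: each component maps to its direct dependencies.
DEPENDS_ON = {
    "example-app": ["web-framework", "json-parser"],
    "web-framework": ["http-client", "json-parser"],
    "http-client": ["tls-lib"],
    "json-parser": [],
    "tls-lib": [],
}


def all_dependencies(root, graph):
    """Return every component reachable from `root`, direct or indirect."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already visited; also guards against cycles
        seen.add(name)
        queue.extend(graph.get(name, []))
    seen.discard(root)  # a cycle back to the root is not a dependency of itself
    return seen


def indirect_dependencies(root, graph):
    """Return only the components that `root` does not reference directly."""
    return all_dependencies(root, graph) - set(graph.get(root, []))


if __name__ == "__main__":
    print(sorted(all_dependencies("example-app", DEPENDS_ON)))
    print(sorted(indirect_dependencies("example-app", DEPENDS_ON)))
```

Here `http-client` and `tls-lib` never appear in the application's own list of dependencies, yet they are part of what it ships, which is why tracking them by hand is easy to get wrong.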

Weaknesses

Conversely, some key weaknesses of current SCA products may include:

References

  1. Prana, Gede Artha Azriadi; Sharma, Abhishek; Shar, Lwin Khin; Foo, Darius; Santosa, Andrew E; Sharma, Asankhaya; Lo, David (July 2021). "Out of sight, out of mind? How vulnerable dependencies affect open-source projects". Empirical Software Engineering. 26 (4). Springer: 1–34. doi:10.1007/s10664-021-09959-3. S2CID 197679660.
  2. Nierstrasz, Oscar; Meijler, Theo Dirk (1995). "Research directions in software composition". ACM Computing Surveys. 27 (2). ACM: 262–264. doi:10.1145/210376.210389. S2CID 17612128.
  3. Nierstrasz, Oscar; Dami, Laurent (January 1995). Object-oriented software composition. Prentice Hall International. pp. 3–28. CiteSeerX 10.1.1.90.8174.
  4. De Hoon, Michiel JL; Imoto, Seiya; Nolan, John; Miyano, Satoru (February 2004). "Open source clustering software". Bioinformatics. 20 (9): 1453–1454. Bibcode:2004Bioin..20.1453D. CiteSeerX 10.1.1.114.3335. doi:10.1093/bioinformatics/bth078.
  5. Duc Linh, Nguyen; Duy Hung, Phan; Dipe, Vu Thu (2019). "Risk Management in Projects Based on Open-Source Software". Proceedings of the 2019 8th International Conference on Software and Computer Applications. pp. 178–183. doi:10.1145/3316615.3316648. ISBN 9781450365734. S2CID 153314145.
  6. "History of the OSI". Opensource.org. 19 September 2006.
  7. Payne, Christian (2002). "On the security of open source software" (PDF). Information Systems Journal. 12: 61–78. doi:10.1046/j.1365-2575.2002.00118.x. S2CID 8123076.
  8. Kaur, Sumandeep (April 2020). "Security Issues in Open-Source Software" (PDF). International Journal of Computer Science & Communication: 47–51.
  9. Ombredanne, Philippe (October 2020). "Free and Open Source Software License Compliance: Tools for Software Composition Analysis". Computer. 53 (10): 262–264. doi:10.1109/MC.2020.3011082. S2CID 222232127.
  10. Duan, Ruian; Bijlani, Ashish; Xu, Meng; Kim, Taesoo; Lee, Wenke (2017). "Identifying Open-Source License Violation and 1-day Security Risk at Large Scale". Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM. pp. 2169–2185. doi:10.1145/3133956.3134048. ISBN 9781450349468. S2CID 7402387.
  11. Arora, Arushi; Wright, Virginia; Garman, Christina (2022). "Strengthening the Security of Operational Technology: Understanding Contemporary Bill of Materials" (PDF). Journal of Critical Infrastructure Policy. 3: 111–135. doi:10.18278/jcip.3.1.8.
  12. Bailey, T.; Greis, J.; Watters, M.; Welle, J. (19 September 2022). "Software bill of materials: Managing software cybersecurity risks". McKinsey & Company. Retrieved 6 January 2024.
  13. Popp, Karl Michael (30 October 2019). Best Practices for commercial use of open source software. BoD – Books on Demand, 2019. p. 10. ISBN 9783750403093.
  14. Imtiaz, Nasif; Thorn, Seaver; Williams, Laurie (October 2021). "A comparative study of vulnerability reporting by software composition analysis tools". Proceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). ACM. pp. 1–11. arXiv:2108.12078. doi:10.1145/3475716.3475769. ISBN 9781450386654. S2CID 237346987.
  15. Sun, Xiaohan; Cheng, Yunchang; Qu, Xiaojie; Li, Hang (June 2021). "Design and Implementation of Security Test Pipeline based on DevSecOps". 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). Vol. 4. IEEE. pp. 532–535. doi:10.1109/IMCEC51613.2021.9482270. ISBN 978-1-7281-8535-4. S2CID 236193144.
  16. "Software Bill of Materials Elements and Considerations". Federal Register. 6 February 2021. Retrieved 6 January 2024.
  17. Serafini, Daniele; Zacchiroli, Stefano (September 2022). "Efficient Prior Publication Identification for Open Source Code". The 18th International Symposium on Open Collaboration. Vol. 4. ACM. pp. 1–8. arXiv:2207.11057. doi:10.1145/3555051.3555068. ISBN 9781450398459. S2CID 251018650.
  18. Chen, Yang; Santosa, Andrew E; Sharma, Asankhaya; Lo, David (September 2020). "Automated identification of libraries from vulnerability data". Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Practice. pp. 90–99. doi:10.1145/3377813.3381360. ISBN 9781450371230. S2CID 211167417.
  19. Kengo Oka, Dennis (2021). "Software Composition Analysis in the Automotive Industry". Building Secure Cars. Wiley. pp. 91–110. doi:10.1002/9781119710783.ch6. ISBN 9781119710783. S2CID 233582862.
  20. Rajapakse, Roshan Namal; Zahedi, Mansooreh; Babar, Muhammad Ali (2021). "An Empirical Analysis of Practitioners' Perspectives on Security Tool Integration into DevOps". Proceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). pp. 1–12. arXiv:2107.02096. doi:10.1145/3475716.3475776. ISBN 9781450386654. S2CID 235731939.
  21. Imtiaz, Nasif; Thorn, Seaver; Williams, Laurie (2021). "A comparative study of vulnerability reporting by software composition analysis tools". Proceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). pp. 1–11. arXiv:2108.12078. doi:10.1145/3475716.3475769. ISBN 9781450386654. S2CID 237346987.
  22. "Component Analysis". owasp.org.
  23. Foo, Darius; Chua, Hendy; Yeo, Jason; Ang, Ming Yi; Sharma, Asankhaya (2018). "Efficient static checking of library updates". Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. pp. 791–796. doi:10.1145/3236024.3275535. ISBN 9781450355735. S2CID 53079466.
  24. Millar, Stuart (November 2017). "Vulnerability Detection in Open Source Software: The Cure and the Cause" (PDF). Queen's University Belfast.