Software composition analysis

Software composition analysis (SCA) is a practice in the fields of information technology and software engineering for analyzing custom-built software applications to detect embedded open-source components and determine whether they are outdated, contain security flaws, or carry licensing requirements. [1]

Background

It is a common software engineering practice to develop software by combining different components. [2] Using software components segments the complexity of larger elements into smaller pieces of code and increases flexibility by enabling easier reuse of components to address new requirements. [3] The practice has expanded widely since the late 1990s with the popularization of open-source software (OSS), which helps speed up the software development process and reduce time to market. [4]

However, using open-source software introduces many risks for the software applications being developed. These risks can be organized into five categories: [5]

Shortly after the foundation of the Open Source Initiative in February 1998, [6] the risks associated with OSS were raised, [7] and organizations tried to manage them using spreadsheets and documents to track all the open-source components used by their developers. [8]

For organizations using open-source components extensively, there was a need to automate the analysis and management of open-source risk. This resulted in a new category of software products, software composition analysis (SCA), which helps organizations manage open-source risk. SCA strives to detect all the third-party components in use within a software application in order to reduce the risks associated with security vulnerabilities, intellectual-property licensing requirements, and obsolescence of the components being used.

Principle of operation

SCA products typically work as follows: [9]
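Although products differ in their detection techniques, the core matching step common to most of them can be illustrated with a minimal sketch: inventory the application's declared dependencies, then look each one up in a vulnerability database. All component names, versions, and advisory identifiers below are invented for illustration.

```python
# Hypothetical SCA matching step: compare a parsed dependency manifest
# (e.g. from a lockfile) against a local vulnerability database.
# All names and data here are illustrative, not real advisories.

# name -> version, as extracted from the application's manifest
manifest = {
    "libalpha": "1.2.0",
    "libbeta": "0.9.1",
}

# toy vulnerability database: (name, affected version) -> advisory ID
vuln_db = {
    ("libalpha", "1.2.0"): "ADV-2021-0001",
}

def scan(manifest, vuln_db):
    """Return a report listing each component and any known advisory."""
    report = []
    for name, version in sorted(manifest.items()):
        advisory = vuln_db.get((name, version))
        report.append({"component": name, "version": version, "advisory": advisory})
    return report

for entry in scan(manifest, vuln_db):
    status = entry["advisory"] or "no known issues"
    print(f"{entry['component']} {entry['version']}: {status}")
```

Real products add many layers on top of this lookup, such as transitive dependency resolution, binary fingerprinting, and license detection, but the inventory-then-match structure is the common core.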

Advanced techniques

Since the early 2010s, researchers have developed several advanced techniques to improve the accuracy and efficiency of SCA tools:

Vulnerable method analysis

Vulnerable method analysis addresses the problem of determining whether a vulnerability in a third-party library poses an actual risk to an application. Rather than simply detecting the presence of vulnerable libraries, this technique analyzes whether the specific vulnerable methods within those libraries are reachable from the application's execution paths. The approach involves constructing call graphs that map the relationships between application code and library methods, then determining if there exists a path from application entry points to vulnerability-specific sinks in the libraries. [17]
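The reachability check at the heart of this technique can be sketched as a graph search over a call graph. The graph below and all method names in it are hypothetical; real tools build the graph by static analysis of the application and its libraries.

```python
# Illustrative vulnerable-method analysis: given a call graph linking
# application and library methods, check whether a known-vulnerable
# library method is reachable from the application's entry point.
# The graph and method names are hypothetical.

from collections import deque

# caller -> list of callees
call_graph = {
    "app.main": ["app.handle_request", "lib.util.format"],
    "app.handle_request": ["lib.parser.parse"],
    "lib.parser.parse": ["lib.parser.unsafe_eval"],  # vulnerable sink
    "lib.util.format": [],
}

def is_reachable(graph, entry, sink):
    """Breadth-first search from the entry point to the vulnerable method."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        if node == sink:
            return True
        for callee in graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False

# The vulnerability poses an actual risk only if its method is reachable.
print(is_reachable(call_graph, "app.main", "lib.parser.unsafe_eval"))  # True
print(is_reachable(call_graph, "app.main", "lib.crypto.weak_hash"))    # False
```

If the vulnerable method is unreachable from every entry point, the finding can be deprioritized, which is how this analysis reduces false positives relative to presence-only detection.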

Machine learning for vulnerability databases

Traditional vulnerability databases rely on manual curation by security researchers, which can be time-intensive and may miss relevant vulnerabilities. Machine learning approaches automate this process by training models to predict whether data items from various sources (such as bug reports, commits, and mailing lists) are vulnerability-related. These systems implement complete pipelines from data collection through model training and prediction, with iterative improvement mechanisms that generate better models as new data becomes available. [18]
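The classification step of such a pipeline can be illustrated with a deliberately tiny text classifier. The sketch below trains a naive Bayes model on invented commit messages labeled as vulnerability-related (1) or not (0); production systems use far richer features, larger corpora, and stronger models.

```python
# Minimal sketch of the idea behind ML-based vulnerability curation:
# train a tiny naive Bayes text classifier on labeled commit messages,
# then predict whether new messages are vulnerability-related.
# The training data is invented for illustration.

import math
from collections import Counter

train = [
    ("fix buffer overflow in parser", 1),
    ("patch XSS vulnerability in login form", 1),
    ("sanitize user input to prevent injection", 1),
    ("update readme with build instructions", 0),
    ("refactor logging module", 0),
    ("bump version number for release", 0),
]

def fit(examples):
    """Collect per-class word counts, document counts, and the vocabulary."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter()
    for text, label in examples:
        docs[label] += 1
        counts[label].update(text.split())
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab

def predict(model, text):
    """Return the most probable class under multinomial naive Bayes."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    best_label, best_score = None, -math.inf
    for label in (0, 1):
        # log prior + log likelihoods with Laplace smoothing
        score = math.log(docs[label] / total_docs)
        denom = sum(counts[label].values()) + len(vocab)
        for word in text.split():
            score += math.log((counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = fit(train)
print(predict(model, "fix overflow vulnerability in input parser"))  # 1
print(predict(model, "update build instructions"))                   # 0
```

The iterative-improvement loop described above corresponds to periodically refitting such a model as newly labeled commits, bug reports, and mailing-list posts arrive.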

Static analysis for library compatibility

As SCA tools increasingly recommend library updates to address vulnerabilities, ensuring compatibility becomes critical. Advanced static analysis techniques can automatically detect API incompatibilities that would be introduced by library upgrades, enabling automated vulnerability remediation without breaking existing functionality. These lightweight analyses are designed to integrate into continuous integration and continuous delivery pipelines. [19]
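One simple form of such a compatibility check is to diff the public API surface of the current and candidate library versions and flag removals or signature changes before recommending the upgrade. The API listings below are hypothetical.

```python
# Hedged sketch of a lightweight pre-upgrade compatibility check:
# compare the public API surface (function name plus parameter list)
# of two library versions. The listings here are invented.

old_api = {
    "connect": ("host", "port"),
    "send": ("data",),
    "close": (),
}
new_api = {
    "connect": ("host", "port", "timeout"),  # signature changed
    "send": ("data",),
    # "close" was removed in the new version
}

def check_compatibility(old, new):
    """Return API changes that could break callers after an upgrade."""
    removed = sorted(set(old) - set(new))
    changed = sorted(name for name in set(old) & set(new) if old[name] != new[name])
    return {"removed": removed, "changed": changed}

issues = check_compatibility(old_api, new_api)
print(issues)  # {'removed': ['close'], 'changed': ['connect']}
```

Because the check only needs the API listings of the two versions, not whole-program analysis, it is cheap enough to run on every build in a CI/CD pipeline.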

Usage

As SCA affects different functions in an organization, different teams may use its data depending on the organization's size and structure. The IT department will often implement and operationalize the technology, with common stakeholders including the chief information officer (CIO), the chief technology officer (CTO), and chief enterprise architects. [20] Security and license data are often used by roles such as the chief information security officer (CISO) for security risks and the chief IP/compliance officer for intellectual-property risk management. [21]

Depending on the SCA product's capabilities, it can be implemented directly within the integrated development environment (IDE) of a developer who uses and integrates OSS components, or it can be implemented as a dedicated step in the software quality-control process. [22] [23]

SCA products, and particularly their capacity to generate a software bill of materials (SBOM), are required in some countries, such as the United States, to enforce the security of software delivered to government agencies by vendors. [24]
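An SBOM is essentially a machine-readable inventory of the components in a delivered piece of software. The sketch below emits a minimal document in a CycloneDX-style JSON layout; the top-level field names follow the public CycloneDX schema, while the component data is invented.

```python
# Illustrative sketch of emitting a minimal SBOM in a CycloneDX-style
# JSON layout from a component inventory. Top-level field names follow
# the public CycloneDX schema; the component data is invented.

import json

components = [
    {"name": "libalpha", "version": "1.2.0"},
    {"name": "libbeta", "version": "0.9.1"},
]

def make_sbom(components):
    """Build a minimal CycloneDX-style SBOM document as a dict."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "components": [
            {"type": "library", "name": c["name"], "version": c["version"]}
            for c in components
        ],
    }

print(json.dumps(make_sbom(components), indent=2))
```

Real SBOMs carry far more detail (suppliers, hashes, package URLs, dependency relationships), and SPDX is a common alternative format, but the component inventory shown here is the core.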

Another common use case for SCA is technology due diligence. Prior to a merger and acquisition (M&A) transaction, advisory firms review the risks associated with the target firm's software. [25]

Strengths

The automatic nature of SCA products is their primary strength: developers do not have to perform extra manual work when using and integrating OSS components. [26] The automation also applies to indirect references to other OSS components within code and artifacts. [27]

Modern SCA implementations have significantly improved accuracy through advanced analysis techniques. Vulnerable method analysis reduces false positives by determining actual reachability of vulnerable code paths, while machine learning approaches for vulnerability curation help maintain more comprehensive and up-to-date vulnerability databases. These advances address many traditional limitations of metadata-only approaches. [28]

Weaknesses

Conversely, key weaknesses of current SCA products may include:

See also

References

  1. Prana, Gede Artha Azriadi; Sharma, Abhishek; Shar, Lwin Khin; Foo, Darius; Santosa, Andrew E; Sharma, Asankhaya; Lo, David (July 2021). "Out of sight, out of mind? How vulnerable dependencies affect open-source projects". Empirical Software Engineering. 26 (4) 59. Springer: 1–34. doi:10.1007/s10664-021-09959-3. S2CID   197679660.
  2. Nierstrasz, Oscar; Meijler, Theo Dirk (1995). "Research directions in software composition". ACM Computing Surveys. 27 (2). ACM: 262–264. doi: 10.1145/210376.210389 . S2CID   17612128.
  3. Nierstrasz, Oscar; Dami, Laurent (January 1995). Object-oriented software composition. Prentice Hall International. pp. 3–28. CiteSeerX   10.1.1.90.8174 .
  4. De Hoon, Michiel JL; Imoto, Seiya; Nolan, John; Miyano, Satoru (February 2004). "Open source clustering software". Bioinformatics. 20 (9): 1453–1454. Bibcode:2004Bioin..20.1453D. CiteSeerX   10.1.1.114.3335 . doi:10.1093/bioinformatics/bth078. PMID   14871861.
  5. Duc Linh, Nguyen; Duy Hung, Phan; Dipe, Vu Thu (2019). "Risk Management in Projects Based on Open-Source Software". Proceedings of the 2019 8th International Conference on Software and Computer Applications. pp. 178–183. doi:10.1145/3316615.3316648. ISBN   9781450365734. S2CID   153314145.
  6. "History of the OSI". Opensource.org. 19 September 2006.
  7. Payne, Christian (2002). "On the security of open source software" (PDF). Information Systems Journal. 12: 61–78. doi:10.1046/j.1365-2575.2002.00118.x. S2CID   8123076.
  8. Kaur, Sumandeep (April 2020). "Security Issues in Open-Source Software" (PDF). International Journal of Computer Science & Communication: 47–51.
  9. Ombredanne, Philippe (October 2020). "Free and Open Source Software License Compliance: Tools for Software Composition Analysis". Computer. 53 (10): 262–264. Bibcode:2020Compr..53j.105O. doi: 10.1109/MC.2020.3011082 . S2CID   222232127.
  10. Chen, Yang; Santosa, Andrew E; Yi, Ang Ming; Sharma, Abhishek; Sharma, Asankhaya; Lo, David (2020). A Machine Learning Approach for Vulnerability Curation. Proceedings of the 17th International Conference on Mining Software Repositories. pp. 32–42. doi:10.1145/3379597.3387461.
  11. Duan, Ruian; Bijlani, Ashish; Xu, Meng; Kim, Taesoo; Lee, Wenke (2017). "Identifying Open-Source License Violation and 1-day Security Risk at Large Scale". Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM. pp. 2169–2185. doi:10.1145/3133956.3134048. ISBN   9781450349468. S2CID   7402387.
  12. Foo, Darius; Yeo, Jason; Xiao, Hao; Sharma, Asankhaya (2019). "The Dynamics of Software Composition Analysis". arXiv: 1909.00973 [cs.SE].
  13. Foo, Darius; Yeo, Jason; Xiao, Hao; Sharma, Asankhaya (2019). "The Dynamics of Software Composition Analysis". arXiv: 1909.00973 [cs.SE].
  14. Chen, Yang; Santosa, Andrew E; Yi, Ang Ming; Sharma, Abhishek; Sharma, Asankhaya; Lo, David (2020). A Machine Learning Approach for Vulnerability Curation. Proceedings of the 17th International Conference on Mining Software Repositories. pp. 32–42. doi:10.1145/3379597.3387461.
  15. Zhou, Yaqin; Sharma, Asankhaya (2017). Automated identification of security issues from commit messages and bug reports. Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. pp. 914–919. doi:10.1145/3106237.3106293.
  16. Arora, Arushi; Wright, Virginia; Garman, Christina (2022). "Strengthening the Security of Operational Technology: Understanding Contemporary Bill of Materials" (PDF). Journal of Critical Infrastructure Policy. 3: 111–135. doi:10.18278/jcip.3.1.8.
  17. Foo, Darius; Yeo, Jason; Xiao, Hao; Sharma, Asankhaya (2019). "The Dynamics of Software Composition Analysis". arXiv: 1909.00973 [cs.SE].
  18. Chen, Yang; Santosa, Andrew E; Yi, Ang Ming; Sharma, Abhishek; Sharma, Asankhaya; Lo, David (2020). A Machine Learning Approach for Vulnerability Curation. Proceedings of the 17th International Conference on Mining Software Repositories. pp. 32–42. doi:10.1145/3379597.3387461.
  19. Foo, Darius; Chua, Hendy; Yeo, Jason; Ang, Ming Yi; Sharma, Asankhaya (2018). Efficient static checking of library updates. Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. pp. 791–796. doi:10.1145/3236024.3275535.
  20. Bailey, T.; Greis, J.; Watters, M.; Welle, J. (19 September 2022). "Software bill of materials: Managing software cybersecurity risks". McKinsey & Company . Retrieved 6 January 2024.
  21. Popp, Karl Michael (30 October 2019). Best Practices for commercial use of open source software. BoD – Books on Demand, 2019. p. 10. ISBN   9783750403093.
  22. Imtiaz, Nasif; Thorn, Seaver; Williams, Laurie (October 2021). "A comparative study of vulnerability reporting by software composition analysis tools". Proceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). ACM. pp. 1–11. arXiv: 2108.12078 . doi:10.1145/3475716.3475769. ISBN   9781450386654. S2CID   237346987.
  23. Sun, Xiaohan; Cheng, Yunchang; Qu, Xiaojie; Li, Hang (June 2021). "Design and Implementation of Security Test Pipeline based on DevSecOps". 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). Vol. 4. IEEE. pp. 532–535. doi:10.1109/IMCEC51613.2021.9482270. ISBN   978-1-7281-8535-4. S2CID   236193144.
  24. "Software Bill of Materials Elements and Considerations". Federal Register . 6 February 2021. Retrieved 6 January 2024.
  25. Serafini, Daniele; Zacchiroli, Stefano (September 2022). "Efficient Prior Publication Identification for Open Source Code". The 18th International Symposium on Open Collaboration. Vol. 4. ACM. pp. 1–8. arXiv: 2207.11057 . doi:10.1145/3555051.3555068. ISBN   9781450398459. S2CID   251018650.
  26. Chen, Yang; Santosa, Andrew E; Sharma, Asankhaya; Lo, David (September 2020). "Automated identification of libraries from vulnerability data". Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Practice. pp. 90–99. doi:10.1145/3377813.3381360. ISBN   9781450371230. S2CID   211167417.
  27. Kengo Oka, Dennis (2021). "Software Composition Analysis in the Automotive Industry". Building Secure Cars. Wiley. pp. 91–110. doi:10.1002/9781119710783.ch6. ISBN   9781119710783. S2CID   233582862.
  28. Foo, Darius; Yeo, Jason; Xiao, Hao; Sharma, Asankhaya (2019). "The Dynamics of Software Composition Analysis". arXiv: 1909.00973 [cs.SE].
  29. Rajapakse, Roshan Namal; Zahedi, Mansooreh; Babar, Muhammad Ali (2021). "An Empirical Analysis of Practitioners' Perspectives on Security Tool Integration into DevOps". Proceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). pp. 1–12. arXiv: 2107.02096 . doi:10.1145/3475716.3475776. ISBN   9781450386654. S2CID   235731939.
  30. Imtiaz, Nasif; Thorn, Seaver; Williams, Laurie (2021). "A comparative study of vulnerability reporting by software composition analysis tools". Proceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). pp. 1–11. arXiv: 2108.12078 . doi:10.1145/3475716.3475769. ISBN   9781450386654. S2CID   237346987.
  31. "Component Analysis". owasp.org.
  32. Foo, Darius; Chua, Hendy; Yeo, Jason; Ang, Ming Yi; Sharma, Asankhaya (2018). "Efficient static checking of library updates". Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. pp. 791–796. doi:10.1145/3236024.3275535. ISBN   9781450355735. S2CID   53079466.
  33. Millar, Stuart (November 2017). "Vulnerability Detection in Open Source Software: The Cure and the Cause" (PDF). Queen's University Belfast.