Software forensics


Software forensics is the science of analyzing software source code or binary code to determine whether intellectual property infringement or theft occurred. It is the centerpiece of lawsuits, trials, and settlements when companies are in dispute over issues involving software patents, copyrights, and trade secrets. Software forensics tools can compare code to determine correlation, a measure that can be used to guide a software forensics expert.


Past methods of software forensics

Past methods of code comparison included hashing, statistical analysis, text matching, and tokenization. These methods compared software code and produced a single measure indicating whether copying had occurred. However, these measures were not reliable enough to be admissible in court. The results were inaccurate, and the algorithms could easily be fooled by simple substitutions in the code, such as renaming variables. The methods also did not account for the fact that code can be similar for reasons other than copying.
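To show how a tokenization-based method reduces two programs to a single number, and how easily a simple substitution defeats it, here is a minimal sketch. It assumes Python source, lexed with the standard `tokenize` module. The score is the Jaccard similarity of token trigrams, chosen for illustration; it is not any particular forensic tool's measure. The helper names `tokens` and `ngram_similarity` are hypothetical.

```python
import io
import tokenize

# Token types that carry layout or commentary rather than program content.
SKIPPED = {tokenize.NL, tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT,
           tokenize.COMMENT, tokenize.ENDMARKER}

def tokens(source: str) -> list[str]:
    """Lex Python source into token strings, dropping layout and comments."""
    return [tok.string
            for tok in tokenize.generate_tokens(io.StringIO(source).readline)
            if tok.type not in SKIPPED]

def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two programs' token n-gram sets, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    ga = {tuple(ta[i:i + n]) for i in range(len(ta) - n + 1)}
    gb = {tuple(tb[i:i + n]) for i in range(len(tb) - n + 1)}
    if not ga and not gb:
        return 1.0  # two empty programs are trivially identical
    return len(ga & gb) / len(ga | gb)

original = (
    "def total(items):\n"
    "    s = 0\n"
    "    for x in items:\n"
    "        s += x\n"
    "    return s\n"
)
# Same logic, with every identifier renamed.
renamed = (
    "def add_all(values):\n"
    "    acc = 0\n"
    "    for v in values:\n"
    "        acc += v\n"
    "    return acc\n"
)

print(ngram_similarity(original, original))           # 1.0
print(round(ngram_similarity(original, renamed), 3))  # 0.03
```

Renaming the identifiers leaves only one of 33 distinct trigrams shared, so the single score falls from 1.0 to about 0.03 even though the logic is unchanged.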

After software tools have been used to compare code and measure the amount of correlation, an expert can use an iterative filtering process to determine whether the correlation is due to third-party code, code generation tools, commonly used names, common algorithms, common programmers, or copying. If the correlation is due to copying, and the copier did not have authorization from the rights holder, then copyright infringement occurred.
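The filtering step can be pictured as a sequence of passes. Each pass explains away some correlated elements, and whatever survives every pass is left as possible copying. The sketch below is hypothetical throughout. The `Match` type, the filter order and the reference sets are invented for illustration. In a real investigation, each "known" set would come from evidence such as library sources or generator output, not a hard-coded list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Match:
    """One correlated element found in both programs (hypothetical representation)."""
    text: str

# Hypothetical reference data standing in for the results of an investigation.
THIRD_PARTY = {"png_read_info"}
GENERATED = {"InitializeComponent"}
COMMON_NAMES = {"i", "count", "main", "buffer"}
COMMON_ALGORITHMS = {"quicksort"}
SHARED_AUTHOR = {"jd_log_helper"}  # idioms of a programmer who wrote both programs

# Filters are applied in order; each one explains away some matches.
FILTERS = [
    ("third-party code", THIRD_PARTY),
    ("code generation tool", GENERATED),
    ("commonly used name", COMMON_NAMES),
    ("common algorithm", COMMON_ALGORITHMS),
    ("common programmer", SHARED_AUTHOR),
]

def filter_correlation(matches):
    """Return (explained, remaining): a reason for each filtered match, plus the rest."""
    explained: dict[Match, str] = {}
    remaining = list(matches)
    for reason, known in FILTERS:
        still_unexplained = []
        for m in remaining:
            if m.text in known:
                explained[m] = reason
            else:
                still_unexplained.append(m)
        remaining = still_unexplained
    return explained, remaining

matches = [Match("png_read_info"), Match("i"), Match("computeRoyaltyTier")]
explained, remaining = filter_correlation(matches)
print(remaining)  # [Match(text='computeRoyaltyTier')]
```

A leftover match is only consistent with copying. Whether it amounts to infringement still depends on whether the copier had authorization.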

Trade secret protection and infringement

Software can contain trade secrets, which provide a competitive advantage to a business. The same tools and processes used to detect copyright infringement can be used to detect trade secret theft. Code may have been copied without authorization, and that code may have the characteristics of a trade secret: it is not generally known, the business keeps it secret, and its secrecy maintains its value to the business. If both conditions hold, the copying constitutes trade secret theft.

Trade secret theft can also involve taking a program's functionality without literally copying its code. Comparing the functionality of two programs is a very difficult problem that no algorithm has yet solved in reasonable time. For this reason, detecting the theft of code functionality is still mostly a manual process.
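A toy example shows why functionality resists text-based comparison. The two hypothetical functions below share almost no tokens, yet they return the same result for every non-negative `n`. A behavioral check can only sample inputs. This one agrees on the first 1,000 values but says nothing about inputs it never tried: for `n = -2` the two functions differ. The function names and the sampled range are assumptions made for the illustration.

```python
def sum_loop(n: int) -> int:
    """Add 1 + 2 + ... + n with an explicit loop."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total

def sum_formula(n: int) -> int:
    """Compute the same sum with the closed-form formula n(n + 1) / 2."""
    return n * (n + 1) // 2

def agree_on(f, g, inputs) -> bool:
    """Behavioral comparison: do f and g return the same value on every sampled input?"""
    return all(f(x) == g(x) for x in inputs)

print(agree_on(sum_loop, sum_formula, range(0, 1000)))  # True
print(sum_loop(-2), sum_formula(-2))                    # 0 1
```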

Patent infringement

As with trade secret functionality, it is not currently possible to detect software patent infringement scientifically, because software patents cover an invention's general implementation rather than specific source code. For example, a program that implements a patented invention can be written in any of many programming languages. It can use different function and variable names and perform operations in different sequences. There are so many ways to implement an invention in software that even the most powerful modern computers cannot consider every combination of code that might infringe a patent. This work is still left to human experts drawing on their knowledge and experience. Many in software forensics are trying to automate it by finding an algorithm or simplifying the process.
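A rough count makes the combinatorial argument concrete. The numbers below are entirely hypothetical: one routine, 10 candidate languages, 6 statements that may run in any order, and 4 variables that could each take any of 50 plausible names. Even this crude model ignores different algorithms, control structures and data structures, and it still yields tens of billions of textual variants.

```python
from math import factorial

def variant_count(languages: int, independent_statements: int,
                  variables: int, names_per_variable: int) -> int:
    """Crude count of textual variants of one routine.

    Assumes the statements can run in any order (k! orderings) and that each
    variable can independently take any of the candidate names.
    """
    orderings = factorial(independent_statements)
    namings = names_per_variable ** variables
    return languages * orderings * namings

# Hypothetical routine: 10 languages, 6 reorderable statements,
# 4 variables, 50 plausible names for each.
print(f"{variant_count(10, 6, 4, 50):,}")  # 45,000,000,000
```

The count multiplies 10 languages × 720 orderings × 6,250,000 namings, and this is only one small routine.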

Objective facts before subjective evidence

One important rule of any forensic analysis is that the objective facts must be considered first. Other sources become useful only after the objective facts regarding correlation have been considered. These include comments in the code and information found on the Internet about the companies that distribute the code and the programmers who wrote it. Once an analysis has been performed using forensic tools and procedures, analysts can begin looking at subjective evidence such as comments in the code. If that subjective evidence conflicts with the objective analysis, analysts should be skeptical of the subjective evidence. Real-world cases of code theft commonly show signs of disguise: fake copyright notices, open source notices, or programmer names added to the source code after the copying took place.
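One way to keep the objective comparison first is to separate program tokens from comments before any scoring. Notices inserted after copying then cannot move the correlation result, and the comments are set aside for later subjective review. This is a minimal sketch that assumes Python source and the standard `tokenize` module. The copyright notice, company and author name in the example are invented.

```python
import io
import tokenize

LAYOUT = {tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
          tokenize.DEDENT, tokenize.ENDMARKER}

def split_code_and_comments(source: str) -> tuple[list[str], list[str]]:
    """Separate program tokens (objective evidence) from comments (subjective evidence)."""
    code, comments = [], []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments.append(tok.string)
        elif tok.type not in LAYOUT:
            code.append(tok.string)
    return code, comments

original = "def scale(x):\n    return x * 2\n"
# The same code after a (hypothetical) copier added a false notice.
disguised = (
    "# Copyright 2019 Example Corp. All rights reserved.\n"
    "# Author: J. Doe\n"
    "def scale(x):\n"
    "    return x * 2\n"
)

code_a, _ = split_code_and_comments(original)
code_b, comments_b = split_code_and_comments(disguised)
print(code_a == code_b)  # True: the objective comparison is unaffected
print(comments_b)        # reviewed only afterwards, as subjective evidence
```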

Related Research Articles

Software: Non-tangible executable component of a computer

Software is a collection of programs and data that tell a computer how to perform specific tasks. Software often includes associated software documentation. This is in contrast to hardware, from which the system is built and which actually performs the work.

Computer programming or coding is the composition of sequences of instructions, called programs, that computers can follow to perform tasks. It involves designing and implementing algorithms, step-by-step specifications of procedures, by writing code in one or more programming languages. Programmers typically use high-level programming languages that are more easily intelligible to humans than machine code, which is directly executed by the central processing unit. Proficient programming usually requires expertise in several different subjects, including knowledge of the application domain, details of programming languages and generic code libraries, specialized algorithms, and formal logic.

In computing, source code, or simply code, is text that conforms to a human-readable programming language and specifies the behavior of a computer. A programmer writes code to produce a program that runs on a computer.

Clean-room design is the method of copying a design by reverse engineering and then recreating it without infringing any of the copyrights associated with the original design. Clean-room design is useful as a defense against copyright infringement because it relies on independent creation. However, because independent invention is not a defense against patents, clean-room designs typically cannot be used to circumvent patent restrictions.

Computer science is the study of the theoretical foundations of information and computation and their implementation and application in computer systems. One well known subject classification system for computer science is the ACM Computing Classification System devised by the Association for Computing Machinery.

A key generator (key-gen) is a computer program that generates a product licensing key, such as a serial number, needed to activate a software application for use. Software manufacturers may distribute keygens legitimately for licensing software in commercial environments where it has been licensed in bulk for an entire site or enterprise. Keygens may also be developed and distributed illegitimately in cases of copyright infringement or software piracy.

Copy protection, also known as content protection, copy prevention and copy restriction, describes measures to enforce copyright by preventing the reproduction of software, films, music, and other media.

The software patent debate is the argument about the extent to which, as a matter of public policy, it should be possible to patent software and computer-implemented inventions. Policy debate on software patents has been active for years. Over the years, opponents of software patents have gained more visibility, with fewer resources, than their pro-patent counterparts. Arguments and critiques have focused mostly on the economic consequences of software patents.

Computer Associates International, Inc. v. Altai, Inc.: American legal case

Computer Associates International, Inc. v. Altai, Inc., 982 F.2d 693 is a decision from the United States Court of Appeals for the Second Circuit that addressed to what extent non-literal elements of software are protected by copyright law. The court used and recommended a three-step process called the Abstraction-Filtration-Comparison test. The case was an appeal from the United States District Court for the Eastern District of New York in which the district court found that defendant Altai's OSCAR 3.4 computer program had infringed plaintiff Computer Associates' copyrighted computer program entitled CA-SCHEDULER. The district court also found that Altai's OSCAR 3.5 program was not substantially similar to a portion of CA-SCHEDULER 7.0 called SYSTEM ADAPTER, and thus denied relief as to OSCAR 3.5. Finally, the district court concluded that Computer Associates' state law trade secret misappropriation claim against Altai was preempted by the federal Copyright Act. The appeal was heard by Judges Frank Altimari, John Daniel Mahoney, and John M. Walker, Jr. The majority opinion was written by Judge Walker. Judge Altimari concurred in part and dissented in part. The Second Circuit affirmed the district court's ruling as to copyright infringement, but vacated and remanded its holding on trade secret preemption.

Perceptual Evaluation of Audio Quality (PEAQ) is a standardized algorithm for objectively measuring perceived audio quality, developed in 1994–1998 by a joint venture of experts within Task Group 6Q of the International Telecommunication Union's Radiocommunication Sector (ITU-R). It was originally released as ITU-R Recommendation BS.1387 in 1998 and last updated in 2023. It utilizes software to simulate perceptual properties of the human ear and then integrates multiple model output variables into a single metric.

Reverse engineering is a process or method through which one attempts to understand through deductive reasoning how a previously made device, process, system, or piece of software accomplishes a task with very little insight into exactly how it does so. Depending on the system under consideration and the technologies employed, the knowledge gained during reverse engineering can help with repurposing obsolete objects, doing security analysis, or learning how something works.

Copyright infringement: Illegal usage of copyrighted works

Copyright infringement is the use of works protected by copyright without permission for a usage where such permission is required, thereby infringing certain exclusive rights granted to the copyright holder, such as the right to reproduce, distribute, display or perform the protected work, or to make derivative works. The copyright holder is typically the work's creator, or a publisher or other business to whom copyright has been assigned. Copyright holders routinely invoke legal and technological measures to prevent and penalize copyright infringement.

Analytic dissection is a concept in U.S. copyright law analysis of computer software. Analytic dissection is a tool for determining whether a work accused of copyright infringement is substantially similar to a copyright-protected work.

Substantial similarity: Standard in US copyright law

Substantial similarity, in US copyright law, is the standard used to determine whether a defendant has infringed the reproduction right of a copyright. The standard arises out of the recognition that the exclusive right to make copies of a work would be meaningless if copyright infringement were limited to making only exact and complete reproductions of a work. Many courts also use "substantial similarity" in place of "probative" or "striking similarity" to describe the level of similarity necessary to prove that copying has occurred. A number of tests have been devised by courts to determine substantial similarity. They may rely on expert or lay observation or both and may subjectively judge the feel of a work or critically analyze its elements.

Brand protection

Brand protection is the process and set of actions that a right holder undertakes to prevent third parties from using its intellectual property without permission. Such use may cause loss of revenue and, usually more importantly, destroy brand equity, reputation and trust. Brand protection seeks primarily to ensure that trademarks, patents, and copyrights are respected, though other intellectual property rights such as industrial design rights or trade dress can be involved. Counterfeiting is the umbrella term for infringements of intellectual property. The exception is piracy, a term sometimes used colloquially to refer to copyright infringement.

Structure, sequence and organization

Structure, sequence and organization (SSO) is a term used in the United States to define a basis for comparing one software work to another in order to determine if copying has occurred that infringes on copyright, even when the second work is not a literal copy of the first. The term was introduced in the case of Whelan v. Jaslow in 1986. The method of comparing the SSO of two software products has since evolved in attempts to avoid the extremes of over-protection and under-protection, both of which are considered to discourage innovation. More recently, the concept has been used in Oracle America, Inc. v. Google, Inc.

Google LLC v. Oracle America, Inc., 593 U.S. ___ (2021), was a U.S. Supreme Court decision related to the nature of computer code and copyright law. The dispute centered on the use of parts of the Java programming language's application programming interfaces (APIs) and about 11,000 lines of source code, which are owned by Oracle, within early versions of the Android operating system by Google. Google has since transitioned Android to a copyright-unburdened engine without the source code, and has admitted to using the APIs but claimed this was within fair use.

Brown Bag Software v. Symantec Corp.: United States intellectual property law case

Brown Bag Software v. Symantec Corp. is an intellectual property law case in which the United States Court of Appeals for the Ninth Circuit affirmed-in-part and vacated-in-part the previous ruling of the United States District Court for the Northern District of California. Brown Bag Software sued Symantec Corporation and John L. Friend, an individual software developer for Softworks Development, for copyright infringement and several state law claims regarding the similarity of Symantec Corporation's and Brown Bag Software's computer outlining programs.

Code stylometry is the application of stylometry to computer code to attribute authorship to anonymous binary or source code. It often involves breaking down and examining the distinctive patterns and characteristics of the programming code and then comparing them to computer code whose authorship is known. Unlike software forensics, code stylometry attributes authorship for purposes other than intellectual property infringement, including plagiarism detection, copyright investigation, and authorship verification.
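The attribution step described above, extracting distinctive characteristics and comparing them with code of known authorship, can be sketched as nearest-neighbor matching over a few style features. Everything here is an illustrative assumption. The three features (mean identifier length, share of underscore-style names, comments per non-blank line) are hand-picked and unnormalized, and the authors and samples are invented. Practical stylometry systems rely on far richer feature sets.

```python
import io
import keyword
import math
import tokenize

def style_features(source: str) -> list[float]:
    """Extract a small, hypothetical style vector from Python source."""
    names, comments = [], 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            names.append(tok.string)
        elif tok.type == tokenize.COMMENT:
            comments += 1
    n_lines = max(sum(1 for line in source.splitlines() if line.strip()), 1)
    n_names = max(len(names), 1)
    return [
        sum(len(n) for n in names) / n_names,   # mean identifier length
        sum("_" in n for n in names) / n_names,  # share of underscore-style names
        comments / n_lines,                      # comments per non-blank line
    ]

def attribute(unknown: str, known: dict[str, list[str]]) -> str:
    """Return the author whose closest sample is nearest in feature space."""
    target = style_features(unknown)
    best_author, best_dist = None, math.inf
    for author, samples in known.items():
        for sample in samples:
            d = math.dist(target, style_features(sample))
            if d < best_dist:
                best_author, best_dist = author, d
    return best_author

alice_sample = (
    "# sum the order totals\n"
    "def order_total(line_items):\n"
    "    running_total = 0\n"
    "    for line_item in line_items:\n"
    "        running_total += line_item\n"
    "    return running_total\n"
)
bob_sample = "def f(a):\n    t = 0\n    for x in a:\n        t += x\n    return t\n"
unknown = (
    "# average of the values\n"
    "def mean_value(input_values):\n"
    "    value_sum = sum(input_values)\n"
    "    return value_sum / len(input_values)\n"
)

print(attribute(unknown, {"alice": [alice_sample], "bob": [bob_sample]}))  # alice
```

Because the features are not normalized, mean identifier length dominates the distance here. A more careful sketch would scale each feature before comparing.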
