Source code

A simple source code example in C, a procedural programming language. The resulting program prints "hello, world" on the computer screen. This first known "Hello world" snippet, from the seminal book The C Programming Language, originates from Brian Kernighan at Bell Laboratories in 1974.

In computing, source code is any collection of code, possibly with comments, written using a human-readable programming language, [1] usually as plain text. Source code is designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing it. The source code is often transformed by an assembler or compiler into binary machine code that can be executed by the computer. The machine code might then be stored for execution at a later time. Alternatively, source code may be interpreted and thus immediately executed.
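The two steps described above, translation followed by execution, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python's built-in `compile` produces CPython bytecode rather than native machine code, and the names `source`, `code_object` and `namespace` are chosen here for the example.

```python
# Source code is ordinary text until a translator processes it.
source = 'message = "hello, world"\nprint(message)\n'

# Step 1: translate the text into an executable form
# (here CPython bytecode, not native machine code).
code_object = compile(source, "<example>", "exec")

# Step 2: execute the translated form in a fresh namespace.
namespace = {}
exec(code_object, namespace)
```

The code object could be kept and executed again later, which loosely mirrors storing machine code for later execution.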

Most application software is distributed in a form that includes only executable files. If the source code were included it would be useful to a user, programmer or a system administrator, any of whom might wish to study or modify the program.

Definitions

The Linux Information Project defines source code as: [2]

Source code (also referred to as source or code) is the version of software as it is originally written (i.e., typed into a computer) by a human in plain text (i.e., human readable alphanumeric characters).

The notion of source code may also be taken more broadly, to include machine code and notations in graphical languages, neither of which is textual in nature. An example from an article presented at the annual IEEE conference on Source Code Analysis and Manipulation: [3]

For the purpose of clarity "source code" is taken to mean any fully executable description of a software system. It is therefore so construed as to include machine code, very high level languages and executable graphical representations of systems. [4]

Often there are several steps of program translation or minification between the original source code typed by a human and an executable program. While some, like the FSF, argue that an intermediate file "is not real source code and does not count as source code", [5] others find it convenient to refer to each intermediate file as the source code for the next steps.

History

The earliest programs for stored-program computers were entered in binary through the front panel switches of the computer. This first-generation programming language had no distinction between source code and machine code.

When IBM first offered software to work with its machines, the source code was provided at no additional charge. At that time, the cost of developing and supporting software was included in the price of the hardware. IBM continued to distribute source code with its software product licenses for decades, until 1983. [6]

Most early computer magazines published source code as type-in programs.

Occasionally the entire source code to a large program is published as a hardback book, such as Computers and Typesetting, vol. B: TeX, The Program by Donald Knuth, PGP Source Code and Internals by Philip Zimmermann, PC SpeedScript by Randy Thompson, and µC/OS, The Real-Time Kernel by Jean Labrosse.

Organization

The source code which constitutes a program is usually held in one or more text files stored on a computer's hard disk; usually these files are carefully arranged into a directory tree, known as a source tree. Source code can also be stored in a database (as is common for stored procedures) or elsewhere.

A more complex Java source code example. Written in object-oriented programming style, it demonstrates boilerplate code. Prologue comments are indicated in red, inline comments in green, and program statements in blue.

The source code for a particular piece of software may be contained in a single file or many files. Though the practice is uncommon, a program's source code can be written in different programming languages. [7] For example, a program written primarily in the C programming language might have portions written in assembly language for optimization purposes. It is also possible for some components of a piece of software to be written and compiled separately, in an arbitrary programming language, and later integrated into the software using a technique called library linking. In some languages, such as Java, this can be done at run time (each class is compiled into a separate file that is linked by the interpreter at run time).

Yet another method is to make the main program an interpreter for a programming language, [citation needed] either designed specifically for the application in question or general-purpose. The bulk of the actual user functionality is then written as macros or other forms of add-ins in this language, an approach taken for example by the GNU Emacs text editor.
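A toy version of this design might look like the following sketch. It is not how GNU Emacs is implemented (Emacs uses its own Lisp dialect); here Python serves as both host and extension language, and the `Editor` class, the `command` decorator and the two commands are hypothetical names invented for the illustration.

```python
class Editor:
    """A toy host application whose features are added as small scripts."""

    def __init__(self):
        self.text = ""
        self.commands = {}

    def _register(self, func):
        # Record the function under its own name so it can be invoked later.
        self.commands[func.__name__] = func
        return func

    def load_extension(self, source):
        # The extension is plain source text; it runs in a namespace
        # that provides the `command` decorator.
        exec(source, {"command": self._register})

    def run(self, name, *args):
        # Apply a registered command to the current buffer text.
        self.text = self.commands[name](self.text, *args)
        return self.text


extension_source = '''
@command
def append(text, suffix):
    return text + suffix

@command
def shout(text):
    return text.upper()
'''

editor = Editor()
editor.load_extension(extension_source)
editor.run("append", "hello, world")
print(editor.run("shout"))
```

The host itself knows nothing about `append` or `shout`; both arrive as source text at run time, which is the point of the approach.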

The code base of a computer programming project is the larger collection of all the source code of all the computer programs which make up the project. It has become common practice to maintain code bases in version control systems. Moderately complex software customarily requires the compilation or assembly of several, sometimes dozens or even hundreds, of different source code files. In these cases, instructions for compilation, such as a Makefile, are included with the source code. These describe the relationships among the source code files and contain information about how they are to be compiled.
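The kind of relationship such build instructions record can be sketched as a dependency table from which a valid build order is derived. The file names below are hypothetical, and the sketch uses Python's standard `graphlib` module (Python 3.9 or later) rather than any real build tool's logic.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each target maps to the files it depends on.
dependencies = {
    "app": {"main.o", "util.o"},
    "main.o": {"main.c", "util.h"},
    "util.o": {"util.c", "util.h"},
}


def build_order(deps):
    """Return all files ordered so that each one follows its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


order = build_order(dependencies)
print(order)
```

The exact order among independent files may vary between runs, but every object file appears after the sources it is built from, and `app` comes last.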

Purposes

Source code is primarily used as input to the process that produces an executable program (i.e., it is compiled or interpreted). It is also used as a method of communicating algorithms between people (e.g., code snippets in books). [8]

Computer programmers often find it helpful to review existing source code to learn about programming techniques. [8] The sharing of source code between developers is frequently cited as a contributing factor to the maturation of their programming skills. [8] Some people consider source code an expressive artistic medium. [9]

Porting software to other computer platforms is usually prohibitively difficult without source code. The remaining options, which include binary translation and emulation of the original platform, are generally computationally expensive. [citation needed]

Decompilation of an executable program can be used to generate source code, either in assembly code or in a high-level language.
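Recovering a readable listing from a translated program can be illustrated at small scale. The sketch below is only an analogy: it disassembles CPython bytecode with the standard `dis` module, not native machine code, and it yields an assembly-like listing rather than high-level source. The function `add` is a hypothetical example.

```python
import dis


def add(a, b):
    return a + b


# Collect one line per bytecode instruction: the operation name and,
# where present, a readable form of its argument.
listing = [
    f"{instruction.opname} {instruction.argrepr}".strip()
    for instruction in dis.get_instructions(add)
]

print("\n".join(listing))
```

The instruction names differ between Python versions, so the printed listing is not fixed.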

Programmers frequently adapt source code from one piece of software to use in other projects, a concept known as software reusability.

The situation varies worldwide, but in the United States before 1974, software and its source code were not copyrightable and were therefore always in the public domain. [10]

In 1974, the US Commission on New Technological Uses of Copyrighted Works (CONTU) decided that "computer programs, to the extent that they embody an author's original creation, are proper subject matter of copyright". [11] [12]

In 1983, the United States court case Apple v. Franklin ruled that the same applied to object code, and that the Copyright Act gave computer programs the copyright status of literary works.

In 1999, in the United States court case Bernstein v. United States it was further ruled that source code could be considered a constitutionally protected form of free speech. Proponents of free speech argued that because source code conveys information to programmers, is written in a language, and can be used to share humor and other artistic pursuits, it is a protected form of communication. [13] [14] [15]

Licensing

Copyright notice example: [16]

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

The author of a non-trivial work such as software [12] has several exclusive rights, among them the copyright for the source code and object code. [17] The author can grant customers and users of the software some of these exclusive rights in the form of a software license. This is done by including a copyright notice that declares the licensing terms; if no notice is found, the default of "all rights reserved" is implied. Software, and its accompanying source code, can be associated with several licensing paradigms; the most important distinction is open source versus proprietary software.

Generally speaking, software is open source if the source code is free to use, distribute, modify and study, and proprietary if the source code is kept secret, or is privately owned and restricted. One of the first software licenses to be published and to explicitly grant these freedoms was the GNU General Public License in 1989; the BSD license is another early example from 1990.

For proprietary software, the provisions of the various copyright laws, trade secrecy and patents are used to keep the source code closed. Additionally, many pieces of retail software come with an end-user license agreement (EULA) which typically prohibits decompilation, reverse engineering, analysis, modification, or circumvention of copy protection. Types of source code protection beyond traditional compilation to object code include code encryption, code obfuscation and code morphing.

Quality

The way a program is written can have important consequences for its maintainers. Coding conventions, which stress readability and some language-specific conventions, are aimed at the maintenance of the software source code, which involves debugging and updating. Other priorities, such as the speed of the program's execution, or the ability to compile the program for multiple architectures, often make code readability a less important consideration, since code quality generally depends on its purpose.

References

  1. "Programming in C: A Tutorial" (PDF). Archived from the original (PDF) on 23 February 2015.
  2. The Linux Information Project. "Source Code Definition".
  3. SCAM Working Conference, 2001–2010.
  4. Harman, Mark. "Why Source Code Analysis and Manipulation Will Always Be Important". 10th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2010). Timișoara, Romania, 12–13 September 2010.
  5. "gnu.org". www.gnu.org.
  6. Goetz, Martin (8 February 1988). "Object-code only: Is IBM playing fair?". Computerworld. Vol. 22, no. 6. p. 59. "It was in 1983 that IBM reversed its 20-year-old policy of distributing source code with its software product licenses."
  7. "Extending and Embedding the Python Interpreter". docs.python.org.
  8. Spinellis, D. Code Reading: The Open Source Perspective. Addison-Wesley Professional, 2003. ISBN 0-201-79940-5.
  9. "Art and Computer Programming". ONLamp.com, 2005.
  10. Liu, Joseph P.; Dogan, Stacey L. (2005). "Copyright Law and Subject Matter Specificity: The Case of Computer Software". New York University Annual Survey of American Law. 61 (2).
  11. Nussbaum, Jan L. (January 1984). "Apple Computer, Inc. v. Franklin Computer Corporation Puts the Byte Back into Copyright Protection for Computer Programs". Golden Gate University Law Review. Volume 14, Issue 2, Article 3.
  12. Lemley, Menell, Merges and Samuelson. Software and Internet Law, p. 34.
  13. "Info" (PDF). cr.yp.to. Retrieved 27 December 2019.
  14. "Bernstein v. US Department of Justice". eff.org.
  15. Dame-Boyle, Alison (16 April 2015). "EFF at 25: Remembering the Case that established Code as Speech". EFF.org.
  16. "License". www.apache.org. Retrieved 27 December 2019.
  17. Hancock, Terry (29 August 2008). "What if copyright didn't apply to binary executables?". Free Software Magazine. Retrieved 25 January 2016.
