Source code

Simple C-language source code example, a procedural programming language. The resulting program prints "hello, world" on the computer screen. This first known "Hello world" snippet from the seminal book The C Programming Language originates from Brian Kernighan in the Bell Laboratories in 1974.

In computing, source code, or simply code or source, is text (usually plain text) that conforms to a human-readable programming language and specifies the behavior of a computer. A programmer writes code to produce a program that runs on a computer.

Since a computer, at base, only understands machine code, source code must be translated before the computer can use it, and this can be done in a variety of ways depending on available technology. Source code can be converted by a compiler or an assembler into machine code that can be directly executed. Alternatively, source code can be processed without conversion to machine code by an interpreter, which performs the actions prescribed by the source code using the interpreter's own machine code. Other technology (e.g. bytecode) combines both mechanisms: the source code is converted to an intermediate form that is often not human-readable but is also not machine code, and an interpreter executes that intermediate form.
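The two-stage bytecode approach can be observed from inside a language that uses it. The following is a minimal sketch, assuming the CPython implementation of Python; the one-line program and the variable names are hypothetical and chosen only for illustration.

```python
import dis

# Hypothetical one-line program, used only for illustration.
source = "total = sum(n * n for n in range(4))"

# Stage 1: translate the human-readable text into a code object.
# In CPython the code object holds bytecode, an intermediate form.
code_object = compile(source, "<example>", "exec")

# The intermediate form is a sequence of bytes: neither text nor machine code.
raw_bytecode = code_object.co_code

# Stage 2: the interpreter executes the intermediate form.
namespace = {}
exec(code_object, namespace)

print(type(raw_bytecode).__name__, len(raw_bytecode) > 0)
print(namespace["total"])  # 0 + 1 + 4 + 9

# dis prints a listing of the bytecode instructions for human inspection;
# the exact instructions differ between interpreter versions.
dis.dis(code_object)
```

The listing printed by `dis` is a rendering for readers; the interpreter itself works on the raw bytes.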

Most languages allow for comments. The programmer can add comments to document the source code for themselves and for other programmers reading the code. Comments do not specify program behavior, so compilers, interpreters and the like ignore them.
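That comments leave no trace in the translated program can be checked directly. The sketch below assumes CPython and uses two hypothetical one-line programs that differ only by a comment; it compares both the parsed syntax tree and the generated bytecode.

```python
import ast

# Two hypothetical programs that differ only by a trailing comment.
commented = "x = 1 + 2  # add the two constants\n"
uncommented = "x = 1 + 2\n"


def syntax_tree_of(source: str) -> str:
    """Return a textual dump of the parsed syntax tree (no line positions)."""
    return ast.dump(ast.parse(source))


def bytecode_of(source: str) -> bytes:
    """Return the raw bytecode CPython generates for the source text."""
    return compile(source, "<example>", "exec").co_code


same_tree = syntax_tree_of(commented) == syntax_tree_of(uncommented)
same_bytecode = bytecode_of(commented) == bytecode_of(uncommented)
print(same_tree, same_bytecode)
```

Both comparisons are expected to agree because the comment is discarded while the text is being parsed, before any translation takes place.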

Often, the source code of application software is not distributed or publicly available because the producer wants to protect their intellectual property (IP). If the source code is available (open source), however, it can be useful to a user, programmer or system administrator, any of whom might wish to study or modify the program.

Background

The first programmable computers, which appeared at the end of the 1940s, [2] were programmed in machine language (simple instructions that could be directly executed by the processor). Machine language was difficult to debug and was not portable between different computer systems. [3] Initially, hardware resources were scarce and expensive, while human resources were cheaper. [4] As programs grew more complex, programmer productivity became a bottleneck. This led to the introduction of high-level programming languages such as Fortran in the mid-1950s. These languages abstracted away the details of the hardware, instead being designed to express algorithms that could be understood more easily by humans. [5] [6] As instructions distinct from the underlying computer hardware, software is therefore relatively recent, dating to these early high-level programming languages such as Fortran, Lisp, and COBOL. [6] The invention of high-level programming languages was simultaneous with that of the compilers needed to translate the source code automatically into machine code that can be directly executed on the computer hardware. [7]

Source code is the form of code that is modified directly by humans, typically in a high-level programming language. Object code can be directly executed by the machine and is generated automatically from the source code, often via an intermediate step, assembly language. While object code will only work on a specific platform, source code can be ported to a different machine and recompiled there. For the same source code, object code can vary significantly—not only based on the machine for which it is compiled, but also based on performance optimization from the compiler. [8] [9]

Organization

Most programs do not contain all the resources needed to run them and rely on external libraries. Part of the function of the compiler toolchain (specifically the linker) is to link these files in such a way that the program can be executed by the hardware. [10]

A more complex Java source code example. Written in object-oriented programming style, it demonstrates boilerplate code, with prologue comments indicated in red, inline comments indicated in green, and program statements indicated in blue.

Software developers often use configuration management to track changes to source code files (version control). The configuration management system also keeps track of which object code file corresponds to which version of the source code file. [11]
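Tracking changes between versions rests on comparing revisions of the same file. The sketch below uses Python's standard `difflib` module to show such a comparison; the file name `area.py`, the revision labels and the file contents are hypothetical, and real version control systems store and present changes in their own ways.

```python
import difflib

# Two hypothetical revisions of the same source file.
old_version = ["def area(w, h):", "    return w + h"]
new_version = ["def area(w, h):", "    return w * h"]

# Produce a unified diff: removed lines start with "-", added lines with "+".
diff = list(
    difflib.unified_diff(
        old_version,
        new_version,
        fromfile="area.py (rev 1)",
        tofile="area.py (rev 2)",
        lineterm="",
    )
)
print("\n".join(diff))
```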

Purposes

Estimation

The number of lines of source code is often used as a metric when evaluating the productivity of computer programmers, the economic value of a code base, effort estimation for projects in development, and the ongoing cost of software maintenance after release. [12]
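How lines are counted affects the metric. The following is a minimal sketch of one possible counting rule for Python source, assuming that a line counts as a comment only if it starts with `#`; the function name `count_source_lines` and the sample program are hypothetical, and other tools may classify lines differently.

```python
def count_source_lines(source: str) -> dict:
    """Classify each line as code, comment, or blank and count each kind."""
    counts = {"code": 0, "comment": 0, "blank": 0}
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith("#"):
            counts["comment"] += 1
        else:
            # Lines mixing code and a trailing comment are counted as code.
            counts["code"] += 1
    return counts


sample = '''# Compute a factorial.
def factorial(n):

    # Base case.
    if n <= 1:
        return 1
    return n * factorial(n - 1)
'''

print(count_source_lines(sample))
```

For the sample above, this rule yields 4 code lines, 2 comment lines and 1 blank line.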

Communication

Source code is also used to communicate algorithms between people, e.g. through code snippets online or in books. [13]

Computer programmers may find it helpful to review existing source code to learn about programming techniques. [13] The sharing of source code between developers is frequently cited as a contributing factor to the maturation of their programming skills. [13] Some people consider source code an expressive artistic medium. [14]

Source code often contains comments—blocks of text marked for the compiler to ignore. This content is not part of the program logic, but is instead intended to help readers understand the program. [15]

Companies often keep the source code confidential in order to hide algorithms considered a trade secret. Proprietary, secret source code and algorithms are widely used for sensitive government applications such as criminal justice, which results in black box behavior with a lack of transparency into the algorithm's methodology. The result is avoidance of public scrutiny of issues such as bias. [16]

Modification

Access to the source code (not just the object code) is essential to modifying it. [17] Understanding existing code is necessary both to learn how it works [17] and before modifying it. [18] The rate of understanding depends both on the code base and on the skill of the programmer. [19] Experienced programmers have an easier time understanding what the code does at a high level. [20] Software visualization is sometimes used to speed up this process. [21]

Many software programmers use an integrated development environment (IDE) to improve their productivity. IDEs typically have several features built in, including a source-code editor that can alert the programmer to common errors. [22] Modification often includes code refactoring (improving the structure without changing functionality) and restructuring (improving structure and functionality at the same time). [23] Nearly every change to code will introduce new bugs or unexpected ripple effects, which require another round of fixes. [18]
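As an illustration of refactoring, the sketch below shows a function before and after its structure is improved while its behavior stays the same. All function names and the sample data are hypothetical.

```python
# Before: one function mixes the calculations with the formatting.
def report_before(prices):
    t = 0
    for p in prices:
        t = t + p
    a = t / len(prices)
    return "total=" + str(t) + " avg=" + str(a)


# After: each calculation has its own small, reusable function.
def total(prices):
    return sum(prices)


def average(prices):
    return total(prices) / len(prices)


def report_after(prices):
    return f"total={total(prices)} avg={average(prices)}"


# The refactored version produces the same result as the original.
prices = [2, 4, 6]
print(report_before(prices))
print(report_after(prices))
```

Checking that both versions return the same output for the same input is one simple way to guard against the ripple effects mentioned above.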

Code reviews by other developers are often used to scrutinize new code added to a project. [24] The purpose of this phase is often to verify that the code meets style and maintainability standards and that it is a correct implementation of the software design. [25] According to some estimates, code review dramatically reduces the number of bugs persisting after software testing is complete. [24] Along with software testing that works by executing the code, static program analysis uses automated tools to detect problems with the source code. Many IDEs support code analysis tools, which might provide metrics on the clarity and maintainability of the code. [26] Debuggers are tools that often enable programmers to step through execution while keeping track of which source code corresponds to each change of state. [27]
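Static analysis inspects source code without running it. The following is a minimal sketch using Python's standard `ast` module; the two checks (a bare `except:` clause and a comparison to `None` with `==` or `!=`), the function name `find_issues` and the sample program are hypothetical choices for illustration, not the rule set of any particular tool.

```python
import ast


def find_issues(source: str) -> list:
    """Return (line number, message) pairs for two simple code smells."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        # A handler with no exception type catches every exception.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            issues.append((node.lineno, "bare except"))
        elif isinstance(node, ast.Compare):
            for op, right in zip(node.ops, node.comparators):
                if (
                    isinstance(op, (ast.Eq, ast.NotEq))
                    and isinstance(right, ast.Constant)
                    and right.value is None
                ):
                    issues.append(
                        (node.lineno, "comparison to None with == or !=")
                    )
    return sorted(issues)


sample = '''def load(path):
    try:
        data = open(path).read()
    except:
        data = None
    if data == None:
        return ""
    return data
'''

for line_number, message in find_issues(sample):
    print(f"line {line_number}: {message}")
```

The sample program is only parsed, never executed, so the analysis works even though the file it refers to does not exist.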

Compilation and execution

Source code files in a high-level programming language must be translated into machine code before the instructions can be carried out. [7] After being compiled, the program can be saved as an object file, and the loader (part of the operating system) can take this saved file and execute it as a process on the computer hardware. [10] Some programming languages use an interpreter instead of a compiler. An interpreter processes the program at run time rather than ahead of time, which makes interpreted programs 10 to 100 times slower than compiled ones. [22] [28]
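The idea of interpretation can be sketched with a toy interpreter that reads each statement at run time and carries it out immediately. The miniature language here (with the operations `set`, `add`, `mul` and `print`) is entirely hypothetical and exists only for this illustration.

```python
def run(source: str) -> list:
    """Interpret a toy language one line at a time; return printed values."""
    variables = {}
    output = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        # Drop anything after "#" (a comment), then split into words.
        parts = line.split("#", 1)[0].split()
        if not parts:
            continue
        op, *args = parts
        if op == "set":
            variables[args[0]] = int(args[1])
        elif op == "add":
            variables[args[0]] += int(args[1])
        elif op == "mul":
            variables[args[0]] *= int(args[1])
        elif op == "print":
            output.append(variables[args[0]])
        else:
            raise SyntaxError(f"line {line_number}: unknown operation {op!r}")
    return output


program = """
set x 2      # x = 2
add x 3      # x = 5
mul x 4      # x = 20
print x
"""

print(run(program))
```

Every statement is decoded again each time it is reached, which hints at why interpretation tends to be slower than running code that was translated once in advance.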

Quality

Software quality is an overarching term that can refer to a code's correct and efficient behavior, its reusability and portability, or the ease of modification. [29] It is usually more cost-effective to build quality into the product from the beginning rather than try to add it later in the development process. [30] Higher quality code will reduce lifetime cost to both suppliers and customers as it is more reliable and easier to maintain. [31] [32]

Maintainability is the quality of software enabling it to be easily modified without breaking existing functionality. [33] Following coding conventions, such as using clear function and variable names that correspond to their purpose, makes maintenance easier. [34] Using loop statements only where the code could execute more than once, and eliminating code that will never execute, can also increase understandability. [35] Many software development organizations neglect maintainability during the development phase, even though doing so increases long-term costs. [32] Technical debt is incurred when programmers, often out of laziness or urgency to meet a deadline, choose quick and dirty solutions rather than build maintainability into their code. [36] A common cause is underestimation of the development effort, which leads to insufficient resources being allocated to development. [37] A challenge with maintainability is that many software engineering courses do not emphasize it. [38] Development engineers who know that they will not be responsible for maintaining the software do not have an incentive to build in maintainability. [18]
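The effect of naming conventions and of removing unreachable code can be seen in a small sketch. Both functions below are hypothetical and compute the same result; the second uses descriptive names and omits the statement that can never execute.

```python
# Harder to maintain: cryptic names and a statement that can never run.
def f(l):
    r = []
    for i in l:
        if i % 2 == 0:
            r.append(i)
            continue
            r.append(-i)  # unreachable: it follows "continue"
    return r


# Easier to maintain: the names state the purpose and no dead code remains.
def even_numbers(numbers):
    return [number for number in numbers if number % 2 == 0]


print(f([1, 2, 3, 4]))
print(even_numbers([1, 2, 3, 4]))
```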

The copyright situation varies worldwide, but in the United States before 1974, software and its source code were not copyrightable and were therefore always in the public domain. [39] In 1974, the US Commission on New Technological Uses of Copyrighted Works (CONTU) decided that "computer programs, to the extent that they embody an author's original creation, are proper subject matter of copyright". [40] [41]

Proprietary software is rarely distributed as source code. [42] Although the term open-source software literally refers to public access to the source code, [43] open-source software has additional requirements: free redistribution, permission to modify the source code and release derivative works under the same license, and nondiscrimination between different uses—including commercial use. [44] [45] The free reusability of open-source software can speed up development. [46]

References

  1. Kernighan, Brian W. "Programming in C: A Tutorial" (PDF). Bell Laboratories, Murray Hill, N. J. Archived from the original (PDF) on 23 February 2015.
  2. Gabbrielli & Martini 2023, p. 519.
  3. Gabbrielli & Martini 2023, pp. 520–521.
  4. Gabbrielli & Martini 2023, p. 522.
  5. Gabbrielli & Martini 2023, p. 521.
  6. Tracy 2021, p. 1.
  7. Tracy 2021, p. 121.
  8. Lin et al. 2001, pp. 238–239.
  9. Katyal 2019, p. 1194.
  10. Tracy 2021, pp. 122–123.
  11. O'Regan 2022, pp. 230–231, 233, 377.
  12. Foster 2014, pp. 249, 274, 280, 305.
  13. Spinellis, D. (2003). Code Reading: The Open Source Perspective. Addison-Wesley Professional. ISBN 0-201-79940-5.
  14. "Art and Computer Programming". ONLamp.com (2005). Archived 20 February 2018 at the Wayback Machine.
  15. Kaczmarek et al. 2018, p. 68.
  16. Katyal 2019, pp. 1186–1187.
  17. Katyal 2019, p. 1195.
  18. Offutt, Jeff (January 2018). "Overview of Software Maintenance and Evolution". George Mason University Department of Computer Science. Retrieved 5 May 2024.
  19. Tripathy & Naik 2014, p. 296.
  20. Tripathy & Naik 2014, p. 297.
  21. Tripathy & Naik 2014, pp. 318–319.
  22. O'Regan 2022, p. 375.
  23. Tripathy & Naik 2014, p. 94.
  24. Dooley 2017, p. 272.
  25. O'Regan 2022, pp. 18, 21.
  26. O'Regan 2022, p. 133.
  27. Kaczmarek et al. 2018, pp. 348–349.
  28. Sebesta 2012, p. 28.
  29. Galin 2018, p. 26.
  30. O'Regan 2022, pp. 68, 117.
  31. O'Regan 2022, pp. 3, 268.
  32. Varga 2018, p. 12.
  33. Varga 2018, p. 5.
  34. Tripathy & Naik 2014, pp. 296–297.
  35. Tripathy & Naik 2014, p. 309.
  36. Varga 2018, pp. 6–7.
  37. Varga 2018, p. 7.
  38. Varga 2018, pp. 7–8.
  39. Liu, Joseph P.; Dogan, Stacey L. (2005). "Copyright Law and Subject Matter Specificity: The Case of Computer Software". New York University Annual Survey of American Law. 61 (2). Archived from the original on 25 June 2021.
  40. Nussbaum, Jan L. (January 1984). "Apple Computer, Inc. v. Franklin Computer Corporation Puts the Byte Back into Copyright Protection for Computer Programs". Golden Gate University Law Review. 14 (2), Article 3. Archived 7 May 2017 at the Wayback Machine.
  41. Lemley, Menell, Merges and Samuelson. Software and Internet Law, p. 34.
  42. Boyle 2003, p. 45.
  43. Morin et al. 2012, Open Source versus Closed Source.
  44. Sen et al. 2008, p. 209.
  45. Morin et al. 2012, Free and Open Source Software (FOSS) Licensing.
  46. O'Regan 2022, p. 106.

Sources