Source lines of code (SLOC), also known as lines of code (LOC), is a software metric used to measure the size of a computer program by counting the number of lines in the text of the program's source code. SLOC is typically used to predict the amount of effort that will be required to develop a program, as well as to estimate programming productivity or maintainability once the software is produced.
Many useful comparisons involve only the order of magnitude of lines of code in a project. Lines of code are far more useful for comparing a 10,000-line project with a 100,000-line project than for comparing a 20,000-line project with a 21,000-line one. While it is debatable exactly how to measure lines of code, discrepancies of an order of magnitude can be clear indicators of software complexity or man-hours.
There are two major types of SLOC measures: physical SLOC (LOC) and logical SLOC (LLOC). Specific definitions of these two measures vary, but the most common definition of physical SLOC is a count of lines in the text of the program's source code excluding comment lines. [1]
Logical SLOC attempts to measure the number of executable "statements", but its specific definitions are tied to specific computer languages (one simple logical SLOC measure for C-like programming languages is the number of statement-terminating semicolons). It is much easier to create tools that measure physical SLOC, and physical SLOC definitions are easier to explain. However, physical SLOC measures are more sensitive to logically irrelevant formatting and style conventions than logical SLOC. In practice, SLOC measures are often stated without giving their definition, and logical SLOC can often be significantly different from physical SLOC.
Consider this snippet of C code as an example of the ambiguity encountered when determining SLOC:
for (i = 0; i < 100; i++) printf("hello"); /* How many lines of code is this? */
In this example we have:
- 1 physical line of code (LOC),
- 2 logical lines of code (LLOC): the for statement and the printf statement,
- 1 comment line.
Depending on the programmer and coding standards, the above "line" of code could be written on many separate lines:
/* Now how many lines of code is this? */
for (i = 0; i < 100; i++)
{
    printf("hello");
}
In this example we have:
- 5 physical lines of code (LOC): is placing braces work to be estimated?
- 2 logical lines of code (LLOC): what work did placing two braces take?
- 1 comment line.
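To make the ambiguity concrete, the following is a minimal sketch of a toy counter in C; it is illustrative only, not a real SLOC tool. It counts any line containing a non-whitespace character as physical SLOC and every semicolon as logical SLOC. Fed the two snippets above, it reports 1 versus 5 physical lines but the same logical count for both. Note that naive semicolon counting also includes the two semicolons inside the for header (reporting 3 rather than the 2 statements counted above), which is precisely why a published logical-SLOC definition must spell out such cases.

#include <stdio.h>

/* Toy SLOC counter (illustrative sketch, not a real tool).
 * Physical SLOC here: lines containing any non-whitespace character.
 * Logical SLOC here: semicolons; this naive rule also counts the
 * semicolons in a for(;;) header and any semicolons appearing
 * inside strings or comments. */
int main(void)
{
    int c;
    int physical = 0, logical = 0, line_has_text = 0;

    while ((c = getchar()) != EOF) {
        if (c == ';')
            logical++;
        if (c == '\n') {
            if (line_has_text)
                physical++;
            line_has_text = 0;
        } else if (c != ' ' && c != '\t') {
            line_has_text = 1;
        }
    }
    if (line_has_text)  /* the last line may lack a trailing newline */
        physical++;

    printf("physical SLOC (non-blank lines): %d\n", physical);
    printf("logical SLOC (semicolons): %d\n", logical);
    return 0;
}

Compiled and run as, for example, cc count.c -o count && ./count < program.c (the file names are hypothetical), such a counter makes the dependence on formatting and on the chosen definition immediately visible.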
Even the "logical" and "physical" SLOC values can have a large number of varying definitions. Robert E. Park (while at the Software Engineering Institute) and others developed a framework for defining SLOC values, to enable people to carefully explain and define the SLOC measure used in a project. For example, most software systems reuse code, and determining which (if any) reused code to include is important when reporting a measure.
At the time when SLOC was introduced as a metric, the most commonly used languages, such as FORTRAN and assembly language, were line-oriented. These languages were developed when punched cards were the main form of data entry for programming. One punched card usually represented one line of code. It was a discrete object that was easily counted, and it was the visible output of the programmer, so it made sense to managers to count lines of code as a measurement of a programmer's productivity, even referring to such lines as "card images". Today, the most commonly used computer languages allow far more leeway in formatting: text lines are no longer limited to 80 or 96 columns, and one line of text no longer necessarily corresponds to one line of code.
SLOC measures are somewhat controversial, particularly in the way that they are sometimes misused. Experiments have repeatedly confirmed that effort is highly correlated with SLOC [citation needed]: programs with larger SLOC values take more time to develop. Thus, SLOC can be effective in estimating effort. However, functionality is less well correlated with SLOC: skilled developers may be able to deliver the same functionality with far less code, so one program with fewer SLOC may exhibit more functionality than another similar program. Counting SLOC as a productivity measure has its caveats, since a developer may write only a few lines and yet be far more productive in terms of functionality than a developer who ends up creating more lines (and generally spending more effort). Good developers may merge multiple code modules into a single module, improving the system yet appearing to have negative productivity because they remove code. Furthermore, inexperienced developers often resort to code duplication, which is highly discouraged because it is more bug-prone and costly to maintain, but which results in higher SLOC.
SLOC counting exhibits further accuracy issues when comparing programs written in different languages, unless adjustment factors are applied to normalize languages. Various computer languages balance brevity and clarity in different ways; as an extreme example, most assembly languages would require hundreds of lines of code to perform the same task as a few characters in APL. The following example shows a comparison of a "hello world" program written in BASIC, C, and COBOL (a language known for being particularly verbose).
BASIC (lines of code: 1, no whitespace):

PRINT "hello, world"

C (lines of code: 4, excluding whitespace):

#include <stdio.h>
int main() {
    printf("hello, world\n");
}

COBOL (lines of code: 6, excluding whitespace):

identification division.
program-id. hello.
procedure division.
display "hello, world"
goback.
end program hello.
Another increasingly common problem in comparing SLOC metrics is the difference between auto-generated and hand-written code. Modern software tools often have the capability to auto-generate enormous amounts of code with a few clicks of a mouse. For instance, graphical user interface builders automatically generate all the source code for graphical control elements simply by dragging an icon onto a workspace. The work involved in creating this code cannot reasonably be compared to the work necessary to write a device driver, for instance. By the same token, a hand-coded custom GUI class could easily be more demanding than a simple device driver; hence the shortcoming of this metric.
There are several cost, schedule, and effort estimation models which use SLOC as an input parameter, including the widely used Constructive Cost Model (COCOMO) series of models by Barry Boehm et al., PRICE Systems True S and Galorath's SEER-SEM. While these models have shown good predictive power, they are only as good as the estimates (particularly the SLOC estimates) fed to them. Many [2] have advocated the use of function points instead of SLOC as a measure of functionality, but since function points are highly correlated to SLOC (and cannot be automatically measured) this is not a universally held view.
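As a rough illustration of how such models consume SLOC, the sketch below implements the Basic COCOMO equations for an "organic" project (a small team working in a familiar problem domain). The coefficients 2.4/1.05 and 2.5/0.38 are Boehm's published Basic COCOMO values, but real estimates would use the Intermediate COCOMO or COCOMO II variants, which adjust the result with many cost drivers, so this is illustrative only.

#include <stdio.h>
#include <math.h>

/* Basic COCOMO, organic mode (Boehm, 1981) - illustrative sketch.
 * effort (person-months) = 2.4 * KLOC^1.05
 * schedule (months) = 2.5 * effort^0.38
 * Compile with the math library, e.g. cc cocomo.c -lm */
int main(void)
{
    double sloc = 30000.0;                     /* hypothetical 30,000-line project */
    double kloc = sloc / 1000.0;
    double effort = 2.4 * pow(kloc, 1.05);     /* person-months */
    double schedule = 2.5 * pow(effort, 0.38); /* calendar months */

    printf("%.0f SLOC -> %.1f person-months over %.1f months\n",
           sloc, effort, schedule);
    return 0;
}

For the hypothetical 30,000-line input this prints roughly 85 person-months over about 13 months, which makes the caveat in the text concrete: the estimate is only as good as the SLOC figure fed in.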
According to Vincent Maraia, [3] the SLOC values for various operating systems in Microsoft's Windows NT product line are as follows:
Year | Operating system | SLOC (million) |
---|---|---|
1993 | Windows NT 3.1 | 4–5 [3] |
1994 | Windows NT 3.5 | 7–8 [3] |
1996 | Windows NT 4.0 | 11–12 [3] |
2000 | Windows 2000 | more than 29 [3] |
2001 | Windows XP | 45 [4] [5] |
2003 | Windows Server 2003 | 50 [3] |
David A. Wheeler studied the Red Hat distribution of the Linux operating system, and reported that Red Hat Linux version 7.1 [6] (released April 2001) contained over 30 million physical SLOC. He also extrapolated that, had it been developed by conventional proprietary means, it would have required about 8,000 person-years of development effort and would have cost over $1 billion (in year 2000 U.S. dollars).
A similar study was later made of Debian GNU/Linux version 2.2 (also known as "Potato"), originally released in August 2000. This study found that Debian GNU/Linux 2.2 included over 55 million SLOC and that, if developed in a conventional proprietary way, it would have required 14,005 person-years and cost US$1.9 billion to develop. Later runs of the same tools reported that the following release of Debian had 104 million SLOC and that, as of 2005, the newest release was on track to include over 213 million SLOC.
Year | Operating system | SLOC (million) |
---|---|---|
2000 | Debian 2.2 | 55–59 [7] [8] |
2002 | Debian 3.0 | 104 [8] |
2005 | Debian 3.1 | 215 [8] |
2007 | Debian 4.0 | 283 [8] |
2009 | Debian 5.0 | 324 [8] |
2012 | Debian 7.0 | 419 [9] |
2009 | OpenSolaris | 9.7 |
n/a | FreeBSD | 8.8 |
2005 | Mac OS X 10.4 | 86 [10] [n 1] |
1991 | Linux kernel 0.01 | 0.010239 |
2001 | Linux kernel 2.4.2 | 2.4 [6] |
2003 | Linux kernel 2.6.0 | 5.2 |
2009 | Linux kernel 2.6.29 | 11.0 |
2009 | Linux kernel 2.6.32 | 12.6 [11] |
2010 | Linux kernel 2.6.35 | 13.5 [12] |
2012 | Linux kernel 3.6 | 15.9 [13] |
2015 | Linux kernel pre-4.2 | 20.2 [14] |
In the PBS documentary Triumph of the Nerds, Microsoft executive Steve Ballmer criticized the practice of counting lines of code:
In IBM there's a religion in software that says you have to count K-LOCs, and a K-LOC is a thousand lines of code. How big a project is it? Oh, it's sort of a 10K-LOC project. This is a 20K-LOCer. And this is 50K-LOCs. And IBM wanted to sort of make it the religion about how we got paid. How much money we made off OS/2, how much they did. How many K-LOCs did you do? And we kept trying to convince them – hey, if we have – a developer's got a good idea and he can get something done in 4K-LOCs instead of 20K-LOCs, should we make less money? Because he's made something smaller and faster, less K-LOC. K-LOCs, K-LOCs, that's the methodology. Ugh! Anyway, that always makes my back just crinkle up at the thought of the whole thing.
According to the Computer History Museum, Apple developer Bill Atkinson found problems with this practice in 1982:
When the Lisa team was pushing to finalize their software in 1982, project managers started requiring programmers to submit weekly forms reporting on the number of lines of code they had written. Bill Atkinson thought that was silly. For the week in which he had rewritten QuickDraw's region calculation routines to be six times faster and 2000 lines shorter, he put "-2000" on the form. After a few more weeks the managers stopped asking him to fill out the form, and he gladly complied. [16] [17]
[n 1] In his WWDC 2006 keynote, Steve Jobs described Mac OS X as "86 million lines of source code that was ported to run on an entirely new architecture with zero hiccups."