Copy-and-paste programming

Last updated May 23, 2024

Copy-and-paste programming, sometimes referred to as just pasting, is the production of highly repetitive computer programming code, as produced by copy and paste operations. It is primarily a pejorative term; those who use the term are often implying a lack of programming competence and ability to create abstractions. It may also be the result of technology limitations (e.g., an insufficiently expressive development environment) as subroutines or libraries would normally be used instead. However, there are occasions when copy-and-paste programming is considered acceptable or necessary, such as for boilerplate, loop unrolling (when not supported automatically by the compiler), or certain programming idioms, and it is supported by some source code editors in the form of snippets.

Origins

Copy-and-paste programming is often done by inexperienced or student programmers, who find the act of writing code from scratch difficult or irritating and prefer to search for a pre-written solution or partial solution they can use as a basis for their own problem solving.^[1] (See also Cargo cult programming)

Inexperienced programmers who copy code often do not fully understand the pre-written code they are taking. As such, the problem arises more from their inexperience and lack of courage in programming than from the act of copying and pasting, per se. The code often comes from disparate sources such as friends' or co-workers' code, Internet forums, code provided by the student's professors/TAs, or computer science textbooks. The result risks being a disjointed clash of styles, and may have superfluous code that tackles problems for which new solutions are no longer required.

A further problem is that bugs can easily be introduced by assumptions and design choices made in the separate sources that no longer apply when placed in a new environment.

Such code may also, in effect, be unintentionally obfuscated, as the names of variables, classes, functions and the like are typically left unchanged, even though their purpose may be completely different in the new context.^[1]

Copy-and-paste programming may also be a result of poor understanding of features common in computer languages, such as loop structures, functions and subroutines.

Duplication

Applying library code

Copying and pasting is also done by experienced programmers, who often have their own libraries of well-tested, ready-to-use code snippets and generic algorithms that are easily adapted to specific tasks.^[2]

Being a form of code duplication, copy-and-paste programming has some intrinsic problems; such problems are exacerbated if the code doesn't preserve any semantic link between the source text and the copies. In this case, if changes are needed, time is wasted hunting for all the duplicate locations. (This can be partially mitigated if the original code and/or the copy are properly commented; however, even then the problem remains of making the same edits multiple times. Also, because code maintenance often omits updating the comments,^[3] comments describing where to find remote pieces of code are notorious for going out-of-date.)

Adherents of object-oriented methodologies further object to the "code library" use of copy and paste. Instead of making multiple mutated copies of a generic algorithm, an object-oriented approach would abstract the algorithm into a reusable encapsulated class. The class is written flexibly, with full support for inheritance and overloading, so that all calling code can use this generic code directly rather than mutating the original.^[4] As additional functionality is required, the library is extended (while retaining backward compatibility). This way, if the original algorithm has a bug to fix or can be improved, all software using it stands to benefit. Generic programming provides additional tools to create abstractions.
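As a rough illustration of the contrast drawn above, the following C++ sketch places a pasted, type-specific routine next to a single generic replacement. The names max_of_ints and max_of are hypothetical and come from none of the cited sources; the sketch assumes a non-empty input and uses a function template rather than a full class, purely for brevity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Pasted variant: works only for int. A version for double or
// std::string would be another near-identical copy of this body.
int max_of_ints(const std::vector<int>& values) {
    assert(!values.empty());  // assumption: callers pass at least one element
    int best = values.front();
    for (int v : values) {
        if (best < v) best = v;
    }
    return best;
}

// Generic replacement (hypothetical name): one definition serves every
// element type that supports operator<, so a fix here reaches all callers.
template <typename T>
T max_of(const std::vector<T>& values) {
    assert(!values.empty());  // same assumption as above
    T best = values.front();
    for (const T& v : values) {
        if (best < v) best = v;
    }
    return best;
}
```

A call such as max_of(std::vector<int>{3, 9, 2}) or max_of(std::vector<std::string>{"pear", "apple"}) reuses the same definition. The standard library's std::max_element already covers this particular task, so the sketch is only meant to show the shape of the abstraction.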

Branching code

Branching code is a normal part of large-team software development, allowing parallel development on both branches and hence shorter development cycles. Classical branching has the following qualities:

It is managed by a version control system that supports branching.
Branches are re-merged once parallel development is completed.

Copy and paste is a less formal alternative to classical branching, often used when it is foreseen that the branches will diverge more and more over time, as when a new product is being spun off from an existing product.

As a way of spinning off a new product, copy-and-paste programming has some advantages. Because the new development initiative does not touch the code of the existing product:

There is no need to regression test the existing product, saving on QA time associated with the new product launch, and reducing time to market.
There is no risk of introducing bugs into the existing product, which might upset the installed user base.

The downsides are:

If the new product does not diverge as much as anticipated from the existing product, two code bases might need to be supported (at twice the cost) where one would have done. This can lead to expensive refactoring and manual merging down the line.
The duplicate code base doubles the time required to implement changes which may be desired across both products; this increases time-to-market for such changes, and may, in fact, wipe out any time gains achieved by branching the code in the first place.

As in the previous section, the alternative to a copy-and-paste approach is a modularized approach:

Start by factoring out code to be shared by both products into libraries.
Use those libraries (rather than a second copy of the code base) as the foundation for the development of the new product.
If an additional third, fourth, or fifth version of the product is envisaged down the line, this approach is far stronger, because the ready-made code libraries dramatically shorten the development life cycle for any additional products after the second.^[5]

Repetitive tasks or variations of a task

One of the most harmful forms of copy-and-paste programming occurs in code that performs a repetitive task, or variations of the same basic task depending on some variable. Each instance is copied from above and pasted in again, with minor modifications. Harmful effects include:

The copy-and-paste approach often leads to large methods (a code smell).
Each instance creates a code duplicate, with all the problems discussed in prior sections, but with a much greater scope. Scores of duplications are common; hundreds are possible. Bug fixes, in particular, become very difficult and costly in such code.^[6]
Such code also suffers from significant readability issues, due to the difficulty of discerning exactly what differs between each repetition. This has a direct impact on the risks and costs of revising the code.

The procedural programming model strongly discourages the copy-and-paste approach to repetitive tasks. Under a procedural model, a preferred approach to repetitive tasks is to create a function or subroutine that performs a single pass through the task; this subroutine is then called by the parent routine, either repetitively or, better yet, with some form of looping structure. Such code is termed "well decomposed", and is recommended as being easier to read and more readily extensible.^[7]

The general rule of thumb applicable to this case is "don't repeat yourself".
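A minimal C++ sketch of this decomposition follows. The Field type and the functions validate_pasted, is_missing, and validate are hypothetical names chosen for illustration. The first function repeats one check per field in the pasted style; the remaining code moves the single pass into a subroutine and calls it from a loop.

```cpp
#include <string>
#include <vector>

// Copy-and-paste style: the same check is pasted once per field, so adding
// a field or changing the message means editing every copy.
std::vector<std::string> validate_pasted(const std::string& name,
                                         const std::string& email,
                                         const std::string& city) {
    std::vector<std::string> errors;
    if (name.empty()) errors.push_back("name is required");
    if (email.empty()) errors.push_back("email is required");
    if (city.empty()) errors.push_back("city is required");
    return errors;
}

// Decomposed style: a labelled value (hypothetical type).
struct Field {
    std::string label;
    std::string value;
};

// Single pass through the task: check one field.
bool is_missing(const Field& field) {
    return field.value.empty();
}

// Parent routine: call the subroutine from a loop over all fields.
std::vector<std::string> validate(const std::vector<Field>& fields) {
    std::vector<std::string> errors;
    for (const Field& field : fields) {
        if (is_missing(field)) errors.push_back(field.label + " is required");
    }
    return errors;
}
```

In the decomposed version, supporting another field means adding one entry to the list passed to validate, and a change to the message wording is made in exactly one place.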

Deliberate design choice

Copy-and-paste programming is occasionally accepted as a valid programming technique. This is most commonly seen in boilerplate, such as class declarations or importing standard libraries, or in using an existing code template (with empty contents or stub functions) as a framework to fill in.

The use of programming idioms and design patterns is similar to copy-and-paste programming, as they also use formulaic code. In some cases, this can be expressed as a snippet, which can then be pasted in when such code is necessary, though it is often simply recalled from the programmer's mind. In other cases idioms cannot be reduced to a code template. In most cases, however, even if an idiom can be reduced to code, it will be either long enough that it is abstracted into a function or short enough that it can be keyed in directly.

The Subtext programming language is a research project aimed at "decriminalizing" cut and paste. Using this language, cut and paste is the primary interaction model, and hence not considered an anti-pattern.

Example

A simple example is a for loop, which might be expressed as for (int i = 0; i != n; ++i) {}.

Sample code using such a for loop might be:

void foo(int n) {
    for (int i = 0; i != n; ++i) {
        /* body */
    }
}

The looping code could then have been generated by the following snippet (specifying types and variable names):

for ($type $loop_var = 0; $loop_var != $stop; ++$loop_var) { /* body */ }
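How an editor might turn such a snippet into code can be sketched in C++ as follows. The function expand_snippet and its placeholder rule (a dollar sign followed by letters, digits, or underscores) are assumptions made for this illustration and do not describe any particular editor's snippet engine.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Replace each $name placeholder in `snippet` with its entry in `values`.
// Placeholders without an entry are copied through unchanged.
std::string expand_snippet(const std::string& snippet,
                           const std::map<std::string, std::string>& values) {
    std::string out;
    std::size_t i = 0;
    while (i < snippet.size()) {
        if (snippet[i] != '$') {
            out += snippet[i++];  // ordinary character: copy as-is
            continue;
        }
        // Scan the placeholder name that follows the '$'.
        std::size_t j = i + 1;
        while (j < snippet.size() &&
               (std::isalnum(static_cast<unsigned char>(snippet[j])) ||
                snippet[j] == '_')) {
            ++j;
        }
        const std::string name = snippet.substr(i + 1, j - i - 1);
        const auto found = values.find(name);
        out += (found != values.end()) ? found->second
                                       : snippet.substr(i, j - i);
        i = j;
    }
    return out;
}
```

With the values type = int, loop_var = i, and stop = n, the template for ($type $loop_var = 0; $loop_var != $stop; ++$loop_var) { /* body */ } expands to for (int i = 0; i != n; ++i) { /* body */ }.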


References

[1] Yarmish, Gavriel; Kopec, Danny (2007). "Revisiting Novice Programmers Errors". ACM SIGCSE Bulletin. 39 (2). acm.org: 131–137. doi:10.1145/1272848.1272896. S2CID 8854303. Retrieved 2008-06-04.
[2] "Building ASP.NET Web Pages Dynamically in the Code-Behind". codeproject.com. 25 April 2008. Retrieved 2008-06-04.
[3] Spinellis, Diomidis. "The Bad Code Spotter's Guide". InformIT.com. Retrieved 2008-06-06.
[4] Lewallen, Raymond. "4 major principles of Object-Oriented Programming". codebetter.com. Archived from the original on 2010-11-25. Retrieved 2008-06-04.
[5] Eriksen, Lisa. "Code Reuse In Object-Oriented Software Development" (PDF). Norwegian University of Science and Technology, Department of Computer and Information Science. Retrieved 2008-05-29.
[6] Marsh, Ashley. "Coding Standards – The Way to Maintainable Code". MAAN Softwares INC. Retrieved 2018-04-10.
[7] "Stanford University, CS 106X ("Programming Abstractions") Course Handout: "Decomposition"" (PDF). Stanford University. Archived from the original (PDF) on May 16, 2008. Retrieved 2008-06-04.

/*
 * The pattern in Listing 8-2
 */

import java.util.Scanner;
import java.io.File;
import java.io.IOException;

class SomeClassName {

    public static void main(String args[]) throws IOException {
        Scanner scannerName = new Scanner(new File("SomeFileName"));

        // Some code goes here

        scannerName.nextInt();
        scannerName.nextDouble();
        scannerName.next();
        scannerName.nextLine();

        // Some code goes here

        scannerName.close();
    }
}

External links

c2:CopyAndPasteProgramming
Andrey Karpov. Consequences of using the Copy-Paste method in C++ programming and how to deal with it
Andrey Karpov. The Last Line Effect
PMD's Copy/Paste Detector, CPD.

