S (programming language)

S
Paradigm: Multi-paradigm (imperative, object-oriented)
Developer: Rick Becker, Allan Wilks, John Chambers, William S. Cleveland, Trevor Hastie
First appeared: 1976
Typing discipline: dynamic, strong
License: depends on implementation
Website: ect.bell-labs.com/sl/S/ at the Wayback Machine (archived 2018-10-14)
Major implementations: S-PLUS
Influenced by: C, APL, PPL, Fortran
Influenced: R

S [1] is a statistical programming language developed primarily by John Chambers and (in earlier versions) Rick Becker, Trevor Hastie, William Cleveland and Allan Wilks of Bell Laboratories. The aim of the language, as expressed by John Chambers, is "to turn ideas into software, quickly and faithfully". [1] It is widely used by academic researchers. [2]


A major implementation of S is S-PLUS, a commercial product that was formerly sold by TIBCO Software.

The modern R, a part of the GNU free software project, was based on S [3] and can run many S programs, although it is not fully backwards compatible. [4]

History

"Old S"

S is one of several statistical computing languages that were designed at Bell Laboratories; it first took form between 1975 and 1976. Up to that time, much statistical computing was done by directly calling Fortran subroutines; S was designed to offer an alternative, more interactive approach, motivated in part by the exploratory data analysis advocated by John Tukey. [5] Early design decisions that hold even today include interactive graphics devices (printers and character terminals at the time) and easily accessible documentation for the functions. [citation needed]

Development of the project was led by John Chambers and Trevor Hastie, and included developers Richard Becker, Allan Wilks, and William Cleveland, [6] all of whom were then employees of AT&T. [7] Of the developers who contributed to S, Chambers is generally agreed to be the most significant contributor. [3] Chambers received the Software System Award from the Association for Computing Machinery for his work on S. [8]

The first working version of S was built in 1976 and ran on the GCOS operating system. At this time, S was unnamed, and suggestions included ISCS (Interactive SCS), SCS (Statistical Computing System), and SAS (Statistical Analysis System), which was already taken (see SAS System). The name 'S' (used with single quotation marks until 1979) was chosen, as it was a common letter in the suggestions and consistent with other programming languages designed at the same institution at the time (namely the C programming language). [5] It stands for the word "statistics". [9]

When UNIX/32V was ported to the (then new) 32-bit DEC VAX, computing on the Unix platform became feasible for S. In late 1979, S2 was ported from GCOS to UNIX, which would become the new primary platform. [10]

In 1980 the first version of S was distributed outside Bell Laboratories, and in 1981 source versions were made available. [5] S was distributed freely in academic circles and became popular among academic statisticians. [11] In 1984 and 1985, two books were published by the research team at Bell Laboratories: S: An Interactive Environment for Data Analysis and Graphics [12] (1984 Brown Book) and Extending the S System [13] (1985). Also in 1984, the source code for S became licensed through AT&T Software Sales for educational and commercial purposes.

"New S"

The first version of S-PLUS was released by Statistical Sciences, Inc. in 1988. S-PLUS was later sold to TIBCO Software. [9] By this time, many changes had been made to S and to the syntax of the language with the release of S3. [10] The New S Language [14] (1988 Blue Book) was published to introduce the new features, such as the transition from macros to functions and the ability to pass functions to other functions (such as apply). Many other changes to the S language extended the concept of "objects" and made the syntax more consistent (and stricter). However, many users found the transition to New S difficult, since their macros needed to be rewritten. Other changes to S also took hold, such as the use of X11 and PostScript graphics devices, the rewriting of many internal functions from Fortran to C, and the use of double-precision (only) arithmetic. The New S language is very similar to that used in modern versions of S-PLUS and R.
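The idea of passing a function to another function, as apply does, can be illustrated with a small sketch. The example below is written in Python rather than S, purely as an analogy; the helper name apply_margin and the sample data are hypothetical, and the margin convention (1 for rows, 2 for columns) is assumed to mirror that of apply.

```python
# Illustrative Python analogy of S's apply(); this is not S syntax.
# apply_margin is a hypothetical helper name chosen for this sketch.
def apply_margin(matrix, margin, fn):
    """Call fn on each row (margin=1) or each column (margin=2) of a list-of-lists matrix."""
    if margin == 1:
        return [fn(row) for row in matrix]
    if margin == 2:
        # zip(*matrix) regroups the rows into columns
        return [fn(list(col)) for col in zip(*matrix)]
    raise ValueError("margin must be 1 (rows) or 2 (columns)")


scores = [[1, 2, 3],
          [4, 5, 6]]

# The functions sum and max are passed as ordinary arguments.
row_totals = apply_margin(scores, 1, sum)
col_maxima = apply_margin(scores, 2, max)
```

Because the operation is an argument, the same helper serves for totals, maxima, or any user-defined summary, which is the flexibility that functions offered over the earlier macros.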

The graphical user interface of S was also updated with interactive graphical features after integration with Axum. [9]

In 1991, Statistical Models in S [15] (1991 White Book) was published, which introduced formula notation [16] (which uses the ~ operator), data frame objects, and modifications to the use of object methods and classes.

S4

The latest version of the S standard is S4, released in 1998. [17] It provides advanced object-oriented features. S4 classes differ markedly from S3 classes; S4 formally defines the representation and inheritance for each class, and has multiple dispatch: the generic function can be dispatched to a method based on the class of any number of arguments, not just one. [18]
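Multiple dispatch can be illustrated with a rough sketch. The example below is written in Python rather than S, as an analogy only: the Generic class, the shape classes and describe_pair are all hypothetical names, and the lookup order is a simplification that does not reproduce S4's actual method-selection rules.

```python
# Illustrative Python analogy of S4-style multiple dispatch; this is not S syntax.
# Generic, Shape, Circle, Square and describe_pair are hypothetical names.
from itertools import product


class Generic:
    """A generic function that selects a method from the classes of all arguments."""

    def __init__(self, name):
        self.name = name
        self.methods = {}  # maps a tuple of classes to an implementation

    def set_method(self, *signature):
        """Register an implementation for the given tuple of argument classes."""
        def register(fn):
            self.methods[signature] = fn
            return fn
        return register

    def __call__(self, *args):
        # Try each argument's class and then its ancestors, in every position.
        # This ordering is a simplification, not S4's real selection algorithm.
        for signature in product(*(type(a).__mro__ for a in args)):
            if signature in self.methods:
                return self.methods[signature](*args)
        raise TypeError(f"no method for {self.name} with these argument classes")


class Shape:
    pass


class Circle(Shape):
    pass


class Square(Shape):
    pass


describe_pair = Generic("describe_pair")


@describe_pair.set_method(Shape, Shape)
def _(a, b):
    return "two shapes"


@describe_pair.set_method(Circle, Square)
def _(a, b):
    return "a circle and a square"
```

Calling describe_pair(Circle(), Square()) selects the method registered for that exact pair of classes, while describe_pair(Square(), Circle()) falls back to the method for two Shape arguments through inheritance. A single-dispatch system such as S3 would consider only the class of the first argument.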

References

  1. Chambers, John M. (1998). Programming with Data: A Guide to the S Language. Springer. ISBN 978-0-387-98503-9.
  2. "S-Plus: An Introduction". www.stat.rice.edu. Retrieved 2024-02-28.
  3. Ashwani, Kumar; Satyanarayana, Reddy, Seelam Sai (2020-09-25). Advancements in Security and Privacy Initiatives for Multimedia Images. IGI Global. p. 179. ISBN 978-1-7998-2797-9.
  4. Nicholls, Andy; Pugh, Richard; Gott, Aimee (2015-12-16). R in 24 Hours, Sams Teach Yourself. Sams Publishing. ISBN 978-0-13-428880-2.
  5. Becker, Richard A. A Brief History of S. Murray Hill, New Jersey: AT&T Bell Laboratories. Archived from the original (PS) on 2015-07-23. Retrieved 2015-07-23.
  6. Berry, Kenneth J.; Johnston, Janis E.; Mielke, Paul W. Jr. (2014-04-11). A Chronicle of Permutation Statistical Methods: 1920–2000, and Beyond. Springer Science & Business Media. pp. 207–208. ISBN 978-3-319-02744-9.
  7. Encyclopedia of Statistical Sciences, Volume 12. John Wiley & Sons. 2005-12-16. p. 8088. ISBN 978-0-471-74406-1.
  8. Charpentier, Arthur (2014-08-26). Computational Actuarial Science with R. CRC Press. p. 4. ISBN 978-1-4987-5982-3.
  9. Nicholls, Andy; Pugh, Richard; Gott, Aimee (2015-12-16). R in 24 Hours, Sams Teach Yourself. Sams Publishing. ISBN 978-0-13-428880-2.
  10. Chambers, John (2008-06-14). Software for Data Analysis: Programming with R. Springer. pp. 477–478. ISBN 978-0-387-75936-4.
  11. Hardin, James W.; Hilbe, Joseph M. (2002-07-30). Generalized Estimating Equations. CRC Press. p. 12. ISBN 978-1-4200-3528-5.
  12. Becker, R.A.; Chambers, J.M. (1984). S: An Interactive Environment for Data Analysis and Graphics. Pacific Grove, CA, USA: Wadsworth & Brooks/Cole. ISBN 0-534-03313-X.
  13. Becker, R.A.; Chambers, J.M. (1985). Extending the S System. Pacific Grove, CA, USA: Wadsworth & Brooks/Cole. ISBN 0-534-05016-6.
  14. Becker, R.A.; Chambers, J.M.; Wilks, A.R. (1988). The New S Language: A Programming Environment for Data Analysis and Graphics. Pacific Grove, CA, USA: Wadsworth & Brooks/Cole. ISBN 0-534-09192-X.
  15. Chambers, J.M.; Hastie, T.J. (1991). Statistical Models in S. Pacific Grove, CA, USA: Wadsworth & Brooks/Cole. p. 624. ISBN 0-412-05291-1.
  16. Wilkinson, G.N.; Rogers, C.E. (1973). "Symbolic description of factorial models for analysis of variance". Applied Statistics. 22 (3): 392–399. doi:10.2307/2346786. JSTOR 2346786.
  17. Chambers, John (January 1, 2001). "The S System". Bell Labs. Archived from the original on 2018-10-14.
  18. Wickham, Hadley (2019). "S4". Advanced R. adv-r.had.co.nz. ISBN 9781466586963. Retrieved 2020-02-18.