SAS language

SAS
Paradigm Multi-paradigm: Data-driven, Procedural programming
Designed by Anthony James Barr
Developer SAS Institute
First appeared 1976
OS Windows and macOS
License Proprietary commercial software
Filename extensions .sas
Website sas.com/en_us/software/base-sas.html

The SAS language is a fourth-generation computer programming language used for statistical analysis, created by Anthony James Barr at North Carolina State University. [1] [2] Its primary applications include data mining and machine learning. The SAS language runs under compilers such as the SAS System that can be used on Microsoft Windows, Linux, UNIX and mainframe computers. [3]

History

SAS was developed in the 1960s by Anthony James Barr, who built its fundamental structure, [4] and SAS Institute CEO James Goodnight, who developed a number of features including analysis procedures. [5] The language is currently developed and sponsored by the SAS Institute, of which Goodnight is founder and CEO. [6]

Language

Base SAS is a fourth-generation procedural programming language designed for the statistical analysis of data. [7] It is Turing-complete and domain specific, with many of the attributes of a command language. As an interpreted language, it is generally parsed, compiled, and executed step by step. [8] The SAS system was originally a single instruction, single data (SISD) engine, but single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD) functionality was later added. [9] Most base SAS code can be ported between versions, but some functions and parameters are specific to certain operating systems and interfaces. [10]

All SAS programs are written within the SAS language, although some packages use menu-driven graphical user interfaces on the front-end. [11] Various SAS editors use color coding to identify components like step boundaries, keywords and constants. [12] SAS can read in data from common spreadsheets and databases and output the results of statistical analyses as tables and graphs, and as RTF, HTML and PDF documents. [13]
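As a minimal sketch of the document output described above, the following routes one analysis to HTML, PDF and RTF files through the Output Delivery System (ODS). The output file names are hypothetical, and the example assumes the sample data set sashelp.class that ships with SAS is available.

```sas
/* Open three ODS destinations; the file names are hypothetical */
ods html file="class_report.html";
ods pdf  file="class_report.pdf";
ods rtf  file="class_report.rtf";

/* Any procedure output produced here is written to every open destination */
proc means data=sashelp.class n mean std;
    var height weight;
run;

/* Close the destinations so the files are finalized */
ods rtf close;
ods pdf close;
ods html close;
```

Relative file names are resolved against the current working directory of the SAS session, so a full path may be needed in practice.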

Syntax

The language consists of two main types of blocks: DATA blocks and PROC blocks. [14] DATA blocks can be used to read and manipulate input data, and create data sets. PROC blocks are used to perform analyses and operations on these data sets, sort data, and output results in the form of descriptive statistics, tables, results, charts and plots. [15] [16] PROC SQL can be used to work with SQL syntax within SAS. [17]
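The division of labor between the two block types can be sketched as follows. This is an illustrative example only; the data set name work.scores and its variables are hypothetical.

```sas
/* DATA block: read inline records and create a data set */
data work.scores;
    input name $ score;
    datalines;
Ann 82
Bob 75
Cara 91
;
run;

/* PROC block: sort the data set by descending score */
proc sort data=work.scores;
    by descending score;
run;

/* PROC block: descriptive statistics for the numeric variable */
proc means data=work.scores n mean min max;
    var score;
run;

/* PROC SQL: query the same data set using SQL syntax */
proc sql;
    select name, score
    from work.scores
    where score > 80;
quit;
```

Each block ends at a run; or quit; statement, which is what the step boundaries highlighted by SAS editors correspond to.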

Users can input both numeric and character data into base SAS. SAS statements must begin with a reserved keyword and end with a semicolon (;), [18] but the language is otherwise flexible in terms of formatting and most statements are case insensitive. [19] SAS statements can continue across multiple lines and do not require indenting, although indents can improve readability. [18] Comments are delimited by /* and */; [20] a statement that begins with an asterisk (*) and ends with a semicolon is also treated as a comment.
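The formatting rules above can be illustrated with a short sketch. The data set name work.demo and its variables are hypothetical; the $ after name marks it as a character variable, while id is numeric.

```sas
/* Keywords and data set names may be written in any case */
DATA work.demo;
    INPUT id name $;
    DATALINES;
1 Ann
2 Bob
;
RUN;

/* One statement continued across several lines, without indentation */
proc print
data = WORK.DEMO
noobs;
run;
```

Character values themselves remain case sensitive, so 'Ann' and 'ANN' are distinct data values even though PROC and proc are interchangeable.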

A standard SAS program typically entails the definition of data, the creation of a data set, and the performance of procedures such as analysis on that data set. [18] SAS scripts have the .sas extension.

A simple example of SAS code is the following:

    * COMMENT;
    Data TEMP;
        input X Y Z;
        datalines;
    1 2 3
    5 6 7
    ;
    run;
    PROC PRINT DATA = TEMP;
    RUN;

SAS macro language

The SAS macro language is made available within base SAS software to reduce the amount of code, and to create code generators for building more versatile and flexible programs. [21] The macro language can be used for functionalities as simple as symbolic substitution and as complex as dynamic programming. [8] SAS macro is considered to be a rich language, [22] although its overall syntax is very similar to that of base SAS. The names of macro variables in SAS are usually preceded by &, while macro program statements are usually preceded by %. [8]
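A minimal sketch of symbolic substitution and a small code-generating macro is shown below. The macro name show_rows and the macro variable names are hypothetical, and the example assumes the sample data set sashelp.class that ships with SAS.

```sas
/* Macro variables: created with %let, referenced with a leading & */
%let dsname = sashelp.class;
%let nobs = 5;

/* Macro program: %macro and %mend delimit a reusable code generator */
%macro show_rows(data=, n=);
    title "First &n rows of &data";
    proc print data=&data(obs=&n);
    run;
    title;
%mend show_rows;

/* Invocation: the macro processor substitutes the values, then SAS runs
   the generated PROC PRINT step */
%show_rows(data=&dsname, n=&nobs)
```

Macro variable references resolve inside double-quoted strings but not inside single-quoted ones, which is why the title statement uses double quotes.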

Software

SAS Institute develops a number of tools and software suites, also called SAS, which are used for creating programs in the language. These suites include JMP, SAS Viya, SAS Enterprise Guide and SAS Enterprise Miner. [3] [9] [17] In 2002, World Programming also developed software that allows the execution of most SAS scripts. [17]

Uses

The SAS language is used as a standard in many industries, [17] and was ranked #22 on the TIOBE index in February 2024. [23] It is especially widely used for machine learning, [24] data mining, and data warehousing in the finance, insurance, manufacturing, health care and pharmaceutical industries. [14] It has a high level of documentation and community support, [20] which has contributed to its uptake. [24]

Machine learning

SAS is used for preparing input data, and building and optimizing machine learning algorithms. [25] Various models, such as artificial neural networks (ANN), convolutional neural networks and deep learning models, are developed and trained in SAS. [26] These are applied to areas such as computer vision and fraud detection. [27] SAS has also been noted for its applications in the area of decision intelligence. [28]

Data mining and warehousing

While SAS was originally developed for data analysis, it became an important language for data storage. [5] SAS is one of the primary languages used for data mining in business intelligence and statistics. [29] According to Gartner's Magic Quadrant and Forrester Research, the SAS Institute is one of the largest vendors of data mining software. [24]

Notes

  1. SAS History, SAS Institute. Archived from the original on 2013-10-23. Retrieved 2014-04-04.
  2. Barr & Goodnight, et al. 1976: "The SAS Staff". Attribution of contributions to SAS 72 and SAS 76.
  3. Chambers, Michele; Dinsmore, Thomas W. (2015). Advanced Analytics Methodologies: Driving Business Value with Analytics. Pearson Education. p. 203. ISBN 978-0-13-349860-8.
  4. Agresti, Alan; Meng, Xiao-Li (2012-11-02). Strength in Numbers: The Rising of Academic Statistics Departments in the U. S. Springer Science & Business Media. p. 177. ISBN 978-1-4614-3649-2.
  5. Wahi, Monika (2020-10-16). Mastering SAS Programming for Data Warehousing: An advanced programming guide to designing and managing Data Warehouses using SAS. Packt Publishing Ltd. pp. 8–10. ISBN 978-1-78953-118-3.
  6. "Pampering The Customers, Pampering The Employees". Forbes. Retrieved 2024-04-29.
  7. "SAS Help Center". documentation.sas.com. Retrieved 2024-04-29.
  8. Carpenter, Art (2016-08-25). Carpenter's Complete Guide to the SAS Macro Language, Third Edition. SAS Institute. pp. 1–11. ISBN 978-1-62960-237-0.
  9. Bequet, Henry (2018-07-20). Deep Learning for Numerical Applications with SAS. SAS Institute. pp. 4–5. ISBN 978-1-63526-677-1.
  10. Hughes, Troy Martin (2016-08-24). SAS Data Analytic Development: Dimensions of Software Quality. John Wiley & Sons. p. xiii. ISBN 978-1-119-25570-3.
  11. Delwiche, Lora D.; Slaughter, Susan J. (2019-10-11). The Little SAS Book: A Primer, Sixth Edition. SAS Institute. ISBN 978-1-64295-343-5.
  12. Elliott, Alan C.; Woodward, Wayne A. (2015-08-18). SAS Essentials: Mastering SAS for Data Analytics. John Wiley & Sons. p. 12. ISBN 978-1-119-04218-1.
  13. Ohri, Ajay (2019-08-05). SAS for R Users: A Book for Data Scientists. John Wiley & Sons. pp. 151–157. ISBN 978-1-119-25642-7.
  14. Bass, N. Jyoti; Solutions, K. Madhavi Lata & Kogent (2007). Base Sas Programming Black Book, 2007 Ed. Dreamtech Press. pp. 3–8. ISBN 978-81-7722-769-7.
  15. Chambers, Michele; Dinsmore, Thomas W. (2015). Advanced Analytics Methodologies: Driving Business Value with Analytics. Pearson Education. p. 203. ISBN 978-0-13-349860-8.
  16. Ohri, Ajay (2019-08-05). SAS for R Users: A Book for Data Scientists. John Wiley & Sons. pp. 51–58. ISBN 978-1-119-25642-7.
  17. Anderson, Raymond A. (2022). Credit Intelligence and Modelling: Many Paths Through the Forest of Credit Rating and Scoring. Oxford University Press. p. 565. ISBN 978-0-19-284419-4.
  18. Bass, N. Jyoti; Solutions, K. Madhavi Lata & Kogent (2007). Base Sas Programming Black Book, 2007 Ed. Dreamtech Press. pp. 43–44. ISBN 978-81-7722-769-7.
  19. Delwiche, Lora D.; Slaughter, Susan J. (2019-10-11). The Little SAS Book: A Primer, Sixth Edition. SAS Institute. ISBN 978-1-64295-343-5.
  20. Ohri, Ajay (2019-08-05). SAS for R Users: A Book for Data Scientists. John Wiley & Sons. pp. 4–6. ISBN 978-1-119-25642-7.
  21. "Introduction to SAS Macro Language". stats.oarc.ucla.edu. Retrieved 2024-04-29.
  22. Stalla, Alessio (2022-04-20). "Challenges in Parsing Legacy Languages: The Case of SAS Macros". Strumenta. Retrieved 2024-04-29.
  23. "TIOBE Index". TIOBE. Archived from the original on 2024-02-23. Retrieved 2024-04-30.
  24. Dean, Jared (2014-05-07). Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners. John Wiley & Sons. pp. 50–51. ISBN 978-1-118-92070-1.
  25. Kolosova, Tanya; Berestizhevsky, Samuel (2020-09-21). Supervised Machine Learning: Optimization Framework and Applications with SAS and R. CRC Press. pp. 7–8. ISBN 978-1-000-17681-0.
  26. Bequet, Henry (2018-07-20). Deep Learning for Numerical Applications with SAS. SAS Institute. pp. 8–14. ISBN 978-1-63526-677-1.
  27. Blanchard, Robert (2020-06-12). Deep Learning for Computer Vision with SAS: An Introduction. SAS Institute. p. 26. ISBN 978-1-64295-917-8.
  28. "Forrester Reprint". reprints2.forrester.com. Retrieved 2024-04-30.
  29. Shmueli, Galit; Bruce, Peter C.; Gedeck, Peter; Patel, Nitin R. (2019-10-14). Data Mining for Business Analytics: Concepts, Techniques and Applications in Python. John Wiley & Sons. ISBN 978-1-119-54985-7.
