SPSS

Developer(s): Norman H. Nie, Dale H. Bent, C. Hadlai Hull
Initial release: 1968
Stable release: 29 / September 13, 2022 [1]
Operating system: Windows (x86-64), macOS (x86-64), Linux (x86-64, ppc64le, IBM Z) [2]
Platform: Java
Size: ~1.2 GB
Type: Statistical analysis, numerical analysis
License: Subscription or enterprise licensing [3]
Website: www.ibm.com/products/spss-statistics

SPSS Statistics is a statistical software suite developed by IBM for data management, advanced analytics, multivariate analysis, business intelligence, and criminal investigation. Long produced by SPSS Inc., it was acquired by IBM in 2009. Versions of the software released since 2015 have the brand name IBM SPSS Statistics.

The software name originally stood for Statistical Package for the Social Sciences (SPSS), [4] reflecting the original market, then later changed to Statistical Product and Service Solutions. [5] [6]

Overview

SPSS is a widely used program for statistical analysis in social science. [7] It is also used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations, data miners, [8] and others. The original SPSS manual (Nie, Bent & Hull, 1970) [9] has been described as one of "sociology's most influential books" for allowing ordinary researchers to do their own statistical analysis. [10] In addition to statistical analysis, data management (case selection, file reshaping and creating derived data) and data documentation (a metadata dictionary is stored in the datafile) are features of the base software.

The many features of SPSS Statistics are accessible via pull-down menus or can be programmed with a proprietary 4GL command syntax language. Command syntax programming has the benefits of reproducible output, simplifying repetitive tasks, and handling complex data manipulations and analyses. Additionally, some complex applications can only be programmed in syntax and are not accessible through the menu structure. The pull-down menu interface also generates command syntax: this can be displayed in the output, although the default settings have to be changed to make the syntax visible to the user. The generated syntax can also be pasted into a syntax file using the "paste" button present in each menu. Programs can be run interactively or unattended, using the supplied Production Job Facility.

A "macro" language can be used to write command language subroutines. A Python programmability extension can access the information in the data dictionary and data and dynamically build command syntax programs. This extension, introduced in SPSS 14, replaced the less functional SAX Basic "scripts" for most purposes, although SAX Basic remains available. In addition, the Python extension allows SPSS to run any of the statistics in the free software package R. From version 14 onwards, SPSS can be driven externally by a Python or a VB.NET program using supplied "plug-ins". (From version 20 onwards, these two scripting facilities, as well as many scripts, are included on the installation media and are normally installed by default.)
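The idea of building command syntax from the data dictionary can be sketched in plain Python. This is an illustration only: the dictionary contents and the `build_syntax` helper are hypothetical, and the rule used here (frequencies for categorical variables, descriptives for scale variables) is an assumption chosen for the example, not something SPSS does automatically.

```python
# Hypothetical data dictionary: variable name -> measurement level.
# Inside SPSS this information would come from the active dataset's dictionary.
DICTIONARY = {
    "age": "scale",
    "sex": "nominal",
    "income": "scale",
    "region": "nominal",
}


def build_syntax(dictionary):
    """Return SPSS command syntax chosen by each variable's measurement level."""
    categorical = [name for name, level in dictionary.items() if level != "scale"]
    scale = [name for name, level in dictionary.items() if level == "scale"]
    commands = []
    if categorical:
        commands.append("FREQUENCIES VARIABLES=" + " ".join(categorical) + ".")
    if scale:
        commands.append("DESCRIPTIVES VARIABLES=" + " ".join(scale) + ".")
    return "\n".join(commands)


syntax = build_syntax(DICTIONARY)
print(syntax)
```

The script only produces a text string, so it runs without SPSS installed. Within an SPSS session, a string like this would typically be handed to the Python plug-in for execution rather than printed.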

SPSS Statistics places constraints on internal file structure, data types, data processing, and matching files, which together considerably simplify programming. SPSS datasets have a two-dimensional table structure, where the rows typically represent cases (such as individuals or households) and the columns represent measurements (such as age, sex, or household income). Only two data types are defined: numeric and text (or "string"). All data processing occurs sequentially case-by-case through the file (dataset). Files can be matched one-to-one and one-to-many, but not many-to-many. In addition to that cases-by-variables structure and processing, there is a separate Matrix session where one can process data as matrices using matrix and linear algebra operations.
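A one-to-many match can be pictured as a table lookup performed while the case file is read sequentially. The sketch below is plain Python rather than SPSS syntax; the household and person records, the column names, and the `match_table` helper are all hypothetical. Rejecting duplicate keys in the lookup table is this example's way of reflecting the absence of many-to-many matching.

```python
# Hypothetical household table: one row per household, keyed by household id.
households = [
    {"hh_id": 1, "hh_income": 52000},
    {"hh_id": 2, "hh_income": 31000},
]

# Hypothetical case file: one row per person, several persons per household.
persons = [
    {"hh_id": 1, "age": 34},
    {"hh_id": 1, "age": 36},
    {"hh_id": 2, "age": 58},
    {"hh_id": 3, "age": 41},  # no matching household row
]


def match_table(cases, table, key):
    """Attach table columns to each case in one sequential pass over the cases."""
    lookup = {}
    for row in table:
        if row[key] in lookup:
            # A repeated key would make the match many-to-many.
            raise ValueError(f"duplicate key {row[key]!r} in lookup table")
        lookup[row[key]] = row
    # Columns contributed by the table, so unmatched cases get None (missing).
    extra = list(dict.fromkeys(name for row in table for name in row if name != key))
    merged = []
    for case in cases:  # processed case by case, in file order
        hit = lookup.get(case[key], {})
        merged.append({**case, **{name: hit.get(name) for name in extra}})
    return merged


merged = match_table(persons, households, "hh_id")
for row in merged:
    print(row)
```

Each person row keeps its own values and gains the household's columns; a person whose household is absent from the table receives a missing value instead of being dropped.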

The graphical user interface has two views which can be toggled. The 'Data View' shows a spreadsheet view of the cases (rows) and variables (columns). Unlike spreadsheets, the data cells can only contain numbers or text, and formulas cannot be stored in these cells. The 'Variable View' displays the metadata dictionary, where each row represents a variable and shows the variable name, variable label, value label(s), print width, measurement type, and a variety of other characteristics. Cells in both views can be manually edited, defining the file structure and allowing data entry without using command syntax. This may be sufficient for small datasets. Larger datasets such as statistical surveys are more often created in data entry software, or entered during computer-assisted personal interviewing, by scanning and using optical character recognition and optical mark recognition software, or by direct capture from online questionnaires. These datasets are then read into SPSS.
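The relationship between the two views can be illustrated with a small data structure: the data cells hold only raw numbers or text, while labels and other attributes live in a separate per-variable record. The `Variable` class below is a hypothetical model written for this illustration, not the SPSS file format or API, and it covers only a few of the attributes listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One row of a hypothetical metadata dictionary."""
    name: str
    label: str = ""
    var_type: str = "numeric"   # only "numeric" or "string" are allowed
    measure: str = "scale"
    value_labels: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.var_type not in ("numeric", "string"):
            raise ValueError(f"unsupported type: {self.var_type}")

    def display(self, value):
        """Return the value label if one is defined, else the raw value as text."""
        return self.value_labels.get(value, str(value))


sex = Variable("sex", label="Sex of respondent", measure="nominal",
               value_labels={1: "Male", 2: "Female"})
print(sex.display(2))
print(sex.display(9))
```

Keeping labels out of the data cells means the stored value stays a plain number that can be analysed, while the label is applied only when the value is shown.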

SPSS Statistics can read and write data from ASCII text files (including hierarchical files), other statistics packages, spreadsheets and databases. It can also read and write to external relational database tables via ODBC and SQL.

Statistical output is written to a proprietary file format (*.spv file, supporting pivot tables) for which, in addition to the in-package viewer, a stand-alone reader can be downloaded. The proprietary output can be exported to text or Microsoft Word, PDF, Excel, and other formats. Alternatively, output can be captured as data (using the OMS command), as text, tab-delimited text, PDF, XLS, HTML, XML, SPSS dataset or a variety of graphic image formats (JPEG, PNG, BMP and EMF).

The SPSS logo used prior to the renaming in January 2010.

Several variants of SPSS Statistics exist. SPSS Statistics Gradpacks are highly discounted versions sold only to students. [11] SPSS Statistics Server is a version of the software with a client/server architecture. Add-on packages can enhance the base software with additional features (examples include complex samples, which can adjust for clustered and stratified samples, and custom tables, which can create publication-ready tables). SPSS Statistics is available under either an annual or a monthly subscription license.

Version 25 of SPSS Statistics launched on August 8, 2017. It added new and advanced statistics, such as random effects solution results (GENLINMIXED), robust standard errors (GLM/UNIANOVA), and profile plots with error bars, within the Advanced Statistics and Custom Tables add-ons. Version 25 also introduced capabilities for Bayesian statistics (a method of statistical inference) and publication-ready charts, with new charting capabilities that include new default templates and the ability to share charts with Microsoft Office applications. [12]

Versions and ownership history

SPSS was released in its first version in 1968 as the Statistical Package for the Social Sciences (SPSS) after being developed by Norman H. Nie, Dale H. Bent, and C. Hadlai Hull. Those principals incorporated as SPSS Inc. in 1975. Early versions of SPSS Statistics were written in Fortran and designed for batch processing on mainframes, including for example IBM and ICL versions, originally using punched cards for data and program input. A processing run read a command file of SPSS commands and either a raw input file of fixed-format data with a single record type, or a 'getfile' of data saved by a previous run. To save precious computer time an 'edit' run could be done to check command syntax without analysing the data. From version 10 (SPSS-X) in 1983, data files could contain multiple record types.

Prior to SPSS 16.0, different versions of SPSS were available for Windows, Mac OS X and Unix.

SPSS Statistics version 13.0 for Mac OS X was not compatible with Intel-based Macintosh computers, due to the Rosetta emulation software causing errors in calculations. SPSS Statistics 15.0 for Windows needed a downloadable hotfix to be installed in order to be compatible with Windows Vista.

From version 16.0, the same version runs under Windows, Mac, and Linux. The graphical user interface is written in Java. The Mac OS version is provided as a Universal binary, making it fully compatible with both PowerPC and Intel-based Mac hardware.

SPSS Inc. announced on July 28, 2009, that it was being acquired by IBM for US$1.2 billion. [19] Because of a dispute about ownership of the name "SPSS", between 2009 and 2010, the product was referred to as PASW (Predictive Analytics SoftWare). [20] As of January 2010, the company became "SPSS: An IBM Company". The complete transfer of business to IBM was finished by October 1, 2010, at which point SPSS: An IBM Company ceased to exist. IBM SPSS is now fully integrated into the IBM Corporation, and is one of the brands under IBM Software Group's Business Analytics Portfolio, together with IBM Algorithmics, IBM Cognos and IBM OpenPages.

Companion products in the "IBM SPSS" family are used for data mining and text analytics (IBM SPSS Modeler), realtime credit scoring services (IBM SPSS Collaboration and Deployment Services), and structural equation modeling (IBM SPSS Amos).

SPSS Data Collection and SPSS Dimensions were sold in 2015 to UNICOM Systems, Inc., a division of UNICOM Global, and merged into the integrated software suite UNICOM Intelligence (survey design, survey deployment, data collection, data management and reporting). [21] [22] [23]

IDA (Interactive Data Analysis)

IDA (Interactive Data Analysis) [24] was a software package that originated at what was formerly the National Opinion Research Center (NORC), at the University of Chicago. Initially offered on the HP-2000, [25] somewhat later, under the ownership of SPSS, it was also available on MUSIC/SP. [26] Regression analysis was one of IDA's strong points. [25]

SCSS - Conversational / Columnar SPSS

SCSS was a software product intended for online use of IBM mainframes. [27]

Although the "C" was for "conversational", it also represented a distinction regarding how the data was stored: it used a column-oriented rather than a row-oriented (internal) database. [citation needed]

This gave good interactive response time for the SPSS Conversational Statistical System (SCSS), whose strong point, as with SPSS, was cross-tabulation. [28]

Project NX

In October 2020, IBM announced the start of an Early Access Program for the "New SPSS Statistics", codenamed Project NX. [29] [30] It contains "many of your favorite SPSS capabilities presented in a new easy to use interface, with integrated guidance, multiple tabs, improved graphs and much more".

In December 2021, IBM opened up the Early Access Program for the next generation of SPSS Statistics to more users and shared more visuals about it. [31] [32]

References

  1. "What's New in SPSS Statistics 29". community.ibm.com. 12 September 2022.
  2. "IBM SPSS Statistics 25.0.0.2 - Detailed System Requirements". www.ibm.com. February 1, 2010.
  3. "Pricing - IBM SPSS Statistics". www.ibm.com.
  4. Quintero, Dino; et al. (30 September 2016). "Workload Optimized Systems: Tuning POWER7 for Analytics". Abstract.
  5. "Statistical Product and Service Solutions (SPSS) Statistics". www.oit.va.gov.
  6. Hejase, A.J., & Hejase, H.J. (2013). Research Methods, A Practical Approach for Business Students (2nd edn.). Philadelphia, PA, USA: Masadir Inc., p. 58
  7. Gunarto, Hary (2019). Parametric & Nonparametric Data Analysis for Social Research: IBM SPSS. LAP Academic Publishing. ISBN 978-6200118721.
  8. "KDnuggets Annual Software Poll: Analytics/Data mining software used?". KDnuggets. May 2013.
  9. Nie, Norman H; Bent, Dale H; Hadlai Hull, C (1970). SPSS: Statistical package for the social sciences. McGraw-Hill. ISBN 9780070465305.
  10. Wellman. 1998. pp. 71–78.
  11. "IBM Products". www.ibm.com. 2020-11-09. Retrieved 2023-04-12.
  12. "What's New in SPSS Statistics 25 & Subscription - SPSS Predictive Analytics". SPSS Predictive Analytics. 18 July 2017. Retrieved 15 December 2017.
  13. "SPSS Statistics - IBM Data Science Community". community.ibm.com. Retrieved 2021-06-30.
  14. "SPSS Statistics - IBM Data Science Community". community.ibm.com. Retrieved 2021-06-30.
  15. "SPSS Statistics 27 - What's New | New Features, Functionality and Packaging overview". community.ibm.com. Retrieved 2021-06-30.
  16. "SPSS Statistics - IBM Data Science Community". community.ibm.com. Retrieved 2021-06-30.
  17. "SPSS Statistics - IBM Data Science Community". community.ibm.com. Retrieved 2021-06-30.
  18. "IBM SPSS Statistics_29.0.x". www.ibm.com. 2022-09-13. Retrieved 2022-10-24.
  19. Larry Dignan (July 28, 2009), "IBM to pay US$1.2 billion for SPSS", zdnet.com
  20. Sachdev, Ameet (September 27, 2009). "IBM's $1.2 billion bid for SPSS Inc. helps resolve trademark dispute". Chicago Tribune.
  21. "IBM SPSS Data Collection Divestiture". 2 February 2016. Retrieved 7 June 2017.
  22. "UNICOM Global Acquires IBM Data Collection Suite from IBM Corp". 31 October 2015. Retrieved 7 June 2017.
  23. "UNICOM Systems TeamBLUE: UNICOM Intelligence". Teamblue.unicomsi.com. Retrieved 19 August 2019.
  24. or Analyzer
  25. Ling, Robert F; Roberts, Harry V (1975). "IDA: An Approach to Interactive Data Analysis in Teaching". The Journal of Business. 48 (3): 411–451. doi:10.1086/295765. JSTOR 2352233.
  26. "IDA Statistical Package Available on MUSIC" (PDF). Benchmarks. Vol. 6, no. 5. North Texas State University. May 1985. Archived from the original (PDF) on 2018-08-08.
  27. Nie, Norman H. (1980). SCSS: A User's Guide to the SPSS Conversational Statistical System. McGraw-Hill. ISBN 978-0070465336.
  28. "SCSS from SPSS, Inc". ComputerWorld. September 26, 1977. p. 28.
  29. "SPSS Statistics - IBM Data Science Community". community.ibm.com. Retrieved 2021-06-30.
  30. "IBM SPSS Statistics Subscription Early Access - Project NX". www.surveygizmo.com. Retrieved 2021-06-30.
  31. "Experience the next generation: IBM SPSS Statistics Early Access Program". community.ibm.com. 2021-12-13. Retrieved 2021-12-15.
  32. "SPSS Statistics Early Access Program - Overview". IBM MediaCenter. Retrieved 2021-12-15.

Further reading