List of statistical packages

Last updated August 26, 2019

Statistical software packages are specialized computer programs for analysis in statistics and econometrics.

A computer program is a collection of instructions that performs a specific task when executed by a computer. Most computer devices require programs to function properly.

Statistics is the discipline that concerns the collection, organization, displaying, analysis, interpretation and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments. See glossary of probability and statistics.

Econometrics is the application of statistical methods to economic data in order to give empirical content to economic relationships. More precisely, it is "the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference". An introductory economics textbook describes econometrics as allowing economists "to sift through mountains of data to extract simple relationships". The first known use of the term "econometrics" was by Polish economist Paweł Ciompa in 1910. Jan Tinbergen is considered by many to be one of the founding fathers of econometrics. Ragnar Frisch is credited with coining the term in the sense in which it is used today.

Open-source

ADaMSoft – a generalized statistical software with data mining algorithms and methods for data management
ADMB – a software suite for non-linear statistical modeling based on C++ which uses automatic differentiation
Bayesian Filtering Library
Chronux – for neurobiological time series data
DAP – free replacement for SAS
Environment for DeveLoping KDD-Applications Supported by Index-Structures (ELKI) – a software framework for developing data mining algorithms in Java
Fityk – nonlinear regression software (GUI and command line)
GNU Octave – programming language very similar to MATLAB with statistical features
gretl – GNU Regression, Econometrics and Time-series Library
intrinsic Noise Analyzer (iNA) – For analyzing intrinsic fluctuations in biochemical systems
JASP – A free software alternative to IBM SPSS Statistics with additional option for Bayesian methods
Just another Gibbs sampler (JAGS) – a program for analyzing Bayesian hierarchical models using Markov chain Monte Carlo developed by Martyn Plummer. It is similar to WinBUGS
JMulTi – For econometric analysis, specialised in univariate and multivariate time series analysis
KNIME – An open-source analytics platform built with Java and Eclipse using modular data pipeline workflows
LIBSVM – C++ support vector machine libraries
mlpack – open-source library for machine learning, exploits C++ language features to provide maximum performance and flexibility while providing a simple and consistent application programming interface (API)
Mondrian – data analysis tool using interactive statistical graphics with a link to R
Neurophysiological Biomarker Toolbox – Matlab toolbox for data-mining of neurophysiological biomarkers
OpenBUGS
OpenEpi – A web-based, open-source, operating-independent series of programs for use in epidemiology and statistics based on JavaScript and HTML
OpenNN – A software library written in the programming language C++ which implements neural networks, a main area of deep learning research
OpenMx – A package for structural equation modeling running in R
Orange – a data mining, machine learning, and bioinformatics software package
Pandas – High-performance data structures and data analysis tools for Python, implemented in Python and Cython (related: statsmodels, scikit-learn)
Perl Data Language – Scientific computing with Perl
Ploticus – software for generating a variety of graphs from raw data
PSPP – A free software alternative to IBM SPSS Statistics
R – free implementation of the S programming language
- Programming with Big Data in R (pbdR) – a series of R packages enhanced by SPMD parallelism for big data analysis
- R Commander – GUI for R
- Rattle GUI – GUI for R
- Revolution Analytics – production-grade software for enterprise big data analytics
- RStudio – GUI and development environment for R
Programming with Big Data in R (pbdR) is a series of R packages and an environment for statistical computing with big data using high-performance statistical computation. pbdR uses the same programming language as R, with S3/S4 classes and methods, which is widely used among statisticians and data miners for developing statistical software. The significant difference between pbdR and R code is that pbdR mainly focuses on distributed memory systems, where data are distributed across several processors and analyzed in batch mode, while communication between processors is based on MPI, which is readily used in large high-performance computing (HPC) systems. The R system mainly focuses on single multi-core machines for data analysis in an interactive mode, such as through a GUI.
In computing, SPMD is a technique employed to achieve parallelism; it is a subcategory of MIMD. Tasks are split up and run simultaneously on multiple processors with different input in order to obtain results faster. SPMD is the most common style of parallel programming. It is also a prerequisite for research concepts such as active messages and distributed shared memory.
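The SPMD idea described above (one program, run at the same time on different slices of the input) can be sketched in a few lines. This is only an illustration under stated assumptions: threads from the Python standard library stand in for the separate processors, whereas a system such as pbdR would use MPI processes on distributed memory. The names `spmd_program` and `run_spmd` are hypothetical and not taken from any of the packages listed here.

```python
from concurrent.futures import ThreadPoolExecutor


def spmd_program(rank, size, data):
    """The single program: every worker runs this same function.

    `rank` identifies the worker and `size` is the number of workers
    (both names are illustrative). Each worker selects its own slice of
    the input and returns a partial result.
    """
    chunk = data[rank::size]  # cyclic split: worker r takes items r, r+size, ...
    return sum(v * v for v in chunk)


def run_spmd(data, size=4):
    """Run `size` copies of the program concurrently, then combine results."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        partials = list(pool.map(lambda r: spmd_program(r, size, data), range(size)))
    # Reduction step: merge the partial sums into one answer.
    return sum(partials)


# Sum of squares of 1..100, computed by four workers on disjoint slices.
total = run_spmd(list(range(1, 101)), size=4)
```

The only thing that differs between workers is the rank, which decides which part of the data each one sees; the final `sum` plays the role that a reduce operation would play in a message-passing setting.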
"Big data" is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many cases (rows) offer greater statistical power, while data with higher complexity may lead to a higher false discovery rate. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy and data source. Big data was originally associated with three key concepts: volume, variety, and velocity. When we handle big data, we may not sample but simply observe and track what happens. Therefore, big data often includes data with sizes that exceed the capacity of traditional usual software to process within an acceptable time and value.
ROOT – an open-source C++ system for data storage, processing and analysis, developed by CERN and used to find the Higgs boson
Salstat – menu-driven statistics software
Scilab – uses GPL-compatible CeCILL license
SciPy – Python library for scientific computing that contains the stats sub-package which is partly based on the venerable |STAT (a.k.a. PipeStat, formerly UNIX|STAT) software
- scikit-learn – extends SciPy with a host of machine learning models (classification, clustering, regression, etc.)
- statsmodels – extends SciPy with statistical models and tests (regression, plotting, example datasets, generalized linear model (GLM), time series analysis, autoregressive–moving-average model (ARMA), vector autoregression (VAR), non-parametric statistics, ANOVA, empirical likelihood)
Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
Statsmodels is a Python package that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator. It complements SciPy's stats module.
In statistics, the generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.
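To make the link function and the variance function concrete, here is a minimal sketch of one particular GLM: a Poisson regression with a log link and a single predictor, fitted by Newton's method using only the standard library. It is an illustration, not the statsmodels implementation; the function name `fit_poisson_glm` and the small data set are made up for the example, and the code assumes the responses have a positive mean and the predictor takes at least two distinct values.

```python
import math


def fit_poisson_glm(x, y, iterations=25, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i for a Poisson response.

    The log link relates the linear predictor to the mean, and the
    Poisson family makes the variance equal to the mean, so each
    observation is weighted by its current fitted value mu_i.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector X^T (y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix X^T W X with W = diag(mu)
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical count data: the response grows roughly exponentially in x.
x_values = [0, 1, 2, 3, 4, 5]
counts = [1, 3, 2, 6, 8, 12]
intercept, slope = fit_poisson_glm(x_values, counts)
```

Swapping the link or the variance function changes the score and weight lines but not the overall structure, which is why a single fitting routine can serve a whole family of models.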
Shogun (toolbox) – open-source, large-scale machine learning toolbox that provides several SVM (Support Vector Machine) implementations (like libSVM, SVMlight) under a common framework and interfaces to Octave, MATLAB, Python, R
Simfit – simulation, curve fitting, statistics, and plotting
SOCR
SOFA Statistics – desktop GUI program focused on ease of use, learn as you go, and beautiful output
Stan (software) – open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. It is somewhat like BUGS, but with a different language for expressing models and a different sampler for sampling from their posteriors
Statistical Lab – R-based and focusing on educational purposes
Torch (machine learning) – a deep learning software library written in the Lua programming language
Weka (machine learning) – a suite of machine learning software written at the University of Waikato

ADaMSoft is a free and open-source statistical software package developed in Java; it can run on any platform supporting Java.

Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal to extract information from a data set and transform the information into a comprehensible structure for further use. Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD. Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating. The difference between data analysis and data mining is that data analysis is used to test models and hypotheses on the dataset, e.g., analyzing the effectiveness of a marketing campaign, regardless of the amount of data; in contrast, data mining uses machine-learning and statistical models to uncover clandestine or hidden patterns in a large volume of data.

ADMB or AD Model Builder is a free and open source software suite for non-linear statistical modeling. It was created by David Fournier and is now being developed by the ADMB Project, a creation of the non-profit ADMB Foundation. The "AD" in AD Model Builder refers to the automatic differentiation capabilities that come from the AUTODIF Library, a C++ language extension also created by David Fournier, which implements reverse mode automatic differentiation. A related software package, ADMB-RE, provides additional support for modeling random effects.
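Reverse mode automatic differentiation, mentioned above, records each operation while a function is evaluated and then walks the record backwards, applying the chain rule to obtain all partial derivatives in one pass. The following is a minimal generic sketch of that idea. It is not the AUTODIF or ADMB API, and the names `Var`, `var_exp` and `backward` are invented for the illustration.

```python
import math


class Var:
    """A value that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers so they can take part in the recorded graph."""
    return x if isinstance(x, Var) else Var(float(x))


def var_exp(x):
    """Exponential of a Var; its local derivative is the result itself."""
    v = math.exp(x.value)
    return Var(v, ((x, v),))


def backward(output):
    """Propagate derivatives from the output back to every input."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # inputs are appended before their consumers

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # consumers first, then their inputs
        for parent, local in node.parents:
            parent.grad += local * node.grad


# f(a, b) = a * b + exp(a), so df/da = b + exp(a) and df/db = a.
a, b = Var(2.0), Var(3.0)
f = a * b + var_exp(a)
backward(f)
```

After `backward(f)`, `a.grad` and `b.grad` hold the two partial derivatives. One backward pass yields the whole gradient regardless of the number of parameters, which is the property that makes reverse mode attractive for fitting models with many parameters.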

Public domain

Freeware

BV4.1
GeoDA
MaxStat Lite – general statistical software
MINUIT
WinBUGS – Bayesian analysis using Markov chain Monte Carlo methods
Winpepi – package of statistical programs for epidemiologists

MaxStat is a statistical analysis software platform specifically designed for students and researchers with little background in statistics. It was developed in Germany by MaxStat Software.

MINUIT, now MINUIT2, is a numerical minimization computer program originally written in the FORTRAN programming language by CERN staff physicist Fred James in the 1970s. The program searches for minima in a user-defined function with respect to one or more parameters using several different methods as specified by the user. The original FORTRAN code was later ported to C++ by the ROOT project; both the FORTRAN and C++ versions are in use today. The program is very widely used in particle physics, and hundreds of published papers cite use of MINUIT. In the early 2000s Fred James started a project to implement MINUIT in C++ using object-oriented programming. The new MINUIT is an optional package (minuit2) in the ROOT release. As of October 2014 the latest version is 5.34.14, released on 24 January 2014. There is also a Java port as well as several Python ports.
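The basic task described above, searching for a minimum of a user-defined function with respect to several parameters, can be illustrated with a very simple derivative-free method. The sketch below uses a compass (coordinate pattern) search; it is not one of MINUIT's own algorithms and does not reproduce its interface, and the function name `minimize` and its arguments are hypothetical.

```python
def minimize(func, start, step=1.0, tol=1e-8, max_iter=10000):
    """Compass search: probe each parameter in both directions.

    A move is accepted when it lowers the function value; when no move
    helps, the step size is halved. The search stops once the step
    drops below `tol` or `max_iter` sweeps have been made.
    """
    x = list(start)
    best = func(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                value = func(trial)
                if value < best:
                    x, best, improved = trial, value, True
                    break  # keep this move and go on to the next parameter
        if not improved:
            step /= 2.0
    return x, best


def objective(params):
    """User-defined function of two parameters with its minimum at (1, -2)."""
    p, q = params
    return (p - 1.0) ** 2 + (q + 2.0) ** 2


best_params, best_value = minimize(objective, [0.3, 0.7])
```

The caller supplies only the function and a starting point. Production minimizers add features this sketch lacks, such as gradient-based steps and error estimates on the fitted parameters.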

WinBUGS is statistical software for Bayesian analysis using Markov chain Monte Carlo (MCMC) methods.

Proprietary

Alteryx – analytics platform with drag and drop statistical models; R and Python integration
Analytica – visual analytics and statistics package
Angoss – products KnowledgeSEEKER and KnowledgeSTUDIO incorporate several data mining algorithms
ASReml – for restricted maximum likelihood analyses
BMDP – general statistics package
DB Lytix – 800+ in-database models
EViews – for econometric analysis
FAME (database) – a system for managing time-series databases
GAUSS – programming language for statistics
Genedata – software solution for integration and interpretation of experimental data in the life science R&D
GenStat – general statistics package
GLIM – early package for fitting generalized linear models
GraphPad InStat – very simple with lots of guidance and explanations
GraphPad Prism – biostatistics and nonlinear regression with clear explanations
IMSL Numerical Libraries – software library with statistical algorithms
JMP – visual analysis and statistics package
Lertap 5 – an Excel application used to analyze responses to tests and surveys (free for students)
LIMDEP – comprehensive statistics and econometrics package
LISREL – statistics package used in structural equation modeling
Maple – programming language with statistical features
Mathematica – a software package with statistical features
MATLAB – programming language with statistical features
MaxStat Pro – general statistical software
MedCalc – for biomedical sciences
Microfit – econometrics package, time series
Minitab – general statistics package
MLwiN – multilevel models (free to UK academics)
NAG Numerical Library – comprehensive math and statistics library
Neural Designer – commercial deep learning package
NCSS – general statistics package
NLOGIT – comprehensive statistics and econometrics package
NMath Stats – statistical package for .NET Framework
nQuery Sample Size Software – Sample Size and Power Analysis Software^[1]
O-Matrix – programming language
OriginPro – statistics and graphing, programming access to NAG library
PASS Sample Size Software (PASS) – power and sample size software from NCSS
Plotly – plotting library and styling interface for analyzing data and creating browser-based graphs. Available for R, Python, MATLAB, Julia, and Perl
Primer-E Primer – environmental and ecological specific
PV-WAVE – programming language for comprehensive data analysis and visualization, with the IMSL statistical package
Qlucore Omics Explorer – interactive and visual data analysis software
Quantum Programming Language – part of the SPSS MR product line, mostly for data validation and tabulation in Marketing and Opinion Research
RapidMiner – machine learning toolbox
Regression Analysis of Time Series (RATS) – comprehensive econometric analysis package
SAS (software) – comprehensive statistical package
SHAZAM (Econometrics and Statistics Software) – comprehensive econometrics and statistics package
Simul – econometric tool for multidimensional (multi-sectoral, multi-regional) modeling
SigmaStat – package for group analysis
SmartPLS – statistics package used in partial least squares path modeling (PLS) and PLS-based structural equation modeling
SOCR – online tools for teaching statistics and probability theory
Speakeasy (computational environment) – numerical computational environment and programming language with many statistical and econometric analysis features
SPSS Modeler – comprehensive data mining and text analytics workbench
SPSS Statistics – comprehensive statistics package
Stata – comprehensive statistics package
StatCrunch – comprehensive statistics package
Statgraphics – general statistics package that includes cloud computing and Six Sigma tools for business development, process improvement, data visualization, statistical analysis, design of experiments, point processes, geospatial analysis, regression, and time series analysis
Statistica – comprehensive statistics package
StatsDirect – statistics package designed for biomedical, public health and general health science uses
StatXact – package for exact nonparametric and parametric statistics
Systat – general statistics package
SuperCROSS – comprehensive statistics package with ad-hoc, cross tabulation analysis
S-PLUS – general statistics package
Unistat – general statistics package that can also work as Excel add-in
The Unscrambler – free-to-try commercial multivariate analysis software for Windows
WarpPLS – statistics package used in structural equation modeling
Wolfram Language ^[2] – the computer language that evolved from the program Mathematica. It has statistical capabilities similar to those of Mathematica.
World Programming System (WPS) – statistical package that supports the use of Python, R and SAS languages within a single user program.
XploRe

Alteryx is an American computer software company based in Irvine, California, with a development center in Broomfield, Colorado. The company's products are used for data science and analytics. The software is designed to make advanced analytics accessible to any data worker.

Analytica is a visual software package developed by Lumina Decision Systems for creating, analyzing and communicating quantitative decision models. As a modeling environment, it is interesting in the way it combines hierarchical influence diagrams for visual creation and view of models, intelligent arrays for working with multidimensional data, Monte Carlo simulation for analyzing risk and uncertainty, and optimization, including linear and nonlinear programming. Its design, especially its influence diagrams and treatment of uncertainty, is based on ideas from the field of decision analysis. As a computer language, it is notable in combining a declarative (non-procedural) structure for referential transparency, array abstraction, and automatic dependency maintenance for efficient sequencing of computation.

Angoss Software Corporation, headquartered in Toronto, Ontario, Canada, with offices in the United States and UK, is a provider of predictive analytics systems through software licensing and services. Angoss' customers represent industries including finance, insurance, mutual funds, retail, health sciences, telecom and technology. The company was founded in 1984, and was publicly traded on the TSX Venture Exchange from 2008 to 2013 under the ticker symbol ANC. In June 2013 the private equity firm Peterson Partners acquired Angoss for $8.4 million.

Add-ons

Analyse-it – add-on to Microsoft Excel for statistical analysis
NumXL – add-on to Microsoft Excel for statistical and time series analysis
SigmaXL – add-on to Microsoft Excel for statistical and graphical analysis
SPC XL – add-on to Microsoft Excel for general statistics
Statgraphics Sigma Express – add-on to Microsoft Excel for Six Sigma statistical analysis
SUDAAN – add-on to SAS and SPSS for statistical surveys
XLfit – add-on to Microsoft Excel for curve fitting and statistical analysis

References

[1] Hickey, Graeme L.; Grant, Stuart W.; Dunning, Joel; Siepe, Matthias (2018). "Statistical primer: Sample size and power calculations—why, when and how?†". European Journal of Cardio-Thoracic Surgery. 54: 4–9. doi:10.1093/ejcts/ezy169. PMC 6005113.
[2] Wolfram Language Guide: Statistical Data Analysis

External links

Statistics software at Curlie

