Sina plot

A violin plot (on the left) and a sina plot (on the right) for the same sample.

A sina plot is a type of diagram in which numerical data are depicted as individual points, jittered so that the width of the point cloud at each value is proportional to the kernel density estimate of the data there. [1] [2] Sina plots are similar to violin plots, but while violin plots depict only the kernel density, sina plots depict the points themselves. In some situations sina plots may be preferable to violin plots, because they also show the individual observations, such as how many there are and where outliers lie. [3] [4]
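The idea of a density-scaled point spread can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any particular package: the function names `gaussian_kde` and `sina_positions`, the Gaussian kernel, the normal-reference bandwidth rule, and the `max_width` parameter are all assumptions made for this example.

```python
import math
import random
import statistics


def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` (Gaussian kernel)."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in sample)

    return density


def sina_positions(sample, center=0.0, max_width=0.4, seed=0):
    """Return (x, y) pairs: y is the data value, x is jittered around `center`.

    The jitter range at each y is scaled by the estimated density there, so the
    point cloud is widest where the data are densest.
    """
    rng = random.Random(seed)
    # Normal-reference rule of thumb for the bandwidth (an assumption of this sketch).
    bandwidth = 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)
    density = gaussian_kde(sample, bandwidth)
    densities = [density(y) for y in sample]
    peak = max(densities)
    points = []
    for y, d in zip(sample, densities):
        # Half-width of the jitter interval, proportional to the density at y.
        half_width = max_width * d / peak
        points.append((center + rng.uniform(-half_width, half_width), y))
    return points


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(0.0, 1.0) for _ in range(200)]
    for x, y in sina_positions(data)[:5]:
        print(f"x={x:+.3f}  y={y:+.3f}")
```

Plotting the returned pairs as a scatter plot, with one `center` per group, gives a sina-style chart. The outline of the point cloud then approximates the corresponding violin.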

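There are a number of ways to create sina plots, in particular the geom_sina function of the ggforce package for R, [5] the sinaplot package for R, [6] and the geom_sina geom of plotnine for Python. [7]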

References

  1. Sidiropoulos, Nikos; Sohi, Sina Hadi; Pedersen, Thomas Lin; Porse, Bo Torben; Winther, Ole; Rapin, Nicolas; Bagger, Frederik Otzen (2018-07-03). "SinaPlot: An Enhanced Chart for Simple and Truthful Representation of Single Observations Over Multiple Classes". Journal of Computational and Graphical Statistics. 27 (3): 673–676. doi:10.1080/10618600.2017.1366914. ISSN 1061-8600.
  2. Wilke, Claus (2019). Fundamentals of data visualization: a primer on making informative and compelling figures (First ed.). Beijing Boston Farnham Sebastopol Tokyo: O'Reilly. ISBN 978-1-4920-3108-6.
  3. "Sinaplot vs Violin plot - Why Sinaplot is better than Violinplot in R". GeeksforGeeks. 2023-02-03. Retrieved 2023-11-18.
  4. datavizpyr (2021-03-28). "Sinaplot vs Violin plot: Why Sinaplot is better than Violinplot - Data Viz with Python and R". Retrieved 2023-11-18.
  5. https://ggforce.data-imaginist.com/reference/geom_sina.html
  6. https://cran.r-project.org/web/packages/sinaplot/vignettes/SinaPlot.html
  7. https://plotnine.readthedocs.io/en/stable/generated/plotnine.geoms.geom_sina.html