Mondrian (software)

Mondrian
Developer: Martin Theus
First appeared: 1997
Stable release: 1.2 / January 11, 2011
Preview release: 1.5b / August 29, 2013
OS: Windows, macOS, Linux
License: GNU GPL 3+
Website: www.theusrus.de/Mondrian/

Mondrian is a general-purpose statistical software system for interactive data visualization.


All plots in Mondrian are fully linked and offer various interactions and queries: any case selected in one plot is highlighted in all other plots. Implemented plots include mosaic plots, scatterplots and scatterplot matrices (SPLOM), maps, bar charts, histograms, missing value plots, parallel coordinates/boxplots, and boxplots y by x. [1]

Mondrian works with data in standard tab-delimited or comma-separated ASCII files and can load data from R workspaces. There is basic support for working directly on data in databases. Mondrian links to R and offers statistical procedures such as interactive density estimation, scatterplot smoothers, multidimensional scaling (MDS), and principal component analysis (PCA).

Overview

Development of Mondrian started in 1997, initially with a focus on visualization techniques for categorical data and enhanced selection techniques. Over the years, a complete suite of visualizations for univariate and multivariate data measured on any scale was added. The link to R offers well-tested statistical procedures, which integrate seamlessly into the interactive graphics. Geographical data is also supported through highly interactive maps.

Mondrian details

The latest stable and beta versions, along with help and documentation, are available on the website of the developer, Martin Theus.

Supported data sources

Mondrian works on plain text files with tab-separated columns and a header line of variable names, such as the ".txt" files exported from Microsoft Excel. If R and the Rserve link are present, Mondrian can also read data directly from R workspace files (.RData files).
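The file layout described above (a header line of variable names followed by tab-separated rows) can be illustrated with a minimal Java sketch. This is not Mondrian's actual loader; the class name `TabDelimitedReader` and the method `parse` are hypothetical and chosen only for illustration, and the sketch assumes every row has the same number of fields as the header.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mondrian: reads a tab-delimited table
// whose first line holds the variable names.
final class TabDelimitedReader {

    /** Returns the columns in file order: variable name -> values, one per case. */
    static Map<String, List<String>> parse(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("missing header line");
        }
        // A limit of -1 keeps trailing empty fields, so empty cells at the
        // end of a row are not silently dropped.
        String[] variables = lines.get(0).split("\t", -1);
        Map<String, List<String>> columns = new LinkedHashMap<>();
        for (String name : variables) {
            columns.put(name, new ArrayList<>());
        }
        for (int row = 1; row < lines.size(); row++) {
            String line = lines.get(row);
            if (line.isEmpty()) {
                continue; // skip blank lines, e.g. at the end of the file
            }
            String[] cells = line.split("\t", -1);
            if (cells.length != variables.length) {
                throw new IllegalArgumentException(
                        "line " + (row + 1) + " has " + cells.length
                        + " fields, expected " + variables.length);
            }
            for (int col = 0; col < cells.length; col++) {
                columns.get(variables[col]).add(cells[col]);
            }
        }
        return columns;
    }
}
```

The lines of a file could be passed in with `java.nio.file.Files.readAllLines`. Values are kept as strings here; deciding whether a variable is categorical or numeric is left out of the sketch.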

Visualizations

Interaction techniques

Mondrian supports Query, Select, and Modify.
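The linked highlighting that underlies selection (a case selected in one plot is highlighted in all other plots) can be sketched as a shared selection model that notifies every registered plot. The following is an illustrative sketch only, not Mondrian's source code; the names `LinkedSelection` and `Listener` are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical shared selection state: one bit per case (row) of the data set.
final class LinkedSelection {

    /** Implemented by each plot that should redraw its highlighting. */
    interface Listener {
        void selectionChanged(BitSet selected);
    }

    private final BitSet selected = new BitSet();
    private final List<Listener> listeners = new ArrayList<>();

    /** Registers a plot so that it is told about every selection change. */
    void addListener(Listener listener) {
        listeners.add(listener);
    }

    /** Replaces the current selection and notifies all registered plots. */
    void select(int... caseIndices) {
        selected.clear();
        for (int index : caseIndices) {
            selected.set(index);
        }
        for (Listener listener : listeners) {
            // Each plot receives a copy, so no plot can alter the shared state.
            listener.selectionChanged((BitSet) selected.clone());
        }
    }
}
```

In this design every plot holds a reference to the same `LinkedSelection` object, so a selection made in, say, a bar chart reaches a scatterplot through a single call to `select`, and each plot decides for itself how to draw the highlighted cases.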


References

  1. Theus, Martin (August 29, 2013). "Mondrian - Interactive Statistical Data Visualization in JAVA". Martin Theus. Retrieved January 3, 2015.
