Orange (software)

Orange
Developer(s)	University of Ljubljana
Initial release	10 October 1996
Stable release	3.38.1 / 23 December 2024
Repository	Orange Repository
Written in	Python, Cython, C++, C
Operating system	Cross-platform
Type	Machine learning, Data mining, Data visualization, Data analysis
License	GPLv3 or later
Website	orangedatamining.com

Last updated January 24, 2025

A typical workflow in Orange 3

Orange is an open-source data visualization, machine learning and data mining toolkit. It features a visual programming front-end for exploratory qualitative data analysis and interactive data visualization.

Description

Orange is a component-based visual programming software package for data visualization, machine learning, data mining, and data analysis.

Orange components are called widgets. They range from simple data visualization, subset selection, and preprocessing to empirical evaluation of learning algorithms and predictive modeling.

Visual programming is implemented through an interface in which workflows are created by linking predefined or user-designed widgets, while advanced users can use Orange as a Python library for data manipulation and widget alteration.^[5]
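Conceptually, linking widgets builds a small dataflow graph: each component transforms the data it receives and passes the result to every component connected downstream. The sketch below only illustrates that idea. The `Widget` class, its `connect` and `send` methods, and the three toy components are hypothetical and are not Orange's actual widget API, which differs in detail.

```python
from typing import Any, Callable, List, Optional


class Widget:
    """Hypothetical component: applies a function to incoming data and
    forwards the result to every widget linked downstream."""

    def __init__(self, name: str, func: Callable[[Any], Any]) -> None:
        self.name = name
        self.func = func
        self.outputs: List["Widget"] = []
        self.last_result: Optional[Any] = None

    def connect(self, other: "Widget") -> "Widget":
        # Link this widget's output to another widget's input.
        self.outputs.append(other)
        return other  # returning the target allows a.connect(b).connect(c)

    def send(self, data: Any) -> None:
        # Process the data, keep the result, and propagate it downstream.
        self.last_result = self.func(data)
        for target in self.outputs:
            target.send(self.last_result)


# A toy workflow: load rows -> select a subset -> summarize the subset.
rows = [
    {"species": "setosa", "petal_length": 1.4},
    {"species": "versicolor", "petal_length": 4.7},
    {"species": "virginica", "petal_length": 6.0},
    {"species": "versicolor", "petal_length": 4.5},
]

loader = Widget("File", lambda data: list(data))
selector = Widget("Select Rows",
                  lambda data: [r for r in data if r["species"] == "versicolor"])
summary = Widget("Summary",
                 lambda data: sum(r["petal_length"] for r in data) / len(data))

loader.connect(selector).connect(summary)
loader.send(rows)
print(summary.last_result)
```

Re-sending new data into `loader` recomputes every downstream result, which mirrors how a change early in a visual workflow propagates to the widgets after it.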

Software

Orange is open-source software released under the GNU General Public License and hosted on GitHub. Versions before 3.0 implement the core components in C++ with wrappers in Python. From version 3.0 onwards, Orange relies on common open-source Python libraries for scientific computing, such as NumPy, SciPy and scikit-learn, while its graphical user interface runs on the cross-platform Qt framework.

The default installation includes a number of machine learning, preprocessing and data visualization algorithms in six widget sets (data, transform, visualize, model, evaluate and unsupervised). Additional functionality is available through add-ons (text mining, image analytics, bioinformatics, etc.).

Orange is supported on macOS, Windows and Linux and can also be installed from the Python Package Index repository (pip install Orange3).

Features

Orange consists of a canvas interface onto which the user places widgets and creates a data analysis workflow. Widgets offer basic functionalities such as reading the data, showing a data table, selecting features, training predictors, comparing learning algorithms, visualizing data elements, etc. The user can interactively explore visualizations or feed the selected subset into other widgets.

A decorated dendrogram in Orange 3

Canvas: graphical front-end for data analysis
Widgets:
- Data: widgets for data input, data filtering, sampling, imputation, feature manipulation and feature selection
- Visualize: widgets for common visualization (box plot, histograms, scatter plot) and multivariate visualization (mosaic display, sieve diagram)
- Classify: a set of supervised machine learning algorithms for classification
- Regression: a set of supervised machine learning algorithms for regression
- Evaluate: cross-validation, sampling-based procedures, reliability estimation and scoring of prediction methods
- Unsupervised: unsupervised learning algorithms for clustering (k-means, hierarchical clustering) and data projection techniques (multidimensional scaling, principal component analysis, correspondence analysis)
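The cross-validation procedure listed under Evaluate can be illustrated in plain Python. This is a minimal sketch, not Orange's implementation: the data are split into k folds, each fold is held out once as a test set while a learner is trained on the remaining folds, and classification accuracy is averaged over the folds. The one-nearest-neighbour learner, the function names and the toy data are hypothetical choices made for illustration, and folds are assigned by index modulo k without shuffling so that the example stays deterministic (practical tools usually shuffle and often stratify).

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Model = Callable[[Point], str]
Learner = Callable[[List[Point], List[str]], Model]


def nearest_neighbour(train_x: List[Point], train_y: List[str]) -> Model:
    """Hypothetical learner: fit a 1-nearest-neighbour classifier."""
    def predict(x: Point) -> str:
        # Predict the label of the closest training point (Euclidean distance).
        distances = [math.dist(x, t) for t in train_x]
        return train_y[distances.index(min(distances))]
    return predict


def cross_validate(xs: Sequence[Point], ys: Sequence[str],
                   learner: Learner, k: int = 5) -> float:
    """Average classification accuracy over k folds.

    Fold membership is assigned by index modulo k (no shuffling).
    """
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of instances")
    fold_scores = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        # Train only on the other folds, then score on the held-out fold.
        model = learner([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum(model(xs[i]) == ys[i] for i in test_idx)
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / k


# Two well-separated toy classes.
xs = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
ys = ["a", "a", "a", "b", "b", "b"]
accuracy = cross_validate(xs, ys, nearest_neighbour, k=3)
print(accuracy)
```

Passing the learner as a function, rather than a fitted model, is what lets the evaluation step retrain it from scratch on every fold, so no test instance ever influences the model that scores it.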

Add-ons

Users can extend Orange's core set of components with add-ons. Supported add-ons include:

Associate: components for mining frequent itemsets and association rule learning.
Bioinformatics: components for gene expression analysis, enrichment, and access to expression databases (e.g., Gene Expression Omnibus) and pathway libraries.
Data fusion: components for fusing different data sets, collective matrix factorization, and exploration of latent factors.
Educational: components for teaching machine learning concepts, such as k-means clustering, polynomial regression and stochastic gradient descent.
Explain: components for model explanation, including Shapley value analysis.
Geo: components for working with geospatial data.
Image analytics: components for working with images and ImageNet embeddings
Network: components for graph and network analysis.
Text mining: components for natural language processing and text mining.
Time series: widget components for time series analysis and modeling.
Single-cell: support for single-cell gene expression analysis, including components for loading single-cell data, filtering and batch effect removal, marker genes discovery, scoring of cells and genes, and cell type prediction.

The Kaplan-Meier Plot widget of the Survival Analysis add-on plots survival curves and supports interactive selection of cases.
Spectroscopy: components for analyzing and visualizing (hyper)spectral datasets.^[6]
Survival analysis: components for analyzing survival data, including widgets for standard survival analysis techniques such as the Kaplan-Meier plot and the Cox regression model, as well as several derived widgets.
World Happiness: support for downloading socioeconomic data from databases such as the OECD and the World Development Indicators, giving access to thousands of country indicators from various economic databases.
Fairness: components for evaluating and building machine learning models that do not discriminate. Widgets range from computing fairness metrics such as statistical parity to pre-, in- and post-processing methods for building fair models.^[7]
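The Kaplan-Meier plot mentioned for the Survival analysis add-on shows the Kaplan-Meier estimate of the survival function, S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at time t_i and n_i is the number of cases still at risk just before t_i. The following is a minimal plain-Python sketch of that estimator, not the add-on's own code; the function name `kaplan_meier` and the toy durations are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) pairs at each distinct event time.

    durations: time until the event or until censoring, one value per case.
    events: True if the event was observed, False if the case was censored.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # Observed events, and all cases (events plus censorings), at time t.
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        removed = sum(1 for d in durations if d == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Cases censored at t still count as at risk at t, then leave the risk set.
        at_risk -= removed
    return curve


durations = [2, 3, 3, 5, 7, 8]
observed = [True, True, False, True, False, True]
curve = kaplan_meier(durations, observed)
for time, prob in curve:
    print(f"t={time}: S(t)={prob:.3f}")
```

Censored cases never lower the curve directly; they only shrink the risk set for later event times, which is what distinguishes this estimator from simply counting the fraction of cases still alive.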

Objectives

The program provides a platform for experiment selection, recommendation systems, and predictive modelling and is used in biomedicine, bioinformatics, genomic research, and teaching. In science, it is used as a platform for testing new machine learning algorithms and for implementing new techniques in genetics and bioinformatics. In education, it was used for teaching machine learning and data mining methods to students of biology, biomedicine, and informatics.

Extensions

Various projects build on Orange either by extending the core components with add-ons or using only the Orange Canvas to exploit the implemented visual programming features and GUI.

OASYS: OrAnge SYnchrotron Suite^[8]
scOrange: single-cell biostatistics
Quasar: data analysis in natural sciences

History

In 1996, the University of Ljubljana and the Jožef Stefan Institute started developing ML*, a machine learning framework in C++. In 1997, Python bindings were developed for this framework; together with emerging Python modules, they formed a joint framework called Orange. Over the following years, most major contemporary algorithms for data mining and machine learning were implemented either in C++ (Orange's core) or in Python modules.

In 2002, the first prototypes of a flexible graphical user interface were designed using Pmw (Python megawidgets).
In 2003, the graphical user interface was redesigned and redeveloped for the Qt framework using the PyQt Python bindings. The visual programming framework was defined, and development of widgets (graphical components of the data analysis pipeline) began.
In 2005, extensions for data analysis in bioinformatics were created.
In 2008, Mac OS X DMG and Fink-based installation packages were developed.
In 2009, more than 100 widgets had been created and were being maintained.
In 2009, Orange 2.0 beta was released, with installation packages on the website built in a daily compilation cycle.
In 2012, a new object hierarchy was imposed, replacing the old module-based structure.
In 2013, a significant redesign of the graphical user interface introduced a new toolbox and a new depiction of workflows.
In 2015, Orange 3.0 was released. Orange 3 stores data in NumPy arrays, and its machine learning algorithms mostly rely on scikit-learn.
In 2015, a text analysis add-on for Orange3 was released.
In 2016, Orange released version 3.3, and development adopted a monthly cycle of stable releases.
In 2016, Orange began developing and releasing an Image Analytics add-on that uses server-side deep neural networks for image embedding.^[9]
In 2017, a Spectroscopy add-on for the analysis of spectral data was introduced.^[10]
In 2017, Geo, an add-on for working with geolocation data and visualizing geographic maps, was introduced.^[11]
In 2018, Orange began development and release of an add-on for single-cell data analysis.^[12]
In 2019, Orange moved its graphical interface into a separate project, orange-canvas-core.^[13]
In 2020, Orange introduced the Explain add-on with widgets for explaining classification and regression models, showing how strongly individual features contribute to the prediction of a specific class.
In 2022, World Happiness, an add-on for Orange3, was introduced, providing widgets for accessing socioeconomic data from databases such as the World Happiness Report, the World Development Indicators and the OECD.
In 2022, Orange extended the Explain add-on with an Individual Conditional Expectation plot and the Permutation Feature Importance technique.
In 2023, Orange introduced the Fairness add-on, with widgets for calculating bias metrics and for pre-, in- and post-processing methods, allowing the creation of models that are less susceptible to systematic bias inherited from the data set.

References

[1] "orange3/CHANGELOG.md at master · biolab/orange3 · GitHub". GitHub.
[2] "Release 3.38.1". 23 December 2024. Retrieved 26 December 2024.
[3] "Orange - License".
[4] "orange3/LICENSE at master · biolab/orange3 · GitHub". GitHub.
[5] Janez Demšar; Tomaž Curk; Aleš Erjavec; Črt Gorup; Tomaž Hočevar; Mitar Milutinovič; Martin Možina; Matija Polajnar; Marko Toplak; Anže Starič; Miha Stajdohar; Lan Umek; Lan Žagar; Jure Žbontar; Marinka Žitnik; Blaž Zupan (2013). "Orange: data mining toolbox in Python" (PDF). Journal of Machine Learning Research. 14 (1): 2349–2353.
[6] Toplak, M.; Birarda, G.; Read, S.; Sandt, C.; Rosendahl, S. M.; Vaccari, L.; Demšar, J.; Borondics, F. (2017). "Infrared Orange: Connecting Hyperspectral Data with Machine Learning". Synchrotron Radiation News. 30 (4): 40–45. Bibcode:2017SRNew..30...40T. doi:10.1080/08940886.2017.1338424. S2CID 125273654.
[7] Iomids (30 May 2024). "Checking AI for discrimination via GUI using the Orange Fairness Add-On". IOMIDS.
[8] Sanchez Del Rio, Manuel; Rebuffi, Luca (2017). "OASYS (OrAnge SYnchrotron Suite): An open-source graphical environment for x-ray virtual experiments". In Chubar, Oleg; Sawhney, Kawal (eds.). Advances in Computational Methods for X-Ray Optics IV. p. 28. doi:10.1117/12.2274263. ISBN 9781510612334. S2CID 117118973.
[9] Primož Godec; Matjaž Pančur; Nejc Ilenič; Andrej Čopar; Martin Stražar; Aleš Erjavec; Ajda Pretnar; Janez Demšar; Marko Toplak; Anže Starič; Lan Žagar; Jan Hartman; Hamilton Wang; Riccardo Bellazzi; Uroš Petrovič; Silvia Garagna; Maurizio Zuccotti; Dongsu Park; Gad Shaulsky; Blaž Zupan (2019). "Democratized image analytics by visual programming through integration of deep models and small-scale machine learning". Nature Communications. 10 (1): 4551. Bibcode:2019NatCo..10.4551G. doi:10.1038/s41467-019-12397-x. PMC 6779910. PMID 31591416. S2CID 203782491.
[10] Marko Toplak; Stuart T. Read; Christophe Sandt; Ferenc Borondics (2021). "Quasar: Easy Machine Learning for Biospectroscopy". Cells. 10 (9): 2300. doi:10.3390/cells10092300. PMC 8466383. PMID 34571947.
[11] "Orange3-Geo documentation".
[12] Martin Stražar; Lan Žagar; Jaka Kokošar; Vesna Tanko; Aleš Erjavec; Pavlin G. Poličar; Anže Starič; Janez Demšar; Gad Shaulsky; Vilas Menon; Andrew Lemire; Anup Parikh; Blaž Zupan (2019). "scOrange—a tool for hands-on training of concepts from single-cell data analytics". Bioinformatics. 35 (14): i4–i12. doi:10.1093/bioinformatics/btz348. PMC 6612816. PMID 31510695.
[13] "Orange Canvas Core". GitHub.

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
