Orange (software)

Orange
Developer(s): University of Ljubljana
Initial release: 10 October 1996 [1]
Stable release: 3.37.0 [2] / 27 May 2024
Repository: Orange Repository
Written in: Python, Cython, C++, C
Operating system: Cross-platform
Type: Machine learning, data mining, data visualization, data analysis
License: GPLv3 or later [3] [4]
Website: orangedatamining.com
A typical workflow in Orange 3.

Orange is an open-source data visualization, machine learning and data mining toolkit. It features a visual programming front-end for exploratory qualitative data analysis and interactive data visualization.

Classification Tree widget in Orange 3.

Description

Orange is a component-based visual programming software package for data visualization, machine learning, data mining, and data analysis.

Orange components are called widgets. They range from simple data visualization, subset selection, and preprocessing to empirical evaluation of learning algorithms and predictive modeling.

Visual programming is implemented through an interface in which workflows are created by linking predefined or user-designed widgets, while advanced users can use Orange as a Python library for data manipulation and widget alteration. [5]

Software

Orange is an open-source software package released under the GPL and hosted on GitHub. Versions before 3.0 included core components written in C++ with Python wrappers. From version 3.0 onwards, Orange uses common open-source Python libraries for scientific computing, such as NumPy, SciPy and scikit-learn, while its graphical user interface is built on the cross-platform Qt framework.

The default installation includes a number of machine learning, preprocessing and data visualization algorithms in six widget sets (data, transform, visualize, model, evaluate and unsupervised). Additional functionality is available through add-ons (text mining, image analytics, bioinformatics, etc.).

Orange is supported on macOS, Windows and Linux and can also be installed from the Python Package Index repository (pip install Orange3).

Features

Orange consists of a canvas interface onto which the user places widgets and creates a data analysis workflow. Widgets offer basic functionalities such as reading the data, showing a data table, selecting features, training predictors, comparing learning algorithms, visualizing data elements, etc. The user can interactively explore visualizations or feed the selected subset into other widgets.
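
The idea of widgets passing data along links can be illustrated with a toy model in plain Python. This is only a sketch of the concept: the `Widget` class, its `link` and `receive` methods and the sample rows are hypothetical and are not Orange's actual widget API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Widget:
    """Hypothetical stand-in for a canvas widget with one input and one output."""
    name: str
    process: Callable[[Any], Any]
    outputs: List["Widget"] = field(default_factory=list)
    last_output: Any = None

    def link(self, other: "Widget") -> "Widget":
        """Connect this widget's output to another widget's input."""
        self.outputs.append(other)
        return other  # returning the target allows chained links

    def receive(self, value: Any) -> None:
        """Process an incoming value and pass the result downstream."""
        self.last_output = self.process(value)
        for widget in self.outputs:
            widget.receive(self.last_output)


# Made-up sample rows: (petal length in cm, species label).
ROWS = [(1.4, "setosa"), (4.7, "versicolor"), (1.3, "setosa"), (5.1, "virginica")]

read_data = Widget("Read data", lambda _: list(ROWS))
select_rows = Widget("Select rows", lambda rows: [r for r in rows if r[0] > 2.0])
count_labels = Widget("Count labels", lambda rows: dict(Counter(label for _, label in rows)))

# Build the workflow: Read data -> Select rows -> Count labels.
read_data.link(select_rows).link(count_labels)

# Triggering the first widget pushes data through every linked widget.
read_data.receive(None)
print(count_labels.last_output)
```

Each widget only knows its own processing step and its outgoing links, so changing the selection in the middle widget and calling `receive` again would update everything downstream, which is the behaviour the canvas provides interactively.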

A decorated dendrogram in Orange 3.


Add-ons

Orange users can extend the core set of components with components from add-ons. Supported add-ons include:

Objectives

The program provides a platform for experiment selection, recommendation systems, and predictive modelling, and is used in biomedicine, bioinformatics, genomic research, and teaching. In science, it serves as a platform for testing new machine learning algorithms and for implementing new techniques in genetics and bioinformatics. In education, it is used for teaching machine learning and data mining methods to students of biology, biomedicine, and informatics.

Extensions

Various projects build on Orange either by extending the core components with add-ons or using only the Orange Canvas to exploit the implemented visual programming features and GUI.

History

In 1996, the University of Ljubljana and the Jožef Stefan Institute started developing ML*, a machine learning framework written in C++. Python bindings for the framework followed in 1997 and, together with emerging Python modules, formed a joint framework called Orange. Over the following years, most major contemporary algorithms for data mining and machine learning were implemented either in C++ (Orange's core) or in Python modules.

References

  1. "orange3/CHANGELOG.md at master · biolab/orange3 · GitHub". GitHub.
  2. "Release 3.37.0". 27 May 2024. Retrieved 2 June 2024.
  3. "Orange - License".
  4. "orange3/LICENSE at master · biolab/orange3 · GitHub". GitHub.
  5. Janez Demšar; Tomaž Curk; Aleš Erjavec; Črt Gorup; Tomaž Hočevar; Mitar Milutinovič; Martin Možina; Matija Polajnar; Marko Toplak; Anže Starič; Miha Stajdohar; Lan Umek; Lan Žagar; Jure Žbontar; Marinka Žitnik; Blaž Zupan (2013). "Orange: data mining toolbox in Python" (PDF). Journal of Machine Learning Research. 14 (1): 2349–2353.
  6. Toplak, M.; Birarda, G.; Read, S.; Sandt, C.; Rosendahl, S. M.; Vaccari, L.; Demšar, J.; Borondics, F. (2017). "Infrared Orange: Connecting Hyperspectral Data with Machine Learning". Synchrotron Radiation News. 30 (4): 40–45. Bibcode:2017SRNew..30...40T. doi:10.1080/08940886.2017.1338424. S2CID 125273654.
  7. Iomids. "Checking AI for discrimination via GUI using the Orange Fairness Add-On". IOMIDS.
  8. Sanchez Del Rio, Manuel; Rebuffi, Luca (2017). "OASYS (OrAnge SYnchrotron Suite): An open-source graphical environment for x-ray virtual experiments". In Chubar, Oleg; Sawhney, Kawal (eds.). Advances in Computational Methods for X-Ray Optics IV. p. 28. doi:10.1117/12.2274263. ISBN 9781510612334. S2CID 117118973.
  9. Primož Godec; Matjaž Pančur; Nejc Ilenič; Andrej Čopar; Martin Stražar; Aleš Erjavec; Ajda Pretnar; Janez Demšar; Marko Toplak; Anže Starič; Lan Žagar; Jan Hartman; Hamilton Wang; Riccardo Bellazzi; Uroš Petrovič; Silvia Garagna; Maurizio Zuccotti; Dongsu Park; Gad Shaulsky; Blaž Zupan (2019). "Democratized image analytics by visual programming through integration of deep models and small-scale machine learning". Nature Communications. 10 (1): 4551. Bibcode:2019NatCo..10.4551G. doi:10.1038/s41467-019-12397-x. PMC 6779910. PMID 31591416. S2CID 203782491.
  10. Marko Toplak; Stuart T. Read; Christophe Sandt; Ferenc Borondics (2021). "Quasar: Easy Machine Learning for Biospectroscopy". Cells. 10 (9): 2300. doi:10.3390/cells10092300. PMC 8466383. PMID 34571947.
  11. "Orange3-Geo Documentation — Orange3-Geo documentation".
  12. Martin Stražar; Lan Žagar; Jaka Kokošar; Vesna Tanko; Aleš Erjavec; Pavlin G. Poličar; Anže Starič; Janez Demšar; Gad Shaulsky; Vilas Menon; Andrew Lemire; Anup Parikh; Blaž Zupan (2019). "scOrange—a tool for hands-on training of concepts from single-cell data analytics". Bioinformatics. 35 (14): i4–i12. doi:10.1093/bioinformatics/btz348. PMC 6612816. PMID 31510695.
  13. "Orange Canvas Core". GitHub.

Further reading