Data Analytics Library

Last updated
Data Analytics Library
Developer(s) Intel
Initial releaseAugust 25, 2015;9 years ago (2015-08-25)
Stable release
2021 Update 4 / 2021;3 years ago (2021) [1]
Repository
Written in C++, Java, Python [2]
Operating system Microsoft Windows, Linux, macOS [2]
Platform Intel Atom, Intel Core, Intel Xeon [2]
Type Library or framework
License Apache License 2.0 [3]
Website software.intel.com/content/www/us/en/develop/tools/data-analytics-acceleration-library.html

oneAPI Data Analytics Library (oneDAL; formerly Intel Data Analytics Acceleration Library or Intel DAAL), is a library of optimized algorithmic building blocks for data analysis stages most commonly associated with solving Big Data problems. [4] [5] [6] [7]

Contents

The library supports Intel processors and is available for Windows, Linux and macOS operating systems. [2] The library is designed for use popular data platforms including Hadoop, Spark, R, and MATLAB. [4] [8]

History

Intel launched the Intel Data Analytics Library(oneDAL) on December 8, 2020. It also launched the Data Analytics Acceleration Library on August 25, 2015 and called it Intel Data Analytics Acceleration Library 2016 (Intel DAAL 2016). [9] oneDAL is bundled with Intel oneAPI Base Toolkit as a commercial product. A standalone version is available commercially or freely, [3] [10] the only difference being support and maintenance related.

License

Apache License 2.0

Details

Functional categories

Intel DAAL has the following algorithms: [11] [4] [12]

Intel DAAL supported three processing modes:

Related Research Articles

Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable, i.e., multivariate random variables. Multivariate statistics concerns understanding the different aims and background of each of the different forms of multivariate analysis, and how they relate to each other. The practical application of multivariate statistics to a particular problem may involve several types of univariate and multivariate analyses in order to understand the relationships between variables and their relevance to the problem being studied.

<span class="mw-page-title-main">Principal component analysis</span> Method of data analysis

Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in exploratory data analysis, visualization and data preprocessing.

In mathematics, computer science and especially graph theory, a distance matrix is a square matrix containing the distances, taken pairwise, between the elements of a set. Depending upon the application involved, the distance being used to define this matrix may or may not be a metric. If there are N elements, this matrix will have size N×N. In graph-theoretic applications, the elements are more often referred to as points, nodes or vertices.

When classification is performed by a computer, statistical methods are normally used to develop the algorithm.

Cascading is a software abstraction layer for Apache Hadoop and Apache Flink. Cascading is used to create and execute complex data processing workflows on a Hadoop cluster using any JVM-based language, hiding the underlying complexity of MapReduce jobs. It is open source and available under the Apache License. Commercial support is available from Driven, Inc.

This glossary of statistics and probability is a list of definitions of terms and concepts used in the mathematical sciences of statistics and probability, their sub-disciplines, and related fields. For additional related terms, see Glossary of mathematics and Glossary of experimental design.

X-Video Motion Compensation (XvMC), is an extension of the X video extension (Xv) for the X Window System. The XvMC API allows video programs to offload portions of the video decoding process to the GPU video-hardware. In theory this process should also reduce bus bandwidth requirements. Currently, the supported portions to be offloaded by XvMC onto the GPU are motion compensation and inverse discrete cosine transform (iDCT) for MPEG-2 video. XvMC also supports offloading decoding of mo comp, iDCT, and VLD for not only MPEG-2 but also MPEG-4 ASP video on VIA Unichrome hardware.

VTune Profiler is a performance analysis tool for x86-based machines running Linux or Microsoft Windows operating systems. Many features work on both Intel and AMD hardware, but the advanced hardware-based sampling features require an Intel-manufactured CPU.

Intel Integrated Performance Primitives is an extensive library of ready-to-use, domain-specific functions that are highly optimized for diverse Intel architectures. Its royalty-free APIs help developers take advantage of Single Instruction, Multiple Data (SIMD) instructions.

Distance matrices are used in phylogeny as non-parametric distance methods and were originally applied to phenetic data using a matrix of pairwise distances. These distances are then reconciled to produce a tree. The distance matrix can come from a number of different sources, including measured distance or morphometric analysis, various pairwise distance formulae applied to discrete morphological characters, or genetic distance from sequence, restriction fragment, or allozyme data. For phylogenetic character data, raw distance values can be calculated by simply counting the number of pairwise differences in character states.

Video Acceleration API (VA-API) is an open source application programming interface that allows applications such as VLC media player or GStreamer to use hardware video acceleration capabilities, usually provided by the graphics processing unit (GPU). It is implemented by the free and open-source library libva, combined with a hardware-specific driver, usually provided together with the GPU driver.

Intel Parallel Studio XE was a software development product developed by Intel that facilitated native code development on Windows, macOS and Linux in C++ and Fortran for parallel computing. Parallel programming enables software programs to take advantage of multi-core processors from Intel and other processor vendors.

The following outline is provided as an overview of and topical guide to regression analysis:

Intel oneAPI Math Kernel Library, formerly known as Intel Math Kernel Library, is a library of optimized math routines for science, engineering, and financial applications. Core math functions include BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier transforms, and vector math.

mlpack

mlpack is a free, open-source and header-only software library for machine learning and artificial intelligence written in C++, built on top of the Armadillo library and the ensmallen numerical optimization library. mlpack has an emphasis on scalability, speed, and ease-of-use. Its aim is to make machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and maximum flexibility for expert users. mlpack has also a light deployment infrastructure with minimum dependencies, making it perfect for embedded systems and low resource devices. Its intended target users are scientists and engineers.

<span class="mw-page-title-main">Apache Spark</span> Open-source data analytics cluster computing framework

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.

In statistics, linear regression is a statistical model which estimates the linear relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable. If the explanatory variables are measured with error then errors-in-variables models are required, also known as measurement error models.

The following outline is provided as an overview of and topical guide to machine learning:

References

  1. "Intel® Data Analytics Acceleration Library Release Notes". software.intel.com.
  2. 1 2 3 4 Intel® Data Analytics Acceleration Library (Intel® DAAL) | Intel® Software
  3. 1 2 "Open Source Project: Intel Data Analytics Acceleration Library (DAAL)".
  4. 1 2 3 "DAAL github".
  5. "Intel Updates Developer Toolkit with Data Analytics Acceleration Library".
  6. "Intel adds big data functions to math libraries".
  7. "Intel Leverages HPC Core for Analytics Tooling Push". nextplatform.com. 2015-08-25.
  8. "Try Out Intel DAAL to Process Big Data".
  9. "Intel Data Analytics Acceleration Library".
  10. "Community Licensing of Intel Performance Libraries".
  11. Developer Guide for Intel(R) Data Analytics Acceleration Library 2020
  12. "Introduction to Intel DAAL, Part 1: Polynomial Regression with Batch Mode Computation".