Colt (libraries)

Last updated
Colt
Original author(s) NIST
Stable release
1.2.0 / September 9, 2004 (2004-09-09)
Operating system Cross-platform
Type Library
License CERN and LGPL
Website acs.lbl.gov/software/colt/

Colt is a set of open-source Libraries for High Performance Scientific and Technical Computing written in Java and developed at CERN. Colt was developed with a focus on High Energy Physics, but is applicable to many other problems. Colt was last updated in 2004 (when Java 1.4 was the current release) and its code base has been incorporated into the Parallel Colt code base, which has received more recent development.

Contents

Colt provides an infrastructure for scalable scientific and technical computing in Java. It is particularly useful in the domain of High Energy Physics at CERN. It contains, among others, efficient and usable data structures and algorithms for Off-line and On-line Data Analysis, Linear Algebra, Multi-dimensional arrays, Statistics, Histogramming, Monte Carlo Simulation, Parallel & Concurrent Programming. It summons some of the best concepts, designs and implementations thought up over time by the community, ports or improves them and introduces new approaches where need arises.

Capabilities

The following is an overview of Colt's capabilities, as listed on the project's website: [1]

FeatureDescription
Templated Lists and MapsDynamically resizing lists holding objects or primitive data types such as int, double, etc. Operations on primitive arrays, algorithms on Colt lists and JAL algorithms (see below) can freely be mixed at zero copy overhead. More details. Automatically growing and shrinking maps holding objects or primitive data types such as int, double, etc.
Templated Multi-dimensional matricesDense and sparse fixed sized (non-resizable) 1,2, 3 and d-dimensional matrices holding objects or primitive data types such as int, double, etc.; Also known as multi-dimensional arrays or Data Cubes.
Linear AlgebraStandard matrix operations and decompositions. LU, QR, Cholesky, Eigenvalue, Singular value.
HistogrammingCompact, extensible, modular and performant histogramming functionality. AIDA offers the histogramming features of HTL and HBOOK.
MathematicsTools for basic and advanced mathematics: Arithmetics and Algebra, Polynomials and Chebyshev series, Bessel and Airy functions, Constants and Units, Trigonometric functions, etc.
StatisticsTools for basic and advanced statistics: Estimators, Gamma functions, Beta functions, Probabilities, Special integrals, etc.
Random Numbers and Random SamplingStrong yet quick. Partly a port of CLHEP.
util.concurrentEfficient utility classes commonly encountered in parallel & concurrent programming.

Usage Example

Example of Singular Value Decomposition (SVD):

SingularValueDecompositions=newSingularValueDecomposition(matA);DoubleMatrix2DU=s.getU();DoubleMatrix2DS=s.getS();DoubleMatrix2DV=s.getV();

Example of matrix multiplication:

Algebraalg=newAlgebra();DoubleMatrix2Dresult=alg.mult(matA,matB);

Related Research Articles

Numerical analysis Field of mathematics

Numerical analysis is the study of algorithms that use numerical approximation for the problems of mathematical analysis. Numerical analysis naturally finds application in all fields of engineering and the physical sciences, but in the 21st century also the life sciences, social sciences, medicine, business and even the arts have adopted elements of scientific computations. The growth in computing power has revolutionized the use of realistic mathematical models in science and engineering, and subtle numerical analysis is required to implement these detailed models of the world. For example, ordinary differential equations appear in celestial mechanics ; numerical linear algebra is important for data analysis; stochastic differential equations and Markov chains are essential in simulating living cells for medicine and biology.

Principal component analysis Conversion of a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components

The principal components of a collection of points in a real p-space are a sequence of direction vectors, where the vector is the direction of a line that best fits the data while being orthogonal to the first vectors. Here, a best-fitting line is defined as one that minimizes the average squared distance from the points to the line. These directions constitute an orthonormal basis in which different individual dimensions of the data are linearly uncorrelated. Principal component analysis (PCA) is the process of computing the principal components and using them to perform a change of basis on the data, sometimes using only the first few principal components and ignoring the rest.

Singular value decomposition Matrix decomposition

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix that generalizes the eigendecomposition of a square normal matrix to any matrix via an extension of the polar decomposition.

LAPACK Software library for numerical linear algebra

LAPACK is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky and Schur decomposition. LAPACK was originally written in FORTRAN 77, but moved to Fortran 90 in version 3.2 (2008). The routines handle both real and complex matrices in both single and double precision.

ROOT

ROOT is an object-oriented program and library developed by CERN. It was originally designed for particle physics data analysis and contains several features specific to this field, but it is also used in other applications such as astronomy and data mining. The latest release is 6.22.00, as of 2020-07-02.

In numerical analysis, a multigrid method is an algorithm for solving differential equations using a hierarchy of discretizations. They are an example of a class of techniques called multiresolution methods, very useful in problems exhibiting multiple scales of behavior. For example, many basic relaxation methods exhibit different rates of convergence for short- and long-wavelength components, suggesting these different scales be treated differently, as in a Fourier analysis approach to multigrid. MG methods can be used as solvers as well as preconditioners.

JAMA is a software library for performing numerical linear algebra tasks created at National Institute of Standards and Technology in 1998 similar in functionality to LAPACK.

Numerical linear algebra, sometimes called applied linear algebra, is the study of how matrix operations can be used to create computer algorithms which efficiently and accurately provide approximate answers to questions in continuous mathematics. It is a subfield of numerical analysis, and a type of linear algebra. Computers use floating-point arithmetic and cannot exactly represent irrational data, so when a computer algorithm is applied to a matrix of data, it can sometimes increase the difference between a number stored in the computer and the true number that it is an approximation of. Numerical linear algebra uses properties of vectors and matrices to develop computer algorithms that minimize the error introduced by the computer, and is also concerned with ensuring that the algorithm is as efficient as possible.

CLHEP is a C++ library that provides utility classes for general numerical programming, vector arithmetic, geometry, pseudorandom number generation, and linear algebra, specifically targeted for high energy physics simulation and analysis software. The project is hosted by CERN and currently managed by a collaboration of researchers from CERN and other physics research laboratories and academic institutions. According to the project's website, CLHEP is in maintenance mode.

Computational particle physics refers to the methods and computing tools developed in and used by particle physics research. Like computational chemistry or computational biology, it is, for particle physics both a specific branch and an interdisciplinary field relying on computer science, theoretical and experimental particle physics and mathematics. The main fields of computational particle physics are: lattice field theory, automatic calculation of particle interaction or decay and event generators.

The following tables provide a comparison of linear algebra software libraries, either specialized or general purpose libraries with significant linear algebra coverage.

Ateji PX is an object-oriented programming language extension for Java. It is intended to facilliate parallel computing on multi-core processors, GPU, Grid and Cloud.

Matrix Toolkit Java (MTJ) is an open-source Java software library for performing numerical linear algebra. The library contains a full set of standard linear algebra operations for dense matrices based on BLAS and LAPACK code. Partial set of sparse operations is provided through the Templates project. The library can be configured to run as a pure Java library or use BLAS machine-optimized code through the Java Native Interface.

Parallel Colt is a set of multithreaded version of Colt. It is a collection of open-source libraries for High Performance Scientific and Technical Computing written in Java. It contains all the original capabilities of Colt and adds several new ones, with a focus on multi-threaded algorithms.

Programming with Big Data in R (pbdR) is a series of R packages and an environment for statistical computing with big data by using high-performance statistical computation. The pbdR uses the same programming language as R with S3/S4 classes and methods which is used among statisticians and data miners for developing statistical software. The significant difference between pbdR and R code is that pbdR mainly focuses on distributed memory systems, where data are distributed across several processors and analyzed in a batch mode, while communications between processors are based on MPI that is easily used in large high-performance computing (HPC) systems. R system mainly focuses on single multi-core machines for data analysis via an interactive mode such as GUI interface.

oj! Algorithms or ojAlgo, is an open source Java library for mathematics, linear algebra and optimisation. It was first released in 2003 and is 100% pure Java source code and free from external dependencies. Its feature set make it particularly suitable for use within the financial domain.

Efficient Java Matrix Library (EJML) is a linear algebra library for manipulating real/complex/dense/sparse matrices. Its design goals are; 1) to be as computationally and memory efficient as possible for both small and large matrices, and 2) to be accessible to both novices and experts. These goals are accomplished by dynamically selecting the best algorithms to use at runtime, clean API, and multiple interfaces. EJML is free, written in 100% Java and has been released under an Apache v2.0 license.

jblas is a linear algebra library, created by Mikio Braun, for the Java programming language built upon BLAS and LAPACK. Unlike most other Java linear algebra libraries, jblas is designed to be used with native code through the Java Native Interface (JNI) and comes with precompiled binaries. When used on one of the targeted architectures, it will automatically select the correct binary to use and load it. This allows it to be used out of the box and avoid a potentially tedious compilation process. jblas provides an easier to use high level API on top of the archaic API provided by BLAS and LAPACK, removing much of the tediousness.

References

  1. "Colt Project Page". Colt. Retrieved June 15, 2013.