Perl Data Language

Last updated
Perl Data Language (PDL)
Paradigm Array
Developer Karl Glazebrook, Jarle Brinchmann, Tuomas Lukka, and Christian Soeller
First appeared1996 (1996)
Stable release
2.080 [1]   OOjs UI icon edit-ltr-progressive.svg / 28 May 2022;7 months ago (28 May 2022)
OS Cross-platform
License GNU General Public License, Artistic License
Website pdl.perl.org
Influenced by
APL, IDL, Perl

Perl Data Language (abbreviated PDL) is a set of free software array programming extensions to the Perl programming language. PDL extends the data structures built into Perl, to include large multidimensional arrays, and adds functionality to manipulate those arrays as vector objects. It also provides tools for image processing, machine learning, computer modeling of physical systems, and graphical plotting and presentation. Simple operations are automatically vectorized across complete arrays, and higher-dimensional operations (such as matrix multiplication) are supported.

Contents

Language design

PDL is a vectorized array programming language: the expression syntax is a variation on standard mathematical vector notation, so that the user can combine and operate on large arrays with simple expressions. In this respect, PDL follows in the footsteps of the APL programming language, and it has been compared to commercial languages such as MATLAB and Interactive Data Language, and to other free languages such as NumPy and Octave. [2] Unlike MATLAB and IDL, PDL allows great flexibility in indexing and vectorization: for example, if a subroutine normally operates on a 2-D matrix array, passing it a 3-D data cube will generally cause the same operation to happen to each 2-D layer of the cube. [3]

PDL borrows from Perl at least three basic types of program structure: imperative programming, functional programming, and pipeline programming forms may be combined. Subroutines may be loaded either via a built-in autoload mechanism or via the usual Perl module mechanism.

Graphics

A plot generated using PDL Pdl-plot.png
A plot generated using PDL

True to the glue language roots of Perl, PDL borrows from several different modules for graphics and plotting support. NetPBM provides image file I/O (though FITS is supported natively). Gnuplot, PLplot, PGPLOT, and Prima modules are supported for 2-D graphics and plotting applications, and Gnuplot and OpenGL are supported for 3-D plotting and rendering.

I/O

PDL provides facilities to read and write many open data formats, including JPEG, PNG, GIF, PPM, MPEG, FITS, NetCDF, GRIB, raw binary files, and delimited ASCII tables. PDL programmers can use the CPAN Perl I/O libraries to read and write data in hundreds of standard and niche file formats.

Machine learning

PDL can be used for machine learning. It includes modules that are used to perform classic k-means clustering or general and generalized linear modeling methods such as ANOVA, linear regression, PCA, and logistic regression. Examples of PDL usage for regression modelling tasks include evaluating association between education attainment and ancestry differences of parents, [4] comparison of RNA-protein interaction profiles that needs regression-based normalization [5] and analysis of spectra of galaxies. [6]

perldl

An installation of PDL usually comes with an interactive shell known as perldl, which can be used to perform simple calculations without requiring the user to create a Perl program file. A typical session of perldl would look something like the following:

perldl>$x=pdl[[1,2],[3,4]];perldl>$y=pdl[[5,6,7],[8,9,0]];perldl>$z=$xx$y;perldl>p$z;[[21247][475421]]

The commands used in the shell are Perl statements that can be used in a program with PDL module included. x is an overloaded operator for matrix multiplication, and p in the last command is a shortcut for print.

Implementation

The core of PDL is written in C. Most of the functionality is written in PP, a PDL-specific metalanguage that handles the vectorization of simple C snippets and interfaces them with the Perl host language via Perl's XS compiler. Some modules are written in Fortran, with a C/PP interface layer. Many of the supplied functions are written in PDL itself. PP is available to the user to write C-language extensions to PDL. There is also an Inline module (Inline::Pdlpp) that allows PP function definitions to be inserted directly into a Perl script; the relevant code is low-level compiled and made available as a Perl subroutine.

The PDL API uses the basic Perl 5 object-oriented functionality: PDL defines a new type of Perl scalar object (eponymously called a "PDL", or "ndarray") that acts as a Perl scalar, but that contains a conventional typed array of numeric or character values. All of the standard Perl operators are overloaded so that they can be used on PDL objects transparently, and PDLs can be mixed-and-matched with normal Perl scalars. Several hundred object methods for operating on PDLs are supplied by the core modules.

See also

Related Research Articles

gnuplot Command-line and GUI plotting program

gnuplot is a command-line and GUI program that can generate two- and three-dimensional plots of functions, data, and data fits. The program runs on all major computers and operating systems . Originally released in 1986, its listed authors are Thomas Williams, Colin Kelley, Russell Lang, Dave Kotz, John Campbell, Gershon Elber, Alexander Woo "and many others." Despite its name, this software is not part of the GNU Project.

<span class="mw-page-title-main">R (programming language)</span> Programming language for statistics

R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. Created by statisticians Ross Ihaka and Robert Gentleman, R is used among data miners, bioinformaticians and statisticians for data analysis and developing statistical software. Users have created packages to augment the functions of the R language.

<span class="mw-page-title-main">NumPy</span> Python library for numerical programming

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors. NumPy is a NumFOCUS fiscally sponsored project.

IDL, short for Interactive Data Language, is a programming language used for data analysis. It is popular in particular areas of science, such as astronomy, atmospheric physics and medical imaging. IDL shares a common syntax with PV-Wave and originated from the same codebase, though the languages have subsequently diverged in detail. There are also free or costless implementations, such as GNU Data Language (GDL) and Fawlty Language (FL).

In computer science, array programming refers to solutions which allow the application of operations to an entire set of values at once. Such solutions are commonly used in scientific and engineering settings.

<span class="mw-page-title-main">LAPACK</span> Software library for numerical linear algebra

LAPACK is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky and Schur decomposition. LAPACK was originally written in FORTRAN 77, but moved to Fortran 90 in version 3.2 (2008). The routines handle both real and complex matrices in both single and double precision. LAPACK relies on an underlying BLAS implementation to provide efficient and portable computational building blocks for its routines.

Basic Linear Algebra Subprograms (BLAS) is a specification that prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication. They are the de facto standard low-level routines for linear algebra libraries; the routines have bindings for both C and Fortran. Although the BLAS specification is general, BLAS implementations are often optimized for speed on a particular machine, so using them can bring substantial performance benefits. BLAS implementations will take advantage of special floating point hardware such as vector registers or SIMD instructions.

<span class="mw-page-title-main">Raku (programming language)</span> Programming language derived from Perl

Raku is a member of the Perl family of programming languages. Formerly known as Perl 6, it was renamed in October 2019. Raku introduces elements of many modern and historical languages. Compatibility with Perl was not a goal, though a compatibility mode is part of the specification. The design process for Raku began in 2000.

<span class="mw-page-title-main">Matplotlib</span> Library for creating static, animated, and interactive visualizations in Python.

Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK. There is also a procedural "pylab" interface based on a state machine, designed to closely resemble that of MATLAB, though its use is discouraged. SciPy makes use of Matplotlib.

<span class="mw-page-title-main">Euler (software)</span>

Euler is a free and open-source numerical software package. It contains a matrix language, a graphical notebook style interface, and a plot window. Euler is designed for higher level math such as calculus, optimization, and statistics.

This is an overview of Fortran 95 language features. Included are the additional features of TR-15581:Enhanced Data Type Facilities, which have been universally implemented. Old features that have been superseded by new ones are not described – few of those historic features are used in modern programs although most have been retained in the language to maintain backward compatibility. The current standard is Fortran 2018; many of its new features are still being implemented in compilers. The additional features of Fortran 2003, Fortran 2008 and Fortran 2018 are described by Metcalf, Reid and Cohen.

The following tables provide a comparison of numerical-analysis software.

This comparison of programming languages (array) compares the features of array data structures or matrix processing for various computer programming languages.

<span class="mw-page-title-main">Speakeasy (computational environment)</span> Computer software environment with own programming language

Speakeasy was a numerical computing interactive environment also featuring an interpreted programming language. It was initially developed for internal use at the Physics Division of Argonne National Laboratory by the theoretical physicist Stanley Cohen. He eventually founded Speakeasy Computing Corporation to make the program available commercially.

The Perl virtual machine is a stack-based process virtual machine implemented as an opcodes interpreter which runs previously compiled programs written in the Perl language. The opcodes interpreter is a part of the Perl interpreter, which also contains a compiler in one executable file, commonly /usr/bin/perl on various Unix-like systems or perl.exe on Microsoft Windows systems.

The structure of the Perl programming language encompasses both the syntactical rules of the language and the general ways in which programs are organized. Perl's design philosophy is expressed in the commonly cited motto "there's more than one way to do it". As a multi-paradigm, dynamically typed language, Perl allows a great degree of flexibility in program design. Perl also encourages modularization; this has been attributed to the component-based design structure of its Unix roots, and is responsible for the size of the CPAN archive, a community-maintained repository of more than 100,000 modules.

References

  1. https://github.com/PDLPorters/pdl/releases/tag/2.080; publication date: 28 May 2022; retrieved: 6 June 2022.
  2. "Putting Perl Back on Top in the Fields of Scientific and Financial Computing".
  3. "PDL online documentation (PDL::Threading section)".
  4. Abdellaoui A, Hottenga JJ, Willemsen G, Bartels M, van Beijsterveldt T, Ehli EA, Davies GE, Brooks A, Sullivan PF, Penninx BW, de Geus EJ, Boomsma DI (Mar 2015). "Educational Attainment Influences Levels of Homozygosity through Migration and Assortative Mating". PLOS ONE. 10 (3): e0118935. Bibcode:2015PLoSO..1018935A. doi: 10.1371/journal.pone.0118935 . PMC   4347978 . PMID   25734509.
  5. Wang T, Xie Y, Xiao G (Jan 2014). "dCLIP: a computational approach for comparative CLIP-seq analyses". Genome Biology. 15 (1): R11. doi:10.1186/gb-2014-15-1-r11. PMC   4054096 . PMID   24398258.
  6. Sánchez SF, Pérez E, Sánchez-Blázquez P, González JJ, Rosález-Ortega FF, Cano-Dí az M, López-Cobá C, Marino RA, Gil de Paz A, Mollá M, López-Sánchez AR, Ascasibar Y, Barrera-Ballesteros J (April 2016). "Pipe3D, a pipeline to analyze Integral Field Spectroscopy Data: I. New fitting philosophy of FIT3D". Revista Mexicana de Astronomía y Astrofísica. 52: 21–53. arXiv: 1509.08552 . Bibcode:2016RMxAA..52...21S.