Portable Format for Analytics

Developed by	Jim Pivarski; Data Mining Group
Latest release	0.8.1; November 10, 2015
Type of format	Predictive modelling
Extended from	JSON
Website	dmg.org/pfa/

Last updated February 13, 2023

The Portable Format for Analytics (PFA) is a JSON-based predictive model interchange format conceived and developed by Jim Pivarski.[citation needed] PFA provides a way for analytic applications to describe and exchange predictive models produced by analytics and machine learning algorithms. It supports common models such as logistic regression and decision trees. Version 0.8 was published in 2015. Subsequent versions have been developed by the Data Mining Group.^[1]
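Because a PFA document is plain JSON, it can be built and serialized with any JSON library. The following Python sketch assembles a minimal scoring engine that adds 100 to a numeric input; the names add_100 and to_pfa_json are hypothetical and chosen only for illustration, and the sketch serializes the document without validating or executing it.

```python
import json

# A minimal scoring engine: an ordinary JSON object with "input", "output",
# and "action". The action calls the "+" function on the input and 100.
add_100 = {
    "input": "double",
    "output": "double",
    "action": [{"+": ["input", 100]}],
}


def to_pfa_json(document):
    """Serialize a PFA document (a plain dict) to JSON text. Hypothetical helper."""
    return json.dumps(document, indent=2, sort_keys=True)


pfa_text = to_pfa_json(add_100)
print(pfa_text)
```

The resulting text is what would be handed to a PFA implementation for validation and scoring.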

Release history

Version	Release date
Version 0.8.1	November 2015

Data Mining Group

The Data Mining Group is a consortium managed by the Center for Computational Science Research, Inc., a nonprofit founded in 2008.^[3]

Examples

Reverse array

 # Reverse the input array of doubles.
 input: {"type": "array", "items": "double"}
 output: {"type": "array", "items": "double"}
 action:
   - let: {x: input}
   - let: {z: input}
   - let: {l: {a.len: [x]}}
   # Start at the last valid index, l - 1, so every access stays in bounds.
   - let: {i: {"-": [l, 1]}}
   - while: {">=": [i, 0]}
     do:
       # z[i] = x[l - i - 1]
       - set: {z: {attr: z, path: [i], to: {attr: x, path: [{"-": [{"-": [l, i]}, 1]}]}}}
       - set: {i: {"-": [i, 1]}}
   - z
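For readers unfamiliar with PFA syntax, the same reversal can be sketched in Python. This is an illustrative translation, not part of any PFA implementation: the helper updated is hypothetical and mimics an update that returns a modified copy instead of changing the array in place, and the index starts at l - 1 so that every access is in range.

```python
def updated(array, index, value):
    """Return a copy of `array` with one element replaced (hypothetical helper)."""
    if not 0 <= index < len(array):
        raise IndexError("array index out of range")
    copy = list(array)
    copy[index] = value
    return copy


def reverse_array(x):
    """Reverse a list of doubles using the same loop as the PFA example."""
    z = list(x)          # z starts as a copy of the input
    l = len(x)
    i = l - 1            # last valid index
    while i >= 0:
        # z[i] = x[l - i - 1], expressed as a copy-and-replace update
        z = updated(z, i, x[l - i - 1])
        i -= 1
    return z


print(reverse_array([1.0, 2.0, 3.0]))
```

Each pass writes one element of the result, so the loop runs exactly l times and leaves the input untouched.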

Bubblesort

 input: {"type": "array", "items": "double"}
 output: {"type": "array", "items": "double"}
 action:
   - let: {A: input}
   - let: {N: {a.len: [A]}}
   - let: {n: {"-": [N, 1]}}
   - let: {i: 0}
   - let: {s: 0.0}
   - while: {">=": [n, 0]}
     do:
       - set: {i: 0}
       - while: {"<=": [i, {"-": [n, 1]}]}
         do:
           - if: {">": [{attr: A, path: [i]}, {attr: A, path: [{"+": [i, 1]}]}]}
             then:
               - set: {s: {attr: A, path: [i]}}
               - set: {A: {attr: A, path: [i], to: {attr: A, path: [{"+": [i, 1]}]}}}
               - set: {A: {attr: A, path: [{"+": [i, 1]}], to: s}}
           - set: {i: {"+": [i, 1]}}
       - set: {n: {"-": [n, 1]}}
   - A
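The control flow of the bubble sort above can be sketched in Python as follows. This is an illustrative translation under the assumption that the variables A, n, i, and s play the same roles as in the PFA document; unlike PFA, it swaps elements in place on a local copy of the input.

```python
def bubble_sort(values):
    """Sort a list of doubles with the same nested loops as the PFA example."""
    a = list(values)      # work on a copy so the caller's list is unchanged
    n = len(a) - 1        # index of the last unsorted element
    while n >= 0:
        i = 0
        while i <= n - 1:
            if a[i] > a[i + 1]:
                # Swap neighbours through the temporary s
                s = a[i]
                a[i] = a[i + 1]
                a[i + 1] = s
            i += 1
        n -= 1            # the largest remaining element is now in place
    return a


print(bubble_sort([3.0, 1.0, 2.0]))
```

After each outer pass the largest remaining element has moved to position n, which is why the inner loop can stop one step earlier every time.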

Implementations

Hadrian (Java/Scala/JVM) - Hadrian is a complete implementation of PFA in Scala, which can be accessed through any JVM language, principally Java. It focuses on model deployment, so it is flexible (it can run in restricted environments) and fast.^[4]
Titus (Python 2.x) - Titus is a complete, independent implementation of PFA in pure Python. It focuses on model development, so it includes model producers and PFA manipulation tools in addition to runtime execution. It runs on Python 2.^[4]
Titus 2 (Python 3.x) - Titus 2 is a fork of Titus that brings the PFA implementation to Python 3.^[5]
Aurelius (R) - Aurelius is a toolkit for generating PFA in the R programming language. It focuses on porting models to PFA from their R equivalents. To validate or execute scoring engines, Aurelius sends them to Titus through rPython, so both must be installed.^[4]
Antinous (model development in Jython) - Antinous is a model-producer plugin for Hadrian that allows Jython code to be executed anywhere a PFA scoring engine would go. It also has a library of model-producing algorithms.^[4]

References

[1] "Data Mining Group". Retrieved December 14, 2017. The DMG is proud to host the working groups that develop the Predictive Model Markup Language (PMML) and the Portable Format for Analytics (PFA), two complementary standards that simplify the deployment of analytic models.
[2] "Portable Format for Analytics: moving models to production". Retrieved April 25, 2016.
[3] "2008 EO 990". Retrieved 16 Oct 2014.
[4] Implementations of the Portable Format for Analytics (PFA): opendatagroup/hadrian, Open Data Group, 2019-08-15, retrieved 2019-11-22.
[5] Mahato, Ankit (2019-11-21), Titus 2: Portable Format for Analytics (PFA) implementation for Python 3.4+: animator/titus2, retrieved 2019-11-22.
