| TabPFN | |
|---|---|
| Developers | Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, Frank Hutter, Leo Grinsztajn, Klemens Flöge, Oscar Key & Sauraj Gambhir [1] |
| Initial release | September 16, 2023 [2] [3] |
| Written in | Python [3] |
| Operating system | Linux, macOS, Microsoft Windows [3] |
| Type | Machine learning |
| License | Apache License 2.0 |
| Website | GitHub |
TabPFN (Tabular Prior-data Fitted Network) is a transformer-based machine learning model for tabular datasets, proposed in 2022. [1] It is intended for supervised classification and regression analysis on small- to medium-sized datasets, e.g., up to 10,000 samples. [1]
TabPFN was first introduced in a 2022 pre-print and presented at ICLR 2023. [2] TabPFN v2 was published in 2025 in Nature by Hollmann and co-authors. [1] The source code is published on GitHub under a modified Apache License and on PyPI. [4] Writing for the ICLR blog, McCarter states that the model has attracted attention for its performance on small-dataset benchmarks. [5]
Prior Labs, founded in 2024, aims to commercialize TabPFN. [6]
TabPFN supports classification, regression, and generative tasks. [1] It applies the "prior-data fitted networks" (PFN) approach [7] to tabular data. [1] By using a transformer pre-trained on synthetic tabular datasets, [2] [5] TabPFN avoids benchmark contamination and the costs of curating real-world data. [2] Synthetic datasets are generated using causal models or Bayesian neural networks: random inputs are passed through these models to generate outputs, with a bias towards simpler causal structures, and the generation process can simulate missing values, imbalanced data, and noise. [1]
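A minimal sketch of this kind of sampling is shown below: a randomly initialized network stands in for a sampled causal model, random inputs are pushed through it, and the outputs are binarized into class labels. The network shape, noise scale, and missingness rate are assumptions made for illustration, not TabPFN's actual prior.

```python
# Illustrative sketch of drawing one synthetic tabular dataset from a
# random "causal mechanism" (here a small random-weight MLP); all
# hyperparameters are arbitrary assumptions, not TabPFN's actual prior.
import numpy as np

def sample_synthetic_dataset(n_samples=128, n_features=5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Random mechanism mapping features to a latent target.
    W1 = rng.normal(size=(n_features, 16))
    W2 = rng.normal(size=(16, 1))
    X = rng.normal(size=(n_samples, n_features))          # random inputs
    latent = np.tanh(X @ W1) @ W2                          # pass through model
    latent += rng.normal(scale=0.1, size=latent.shape)     # observation noise
    y = (latent.ravel() > np.median(latent)).astype(int)   # binarize into labels
    # Simulate missing values, as the description of the prior mentions.
    X[rng.random(X.shape) < 0.05] = np.nan
    return X, y

X, y = sample_synthetic_dataset()
print(X.shape, y[:10])
```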
TabPFN v2 was pre-trained on approximately 130 million such synthetic datasets. [1] During pre-training, TabPFN predicts the masked target values of new data points given training data points and their known targets, effectively learning a generic learning algorithm that is executed by running a neural network forward pass. [1] At inference time, a new dataset is processed in a single forward pass without retraining. [2] The model's transformer encoder processes features and labels by alternating attention across the rows and columns of the table. [8] TabPFN v2 handles numerical and categorical features and missing values, and supports tasks such as regression and synthetic data generation. [1]
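As a rough illustration of this in-context mechanism, the sketch below embeds training rows together with their labels, appends test rows carrying a mask token, and reads off predictions for the test rows from a single transformer forward pass. This is not TabPFN's actual architecture (which, among other things, alternates attention across rows and columns); layer sizes are arbitrary, and the network would need to be pre-trained on synthetic tasks before its outputs meant anything.

```python
# Simplified sketch of in-context tabular prediction: one sequence = one
# dataset; test rows get a mask token; a single forward pass yields logits.
# Architecture and sizes are assumptions for illustration only.
import torch
import torch.nn as nn

class InContextTabularModel(nn.Module):
    def __init__(self, n_features: int, n_classes: int, d_model: int = 64):
        super().__init__()
        self.x_embed = nn.Linear(n_features, d_model)
        # Extra embedding id n_classes serves as the mask token for test rows.
        self.y_embed = nn.Embedding(n_classes + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)
        self.mask_id = n_classes

    def forward(self, X_train, y_train, X_test):
        n_train = X_train.shape[0]
        mask = torch.full((X_test.shape[0],), self.mask_id, dtype=torch.long)
        # Each row token = feature embedding + (known or masked) label embedding.
        tokens = torch.cat([
            self.x_embed(X_train) + self.y_embed(y_train),
            self.x_embed(X_test) + self.y_embed(mask),
        ]).unsqueeze(0)                      # batch of one "dataset"
        h = self.encoder(tokens).squeeze(0)  # test rows attend to train rows
        return self.head(h[n_train:])        # logits only at masked positions

model = InContextTabularModel(n_features=5, n_classes=3)
X_train, y_train = torch.randn(20, 5), torch.randint(0, 3, (20,))
X_test = torch.randn(4, 5)
logits = model(X_train, y_train, X_test)  # single forward pass, no weight updates
print(logits.shape)  # torch.Size([4, 3])
```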
Because TabPFN is pre-trained, it does not require the costly hyperparameter optimization that other deep learning methods typically need. [8]
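A brief usage sketch with the tabpfn package published on PyPI [4] might look as follows, assuming its scikit-learn-style interface (TabPFNClassifier with fit/predict); the dataset is chosen arbitrarily, and no hyperparameter search is performed.

```python
# Usage sketch assuming the tabpfn package's scikit-learn-style interface.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)  # a small tabular dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()    # pre-trained model, default settings, no tuning
clf.fit(X_train, y_train)   # stores the data as in-context examples
print(clf.predict(X_test)[:5])
```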
TabPFN is the subject of ongoing research. Applications of TabPFN have been investigated in domains such as chemoproteomics, [9] insurance risk classification, [10] and metagenomics. [11]
TabPFN has been criticized for its "one large neural network is all you need" approach to modeling problems. [5] Its performance is also limited on high-dimensional and large-scale datasets. [12]