CatBoost

Original author(s)	Andrey Gulin / Yandex
Developer(s)	Yandex and CatBoost Contributors
Initial release	July 18, 2017 (8 years ago)
Stable release	1.2.3 / February 23, 2024 (18 months ago)
Written in	Python, R, C++, Java
Operating system	Linux, macOS, Windows
Type	Machine learning
License	Apache License 2.0
Website	catboost.ai

Last updated September 16, 2025

CatBoost^[6] is an open-source software library developed by Yandex. It provides a gradient boosting framework that, among other features, handles categorical features using a permutation-driven alternative to the classical algorithm.^[7] It runs on Linux, Windows, and macOS and is available in Python^[8] and R.^[9] Models built with CatBoost can be used for predictions in C++, Java,^[10] C#, Rust, Core ML, ONNX, and PMML. The source code is licensed under the Apache License and is available on GitHub.^[6]
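
The permutation-driven idea can be illustrated with a small pure-Python sketch of ordered target statistics, in the spirit of the approach described in the cited paper.^[7] It is not CatBoost's implementation: the function name, the `prior_weight` parameter, and the toy data are assumptions made for illustration. Each row's category is replaced by a smoothed mean of the targets of rows that come before it in a random permutation, so a row's own target is never used to encode it.

```python
import random
from collections import defaultdict


def ordered_target_statistics(categories, targets, prior, prior_weight=1.0, seed=0):
    """Encode a categorical column from the 'history' of a random permutation.

    Hypothetical helper for illustration only. For each row, the encoding is
    (sum of earlier targets in the same category + prior_weight * prior)
    / (number of earlier rows in the same category + prior_weight).
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)  # the random permutation of rows

    sums = defaultdict(float)  # running sum of targets per category
    counts = defaultdict(int)  # running number of rows per category
    encoded = [0.0] * len(categories)

    for i in order:
        cat = categories[i]
        encoded[i] = (sums[cat] + prior_weight * prior) / (counts[cat] + prior_weight)
        # Update the running statistics only after row i has been encoded.
        sums[cat] += targets[i]
        counts[cat] += 1
    return encoded


# Toy data (made up): a categorical feature and a binary target.
colors = ["red", "blue", "red", "red", "blue"]
clicked = [1, 0, 1, 0, 1]
prior = sum(clicked) / len(clicked)
print(ordered_target_statistics(colors, clicked, prior))
```

The first row of each category in the permutation has no history and therefore receives the prior itself; later rows move toward the observed mean of their category.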

Features

CatBoost has gained popularity compared to other gradient boosting algorithms primarily due to the following features:^[15]

Native handling for categorical features^[16]
Fast GPU training^[17]
Visualizations and tools for model and feature analysis
Oblivious (symmetric) trees for faster execution
Ordered boosting to overcome overfitting^[7]
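
To show why oblivious (symmetric) trees are cheap to evaluate, here is a minimal Python sketch. In such a tree every node on a given level tests the same feature against the same threshold, so a prediction reduces to computing a bit pattern and doing one table lookup. The class name, the bit ordering, and the use of `>` as the comparison are assumptions for illustration and do not describe CatBoost's internal layout.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ObliviousTree:
    """Hypothetical illustration of a symmetric decision tree."""

    # One (feature_index, threshold) pair per level, shared by all nodes on that level.
    splits: List[Tuple[int, float]]
    # 2 ** depth leaf values, indexed by the bit pattern of the split outcomes.
    leaf_values: List[float]

    def predict(self, x: Sequence[float]) -> float:
        index = 0
        for level, (feature, threshold) in enumerate(self.splits):
            if x[feature] > threshold:
                index |= 1 << level  # set the bit for this level
        return self.leaf_values[index]


# A depth-2 tree with made-up splits and leaf values.
tree = ObliviousTree(
    splits=[(0, 0.5), (1, 10.0)],
    leaf_values=[0.1, 0.2, 0.3, 0.4],
)
# Feature 0 passes its threshold, feature 1 does not, so the leaf index is 1.
print(tree.predict([0.9, 3.0]))  # prints 0.2
```

Because there is no per-node branching on which feature to test next, the loop does the same work for every input, which is one reason this structure lends itself to fast, vectorized prediction.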

History

In 2009 Andrey Gulin developed MatrixNet, a proprietary gradient boosting library that was used at Yandex to rank search results. Since 2009 MatrixNet has been used in different projects at Yandex, including recommendation systems and weather prediction.

In 2014–2015 Andrey Gulin worked with a team of researchers to start a new project called Tensornet, which was aimed at solving the problem of "how to work with categorical data". Their work resulted in several proprietary gradient boosting libraries with different approaches to handling categorical data.

In 2016 the Machine Learning Infrastructure team led by Anna Dorogush started working on gradient boosting at Yandex, including MatrixNet and Tensornet. They implemented and open-sourced the next version of the gradient boosting library, called CatBoost, which supports categorical and text data, GPU training, model analysis, and visualization tools.

CatBoost was open-sourced in July 2017 and is under active development at Yandex and in the open-source community.

Application

JetBrains uses CatBoost for code completion^[18]
Cloudflare uses CatBoost for bot detection^[19]
Careem uses CatBoost to predict the future destinations of rides^[20]

References

[1] "Andrey Gulin - People - Research at Yandex". research.yandex.com.
[2] "catboost/catboost". GitHub.
[3] "Yandex open sources CatBoost, a gradient boosting machine learning library". TechCrunch. 18 July 2017. Retrieved 2020-08-30.
[4] Yegulalp, Serdar (2017-07-18). "Yandex open sources CatBoost machine learning library". InfoWorld. Retrieved 2020-08-30.
[5] "Releases · catboost/catboost". GitHub. Retrieved 2024-03-14.
[6] "catboost/catboost". August 30, 2020 – via GitHub.
[7] Prokhorenkova, Liudmila; Gusev, Gleb; Vorobev, Aleksandr; Dorogush, Anna Veronika; Gulin, Andrey (2019-01-20). "CatBoost: unbiased boosting with categorical features". arXiv:1706.09516 [cs.LG].
[8] "Python Package Index PYPI: catboost". Retrieved 2020-08-20.
[9] "Conda force package catboost-r". Retrieved 2020-08-30.
[10] "Maven Repository: ai.catboost » catboost-prediction". mvnrepository.com. Retrieved 2020-08-30.
[11] InfoWorld staff (27 September 2017). "Bossie Awards 2017: The best machine learning tools". InfoWorld.
[12] "State of Data Science and Machine Learning 2020".
[13] "State of Data Science and Machine Learning 2021".
[14] "PyPI Stats catboost". PyPI Stats.
[15] Joseph, Manu (2020-02-29). "The Gradient Boosters V: CatBoost". Deep & Shallow. Retrieved 2020-08-30.
[16] Dorogush, Anna Veronika; Ershov, Vasily; Gulin, Andrey (2018-10-24). "CatBoost: gradient boosting with categorical features support". arXiv:1810.11363 [cs.LG].
[17] "CatBoost Enables Fast Gradient Boosting on Decision Trees Using GPUs". NVIDIA Developer Blog. 2018-12-13. Retrieved 2020-08-30.
[18] "Code Completion, Episode 4: Model Training". JetBrains Developer Blog. 2021-08-20.
[19] "Stop the Bots: Practical Lessons in Machine Learning". The Cloudflare Blog. 2019-02-20.
[20] "How Careem's Destination Prediction Service speeds up your ride". Careem. 2019-02-19.

