MediaPipe

MediaPipe
MediaPipe
Original authors	Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhang, Chuo-Ling Chang, Ming Guang Yong, Juhyun Lee, Wan-Teh Chang, Wei Hua, Manfred Georg and Matthias Grundmann
Initial release	June 2019;6 years ago
Stable release	0.10.26
Repository	github.com/google-ai-edge/mediapipe
Platform	Android, JavaScript web, Python, iOS
Type	Framework
License	Apache
Website	ai.google.dev/edge/mediapipe/solutions/guide

Last updated December 20, 2025

MediaPipe is an open source framework with many libraries developed by Google for several artificial intelligence and machine learning solutions. These solutions range from generative artificial intelligence,^[2]^[3] real-time computer vision,^[4]^[5]^[6] natural language processing ^[7] and audio^[8]^[9]^[10] techniques. These solutions can also be used on various platforms such as Android,^[11]^[12] JavaScript web,^[13] Python ^[14]^[15] and iOS,^[16] supporting edge devices.^[17]^[18]^[19]

History

Google has long used MediaPipe in its products and services. Since 2012, it has been used for real-time analysis of video and audio on YouTube. Over time MediaPipe has been incorporated into many more products such as Gmail, Google Home, etc.^[20]

MediaPipe's first stable release was version 0.5.0.^[21] It was made open source in June 2019 at the Conference on Computer Vision and Pattern Recognition in Long Beach, California, by Google Research. This initial release included only five pipelines examples: Object Detection, Face Detection, Hand Tracking, Multi-hand Tracking, and Hair Segmentation.^[22] From its initial release to April 2023, numerous pipelines have been made. In May 2025, MediaPipe Solutions was introduced. This transition offered more capabilities for on-device machine learning.^[23] MediaPipe is now under Google's subdivision, Google AI Edge.

Solutions

MediaPipe's available solutions are:

LLM Inference API
Object detection
Image classification
Image segmentation
Interactive segmentation
Hand landmark detection
Gesture Recognition
Image embedding
Face detection
Face landmark detection
Pose landmark detection
Image generation
Text classification
Text embedding
Language detector
Audio Classification

MediaPipe's legacy solutions are:

Face Detection
Face Mesh
Iris
Hands
Pose
Holistic
Selfie segmentation
Hair segmentation
Object detection
Box tracking
Instant motion tracking
Objectron
KNIFT
AutoFlip
MediaSequence
YouTube 8M

Programming Language

MediaPipe is primarily written in the programming language C++, although this is not the sole programing language used in its creation. The other notable programming languages used within its source code include Python, Starlark, and Java.^[21]

The ability for MediaPipe to separate itself into a system of components allows for customization. Pre-built solutions are also available and it may help to start with these and slightly optimize them for an ideal output.^[24]

How MediaPipe Works

MediaPipe contains a multitude of different components that all work together to create a general purpose computer vision framework. Each component works in its own unique way with different architectures.

Hand Tracking

MediaPipe includes a hand tracking system that has been designed to run efficiently on devices with limited computational resources. This works by estimating a set of 3D landmarks for each detected hand and is intended to remain stable across a wide range of environments including different poses, lightning conditions, and motions.^[25]

MediaPipe works off of a pre-trained deep learning model that is trained to detect the palm area on human hands, which is done through a detector model named BlazePalm.^[24]^[26] Starting with the identification of the palm, MediaPipe is able to use the positioning of the palm as an input to a second model that predicts the positions of key landmarks that will represent the hand's structure.^[25]

MediaPipe continuously monitors the confidence of its predictions and re-runs detection when needed to maintain its accuracy, while temporal smoothing helps reduce the jitter between frames. For scenes with more than one hand, the process is repeated independently for each detected region.^[25]^[24]

Human Pose Estimation

Another area that MediaPipe specializes in is recognizing changes in the human body specifically posture. Mediapipe can support the creation of body posture analysis systems. This can aid in many fields such as ergonomic industry, the arts, sports, and entertainment. ^[24]

References

↑ Hon Law, Wai. "Release MediaPipe v0.10.26 · google-ai-edge/mediapipe". GitHub. Retrieved 2025-11-21.
↑ Abstreiter, Maximilian; Tarkoma, Sasu; Morabito, Roberto (2025-03-12), Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge, arXiv: 2503.09114 , retrieved 2025-11-21
↑ Nezami, Zeinab; Hafeez, Maryam; Djemame, Karim; Zaidi, Syed Ali Raza (2024-11-18), Generative AI on the Edge: Architecture and Performance Evaluation, arXiv: 2411.17712 , retrieved 2025-11-21
↑ Luna-Jiménez, Cristina; Gil-Martín, Manuel; Kleinlein, Ricardo; San-Segundo, Rubén; Fernández-Martínez, Fernando (2023-10-09). "Interpreting Sign Language Recognition using Transformers and MediaPipe Landmarks". International Conference on Multimodal Interaction. ICMI '23. New York, NY, USA: Association for Computing Machinery. pp. 373–377. doi:10.1145/3577190.3614143. ISBN 979-8-4007-0055-2.
↑ Kim, Jong-Wook; Choi, Jin-Young; Ha, Eun-Ju; Choi, Jae-Ho (2023-02-20). "Human Pose Estimation Using MediaPipe Pose and Optimization Method Based on a Humanoid Model". Applied Sciences. 13 (4): 2700. Bibcode:2023ApSci..13.2700K. doi: 10.3390/app13042700 . ISSN 2076-3417.
↑ Thaman, B.; Cao, T.; Caporusso, N. (2022-05-23). Face Mask Detection using MediaPipe Facemesh . 2022 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO). Opatija, Croatia: IEEE. pp. 378–382. doi:10.23919/MIPRO55190.2022.9803531. ISBN 978-953-233-103-5.
↑ Kim, Heejin; Lee, Jeongha; Bahn, Hyokyung (2025-11-17). "Rethinking I/O Caching for Large Language Model Inference on Resource-Constrained Mobile Platforms". Mathematics. 13 (22): 3689. doi: 10.3390/math13223689 . ISSN 2227-7390.
↑ Meyer, David; Abecassis, Eitan; Fernandez-Labrador, Clara; Schroers, Christopher (2024). "RAST: A Reference-Audio Synchronization Tool for Dubbed Content". Interspeech 2024. pp. 67–71. doi:10.21437/Interspeech.2024-2203.
↑ Høgenhaug, Mads; Friis, Marcus; Pedersen, Morten; Rossi, Luca (2025-04-03), TikTok StitchGraph: Characterizing communication patterns on TikTok through a collection of interaction networks, arXiv: 2502.18661 , retrieved 2025-11-21
↑ Withanage Don, Daksitha Senel; Kiderle, Thomas; Mertes, Silvan; Schiller, Dominik; Ritschel, Hannes; André, Elisabeth (2025-03-24). "MeLaX: Conversations with Generative AI in Socially Interactive Agents". Companion Proceedings of the 30th International Conference on Intelligent User Interfaces. ACM. pp. 163–166. doi:10.1145/3708557.3716363. ISBN 979-8-4007-1409-2.
↑ Sreenath, Sreehari; Daniels, D. Ivan; Ganesh, Apparaju S. D.; Kuruganti, Yashaswi S.; Chittawadigi, Rajeevlochana G. (2021-09-30). "Monocular Tracking of Human Hand on a Smart Phone Camera using MediaPipe and its Application in Robotics". 2021 IEEE 9th Region 10 Humanitarian Technology Conference (R10-HTC). IEEE. pp. 1–6. doi:10.1109/R10-HTC53172.2021.9641542. ISBN 978-1-6654-3240-5.
↑ Nunes, João; Nascimento, Thamer Horbylon; Felix, Juliana; Soares, Fabrizzio (2025-07-08). "Real-Time Hand Gesture Recognition for Touchless Video Control Using MediaPipe and Random Forest". 2025 IEEE 49th Annual Computers, Software, and Applications Conference (COMPSAC). IEEE. pp. 1776–1781. doi:10.1109/COMPSAC65507.2025.00242. ISBN 979-8-3315-7434-5.
↑ Patel, Meenu; Rao, Saksham; Chauhan, Shweta; Kumar, Bibek (2024-12-16). "Real-time Hand Gesture Recognition Using Python and Web Application". 2024 1st International Conference on Advances in Computing, Communication and Networking (ICAC2N). IEEE. pp. 564–570. doi:10.1109/ICAC2N63387.2024.10895151. ISBN 979-8-3503-5681-6.
↑ Kumar Rao B, Narendra; Panini Challa, Nagendra; Krishna, E S Phalguna; Chakravarthi, S. Sreenivasa (2023-05-12). "Facial Landmarks Detection System with OpenCV Mediapipe and Python using Optical Flow (Active) Approach". 2023 3rd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE). IEEE. pp. 92–96. doi:10.1109/icacite57410.2023.10182585. ISBN 979-8-3503-9926-4.
↑ Kaur, Husanpreet; Jyoti; Devi, Sanjana; Rahul (2024-05-24). "Body Language Decoder Using Python". 2024 5th International Conference for Emerging Technology (INCET). IEEE. pp. 1–6. doi:10.1109/INCET61516.2024.10593646. ISBN 979-8-3503-6115-5.
↑ Lugaresi, Camillo; Tang, Jiuqiang; Nash, Hadon; McClanahan, Chris; Uboweja, Esha; Hays, Michael; Zhang, Fan; Chang, Chuo-Ling; Yong, Ming Guang (2019-06-14), MediaPipe: A Framework for Building Perception Pipelines, arXiv: 1906.08172 , retrieved 2025-11-21
↑ Yung-Chen; Hsieh; Sun, Yu (2025-08-10), An Intelligent Mobile Application to Monitor and Correct Sitting Posture Using Raspberry Pi and MediaPipe Pose Detection, arXiv: 2508.11683 , retrieved 2025-11-21
↑ Kulkarni, Pavan Kumar V; M S, Sudha; K R, Divya; S, Vignesh; M, Sindhu (2024-05-22). "Real-Time Gesture Recognition For Arduino-Based LED and Servometer Manipulation Using OpenCV and MediaPipe". 2024 1st International Conference on Communications and Computer Science (InCCCS). IEEE. pp. 1–5. doi:10.1109/InCCCS60947.2024.10593524. ISBN 979-8-3503-5885-8.
↑ Williams-Linera, Eric; Ramírez-Cortés, Juan Manuel (2024-09-18). "Stereo Vision System based on the NVIDIA Jetson Nano for Real-time Evaluation of Yoga Poses". 2024 IEEE International Symposium on Technology and Society (ISTAS). IEEE. pp. 1–7. doi:10.1109/ISTAS61960.2024.10732331. ISBN 979-8-3315-4070-8.
↑ Pranav Durai, Kukil (March 1, 2022). "MediaPipe – The Ultimate Guide to Video Processing". learnopencv.com. Retrieved December 1, 2025.
1 2 Yong, Ming. "Release MediaPipe v0.5.0 · google-ai-edge/mediapipe". GitHub. Retrieved 2025-11-21.
↑ "Object Detection and Tracking using MediaPipe- Google Developers Blog". developers.googleblog.com. Retrieved 2025-11-21.
↑ "Introducing MediaPipe Solutions for On-Device Machine Learning- Google Developers Blog". developers.googleblog.com. Retrieved 2025-11-21.
1 2 3 4 Kukil, Pranav Durai (2022-03-01). "MediaPipe-The Ultimate Guide to Video Processing". learnopencv.com. Retrieved 2025-12-01.{{cite web}}: CS1 maint: url-status (link)
1 2 3 Zhang, Fan; Bazarevsky, Valentin; Vakunov, Andrey; Tkachenka, Andrei; Sung, George; Chang, Chuo-Ling; Grundmann, Matthias (2020-06-18), MediaPipe Hands: On-device Real-time Hand Tracking, arXiv: 2006.10214 , retrieved 2025-11-22
↑ "On-Device, Real-Time Hand Tracking with MediaPipe". research.google. Retrieved 2025-12-19.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Hon Law, Wai. "Release MediaPipe v0.10.26 · google-ai-edge/mediapipe". GitHub. Retrieved 2025-11-21.

[2] Abstreiter, Maximilian; Tarkoma, Sasu; Morabito, Roberto (2025-03-12), Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge, arXiv: 2503.09114 , retrieved 2025-11-21

[3] Nezami, Zeinab; Hafeez, Maryam; Djemame, Karim; Zaidi, Syed Ali Raza (2024-11-18), Generative AI on the Edge: Architecture and Performance Evaluation, arXiv: 2411.17712 , retrieved 2025-11-21

[4] Luna-Jiménez, Cristina; Gil-Martín, Manuel; Kleinlein, Ricardo; San-Segundo, Rubén; Fernández-Martínez, Fernando (2023-10-09). "Interpreting Sign Language Recognition using Transformers and MediaPipe Landmarks". International Conference on Multimodal Interaction. ICMI '23. New York, NY, USA: Association for Computing Machinery. pp. 373–377. doi:10.1145/3577190.3614143. ISBN 979-8-4007-0055-2.

[5] Kim, Jong-Wook; Choi, Jin-Young; Ha, Eun-Ju; Choi, Jae-Ho (2023-02-20). "Human Pose Estimation Using MediaPipe Pose and Optimization Method Based on a Humanoid Model". Applied Sciences. 13 (4): 2700. Bibcode:2023ApSci..13.2700K. doi: 10.3390/app13042700 . ISSN 2076-3417.

[6] Thaman, B.; Cao, T.; Caporusso, N. (2022-05-23). Face Mask Detection using MediaPipe Facemesh . 2022 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO). Opatija, Croatia: IEEE. pp. 378–382. doi:10.23919/MIPRO55190.2022.9803531. ISBN 978-953-233-103-5.

[7] Kim, Heejin; Lee, Jeongha; Bahn, Hyokyung (2025-11-17). "Rethinking I/O Caching for Large Language Model Inference on Resource-Constrained Mobile Platforms". Mathematics. 13 (22): 3689. doi: 10.3390/math13223689 . ISSN 2227-7390.

[8] Meyer, David; Abecassis, Eitan; Fernandez-Labrador, Clara; Schroers, Christopher (2024). "RAST: A Reference-Audio Synchronization Tool for Dubbed Content". Interspeech 2024. pp. 67–71. doi:10.21437/Interspeech.2024-2203.

[9] Høgenhaug, Mads; Friis, Marcus; Pedersen, Morten; Rossi, Luca (2025-04-03), TikTok StitchGraph: Characterizing communication patterns on TikTok through a collection of interaction networks, arXiv: 2502.18661 , retrieved 2025-11-21

[10] Withanage Don, Daksitha Senel; Kiderle, Thomas; Mertes, Silvan; Schiller, Dominik; Ritschel, Hannes; André, Elisabeth (2025-03-24). "MeLaX: Conversations with Generative AI in Socially Interactive Agents". Companion Proceedings of the 30th International Conference on Intelligent User Interfaces. ACM. pp. 163–166. doi:10.1145/3708557.3716363. ISBN 979-8-4007-1409-2.

[11] Sreenath, Sreehari; Daniels, D. Ivan; Ganesh, Apparaju S. D.; Kuruganti, Yashaswi S.; Chittawadigi, Rajeevlochana G. (2021-09-30). "Monocular Tracking of Human Hand on a Smart Phone Camera using MediaPipe and its Application in Robotics". 2021 IEEE 9th Region 10 Humanitarian Technology Conference (R10-HTC). IEEE. pp. 1–6. doi:10.1109/R10-HTC53172.2021.9641542. ISBN 978-1-6654-3240-5.

[12] Nunes, João; Nascimento, Thamer Horbylon; Felix, Juliana; Soares, Fabrizzio (2025-07-08). "Real-Time Hand Gesture Recognition for Touchless Video Control Using MediaPipe and Random Forest". 2025 IEEE 49th Annual Computers, Software, and Applications Conference (COMPSAC). IEEE. pp. 1776–1781. doi:10.1109/COMPSAC65507.2025.00242. ISBN 979-8-3315-7434-5.

[13] Patel, Meenu; Rao, Saksham; Chauhan, Shweta; Kumar, Bibek (2024-12-16). "Real-time Hand Gesture Recognition Using Python and Web Application". 2024 1st International Conference on Advances in Computing, Communication and Networking (ICAC2N). IEEE. pp. 564–570. doi:10.1109/ICAC2N63387.2024.10895151. ISBN 979-8-3503-5681-6.

[14] Kumar Rao B, Narendra; Panini Challa, Nagendra; Krishna, E S Phalguna; Chakravarthi, S. Sreenivasa (2023-05-12). "Facial Landmarks Detection System with OpenCV Mediapipe and Python using Optical Flow (Active) Approach". 2023 3rd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE). IEEE. pp. 92–96. doi:10.1109/icacite57410.2023.10182585. ISBN 979-8-3503-9926-4.

[15] Kaur, Husanpreet; Jyoti; Devi, Sanjana; Rahul (2024-05-24). "Body Language Decoder Using Python". 2024 5th International Conference for Emerging Technology (INCET). IEEE. pp. 1–6. doi:10.1109/INCET61516.2024.10593646. ISBN 979-8-3503-6115-5.

[16] Lugaresi, Camillo; Tang, Jiuqiang; Nash, Hadon; McClanahan, Chris; Uboweja, Esha; Hays, Michael; Zhang, Fan; Chang, Chuo-Ling; Yong, Ming Guang (2019-06-14), MediaPipe: A Framework for Building Perception Pipelines, arXiv: 1906.08172 , retrieved 2025-11-21

[17] Yung-Chen; Hsieh; Sun, Yu (2025-08-10), An Intelligent Mobile Application to Monitor and Correct Sitting Posture Using Raspberry Pi and MediaPipe Pose Detection, arXiv: 2508.11683 , retrieved 2025-11-21

[18] Kulkarni, Pavan Kumar V; M S, Sudha; K R, Divya; S, Vignesh; M, Sindhu (2024-05-22). "Real-Time Gesture Recognition For Arduino-Based LED and Servometer Manipulation Using OpenCV and MediaPipe". 2024 1st International Conference on Communications and Computer Science (InCCCS). IEEE. pp. 1–5. doi:10.1109/InCCCS60947.2024.10593524. ISBN 979-8-3503-5885-8.

[19] Williams-Linera, Eric; Ramírez-Cortés, Juan Manuel (2024-09-18). "Stereo Vision System based on the NVIDIA Jetson Nano for Real-time Evaluation of Yoga Poses". 2024 IEEE International Symposium on Technology and Society (ISTAS). IEEE. pp. 1–7. doi:10.1109/ISTAS61960.2024.10732331. ISBN 979-8-3315-4070-8.

[20] Pranav Durai, Kukil (March 1, 2022). "MediaPipe – The Ultimate Guide to Video Processing". learnopencv.com. Retrieved December 1, 2025.

[:0-21] 1 2 Yong, Ming. "Release MediaPipe v0.5.0 · google-ai-edge/mediapipe". GitHub. Retrieved 2025-11-21.

[22] "Object Detection and Tracking using MediaPipe- Google Developers Blog". developers.googleblog.com. Retrieved 2025-11-21.

[23] "Introducing MediaPipe Solutions for On-Device Machine Learning- Google Developers Blog". developers.googleblog.com. Retrieved 2025-11-21.

[:2-24] 1 2 3 4 Kukil, Pranav Durai (2022-03-01). "MediaPipe-The Ultimate Guide to Video Processing". learnopencv.com. Retrieved 2025-12-01.{{cite web}}: CS1 maint: url-status (link)

[:1-25] 1 2 3 Zhang, Fan; Bazarevsky, Valentin; Vakunov, Andrey; Tkachenka, Andrei; Sung, George; Chang, Chuo-Ling; Grundmann, Matthias (2020-06-18), MediaPipe Hands: On-device Real-time Hand Tracking, arXiv: 2006.10214 , retrieved 2025-11-22

[26] "On-Device, Real-Time Hand Tracking with MediaPipe". research.google. Retrieved 2025-12-19.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]