Vladlen Koltun

Born: 1980
Nationality: Israeli–American
Citizenship: United States
Alma mater: Tel Aviv University
Known for: Photorealism enhancement system; critique of the h-index
Awards: Sloan Research Fellowship; NSF CAREER Award
Scientific career
Fields:
  • Autonomous Driving
  • Computer Graphics
  • Computer Vision
  • Machine Learning
  • Robotics
Institutions: Stanford University; Adobe; Intel; Apple
Thesis: Arrangements in four dimensions and related structures (2002)
Doctoral advisor: Micha Sharir
Other academic advisors: Christos Papadimitriou
Website: vladlen.info

Vladlen Koltun (born 1980) is an Israeli-American computer scientist and intelligent systems researcher. He currently serves as a distinguished scientist at Apple Inc. His main areas of research are artificial intelligence, computer vision, machine learning, and pattern recognition. He has also made significant contributions to robotics and autonomous driving. [6]

Koltun's research and publications span convolutional neural networks, reality simulation, view synthesis, [7] photorealistic rendering, simulation of urban self-driving cars, 3D computer graphics, robot locomotion, [1] and drone maneuverability in dynamic environments. [8] [9]

He is also known for his work on a photorealism enhancement system, [10] [11] and for a critical study of the Hirsch index, a widely used metric of scientists' work. [12]

Early life and education

Vladlen Koltun was born in 1980 in Kiev, Ukraine and grew up in Israel. [13] He received his BS degree in computer science, magna cum laude, from Tel Aviv University in 2000. He continued his studies at the university and completed his PhD in computer science with honors in 2002 with the thesis Arrangements in four dimensions and related structures; his doctoral advisor was Micha Sharir. [13] [14] He then held a postdoctoral fellowship under the supervision of Christos Papadimitriou at the University of California, Berkeley, where he conducted research in theoretical computer science from 2002 to 2005. [15]

Career

Koltun served as an assistant professor at Stanford University from 2005 to 2013, where he lectured in computer science, computer graphics, and geometric algorithms. [16] During his time at Stanford, he supervised PhD students and postdoctoral researchers. [14] He also received the Sloan Research Fellowship [2] and the National Science Foundation's CAREER Award. [3]

Koltun's research at Stanford contributed to the development of data-driven 3D modeling technology in collaboration with Siddhartha Chaudhuri. [17] Chaudhuri's work with Koltun, Evangelos Kalogerakis, and Leonidas Guibas resulted in a SIGGRAPH publication in 2011. [18] Mixamo subsequently licensed the technology from Stanford. Adobe Inc. later acquired Mixamo and further developed Adobe Fuse CC, 3D computer graphics software that enabled users to create 3D characters. [19] In 2014, Koltun joined Adobe to conduct research in visual computing, with a primary focus on three-dimensional reconstruction. [20]

Koltun left Adobe to join Intel, where he held various positions in the company's intelligent systems R&D projects until 2021. [6] [5]

Since August 2021, Koltun has been serving as a distinguished scientist at Apple Inc. [16]

Research

At Intel, Koltun contributed to the development of virtual reality simulators for urban autonomous driving, robots, and drones, focusing on deep reinforcement learning techniques with neural networks in virtual environments. These networks learned by trial and error in simulation before being transferred to robots or drones for real-world use. This method was applied to the ANYmal robot, a quadrupedal robot that uses proprioceptive feedback for locomotion control. [21] [8] [22]

Research on urban autonomous driving led Koltun's group to develop the Car Learning to Act (CARLA) project in 2017. [23] It is an open-source simulator, powered by Unreal Engine, that can be used to test self-driving technologies in realistic environments with randomly occurring dangerous situations. [23] [24] [25] The project was funded by Intel Labs and the Toyota Research Institute. [23] [26]

In 2020, inspired by Google Cardboard, Koltun developed OpenBot with the German scientist Matthias Müller. [1] [27] It is a software stack that transforms Android smartphones into four-wheeled robots capable of navigation, object tracking, and obstacle avoidance. The robot features a 3D-printable chassis that accommodates a controller, LEDs, a smartphone mount, and a USB cable. [27] The system consists of an Arduino Nano board, which connects the smartphone to the motors and batteries, and an Android app responsible for the integration of data. [1] [27] The project was released as open-source software for robotics-related applications, with the software development kit available on GitHub. [28] [29]

Koltun also contributed to 3D photorealistic view synthesis and rendering. In 2021, a photorealism enhancement system based on his work with other researchers at Intel, Enhancing Photorealism Enhancement, [30] was tested on Grand Theft Auto V. [31] [32] [33]

Hirsch index critique

Koltun has expressed concerns regarding the h-index's reliability, highlighting the inflation of its values due to the prevalence of multiple co-authorships in scientific communities. This critique was presented in collaboration with David Hafner. [34] [35]
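The co-authorship concern can be illustrated with a small sketch. The h-index is the largest h such that at least h of a researcher's papers have h or more citations each. The fractional-credit variant below, which divides each paper's citations by its author count, and the sample data are illustrative assumptions, not figures or code from the cited study.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def fractional_h_index(papers):
    """h-index after dividing each paper's citations by its author count.

    `papers` is an iterable of (citations, n_authors) pairs. This is a
    hypothetical illustration of fractional credit allocation.
    """
    return h_index(cites / n_authors for cites, n_authors in papers)


# Hypothetical record: three heavily cited papers with very long author
# lists, plus two modestly cited papers with few authors.
papers = [(100, 50), (80, 40), (60, 30), (10, 2), (6, 1)]

plain = h_index(cites for cites, _ in papers)
frac = fractional_h_index(papers)
print(plain, frac)
```

In this made-up record, the plain h-index counts the large collaborations at full value, while the fractional variant credits each author with only a share of the citations.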

Selected works

References

  1. "How To Turn Your Smartphone Into A Robot". Discover Magazine.
  2. "Sloan Research Fellowship in computer science (2007)" (PDF). Alfred P. Sloan Foundation.
  3. "Career Award: Fundamental Geometric Algorithms".
  4. "Computer Graphics as a Telecommunication Medium". Princeton University.
  5. "Computer Science Bibliography: Vladlen Koltun's affiliations". Schloss Dagstuhl, Leibniz Center for Informatics.
  6. "Vladlen Koltun". Institute of Electrical and Electronics Engineers.
  7. "This new tech from Intel Labs could revolutionize VR gaming". PC Games.
  8. Lee, Joonho; Hwangbo, Jemin; Wellhausen, Lorenz; Koltun, Vladlen; Hutter, Marco (2020). "Learning quadrupedal locomotion over challenging terrain". Science Robotics. 5 (47). doi:10.1126/scirobotics.abc5986. hdl:20.500.11850/448343. PMID 33087482. S2CID 224828219.
  9. "Robots, hominins and superconductors: 10 remarkable papers from 2019". Nature. 576 (7787): 394–396. 2019. Bibcode:2019Natur.576..394.. doi:10.1038/d41586-019-03834-4. PMID 31844266. S2CID 209371845.
  10. Tunholi, Murilo. "GTA 5 fica mais realista com aprendizado de máquina do Intel Labs". Terra (in Brazilian Portuguese). Retrieved 9 March 2023.
  11. Kemper, Jonathan (19 May 2021). "GTA 5: Intel überarbeitet Spielegrafik mithilfe deutscher Städte". Allround-PC (in German). Retrieved 9 March 2023.
  12. Durrani, Jamie (29 July 2021). "Reliability of researcher metric the h-index is in decline". Chemistry World.
  13. "Arrangements in four dimensions and related structures". The National Library of Israel.
  14. "Vladlen Koltun". Math Genealogy.
  15. "Introducing scholars who have recently joined the faculty". Stanford University News. 6 November 2021. Archived from the original on 6 November 2021. Retrieved 9 March 2023.
  16. "Vladlen Koltun's Biography".
  17. "3D modeling with data-driven suggestions". Stanford Digital Repository.
  18. "Probabilistic Reasoning for Assembly-Based 3D Modeling". Cornell University.
  19. "Adobe buys 3D startup Mixamo". Fortune.
  20. "Learning Complex Neural Network Policies with Trajectory Optimization".
  21. "How robots learn to hike".
  22. Lee, Joonho; Hwangbo, Jemin; Wellhausen, Lorenz; Koltun, Vladlen; Hutter, Marco (October 2020). "Learning Quadrupedal Locomotion over Challenging Terrain". Science Robotics. 5 (47). arXiv:2010.11251. doi:10.1126/scirobotics.abc5986. hdl:20.500.11850/448343. PMID 33087482. S2CID 224828219.
  23. "Toyota donates $100,000 for open-source self-driving simulator". CNET.
  24. "The Open-Source Driving Simulator That Trains Autonomous Vehicles". MIT Technology Review.
  25. Dosovitskiy, Alexey; Ros, German; Codevilla, Felipe; Lopez, Antonio; Koltun, Vladlen (November 2017). "CARLA: An Open Urban Driving Simulator". Conference on Robot Learning (CoRL). arXiv:1711.03938.
  26. "Carla Project".
  27. "Intel researchers design smartphone-powered robot that costs $50 to assemble". VentureBeat. 26 August 2020.
  28. Müller, Matthias; Koltun, Vladlen (August 2020). "OpenBot: Turning Smartphones into Robots". Cornell University. arXiv:2008.10631v2.
  29. "OpenBot code". GitHub. 13 January 2023.
  30. "Machine Learning Takes GTA V Photorealism to Never-Before-Seen Levels". Interesting Engineering. 17 May 2021.
  31. "'Grand Theft Auto V' mod adds uncanny photorealism through AI". Engadget. Retrieved 9 March 2023.
  32. Liszewski, Andrew (12 May 2021). "Grand Theft Auto Looks Frighteningly Photorealistic With This Machine Learning Technique". Gizmodo.
  33. Dickson, Ben (31 May 2021). "Intel's image-enhancing AI is a step forward for photorealistic game engines". VentureBeat.
  34. Koltun, Vladlen; Hafner, David (June 2021). "The h-index is no longer an effective correlate of scientific reputation". PLOS ONE. 16 (6): e0253397. doi:10.1371/journal.pone.0253397. PMC 8238192. PMID 34181681.
  35. Hafner, David (June 2021). "Data for "The h-index is no longer an effective correlate of scientific reputation"". Mendeley Data. 1. doi:10.17632/wsrjd8m2h6.1.
  36. "Paper Awards, 2020". Robotics Science and Systems.
  37. Savva, Manolis; Kadian, Abhishek; Maksymets, Oleksandr; Zhao, Yili; Wijmans, Erik; Jain, Bhavana; Straub, Julian; Liu, Jia; Koltun, Vladlen; Malik, Jitendra; Parikh, Devi; Batra, Dhruv (February 2020). Habitat: A Platform for Embodied AI Research. pp. 9338–9346. arXiv:1904.01201. doi:10.1109/ICCV.2019.00943. ISBN 978-1-7281-4803-8. S2CID 91184540.