Xu Li (computer scientist)

Xu Li
Alma mater: Chinese University of Hong Kong; Shanghai Jiao Tong University
Awards: Best Paper Award, Non-Photorealistic Animation and Rendering (NPAR) 2012; Best Reviewer Award, Asian Conference on Computer Vision (ACCV) 2012 and International Conference on Computer Vision (ICCV) 2015
Website: SenseTime

Xu Li is a Chinese computer scientist and the co-founder and current CEO of SenseTime, an artificial intelligence (AI) company. Xu has led SenseTime since the company's incorporation and helped it independently develop its proprietary deep learning platform.

Education and research

Xu obtained both his bachelor's and master's degrees in computer science from Shanghai Jiao Tong University.[1] He received his doctorate in computer science from the Chinese University of Hong Kong.

Xu has published more than 50 papers at international conferences and in journals in the field of computer vision. He won the Best Paper Award at the international symposium on Non-Photorealistic Animation and Rendering (NPAR) in 2012, and the Best Reviewer Award at the Asian Conference on Computer Vision (ACCV) in 2012 and the International Conference on Computer Vision (ICCV) in 2015.[2] Three of his algorithms have been included in the open-source computer vision library OpenCV, and his "L0 Smoothing" algorithm garnered the most citations in research papers over the five-year span 2011–2015 within ACM Transactions on Graphics (TOG), a scientific journal that Thomson Reuters InCites has ranked first among software engineering journals.[3][4]
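
The "L0 Smoothing" work corresponds to the paper "Image Smoothing via L0 Gradient Minimization".[3] A minimal sketch of applying this style of smoothing follows, assuming the opencv-contrib-python package, whose ximgproc module includes an L0 smoothing function; the file names and parameter values below are illustrative only.

    # Sketch: L0 gradient-minimization smoothing via OpenCV's contrib module.
    # Assumes opencv-contrib-python is installed; file names are illustrative.
    import cv2

    img = cv2.imread("input.jpg")  # hypothetical input image
    if img is None:
        raise FileNotFoundError("input.jpg not found")

    # Positional arguments: source, destination (None), lambda (smoothing
    # strength; larger values suppress more gradients) and kappa (the rate at
    # which the solver's penalty weight grows between iterations).
    smoothed = cv2.ximgproc.l0Smooth(img, None, 0.02, 2.0)

    cv2.imwrite("smoothed.jpg", smoothed)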

Career

Before co-founding SenseTime, Xu worked at Lenovo Corporate Research & Development. He was also a visiting researcher at Motorola China R&D Institute, Omron Research Institute, and Microsoft Research.[5]

Selected publications

Awards and honors

Xu was ranked 7th in Fortune magazine's 2018 edition of its 40 Under 40 list.[7] He was also named "China's Outstanding AI Industry Leader" by The Economic Observer, received the "Innovative Business Leader" award under NetEase's "Future Technology Talent Awards", and was named one of Sina's "2017 Top Ten Economic Figures". In 2018, Xu was named EY's "Entrepreneur of the Year China" in the Technology category.[8]

Related Research Articles

<span class="mw-page-title-main">Automatic image annotation</span>

Automatic image annotation is the process by which a computer system automatically assigns metadata in the form of captioning or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from a database.

In computer vision, the bag-of-words model, sometimes called the bag-of-visual-words model, can be applied to image classification or retrieval by treating image features as words. In document classification, a bag of words is a sparse vector of occurrence counts of words; that is, a sparse histogram over the vocabulary. In computer vision, a bag of visual words is a vector of occurrence counts of a vocabulary of local image features.
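
As a rough illustration of the encoding, the sketch below builds a small visual vocabulary by clustering local descriptors and then represents each image as a histogram of visual-word counts; it is a minimal sketch assuming the opencv-python, numpy and scikit-learn packages, with illustrative file names and vocabulary size.

    # Sketch: bag of visual words from ORB descriptors.
    # Assumes opencv-python, numpy and scikit-learn; file names are illustrative.
    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    paths = ["img1.jpg", "img2.jpg"]  # hypothetical image paths
    orb = cv2.ORB_create()

    # 1. Collect local descriptors from all images.
    per_image_desc = []
    for p in paths:
        gray = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
        _, desc = orb.detectAndCompute(gray, None)
        per_image_desc.append(desc.astype(np.float32))
    all_desc = np.vstack(per_image_desc)

    # 2. Build the visual vocabulary by clustering the descriptors.
    k = 50  # vocabulary size, chosen for illustration
    kmeans = KMeans(n_clusters=k, n_init=10).fit(all_desc)

    # 3. Encode each image as a histogram of visual-word occurrences.
    histograms = []
    for desc in per_image_desc:
        words = kmeans.predict(desc)
        hist, _ = np.histogram(words, bins=np.arange(k + 1))
        histograms.append(hist)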

<span class="mw-page-title-main">Object detection</span> Computer technology related to computer vision and image processing

Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including image retrieval and video surveillance.
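
Face detection, one of the well-researched domains mentioned above, can be illustrated with the pretrained Haar cascade detector that ships with OpenCV; the sketch below assumes the opencv-python package and an illustrative input file name.

    # Sketch: face detection with OpenCV's bundled Haar cascade model.
    # Assumes opencv-python is installed; the input file name is illustrative.
    import cv2

    img = cv2.imread("photo.jpg")  # hypothetical input image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Load the pretrained frontal-face cascade shipped with OpenCV.
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    detector = cv2.CascadeClassifier(cascade_path)

    # Each detection is an axis-aligned box (x, y, width, height).
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imwrite("detections.jpg", img)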

The IEEE Annual Symposium on Foundations of Computer Science (FOCS) is an academic conference in the field of theoretical computer science. FOCS is sponsored by the IEEE Computer Society.

<span class="mw-page-title-main">Jitendra Malik</span> Indian-American academic (born 1960)

Jitendra Malik is an Indian-American academic who is the Arthur J. Chick Professor of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He is known for his research in computer vision.

<span class="mw-page-title-main">Fei-Fei Li</span> Chinese-American computer scientist (born 1976)

Fei-Fei Li is a Chinese-American computer scientist, known for establishing ImageNet, the dataset that enabled rapid advances in computer vision in the 2010s. She is the Sequoia Capital professor of computer science at Stanford University and former board director at Twitter. Li is a co-director of the Stanford Institute for Human-Centered Artificial Intelligence and a co-director of the Stanford Vision and Learning Lab. She served as the director of the Stanford Artificial Intelligence Laboratory from 2013 to 2018.

The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million images have been hand-annotated by the project to indicate what objects are pictured, and in at least one million of the images, bounding boxes are also provided. ImageNet contains more than 20,000 categories, with a typical category, such as "balloon" or "strawberry", consisting of several hundred images. The database of annotations of third-party image URLs is freely available directly from ImageNet, though the actual images are not owned by ImageNet. Since 2010, the ImageNet project has run an annual software contest, the ImageNet Large Scale Visual Recognition Challenge, where software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of one thousand non-overlapping classes.

<span class="mw-page-title-main">AlexNet</span> Convolutional neural network

AlexNet is the name of a convolutional neural network (CNN) architecture, designed by Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton, who was Krizhevsky's Ph.D. advisor at the University of Toronto.

Jason Joseph Corso is Co-Founder / CEO of the computer vision startup Voxel51 and a Professor of Robotics, Electrical Engineering and Computer Science at the University of Michigan.

Inception v3 is a convolutional neural network (CNN) for assisting in image analysis and object detection; it originated as a module of GoogLeNet. It is the third edition of Google's Inception convolutional neural network, originally introduced during the ImageNet Recognition Challenge. Inception v3 was designed to allow deeper networks while keeping the number of parameters from growing too large: it has "under 25 million parameters", compared with 60 million for AlexNet.
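
One way to sanity-check the parameter figure is to count the parameters of an off-the-shelf implementation; the sketch below assumes PyTorch and torchvision are installed, and the exact total depends on whether the auxiliary classifier branch is counted.

    # Sketch: count Inception v3 parameters using torchvision's implementation.
    # Assumes torch and torchvision are installed; no pretrained weights needed.
    from torchvision.models import inception_v3

    model = inception_v3(weights=None)  # randomly initialized, no download
    n_params = sum(p.numel() for p in model.parameters())
    # The total lands in the mid-20-millions; whether it is "under 25 million"
    # depends on whether the auxiliary classifier branch is included.
    print(f"Inception v3 parameters: {n_params / 1e6:.1f}M")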

Jiebo Luo is a Chinese-American computer scientist, the Albert Arendt Hopeman Professor of Engineering and Professor of Computer Science at the University of Rochester. He is interested in artificial intelligence, data science and computer vision.

In the domain of physics and probability, the filters, random fields, and maximum entropy (FRAME) model is a Markov random field model of stationary spatial processes, in which the energy function is the sum of translation-invariant potential functions that are one-dimensional non-linear transformations of linear filter responses. The FRAME model was originally developed by Song-Chun Zhu, Ying Nian Wu, and David Mumford for modeling stochastic texture patterns such as grasses, tree leaves, brick walls, and water waves. The model is the maximum entropy distribution that reproduces the observed marginal histograms of responses from a bank of filters, where for each filter, tuned to a specific scale and orientation, the marginal histogram is pooled over all the pixels in the image domain. The FRAME model has also been proved to be equivalent to the micro-canonical ensemble, which was named the Julesz ensemble. A Gibbs sampler is used to synthesize texture images by drawing samples from the FRAME model.
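
In the notation commonly used for this model (the symbols below follow standard usage rather than the article itself), the density can be sketched as a Gibbs distribution whose potentials act on pooled filter-response histograms:

    % Sketch of the FRAME density in commonly used notation (not from the article):
    % I is an image, F^{(k)} the k-th linear filter, H_k(I) the marginal histogram
    % of its responses pooled over pixels x, \lambda_k the learned potential for
    % that histogram, and Z(\Lambda) the normalizing constant.
    p(I; \Lambda) = \frac{1}{Z(\Lambda)}
        \exp\!\Big( - \sum_{k=1}^{K} \langle \lambda_k,\, H_k(I) \rangle \Big),
    \qquad
    H_k(I) = \Big( \#\{\, x : (F^{(k)} * I)(x) \in B_i \,\} \Big)_{i=1}^{B}

Maximizing entropy subject to matching the observed histograms yields exactly this exponential-family form, with the \lambda_k acting as Lagrange multipliers.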

Spatial embedding is a feature learning technique used in spatial analysis in which points, lines, polygons, or other spatial data types representing geographic locations are mapped to vectors of real numbers. Conceptually, it involves a mathematical embedding from a space with many dimensions per geographic object to a continuous vector space of much lower dimension.

<span class="mw-page-title-main">Video super-resolution</span> Generating high-resolution video frames from given low-resolution ones

Video super-resolution (VSR) is the process of generating high-resolution video frames from given low-resolution video frames. Unlike single-image super-resolution (SISR), the goal is not only to restore fine details while preserving coarse structure, but also to maintain motion consistency across frames.

Jiaya Jia is a Chair Professor of the Department of Computer Science and Engineering at The Hong Kong University of Science and Technology (HKUST). He is an IEEE Fellow, the associate editor-in-chief of IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), one of IEEE's flagship journals, and a member of the editorial board of the International Journal of Computer Vision (IJCV).

Yixin Chen is a computer scientist, academic, and author. He is a professor of computer science and engineering at Washington University in St. Louis.

Wang Gang, also known as Michael Wang, is an electrical and computer engineer and academic specializing in artificial intelligence and its application to autonomous driving. Wang has authored or co-authored more than 100 publications, which have been cited over 28,000 times, and his h-index is 72.

References

  1. "Meet SenseTime, Hong Kong's first unicorn no one's heard of". South China Morning Post. 23 October 2017.
  2. 1 2 "LI XU – Homepage". lxu.me.
  3. "Image Smoothing via L0 Gradient Minimization". www.cse.cuhk.edu.hk.
  4. "2017 Latest Impact Factors (2016 Journal Citation Reports, Thomson Reuters)".
  5. "Dr. Xu Li | Hong Kong X-Tech Startup Platform".
  6. "CVPR 2017 Open Access Repository". openaccess.thecvf.com.
  7. "40-Under-40: Xu Li".
  8. "EY Entrepreneur Of The Year China 2018 winner – Xu Li – EY – China Mainland".