Devi Parikh

Devi Parikh is an American computer scientist.

Career

Parikh earned her PhD in Electrical and Computer Engineering at Carnegie Mellon University. She has served as a professor at Virginia Tech and Georgia Tech, and as of 2022 she is a research director at Meta. [1]

Research

Parikh's research focuses on computer vision and natural language processing.

In 2015, Parikh and her students at Virginia Tech worked on AI for Visual Question Answering (VQA). This technology allows users to ask questions about pictures, e.g. "Is this a vegetarian pizza?" [2] [3] Parikh's VQA dataset has been used to evaluate over 30 AI models. [4]
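
A minimal sketch of how such a system is queried, using the Hugging Face transformers pipeline with a publicly available ViLT checkpoint fine-tuned on the VQA dataset (the model identifier and image path are illustrative, not part of Parikh's original system):

```python
# Query a visual-question-answering model: the input is an image plus a
# free-form natural-language question; the output is a ranked list of answers.
# The model name and image file are illustrative placeholders.
from transformers import pipeline

vqa = pipeline("visual-question-answering",
               model="dandelin/vilt-b32-finetuned-vqa")

result = vqa(image="pizza.jpg", question="Is this a vegetarian pizza?")
print(result)  # e.g. [{'answer': 'yes', 'score': 0.87}, ...]
```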

In 2017, Parikh and colleagues at Facebook AI Research released ParlAI, an open-source platform for training and evaluating conversational AI models. [5] In 2020, she developed an AI system that generates dance moves in sync with songs. [6] [7] In 2022, Parikh and a team at Meta developed Make-A-Video, a text-to-video AI model based on diffusion techniques. [8] [9]

Related Research Articles

Natural language generation (NLG) is a software process that produces natural language output. A widely cited survey of NLG methods describes NLG as "the subfield of artificial intelligence and computational linguistics that is concerned with the construction of computer systems that can produce understandable texts in English or other human languages from some underlying non-linguistic representation of information".

Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP) that is concerned with building systems that automatically answer questions that are posed by humans in a natural language.

Progress in artificial intelligence (AI) refers to the advances, milestones, and breakthroughs that have been achieved in the field of artificial intelligence over time. AI is a multidisciplinary branch of computer science that aims to create machines and systems capable of performing tasks that typically require human intelligence. Artificial intelligence applications have been used in a wide range of fields including medical diagnosis, economic-financial applications, robot control, law, scientific discovery, video games, and toys. However, many AI applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore." "Many thousands of AI applications are deeply embedded in the infrastructure of every industry." In the late 1990s and early 21st century, AI technology became widely used as elements of larger systems, but the field was rarely credited for these successes at the time.

Artificial intelligence and music (AIM) is a common subject in the International Computer Music Conference, the Computing Society Conference and the International Joint Conference on Artificial Intelligence. The first International Computer Music Conference (ICMC) was held in 1974 at Michigan State University. Current research includes the application of AI in music composition, performance, theory and digital sound processing.

Maluuba is a Canadian technology company conducting research in artificial intelligence and language understanding. Founded in 2011, the company was acquired by Microsoft in 2017.

In artificial intelligence, a differentiable neural computer (DNC) is a memory-augmented neural network (MANN) architecture, typically recurrent in its implementation. The model was published in 2016 by Alex Graves et al. of DeepMind.

fastText is a library for learning word embeddings and text classification, created by Facebook's AI Research (FAIR) lab. It supports both unsupervised and supervised training of vector representations for words. Facebook makes available pretrained models for 294 languages. Several papers describe the techniques used by fastText.
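
A short sketch of fastText's two modes through its Python bindings; the file names are placeholders, and the supervised training file must use fastText's "__label__" line format:

```python
# fastText usage sketch: unsupervised word embeddings and supervised
# text classification. "corpus.txt" and "train.txt" are placeholder files.
import fasttext

# Unsupervised mode: learn skip-gram word vectors from raw text.
embeddings = fasttext.train_unsupervised("corpus.txt", model="skipgram")
print(embeddings.get_word_vector("language")[:5])  # 100-dim vector by default

# Supervised mode: classify text from lines such as
# "__label__positive what a great movie".
classifier = fasttext.train_supervised(input="train.txt")
print(classifier.predict("what a great movie"))  # (labels, probabilities)
```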

This article presents a detailed timeline of events in the history of computing from 2020 to the present. For narratives explaining the overall developments, see the history of computing.

Yejin Choi is the Brett Helsel Professor of Computer Science at the University of Washington. Her research considers natural language processing and computer vision. Choi was awarded a MacArthur Fellowship in 2022.

Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on WebText, a dataset of 8 million web pages. It was partially released in February 2019, followed by the full release of the 1.5-billion-parameter model on November 5, 2019.
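
As an illustration, a minimal sketch of sampling from the publicly hosted "gpt2" checkpoint via the Hugging Face transformers pipeline (the prompt and sampling settings are illustrative):

```python
# Generate a continuation with the released GPT-2 weights.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
out = generator("Computer vision and natural language processing",
                max_new_tokens=30, num_return_sequences=1)
print(out[0]["generated_text"])
```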

DALL-E and DALL-E 2 are text-to-image models developed by OpenAI using deep learning methodologies to generate digital images from natural language descriptions, called "prompts". DALL-E was revealed by OpenAI in a blog post in January 2021, and uses a version of GPT-3 modified to generate images. In April 2022, OpenAI announced DALL-E 2, a successor designed to generate more realistic images at higher resolutions that "can combine concepts, attributes, and styles".

Prompt engineering is the process of structuring text so that it can be interpreted and understood by a generative AI model; it is primarily used in communication with text-to-text models. Prompt engineering is enabled by in-context learning, a model's ability to temporarily learn a task from examples given in the prompt. In-context learning is an emergent ability of large language models.
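
A sketch of in-context learning: the few-shot prompt below carries its "training examples" inside the prompt itself, and a large language model completing it is expected to continue the pattern without any weight updates (the translation task is a commonly used illustration, not tied to any specific model):

```python
# The prompt itself is the point here; no model call is made.
few_shot_prompt = """Translate English to French.

sea otter -> loutre de mer
cheese -> fromage
plush giraffe ->"""

# Sent to a capable large language model, this prompt typically elicits
# "girafe en peluche": the model temporarily "learns" the task from the
# in-prompt examples rather than from any parameter update.
print(few_shot_prompt)
```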

Meta AI is an artificial intelligence laboratory that belongs to Meta Platforms Inc. It develops various forms of artificial intelligence and works to improve augmented and virtual reality technologies. Meta AI operates as an academic research laboratory focused on generating knowledge for the AI community, in contrast to Facebook's Applied Machine Learning (AML) team, which focuses on practical applications of its products.

Stable Diffusion is a deep-learning text-to-image model released in 2022, based on diffusion techniques. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and image-to-image translation guided by a text prompt. It was developed by researchers from the CompVis Group at Ludwig Maximilian University of Munich and Runway, with a compute donation by Stability AI and training data from non-profit organizations.
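
A minimal sketch of generating an image with Stable Diffusion through the Hugging Face diffusers library (the checkpoint name and prompt are illustrative, and a CUDA GPU is assumed):

```python
# Text-to-image generation with a Stable Diffusion checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")  # half precision on GPU keeps memory use modest

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```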

A text-to-image model is a machine learning model which takes an input natural language description and produces an image matching that description. Such models began to be developed in the mid-2010s as a result of advances in deep neural networks. In 2022, the output of state-of-the-art text-to-image models, such as OpenAI's DALL-E 2, Google Brain's Imagen, Stability AI's Stable Diffusion, and Midjourney, began to approach the quality of real photographs and human-drawn art.

A text-to-video model is a machine learning model which takes as input a natural language description and produces a video matching that description.

In the field of artificial intelligence (AI), a hallucination or artificial hallucination is a confident response by an AI that does not seem to be justified by its training data. For example, a hallucinating chatbot might, when asked to generate a financial report for a company, falsely state that the company's revenue was $13.6 billion.

Generative pre-trained transformers (GPT) are a type of large language model (LLM) and a prominent framework for generative artificial intelligence. The first GPT was introduced in 2018 by OpenAI. GPT models are artificial neural networks that are based on the transformer architecture, pre-trained on large data sets of unlabelled text, and able to generate novel human-like content. As of 2023, most LLMs have these characteristics and are sometimes referred to broadly as GPTs.

A large language model (LLM) is a language model characterized by its large size. This size is enabled by AI accelerators, which can process the vast amounts of text data, mostly scraped from the Internet, on which the models are trained. The resulting artificial neural networks can contain from tens of millions up to billions of weights and are (pre-)trained using self-supervised and semi-supervised learning. The transformer architecture contributed to faster training. Alternative architectures include the mixture of experts (MoE), proposed by Google in a line of work starting with sparsely-gated MoE layers in 2017 and continuing through GShard and GLaM.
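
To illustrate the mixture-of-experts idea, a toy sparsely-gated MoE layer in PyTorch (a conceptual sketch, not Google's implementation): a small gating network picks the top-k experts per token, so only a fraction of the layer's weights are active for any one input:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy sparsely-gated mixture-of-experts layer (conceptual sketch)."""
    def __init__(self, dim, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):                               # x: (tokens, dim)
        scores = self.gate(x)                           # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens whose k-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

moe = TinyMoE(dim=16)
print(moe(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```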

Generative artificial intelligence (AI) is artificial intelligence capable of generating text, images, or other media, using generative models. Generative AI models learn the patterns and structure of their input training data and then generate new data that has similar characteristics.

References

  1. Parikh, Devi (2022-12-28). "Curriculum Vitae" (PDF). Retrieved 2022-12-28.
  2. Agrawal, Aishwarya; Lu, Jiasen; Antol, Stanislaw; Mitchell, Margaret; Zitnick, C. Lawrence; Batra, Dhruv; Parikh, Devi (2016-10-26). "VQA: Visual Question Answering". arXiv: 1505.00468 [cs.CL].
  3. Yao, Mariya. "Meet These Incredible Women Advancing A.I. Research". Forbes. Retrieved 2022-12-28.
  4. "Papers with Code - VQA v2 test-dev Benchmark (Visual Question Answering)". paperswithcode.com. Retrieved 2022-12-28.
  5. Mannes, John (2017-05-15). "Facebook's ParlAI is where researchers will push the boundaries of conversational AI". TechCrunch. Retrieved 2022-12-28.
  6. Tendulkar, Purva; Das, Abhishek; Kembhavi, Aniruddha; Parikh, Devi (2020-06-23). "Feel The Music: Automatically Generating A Dance For An Input Song". arXiv: 2006.11905 [cs.AI].
  7. "Facebook's new choreography AI is a dancing queen". Engadget. Retrieved 2022-12-28.
  8. Edwards, Benj (2022-09-29). "Meta announces Make-A-Video, which generates video from text [Updated]". Ars Technica. Retrieved 2022-12-28.
  9. Singer, Uriel; Polyak, Adam; Hayes, Thomas; Yin, Xi; An, Jie; Zhang, Songyang; Hu, Qiyuan; Yang, Harry; Ashual, Oron; Gafni, Oran; Parikh, Devi; Gupta, Sonal; Taigman, Yaniv (2022-09-29). "Make-A-Video: Text-to-Video Generation without Text-Video Data". arXiv: 2209.14792 [cs.CV].