Ashish Vaswani

Ashish Vaswani
Born: 1986
Alma mater: University of Southern California; BIT Mesra
Known for: Transformer (deep learning architecture)
Scientific career
Fields: Deep learning, natural language processing
Institutions: Google Brain; Adept AI Labs
Thesis: Smaller, Faster, and Accurate Models for Statistical Machine Translation (2014)
Doctoral advisors:
  • David Chiang
  • Liang Huang
Website: https://www.isi.edu/~avaswani/

Ashish Vaswani is a computer scientist working in deep learning, [1] known for his significant contributions to artificial intelligence (AI) and natural language processing (NLP). He is one of the co-authors of the seminal paper "Attention Is All You Need", [2] which introduced the Transformer model, a novel architecture that uses a self-attention mechanism and has since become foundational to many state-of-the-art models in NLP. The Transformer architecture is at the core of the language models that power applications such as ChatGPT. [3] [4] [5] He co-founded Adept AI Labs [6] [7] and was previously a staff research scientist at Google Brain. [8] [9]

Career

Vaswani completed his engineering degree in computer science at BIT Mesra in 2002. In 2004, he moved to the US for graduate study at the University of Southern California, [10] where he earned his PhD. [11] He worked as a researcher at Google, [12] where he was part of the Google Brain team. He co-founded Adept AI Labs but later left the company. [13] [14]

Notable works

Vaswani's most notable work is the paper "Attention Is All You Need", published in 2017. [15] The paper introduced the Transformer model, which eschews the use of recurrence in sequence-to-sequence tasks and relies entirely on self-attention mechanisms. The model has been instrumental in the development of several subsequent state-of-the-art models in NLP, including BERT, [16] GPT-2, and GPT-3.
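
In the notation of the paper, the central operation is scaled dot-product attention, computed over matrices of queries Q, keys K, and values V:

  \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

where d_k is the dimensionality of the key vectors; dividing by \sqrt{d_k} keeps the dot products in a range where the softmax retains useful gradients.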

Related Research Articles

A multilayer perceptron (MLP) is a name for a modern feedforward artificial neural network consisting of fully connected neurons with nonlinear activation functions, organized in at least three layers, and notable for being able to distinguish data that is not linearly separable. The name is something of a misnomer, because the original perceptron used a Heaviside step function rather than the kind of nonlinear activation function used in modern networks.
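
As a minimal illustration (the notation here is conventional rather than taken from the description above), an MLP with a single hidden layer computes

  y = \sigma\!\left(W_2\,\sigma(W_1 x + b_1) + b_2\right)

where W_1 and W_2 are weight matrices, b_1 and b_2 are bias vectors, and \sigma is a nonlinear activation function applied element-wise.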

Anthropic PBC is an American artificial intelligence (AI) startup company, founded by former members of OpenAI. Anthropic has developed a family of large language models named Claude.

Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model.
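
Concretely, such a model assigns a probability to a target sentence y = (y_1, ..., y_T) given a source sentence x by factorizing it over target words (standard notation, not specific to any single system):

  P(y \mid x) = \prod_{t=1}^{T} P(y_t \mid y_1, \ldots, y_{t-1}, x)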

OpenAI: Artificial intelligence research organization

OpenAI is a U.S.-based artificial intelligence (AI) research organization founded in December 2015, researching artificial intelligence with the goal of developing "safe and beneficial" artificial general intelligence, which it defines as "highly autonomous systems that outperform humans at most economically valuable work". As one of the leading organizations of the AI spring, it has developed several large language models and advanced image generation models, and has previously released open-source models. Its release of ChatGPT has been credited with starting the AI spring.

This page is a timeline of machine learning. Major discoveries, achievements, milestones and other major events in machine learning are included.

Artificial neural networks (ANNs) are models created using machine learning to perform a number of tasks. Their creation was inspired by neural circuitry. While some of the computational implementations of ANNs relate to earlier discoveries in mathematics, the first implementation of ANNs was by psychologist Frank Rosenblatt, who developed the perceptron. Little research was conducted on ANNs in the 1970s and 1980s, with the AAAI calling that period an "AI winter".

Transformer (deep learning architecture): Machine learning algorithm used for natural-language processing

A transformer is a deep learning architecture developed by Google, based on the multi-head attention mechanism proposed in the 2017 paper "Attention Is All You Need". It has no recurrent units and thus requires less training time than earlier recurrent neural architectures such as long short-term memory (LSTM), and its later variants have been widely adopted for training large language models (LLMs) on large language datasets such as the Wikipedia corpus and Common Crawl. Text is converted into numerical representations called tokens, and each token is converted into a vector by looking it up in a word embedding table. At each layer, each token is then contextualized with the other (unmasked) tokens in the context window via a parallel multi-head attention mechanism, which allows the signal for key tokens to be amplified and less important tokens to be diminished. The paper builds on the softmax-based attention mechanism proposed by Bahdanau et al. in 2014 for machine translation, and on the fast weight controller, an architecture similar to the transformer proposed in 1992.
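
The attention step described above can be sketched as follows; this is illustrative NumPy code rather than code from the paper or any production implementation, and the function name and toy dimensions are assumptions.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Q, K, V: arrays of shape (sequence_length, d_k) holding the query,
        # key and value vectors, one row per token.
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)               # pairwise token similarities
        scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
        return weights @ V                            # weighted sum of value vectors

    # Toy example: 4 tokens with 8-dimensional embeddings used as Q, K and V.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)

A multi-head version runs several such operations in parallel on learned linear projections of the token vectors and concatenates the results.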

Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020. Like its predecessor, GPT-2, it is a decoder-only transformer neural network that replaces recurrence- and convolution-based architectures with a technique known as "attention", which allows the model to selectively focus on the segments of input text it predicts to be most relevant. It uses a 2,048-token context window, float16 (16-bit) precision, and a then-unprecedented 175 billion parameters, requiring 350 GB of storage since each parameter occupies 2 bytes, and has demonstrated strong "zero-shot" and "few-shot" learning abilities on many tasks.
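
The storage figure follows directly from the parameter count and the 16-bit precision:

  175 \times 10^{9} \ \text{parameters} \times 2 \ \text{bytes per parameter} = 3.5 \times 10^{11} \ \text{bytes} = 350 \ \text{GB}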

Machine learning-based attention is a mechanism which intuitively mimics cognitive attention. It calculates "soft" weights for each word, more precisely for its embedding, in the context window. These weights can be computed either in parallel or sequentially. "Soft" weights can change during each runtime, in contrast to "hard" weights, which are (pre-)trained and fine-tuned and remain frozen afterwards.

GPT-2: 2019 text-generating language model

Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages. It was partially released in February 2019, followed by the full release of the 1.5-billion-parameter model on November 5, 2019.

A vision transformer (ViT) is a transformer designed for computer vision. A ViT breaks down an input image into a series of patches, serialises each patch into a vector, and maps it to a smaller dimension with a single matrix multiplication. These vector embeddings are then processed by a transformer encoder as if they were token embeddings.
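
For example, with 224 × 224 RGB inputs and 16 × 16 patches (illustrative figures commonly used with ViTs, not stated above), an image yields

  N = (224 / 16)^2 = 196 \ \text{patches}, \quad \text{each flattened to } 16 \times 16 \times 3 = 768 \ \text{values}

before the matrix multiplication maps each flattened patch to the model's embedding dimension.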

GPT-1: 2018 text-generating language model

Generative Pre-trained Transformer 1 (GPT-1) was the first of OpenAI's large language models following Google's invention of the transformer architecture in 2017. In June 2018, OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training", in which they introduced that initial model along with the general concept of a generative pre-trained transformer.

LaMDA is a family of conversational large language models developed by Google. Originally developed and introduced as Meena in 2020, the first-generation LaMDA was announced during the 2021 Google I/O keynote, while the second generation was announced the following year. In June 2022, LaMDA gained widespread attention when Google engineer Blake Lemoine made claims that the chatbot had become sentient. The scientific community has largely rejected Lemoine's claims, though it has led to conversations about the efficacy of the Turing test, which measures whether a computer can pass for a human. In February 2023, Google announced Bard, a conversational artificial intelligence chatbot powered by LaMDA, to counter the rise of OpenAI's ChatGPT.

ChatGPT: Chatbot developed by OpenAI

ChatGPT is a chatbot developed by OpenAI and launched on November 30, 2022. Based on a large language model, it enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language. Successive prompts and replies, a practice known as prompt engineering, are taken into account at each stage of the conversation as context.

Generative pre-trained transformer: Type of large language model

Generative pre-trained transformers (GPT) are a type of large language model (LLM) and a prominent framework for generative artificial intelligence. They are artificial neural networks that are used in natural language processing tasks. GPTs are based on the transformer architecture, pre-trained on large data sets of unlabelled text, and able to generate novel human-like content. As of 2023, most LLMs have these characteristics and are sometimes referred to broadly as GPTs.

GPT-J: Open-source text-generating language model developed by EleutherAI

GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021. As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to the fact that it has 6 billion parameters.

A large language model (LLM) is a language model notable for its ability to achieve general-purpose language generation and other natural language processing tasks such as classification. LLMs acquire these abilities by learning statistical relationships from text documents during a computationally intensive self-supervised and semi-supervised training process. LLMs can be used for text generation, a form of generative AI, by taking an input text and repeatedly predicting the next token or word.
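
In this autoregressive view (standard notation, not specific to any single model), the probability of a token sequence w_1, ..., w_T is factorized as

  P(w_1, \ldots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \ldots, w_{t-1})

and text is generated by repeatedly sampling or selecting the next token from the model's conditional distribution.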

Aidan Gomez: Artificial intelligence researcher

Aidan Gomez is a British-Canadian computer scientist working in the field of artificial intelligence, with a focus on natural language processing. He is the co-founder and CEO of the technology company Cohere.

Attention Is All You Need: 2017 research paper by Google

"Attention Is All You Need" is a 2017 research paper by Google. Authored by eight scientists, it was responsible for expanding 2014 attention mechanisms proposed by Bahdanau et. al. into a new deep learning architecture known as the transformer. The paper is considered by some to be a founding document for modern artificial intelligence, as transformers became the main architecture of large language models. At the time, the focus of the research was on improving Seq2seq techniques for machine translation, but even in their paper the authors saw the potential for other tasks like question answering and for what is now called multimodal Generative AI.

References

  1. "Ashish Vaswani". scholar.google.com. Retrieved 2023-07-11.
  2. Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia (2017). "Attention is All you Need" (PDF). Advances in Neural Information Processing Systems. 30. Curran Associates, Inc.
  3. "Inside the brain of ChatGPT". stackbuilders.com. Retrieved 2023-07-12.
  4. "Understanding ChatGPT as explained by ChatGPT". Advancing Analytics. 2023-01-18. Retrieved 2023-07-12.
  5. Seetharaman, Deepa; Jin, Berber (2023-05-08). "ChatGPT Fever Has Investors Pouring Billions Into AI Startups, No Business Plan Required". Wall Street Journal. ISSN 0099-9660. Retrieved 2023-07-12.
  6. "Introducing Adept".
  7. "Top ex-Google AI researchers raise $8 million in funding from Thrive Capital". The Economic Times. May 4, 2023.
  8. Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia (May 21, 2017). "Attention is All You Need". arXiv: 1706.03762 [cs.CL].
  9. Shead, Sam (2022-06-10). "A.I. gurus are leaving Big Tech to work on buzzy new start-ups". CNBC. Retrieved 2023-07-12.
  10. Team, OfficeChai (February 4, 2023). "The Indian Researchers Whose Work Led To The Creation Of ChatGPT". OfficeChai.
  11. "Ashish Vaswani's webpage at ISI". www.isi.edu.
  12. "Transformer: A Novel Neural Network Architecture for Language Understanding". ai.googleblog.com. August 31, 2017.
  13. Rajesh, Ananya Mariam; Hu, Krystal (March 16, 2023). "AI startup Adept raises $350 mln in fresh funding". Reuters.
  14. Tong, Anna; Hu, Krystal (2023-05-04). "Top ex-Google AI researchers raise funding from Thrive Capital". Reuters. Retrieved 2023-07-11.
  15. "USC Alumni Paved Path for ChatGPT". USC Viterbi | School of Engineering.
  16. Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (May 24, 2019). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv: 1810.04805 [cs.CL].