Prompt engineering


Prompt engineering is the process of structuring an instruction that can be interpreted and understood by a generative artificial intelligence (AI) model. [1] [2]


A prompt is natural language text describing the task that an AI should perform. [3] A prompt for a text-to-text language model can be a query such as "what is Fermat's little theorem?", [4] a command such as "write a poem in the style of Edgar Allan Poe about leaves falling", [5] or a longer statement including context, instructions, [6] and conversation history. Prompt engineering may involve phrasing a query, specifying a style, [5] choice of words and grammar, [7] providing relevant context, [8] or assigning a role to the AI such as "act as a native French speaker". [9]

When communicating with a text-to-image or a text-to-audio model, a typical prompt is a description of a desired output such as "a high-quality photo of an astronaut riding a horse" [10] or "Lo-fi slow BPM electro chill with organic samples". [11] Prompting a text-to-image model may involve adding, removing, emphasizing, and re-ordering words to achieve a desired subject, style, [1] layout, lighting, [12] and aesthetic.

History

In 2018, researchers first proposed that all previously separate tasks in natural language processing (NLP) could be cast as a question-answering problem over a context. They also trained the first single, joint, multi-task model able to answer any task-related question, such as "What is the sentiment?", "Translate this sentence to German", or "Who is the president?" [13]

During the first year of the AI boom, a relatively large number of "prompting techniques" were proposed to get models to output the desired outcome. Much of this work was characterized by trial and error, [14] and some of it can also be characterized as a type of prompt injection. In 2021, researchers fine-tuned one generatively pretrained model (T0) to perform 12 NLP tasks (using 62 datasets, as each task can have multiple datasets). The model showed good performance on new tasks, surpassing models trained directly on a single task (without pretraining). To solve a task, T0 is given the task in a structured prompt; for example, the prompt used to make T0 solve entailment is "If {{premise}} is true, is it also true that {{hypothesis}}? ||| {{entailed}}". [15]

A repository for prompts reported that over 2,000 public prompts for around 170 datasets were available in February 2022. [16] In 2022, the chain-of-thought prompting technique was proposed by Google researchers. [17] [18] In 2023, several text-to-text and text-to-image prompt databases were made publicly available. [19] [20] The Personalized Image-Prompt (PIP) dataset, a generated image-text dataset categorized by 3,115 users, was also made publicly available in 2024. [21]

Text-to-text

According to a 2024 review, at least 29 distinct prompt engineering techniques have been published. [22]

Chain-of-thought

According to Google's CEO, Sundar Pichai, chain-of-thought (CoT) prompting is a technique that allows large language models (LLMs) to solve a problem as a series of intermediate steps [23] before giving a final answer. In 2022, the Brain team of Google also claimed that chain-of-thought prompting improves reasoning ability by inducing the model to answer a multi-step problem with steps of reasoning that mimic a train of thought. [23] [17] [24] Chain-of-thought techniques hypothetically allow large language models to overcome difficulties with some reasoning tasks that require logical thinking and multiple steps to solve, such as arithmetic or commonsense reasoning questions, according to announcements from Google and Amazon. [25] [26] [27]

For example, given the question, "Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?", Google claims that a CoT prompt might induce the LLM to answer "A: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9." [17] When applied to PaLM, a 540 billion parameter language model, Google claims that CoT prompting significantly aided the model, allowing it to perform comparably with task-specific fine-tuned models on several tasks, achieving state-of-the-art results at the time on the GSM8K mathematical reasoning benchmark. [17] According to Google, it is possible to fine-tune models on CoT reasoning datasets to enhance this capability further and stimulate better interpretability. [28] [29]

An example of a CoT prompt: [30]

   Q: {question}
   A: Let's think step by step.

As originally proposed by Google, [17] each CoT prompt included a few Q&A examples, which made it a few-shot prompting technique. However, according to researchers at Google and the University of Tokyo, simply appending the words "Let's think step-by-step" [30] has also proven effective, which makes CoT usable as a zero-shot prompting technique. OpenAI claims that this prompt allows for better scaling, as a user no longer needs to formulate many specific CoT Q&A examples. [31]
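As a minimal sketch of the difference between the two variants, the Python helpers below build a zero-shot and a few-shot CoT prompt as plain strings. The function names are illustrative assumptions rather than part of any library, and the worked example reuses the cafeteria question quoted earlier in this section.

```python
def zero_shot_cot(question: str) -> str:
    """Zero-shot CoT: no worked examples, only the trigger phrase."""
    return f"Q: {question}\nA: Let's think step by step."


def few_shot_cot(examples: list[tuple[str, str]], question: str) -> str:
    """Few-shot CoT: prepend Q&A pairs whose answers spell out their reasoning."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\n\nQ: {question}\nA:"


# One worked example whose answer shows intermediate steps.
cafeteria = (
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?",
    "The cafeteria had 23 apples originally. They used 20 to make lunch. "
    "So they had 23 - 20 = 3. They bought 6 more apples, so they have "
    "3 + 6 = 9. The answer is 9.",
)

print(few_shot_cot([cafeteria], "A shelf holds 12 books. 5 are removed. How many remain?"))
```

The few-shot variant costs more prompt tokens per query, while the zero-shot variant needs no hand-written examples.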

Chain-of-symbol prompting

A research collaboration between Westlake University, the Chinese University of Hong Kong, and the University of Edinburgh has claimed that chain-of-symbol prompting, used in conjunction with CoT prompting, helps LLMs with their difficulty in spatial reasoning over text. In other words, using arbitrary symbols such as ' / ' helps the LLM interpret spatial relationships in text. This is claimed to assist reasoning and increase the performance of the LLM. [32]

Example: [32]

Input:
   There are a set of bricks. The yellow brick C is on top of the brick E. The yellow brick D is on top of the brick A. The yellow brick E is on top of the brick D. The white brick A is on top of the brick B. For the brick B, the color is white. Now we have to get a specific brick. The bricks must now be grabbed from top to bottom, and if the lower brick is to be grabbed, the upper brick must be removed first. How to get brick D?

   B/A/D/E/C
   C/E
   E/D
   D

Output:
   So we get the result as C, E, D.

Few-shot learning

A prompt may include a few examples for a model to learn from, such as asking the model to complete "maison house, chat cat, chien" (the expected response being dog), [33] an approach called few-shot learning. [34]
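The translation example above can be assembled programmatically. The following is only a sketch; `few_shot_prompt` is a hypothetical helper name, and the comma-separated layout simply mirrors the example in the text.

```python
def few_shot_prompt(pairs: list[tuple[str, str]], query: str) -> str:
    """Join demonstration pairs and leave the final item for the model to complete."""
    shots = ", ".join(f"{source} {target}" for source, target in pairs)
    return f"{shots}, {query}"


prompt = few_shot_prompt([("maison", "house"), ("chat", "cat")], "chien")
print(prompt)
```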

Generated knowledge prompting

Generated knowledge prompting first prompts the model to generate relevant facts for completing the prompt, then proceeds to complete the prompt. [35] A hypothesis from a 2022 paper [35] stated that the performance of the results generally increases, but the performance may decline when more knowledge statements are introduced because some noisy knowledge is generated.

Example: [35]

   Generate some knowledge about the concepts in the input.
   Input: {question}
   Knowledge:
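The two stages can be sketched in Python as follows. Here `generate` is a hypothetical callable standing in for whatever LLM call is available, and concatenating all knowledge statements into a single answering prompt is a simplifying assumption of this sketch, not necessarily the procedure used in the cited paper.

```python
from typing import Callable

KNOWLEDGE_TEMPLATE = (
    "Generate some knowledge about the concepts in the input.\n"
    "Input: {question}\n"
    "Knowledge:"
)


def answer_with_generated_knowledge(
    question: str,
    generate: Callable[[str], str],  # hypothetical LLM call: prompt -> completion
    n_statements: int = 2,
) -> str:
    # Stage 1: ask the model for knowledge statements about the question.
    knowledge = [
        generate(KNOWLEDGE_TEMPLATE.format(question=question)).strip()
        for _ in range(n_statements)
    ]
    # Stage 2: answer the question with the generated knowledge as context.
    context = "\n".join(f"Knowledge: {statement}" for statement in knowledge)
    return generate(f"{context}\nQ: {question}\nA:")
```

Raising `n_statements` adds context but, as noted above, may also add noisy statements.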

Least-to-most prompting

Least-to-most prompting prompts a model to first list the sub-problems of the original problem and then solve each sub-problem in sequence, such that later sub-problems can be solved with the help of answers to previous sub-problems. The original problem is then manually appended as the final sub-problem to be solved. [36]

Example: [36]

   Input:
   Q: {question}
   A: Let's break down this problem:
       1.
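A possible driver loop for this technique is sketched below. `generate` is a hypothetical callable wrapping an LLM, and the sketch assumes the model continues the numbered list started by "1."; real outputs may need more forgiving parsing.

```python
import re
from typing import Callable

DECOMPOSE = "Q: {question}\nA: Let's break down this problem:\n1."


def parse_numbered(text: str) -> list[str]:
    """Extract the items of a '1. ... 2. ...' list, one item per line."""
    pattern = re.compile(r"^\s*\d+\.\s*(.+)$", flags=re.MULTILINE)
    return [match.group(1).strip() for match in pattern.finditer(text)]


def least_to_most(question: str, generate: Callable[[str], str]) -> str:
    # Step 1: ask for sub-problems; the prompt already ends with "1.".
    listing = "1." + generate(DECOMPOSE.format(question=question))
    # The original problem is appended as the final sub-problem.
    subproblems = parse_numbered(listing) + [question]
    # Step 2: solve in order, feeding earlier answers back in as context.
    context, answer = "", ""
    for sub in subproblems:
        answer = generate(f"{context}Q: {sub}\nA:").strip()
        context += f"Q: {sub}\nA: {answer}\n"
    return answer
```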

Self-consistency decoding

Self-consistency decoding [37] performs several chain-of-thought rollouts, then selects the most commonly reached conclusion out of all the rollouts. If the rollouts disagree substantially, a human can be queried for the correct chain of thought. [38]
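The voting step can be sketched as a majority vote over the final answers extracted from each rollout. The `min_agreement` threshold for deferring to a human is an assumption of this sketch, and sampling the rollouts themselves is left to the caller.

```python
from collections import Counter
from typing import Optional


def self_consistent_answer(
    answers: list[str], min_agreement: float = 0.5
) -> tuple[Optional[str], float]:
    """Return the majority answer and its vote share.

    The answer is None when agreement falls below the (assumed) threshold,
    signalling that a human should be consulted instead.
    """
    winner, votes = Counter(answers).most_common(1)[0]
    agreement = votes / len(answers)
    return (winner if agreement >= min_agreement else None), agreement


print(self_consistent_answer(["9", "9", "8", "9"]))
```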

Complexity-based prompting

Complexity-based prompting [39] performs several CoT rollouts, then selects the rollouts with the longest chains of thought, then selects the most commonly reached conclusion out of those.
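A sketch of the selection step follows. Measuring chain length as the number of lines in the rollout is an assumption made here for illustration; other step counts could be substituted.

```python
from collections import Counter


def complexity_based_answer(rollouts: list[tuple[str, str]], top_k: int) -> str:
    """rollouts holds (chain_of_thought, final_answer) pairs.

    Keep the top_k rollouts with the longest chains, then majority-vote.
    """
    ranked = sorted(rollouts, key=lambda r: len(r[0].splitlines()), reverse=True)
    kept = ranked[:top_k]
    return Counter(answer for _, answer in kept).most_common(1)[0][0]


rollouts = [
    ("step 1\nstep 2\nstep 3", "9"),
    ("step 1\nstep 2\nstep 3\nstep 4", "9"),
    ("step 1", "8"),
    ("step 1", "8"),
    ("step 1", "8"),
]
print(complexity_based_answer(rollouts, top_k=2))
```

In this toy data a plain majority vote over all five rollouts would pick "8", whereas voting only among the two longest chains picks "9".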

Self-refine

Self-refine [40] prompts the LLM to solve the problem, then prompts the LLM to critique its solution, then prompts the LLM to solve the problem again in view of the problem, solution, and critique. This process is repeated until it is stopped by running out of tokens, by running out of time, or by the LLM outputting a "stop" token.

Example critique: [40]

   I have some code. Give one suggestion to improve readability. Don't fix the code, just give a suggestion.
   Code: {code}
   Suggestion:

Example refinement:

   Code: {code}
   Let's use this suggestion to improve the code.
   Suggestion: {suggestion}
   New Code:
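The critique and refinement prompts can be chained in a loop, sketched below. `generate` is a hypothetical LLM callable, and both the round limit and the literal "STOP" marker are assumptions standing in for the stopping conditions described above.

```python
from typing import Callable

CRITIQUE = (
    "I have some code. Give one suggestion to improve readability. "
    "Don't fix the code, just give a suggestion.\n"
    "Code: {code}\n"
    "Suggestion:"
)
REFINE = (
    "Code: {code}\n"
    "Let's use this suggestion to improve the code.\n"
    "Suggestion: {suggestion}\n"
    "New Code:"
)


def self_refine(
    code: str,
    generate: Callable[[str], str],  # hypothetical LLM call
    max_rounds: int = 3,             # assumed budget in place of a token/time limit
    stop_marker: str = "STOP",       # assumed stand-in for a "stop" token
) -> str:
    for _ in range(max_rounds):
        suggestion = generate(CRITIQUE.format(code=code)).strip()
        if suggestion == stop_marker:
            break
        code = generate(REFINE.format(code=code, suggestion=suggestion)).strip()
    return code
```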

Tree-of-thought

Tree-of-thought prompting [41] generalizes chain-of-thought by prompting the model to generate one or more "possible next steps", and then running the model on each of the possible next steps by breadth-first, beam, or some other method of tree search. [42] Additional modules can convey the history of the problem-solving process to the LLM, which allows the system to backtrack steps in the problem-solving process.
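The search itself can be sketched as a small beam search over partial step sequences. In practice `propose` and `score` would each be backed by an LLM prompt; here they are hypothetical callables, and the toy task (reach a sum of 10 using steps of 1, 2, or 3) only exercises the control flow.

```python
from typing import Callable, Optional

State = list[int]  # a partial solution: the sequence of steps taken so far


def tree_of_thought(
    propose: Callable[[State], list[int]],   # candidate next steps for a state
    score: Callable[[State], float],         # higher means more promising
    is_solution: Callable[[State], bool],
    beam_width: int = 2,
    max_depth: int = 4,
) -> Optional[State]:
    frontier: list[State] = [[]]
    for _ in range(max_depth):
        # Expand every state on the frontier by each proposed next step.
        candidates = [state + [step] for state in frontier for step in propose(state)]
        if not candidates:
            return None
        solved = [c for c in candidates if is_solution(c)]
        if solved:
            return max(solved, key=score)
        # Keep only the most promising states; the rest are abandoned.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return None


result = tree_of_thought(
    propose=lambda state: [1, 2, 3],
    score=lambda state: -abs(10 - sum(state)),
    is_solution=lambda state: sum(state) == 10,
)
print(result)
```

Keeping several states on the frontier is what lets the search drop a branch that stops looking promising and continue from an alternative.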

Maieutic prompting

Maieutic prompting is similar to tree-of-thought. The model is prompted to answer a question with an explanation. The model is then prompted to explain parts of the explanation, and so on. Inconsistent explanation trees are pruned or discarded. This improves performance on complex commonsense reasoning. [43]

Example: [43]

   Q: {question}    A: True, because
   Q: {question}    A: False, because

Directional-stimulus prompting

Directional-stimulus prompting [44] includes a hint or cue, such as desired keywords, to guide a language model toward the desired output.

Example: [44]

   Article: {article}    Keywords:
   Article: {article}    Q: Write a short summary of the article in 2-4 sentences that accurately incorporates the provided keywords.    Keywords: {keywords}    A:

Prompting to disclose uncertainty

By default, the output of language models may not contain estimates of uncertainty. The model may output text that appears confident, though the underlying token predictions have low likelihood scores. Large language models like GPT-4 can have accurately calibrated likelihood scores in their token predictions, [45] and so the model output uncertainty can be directly estimated by reading out the token prediction likelihood scores. But if one cannot access such scores (such as when one is accessing the model through a restrictive API), uncertainty can still be estimated. One simple method is to prompt the model to use words to estimate uncertainty. [46]
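Both routes can be sketched briefly. The first function assumes per-token log-probabilities are available from the model API; the second only rewrites the prompt, and its wording is an illustrative assumption rather than a recommended formula.

```python
import math


def sequence_confidence(token_logprobs: list[float]) -> dict[str, float]:
    """Summarize token log-probabilities as two simple confidence figures."""
    total = sum(token_logprobs)
    return {
        # Probability of the whole output under the model.
        "sequence_prob": math.exp(total),
        # Geometric mean of the per-token probabilities.
        "mean_token_prob": math.exp(total / len(token_logprobs)),
    }


def verbal_uncertainty_prompt(question: str) -> str:
    """Fallback when scores are hidden: ask the model to state its confidence."""
    return f"{question}\nAfter your answer, state how confident you are as a percentage."


print(sequence_confidence([math.log(0.5), math.log(0.5)]))
```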

Prompting to estimate model sensitivity

Research consistently demonstrates that LLMs are highly sensitive to subtle variations in prompt formatting, structure, and linguistic properties. Some studies have shown differences of up to 76 accuracy points across formatting changes in few-shot settings. [47] Linguistic features such as morphology, syntax, and lexico-semantic changes significantly influence prompt effectiveness and can meaningfully enhance task performance across a variety of tasks. [7] [48] Clausal syntax, for example, improves consistency and reduces uncertainty in knowledge retrieval. [49] This sensitivity persists even with larger model sizes, additional few-shot examples, or instruction tuning.

To address sensitivity of models and make them more robust, several methods have been proposed. FormatSpread facilitates systematic analysis by evaluating a range of plausible prompt formats, offering a more comprehensive performance interval. [47] Similarly, PromptEval estimates performance distributions across diverse prompts, enabling robust metrics such as performance quantiles and accurate evaluations under constrained budgets. [50]
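The idea of reporting a performance interval rather than a single number can be illustrated with a brute-force sketch. This is not the FormatSpread or PromptEval algorithm; it simply evaluates every listed format, and `predict` is a hypothetical callable wrapping the model.

```python
from typing import Callable


def format_spread(
    formats: dict[str, str],            # format name -> template containing {q}
    examples: list[tuple[str, str]],    # (question, gold answer) pairs
    predict: Callable[[str], str],      # hypothetical model call
) -> tuple[float, float, dict[str, float]]:
    """Return (lowest accuracy, highest accuracy, per-format accuracies)."""
    accuracies = {}
    for name, template in formats.items():
        correct = sum(predict(template.format(q=q)) == gold for q, gold in examples)
        accuracies[name] = correct / len(examples)
    return min(accuracies.values()), max(accuracies.values()), accuracies
```

A wide gap between the lowest and highest accuracy suggests that a result reported for a single format may not be representative.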

Automatic prompt generation

Retrieval-augmented generation

Two-phase process of document retrieval using dense embeddings and LLM for answer formulation

Retrieval-augmented generation (RAG) is a two-phase process involving document retrieval and answer generation by a large language model. The initial phase uses dense embeddings to retrieve documents. This retrieval can be based on a variety of database formats depending on the use case, such as a vector database, summary index, tree index, or keyword table index. [51] In response to a query, a document retriever selects the most relevant documents. This relevance is typically determined by first encoding both the query and the documents into vectors, then identifying documents whose vectors are closest in Euclidean distance to the query vector. Following document retrieval, the LLM generates an output that incorporates information from both the query and the retrieved documents. [52] RAG can also be used as a few-shot learner.
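The two phases can be sketched with precomputed vectors. The sketch assumes an embedding model has already mapped the query and documents to vectors (the toy two-dimensional vectors below are made up), and the prompt layout in `build_rag_prompt` is an illustrative assumption.

```python
import math

Vector = tuple[float, ...]


def retrieve(query_vec: Vector, index: list[tuple[str, Vector]], k: int = 2) -> list[str]:
    """Phase 1: return the k documents closest to the query in Euclidean distance."""
    ranked = sorted(index, key=lambda item: math.dist(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


def build_rag_prompt(question: str, documents: list[str]) -> str:
    """Phase 2 input: the query together with the retrieved documents."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


index = [
    ("Doc A", (0.0, 0.0)),
    ("Doc B", (1.0, 1.0)),
    ("Doc C", (5.0, 5.0)),
]
top = retrieve((0.9, 0.9), index, k=2)
print(build_rag_prompt("What does Doc B say?", top))
```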

Graph retrieval-augmented generation

GraphRAG with a knowledge graph combining access patterns for unstructured, structured, and mixed data

GraphRAG [53] (coined by Microsoft Research) is a technique that extends RAG with the use of a knowledge graph (usually, LLM-generated) to allow the model to connect disparate pieces of information, synthesize insights, and holistically understand summarized semantic concepts over large data collections. It was shown to be effective on datasets like the Violent Incident Information from News Articles (VIINA). [54]

Earlier work showed the effectiveness of using a knowledge graph for question answering using text-to-query generation. [55] These techniques can be combined to search across both unstructured and structured data, providing expanded context, and improved ranking.

Using language models to generate prompts

Large language models (LLMs) themselves can be used to compose prompts for large language models. [56] [57] The automatic prompt engineer algorithm uses one LLM to beam search over prompts for another LLM: [58] [59]

  • There are two LLMs. One is the target LLM, and another is the prompting LLM.
  • The prompting LLM is presented with example input-output pairs, and asked to generate instructions that could have caused a model following the instructions to generate the outputs, given the inputs.
  • Each of the generated instructions is used to prompt the target LLM, followed by each of the inputs. The log-probabilities of the outputs are computed and added. This is the score of the instruction.
  • The highest-scored instructions are given to the prompting LLM for further variations.
  • Repeat until some stopping criterion is reached, then output the highest-scored instructions.
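The steps above can be sketched as a short search loop. The three callables are hypothetical stand-ins: `propose` and `vary` would query the prompting LLM, and `logprob` would return the target LLM's log-probability of an output given an instruction and input. The fixed number of rounds is an assumed stopping criterion.

```python
from typing import Callable

Pair = tuple[str, str]  # (input, expected output)


def automatic_prompt_engineer(
    pairs: list[Pair],
    propose: Callable[[list[Pair]], list[str]],   # prompting LLM: pairs -> instructions
    vary: Callable[[str], list[str]],             # prompting LLM: instruction -> variations
    logprob: Callable[[str, str, str], float],    # target LLM: log P(output | instruction, input)
    rounds: int = 2,
    keep: int = 2,
) -> str:
    def score(instruction: str) -> float:
        # Sum of log-probabilities of the expected outputs over all inputs.
        return sum(logprob(instruction, x, y) for x, y in pairs)

    candidates = propose(pairs)
    for _ in range(rounds):
        unique = list(dict.fromkeys(candidates))  # de-duplicate, keep order
        best = sorted(unique, key=score, reverse=True)[:keep]
        # The highest-scored instructions seed the next round of variations.
        candidates = best + [v for inst in best for v in vary(inst)]
    return max(candidates, key=score)
```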

CoT examples can be generated by LLMs themselves. In "auto-CoT", [60] a library of questions is converted to vectors by a model such as BERT. The question vectors are clustered. Questions nearest to the centroids of each cluster are selected. An LLM does zero-shot CoT on each question. The resulting CoT examples are added to the dataset. When prompted with a new question, CoT examples for the nearest questions can be retrieved and added to the prompt.
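The selection of representative questions can be sketched as follows. The sketch assumes the question vectors and cluster labels have already been computed elsewhere (for example by an encoder and a clustering routine); only the "nearest to the centroid" step is shown.

```python
import math

Vector = tuple[float, ...]


def representatives(vectors: list[Vector], labels: list[int]) -> list[int]:
    """Return, for each cluster, the index of the question nearest its centroid."""
    clusters: dict[int, list[int]] = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    chosen = []
    dim = len(vectors[0])
    for members in clusters.values():
        centroid = [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]
        chosen.append(min(members, key=lambda i: math.dist(vectors[i], centroid)))
    return chosen


vectors = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 12.0)]
labels = [0, 0, 0, 1, 1]
print(representatives(vectors, labels))
```

Each selected question would then be answered with zero-shot CoT to produce a demonstration for its cluster.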

In-context learning

Prompt engineering can possibly be further enabled by in-context learning, defined as a model's ability to temporarily learn from prompts. The ability for in-context learning is an emergent ability [61] of large language models. In-context learning itself is an emergent property of model scale, meaning breaks [62] in downstream scaling laws occur such that its efficacy increases at a different rate in larger models than in smaller models. [63] [17] In contrast to training and fine-tuning for each specific task, which are not temporary, what has been learnt during in-context learning is of a temporary nature. It does not carry the temporary contexts or biases, except the ones already present in the (pre)training dataset, from one conversation to the other. [64] This result of "mesa-optimization" [65] [66] within transformer layers is a form of meta-learning or "learning to learn". [67]

Text-to-image

Example of prompt engineering for text-to-image generation, with Fooocus

In 2022, text-to-image models like DALL-E 2, Stable Diffusion, and Midjourney were released to the public. [68] These models take text prompts as input and use them to generate images. Text-to-image models typically do not understand grammar and sentence structure in the same way as large language models, [69] and thus may require a different set of prompting techniques.

Text-to-image models do not natively understand negation. The prompt "a party with no cake" is likely to produce an image including a cake. [69] As an alternative, negative prompts allow a user to indicate, in a separate prompt, which terms should not appear in the resulting image. [70] Techniques such as framing the normal prompt into a sequence-to-sequence language modeling problem can be used to automatically generate an output for the negative prompt. [71]

Demonstration of the effect of negative prompts on images generated with Stable Diffusion
  • Top: no negative prompt
  • Centre: "green trees"
  • Bottom: "round stones, round rocks"

Prompt formats

A text-to-image prompt commonly includes a description of the subject of the art, the desired medium (such as digital painting or photography), style (such as hyperrealistic or pop-art), lighting (such as rim lighting or crepuscular rays), color, and texture. [72] Word order also affects the output of a text-to-image prompt. Words closer to the start of a prompt may be emphasized more heavily. [1]

The Midjourney documentation encourages short, descriptive prompts: instead of "Show me a picture of lots of blooming California poppies, make them bright, vibrant orange, and draw them in an illustrated style with colored pencils", an effective prompt might be "Bright orange California poppies drawn with colored pencils". [69]

Artist styles

Some text-to-image models are capable of imitating the style of particular artists by name. For example, the phrase "in the style of Greg Rutkowski" has been used in Stable Diffusion and Midjourney prompts to generate images in the distinctive style of Polish digital artist Greg Rutkowski. [73] Famous artists such as Vincent van Gogh and Salvador Dalí have also been used for styling and testing. [74]

Non-text prompts

Some approaches augment or replace natural language text prompts with non-text input.

Textual inversion and embeddings

For text-to-image models, textual inversion [75] performs an optimization process to create a new word embedding based on a set of example images. This embedding vector acts as a "pseudo-word" which can be included in a prompt to express the content or style of the examples.

Image prompting

In 2023, Meta's AI research released Segment Anything, a computer vision model that can perform image segmentation by prompting. As an alternative to text prompts, Segment Anything can accept bounding boxes, segmentation masks, and foreground/background points. [76]

Using gradient descent to search for prompts

In "prefix-tuning", [77] "prompt tuning", or "soft prompting", [78] floating-point-valued vectors are searched directly by gradient descent to maximize the log-likelihood on outputs.

Formally, let $\mathbf{E} = \{\mathbf{e}_1, \dots, \mathbf{e}_k\}$ be a set of soft prompt tokens (tunable embeddings), and let $\mathbf{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_m\}$ and $\mathbf{Y} = \{\mathbf{y}_1, \dots, \mathbf{y}_n\}$ be the token embeddings of the input and output respectively. During training, the tunable embeddings, input, and output tokens are concatenated into a single sequence $\text{concat}(\mathbf{E}; \mathbf{X}; \mathbf{Y})$ and fed to the LLM. The losses are computed over the $\mathbf{Y}$ tokens; the gradients are backpropagated to prompt-specific parameters: in prefix-tuning, they are parameters associated with the prompt tokens at each layer; in prompt tuning, they are merely the soft tokens added to the vocabulary. [79]

More formally, prompt tuning works as follows. Let an LLM be written as $LLM(X) = F(E(X))$, where $X$ is a sequence of linguistic tokens, $E$ is the token-to-vector function (not the set of soft prompt tokens $\mathbf{E}$ above), and $F$ is the rest of the model. In prompt tuning, one provides a set of input-output pairs $\{(X^i, Y^i)\}_i$ and then uses gradient descent to search for $\arg\max_{\tilde{Z}} \sum_i \log \Pr[Y^i \mid \tilde{Z} \ast E(X^i)]$. In words, $\log \Pr[Y^i \mid \tilde{Z} \ast E(X^i)]$ is the log-likelihood of outputting $Y^i$ if the model first encodes the input $X^i$ into the vector $E(X^i)$, then prepends the "prefix vector" $\tilde{Z}$ to that vector, then applies $F$.

Prefix tuning is similar, but the "prefix vector" $\tilde{Z}$ is prepended to the hidden states in every layer of the model.

An earlier result [80] uses the same idea of gradient descent search, but is designed for masked language models like BERT, and searches only over token sequences, rather than numerical vectors. Formally, it searches for $\arg\max_{\tilde{X}} \sum_i \log \Pr[Y^i \mid \tilde{X} \ast X^i]$, where $\tilde{X}$ ranges over token sequences of a specified length.

Prompt injection

Prompt injection is a family of related computer security exploits carried out by getting a machine learning model (such as an LLM) which was trained to follow human-given instructions to follow instructions provided by a malicious user. This stands in contrast to the intended operation of instruction-following systems, wherein the ML model is intended only to follow trusted instructions (prompts) provided by the ML model's operator. [81] [82] [83]


References

  1. Diab, Mohamad; Herrera, Julian; Chernow, Bob (October 28, 2022). "Stable Diffusion Prompt Book" (PDF). Retrieved August 7, 2023. Prompt engineering is the process of structuring words that can be interpreted and understood by a text-to-image model. Think of it as the language you need to speak in order to tell an AI model what to draw.
  2. Ziegler, Albert; Berryman, John (July 17, 2023). "A developer's guide to prompt engineering and LLMs". The GitHub Blog. Prompt engineering is the art of communicating with a generative AI model.
  3. Radford, Alec; Wu, Jeffrey; Child, Rewon; Luan, David; Amodei, Dario; Sutskever, Ilya (2019). "Language Models are Unsupervised Multitask Learners" (PDF). OpenAI. We demonstrate language models can perform down-stream tasks in a zero-shot setting – without any parameter or architecture modification
  4. "Introducing ChatGPT". OpenAI Blog. November 30, 2022. Retrieved August 16, 2023. what is the fermat's little theorem
  5. Robinson, Reid (August 3, 2023). "How to write an effective GPT-3 or GPT-4 prompt". Zapier. Retrieved August 14, 2023. "Basic prompt: 'Write a poem about leaves falling.' Better prompt: 'Write a poem in the style of Edgar Allan Poe about leaves falling.'"
  6. Gouws-Stewart, Natasha (June 16, 2023). "The ultimate guide to prompt engineering your GPT-3.5-Turbo model". masterofcode.com.
  7. Wahle, Jan Philip; Ruas, Terry; Xu, Yang; Gipp, Bela (2024). "Paraphrase Types Elicit Prompt Engineering Capabilities". In Al-Onaizan, Yaser; Bansal, Mohit; Chen, Yun-Nung (eds.). Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Miami, Florida, USA: Association for Computational Linguistics. pp. 11004–11033. arXiv: 2406.19898. doi: 10.18653/v1/2024.emnlp-main.617.
  8. Greenberg, J., Laura (May 31, 2023). "How to Prime and Prompt ChatGPT for More Reliable Contract Drafting Support". contractnerds.com. Retrieved July 24, 2023.
  9. "GPT Best Practices". OpenAI. Retrieved August 16, 2023.
  10. Heaven, Will Douglas (April 6, 2022). "This horse-riding astronaut is a milestone on AI's long road towards understanding". MIT Technology Review. Retrieved August 14, 2023.
  11. Wiggers, Kyle (June 12, 2023). "Meta open sources an AI-powered music generator". TechCrunch. Retrieved August 15, 2023. Next, I gave a more complicated prompt to attempt to throw MusicGen for a loop: "Lo-fi slow BPM electro chill with organic samples."
  12. "How to Write AI Photoshoot Prompts: A Guide for Better Product Photos". claid.ai. June 12, 2023. Retrieved June 12, 2023.
  13. McCann, Bryan; Shirish, Nitish; Xiong, Caiming; Socher, Richard (2018). "The Natural Language Decathlon: Multitask Learning as Question Answering". arXiv: 1806.08730 [cs.CL].
  14. Knoth, Nils; Tolzin, Antonia; Janson, Andreas; Leimeister, Jan Marco (June 1, 2024). "AI literacy and its implications for prompt engineering strategies". Computers and Education: Artificial Intelligence. 6: 100225. doi:10.1016/j.caeai.2024.100225. ISSN 2666-920X.
  15. Sanh, Victor; et al. (2021). "Multitask Prompted Training Enables Zero-Shot Task Generalization". arXiv: 2110.08207 [cs.LG].
  16. Bach, Stephen H.; Sanh, Victor; Yong, Zheng-Xin; Webson, Albert; Raffel, Colin; Nayak, Nihal V.; Sharma, Abheesht; Kim, Taewoon; M Saiful Bari; Fevry, Thibault; Alyafeai, Zaid; Dey, Manan; Santilli, Andrea; Sun, Zhiqing; Ben-David, Srulik; Xu, Canwen; Chhablani, Gunjan; Wang, Han; Jason Alan Fries; Al-shaibani, Maged S.; Sharma, Shanya; Thakker, Urmish; Almubarak, Khalid; Tang, Xiangru; Radev, Dragomir; Mike Tian-Jian Jiang; Rush, Alexander M. (2022). "PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts". arXiv: 2202.01279 [cs.LG].
  17. Wei, Jason; Wang, Xuezhi; Schuurmans, Dale; Bosma, Maarten; Ichter, Brian; Xia, Fei; Chi, Ed H.; Le, Quoc V.; Zhou, Denny (October 31, 2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". Advances in Neural Information Processing Systems (NeurIPS 2022). Vol. 35. arXiv: 2201.11903.
  18. Wei, Jason; Zhou, Denny (May 11, 2022). "Language Models Perform Reasoning via Chain of Thought". ai.googleblog.com. Retrieved March 10, 2023.
  19. Chen, Brian X. (June 23, 2023). "How to Turn Your Chatbot Into a Life Coach". The New York Times.
  20. Chen, Brian X. (May 25, 2023). "Get the Best From ChatGPT With These Golden Prompts". The New York Times. ISSN 0362-4331. Retrieved August 16, 2023.
  21. Chen, Zijie; Zhang, Lichao; Weng, Fangsheng; Pan, Lili; Lan, Zhenzhong (June 16, 2024). "Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting". 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE: 7727–7736. arXiv: 2310.08129. doi:10.1109/cvpr52733.2024.00738.
  22. Sahoo, Pranab; Singh, Ayush Kumar; Saha, Sriparna; Jain, Vinija; Mondal, Samrat; Chadha, Aman (February 5, 2024). "A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications". arXiv: 2402.07927 [cs.AI].
  23. McAuliffe, Zachary (March 11, 2022). "Google's Latest AI Model Can Be Taught How to Solve Problems". CNET.
  24. Narang, Sharan; Chowdhery, Aakanksha (April 4, 2022). "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance".
  25. Dang, Ekta (February 8, 2023). "Harnessing the power of GPT-3 in scientific research". VentureBeat. Retrieved March 10, 2023.
  26. Montti, Roger (May 13, 2022). "Google's Chain of Thought Prompting Can Boost Today's Best Algorithms". Search Engine Journal. Retrieved March 10, 2023.
  27. Ray, Tiernan. "Amazon's Alexa scientists demonstrate bigger AI isn't always better". ZDNET. Retrieved March 10, 2023.
  28. Chung, Hyung Won; Hou, Le; Longpre, Shayne; Zoph, Barret; Tay, Yi; Fedus, William; Li, Yunxuan; Wang, Xuezhi; Dehghani, Mostafa; Brahma, Siddhartha; Webson, Albert; Gu, Shixiang Shane; Dai, Zhuyun; Suzgun, Mirac; Chen, Xinyun; Chowdhery, Aakanksha; Castro-Ros, Alex; Pellat, Marie; Robinson, Kevin; Valter, Dasha; Narang, Sharan; Mishra, Gaurav; Yu, Adams; Zhao, Vincent; Huang, Yanping; Dai, Andrew; Yu, Hongkun; Petrov, Slav; Chi, Ed H.; Dean, Jeff; Devlin, Jacob; Roberts, Adam; Zhou, Denny; Le, Quoc V.; Wei, Jason (2022). "Scaling Instruction-Finetuned Language Models". arXiv: 2210.11416 [cs.LG].
  29. Wei, Jason; Tay, Yi (November 29, 2022). "Better Language Models Without Massive Compute". ai.googleblog.com. Retrieved March 10, 2023.
  30. Kojima, Takeshi; Gu, Shixiang Shane; Reid, Machel; Matsuo, Yutaka; Iwasawa, Yusuke (2022). "Large Language Models are Zero-Shot Reasoners". arXiv: 2205.11916 [cs.CL].
  31. Dickson, Ben (August 30, 2022). "LLMs have not learned our language — we're trying to learn theirs". VentureBeat. Retrieved March 10, 2023.
  32. Hu, Hanxu; Lu, Hongyuan; Zhang, Huajian; Song, Yun-Ze; Lam, Wai; Zhang, Yue (October 3, 2023). "Chain-of-Symbol Prompting Elicits Planning in Large Language Models". arXiv: 2305.10276 [cs.CL].
  33. Garg, Shivam; Tsipras, Dimitris; Liang, Percy; Valiant, Gregory (2022). "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes". arXiv: 2208.01066 [cs.CL].
  34. Brown, Tom; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared D.; Dhariwal, Prafulla; Neelakantan, Arvind (2020). "Language models are few-shot learners". Advances in Neural Information Processing Systems. 33: 1877–1901. arXiv: 2005.14165.
  35. Liu, Jiacheng; Liu, Alisa; Lu, Ximing; Welleck, Sean; West, Peter; Le Bras, Ronan; Choi, Yejin; Hajishirzi, Hannaneh (May 2022). "Generated Knowledge Prompting for Commonsense Reasoning". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Dublin, Ireland: Association for Computational Linguistics: 3154–3169. arXiv: 2110.08387. doi:10.18653/v1/2022.acl-long.225. S2CID 239016123.
  36. Zhou, Denny; Schärli, Nathanael; Hou, Le; Wei, Jason; Scales, Nathan; Wang, Xuezhi; Schuurmans, Dale; Cui, Claire; Bousquet, Olivier; Le, Quoc; Chi, Ed (May 1, 2022). "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models". arXiv: 2205.10625 [cs.AI]. ...least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence.
  37. Wang, Xuezhi; Wei, Jason; Schuurmans, Dale; Le, Quoc; Chi, Ed; Narang, Sharan; Chowdhery, Aakanksha; Zhou, Denny (March 1, 2022). "Self-Consistency Improves Chain of Thought Reasoning in Language Models". arXiv: 2203.11171 [cs.CL].
  38. Diao, Shizhe; Wang, Pengcheng; Lin, Yong; Zhang, Tong (February 1, 2023). "Active Prompting with Chain-of-Thought for Large Language Models". arXiv: 2302.12246 [cs.CL].
  39. Fu, Yao; Peng, Hao; Sabharwal, Ashish; Clark, Peter; Khot, Tushar (October 1, 2022). "Complexity-Based Prompting for Multi-Step Reasoning". arXiv: 2210.00720 [cs.CL].
  40. Madaan, Aman; Tandon, Niket; Gupta, Prakhar; Hallinan, Skyler; Gao, Luyu; Wiegreffe, Sarah; Alon, Uri; Dziri, Nouha; Prabhumoye, Shrimai; Yang, Yiming; Gupta, Shashank; Prasad Majumder, Bodhisattwa; Hermann, Katherine; Welleck, Sean; Yazdanbakhsh, Amir (March 1, 2023). "Self-Refine: Iterative Refinement with Self-Feedback". arXiv: 2303.17651 [cs.CL].
  41. Long, Jieyi (May 15, 2023). "Large Language Model Guided Tree-of-Thought". arXiv: 2305.08291 [cs.AI].
  42. Yao, Shunyu; Yu, Dian; Zhao, Jeffrey; Shafran, Izhak; Griffiths, Thomas L.; Cao, Yuan; Narasimhan, Karthik (May 17, 2023). "Tree of Thoughts: Deliberate Problem Solving with Large Language Models". arXiv: 2305.10601 [cs.CL].
  43. Jung, Jaehun; Qin, Lianhui; Welleck, Sean; Brahman, Faeze; Bhagavatula, Chandra; Le Bras, Ronan; Choi, Yejin (2022). "Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations". arXiv: 2205.11822 [cs.CL].
  44. Li, Zekun; Peng, Baolin; He, Pengcheng; Galley, Michel; Gao, Jianfeng; Yan, Xifeng (2023). "Guiding Large Language Models via Directional Stimulus Prompting". arXiv: 2302.11520 [cs.CL]. The directional stimulus serves as hints or cues for each input query to guide LLMs toward the desired output, such as keywords that the desired summary should include for summarization.
  45. OpenAI and over 200 people (March 27, 2023). "GPT-4 Technical Report". arXiv: 2303.08774 [cs.CL]. See Figure 8.
  46. Eliot, Lance (August 18, 2023). "Latest Prompt Engineering Technique Aims To Get Certainty And Uncertainty Of Generative AI Directly On The Table And Out In The Open". Forbes. Retrieved August 31, 2024. If you explicitly indicate in your prompt that you want the generative AI to emit a certainty or uncertainty qualification then you will almost certainly get such an indication.
  47. Sclar, Melanie; Choi, Yejin; Tsvetkov, Yulia; Suhr, Alane (July 1, 2024). "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting". arXiv: 2310.11324 [cs.CL].
  48. Leidinger, Alina; van Rooij, Robert; Shutova, Ekaterina (2023). Bouamor, Houda; Pino, Juan; Bali, Kalika (eds.). "The language of prompting: What linguistic properties make a prompt successful?". Findings of the Association for Computational Linguistics: EMNLP 2023. Singapore: Association for Computational Linguistics: 9210–9232. arXiv: 2311.01967. doi:10.18653/v1/2023.findings-emnlp.618.
  49. Linzbach, Stephan; Dimitrov, Dimitar; Kallmeyer, Laura; Evang, Kilian; Jabeen, Hajira; Dietze, Stefan (June 2024). "Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models". In Duh, Kevin; Gomez, Helena; Bethard, Steven (eds.). Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). Mexico City, Mexico: Association for Computational Linguistics. pp. 3645–3655. arXiv: 2404.01992. doi:10.18653/v1/2024.naacl-long.201.
  50. Polo, Felipe Maia; Xu, Ronald; Weber, Lucas; Silva, Mírian; Bhardwaj, Onkar; Choshen, Leshem; de Oliveira, Allysson Flavio Melo; Sun, Yuekai; Yurochkin, Mikhail (October 30, 2024). "Efficient multi-prompt evaluation of LLMs". arXiv: 2405.17202 [cs.CL].
  51. "How Each Index Works - LlamaIndex 🦙 v0.10.17". docs.llamaindex.ai. Retrieved April 8, 2024.
  52. Lewis, Patrick; Perez, Ethan; Piktus, Aleksandra; Petroni, Fabio; Karpukhin, Vladimir; Goyal, Naman; Küttler, Heinrich; Lewis, Mike; Yih, Wen-tau; Rocktäschel, Tim; Riedel, Sebastian; Kiela, Douwe (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks". Advances in Neural Information Processing Systems. 33. Curran Associates, Inc.: 9459–9474. arXiv: 2005.11401.
  53. Larson, Jonathan; Truitt, Steven (February 13, 2024), GraphRAG: Unlocking LLM discovery on narrative private data, Microsoft
  54. Edge, Darren; Trinh, Ha; Cheng, Newman; Bradley, Joshua; Chao, Alex; Mody, Apurva; Truitt, Steven; Larson, Jonathan (2024). "From Local to Global: A Graph RAG Approach to Query-Focused Summarization". arXiv: 2404.16130 [cs.CL].
  55. Sequeda, Juan; Allemang, Dean; Jacob, Bryon (2023). "A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model's Accuracy for Question Answering on Enterprise SQL Databases". arXiv: 2311.07509 [cs.AI].
  56. Singh, Chandan; Morris, John; Aneja, Jyoti; Rush, Alexander; Gao, Jianfeng (October 4, 2022). "Explaining Patterns in Data with Language Models via Interpretable Autoprompting". arXiv: 2210.01848 [cs.LG].
  57. Fernando, Chrisantha; Banarse, Dylan Sunil; Michalewski, Henryk; Osindero, Simon; Rocktäschel, Tim (February 11, 2024), Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution, OpenReview, arXiv: 2309.16797. Paper submitted and rejected for The Twelfth International Conference on Learning Representations.
  58. Zhou, Yongchao; Ioan Muresanu, Andrei; Han, Ziwen; Paster, Keiran; Pitis, Silviu; Chan, Harris; Ba, Jimmy (November 1, 2022). "Large Language Models Are Human-Level Prompt Engineers". arXiv: 2211.01910 [cs.LG].
  59. Pryzant, Reid; Iter, Dan; Li, Jerry; Lee, Yin Tat; Zhu, Chenguang; Zeng, Michael (2023). "Automatic Prompt Optimization with 'Gradient Descent' and Beam Search". Microsoft Azure AI. arXiv: 2305.03495.
  60. Zhang, Zhuosheng; Zhang, Aston; Li, Mu; Smola, Alex (October 1, 2022). "Automatic Chain of Thought Prompting in Large Language Models". arXiv: 2210.03493 [cs.CL].
  61. Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (August 31, 2022). "Emergent Abilities of Large Language Models". arXiv: 2206.07682 [cs.CL]. In prompting, a pre-trained language model is given a prompt (e.g. a natural language instruction) of a task and completes the response without any further training or gradient updates to its parameters... The ability to perform a task via few-shot prompting is emergent when a model has random performance until a certain scale, after which performance increases to well-above random
  62. Caballero, Ethan; Gupta, Kshitij; Rish, Irina; Krueger, David (2022). "Broken Neural Scaling Laws". International Conference on Learning Representations (ICLR), 2023.
  63. Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (August 31, 2022). "Emergent Abilities of Large Language Models". arXiv: 2206.07682 [cs.CL].
  64. Musser, George. "How AI Knows Things No One Told It". Scientific American. Retrieved May 17, 2023. By the time you type a query into ChatGPT, the network should be fixed; unlike humans, it should not continue to learn. So it came as a surprise that LLMs do, in fact, learn from their users' prompts—an ability known as in-context learning.
  65. von Oswald, Johannes; Niklasson, Eyvind; Randazzo, Ettore; Sacramento, João; Mordvintsev, Alexander; Zhmoginov, Andrey; Vladymyrov, Max (2022). "Transformers learn in-context by gradient descent". arXiv: 2212.07677 [cs.LG]. Thus we show how trained Transformers become mesa-optimizers i.e. learn models by gradient descent in their forward pass
  66. "Mesa-Optimization". May 31, 2019. Retrieved May 17, 2023. Mesa-Optimization is the situation that occurs when a learned model (such as a neural network) is itself an optimizer.
  67. Garg, Shivam; Tsipras, Dimitris; Liang, Percy; Valiant, Gregory (2022). "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes". arXiv: 2208.01066 [cs.CL]. Training a model to perform in-context learning can be viewed as an instance of the more general learning-to-learn or meta-learning paradigm
  68. Monge, Jim Clyde (August 25, 2022). "Dall-E2 VS Stable Diffusion: Same Prompt, Different Results". MLearning.ai. Retrieved August 31, 2022.
  69. "Prompts". docs.midjourney.com. Retrieved August 14, 2023.
  70. Woolf, Max (November 28, 2022). "Stable Diffusion 2.0 and the Importance of Negative Prompts for Good Results". Retrieved August 14, 2023.
  71. Goldblum, R.; Pillarisetty, R.; Dauphinee, M. J.; Talal, N. (1975). "Acceleration of autoimmunity in NZB/NZW F1 mice by graft-versus-host disease". Clinical and Experimental Immunology. 19 (2): 377–385. ISSN 0009-9104. PMC 1538084. PMID 2403.
  72. "Stable Diffusion prompt: a definitive guide". May 14, 2023. Retrieved August 14, 2023.
  73. Heikkilä, Melissa (September 16, 2022). "This Artist Is Dominating AI-Generated Art and He's Not Happy About It". MIT Technology Review. Retrieved August 14, 2023.
  74. Solomon, Tessa (August 28, 2024). "The AI-Powered Ask Dalí and Hello Vincent Installations Raise Uncomfortable Questions about Ventriloquizing the Dead". ARTnews.com. Retrieved January 10, 2025.
  75. Gal, Rinon; Alaluf, Yuval; Atzmon, Yuval; Patashnik, Or; Bermano, Amit H.; Chechik, Gal; Cohen-Or, Daniel (2022). "An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion". arXiv: 2208.01618 [cs.CV]. Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new "words" in the embedding space of a frozen text-to-image model.
  76. Kirillov, Alexander; Mintun, Eric; Ravi, Nikhila; Mao, Hanzi; Rolland, Chloe; Gustafson, Laura; Xiao, Tete; Whitehead, Spencer; Berg, Alexander C.; Lo, Wan-Yen; Dollár, Piotr; Girshick, Ross (April 1, 2023). "Segment Anything". arXiv: 2304.02643 [cs.CV].
  77. Li, Xiang Lisa; Liang, Percy (2021). "Prefix-Tuning: Optimizing Continuous Prompts for Generation". Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 4582–4597. doi:10.18653/V1/2021.ACL-LONG.353. S2CID 230433941. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning... Prefix-tuning draws inspiration from prompting
  78. Lester, Brian; Al-Rfou, Rami; Constant, Noah (2021). "The Power of Scale for Parameter-Efficient Prompt Tuning". Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. pp. 3045–3059. arXiv: 2104.08691. doi:10.18653/V1/2021.EMNLP-MAIN.243. S2CID 233296808. In this work, we explore "prompt tuning," a simple yet effective mechanism for learning "soft prompts"...Unlike the discrete text prompts used by GPT-3, soft prompts are learned through back-propagation
  79. Sun, Simeng; Liu, Yang; Iter, Dan; Zhu, Chenguang; Iyyer, Mohit (2023). "How Does In-Context Learning Help Prompt Tuning?". arXiv: 2302.11521 [cs.CL].
  80. Shin, Taylor; Razeghi, Yasaman; Logan IV, Robert L.; Wallace, Eric; Singh, Sameer (November 2020). "AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts". Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Online: Association for Computational Linguistics. pp. 4222–4235. doi:10.18653/v1/2020.emnlp-main.346. S2CID 226222232.
  81. Willison, Simon (September 12, 2022). "Prompt injection attacks against GPT-3". simonwillison.net. Retrieved February 9, 2023.
  82. Papp, Donald (September 17, 2022). "What's Old Is New Again: GPT-3 Prompt Injection Attack Affects AI". Hackaday. Retrieved February 9, 2023.
  83. Vigliarolo, Brandon (September 19, 2022). "GPT-3 'prompt injection' attack causes bot bad manners". www.theregister.com. Retrieved February 9, 2023.