LangChain

Developer(s): Harrison Chase
Initial release: October 2022
Stable release: 0.1.16 [1] / 11 April 2024
Repository: github.com/langchain-ai/langchain
Written in: Python and JavaScript
Type: Software framework for large language model application development
License: MIT License
Website: LangChain.com

LangChain is a software framework that facilitates the integration of large language models (LLMs) into applications. As a language model integration framework, its use cases largely overlap with those of language models in general, including document analysis and summarization, chatbots, and code analysis. [2]

History

LangChain was launched in October 2022 as an open-source project by Harrison Chase, who was then working at the machine learning startup Robust Intelligence. The project quickly gained popularity, [3] with improvements from hundreds of contributors on GitHub, trending discussions on Twitter, lively activity on the project's Discord server, many YouTube tutorials, and meetups in San Francisco and London. In April 2023, LangChain incorporated, and the new startup raised over $20 million in funding at a valuation of at least $200 million from venture firm Sequoia Capital, a week after announcing a $10 million seed investment from Benchmark. [4] [5]

In the third quarter of 2023, LangChain introduced the LangChain Expression Language (LCEL), which provides a declarative way to define chains of actions. [6] [7]
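
LCEL composes components with the `|` operator, so a chain reads as a left-to-right pipeline. The sketch below imitates that style using only the Python standard library, to show what "declarative" means here. It is not LangChain's implementation: the `Step` class, the `fake_model` placeholder, and the prompt text are all hypothetical names introduced for illustration.

```python
from typing import Any, Callable


class Step:
    """Hypothetical composable unit; a stand-in, not a LangChain class."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that feeds the output of a into b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        # Run the (possibly composed) step on a single input.
        return self.func(value)


# Three illustrative steps: fill a prompt, call a fake "model", tidy the output.
prompt = Step(lambda topic: f"Summarize in one line: {topic}")
fake_model = Step(lambda text: text.upper())  # placeholder for a real LLM call
parser = Step(lambda text: text.strip())

# The chain is declared once, as data flow, and executed later.
chain = prompt | fake_model | parser
print(chain.invoke("vector databases"))
```

The chain is described before anything runs, which is what lets a framework inspect, reuse, or serve the same definition in different ways.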

In October 2023, LangChain introduced LangServe, a deployment tool for hosting LCEL code as a production-ready API. [8]

Capabilities

LangChain's developers highlight the framework's applicability to use cases including chatbots, [9] retrieval-augmented generation, [10] document summarization, [11] and synthetic data generation. [12]

As of March 2023, LangChain included integrations with systems including Amazon, Google, and Microsoft Azure cloud storage; [13] API wrappers for news, movie information, and weather; Bash for summarization, syntax and semantics checking, and execution of shell scripts; multiple web scraping subsystems and templates; few-shot learning prompt generation support; finding and summarizing "todo" tasks in code; Google Drive documents, spreadsheets, and presentations summarization, extraction, and creation; Google Search and Microsoft Bing web search; [14] OpenAI, Anthropic, and Hugging Face language models; iFixit repair guides and wikis search and summarization; MapReduce for question answering, combining documents, and question generation; N-gram overlap scoring; PyPDF, pdfminer, fitz, and pymupdf for PDF file text extraction and manipulation; Python and JavaScript code generation, analysis, and debugging; Milvus vector database [15] to store and retrieve vector embeddings; Weaviate vector database [16] to cache embedding and data objects; Redis cache database storage; Python RequestsWrapper and other methods for API requests; SQL and NoSQL databases including JSON support; Streamlit, including for logging; text mapping for k-nearest neighbors search; time zone conversion and calendar operations; tracing and recording stack symbols in threaded and asynchronous subprocess runs; and the Wolfram Alpha website and SDK. [17] As of April 2023, it can read from more than 50 document types and data sources. [18]
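
Several of the integrations above (vector databases, k-nearest neighbors search) rest on the same idea: store an embedding vector per item, then return the items whose vectors are closest to a query vector. The following is a minimal brute-force sketch of that idea in plain Python. It does not use the Milvus, Weaviate, or LangChain APIs; `ToyVectorStore` is a hypothetical class, and the three-dimensional "embeddings" are hand-made numbers rather than the output of a real embedding model.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: exact search by scanning every item."""

    def __init__(self) -> None:
        self._items: Dict[str, List[float]] = {}

    def add(self, text: str, embedding: List[float]) -> None:
        self._items[text] = embedding

    def search(self, query: List[float], k: int = 1) -> List[Tuple[str, float]]:
        # Score every stored item against the query, then keep the k best.
        scored = [
            (text, cosine_similarity(query, embedding))
            for text, embedding in self._items.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = ToyVectorStore()
# Made-up embeddings, chosen so the two repair texts point in a similar direction.
store.add("repair guide for a phone screen", [0.9, 0.1, 0.0])
store.add("weather forecast for London", [0.0, 0.2, 0.9])
store.add("replacing a laptop battery", [0.8, 0.3, 0.1])

results = store.search([1.0, 0.2, 0.0], k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

Production vector databases typically replace the full scan with approximate nearest-neighbor indexes so that search stays fast on large collections.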

LangChain tools

| Tool name | Account required? | API key required? | Licensing | Features | Documentation URL |
|---|---|---|---|---|---|
| Alpha Vantage | No | Yes | Proprietary | Financial data, analytics | https://python.langchain.com/docs/integrations/tools/alpha_vantage |
| Apify | No | Yes | Commercial | Web scraping, automation | https://python.langchain.com/docs/integrations/providers/apify/ |
| ArXiv | No | No | Open source | Scientific papers, research | https://python.langchain.com/docs/integrations/tools/arxiv |
| AWS Lambda | Yes | Yes | Proprietary | Serverless computing | https://python.langchain.com/docs/integrations/tools/awslambda |
| Bash | No | No | Open source | Shell environment access | https://python.langchain.com/docs/integrations/tools/bash |
| Bearly Code Interpreter | No | Yes | Commercial | Remote Python code execution | https://python.langchain.com/docs/integrations/tools/bearly |
| Bing Search | No | Yes | Proprietary | Search engine | https://python.langchain.com/docs/integrations/tools/bing_search |
| Brave Search | No | No | Open source | Privacy-focused search | https://python.langchain.com/docs/integrations/tools/brave_search |
| ChatGPT Plugins | No | Yes | Proprietary | ChatGPT | https://python.langchain.com/docs/integrations/tools/chatgpt_plugins |
| Connery | No | Yes | Commercial | API actions | https://python.langchain.com/docs/integrations/tools/connery |
| Dall-E Image Generator | No | Yes | Proprietary | Text-to-image generation | https://python.langchain.com/docs/integrations/tools/dalle_image_generator |
| DataForSEO | No | Yes | Commercial | SEO data, analytics | https://python.langchain.com/docs/integrations/tools/dataforseo |
| DuckDuckGo Search | No | No | Open source | Privacy-focused search | https://python.langchain.com/docs/integrations/tools/ddg |
| E2B Data Analysis | No | No | Open source | Data analysis | https://python.langchain.com/docs/integrations/tools/e2b_data_analysis |
| Eden AI | No | Yes | Commercial | AI tools, APIs | https://python.langchain.com/docs/integrations/tools/edenai_tools |
| Eleven Labs Text2Speech | No | Yes | Commercial | Text-to-speech | https://python.langchain.com/docs/integrations/tools/eleven_labs_tts |
| Exa Search | No | Yes | Commercial | Web search | https://python.langchain.com/docs/integrations/tools/exa_search |
| File System | No | No | Open source | File system interaction | https://python.langchain.com/docs/integrations/tools/filesystem |
| Golden Query | No | Yes | Commercial | Natural language queries | https://python.langchain.com/docs/integrations/tools/golden_query |
| Google Cloud Text-to-Speech | Yes | Yes | Proprietary | Text-to-speech | https://python.langchain.com/docs/integrations/tools/google_cloud_texttospeech |
| Google Drive | Yes | Yes | Proprietary | Google Drive access | https://python.langchain.com/docs/integrations/tools/google_drive |
| Google Finance | Yes | Yes | Proprietary | Financial data | https://python.langchain.com/docs/integrations/tools/google_finance |
| Google Jobs | Yes | Yes | Proprietary | Job search | https://python.langchain.com/docs/integrations/tools/google_jobs |
| Google Lens | Yes | Yes | Proprietary | Visual search, recognition | https://python.langchain.com/docs/integrations/tools/google_lens |
| Google Places | Yes | Yes | Proprietary | Location-based services | https://python.langchain.com/docs/integrations/tools/google_places |
| Google Scholar | Yes | Yes | Proprietary | Scholarly article search | https://python.langchain.com/docs/integrations/tools/google_scholar |
| Google Search | Yes | Yes | Proprietary | Search engine | https://python.langchain.com/docs/integrations/tools/google_search |
| Google Serper | No | Yes | Commercial | SERP scraping | https://python.langchain.com/docs/integrations/tools/google_serper |
| Google Trends | Yes | Yes | Proprietary | Trend data | https://python.langchain.com/docs/integrations/tools/google_trends |
| Gradio | No | No | Open source | Machine learning UIs | https://python.langchain.com/docs/integrations/tools/gradio_tools |
| GraphQL | No | No | Open source | API queries | https://python.langchain.com/docs/integrations/tools/graphql |
| HuggingFace Hub | No | No | Open source | Hugging Face models, datasets | https://python.langchain.com/docs/integrations/tools/huggingface_tools |
| Human as a tool | No | No | N/A | Human input | https://python.langchain.com/docs/integrations/tools/human_tools |
| IFTTT WebHooks | No | Yes | Commercial | Web service automation | https://python.langchain.com/docs/integrations/tools/ifttt |
| Ionic Shopping | No | Yes | Commercial | Shopping | https://python.langchain.com/docs/integrations/tools/ionic_shopping |
| Lemon Agent | No | Yes | Commercial | Lemon AI interaction | https://python.langchain.com/docs/integrations/tools/lemonai |
| Memorize | No | No | Open source | Fine-tune LLM to memorize information using unsupervised learning | https://python.langchain.com/docs/integrations/tools/memorize |
| Nuclia | No | Yes | Commercial | Indexing of unstructured data | https://python.langchain.com/docs/integrations/tools/nuclia |
| OpenWeatherMap | No | Yes | Commercial | Weather data | https://python.langchain.com/docs/integrations/tools/openweathermap |
| Polygon Stock Market API | No | Yes | Commercial | Stock market data | https://python.langchain.com/docs/integrations/tools/polygon |
| PubMed | No | No | Open source | Biomedical literature | https://python.langchain.com/docs/integrations/tools/pubmed |
| Python REPL | No | No | Open source | Python shell | https://python.langchain.com/docs/integrations/tools/python |
| Reddit Search | No | No | Open source | Reddit search | https://python.langchain.com/docs/integrations/tools/reddit_search |
| Requests | No | No | Open source | HTTP requests | https://python.langchain.com/docs/integrations/tools/requests |
| SceneXplain | No | No | Open source | Model explanations | https://python.langchain.com/docs/integrations/tools/sceneXplain |
| Search | No | No | Open source | Query various search services | https://python.langchain.com/docs/integrations/tools/search_tools |
| SearchApi | No | Yes | Commercial | Query various search services | https://python.langchain.com/docs/integrations/tools/searchapi |
| SearxNG | No | No | Open source | Privacy-focused search | https://python.langchain.com/docs/integrations/tools/searx_search |
| Semantic Scholar API | No | No | Open source | Academic paper search | https://python.langchain.com/docs/integrations/tools/semanticscholar |
| SerpAPI | No | Yes | Commercial | Search engine results page scraping | https://python.langchain.com/docs/integrations/tools/serpapi |
| StackExchange | No | No | Open source | Stack Exchange access | https://python.langchain.com/docs/integrations/tools/stackexchange |
| Tavily Search | No | Yes | Commercial | Question answering | https://python.langchain.com/docs/integrations/tools/tavily_search |
| Twilio | No | Yes | Commercial | Communication APIs | https://python.langchain.com/docs/integrations/tools/twilio |
| Wikidata | No | No | Open source | Structured data access | https://python.langchain.com/docs/integrations/tools/wikidata |
| Wikipedia | No | No | Open source | Wikipedia access | https://python.langchain.com/docs/integrations/tools/wikipedia |
| Wolfram Alpha | No | Yes | Proprietary | Computational knowledge | https://python.langchain.com/docs/integrations/tools/wolfram_alpha |
| Yahoo Finance News | No | Yes | Commercial | Financial news | https://python.langchain.com/docs/integrations/tools/yahoo_finance_news |
| YouTube | No | Yes | Commercial | YouTube access | https://python.langchain.com/docs/integrations/tools/youtube |
| Zapier Natural Language Actions | No | Yes | Commercial | Workflow automation | https://python.langchain.com/docs/integrations/tools/zapier |

References

  1. "Release 0.1.16". 11 April 2024. Retrieved 23 April 2024.
  2. Buniatyan, Davit (2023). "Code Understanding Using LangChain". Activeloop.
  3. Auffarth, Ben (2023). Generative AI with LangChain. Birmingham: Packt Publishing. p. 83. ISBN 9781835083468.
  4. Palazzolo, Stephanie (2023-04-13). "AI startup LangChain taps Sequoia to lead funding round at a valuation of at least $200 million". Business Insider. Archived from the original on 2023-04-18. Retrieved 2023-04-18.
  5. Griffith, Erin; Metz, Cade (2023-03-14). "'Let 1,000 Flowers Bloom': A.I. Funding Frenzy Escalates". The New York Times. ISSN 0362-4331. Archived from the original on 2023-04-18. Retrieved 2023-04-18.
  6. Mansurova, Mariya (2023-10-30). "Topic Modelling in production: Leveraging LangChain to move from ad-hoc Jupyter Notebooks to production modular service". towardsdatascience.com. Retrieved 2024-07-08.
  7. "LangChain Expression Language". langchain.dev. 2023-08-01. Retrieved 2024-07-08.
  8. "Introducing LangServe, the best way to deploy your LangChains". LangChain Blog. 2023-10-12. Retrieved 2023-10-17.
  9. "Chatbots | 🦜️🔗 Langchain". python.langchain.com. Retrieved 2023-11-26.
  10. "Retrieval-augmented generation (RAG) | 🦜️🔗 Langchain". python.langchain.com. Retrieved 2023-11-26.
  11. "Summarization | 🦜️🔗 Langchain". python.langchain.com. Retrieved 2023-11-26.
  12. "Synthetic data generation | 🦜️🔗 Langchain". python.langchain.com. Retrieved 2023-11-26.
  13. "Azure Cognitive Search and LangChain: A Seamless Integration for Enhanced Vector Search Capabilities". TECHCOMMUNITY.MICROSOFT.COM. Retrieved 2024-08-31.
  14. "Best Alternative AI Content Strategies and LLM Frameworks". Medium. 2024-08-31. Retrieved 2024-08-31.
  15. "Milvus — LangChain". python.langchain.com. Retrieved 2023-10-29.
  16. "Weaviate". python.langchain.com. Retrieved 2024-01-17.
  17. Hug, Daniel Patrick (2023-03-08). "Hierarchical topic tree of LangChain's integrations" (PDF). GitHub. Archived from the original on 2023-04-29. Retrieved 2023-04-18.
  18. "Document Loaders — LangChain 0.0.142". python.langchain.com. Archived from the original on 2023-04-18. Retrieved 2023-04-18.