LangChain

Developer(s): Harrison Chase
Initial release: October 2022
Stable release: 0.1.8 [1] / 19 February 2024
Repository: github.com/langchain-ai/langchain
Written in: Python and JavaScript
Type: Software framework for large language model application development
License: MIT License
Website: LangChain.com

LangChain is a framework designed to simplify the creation of applications that use large language models (LLMs). As a language model integration framework, LangChain's use cases largely overlap with those of language models in general, including document analysis and summarization, chatbots, and code analysis. [2]

History

LangChain was launched in October 2022 as an open source project by Harrison Chase, who was then working at the machine learning startup Robust Intelligence. The project quickly gained popularity, [3] with improvements from hundreds of contributors on GitHub, trending discussions on Twitter, lively activity on the project's Discord server, many YouTube tutorials, and meetups in San Francisco and London. In April 2023, LangChain incorporated, and the new startup raised over $20 million in funding at a valuation of at least $200 million from venture firm Sequoia Capital, a week after announcing a $10 million seed investment from Benchmark. [4] [5]

In October 2023 LangChain introduced LangServe, a deployment tool designed to facilitate the transition from LCEL (LangChain Expression Language) prototypes to production-ready applications. [6]
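
LCEL composes components with the `|` operator. The following is a minimal sketch of that pipe-style composition in plain Python, not LangChain's own implementation: the `Step` class, its `invoke` method, and the `fake_model` function are hypothetical names introduced for illustration, and no language model is actually called.

```python
from typing import Any, Callable


class Step:
    """Hypothetical wrapper that makes a plain function composable with `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs `a` first, then feeds its output to `b`.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


def fake_model(prompt: str) -> str:
    # Stand-in for a language model call; it only upper-cases the prompt.
    return prompt.upper()


prompt = Step(lambda topic: f"Write one sentence about {topic}.")
model = Step(fake_model)
parser = Step(lambda text: text.strip())

# Three steps are chained into one pipeline: prompt, then model, then parser.
chain = prompt | model | parser
result = chain.invoke("llamas")
print(result)
```

Because each step takes one value and returns one value, any step can be swapped (for example, replacing `fake_model` with a real model client) without changing the rest of the pipeline.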

Capabilities

LangChain's developers highlight the framework's applicability to use cases including chatbots, [7] retrieval-augmented generation, [8] document summarization, [9] and synthetic data generation. [10]
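
As a rough illustration of retrieval-augmented generation, the sketch below retrieves the most relevant passage for a question and places it in the prompt that would be sent to a model. It is a toy written with the Python standard library only: the helper names (`tokenize`, `retrieve`, `build_prompt`) are hypothetical, and word overlap stands in for the embedding-based similarity search a real system would typically use.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> Counter:
    # Lower-case bag of words; a real system would use vector embeddings instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: List[str], k: int = 1) -> List[str]:
    # Rank documents by how many tokens they share with the question.
    query = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: sum((tokenize(doc) & query).values()),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: List[str]) -> str:
    # The retrieved passages are placed ahead of the question in the prompt.
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


documents = [
    "LangChain was launched in October 2022 by Harrison Chase.",
    "LangServe was introduced in October 2023 as a deployment tool.",
    "Milvus is a vector database for storing embeddings.",
]

question = "Who launched LangChain?"
context = retrieve(question, documents, k=1)
prompt = build_prompt(question, context)
print(prompt)
```

The generation step is omitted on purpose: the resulting `prompt` string is what would be passed to whichever language model the application uses.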

As of March 2023, LangChain included integrations with systems including Amazon, Google, and Microsoft Azure cloud storage; API wrappers for news, movie information, and weather; Bash for summarization, syntax and semantics checking, and execution of shell scripts; multiple web scraping subsystems and templates; few-shot learning prompt generation support; finding and summarizing "todo" tasks in code; Google Drive documents, spreadsheets, and presentations summarization, extraction, and creation; Google Search and Microsoft Bing web search; OpenAI, Anthropic, and Hugging Face language models; iFixit repair guides and wikis search and summarization; MapReduce for question answering, combining documents, and question generation; N-gram overlap scoring; PyPDF, pdfminer, fitz, and pymupdf for PDF file text extraction and manipulation; Python and JavaScript code generation, analysis, and debugging; Milvus vector database [11] to store and retrieve vector embeddings; Weaviate vector database [12] to cache embedding and data objects; Redis cache database storage; Python RequestsWrapper and other methods for API requests; SQL and NoSQL databases including JSON support; Streamlit, including for logging; text mapping for k-nearest neighbors search; time zone conversion and calendar operations; tracing and recording stack symbols in threaded and asynchronous subprocess runs; and the Wolfram Alpha website and SDK. [13] As of April 2023, it can read from more than 50 document types and data sources. [14]
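
The MapReduce pattern mentioned above for combining documents can be sketched as follows. This is an assumption-laden illustration rather than LangChain code: `map_reduce`, `first_sentence`, and `join_summaries` are hypothetical names, and `first_sentence` is a trivial stand-in for a language model summarizer.

```python
from typing import Callable, List


def first_sentence(text: str) -> str:
    # Stand-in for a language model summarizer: keeps only the first sentence.
    return text.split(". ")[0].rstrip(".") + "."


def join_summaries(summaries: List[str]) -> str:
    # Stand-in for the combining step; a real system might call a model again here.
    return " ".join(summaries)


def map_reduce(
    documents: List[str],
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[List[str]], str],
) -> str:
    # Map step: process each document independently.
    partial_results = [map_fn(doc) for doc in documents]
    # Reduce step: combine the partial results into a single output.
    return reduce_fn(partial_results)


documents = [
    "Milvus stores vector embeddings. It supports similarity search.",
    "Redis can act as a cache. It keeps data in memory.",
]

summary = map_reduce(documents, first_sentence, join_summaries)
print(summary)
```

Since the map step treats each document independently, it can process inputs that together would not fit in a single model prompt.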

LangChain tools

| Tool name | Account required? | API key required? | Licensing | Description | Features | Documentation URL |
|---|---|---|---|---|---|---|
| Alpha Vantage | No | Yes | Proprietary | Provides financial market data and analytics | Financial data, analytics | https://python.langchain.com/docs/integrations/tools/alpha_vantage |
| Apify | No | Yes | Commercial | Web scraping and automation platform | Web scraping, automation | https://python.langchain.com/docs/integrations/tools/apify |
| ArXiv | No | No | Open Source | Access to scientific papers and research | Scientific papers, research | https://python.langchain.com/docs/integrations/tools/arxiv |
| AWS Lambda | Yes | Yes | Proprietary | Serverless computing service | Serverless computing | https://python.langchain.com/docs/integrations/tools/awslambda |
| Bash | No | No | Open Source | Access to the shell environment | Shell environment access | https://python.langchain.com/docs/integrations/tools/bash |
| Bearly Code Interpreter | No | Yes | Commercial | Remote execution of Python code | Python code execution | https://python.langchain.com/docs/integrations/tools/bearly |
| Bing Search | No | Yes | Proprietary | Search engine powered by Microsoft Bing | Search engine | https://python.langchain.com/docs/integrations/tools/bing_search |
| Brave Search | No | No | Open Source | Privacy-focused search engine | Privacy-focused search | https://python.langchain.com/docs/integrations/tools/brave_search |
| ChatGPT Plugins | No | Yes | Proprietary | Plugins for ChatGPT language model | ChatGPT plugins | https://python.langchain.com/docs/integrations/tools/chatgpt_plugins |
| Connery Action Tool | No | Yes | Commercial | Tool for performing actions using the Connery API | API actions | https://python.langchain.com/docs/integrations/tools/connery |
| Dall-E Image Generator | No | Yes | Proprietary | Text-to-image generation using OpenAI's DALL-E model | Text-to-image generation | https://python.langchain.com/docs/integrations/tools/dalle_image_generator |
| DataForSEO | No | Yes | Commercial | SEO data and analytics platform | SEO data, analytics | https://python.langchain.com/docs/integrations/tools/dataforseo |
| DuckDuckGo Search | No | No | Open Source | Privacy-focused search engine | Search engine | https://python.langchain.com/docs/integrations/tools/ddg |
| E2B Data Analysis | No | No | Open Source | Sandbox environment for running Python code for data analysis | Data analysis environment | https://python.langchain.com/docs/integrations/tools/e2b_data_analysis |
| Eden AI | No | Yes | Commercial | Suite of AI tools and APIs | AI tools, APIs | https://python.langchain.com/docs/integrations/tools/edenai_tools |
| Eleven Labs Text2Speech | No | Yes | Commercial | Text-to-speech API by Eleven Labs | Text-to-speech | https://python.langchain.com/docs/integrations/tools/eleven_labs_tts |
| Exa Search | No | Yes | Commercial | Search engine | Search engine access | https://python.langchain.com/docs/integrations/tools/exa_search |
| File System | No | No | Open Source | Tools for interacting with the local file system | File system interaction | https://python.langchain.com/docs/integrations/tools/filesystem |
| Golden Query | No | Yes | Commercial | Natural language APIs for querying various services | Natural language queries | https://python.langchain.com/docs/integrations/tools/golden_query |
| Google Cloud Text-to-Speech | Yes | Yes | Proprietary | Text-to-speech API by Google Cloud | Text-to-speech | https://python.langchain.com/docs/integrations/tools/google_cloud_texttospeech |
| Google Drive | Yes | Yes | Proprietary | Access and manage files on Google Drive | Google Drive access | https://python.langchain.com/docs/integrations/tools/google_drive |
| Google Finance | Yes | Yes | Proprietary | Access financial data from Google Finance | Financial data | https://python.langchain.com/docs/integrations/tools/google_finance |
| Google Jobs | Yes | Yes | Proprietary | Search for job listings using Google Jobs API | Job search | https://python.langchain.com/docs/integrations/tools/google_jobs |
| Google Lens | Yes | Yes | Proprietary | Visual search and recognition tool by Google | Visual search, recognition | https://python.langchain.com/docs/integrations/tools/google_lens |
| Google Places | Yes | Yes | Proprietary | Access to Google Places API for location-based services | Location-based services | https://python.langchain.com/docs/integrations/tools/google_places |
| Google Scholar | Yes | Yes | Proprietary | Search for scholarly articles using Google Scholar API | Scholarly article search | https://python.langchain.com/docs/integrations/tools/google_scholar |
| Google Search | Yes | Yes | Proprietary | Search engine powered by Google | Search engine | https://python.langchain.com/docs/integrations/tools/google_search |
| Google Serper | No | Yes | Commercial | Search engine results page (SERP) scraping tool | SERP scraping | https://python.langchain.com/docs/integrations/tools/google_serper |
| Google Trends | Yes | Yes | Proprietary | Access to Google Trends data | Trend data | https://python.langchain.com/docs/integrations/tools/google_trends |
| Gradio | No | No | Open Source | Library for creating UIs for machine learning models | Machine learning UIs | https://python.langchain.com/docs/integrations/tools/gradio_tools |
| GraphQL | No | No | Open Source | Query language for APIs | API queries | https://python.langchain.com/docs/integrations/tools/graphql |
| HuggingFace Hub | No | No | Open Source | Tools for working with Hugging Face models and datasets | Hugging Face models, datasets | https://python.langchain.com/docs/integrations/tools/huggingface_tools |
| Human as a tool | No | No | N/A | Use human input as a tool for AI | Human input | https://python.langchain.com/docs/integrations/tools/human_tools |
| IFTTT WebHooks | No | Yes | Commercial | Connect and automate various web services | Web service automation | https://python.langchain.com/docs/integrations/tools/ifttt |
| Ionic Shopping | No | Yes | Commercial | Tool for shopping using the Ionic API | Shopping | https://python.langchain.com/docs/integrations/tools/ionic_shopping |
| Lemon Agent | No | Yes | Commercial | Tool for interacting with the Lemon AI platform | Lemon AI interaction | https://python.langchain.com/docs/integrations/tools/lemonai |
| Memorize | No | No | Open Source | Tool for memorizing information using unsupervised learning | Memorization | https://python.langchain.com/docs/integrations/tools/memorize |
| Nuclia Understanding | No | Yes | Commercial | Tool for indexing unstructured data using Nuclia | Data indexing | https://python.langchain.com/docs/integrations/tools/nuclia |
| OpenWeatherMap | No | Yes | Commercial | Access to weather data using OpenWeatherMap API | Weather data | https://python.langchain.com/docs/integrations/tools/openweathermap |
| Polygon Stock Market API | No | Yes | Commercial | Access to stock market data using Polygon API | Stock market data | https://python.langchain.com/docs/integrations/tools/polygon |
| PubMed | No | No | Open Source | Access to biomedical literature using PubMed API | Biomedical literature | https://python.langchain.com/docs/integrations/tools/pubmed |
| Python REPL | No | No | Open Source | Interactive Python shell | Python shell | https://python.langchain.com/docs/integrations/tools/python |
| Reddit Search | No | No | Open Source | Search for content on Reddit | Reddit search | https://python.langchain.com/docs/integrations/tools/reddit_search |
| Requests | No | No | Open Source | HTTP library for making requests | HTTP requests | https://python.langchain.com/docs/integrations/tools/requests |
| SceneXplain | No | No | Open Source | Tool for explaining the predictions of machine learning models | Model explanations | https://python.langchain.com/docs/integrations/tools/sceneXplain |
| Search | No | No | Open Source | Collection of tools for searching and querying various services | Search tools | https://python.langchain.com/docs/integrations/tools/search_tools |
| SearchApi | No | Yes | Commercial | Tool for searching and querying various APIs | API search tools | https://python.langchain.com/docs/integrations/tools/searchapi |
| SearxNG Search | No | No | Open Source | Privacy-focused metasearch engine | Privacy-focused search | https://python.langchain.com/docs/integrations/tools/searx_search |
| Semantic Scholar API Tool | No | No | Open Source | Access to academic papers using the Semantic Scholar API | Academic paper search | https://python.langchain.com/docs/integrations/tools/semanticscholar |
| SerpAPI | No | Yes | Commercial | Search engine results page (SERP) scraping tool | SERP scraping | https://python.langchain.com/docs/integrations/tools/serpapi |
| StackExchange | No | No | Open Source | Access to the Stack Exchange network | Stack Exchange access | https://python.langchain.com/docs/integrations/tools/stackexchange |
| Tavily Search | No | Yes | Commercial | Search engine for finding answers to questions | Question answering | https://python.langchain.com/docs/integrations/tools/tavily_search |
| Twilio | No | Yes | Commercial | Communication APIs for SMS, voice, and video | Communication APIs | https://python.langchain.com/docs/integrations/tools/twilio |
| Wikidata | No | No | Open Source | Access to structured data from Wikidata | Structured data access | https://python.langchain.com/docs/integrations/tools/wikidata |
| Wikipedia | No | No | Open Source | Access to articles and information from Wikipedia | Wikipedia access | https://python.langchain.com/docs/integrations/tools/wikipedia |
| Wolfram Alpha | No | Yes | Proprietary | Computational knowledge engine | Computational knowledge | https://python.langchain.com/docs/integrations/tools/wolfram_alpha |
| Yahoo Finance News | No | Yes | Commercial | Access to financial news using Yahoo Finance API | Financial news | https://python.langchain.com/docs/integrations/tools/yahoo_finance_news |
| YouTube | No | Yes | Commercial | Access to YouTube data and functionality | YouTube access | https://python.langchain.com/docs/integrations/tools/youtube |
| Zapier Natural Language Actions | No | Yes | Commercial | Integration platform for automating workflows | Workflow automation | https://python.langchain.com/docs/integrations/tools/zapier |
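
The "API key required?" column suggests a simple pre-flight check before offering a tool to an application. The sketch below is one hedged way to express it in plain Python. It does not use LangChain's API: `ToolInfo`, `usable_tools`, and the environment variable names are hypothetical, and only the key requirements are taken from the table.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class ToolInfo:
    name: str
    api_key_required: bool
    # Hypothetical variable name for illustration; not taken from LangChain's docs.
    env_var: Optional[str] = None


TOOLS = [
    ToolInfo("Wikipedia", api_key_required=False),
    ToolInfo("ArXiv", api_key_required=False),
    ToolInfo("Wolfram Alpha", api_key_required=True, env_var="EXAMPLE_WOLFRAM_KEY"),
    ToolInfo("OpenWeatherMap", api_key_required=True, env_var="EXAMPLE_WEATHER_KEY"),
]


def usable_tools(tools: List[ToolInfo], environ: Mapping[str, str]) -> List[str]:
    # A tool is usable if it needs no key, or if its key variable is set and non-empty.
    return [
        tool.name
        for tool in tools
        if not tool.api_key_required or environ.get(tool.env_var or "", "")
    ]


# Only the (hypothetical) Wolfram Alpha key is present in this example environment.
available = usable_tools(TOOLS, {"EXAMPLE_WOLFRAM_KEY": "demo"})
print(available)
```

In an application, `os.environ` could be passed as the `environ` argument so that the check reflects the keys configured on the host.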

References

  1. "Release 0.1.8". 19 February 2024. Retrieved 20 February 2024.
  2. Buniatyan, Davit (2023). "Code Understanding Using LangChain". Activeloop.
  3. Auffarth, Ben (2023). Generative AI with LangChain. Birmingham: Packt Publishing. p. 83. ISBN 9781835083468.
  4. Palazzolo, Stephanie (2023-04-13). "AI startup LangChain taps Sequoia to lead funding round at a valuation of at least $200 million". Business Insider. Archived from the original on 2023-04-18. Retrieved 2023-04-18.
  5. Griffith, Erin; Metz, Cade (2023-03-14). "'Let 1,000 Flowers Bloom': A.I. Funding Frenzy Escalates". The New York Times. ISSN 0362-4331. Archived from the original on 2023-04-18. Retrieved 2023-04-18.
  6. "Introducing LangServe, the best way to deploy your LangChains". LangChain Blog. 2023-10-12. Retrieved 2023-10-17.
  7. "Chatbots | 🦜️🔗 Langchain". python.langchain.com. Retrieved 2023-11-26.
  8. "Retrieval-augmented generation (RAG) | 🦜️🔗 Langchain". python.langchain.com. Retrieved 2023-11-26.
  9. "Summarization | 🦜️🔗 Langchain". python.langchain.com. Retrieved 2023-11-26.
  10. "Synthetic data generation | 🦜️🔗 Langchain". python.langchain.com. Retrieved 2023-11-26.
  11. "Milvus — LangChain". python.langchain.com. Retrieved 2023-10-29.
  12. "Weaviate". python.langchain.com. Retrieved 2024-01-17.
  13. Hug, Daniel Patrick (2023-03-08). "Hierarchical topic tree of LangChain's integrations" (PDF). GitHub. Archived from the original on 2023-04-29. Retrieved 2023-04-18.
  14. "Document Loaders — LangChain 0.0.142". python.langchain.com. Archived from the original on 2023-04-18. Retrieved 2023-04-18.