Developer(s) | Harrison Chase |
---|---|
Initial release | October 2022 |
Stable release | 0.1.8 [1] / 19 February 2024 |
Repository | github.com/langchain-ai/langchain |
Written in | Python and JavaScript |
Type | Software framework for large language model application development |
License | MIT License |
Website | LangChain.com |
LangChain is a framework designed to simplify the creation of applications using large language models (LLMs). As a language model integration framework, LangChain's use cases largely overlap with those of language models in general, including document analysis and summarization, chatbots, and code analysis. [2]
LangChain was launched in October 2022 as an open source project by Harrison Chase, who was then working at the machine learning startup Robust Intelligence. The project quickly gained popularity, [3] with improvements from hundreds of contributors on GitHub, trending discussions on Twitter, lively activity on the project's Discord server, many YouTube tutorials, and meetups in San Francisco and London. In April 2023, LangChain incorporated, and the new startup raised over $20 million in funding at a valuation of at least $200 million from venture firm Sequoia Capital, a week after announcing a $10 million seed investment from Benchmark. [4] [5]
In October 2023, LangChain introduced LangServe, a deployment tool designed to ease the transition from LangChain Expression Language (LCEL) prototypes to production-ready applications. [6]
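As a rough illustration of the pipe-style composition that an expression language such as LCEL is built around, the sketch below chains three toy stages with the `|` operator. It uses only the Python standard library; the `Step` class, its `invoke` method, and the fake model are hypothetical stand-ins written for this example and are not LangChain's own classes.

```python
from typing import Any, Callable


class Step:
    """Hypothetical composable pipeline step (an assumption, not a LangChain class)."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        # Run the (possibly composed) step on a single input.
        return self.fn(value)


# Three toy stages: fill a prompt template, call a fake model, parse the reply.
prompt = Step(lambda topic: f"Write one word about {topic}.")
fake_model = Step(lambda text: {"content": text.upper()})  # placeholder for an LLM call
parser = Step(lambda message: message["content"])

chain = prompt | fake_model | parser
result = chain.invoke("chains")
print(result)
```

Composing stages this way keeps each one independently testable, which is one plausible reason a prototype built from such pieces can be moved to a served application with little rewriting.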
LangChain's developers highlight the framework's applicability to use cases including chatbots, [7] retrieval-augmented generation, [8] document summarization, [9] and synthetic data generation. [10]
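Retrieval-augmented generation, one of the use cases listed above, can be sketched in a few lines: rank stored documents against the query, then place the best matches in the prompt. The example below is a minimal standard-library sketch, not LangChain code. The word-count "embedding", the function names `embed`, `cosine`, `retrieve`, and `build_prompt`, and the prompt wording are all assumptions made for illustration; a real system would use a learned embedding model and a vector store.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Assemble a prompt that grounds the question in retrieved context."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


documents = [
    "LangServe is a deployment tool introduced in October 2023.",
    "LangChain was launched in October 2022 by Harrison Chase.",
    "Milvus is a vector database for storing embeddings.",
]
question = "Who launched LangChain?"
top = retrieve(question, documents)
prompt = build_prompt(question, documents)
print(prompt)
```

The assembled prompt would then be sent to a language model; that call is left out here because it depends on the provider.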
As of March 2023, LangChain offered integrations with Amazon, Google, and Microsoft Azure cloud storage; API wrappers for news, movie information, and weather; Bash for summarization, syntax and semantics checking, and execution of shell scripts; multiple web scraping subsystems and templates; few-shot learning prompt generation support; finding and summarizing "todo" tasks in code; Google Drive documents, spreadsheets, and presentations summarization, extraction, and creation; Google Search and Microsoft Bing web search; OpenAI, Anthropic, and Hugging Face language models; iFixit repair guides and wikis search and summarization; MapReduce for question answering, combining documents, and question generation; N-gram overlap scoring; PyPDF, pdfminer, fitz, and pymupdf for PDF file text extraction and manipulation; Python and JavaScript code generation, analysis, and debugging; Milvus vector database [11] to store and retrieve vector embeddings; Weaviate vector database [12] to cache embedding and data objects; Redis cache database storage; Python RequestsWrapper and other methods for API requests; SQL and NoSQL databases including JSON support; Streamlit, including for logging; text mapping for k-nearest neighbors search; time zone conversion and calendar operations; tracing and recording stack symbols in threaded and asynchronous subprocess runs; and the Wolfram Alpha website and SDK. [13] As of April 2023, it could read from more than 50 document types and data sources. [14]
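The MapReduce pattern for combining documents, mentioned in the list above, can be illustrated without any model at all: each document is condensed independently (map), and the partial results are merged into one output (reduce). The sketch below is a standard-library illustration under stated assumptions. `first_sentence` is a hypothetical placeholder for an LLM summarization call, and `map_reduce` and `join_partials` are names invented for this example, not LangChain functions.

```python
from typing import Callable, List


def first_sentence(text: str) -> str:
    """Placeholder for an LLM summarization call: keep only the first sentence."""
    head = text.strip().split(". ")[0]
    return head if head.endswith(".") else head + "."


def join_partials(partials: List[str]) -> str:
    """Placeholder reduce step: concatenate the partial summaries."""
    return " ".join(partials)


def map_reduce(
    documents: List[str],
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[List[str]], str],
) -> str:
    # Map: summarize every document on its own, so the work could run in parallel
    # and no single call has to fit all documents into one context window.
    partials = [map_fn(doc) for doc in documents]
    # Reduce: merge the per-document summaries into a single result.
    return reduce_fn(partials)


documents = [
    "Milvus stores vector embeddings. Further details follow here.",
    "Redis provides cache storage. Further details follow here.",
]
summary = map_reduce(documents, first_sentence, join_partials)
print(summary)
```

In a real pipeline both placeholder functions would be replaced by model calls, with the reduce step prompting the model to merge the partial summaries.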
Tool name | Account required? | API key required? | Licensing | Description | Features | Documentation URL |
---|---|---|---|---|---|---|
Alpha Vantage | No | Yes | Proprietary | Provides financial market data and analytics | Financial data, analytics | https://python.langchain.com/docs/integrations/tools/alpha_vantage |
Apify | No | Yes | Commercial | Web scraping and automation platform | Web scraping, automation | https://python.langchain.com/docs/integrations/tools/apify |
ArXiv | No | No | Open Source | Access to scientific papers and research | Scientific papers, research | https://python.langchain.com/docs/integrations/tools/arxiv |
AWS Lambda | Yes | Yes | Proprietary | Serverless computing service | Serverless computing | https://python.langchain.com/docs/integrations/tools/awslambda |
Bash | No | No | Open Source | Access to the shell environment | Shell environment access | https://python.langchain.com/docs/integrations/tools/bash |
Bearly Code Interpreter | No | Yes | Commercial | Remote execution of Python code | Python code execution | https://python.langchain.com/docs/integrations/tools/bearly |
Bing Search | No | Yes | Proprietary | Search engine powered by Microsoft Bing | Search engine | https://python.langchain.com/docs/integrations/tools/bing_search |
Brave Search | No | Yes | Proprietary | Privacy-focused search engine | Privacy-focused search | https://python.langchain.com/docs/integrations/tools/brave_search |
ChatGPT Plugins | No | Yes | Proprietary | Plugins for ChatGPT language model | ChatGPT plugins | https://python.langchain.com/docs/integrations/tools/chatgpt_plugins |
Connery Action Tool | No | Yes | Commercial | Tool for performing actions using the Connery API | API actions | https://python.langchain.com/docs/integrations/tools/connery |
Dall-E Image Generator | No | Yes | Proprietary | Text-to-image generation using OpenAI's DALL-E model | Text-to-image generation | https://python.langchain.com/docs/integrations/tools/dalle_image_generator |
DataForSEO | No | Yes | Commercial | SEO data and analytics platform | SEO data, analytics | https://python.langchain.com/docs/integrations/tools/dataforseo |
DuckDuckGo Search | No | No | Open Source | Privacy-focused search engine | Search engine | https://python.langchain.com/docs/integrations/tools/ddg |
E2B Data Analysis | No | No | Open Source | Sandbox environment for running Python code for data analysis | Data analysis environment | https://python.langchain.com/docs/integrations/tools/e2b_data_analysis |
Eden AI | No | Yes | Commercial | Suite of AI tools and APIs | AI tools, APIs | https://python.langchain.com/docs/integrations/tools/edenai_tools |
Eleven Labs Text2Speech | No | Yes | Commercial | Text-to-speech API by Eleven Labs | Text-to-speech | https://python.langchain.com/docs/integrations/tools/eleven_labs_tts |
Exa Search | No | Yes | Commercial | Search engine | Search engine access | https://python.langchain.com/docs/integrations/tools/exa_search |
File System | No | No | Open Source | Tools for interacting with the local file system | File system interaction | https://python.langchain.com/docs/integrations/tools/filesystem |
Golden Query | No | Yes | Commercial | Natural language APIs for querying various services | Natural language queries | https://python.langchain.com/docs/integrations/tools/golden_query |
Google Cloud Text-to-Speech | Yes | Yes | Proprietary | Text-to-speech API by Google Cloud | Text-to-speech | https://python.langchain.com/docs/integrations/tools/google_cloud_texttospeech |
Google Drive | Yes | Yes | Proprietary | Access and manage files on Google Drive | Google Drive access | https://python.langchain.com/docs/integrations/tools/google_drive |
Google Finance | Yes | Yes | Proprietary | Access financial data from Google Finance | Financial data | https://python.langchain.com/docs/integrations/tools/google_finance |
Google Jobs | Yes | Yes | Proprietary | Search for job listings using Google Jobs API | Job search | https://python.langchain.com/docs/integrations/tools/google_jobs |
Google Lens | Yes | Yes | Proprietary | Visual search and recognition tool by Google | Visual search, recognition | https://python.langchain.com/docs/integrations/tools/google_lens |
Google Places | Yes | Yes | Proprietary | Access to Google Places API for location-based services | Location-based services | https://python.langchain.com/docs/integrations/tools/google_places |
Google Scholar | Yes | Yes | Proprietary | Search for scholarly articles using Google Scholar API | Scholarly article search | https://python.langchain.com/docs/integrations/tools/google_scholar |
Google Search | Yes | Yes | Proprietary | Search engine powered by Google | Search engine | https://python.langchain.com/docs/integrations/tools/google_search |
Google Serper | No | Yes | Commercial | Search engine results page (SERP) scraping tool | SERP scraping | https://python.langchain.com/docs/integrations/tools/google_serper |
Google Trends | Yes | Yes | Proprietary | Access to Google Trends data | Trend data | https://python.langchain.com/docs/integrations/tools/google_trends |
Gradio | No | No | Open Source | Library for creating UIs for machine learning models | Machine learning UIs | https://python.langchain.com/docs/integrations/tools/gradio_tools |
GraphQL | No | No | Open Source | Query language for APIs | API queries | https://python.langchain.com/docs/integrations/tools/graphql |
HuggingFace Hub | No | No | Open Source | Tools for working with Hugging Face models and datasets | Hugging Face models, datasets | https://python.langchain.com/docs/integrations/tools/huggingface_tools |
Human as a tool | No | No | N/A | Use human input as a tool for AI | Human input | https://python.langchain.com/docs/integrations/tools/human_tools |
IFTTT WebHooks | No | Yes | Commercial | Connect and automate various web services | Web service automation | https://python.langchain.com/docs/integrations/tools/ifttt |
Ionic Shopping | No | Yes | Commercial | Tool for shopping using the Ionic API | Shopping | https://python.langchain.com/docs/integrations/tools/ionic_shopping |
Lemon Agent | No | Yes | Commercial | Tool for interacting with the Lemon AI platform | Lemon AI interaction | https://python.langchain.com/docs/integrations/tools/lemonai |
Memorize | No | No | Open Source | Tool for memorizing information using unsupervised learning | Memorization | https://python.langchain.com/docs/integrations/tools/memorize |
Nuclia Understanding | No | Yes | Commercial | Tool for indexing unstructured data using Nuclia | Data indexing | https://python.langchain.com/docs/integrations/tools/nuclia |
OpenWeatherMap | No | Yes | Commercial | Access to weather data using OpenWeatherMap API | Weather data | https://python.langchain.com/docs/integrations/tools/openweathermap |
Polygon Stock Market API | No | Yes | Commercial | Access to stock market data using Polygon API | Stock market data | https://python.langchain.com/docs/integrations/tools/polygon |
PubMed | No | No | Open Source | Access to biomedical literature using PubMed API | Biomedical literature | https://python.langchain.com/docs/integrations/tools/pubmed |
Python REPL | No | No | Open Source | Interactive Python shell | Python shell | https://python.langchain.com/docs/integrations/tools/python |
Reddit Search | No | No | Open Source | Search for content on Reddit | Reddit search | https://python.langchain.com/docs/integrations/tools/reddit_search |
Requests | No | No | Open Source | HTTP library for making requests | HTTP requests | https://python.langchain.com/docs/integrations/tools/requests |
SceneXplain | Yes | Yes | Commercial | Image captioning service that describes the content of images | Image captioning | https://python.langchain.com/docs/integrations/tools/sceneXplain |
Search | No | No | Open Source | Collection of tools for searching and querying various services | Search tools | https://python.langchain.com/docs/integrations/tools/search_tools |
SearchApi | No | Yes | Commercial | Tool for searching and querying various APIs | API search tools | https://python.langchain.com/docs/integrations/tools/searchapi |
SearxNG Search | No | No | Open Source | Privacy-focused metasearch engine | Privacy-focused search | https://python.langchain.com/docs/integrations/tools/searx_search |
Semantic Scholar API Tool | No | No | Open Source | Access to academic papers using the Semantic Scholar API | Academic paper search | https://python.langchain.com/docs/integrations/tools/semanticscholar |
SerpAPI | No | Yes | Commercial | Search engine results page (SERP) scraping tool | SERP scraping | https://python.langchain.com/docs/integrations/tools/serpapi |
StackExchange | No | No | Open Source | Access to the Stack Exchange network | Stack Exchange access | https://python.langchain.com/docs/integrations/tools/stackexchange |
Tavily Search | No | Yes | Commercial | Search engine for finding answers to questions | Question answering | https://python.langchain.com/docs/integrations/tools/tavily_search |
Twilio | No | Yes | Commercial | Communication APIs for SMS, voice, and video | Communication APIs | https://python.langchain.com/docs/integrations/tools/twilio |
Wikidata | No | No | Open Source | Access to structured data from Wikidata | Structured data access | https://python.langchain.com/docs/integrations/tools/wikidata |
Wikipedia | No | No | Open Source | Access to articles and information from Wikipedia | Wikipedia access | https://python.langchain.com/docs/integrations/tools/wikipedia |
Wolfram Alpha | No | Yes | Proprietary | Computational knowledge engine | Computational knowledge | https://python.langchain.com/docs/integrations/tools/wolfram_alpha |
Yahoo Finance News | No | Yes | Commercial | Access to financial news using Yahoo Finance API | Financial news | https://python.langchain.com/docs/integrations/tools/yahoo_finance_news |
YouTube | No | Yes | Commercial | Access to YouTube data and functionality | YouTube access | https://python.langchain.com/docs/integrations/tools/youtube |
Zapier Natural Language Actions | No | Yes | Commercial | Integration platform for automating workflows | Workflow automation | https://python.langchain.com/docs/integrations/tools/zapier |