Tree-sitter (parser generator)

Tree-sitter
Original author(s) Max Brunsfeld
Initial release 2018
Stable release 0.21.0 / 21 February 2024
Repository
Written in Rust, C
Platform Cross-platform
Type Parser generator
License MIT License
Website tree-sitter.github.io/tree-sitter/

In computing, Tree-sitter is a parser generator and incremental parsing library.

Details

Tree-sitter parses source code into concrete syntax trees that can be used in compilers, interpreters, text editors, and static analyzers. [1] [2] It is specialized for use in text editors: it supports incremental parsing, which updates the parse tree in real time as code is edited, [3] and it provides a built-in S-expression query system for analyzing code. [4]
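The idea behind an S-expression query can be sketched with a small, self-contained Python example. This is not Tree-sitter's API or implementation: the `Node` and `Pattern` classes, the `to_sexp`, `parse_pattern`, `match`, and `query` functions, and the node type names (`call`, `identifier`, `arguments`) are all hypothetical and chosen only for illustration. The sketch assumes a hand-built tree for the expression `print(len(xs))` and a simplified matching rule in which child patterns are matched greedily, in order, against a node's children.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    """Toy syntax-tree node: a type name, child nodes, and (for leaves) source text."""
    type: str
    children: list[Node] = field(default_factory=list)
    text: str = ""

@dataclass
class Pattern:
    """Toy query pattern: a node type, child patterns, and an optional capture name."""
    type: str
    children: list[Pattern] = field(default_factory=list)
    capture: str | None = None

def to_sexp(node: Node) -> str:
    """Render a tree as an S-expression of node types."""
    if not node.children:
        return f"({node.type})"
    return f"({node.type} " + " ".join(to_sexp(c) for c in node.children) + ")"

def parse_pattern(src: str) -> Pattern:
    """Parse a pattern such as '(call (identifier) @callee)'."""
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node() -> Pattern:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' but found {tokens[pos]!r}")
        pos += 1
        node_type = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(read_node())
        pos += 1  # skip the closing ")"
        capture = None
        # A capture name written after a node applies to that node.
        if pos < len(tokens) and tokens[pos].startswith("@"):
            capture = tokens[pos][1:]
            pos += 1
        return Pattern(node_type, children, capture)

    return read_node()

def match(pattern: Pattern, node: Node) -> dict[str, Node] | None:
    """Return captured nodes if `node` matches `pattern`, otherwise None."""
    if node.type != pattern.type:
        return None
    captures: dict[str, Node] = {}
    remaining = iter(node.children)
    for child_pattern in pattern.children:
        # Greedy, in-order search through the children not yet consumed.
        for child in remaining:
            found = match(child_pattern, child)
            if found is not None:
                captures.update(found)
                break
        else:
            return None
    if pattern.capture:
        captures[pattern.capture] = node
    return captures

def walk(node: Node):
    """Yield every node in the tree in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)

def query(pattern_src: str, root: Node) -> list[dict[str, str]]:
    """Run a pattern against every node and return the captured text per match."""
    pattern = parse_pattern(pattern_src)
    results = []
    for node in walk(root):
        captures = match(pattern, node)
        if captures is not None:
            results.append({name: n.text for name, n in captures.items()})
    return results

# Hand-built tree for the expression: print(len(xs))
tree = Node("call", [
    Node("identifier", text="print"),
    Node("arguments", [
        Node("call", [
            Node("identifier", text="len"),
            Node("arguments", [Node("identifier", text="xs")]),
        ]),
    ]),
])

print(to_sexp(tree))
print(query("(call (identifier) @callee)", tree))  # [{'callee': 'print'}, {'callee': 'len'}]
```

The pattern `(call (identifier) @callee)` reads as "a `call` node containing an `identifier` child, captured under the name `callee`". Real Tree-sitter queries support more than this toy does, so the sketch should be read only as an illustration of matching tree shapes with S-expressions.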

Text editors which have official integrations with Tree-sitter include Atom, [5] GNU Emacs, [6] Neovim, [7] Lapce, [8] Zed, [9] and Helix. [10] Language bindings allow it to be used from programming languages including Go, Haskell, Java, JavaScript (with Node.js and WASM), Kotlin, Lua, OCaml, Perl, Python, Ruby, Rust, and Swift. Tree-sitter parsers have been written for these languages and many others. [11] GitHub uses Tree-sitter to support in-browser symbolic code navigation in Git repositories. [12]

Tree-sitter uses a GLR parser, a type of LR parser. [13] [14] [12]

Tree-sitter was originally developed by GitHub for use in the Atom text editor, where it was first released in 2018. [15] [5]

References

  1. "Introductory to Treesitter". Blog Teknologi Umum. Retrieved 2023-07-30.
  2. Petersen, Mickey. "How to Get Started with Tree-Sitter". Mastering Emacs. Retrieved 2023-10-30.
  3. "TreeSitter - the holy grail of parsing source code". symflower.com. Retrieved 2023-07-30.
  4. Petersen, Mickey. "Tree Sitter and the Complications of Parsing Languages". Mastering Emacs. Retrieved 2023-07-30.
  5. Brunsfeld, Max (2018-10-31). "Atom understands your code better than ever before". The GitHub Blog. Retrieved 2023-07-30.
  6. "GNU Emacs NEWS -- history of user-visible changes".
  7. "Treesitter - Neovim docs". neovim.io. Retrieved 2023-07-30.
  8. "lapce/CHANGELOG.md at f4747fbd306a4b8fda6927e37593bf23f4a1584b · lapce/lapce". GitHub. Retrieved 2023-07-30.
  9. "Zed - Code at the speed of thought". Zed. Retrieved 2023-07-30.
  10. "Helix". helix-editor.com. Retrieved 2023-07-30.
  11. "Tree-sitter|Introduction". tree-sitter.github.io. Retrieved 2023-07-30.
  12. Clem, Timothy; Thomson, Patrick (2021-08-31). "Static Analysis at GitHub: An experience report". Queue. 19 (4): 42–67. doi:10.1145/3487019.3487022. ISSN 1542-7730. S2CID 238412787.
  13. Nadeem, Ayman (2020-08-04). "CodeGen: Semantic's improved language support system". The GitHub Blog. Retrieved 2023-07-30.
  14. Brunsfeld, Max. "Tree-sitter - a new parsing system for programming tools". Retrieved 2023-07-30. See 22:30 for Wagner influence and 29:27 for GLR implementation.
  15. Krill, Paul (2018-03-16). "What's new in GitHub's Atom text editor". InfoWorld. Retrieved 2023-10-30.