Nokogiri (software)

Nokogiri, an XML and HTML Parser
Original author(s) Aaron Patterson, Mike Dalessio
Developer(s) Aaron Patterson, Mike Dalessio, Yoko Harada, Timothy Elliott, John Shahid, Akinori MUSHA
Initial release October 30, 2008 (2008-10-30)
Stable release 1.15.5 / November 17, 2023 (2023-11-17) [1]
Preview release 1.16.0.rc1 / December 13, 2023 (2023-12-13) [1]
Repository
Operating system Linux, FreeBSD, OpenBSD, Windows, macOS
Platform Cross-platform
Available in Ruby, Java
Type Parser
License MIT License [2]
Website www.nokogiri.org

Nokogiri is an open source software library to parse HTML and XML in Ruby. [3] [4] [5] [6] It depends on libxml2 and libxslt to provide its functionality. [7]

Overview

The project describes itself as providing a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is available for Ruby, and for Java through JRuby. It provides a fast, standards-compliant parser by relying on native parsers: libxml2 on CRuby and Xerces on JRuby.

It is one of the most downloaded Ruby gems: as of December 2023, it had been downloaded over 700 million times from the rubygems.org repository. [8]

Features

Enterprise support is available through Tidelift, [9] a paid subscription service that offers commercial support for open source software.

References

  1. "Releases - sparklemotion/nokogiri". GitHub. Retrieved 3 February 2022.
  2. "LICENSE". GitHub. Retrieved 5 September 2019.
  3. Peter Cooper (20 July 2009). Beginning Ruby: From Novice to Professional. Apress. pp. 528–529. ISBN 978-1-4302-2363-4. Retrieved 15 May 2011.
  4. Chad Pytel; Tammer Saleh (9 November 2010). Rails AntiPatterns: Best Practice Ruby on Rails Refactoring. Addison-Wesley. p. 199. ISBN 978-0-321-60481-1. Retrieved 15 May 2011.
  5. Mark Watson (2009). Scripting Intelligence: Web 3.0 Information Gathering and Processing. Springer. p. 22. ISBN 978-1-4302-2351-1. Retrieved 15 May 2011.
  6. Sparklemotion, Team. "Tutorials - Nokogiri 鋸". www.nokogiri.org. Retrieved 4 February 2016.
  7. "Nokogiri (README.md)". GitHub. Retrieved 22 November 2018.
  8. "nokogiri". RubyGems repository. Retrieved 26 December 2023.
  9. "nokogiri rubygems via the Tidelift Subscription". Tidelift. Retrieved 3 February 2022.