SiSU

Developer(s): Ralph Amissah
Initial release: January 5, 2005
Stable release: 7.1.11 / July 14, 2017
Operating system: Unix-like
Type: Text structuring, publishing, search
License: GPLv3
Website: sisudoc.org

SiSU (SiSU information structuring universe, or Structured information, serialized units) [1] is a Unix command-line-oriented framework for document structuring, publishing and search.

Usage

Using markup applied to a document, or a collection of documents, SiSU can produce plain text, HTML, XHTML, EPUB, XML, OpenDocument, LaTeX or PDF files, and populate an SQL database.

Document structuring

SiSU offers its user a way to structure plain text and to add graphics, hyperlinks, endnotes, footnotes etc. with simple text editing programs such as Notepad (Windows), TextEdit (Mac) or Gedit (Linux). The lightweight markup language is mnemonic and human readable.

To process marked-up documents with SiSU, the user issues a command at the command line of a terminal. Output can be generated in multiple formats (HTML, PDF, EPUB and others) with a single command.

Publishing and self-publishing

A document, or a collection of documents, which has been processed by SiSU is technically ready to be published on the web, or printed on paper. Canadian author Cory Doctorow, for instance, has used SiSU as a publishing tool and blogged about it. [2] In a newspaper article, Doctorow called SiSU an "automated ebook workflow tool". [3]

Earlier examples of web publishing with SiSU are Projet de traité instituant l'Union Européenne / Draft Treaty Establishing the European Union [4] and the novel Tainaron by Finnish author Leena Krohn. [5]

SiSU can populate an SQL database with objects (equating generally to paragraph-sized chunks) so searches may be performed and matches returned with that degree of granularity (e.g. your search criteria are met by these documents and at these locations within each document). Document output formats share a common object numbering system for locating content. This is particularly suitable for "published" works (finalized texts as opposed to works that are frequently changed or updated) for which it provides a fixed means of reference of content.
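
The idea of object-level search can be illustrated with a minimal sketch. It is not SiSU's actual schema or code: the table layout, the column name obj_num, the function names and the sample texts are all assumptions made for illustration, and splitting on blank lines is only a rough stand-in for SiSU's own notion of an object. Each paragraph gets a number within its document, the numbered objects are stored in an SQLite table, and a search returns the document together with the object number where the match occurs.

```python
import sqlite3


def split_into_objects(text):
    """Split plain text into paragraph-sized objects, numbered from 1."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return list(enumerate(paragraphs, start=1))


def build_index(documents):
    """Store every object of every document in an in-memory SQLite table.

    The schema here is hypothetical, chosen only to illustrate the idea.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects ("
        "doc TEXT, obj_num INTEGER, body TEXT, "
        "PRIMARY KEY (doc, obj_num))"
    )
    for name, text in documents.items():
        conn.executemany(
            "INSERT INTO objects (doc, obj_num, body) VALUES (?, ?, ?)",
            [(name, num, body) for num, body in split_into_objects(text)],
        )
    return conn


def search(conn, term):
    """Return (document, object number) pairs whose text contains term."""
    rows = conn.execute(
        "SELECT doc, obj_num FROM objects "
        "WHERE body LIKE ? ORDER BY doc, obj_num",
        (f"%{term}%",),
    )
    return rows.fetchall()


# Made-up sample documents, two and three paragraphs long.
documents = {
    "treaty": "Article one concerns trade.\n\nArticle two concerns arbitration.",
    "novel": "A letter arrives.\n\nThe city trades in letters.\n\nNothing else happens.",
}

conn = build_index(documents)
hits = search(conn, "trade")
print(hits)
```

Because the object numbers are assigned to the text itself and not to any one rendering, the same (document, object number) pair could serve as a reference in every output format, which is the property the paragraph above describes.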

History

SiSU has been under development since 1997 and has been written in Ruby since 2000. It was released under the GPL in January 2005. SiSU grew out of a project begun in 1993 on documents related to (primarily private) international commercial law and international trade law, hosted on a site then known as Ananse and more recently as LexMercatoria.

SiSU's first open-source release was on January 5, 2005, [6] and its first release to Debian followed in July 2005. SiSU version 1 was released in December 2009 and version 2 in March 2010. Version 2 features a new processing engine. Markup remains substantially identical between versions, apart from changes to the markup for document headers (which contain document metadata and processing instructions). Both the version 1 and version 2 text processing engines are available in the version 2 tarball. Development takes place on the version 2 branch. Version 1 remains available to guarantee compatibility with older prepared texts (those written before the document headers were updated) and as an earlier reference implementation.

Notes and references

  1. Also chosen for the meaning of the Finnish term sisu.
  2. "Doctorow: Browse all versions". With a Little Help. 2010-10-03. Retrieved 2011-08-11.
  3. Doctorow, Cory (2010-12-17). "The Internet Problem: when an abundance of choice becomes an issue". The Guardian. London. Retrieved 2011-08-11.
  4. "Spinelli's Footsteps". 2005-11-28. Retrieved 2011-08-11.
  5. http://www.kaapeli.fi/krohn/tainaron/english/3/leena_krohn/tainaron.leena_krohn.1998/ This example was created with SiSU in February 1999. Accessed 2011-08-11.
  6. "Announce SiSU - publishing for e-documents, books, libraries, relational databases". Ruby Maillist. 2005-01-05. Retrieved 2015-05-05.
