Tagsistant

Developer(s) Tx0 <tx0@strumentiresistenti.org>
Stable release 0.6
Written in C
Operating system Linux kernel
Available in English
Type Semantic file system
License GNU GPL
Website http://www.tagsistant.net/

Tagsistant is a semantic file system for the Linux kernel, written in C and based on FUSE. Unlike traditional file systems that use hierarchies of directories to locate objects, Tagsistant introduces the concept of tags.

Design and differences with hierarchical file systems

In computing, a file system is a type of data store that can be used to store, retrieve and update files. Each file can be uniquely located by its path. The user must know the path in advance to access a file, and the path does not necessarily include any information about the content of the file.

Tagsistant uses a complementary approach based on tags. The user can create a set of tags and apply those tags to files, directories and other objects (devices, pipes, ...). The user can then search for all the objects that match a subset of tags, called a query. This approach is well suited to managing user content such as pictures, audio recordings, movies and text documents, but it is incompatible with system files (such as libraries, commands and configurations), where the uniqueness of the path is a security requirement that prevents access to the wrong content.

The tags/ directory

A Tagsistant file system features four main directories:

archive/
relations/
stats/
tags/

Tags are created as subdirectories of the tags/ directory and can be used in queries complying with this syntax:

tags/subquery/[+/subquery/[+/subquery/]]/@/ [1]

where a subquery is an unlimited list of tags, concatenated as directories:

tag1/tag2/tag3/.../tagN/

The portion of a path delimited by tags/ and @/ is the actual query. The +/ operator joins the results of different subqueries into a single list. The @/ operator ends the query.

To be returned as a result of the following query:

tags/t1/t2/+/t1/t4/@/

an object must be tagged as both t1/ and t2/ or as both t1/ and t4/. Any object tagged as t2/ or t4/, but not as t1/, will not be retrieved.
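The matching rule described above can be modelled as a union of intersections. The following Python sketch is illustrative only: it is not Tagsistant's implementation, and the function names parse_query and matches, as well as the sample objects, are assumptions made for this example.

```python
def parse_query(path):
    """Split a query path such as 'tags/t1/t2/+/t1/t4/@/' into subqueries.

    Returns a list of sets, one set of tag names per subquery.
    """
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "tags" or "@" not in parts:
        raise ValueError("a query starts with tags/ and ends with @/")
    query = parts[1:parts.index("@")]  # tokens between tags/ and @/
    subqueries, current = [], set()
    for token in query:
        if token == "+":
            subqueries.append(current)  # "+" closes the current subquery
            current = set()
        else:
            current.add(token)
    subqueries.append(current)
    return [s for s in subqueries if s]  # drop empty subqueries


def matches(object_tags, path):
    """True if the object carries every tag of at least one subquery."""
    return any(sub <= set(object_tags) for sub in parse_query(path))


# Hypothetical objects and their tags.
objects = {
    "a.txt": {"t1", "t2"},
    "b.txt": {"t1", "t4"},
    "c.txt": {"t2", "t4"},
    "d.txt": {"t1"},
}
query = "tags/t1/t2/+/t1/t4/@/"
result = sorted(name for name, tags in objects.items() if matches(tags, query))
print(result)  # ['a.txt', 'b.txt']
```

In this model c.txt is excluded because it lacks t1, and d.txt is excluded because t1 alone satisfies neither subquery.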

The query syntax deliberately violates POSIX file system semantics by allowing a path token to be a descendant of itself, as in tags/t1/t2/+/t1/t4/@/, where t1/ appears twice. As a consequence, a recursive scan of a Tagsistant file system, such as one performed by the UNIX find command, will either exit with an error or loop endlessly:

~/tagsistant_mountpoint$ find tags/
tags/
tags/document
tags/document/+
tags/document/+/document
tags/document/+/document/+
tags/document/+/document/+/document
tags/document/+/document/+/document/+
[...]

This drawback is balanced by the ability to list the tags inside a query in any order. The query tags/t1/t2/@/ is completely equivalent to tags/t2/t1/@/, and tags/t1/+/t2/t3/@/ is equivalent to tags/t2/t3/+/t1/@/.
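One way to see why the order does not matter is to reduce each query to a canonical form: a set of subqueries, each of which is itself a set of tags. The sketch below is a model written for this article, not code taken from Tagsistant, and the function name canonical is an assumption.

```python
def canonical(path):
    """Reduce a query path to an order-independent set of tag sets."""
    parts = [p for p in path.split("/") if p]
    query = parts[1:parts.index("@")]  # tokens between tags/ and @/
    subqueries, current = set(), set()
    for token in query + ["+"]:  # the trailing "+" flushes the last subquery
        if token == "+":
            if current:
                subqueries.add(frozenset(current))
            current = set()
        else:
            current.add(token)
    return frozenset(subqueries)


print(canonical("tags/t1/t2/@/") == canonical("tags/t2/t1/@/"))            # True
print(canonical("tags/t1/+/t2/t3/@/") == canonical("tags/t2/t3/+/t1/@/"))  # True
print(canonical("tags/t1/t2/@/") == canonical("tags/t1/+/t2/@/"))          # False
```

The last comparison is False because a query requiring both t1 and t2 is not the same as one accepting either of them.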

The @/ element has the precise purpose of restoring POSIX semantics: the path tags/t1/@/directory/ refers to a traditional directory, and a recursive scan of this path will work properly.

The reasoner and the relations/ directory

Tagsistant features a simple reasoner which expands the results of a query by including objects tagged with related tags. A relation between two tags can be established inside the relations/ directory following a three-level pattern:

relations/tag1/rel/tag2/

The rel element can be includes or is_equivalent. To include the rock tag in the music tag, the UNIX command mkdir can be used:

mkdir -p relations/music/includes/rock

The reasoner can recursively resolve relations, allowing the creation of complex structures:

mkdir -p relations/music/includes/rock
mkdir -p relations/rock/includes/hard_rock
mkdir -p relations/rock/includes/grunge
mkdir -p relations/rock/includes/heavy_metal
mkdir -p relations/heavy_metal/includes/speed_metal

The web of relations created inside the relations/ directory constitutes a basic form of ontology.
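The recursive resolution of includes relations can be pictured as a graph traversal. The following Python sketch mirrors the mkdir commands listed in this section as a dictionary and computes every tag reachable from a starting tag. It is a simplified model, not the reasoner's actual code: the names includes and expand are assumptions, and the is_equivalent relation is not modelled.

```python
# Model of the relations created with mkdir: tag -> tags it directly includes.
includes = {
    "music": ["rock"],
    "rock": ["hard_rock", "grunge", "heavy_metal"],
    "heavy_metal": ["speed_metal"],
}


def expand(tag, relations):
    """Return the tag plus every tag reachable through 'includes'."""
    seen, stack = set(), [tag]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cyclic relations
        seen.add(current)
        stack.extend(relations.get(current, []))
    return seen


print(sorted(expand("music", includes)))
# ['grunge', 'hard_rock', 'heavy_metal', 'music', 'rock', 'speed_metal']
print(sorted(expand("heavy_metal", includes)))
# ['heavy_metal', 'speed_metal']
```

Under this model, a query on music would also consider objects tagged only as speed_metal, because the relation chain music, rock, heavy_metal, speed_metal is followed transitively.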

Autotagging plugins

Tagsistant features an autotagging plugin stack which is called when a file or a symlink is written. [2] Each plugin is called if its declared MIME type matches that of the object being written.

The list of working plugins released with Tagsistant 0.6 is limited to:

The repository

Each Tagsistant file system has a corresponding repository containing an archive/ directory where the objects are actually saved and a tags.sql file holding tagging information as an SQLite database. If the MySQL database engine was specified with the --db argument, the tags.sql file will be empty. Another file named repository.ini is a GLib ini store with the repository configuration. [3]

Tagsistant 0.6 is compatible with the MySQL and SQLite dialects of SQL for tag reasoning and tagging resolution. While porting its logic to other SQL dialects is possible, differences in basic constructs (especially the INTERSECT SQL keyword) must be considered.
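To illustrate why INTERSECT matters for portability, the Python sketch below resolves a two-tag subquery against an in-memory SQLite database, first with INTERSECT and then with a GROUP BY ... HAVING formulation that avoids the keyword (older MySQL releases, for example, did not support INTERSECT). The tagging table and its columns are a hypothetical schema invented for this example; they are not Tagsistant's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (object, tag) pair.
conn.execute("CREATE TABLE tagging (object TEXT, tag TEXT)")
conn.executemany(
    "INSERT INTO tagging VALUES (?, ?)",
    [("a.txt", "t1"), ("a.txt", "t2"), ("b.txt", "t1"), ("c.txt", "t2")],
)

# Objects tagged as both t1 and t2, using INTERSECT.
with_intersect = conn.execute(
    """
    SELECT object FROM tagging WHERE tag = 't1'
    INTERSECT
    SELECT object FROM tagging WHERE tag = 't2'
    """
).fetchall()

# The same result without INTERSECT: count the distinct matching tags.
portable = conn.execute(
    """
    SELECT object FROM tagging
    WHERE tag IN ('t1', 't2')
    GROUP BY object
    HAVING COUNT(DISTINCT tag) = 2
    """
).fetchall()

print(with_intersect)  # [('a.txt',)]
print(portable)        # [('a.txt',)]
conn.close()
```

Both queries return only a.txt, the single object carrying both tags.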

The archive/ and stats/ directories

The archive/ directory was introduced to provide a quick way to access objects without using tags. Each object is listed with its inode number as a prefix. [4]

The stats/ directory features some read-only files containing usage statistics. A file named configuration holds both compile-time information and the current repository configuration.

Main criticisms

It has been highlighted that relying on an external database to store tags and tagging information could cause the complete loss of metadata if the database gets corrupted. [5]

It has been highlighted that using a flat namespace tends to overcrowd the tags/ directory. [6] This could be mitigated by introducing triple tags.


References

  1. "tags/ and relations/ directories".
  2. "How to write a plugin for Tagsistant?".
  3. "Key-value file parser".
  4. "Tagsistant 0.6 howto - Inodes".
  5. "Extended attributes and tag file systems".
  6. "The major problem with this approach is scalability". news.ycombinator.com.