Semantic file system

Semantic file systems are file systems used for information persistence that structure data according to its semantics and intent, rather than its location as in current file systems. They allow data to be addressed by content (associative access). Traditional hierarchical file systems tend to impose a burden, for example when a subdirectory layout contradicts a user's perception of where files should be stored. A tag-based interface alleviates this hierarchy problem and enables users to query for data in an intuitive fashion.

Semantic file systems raise technical design challenges: indexes of words, tags, or other elementary signs have to be created, kept continuously up to date, and cached for performance in order to offer the desired random, multivariate access to files, on top of the underlying, mostly traditional block-based file system.
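The index maintenance problem can be illustrated with a minimal sketch of an in-memory inverted index. The class name `TagIndex` and its methods are hypothetical and not taken from any implementation discussed in this article; a real system would also have to persist the index and detect file changes.

```python
from collections import defaultdict


class TagIndex:
    """In-memory inverted index mapping each tag to the set of paths carrying it."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)  # tag -> paths
        self._tags_by_file = {}                # path -> tags (reverse map)

    def update(self, path, tags):
        """Record the current tags of `path`, replacing any earlier entry."""
        new_tags = set(tags)
        old_tags = self._tags_by_file.get(path, set())
        for tag in old_tags - new_tags:        # tags that disappeared
            self._files_by_tag[tag].discard(path)
            if not self._files_by_tag[tag]:
                del self._files_by_tag[tag]
        for tag in new_tags - old_tags:        # tags that appeared
            self._files_by_tag[tag].add(path)
        if new_tags:
            self._tags_by_file[path] = new_tags
        else:
            self._tags_by_file.pop(path, None)

    def remove(self, path):
        """Drop a deleted file from the index."""
        self.update(path, ())

    def query(self, *tags):
        """Return the paths that carry all of the given tags."""
        if not tags:
            return set(self._tags_by_file)
        sets = [self._files_by_tag.get(tag, set()) for tag in tags]
        return set.intersection(*sets)
```

The reverse map from path to tags is what keeps an update cheap: only the tags that actually changed are touched, rather than the whole index being rebuilt.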

A semantic file system can be envisioned as a part of a semantic desktop.

History

The notion of a semantic file system was proposed in 1991 by researchers at MIT and the École des Mines de Paris. [1] They proposed an integrated system whose main query interface looked like a traditional file system interface, using a virtual directory system that interpreted a path as a conjunctive query. Their implementation extracted the relevant metadata automatically through what they called file type-specific transducers.
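The idea of reading a path as a conjunctive query can be sketched as follows. The `attribute:value` component syntax, the function names, and the in-memory `catalog` are simplifying assumptions made for illustration; they do not reproduce the exact path syntax of the 1991 system.

```python
def parse_virtual_path(path):
    """Turn '/author:smith/type:email' into [('author', 'smith'), ('type', 'email')]."""
    terms = []
    for component in path.strip("/").split("/"):
        if not component:
            continue
        attribute, separator, value = component.partition(":")
        if not separator or not attribute or not value:
            raise ValueError(f"not an attribute:value component: {component!r}")
        terms.append((attribute, value))
    return terms


def list_virtual_directory(path, catalog):
    """Return the names of files whose metadata satisfies every term of the path.

    `catalog` maps a file name to a dict of attribute -> set of values.
    """
    terms = parse_virtual_path(path)
    return sorted(
        name
        for name, metadata in catalog.items()
        if all(value in metadata.get(attribute, ()) for attribute, value in terms)
    )
```

Each additional path component narrows the result, so descending into a deeper virtual directory corresponds to adding one more term to the conjunction.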

Starting around 2004, a new wave of implementations centered on manual tagging of files and folders.

In 2008, researchers proposed to integrate semantic file systems with Semantic Web technologies. [2]

Types of metadata

Tags

Tags can be used instead of folders to circumvent the limits of a hierarchical model.

File type-specific

Gifford et al. [1] proposed file type-specific metadata, extracted automatically by a transducer written for each file type.

For instance, for a source code text file, metadata could include the names of the procedures that the program exports or imports, procedure types, and the files included by the program. For a document, it could include the date, author, title and structure (sections and subsections). For an e-mail, it could include the sender, recipient and subject.
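A transducer can be thought of as a function from file contents to attribute-value pairs, selected by file type. The following is a minimal sketch using only the Python standard library; the function names, the `TRANSDUCERS` table, and the choice to dispatch on file name suffix are assumptions made for illustration, not the design of the original system.

```python
import ast
from email import message_from_string


def python_source_transducer(text):
    """Extract defined function names and imported modules from Python source."""
    metadata = {"defines": set(), "imports": set()}
    for node in ast.walk(ast.parse(text)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            metadata["defines"].add(node.name)
        elif isinstance(node, ast.Import):
            metadata["imports"].update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            metadata["imports"].add(node.module)
    return metadata


def email_transducer(text):
    """Extract sender, recipient and subject from an RFC 822 style message."""
    message = message_from_string(text)
    fields = (("sender", "From"), ("recipient", "To"), ("subject", "Subject"))
    return {
        field: {message[header]}
        for field, header in fields
        if message[header] is not None
    }


# Hypothetical dispatch table: file name suffix -> transducer.
TRANSDUCERS = {".py": python_source_transducer, ".eml": email_transducer}


def extract_metadata(filename, text):
    """Run the transducer registered for the file type, if there is one."""
    for suffix, transducer in TRANSDUCERS.items():
        if filename.endswith(suffix):
            return transducer(text)
    return {}
```

Files of an unknown type simply yield no metadata, so new file types can be supported by registering another function without changing the rest of the system.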

Lineage

In scientific workflows, provenance of a data file is important. A scientist might want to select a results file by filtering by the input dataset.
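Selecting results by input dataset amounts to a reachability query over recorded provenance. The sketch below assumes a hypothetical mapping `inputs_of` from each file to the files it was directly produced from; how that mapping is captured (for example by logging system calls) is outside its scope.

```python
def derived_from(path, dataset, inputs_of):
    """Return True if `dataset` is a direct or indirect input of `path`."""
    seen = set()
    stack = [path]
    while stack:
        current = stack.pop()
        for parent in inputs_of.get(current, ()):
            if parent == dataset:
                return True
            if parent not in seen:  # guard against revisiting shared ancestors
                seen.add(parent)
                stack.append(parent)
    return False


def results_from(dataset, candidates, inputs_of):
    """Keep only the candidate files whose lineage includes `dataset`."""
    return [path for path in candidates if derived_from(path, dataset, inputs_of)]
```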

Architecture

Vasudevan and Pazandak [3] distinguish between integrated and augmented approaches.

They suggest that an open systems architecture is well suited to semantic file system implementations.

Compatibility with hierarchical file systems

Even integrated semantic file systems may choose to expose an interface for compatibility with existing local or distributed file system protocols. For instance, Gifford et al.’s 1991 implementation was fully compatible with NFS. [1]

Metadata storage

Extended file attributes provided by the file system can be a way to store the metadata.
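As a rough sketch of this approach, tags can be serialized into a single extended attribute. The attribute name `user.tags` and the comma-separated encoding are assumptions made for illustration. `os.setxattr` and `os.getxattr` are available in Python only on Linux, and the underlying file system must support user extended attributes.

```python
import os

TAG_ATTRIBUTE = "user.tags"  # hypothetical attribute name in the "user" namespace


def encode_tags(tags):
    """Serialize tags as sorted, comma-separated UTF-8 bytes."""
    return ",".join(sorted(set(tags))).encode("utf-8")


def decode_tags(raw):
    """Inverse of encode_tags; an empty value yields an empty set."""
    return {tag for tag in raw.decode("utf-8").split(",") if tag}


def write_tags(path, tags):
    """Store the tags on the file itself (Linux only)."""
    os.setxattr(path, TAG_ATTRIBUTE, encode_tags(tags))


def read_tags(path):
    """Read the tags back; a missing attribute is treated as 'no tags'."""
    try:
        return decode_tags(os.getxattr(path, TAG_ATTRIBUTE))
    except OSError:
        return set()  # attribute absent, or extended attributes unsupported
```

Storing metadata this way keeps it attached to the file when the file is renamed or moved within the same file system, but answering a query still requires a separate index or a scan of all files.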

A relational database is another common way to store the metadata.
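A minimal sketch of the relational approach, using Python's built-in `sqlite3` module, is shown below. The three-table schema and the function names are assumptions made for illustration and do not reproduce the schema of any implementation listed in this article.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS file (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS tag  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS file_tag (
    file_id INTEGER NOT NULL REFERENCES file(id),
    tag_id  INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (file_id, tag_id)
);
"""


def open_store(location=":memory:"):
    """Open (or create) the metadata database."""
    connection = sqlite3.connect(location)
    connection.executescript(SCHEMA)
    return connection


def add_tag(connection, path, tag_name):
    """Attach a tag to a file, creating either row if it does not exist yet."""
    connection.execute("INSERT OR IGNORE INTO file (path) VALUES (?)", (path,))
    connection.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (tag_name,))
    connection.execute(
        "INSERT OR IGNORE INTO file_tag (file_id, tag_id) "
        "SELECT file.id, tag.id FROM file, tag "
        "WHERE file.path = ? AND tag.name = ?",
        (path, tag_name),
    )


def files_with_all(connection, tag_names):
    """Return the paths of files carrying every one of the given tags."""
    names = sorted(set(tag_names))
    if not names:
        return []
    placeholders = ",".join("?" * len(names))
    rows = connection.execute(
        f"SELECT file.path FROM file "
        f"JOIN file_tag ON file_tag.file_id = file.id "
        f"JOIN tag ON tag.id = file_tag.tag_id "
        f"WHERE tag.name IN ({placeholders}) "
        f"GROUP BY file.id HAVING COUNT(DISTINCT tag.id) = ? "
        f"ORDER BY file.path",
        (*names, len(names)),
    )
    return [path for (path,) in rows]
```

The `GROUP BY ... HAVING COUNT` clause expresses the conjunction: a file qualifies only if it matches as many distinct tags as were requested.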

Research implementations

| Name | Type | Metadata | OS | Date | Comment |
|---|---|---|---|---|---|
| Lineage File System [4] | File system extension | Lineage | Linux | 2005 | Modifies the Linux kernel to log all process creation and file-related system calls. Uses a MySQL database. |
| SemFS (formerly TagFS) [5] | File system | Tags | Linux, Windows | 2006 | On Windows, can be mounted as a WebDAV drive. On Linux, based on FUSE. Tags are stored as RDF. Uses an internal file system, not exposed. |
| SFS [1] | File system extension | File type-specific | Linux | 1991 | |

Implementations

| Name | Type | Metadata | OS | License | Programming language(s) | Last update | Comment |
|---|---|---|---|---|---|---|---|
| Be File System (BFS) | File system | | BeOS | Proprietary; last version is freeware | | | Metadata is stored in extended file attributes. Works with file manager Tracker |
| dantalian | File system extension | Tags | Linux and contiguous POSIX-compatible file systems | Apache 2 | Python | 2016 | Uses symlinks |
| dhtfs | User-level file system extension | Tags | Linux | BSD 3-clause | Python | 2009 | Based on FUSE |
| Elyse | Graphical file manager | Tags | Windows and macOS | Proprietary, no cost | | 2021 | |
| Fuse::TagLayer | File system extension | Tags | Linux | GPL v3 / AL v2 | Perl | 2013 | Based on FUSE |
| Tabbles | Graphical file manager | Tags | Windows Vista to 11 | Proprietary, freemium | .NET Framework | | Uses a SQL Server relational database. |
| Tag2Find | | Tags | Windows XP and Vista 32-bit | | | 2007 | |
| TagsForAll | Graphical file manager | Tags | Windows x64 | Freemium | | 2014 | 70 tag limit in free version. Metadata is stored in two places: in files as ADS (Alternate Data Stream for NTFS), and in local database. |
| Tagsistant | File system | Tags | Linux | GPL | C | 2017 | Tag-based, based on FUSE |
| TagSpaces | Graphical file manager, web or desktop (uses Electron) | Tags | Windows, macOS, Linux, and Android | AGPL (Freemium) | TypeScript, JavaScript, Java, Objective-C | Continues | |
| tagxfs | File system extension | Tags | Linux | Boost Software License 1.0 | C++ | 2013 | Extends the user space file system to a tag based hierarchy. |
| TMSU | Virtual file system | Tags | | | | 2022 | Uses a SQLite relational database. |
| TransparenTag | File system | Tags | Linux, BSD | GPL v2 | OCaml | 2013 | Data and tags are stored as regular files |
| WinFS | File system and manager | Any type | Windows XP | Proprietary | .NET Framework | 2006 | Uses a relational database |
| xtagfs | File system extension | Tags | Mac OS X | GPL v2 | Python | 2009 | Based on FUSE |

References

  1. Gifford, David; Jouvelot, Pierre; Sheldon, Mark A.; O’Toole, James W. Jr. (1991). "Semantic file systems" (PDF). ACM Operating Systems Review. 25 (5): 16–25. doi:10.1145/121133.121138.
  2. Faubel, Sebastian; Kuschel, Christian (2008). "Towards Semantic File System Interfaces" (PDF). ISWC (Posters & Demos).
  3. Vasudevan, Venu; Pazandak, Paul (1997). "Semantic File Systems". Object Services and Consulting, Inc. Retrieved 2024-03-05.
  4. Sar, Can; Cao, Pei (2005). "Lineage File System". Stanford University. Retrieved 2024-03-14.
  5. Bloehdorn, Stephan; Völkel, Max (2006). "TagFS — Tag Semantics for Hierarchical File Systems". WWW Conference Proceedings via CiteSeerX.