Semantic file system

A semantic file system is a file system used for information persistence that structures data according to its semantics and intent, rather than its location as in conventional file systems. This allows data to be addressed by content (associative access). Traditional hierarchical file systems tend to impose a burden, for example when a subdirectory layout contradicts a user's perception of where files should be stored. A tag-based interface alleviates this hierarchy problem and lets users query for data in an intuitive fashion.

Semantic file systems raise technical design challenges: indexes of words, tags, or other elementary signs have to be created, kept up to date, and cached for performance in order to offer the desired random, multivariate access to files. These indexes exist in addition to the underlying, mostly traditional, block-based file system.
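The indexing requirement described above can be illustrated with a small in-memory inverted index. This is only a sketch under stated assumptions: the `TagIndex` class and its method names are hypothetical, and a real implementation would also need persistence, caching and concurrency control.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory inverted index: each tag maps to the files carrying it."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)
        self._tags_by_file = defaultdict(set)

    def tag(self, path, *tags):
        """Attach one or more tags to a file, updating both directions of the index."""
        for t in tags:
            self._files_by_tag[t].add(path)
            self._tags_by_file[path].add(t)

    def remove_file(self, path):
        """Keep the index consistent when a file is deleted."""
        for t in self._tags_by_file.pop(path, set()):
            self._files_by_tag[t].discard(path)
            if not self._files_by_tag[t]:
                del self._files_by_tag[t]

    def query(self, *tags):
        """Return the files that carry every one of the given tags."""
        if not tags:
            return set(self._tags_by_file)
        sets = [self._files_by_tag.get(t, set()) for t in tags]
        return set.intersection(*sets)


index = TagIndex()
index.tag("report.pdf", "work", "2024")
index.tag("budget.xlsx", "work")
print(index.query("work", "2024"))
```

Keeping a reverse map from file to tags makes deletion cheap, at the cost of storing each association twice.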

A semantic file system can be envisioned as a part of a semantic desktop.

History

The notion of a semantic file system was proposed in 1991 by researchers at MIT and École des Mines de Paris. [1] They proposed an integrated system whose main query interface looked like a traditional file system interface: a virtual directory system interpreted each path as a conjunctive query. Their implementation extracted the relevant metadata automatically via what they called file type-specific transducers.
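The idea of reading a path as a conjunctive query can be sketched as follows. The `attribute:value` component syntax, the `FILES` metadata and both function names are assumptions made for illustration; they are not the syntax or API of the 1991 implementation.

```python
from pathlib import PurePosixPath

# Hypothetical metadata, as a transducer might have extracted it.
FILES = {
    "main.c": {"owner": "smith", "ext": "c"},
    "notes.txt": {"owner": "smith", "ext": "txt"},
    "util.c": {"owner": "jones", "ext": "c"},
}


def parse_query(path):
    """Turn '/owner:smith/ext:c' into [('owner', 'smith'), ('ext', 'c')]."""
    terms = []
    for part in PurePosixPath(path).parts:
        if part == "/":
            continue
        attribute, sep, value = part.partition(":")
        if not (sep and attribute and value):
            raise ValueError(f"not an attribute:value component: {part!r}")
        terms.append((attribute, value))
    return terms


def list_virtual_directory(path, files):
    """List the files whose metadata satisfies every term in the path (logical AND)."""
    terms = parse_query(path)
    return sorted(
        name
        for name, meta in files.items()
        if all(meta.get(attribute) == value for attribute, value in terms)
    )


print(list_virtual_directory("/owner:smith/ext:c", FILES))
```

Each additional path component narrows the result set, which is why descending into a virtual directory behaves like adding a term to the query.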

Starting around 2004, a new wave of implementations centered on manual tagging of files and folders.

In 2008, researchers proposed to integrate semantic file systems with Semantic Web technologies. [2]

Types of metadata

Tags

Tags can be used instead of folders to circumvent the limits of a hierarchical model.

File type-specific

Gifford et al. [1] suggested the idea of file type-specific metadata automatically extracted by a file type-specific transducer.

For instance, for a source code text file, metadata could include the names of the procedures that the program exports or imports, procedure types, and the files included by the program. For a document, it could include the date, author, title and structure (sections and subsections). For an e-mail, it could include the sender, recipient and subject.
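A transducer of this kind might look like the following sketch, which uses only the Python standard library. The function names, the `TRANSDUCERS` registry and the choice of file suffixes are hypothetical; Python source stands in for the generic source code case.

```python
import ast
from email import message_from_string


def email_transducer(text):
    """Extract sender, recipient and subject from an RFC 822 style message."""
    msg = message_from_string(text)
    return {"sender": msg["From"], "recipient": msg["To"], "subject": msg["Subject"]}


def python_source_transducer(text):
    """Extract imported modules and defined procedure names from Python source."""
    imports, procedures = set(), []
    for node in ast.walk(ast.parse(text)):
        if isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            procedures.append(node.name)
    return {"imports": sorted(imports), "procedures": sorted(procedures)}


# Hypothetical registry: the file suffix selects the transducer.
TRANSDUCERS = {".eml": email_transducer, ".py": python_source_transducer}


def extract_metadata(filename, text):
    """Dispatch to the transducer for this file type; unknown types yield no metadata."""
    for suffix, transducer in TRANSDUCERS.items():
        if filename.endswith(suffix):
            return transducer(text)
    return {}


print(extract_metadata("hello.py", "import os\n\ndef main():\n    pass\n"))
```

Dispatching on the file suffix is the simplest possible selection rule; a real system could equally inspect file contents to decide which transducer applies.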

Lineage

In scientific workflows, provenance of a data file is important. A scientist might want to select a results file by filtering by the input dataset.
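Selecting results by their input dataset amounts to a reachability query over recorded derivations. The sketch below assumes a hypothetical `DERIVED_FROM` mapping from each output file to its direct inputs; how such records are captured is outside its scope.

```python
# Hypothetical provenance records: output file -> set of direct input files.
DERIVED_FROM = {
    "cleaned.csv": {"raw_2023.csv"},
    "model.pkl": {"cleaned.csv"},
    "results_a.csv": {"model.pkl"},
    "results_b.csv": {"raw_2022.csv"},
}


def ancestors(path, derived_from):
    """Return every file that the given file was directly or indirectly derived from."""
    seen = set()
    stack = [path]
    while stack:
        for parent in derived_from.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def results_from(dataset, candidates, derived_from):
    """Filter candidate result files down to those whose lineage includes the dataset."""
    return sorted(c for c in candidates if dataset in ancestors(c, derived_from))


print(results_from("raw_2023.csv", ["results_a.csv", "results_b.csv"], DERIVED_FROM))
```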

Architecture

Vasudevan and Pazandak [3] distinguish between integrated and augmented approaches.

They suggest that an open systems architecture is well adapted to semantic file system implementations.

Compatibility with hierarchical file systems

Even integrated semantic file systems may choose to expose an interface for compatibility with existing local or distributed file system protocols. For instance, Gifford et al.’s 1991 implementation was fully compatible with NFS. [1]

Metadata storage

Extended file attributes provided by the file system can be a way to store the metadata.

A relational database is another common way to store the metadata.
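As a minimal sketch of the relational approach, the following uses Python's built-in `sqlite3` module. The two-table schema and the function names are assumptions chosen for illustration and do not reflect the schema of any implementation listed in this article.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a hypothetical files/tags schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
        CREATE TABLE tags (
            file_id INTEGER NOT NULL REFERENCES files(id),
            tag TEXT NOT NULL,
            PRIMARY KEY (file_id, tag)
        );
        CREATE INDEX tags_by_tag ON tags(tag);
        """
    )
    return db


def add_file(db, path, tags):
    """Register a file (if new) and attach the given tags to it."""
    db.execute("INSERT OR IGNORE INTO files(path) VALUES (?)", (path,))
    (file_id,) = db.execute("SELECT id FROM files WHERE path = ?", (path,)).fetchone()
    db.executemany(
        "INSERT OR IGNORE INTO tags(file_id, tag) VALUES (?, ?)",
        [(file_id, t) for t in tags],
    )


def files_with_all(db, tags):
    """Return paths of files carrying every requested tag (conjunctive query)."""
    tags = sorted(set(tags))
    if not tags:
        return []
    placeholders = ", ".join("?" for _ in tags)
    rows = db.execute(
        f"SELECT f.path FROM files f JOIN tags t ON t.file_id = f.id "
        f"WHERE t.tag IN ({placeholders}) "
        f"GROUP BY f.id HAVING COUNT(DISTINCT t.tag) = ? ORDER BY f.path",
        (*tags, len(tags)),
    )
    return [path for (path,) in rows]


db = open_store()
add_file(db, "report.pdf", ["work", "2024"])
add_file(db, "budget.xlsx", ["work"])
print(files_with_all(db, ["work", "2024"]))
```

The `GROUP BY ... HAVING COUNT` form keeps the query to a single pass regardless of how many tags are requested; one self-join per tag would be an alternative.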

Research implementations

| Name | Type | Metadata | OS | Date | Comment |
|---|---|---|---|---|---|
| Lineage File System [4] | File system extension | Lineage | Linux | 2005 | Modifies the Linux kernel to log all process creation and file-related system calls. Uses a MySQL database. |
| SemFS (formerly TagFS) [5] | File system | Tags | Linux, Windows | 2006 | On Windows, can be mounted as a WebDAV drive. On Linux, based on FUSE. Tags are stored as RDF. Uses an internal file system, not exposed. |
| SFS [1] | File system extension | File type-specific | Linux | 1991 | |

Implementations

| Name | Type | Metadata | OS | License | Programming language(s) | Last update | Comment |
|---|---|---|---|---|---|---|---|
| Be File System (BFS) | File system | | BeOS | Proprietary; last version is freeware | | | Metadata is stored in extended file attributes. Works with file manager Tracker. |
| dantalian | File system extension | Tags | Linux and contiguous POSIX-compatible file systems | Apache 2 | Python | 2016 | Uses symlinks |
| dhtfs | User-level file system extension | Tags | Linux | BSD 3-clause | Python | 2009 | Based on FUSE |
| Elyse | Graphical file manager | Tags | Windows and MacOS | Proprietary, no cost | | 2021 | |
| Fuse::TagLayer | File system extension | Tags | Linux | GPL v3 / AL v2 | Perl | 2013 | Based on FUSE |
| Tabbles | Graphical file manager | Tags | Windows Vista to 11 | Proprietary, freemium | .NET Framework | | Uses a SQL Server relational database. |
| Tag2Find | | Tags | Windows XP and Vista 32-bit | | | 2007 | |
| TagsForAll | Graphical file manager | Tags | Windows x64 | Freemium | | 2014 | 70 tag limit in free version. Metadata is stored in two places: in files as ADS (Alternate Data Stream for NTFS), and in a local database. |
| Tagsistant | File system | Tags | Linux | GPL | C | 2017 | Tag-based, based on FUSE |
| TagSpaces | Graphical file manager, web or desktop (uses Electron) | Tags | Windows, macOS, Linux, and Android | AGPL (Freemium) | TypeScript, JavaScript, Java, Objective-C | Continues | |
| tagxfs | File system extension | Tags | Linux | Boost Software License 1.0 | C++ | 2013 | Extends the user space file system to a tag-based hierarchy. |
| TMSU | Virtual file system | Tags | | | | 2022 | Uses a SQLite relational database. |
| TransparenTag | File system | Tags | Linux, BSD | GPL v2 | OCaml | 2013 | Data and tags are stored as regular files |
| WinFS | File system and manager | Any type | Windows XP | Proprietary | .NET Framework | 2006 | Uses a relational database |
| xtagfs | File system extension | Tags | MacOS X | GPL v2 | Python | 2009 | Based on FUSE |

References

  1. Gifford, David; Jouvelot, Pierre; Sheldon, Mark A.; O’Toole, James W. Jr. (1991). "Semantic file systems" (PDF). ACM Operating Systems Review. 25 (5): 16–25. doi:10.1145/121133.121138.
  2. Faubel, Sebastian; Kuschel, Christian (2008). "Towards Semantic File System Interfaces" (PDF). ISWC (Posters & Demos).
  3. Vasudevan, Venu; Pazandak, Paul (1997). "Semantic File Systems". Object Services and Consulting, Inc. Retrieved 2024-03-05.
  4. Sar, Can; Cao, Pei (2005). "Lineage File System". Stanford University. Retrieved 2024-03-14.
  5. Bloehdorn, Stephan; Völkel, Max (2006). "TagFS — Tag Semantics for Hierarchical File Systems". WWW Conference Proceedings via CiteSeerX.
