Keyspace (distributed data store)

A keyspace example with a number of column families.

A keyspace (or key space) in a NoSQL data store is an object that holds together all column families of a design. [1] [2] It is the outermost grouping of the data in the data store. [3] It resembles the schema concept in relational database management systems. [4] Generally, there is one keyspace per application.

Structure

A keyspace may contain column families or super columns. Each super column contains one or more column families, and each column family contains at least one column. The keyspace is the highest abstraction in a distributed data store. This is fundamental in preserving the structural heuristics in dynamic data retrieval. [5] Multiple relay protocol algorithms are integrated within the simple framework. [6]
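The nesting described above can be sketched in a few lines of Python. This is only an illustrative model, not the storage layout of any particular data store: the class names `Keyspace`, `ColumnFamily` and `Column` are hypothetical, super columns are left out for brevity, and the sketch assumes the Cassandra-style convention that a column is a (name, value, timestamp) triple.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Column:
    """Smallest unit: a name, a value, and a write timestamp (assumed layout)."""
    name: str
    value: str
    timestamp: int


@dataclass
class ColumnFamily:
    """Maps a row key to that row's columns, keyed by column name."""
    name: str
    rows: Dict[str, Dict[str, Column]] = field(default_factory=dict)

    def insert(self, row_key: str, column: Column) -> None:
        # Create the row on first use, then add or overwrite the column.
        self.rows.setdefault(row_key, {})[column.name] = column


@dataclass
class Keyspace:
    """Outermost grouping: holds every column family of one application."""
    name: str
    column_families: Dict[str, ColumnFamily] = field(default_factory=dict)

    def add_column_family(self, name: str) -> ColumnFamily:
        cf = ColumnFamily(name)
        self.column_families[name] = cf
        return cf


keyspace = Keyspace("DeliciousClone")
users = keyspace.add_column_family("Users")
users.insert("alice", Column("email", "alice@example.com", timestamp=1))
```

Every lookup in this model starts at the keyspace and descends through a column family and a row key to a single column, which mirrors the keyspace being the highest level of abstraction.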

Comparison with relational database systems

A keyspace plays a role similar to that of a schema in a relational database. Unlike a schema, however, it does not stipulate any concrete structure of the kind described by the entity-relationship model, which is widely used in relational data modeling. For instance, a keyspace can contain column families that each have a different number of columns, or entirely different columns. The column families, which loosely correspond to tables in relational databases, therefore do not impose a fixed structure on their rows. The only point a keyspace shares with a schema is that it also contains a number of "objects": tables in an RDBMS, and column families or super columns here.

As a result, in distributed data stores the entire burden of handling rows whose structure may change from one data-store update to the next falls on the programmer.
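A minimal Python sketch of what this burden looks like in application code follows. The `users` dictionary stands in for a hypothetical "Users" column family and `describe` is an invented helper. The point is only that two rows of the same column family may carry different columns, so the reader has to cope with absent ones.

```python
from typing import Dict

# Two rows of one hypothetical "Users" column family; each row has its own columns.
users: Dict[str, Dict[str, str]] = {
    "alice": {"email": "alice@example.com", "city": "Berlin"},
    "bob": {"email": "bob@example.com", "twitter": "@bob"},
}


def describe(row_key: str) -> str:
    """Build a display string, tolerating columns that a row does not have."""
    row = users.get(row_key, {})
    email = row.get("email", "<no email>")
    city = row.get("city", "<unknown city>")  # absent for "bob"
    return f"{row_key}: {email}, {city}"


for key in users:
    print(describe(key))
```

In a relational table the missing `city` would be a NULL in a declared column. Here nothing declares the column at all, so the default has to be chosen in the application.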

Examples

As an example, the following configuration defines a number of column families in a keyspace. The CompareWith attribute defines how column names are compared. In this example, UTF8Type has been selected, so names are compared as UTF-8 strings. Other comparators exist, such as AsciiType, BytesType, LongType, and TimeUUIDType.

<Keyspace Name="DeliciousClone">
  <KeysCachedFraction>0.01</KeysCachedFraction>
  <ColumnFamily CompareWith="UTF8Type" Name="Users"/>
  <ColumnFamily CompareWith="UTF8Type" Name="Bookmarks"/>
  <ColumnFamily CompareWith="UTF8Type" Name="Tags"/>
  <ColumnFamily CompareWith="UTF8Type" Name="UserTags"/>
  <ColumnFamily CompareWith="UTF8Type" CompareSubcolumnsWith="TimeUUIDType" ColumnType="Super" Name="UserBookmarks"/>
</Keyspace>
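To illustrate why the choice of comparator matters, the following Python sketch orders the same column names under different comparison rules. The `COMPARATORS` table and the helpers `encode_long`, `decode_long` and `sort_columns` are hypothetical stand-ins written for this illustration. They are not the data store's own implementation, and they assume that LongType names are 8-byte big-endian signed integers.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for three comparators: each maps a raw column
# name (bytes) to the key that decides its position in the sort order.
COMPARATORS: Dict[str, Callable[[bytes], object]] = {
    "BytesType": lambda raw: raw,                                     # byte-wise order
    "UTF8Type": lambda raw: raw.decode("utf-8"),                      # order of decoded strings
    "LongType": lambda raw: int.from_bytes(raw, "big", signed=True),  # numeric order
}


def encode_long(n: int) -> bytes:
    """Encode an integer as 8 big-endian signed bytes (assumed LongType layout)."""
    return n.to_bytes(8, "big", signed=True)


def decode_long(raw: bytes) -> int:
    return int.from_bytes(raw, "big", signed=True)


def sort_columns(names: List[bytes], compare_with: str) -> List[bytes]:
    """Return the column names in the order the chosen comparator gives them."""
    return sorted(names, key=COMPARATORS[compare_with])


longs = [encode_long(n) for n in (10, -1, 2)]
print([decode_long(n) for n in sort_columns(longs, "LongType")])   # [-1, 2, 10]
print([decode_long(n) for n in sort_columns(longs, "BytesType")])  # [2, 10, -1]
print(sort_columns([b"Users", b"Bookmarks", b"Tags"], "UTF8Type"))
```

The same three names come back in a different order depending on the comparator: compared as raw bytes, the encoding of -1 sorts last because its leading byte is 0xFF, while the numeric comparison puts it first.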

Another example shows a simplified Twitter clone data model:

<Keyspace Name="TwitterClone">
  <KeysCachedFraction>0.01</KeysCachedFraction>
  <ColumnFamily CompareWith="UTF8Type" Name="Users"/>
  <ColumnFamily CompareWith="UTF8Type" Name="UserAudits"/>
  <ColumnFamily CompareWith="UTF8Type" CompareSubcolumnsWith="TimeUUIDType" ColumnType="Super" Name="UserRelationships"/>
  <ColumnFamily CompareWith="UTF8Type" Name="Usernames"/>
  <ColumnFamily CompareWith="UTF8Type" Name="Statuses"/>
  <ColumnFamily CompareWith="UTF8Type" Name="StatusAudits"/>
  <ColumnFamily CompareWith="UTF8Type" CompareSubcolumnsWith="TimeUUIDType" ColumnType="Super" Name="StatusRelationships"/>
</Keyspace>
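A keyspace definition of this shape can be inspected with Python's standard `xml.etree.ElementTree` module. The sketch below parses an abridged copy of the TwitterClone definition and lists its column families. The `summarize` function is a hypothetical helper, and treating a column family without a ColumnType attribute as "Standard" is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the TwitterClone keyspace definition.
KEYSPACE_XML = """
<Keyspace Name="TwitterClone">
  <KeysCachedFraction>0.01</KeysCachedFraction>
  <ColumnFamily CompareWith="UTF8Type" Name="Users"/>
  <ColumnFamily CompareWith="UTF8Type" CompareSubcolumnsWith="TimeUUIDType"
                ColumnType="Super" Name="UserRelationships"/>
  <ColumnFamily CompareWith="UTF8Type" Name="Statuses"/>
</Keyspace>
"""


def summarize(xml_text: str) -> dict:
    """Return the keyspace name and the settings of each column family."""
    root = ET.fromstring(xml_text.strip())
    families = {}
    for cf in root.findall("ColumnFamily"):
        families[cf.get("Name")] = {
            # Assumption: a missing ColumnType means a standard column family.
            "type": cf.get("ColumnType", "Standard"),
            "compare_with": cf.get("CompareWith"),
            "compare_subcolumns_with": cf.get("CompareSubcolumnsWith"),
        }
    return {"keyspace": root.get("Name"), "column_families": families}


summary = summarize(KEYSPACE_XML)
for name, settings in summary["column_families"].items():
    print(name, settings["type"], settings["compare_with"])
```

Only the super column family carries a CompareSubcolumnsWith setting, since only it has a second level of names to order.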

References

  1. Ronald Mathies (2010-03-18). "Installing and using Apache Cassandra With Java Part 2 (Data model): Keyspaces". Sodeso - Software Development Solutions. Archived from the original on 2014-02-03. Retrieved 2011-03-28. Keyspaces are quite simple again, from an RDBMS point of view you can compare this to your schema, normally you have one per application. A keyspace contains the ColumnFamilies. Note, however, there is no relationship between the ColumnFamilies. They are just separate containers.
  2. "Overview: Terminology/Abbreviations: Keyspace". Cassandra Wiki. Archived from the original on 2013-07-23. Retrieved 2011-03-31. [A Keyspace] Contains multiple Column Families.
  3. Arin Sarkissian (2010-08-23). "WTF is a SuperColumn? An Intro to the Cassandra Data Model". Arin Sarkissian's blog. Archived from the original on 2010-12-31. Retrieved 2011-03-25. A Keyspace is the outer most grouping of your data. All your ColumnFamily's go inside a Keyspace. Your Keyspace will probably named after your application.
  4. Guy Harrison (2010-08-23). "Playing with Cassandra and Oracle". Terminology in NoSQL. Guy Harrison's Web bits. Retrieved 2011-03-25. In Cassandra:
    • A Keyspace is like a schema
    • ColumnFamily is roughly like a table
    It can be confusing, with each NoSQL database using terms differently from each other, and all of them using terms differently from RDBMS.
  5. Fagin; et al. (1979). "Extendible hashing—a fast access method for dynamic files". ACM Transactions on Database Systems. 4 (3): 315–344.
  6. Fu; et al. "Security Issues and Solutions of the Key Management Protocols in Multi-Hop Relay Network". IEICE Transactions on Communications. 94 (5): 1295–1302.