InfinityDB

Last updated

InfinityDB is an all-Java embedded database engine and client/server DBMS with an extended java.util.concurrent.ConcurrentNavigableMap interface (a subinterface of java.util.Map) that is deployed in handheld devices, on servers, on workstations, and in distributed settings. The design is based on a proprietary lockless, concurrent, B-tree architecture that enables client programmers to reach high levels of performance without risk of failures. [1]

Contents

A new Client/Server version 5.0 is in alpha testing, wrapping the established embedded version to provide shared access via a secure, remote server.

In the embedded system, data is stored to and retrieved from a single embedded database file using the InfnityDB API that allows direct access to the variable length item spaces. Database client programmers can construct traditional relations as well as specialized models that directly satisfy the needs of the dependent application. There is no limit to the number of items, database size, or JVM size, so InfinityDB can function in both the smallest environment that provides random access storage and can be scaled to large settings. Traditional relations and specialized models can be directed to the same database file. InfinityDB can be optimized for standard relations as well as all other types of data, allowing client applications to perform at a minimum of one million operations per second on a virtual, 8-core system.

AirConcurrentMap, is an in-memory map that implements the Java ConcurrentMap interface, [2] but internally it uses a multi-core design so that its performance and memory make it the fastest Java Map when ordering is performed and it holds medium to large numbers of entries. [3] AirConcurrentMap iteration is faster than any Java Map iterators, regardless of the specific map type.

Map API

InfinityDB can be accessed as an extended standard java.util.concurrent.ConcurrentNavigableMap, or via a low-level 'ItemSpace' API. The ConcurrentNavigableMap interface is a subinterface of java.util.Map but has special ordering and concurrency methods: this is the same API as java.util.concurrent.ConcurrentSkipListMap. Maps may be nested to form complex structures. The Maps have the standard semantics, but work internally on a 'tuple space', while the Maps are not actually stored but are helpers, each representing nothing more than an immutable tuple prefix. Maps may be created dynamically at high speed if needed for access, and are thread-safe and multi-core concurrent. The key and value types available include all Java primitive data types, Dates, Strings, small char or byte arrays, 'ByteStrings', 'huge array' indexes, Character Long Objects or Binary Long Objects, plus the special-purpose types 'EntityClass' and 'Attribute'. Maps may be multi-valued. Applications may choose to use the Map-based access alone and may mix in lower-level 'ItemSpace' access over the same tuples, as the Map access is just a wrapper and there is no tuple-level distinction.

The lower level 'ItemSpace' data model

The 12 primitive data types are called 'components' and they are atomic. Components can be concatenated into short composites called 'Items' which are the unit of storage and retrieval. Higher-level structures that combine these Items are client-devised, and include for example unlimited size records of an unlimited number of columns or attributes, with complex attribute values of unlimited size. Keys may then be a composition of components. Attribute values can be ordered sets of composite components, character large objects (CLOB's), binary large objects (BLOB's), or unlimited sparse arrays. Other higher-level structures built of multiple Items include key/value associations like ordered maps, ordered sets, Entity-Attribute-Value nets of quadruples, trees, DAG's, taxonomies, or full-text indexes. Mixtures of these can occur along with other custom client-defined structures.

Any ItemSpace may be represented as an extended JSON document, and JSON printers and parsers are provided. JSON documents are not native but are mapped to sets of Items when desired, at any scale determined by an Item prefix that represents the path to the sub-document. Hence, the entire database or any subtree of it down to a single value can be represented as extended JSON. Because Items are always kept sorted, the JSON keys of an object are always in order.

Data encoding

An 'ItemSpace' represents the entire database, and it is a simple ordered set of Items, with no other state. An Item is actually stored with each component encoded in variable-length binary form in a char array, with components being self-describing in a standard format which sorts correctly. Programmers deal with the components only as primitives, and the stored data is strongly typed. Data is not stored as text to be parsed with weak typing as in JSON or XML, nor is it parsed out of programmer-defined binary stream representations. There are no custom client-devised binary formats that can grow brittle, and which can have security, documentation, upgrade, testing, versioning, scaling, and debugging problems, such as is the case with Java Object serialization.

Performance scaling

All access to the system is via a few basic methods that can store or retrieve in order one variable-length 'Item' or 'tuple' at a time at a speed which is on the order of 1M operations/second aggregated over multiple threads when in memory. The operations are either the standard Map API for get(), put(), iterators, and so on, or at the lower level, insert(), delete(), update(), first(), next(), last(), and previous(). Typical Items are about 30 bytes uncompressed in memory, but LOB's for example use 1 KB Items. Because each operation affects only one Item, small data structures are fast to access. This is in contrast to chunked access, such as for example formatting and parsing entire JSON or XML texts or entire Java Object serialization graphs. The space and performance scaling of an ItemSpace is smooth as any size of client-imposed multi-Item structure is created, grows, shrinks, or disappears. On-storage performance is like any block-oriented B-tree, with blocks of about 4 KB, which is O(log(n)) per access. There is a block cache of 2.5 MB by default, which is of unlimited size but which is often about 100 MB. The cache grows only as needed.

Space Scaling

For performance and efficiency, the Items are stored inside a single B-tree prefix-compressed and variable length as an uninterpreted sequence of bytes for further compression. The B-tree may typically grow to the 100 GB's range but has no limits. There is only one file, so there is no log or other files to write to and flush. InfinityDB minimizes the size of its database file through four types of compression (prefix, suffix, zlib, and UTF-8).

Schemaless upgrade

Schema upgrade when structures are extended or modified is done by adding or removing Items in new ways at runtime, and there are no upgrade scripts - hence the data model is NoSQL and schemaless.

Besides the normal Java primitive types, there are 'EntityClass' and 'Attribute' types, each identified by a name or number. These are optional 'metadata' that can be mixed in with the other components of any Item. They can be used to represent tables, for example, where each table is given a particular EntityClass near the front of the Item, and each column is given a different Attribute. The Items of the table each start with a particular EntityClass, then therehe are one or more normal primitives representing an 'entity' (like a key) then there is a particular Attribute corresponding to a column, and finally some normal primitives representing the value of that Attribute. This simple pattern can be extended at any time to allow nested tables inside any Attribute, or nested Attributes inside other Attributes, or multi-valued Attributes and much more. There is no fixed schema elsewhere, so new data that arrives in the system describes itself, at Item-level granularity. The EntityClass and Attribute numbers or names can be represented in extended JSON. When data is displayed in the web-based Client/Server database browser, it can be viewed, manipulated, and transferred as a list of sorted formatted Items, or as JSON documents, or as nestable tables whose visible structure is determined by the EntityClass and Attributes that are mixed into the Items. The dynamic loose flexibility of JSON and the formality of tables are combined.

Transactionality

Both global 'ACD' and per-thread 'ACID' transactions are provided. Each InfinityDB instance stores data into a single database file and does not require additional log or rollback files. In the event of any catastrophe except power failure or other hardware malfunction, the database is guaranteed to be consistent with the status as of completion of the last global commit. Recovery after abrupt termination is immediate and requires no slow log replay. Bulk loading is globally transactional with unlimited data size, and is concurrent with all other uses. Global transactions do not provide inter-thread isolation, so the semantics are 'ACD' (rather than 'ACID'). Alternatively, ACID transactions employ optimistic locking to allow inter-thread isolation.

Immediate garbage collection

As data structures grow and shrink, freed space is reclaimed immediately and made available for other structures. Systems can run indefinitely without gradual space leaks or long interruptions during garbage reclamation phases. When a data structure becomes empty, all of its space is recycled, rather than leaving behind 'tombstones' or other place holders. For example, a possibly very large multi-value attribute may shrink to one value, becoming as efficient as any single-valued attribute, and if that last value is deleted, all space for it is reclaimed, including the space for the attribute it was attached to, and if a row has only attributes with no values, the row is reclaimed completely as well. If a table loses all of its rows, the space for the table is reclaimed. Any size or type of data structure has this property. There are no reference counters, hence any type of graph is incrementally collected automatically.

Products

InfinityDB Client/Server (in alpha-testing state) features:

InfinityDB Encrypted (Version 5) (in beta-testing state) features:

InfinityDB Embedded (version 4) features:

AirConcurrentMap is a Java ConcurrentNavigableMap implementation. It features:

For both InfinityDB and AirConcurrentMap:

History

Roger L. Deran designed and developed the Infinity Database Engine over 20 years ago and holds US Patents 5283894 [4] and 10417209. [5] The Infinity Database Engine was first deployed in Intel 8088 assembly language in the ROSCOR sports video editor (RSVE) that was licensed to NFL teams in the 1980s. Lexicon purchased the RSVE in 1989, and greatly expanded its deployment to all types of professional and college sports. [6] Java version 2.0 added transactionality, and version 3.0 added concurrency features which are patent pending, and apply to InfinityDB as well as AirConcurrentMap. Infinity DB remains in active use in thousands of sites of various kinds, while AirConcurrentMap is new.

Uses of the all-JAVA InfinityDB, marketed by Boiler Bay Inc. since 2002, include:

Related Research Articles

Data structure Particular way of storing and organizing data in a computer

In computer science, a data structure is a data organization, management, and storage format that enables efficient access and modification. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.

Eiffel is an object-oriented programming language designed by Bertrand Meyer and Eiffel Software. Meyer conceived the language in 1985 with the goal of increasing the reliability of commercial software development; the first version becoming available in 1986. In 2005, Eiffel became an ISO-standardized language.

In computing, serialization or serialisation is the process of translating a data structure or object state into a format that can be stored or transmitted and reconstructed later. When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of object-oriented objects does not include any of their associated methods with which they were previously linked.

In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created. In some cases, an object is considered immutable even if some internally used attributes change, but the object's state appears unchanging from an external point of view. For example, an object that uses memoization to cache the results of expensive computations could still be considered an immutable object.

ABAP is a high-level programming language created by the German software company SAP SE. It is extracted from the base computing languages Java, C, C++ and Python. It is currently positioned, alongside Java, as the language for programming the SAP NetWeaver Application Server, which is part of the SAP NetWeaver platform for building business applications.

This article compares two programming languages: C# with Java. While the focus of this article is mainly the languages and their features, such a comparison will necessarily also consider some features of platforms and libraries. For a more detailed comparison of the platforms, please see Comparison of the Java and .NET platforms.

Java syntax

The syntax of Java refers to the set of rules defining how a Java program is written and interpreted.

The object–relational impedance mismatch is a set of conceptual and technical difficulties that are often encountered when a relational database management system (RDBMS) is being served by an application program written in an object-oriented programming language or style, particularly because objects or class definitions must be mapped to database tables defined by a relational schema.

Java collections framework

The Java collections framework is a set of classes and interfaces that implement commonly reusable collection data structures.

Entity–attribute–value model (EAV) is a data model to encode, in a space-efficient manner, entities where the number of attributes that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest. Such entities correspond to the mathematical notion of a sparse matrix.

MultiValue is a type of NoSQL and multidimensional database, typically considered synonymous with PICK, a database originally developed as the Pick operating system.

A database model is a type of data model that determines the logical structure of a database. It fundamentally determines in which manner data can be stored, organized and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.

Apache CouchDB

Apache CouchDB is an open-source document-oriented NoSQL database, implemented in Erlang.

An embedded database system is a database management system (DBMS) which is tightly integrated with an application software that requires access to stored data, such that the database system is "hidden" from the application's end-user and requires little or no ongoing maintenance. It is actually a broad technology category that includes

Couchbase Server Open-source NoSQL database

Couchbase Server, originally known as Membase, is an open-source, distributed multi-model NoSQL document-oriented database software package optimized for interactive applications. These applications may serve many concurrent users by creating, storing, retrieving, aggregating, manipulating and presenting data. In support of these kinds of application needs, Couchbase Server is designed to provide easy-to-scale key-value or JSON document access with low latency and high sustained throughput. It is designed to be clustered from a single machine to very large-scale deployments spanning many machines.

Versant Object Database (VOD) is an object database software product developed by Versant Corporation.

Amazon DynamoDB

Amazon DynamoDB is a fully managed proprietary NoSQL database service that supports key–value and document data structures and is offered by Amazon.com as part of the Amazon Web Services portfolio. DynamoDB exposes a similar data model to and derives its name from Dynamo, but has a different underlying implementation. Dynamo had a multi-leader design requiring the client to resolve version conflicts and DynamoDB uses synchronous replication across multiple data centers for high durability and availability. DynamoDB was announced by Amazon CTO Werner Vogels on January 18, 2012, and is presented as an evolution of Amazon SimpleDB solution.

The following is provided as an overview of and topical guide to databases:

Wikibase Collection of software (applications and libraries) for creating, managing and sharing structured data

Wikibase is a set of MediaWiki extensions for working with versioned semi-structured data in a central repository based upon JSON instead of the unstructured data of MediaWiki wikitext. Its primary components are the Wikibase Repository, an extension for storing and managing data, and the Wikibase Client which allows for the retrieval and embedding of structured data from a wikibase repository. Wikibase was developed for and is used by Wikidata.

The Java programming language's Java Collections Framework version 1.5 and later defines and implements the original regular single-threaded Maps, and also new thread-safe Maps implementing the java.util.concurrent.ConcurrentMapinterface among other concurrent interfaces. In Java 1.6, the java.util.NavigableMap interface was added, extending java.util.SortedMap, and the java.util.concurrent.ConcurrentNavigableMap interface was added as a subinterface combination.

References

  1. Peters, L & Lavers, T (2008). Swing Extreme Testing: The Extreme Approach to Complete Java Application Testing. Packt Publishing. p. 224. ISBN   9781847194824.
  2. "The Map Interface (The Java™ Tutorials > Collections > Interfaces)".
  3. http://boilerbay.com/docs/AirConcurrentMap_Performance_Testing.pdf
  4. US 5283894 "Lockless concurrent B-tree index meta access method for cached nodes"
  5. US 10417209 "Concurrent Index Using Copy On Write"
  6. New York Times - Sports World Specials: Video Technology; Custom Replays

See also