SAND CDBMS

Last updated

SAND Nucleus CDBMS is a column-oriented DBMS software system optimized for business intelligence applications, delivering the data warehousing component, developed by SAND Technology Inc.

Contents

Company history

SAND Technology was founded in 1983.

SAND CDBMS traces its roots to developments by Nucleus International Corporation research and eventual patent issued to, among others, Edward L. Glaser [1] on “Bit string compressor with boolean operation processing capability.” [2]

Originally encoded on firmware, the application is now completely software based.

SAND Technology is now a division of N. Harris Computer Corporation.

Description

A fully tokenized, bit array encoded and compressed database, data storage is column-oriented using domains across schemas/tables rather than as rows of data within tables. This results in an optimized platform for data analytics and data mining, although not suitable for transaction processing.

This architecture exhibits the following characteristics:

Platform agnostic, SAND CDBMS runs on 64 bit Windows or the following 64 bit Linux/Unix environments: HP-UX, IBM-AIX, Red Hat Linux, SuSE Linux and Sun Solaris.

Database Administration

SAND/DNA Analytics is managed via ANSI standard SQL and DML commands.

Database space allocation and core management are done by the database engine. This means typical database administration is focused on data modeling, data content, managing the data life cycle and managing the user profiles and access permissions.

Data loading is straight forward and only requires pointing to the source and destination in load scripts. There are multiple data manipulation functions and commands available, but all internal structure optimizations are automatically managed by the database engine.

While data load performance can be slower than row based databases, it is mitigated by not having to build any indexes or run post-load administration routines once complete. Parallel load processing or segmented pre-load processing can also be used to improve load performance.

A core feature of SAND CDBMS is its support of “virtual” mounting of a database. [3] This provides an isolated environment for developing and testing changes to a database, where upon dismounting, the entire environment is removed.

Related Research Articles

Data warehouse Centralized storage of knowledge

In computing, a data warehouse, also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for workers throughout the enterprise.

In computing, Deflate is a lossless data compression file format that uses a combination of LZ77 and Huffman coding. It was designed by Phil Katz, for version 2 of his PKZIP archiving tool. Deflate was later specified in RFC 1951 (1996).

IBM Db2 Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. They initially supported the relational model, but were extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB/2, then DB2 until 2017 and finally changed to its present form.

Computer science is the study of the theoretical foundations of information and computation and their implementation and application in computer systems. One well known subject classification system for computer science is the ACM Computing Classification System devised by the Association for Computing Machinery.

A bitmap index is a special kind of database index that uses bitmaps.

Snappy is a fast data compression and decompression library written in C++ by Google based on ideas from LZ77 and open-sourced in 2011. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. Compression speed is 250 MB/s and decompression speed is 500 MB/s using a single core of a circa 2011 "Westmere" 2.26 GHz Core i7 processor running in 64-bit mode. The compression ratio is 20–100% lower than gzip.

A column-oriented DBMS or columnar DBMS is a database management system (DBMS) that stores data tables by column rather than by row. Benefits include more efficient access to data when only querying a subset of columns, and more options for data compression. However, they are typically less efficient for inserting new data.

SAP IQ is a column-based, petabyte scale, relational database software system used for business intelligence, data warehousing, and data marts. Produced by Sybase Inc., now an SAP company, its primary function is to analyze large amounts of data in a low-cost, highly available environment. SAP IQ is often credited with pioneering the commercialization of column-store technology.

Termcap

Termcap is a software library and database used on Unix-like computers. It enables programs to use display computer terminals in a device-independent manner, which greatly simplifies the process of writing portable text mode applications. Bill Joy wrote the first termcap library in 1978 for the Berkeley Unix operating system; it has since been ported to most Unix and Unix-like environments, even OS-9. Joy's design was reportedly influenced by the design of the terminal data store in the earlier Incompatible Timesharing System.

Exasol Database management software company

Exasol is an analytics database management software company. Its product is called Exasol, an in-memory, column-oriented, relational database management system

Greenplum

Greenplum is a big data technology based on MPP architecture and the Postgres open source database technology. The technology was created by a company of the same name headquartered in San Mateo, California around 2005. Greenplum was acquired by EMC Corporation in July 2010.

Vertica Software company

Vertica Systems is an analytic database management software company. Vertica was founded in 2005 by the database researcher Michael Stonebraker, with Andrew Palmer as the founding CEO. Ralph Breslauer and Christopher P. Lynch served as later CEOs.

Infobright is a commercial provider of column-oriented relational database software with a focus in machine-generated data. The company's head office is located in Toronto, Ontario, Canada. Most of its research and development is based in Warsaw, Poland. Support personnel are located in various offices around the world.

Lempel–Ziv–Stac is a lossless data compression algorithm that uses a combination of the LZ77 sliding-window compression algorithm and fixed Huffman coding. It was originally developed by Stac Electronics for tape compression, and subsequently adapted for hard disk compression and sold as the Stacker disk compression software. It was later specified as a compression algorithm for various network protocols. LZS is specified in the Cisco IOS stack.

In computer science, in-memory processing is an emerging technology for processing of data stored in an in-memory database. In-memory processing is one method of addressing the performance and power bottlenecks caused by the movement of data between the processor and the main memory. Older systems have been based on disk storage and relational databases using SQL query language, but these are increasingly regarded as inadequate to meet business intelligence (BI) needs. Because stored data is accessed much more quickly when it is placed in random-access memory (RAM) or flash memory, in-memory processing allows data to be analysed in real time, enabling faster reporting and decision-making in business.

Actian Vector is an SQL relational database management system designed for high performance in analytical database applications. It published record breaking results on the Transaction Processing Performance Council's TPC-H benchmark for database sizes of 100 GB, 300 GB, 1 TB and 3 TB on non-clustered hardware.

The following is provided as an overview of and topical guide to databases:

SingleStore

SingleStore is a cloud-native database designed for data-intensive applications. A distributed, relational, SQL database management system (RDBMS) that features ANSI SQL support, it is known for speed in data ingest, transaction processing, and query processing.

SAP HANA Database management system by SAP

SAP HANA is an in-memory, column-oriented, relational database management system developed and marketed by SAP SE. Its primary function as the software running a database server is to store and retrieve data as requested by the applications. In addition, it performs advanced analytics and includes extract, transform, load (ETL) capabilities as well as an application server.

ClickHouse Open-source database management system

ClickHouse is an open-source column-oriented DBMS for online analytical processing (OLAP) that allows users to generate analytical reports using SQL queries in real-time. ClickHouse Inc. is headquartered in the Bay Area of California, United States with the subsidiary, ClickHouse B.V., based in Amsterdam, Netherlands.

References