Original author(s) | Stefan Urbanek [1] |
---|---|
Initial release | March 27, 2011 |
Stable release | 1.1 / July 2, 2016 |
Repository | github |
Written in | Python |
Operating system | Cross-platform |
Type | OLAP |
License | MIT License [2] |
Website | cubes |
Cubes is a light-weight open source multidimensional modelling and OLAP toolkit for development reporting applications and browsing of aggregated data written in Python programming language released under the MIT License.
Cubes provides to an analyst or any application end-user "understandable and natural way of reporting using concept of data Cubes – multidimensional data objects".
Cubes was first publicly released in March 2011. The project was originally developed for Public Procurements of Slovakia. [3] Cubes 1.0 was released in September 2014 and presented on the PyData Conference in New York [4]
Cubes is capable of handling large amounts of data and complex queries. According to a review by TechTarget, Cubes can handle "data volumes in the hundreds of millions of rows" and "complex queries and calculations that require multi-level aggregations and dynamic subsetting." Additionally, the review notes that Cubes is well-suited for smaller organizations or teams that don't require the complexity and scalability of enterprise-level OLAP solutions. [5]
The logical conceptual model in Cubes is described using JSON and can be provided either in a form of a file, directory bundle or from an external model provider (for example a database). The basic model objects are: cubes and their measures and aggregates, dimensions and their attributes, hierarchies. Logical model also contains mapping from logical attributes to their physical location in a database (or other data source).
Example model:
{"cubes":[{"name":"sales","label":"Our Sales","dimensions":["date","customer","location","product"],"measures":["amount"]}]"dimensions":[{"name":"product","label":"Product","levels":[{"name":"category","label":"Category","attributes":["category_id","category_label"],},{"name":"product","label":"Product","attributes":["product_id","product_label"],}]},...]}
Cubes provides basic set of operations such as Data drilling and filtering (slicing and dicing). The operations can be accessed either through Python interface or through a light web server called Slicer.
Example of the python interface:
importcubesworkspace=Workspace("slicer.ini")browser=workspace.browser("sales")result=browser.aggregate()print(result.summary)
The Cubes provides a non-traditional OLAP server with HTTP queries and JSON response API. Example query to get "total amount of all contracts between January 2012 and June 2016 by month":
http://localhost:5000/cube/contracts/aggregate?drilldown=date&drilldown=criteria&cut=date:2012,1-2012,6&order=date.month:desc
The response looks like:
{"summary":{"contract_amount_sum":10000000.0},"remainder":{},"cells":[{"date.year":2012,"criteria.code":"ekonaj","contract_amount_sum":12345.0,"criteria.description":"economically best offer","criteria.sdesc":"best offer","criteria.id":3},{"date.year":2012,"criteria.code":"cena","contract_amount_sum":23456.0,"criteria.description":"lowest price","criteria.sdesc":"lowest price","criteria.id":4},..."total_cell_count":6,"aggregates":["contract_amount_sum"],"cell":[{"type":"range","dimension":"date","hierarchy":"default","level_depth":2,"invert":false,"hidden":false,"from":["2012","1"],"to":["2015","6"]}],"levels":{"criteria":["criteria"],"date":["year"]}}
The simple HTTP/JSON interface makes it very easy to integrate OLAP reports in web applications written in pure HTML and JavaScript.
The Slicer server contains endpoints describing the cube metadata which helps to create generic reporting applications [6] that don't have to know the database model structure and conceptual hierarchies up-in-front.
The Slicer server is written using the Flask (web framework).
The built-in SQL backend of the framework provides ROLAP functionality on top a relational database. Cubes contains a SQL query generator that translates the reporting queries into SQL statements. The query generator takes into account topology of the star or snowflake schema and executes only joins that are necessary to retrieve attributes required by the data analyst.
The SQL backend uses SQLAlchemy Python toolkit to construct the queries.
A relational database is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.
Online analytical processing, or OLAP, is an approach to answer multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.
An OLAP cube is a multi-dimensional array of data. Online analytical processing (OLAP) is a computer-based technique of analyzing data to look for insights. The term cube here refers to a multi-dimensional dataset, which is also sometimes called a hypercube if the number of dimensions is greater than three.
In computing, the star schema is the simplest style of data mart schema and is the approach most widely used to develop data warehouses and dimensional data marts. The star schema consists of one or more fact tables referencing any number of dimension tables. The star schema is an important special case of the snowflake schema, and is more effective for handling simpler queries.
In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions. "Snowflaking" is a method of normalizing the dimension tables in a star schema. When it is completely normalized along all the dimension tables, the resultant structure resembles a snowflake with the fact table in the middle. The principle behind snowflaking is normalization of the dimension tables by removing low cardinality attributes and forming separate tables.
Essbase is a multidimensional database management system (MDBMS) that provides a platform upon which to build analytic applications. Essbase began as a product from Arbor Software, which merged with Hyperion Software in 1998. Oracle Corporation acquired Hyperion Solutions Corporation in 2007. Until late 2005 IBM also marketed an OEM version of Essbase as DB2 OLAP Server.
In computer programming contexts, a data cube is a multi-dimensional ("n-D") array of values. Typically, the term data cube is applied in contexts where these arrays are massively larger than the hosting computer's main memory; examples include multi-terabyte/petabyte data warehouses and time series of image data.
Multidimensional Expressions (MDX) is a query language for online analytical processing (OLAP) using a database management system. Much like SQL, it is a query language for OLAP cubes. It is also a calculation language, with syntax similar to spreadsheet formulas.
Microsoft SQL Server Analysis Services (SSAS) is an online analytical processing (OLAP) and data mining tool in Microsoft SQL Server. SSAS is used as a tool by organizations to analyze and make sense of information possibly spread out across multiple databases, or in disparate tables or files. Microsoft has included a number of services in SQL Server related to business intelligence and data warehousing. These services include Integration Services, Reporting Services and Analysis Services. Analysis Services includes a group of OLAP and data mining capabilities and comes in two flavors multidimensional and tabular, where the difference between the two is how the data is presented. In a tabular model, the information is arranged in two-dimensional tables which can thus be more readable for a human. A multidimensional model can contain information with many degrees of freedom, and must be unfolded to increase readability by a human.
Entity–attribute–value model (EAV) is a data model to encode, in a space-efficient manner, entities where the number of attributes that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest. Such entities correspond to the mathematical notion of a sparse matrix.
The Oracle Database OLAP Option implements On-line Analytical Processing (OLAP) within an Oracle database environment. Oracle Corporation markets the Oracle Database OLAP Option as an extra-cost option to supplement the "Enterprise Edition" of its database.
A database model is a type of data model that determines the logical structure of a database. It fundamentally determines in which manner data can be stored, organized and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.
A database shard, or simply a shard, is a horizontal partition of data in a database or search engine. Each shard is held on a separate database server instance, to spread load.
The following tables compare general and technical information for a number of online analytical processing (OLAP) servers. Please see the individual products articles for further information.
The dimensional fact model (DFM) is an ad hoc and graphical formalism specifically devised to support the conceptual modeling phase in a data warehouse project. DFM is extremely intuitive and can be used by analysts and non-technical users as well. A short-term working is sufficient to realize a clear and exhaustive representation of multidimensional concepts. It can be used from the initial data warehouse life-cycle steps, to rapidly devise a conceptual model to share with customers.
An aggregate is a type of summary used in dimensional models of data warehouses to shorten the time it takes to provide answers to typical queries on large sets of data. The reason why aggregates can make such a dramatic increase in the performance of a data warehouse is the reduction of the number of rows to be accessed when responding to a query.
The following is provided as an overview of and topical guide to databases:
An array database management system or array DBMS provides database services specifically for arrays, that is: homogeneous collections of data items, sitting on a regular grid of one, two, or more dimensions. Often arrays are used to represent sensor, simulation, image, or statistics data. Such arrays tend to be Big Data, with single objects frequently ranging into Terabyte and soon Petabyte sizes; for example, today's earth and space observation archives typically grow by Terabytes a day. Array databases aim at offering flexible, scalable storage and retrieval on this information category.
icCube is a company founded in Switzerland that provides business intelligence (BI) software of the same name. The software can be fully embedded, can be hosted in a managed environment or installed in a customer's machine on premises.
The functional database model is used to support analytics applications such as financial planning and performance management. The functional database model, or the functional model for short, is different from but complementary to the relational model. The functional model is also distinct from other similarly named concepts, including the DAPLEX functional database model and functional language databases.