SAP IQ

Developer(s): SAP
Initial release: 1990
Stable release: 16 (May 2021)
Operating system: Microsoft Windows Server, Linux, UNIX
Platform: Cross-platform
Available in: Multilingual
Type: Business intelligence, data warehouse, analytics
License: Proprietary
Website: SAP IQ website

SAP IQ (formerly known as SAP Sybase IQ or Sybase IQ; IQ for Intelligent Query) is a column-based, petabyte-scale relational database software system used for business intelligence, data warehousing, and data marts. Produced by Sybase Inc., now an SAP company, the product's primary function is to analyze large amounts of data in a low-cost, highly available environment. SAP IQ is often credited with pioneering the commercialization of column-store technology. [1]


At the foundation of SAP IQ lies a column store technology that allows for speed, compression, and ad-hoc analysis. SAP IQ takes an open-interface approach towards its ecosystem. SAP IQ is also integrated with SAP's Business Intelligence portfolio of products to form an end-to-end business analytics software stack, and it is an integral component of SAP's In-Memory Data Fabric Architecture and Data Management Platform.

History

In the early 1990s, Waltham, Massachusetts-based Expressway Technologies, Inc. developed the Expressway 103, a column-based engine optimized for analytics that would eventually become Sybase IQ. Sybase acquired Expressway and re-introduced the product in 1995 as IQ Accelerator, then renamed it shortly thereafter to Sybase IQ, giving it version number 11.0. [2]

By offering the IQ product as part of a collection of related technologies often found in a data warehouse (including Sybase Adaptive Server Enterprise, Replication Server, PowerDesigner, and SQL Anywhere), Sybase became one of the first mainstream companies to acknowledge the need for specialized products for the data warehouse market. [3]

With version 12.0, Sybase replaced the loosely coupled query interface derived from Adaptive Server Enterprise with a tight coupling to SQL Anywhere.

Version 16 introduced a re-engineered column store for petabyte-scale data volumes and stronger data compression. [4]

In 2014, SAP, together with partners BMMsoft, HP, Intel, NetApp, and Red Hat, announced the world's largest data warehouse. A team of engineers from those companies built the data warehouse using SAP HANA and SAP IQ 16, with BMMsoft Federated EDMT running on HP DL580 servers with Intel Xeon E7-4870 processors under Red Hat Enterprise Linux 6, backed by NetApp FAS6290 and E5460 storage. The development and testing of the 12.1 PB data warehouse was conducted at the SAP/Intel Petascale lab in Santa Clara, California, and audited by InfoSizing, an independent auditor certified by the Transaction Processing Performance Council. [5]

Version history

With the release of SP08, the version numbering was changed to align with SAP HANA's version numbers, reflecting the product's continued integration with SAP HANA. The release titled SP03 is a follow-on to SP02, covering all platforms not affected by that release.

In-memory data fabric

SAP's new approach streamlines and simplifies Data Warehousing into an In-Memory Data Fabric. [6]

Figure: SAP's In-Memory Data Fabric Architecture

SAP IQ with SAP HANA

With the advent of big data, SAP IQ has been coupled with SAP HANA to deliver a distributed in-memory analytics platform. There are three main applications and use cases that capitalize on SAP IQ's strengths in scalability and performance as an EDW and big data processor, while leveraging SAP HANA's in-memory speed for operational reporting:

SAP IQ as a Near-line Service (NLS) to SAP HANA


SAP HANA for operational reporting with SAP IQ for big data processing (NLS)

In this scenario, SAP Enterprise Resource Planning (ERP) data goes into SAP HANA, which acts as an operational data store for immediate analysis. Once the data is analyzed, it is integrated into SAP IQ via near-line storage mechanisms (as described above). Here SAP IQ acts as an enterprise data warehouse that receives data from a variety of traditional sources (such as OLTP databases and file systems) as well as from the SAP HANA operational data store (ODS). [7]


SAP IQ as an Enterprise Data Warehouse (EDW) with SAP HANA as Agile Data Mart

When SAP IQ is used as an EDW, it can also be augmented with HANA's in-memory technology. Common uses include planning and analysis reports where simultaneous OLTP processing is needed. In this case, data flows from SAP IQ to SAP HANA. [7] SAP BusinessObjects BI can be used to achieve visibility across both platforms.

Technology

Figure: SAP IQ 16 Engine

To a user, SAP IQ looks like any relational DBMS, with a SQL-based language layer accessible via ODBC/JDBC drivers. Internally, however, SAP IQ is a column-oriented DBMS, which stores data tables as sections of columns of data rather than as rows of data like most transactional databases.
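
The sketch below illustrates this client-facing view: a minimal, hedged example of issuing ordinary SQL over ODBC from Python using the pyodbc driver bridge. The DSN, credentials, and table are hypothetical placeholders, not names from the product documentation.

```python
# A minimal sketch of connecting to SAP IQ over ODBC from Python.
# The DSN name, credentials, and table are hypothetical; any ODBC data source
# configured for SAP IQ (or a full connection string) would work the same way.
import pyodbc

conn = pyodbc.connect("DSN=my_iq_dsn;UID=dba;PWD=secret")  # hypothetical DSN
cur = conn.cursor()

# Standard SQL works as it would against any relational DBMS.
cur.execute("SELECT region, SUM(revenue) FROM sales GROUP BY region")
for region, total in cur.fetchall():
    print(region, total)

conn.close()
```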

Column-Store Architecture

Column orientation has a number of advantages. [8] If a search is being done for items matching a particular value in a column of data, only the storage objects corresponding to that data column within the table need to be accessed; a traditional row-based database would have to read the whole table, top to bottom. Another advantage is that, when indexed correctly, a value that would have to be stored once in each row of data in a traditional database is stored only once, and SAP IQ uses an n-bit index to access the data. [9] N-bit and tiered indexing allow for increased compression and fast, incremental batch loads.
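
The following sketch is purely illustrative (it does not reproduce SAP IQ internals): it shows why filtering on one attribute touches less data when the table is kept as per-column arrays rather than as whole rows.

```python
# An illustrative sketch of row-store vs. column-store access for one predicate.
rows = [
    {"id": 1, "region": "EMEA", "revenue": 120},
    {"id": 2, "region": "APJ",  "revenue": 340},
    {"id": 3, "region": "EMEA", "revenue": 75},
]

# Row store: every column of every row is read just to test a single predicate.
matches_row_store = [r["id"] for r in rows if r["region"] == "EMEA"]

# Column store: the table is kept as one array per column, so only the
# 'region' column (plus the ids to return) needs to be scanned.
ids = [1, 2, 3]
regions = ["EMEA", "APJ", "EMEA"]
matches_col_store = [ids[i] for i, v in enumerate(regions) if v == "EMEA"]

assert matches_row_store == matches_col_store
```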

Additionally, the column-based storage enables SAP IQ to compress data efficiently on the fly. [10]

Indexing Technology

Prior to SAP IQ 16, each data page was structured as an array of fixed-size cells, so all values had the same data type. While this storage approach is efficient for structured, fixed-length data, it does not hold for the more unstructured and variable-sized data seen today. To combat storage inefficiency and store variable-sized data with minimal wasted space, each page is now composed of variable-size cells packed tightly together; the column store architecture supports a variable number of cells per page and various page formats within a column. SAP IQ also applies Lempel-Ziv-Welch (LZW) compression [11] to each data page when it is written to disk, significantly reducing data volume. [12]
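
To make the idea concrete, here is a minimal LZW compressor sketch showing the kind of dictionary-based coding applied per data page; SAP IQ's actual page format and codec implementation are proprietary and not reproduced here.

```python
# A minimal LZW compressor (textbook form), for illustration only.
def lzw_compress(data: bytes) -> list[int]:
    """Return a list of dictionary codes for the input byte string."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out = []
    for b in data:
        wc = w + bytes([b])
        if wc in dictionary:
            w = wc
        else:
            out.append(dictionary[w])
            dictionary[wc] = next_code
            next_code += 1
            w = bytes([b])
    if w:
        out.append(dictionary[w])
    return out

# Repetitive column data compresses well:
page = b"EMEA" * 100
codes = lzw_compress(page)
print(len(page), "bytes ->", len(codes), "codes")
```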

Bitmaps are used for secondary indexes. [11]
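
The sketch below is an illustrative bitmap-index example (not SAP IQ's on-disk format): one bit vector per distinct value lets equality predicates be answered with cheap bitwise operations instead of a column scan.

```python
# Build one bitmap per distinct value of a low-cardinality column.
regions = ["EMEA", "APJ", "EMEA", "AMER", "APJ"]

bitmaps = {}
for pos, value in enumerate(regions):
    bitmaps.setdefault(value, 0)
    bitmaps[value] |= 1 << pos  # set the bit for this row position

# WHERE region = 'EMEA' OR region = 'APJ'  ->  OR the two bit vectors together.
hits = bitmaps["EMEA"] | bitmaps["APJ"]
matching_rows = [pos for pos in range(len(regions)) if hits & (1 << pos)]
print(matching_rows)  # [0, 1, 2, 4]
```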

Massively Parallel Processing Framework

SAP IQ has a massively parallel processing (MPP) framework based on a shared-everything environment that supports distributed query processing. Most other products capable of MPP tend to be based on shared-nothing environments. The benefit of shared-everything is that it is more flexible in terms of the variety of queries that can be optimized, especially for balancing the needs of many concurrent users. The downside is that, in extreme cases, competition among processors for access to a shared pool of storage (usually a storage-area network) can lead to I/O contention, which affects query performance. [12]

However, the aforementioned storage architecture of SAP IQ allows compute and storage layers to scale out independently of each other and also allows these resources to be provisioned on-demand for better utilization without restructuring the underlying database.

Multiplex Architecture

SAP IQ uses a clustered grid architecture made up of clusters of SAP IQ servers, or Multiplexes. These clusters are used to scale performance for large numbers of concurrent queries or for highly complex queries. This is built upon a shared-everything architecture in which all compute nodes interact with the same shared storage and queries can be distributed across all compute nodes. The Multiplex has a coordinator node that manages the database catalog and coordinates transactional writes to the store. Other nodes can be reader-only nodes, or readers and writers like the coordinator node. The storage fabric can be implemented with numerous technologies that allow sharing among the multiplex nodes.

This architecture has multiple uses, including workload balancing and elastic virtual data marts. Workload balancing is achieved by the SAP IQ query engine dynamically increasing or decreasing parallelism in response to changes in server activity. There is automatic failover if a node stops participating in a query: other nodes pick up the work originally assigned to the failed node so the query can complete. On the client side, compatibility with external load balancers ensures that queries are initiated on physical servers in a balanced fashion to eliminate bottlenecks. Physical nodes in the Multiplex can be grouped into “logical servers” which allow workloads to be isolated from each other (for security or resource-balancing purposes); machines can be added to these as demand changes. The aim of the grid architecture is to enable resiliency even during global transactions.

Loading Engine

The SAP IQ loading engine can be used for incremental batch, low-latency, concurrent loading, and bulk loading (with both client and server data files). The bulk loading process allows multiple load processes to occur simultaneously, provided the loads target different tables. Data can be loaded from other databases as well as from files. Page-level snapshot versioning allows concurrent loads and queries, with locking occurring only at the table level. With SAP Replication Server, now enhanced to optimize loads into SAP IQ, transactions are compiled into the fewest set of operations and then bulk micro-batch loaded into SAP IQ, which gives the appearance of real-time, continuous loads.
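
A hedged sketch of a bulk load issued from a client follows, assuming a LOAD TABLE statement along the lines described above; the DSN, table, file path, and options shown are illustrative assumptions only; consult the product documentation for the exact syntax.

```python
# Hedged sketch: bulk-load a table from a server-side data file.
import pyodbc

conn = pyodbc.connect("DSN=my_iq_dsn;UID=dba;PWD=secret")  # hypothetical DSN
cur = conn.cursor()

cur.execute("""
    LOAD TABLE sales (sale_id, region, revenue)
    FROM '/data/staging/sales_2021.csv'
    FORMAT ASCII
    DELIMITED BY ','
""")  # assumed LOAD TABLE options
conn.commit()
conn.close()
```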

The bulk loader now performs all operations in parallel to make full use of all server cores, remove bottlenecks, and keep all threads productive, instead of serializing the process. Loading remains a two-phase process: first, raw data is read and FP indexes are created; second, secondary indexes are created; but everything is executed in parallel. High Group indexes, which the query optimizer relies on for information about which columns and rows contain which data values, are now structured as a set of tiers that grow larger toward the bottom.

Lastly, SAP IQ introduces a write-optimized, row-level versioned (RLV) delta store which enables high-velocity data loads and fast availability of data to users. This store is minimally indexed and compressed, uses row-level locking for concurrent writes, has its own transaction log, and is append-only. It acts as a companion to the main store: data is loaded at high speed into the RLV store and periodically merged into the main store. To the user, the two stores do not appear as separate entities, and queries operate transparently across both. To take advantage of this, users can designate particular “hot” database tables as RLV tables.
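
Below is a hedged sketch of designating a “hot” table as row-level versioned, assuming an ALTER TABLE ... ENABLE RLV STORE statement; treat the exact clause, and any dbspace prerequisites, as assumptions to verify against the product documentation.

```python
# Hedged sketch: route writes for a hot table through the RLV delta store.
import pyodbc

conn = pyodbc.connect("DSN=my_iq_dsn;UID=dba;PWD=secret")  # hypothetical DSN
cur = conn.cursor()
cur.execute("ALTER TABLE orders ENABLE RLV STORE")  # assumed syntax
conn.commit()
conn.close()
```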

Framework and Client APIs

SAP IQ offers query APIs based on pure ANSI SQL standards (with few restrictions), including OLAP and full-text search support. Stored procedures are supported in both ANSI SQL and Transact-SQL dialects, and can be executed on a scheduled or immediate basis. There are also database drivers for a variety of programming languages such as Java, C/C++, PHP, Perl, Python, Ruby, and ADO.NET.
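
As a minimal sketch, the example below invokes a stored procedure through the SQL layer from a client; the procedure name and parameter are hypothetical, and a CALL statement is assumed as one common way to invoke a procedure over ODBC.

```python
# Hedged sketch: call a stored procedure and read its result set.
import pyodbc

conn = pyodbc.connect("DSN=my_iq_dsn;UID=dba;PWD=secret")  # hypothetical DSN
cur = conn.cursor()

cur.execute("CALL monthly_sales_report(?)", "2021-05")  # hypothetical procedure
for row in cur.fetchall():
    print(row)
conn.close()
```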

Handling of Unstructured Data

SAP IQ is an analytics engine that can query both structured and unstructured data and join the results together. SAP IQ introduced a new text index and a SQL “contains” clause to facilitate this by searching for terms within a blob of unstructured text. SAP Sybase's partnerships with vendors allow various binary forms of text files to be ingested into SAP IQ and text indexes to be created for them; these text indexes get the data ready for higher-level text analysis applications to perform full-text searches within SAP IQ via SELECT statements. SELECT syntax can be used by applications performing tokenization, categorization, and further text analysis.
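
The following is a hedged sketch of full-text search over unstructured text, assuming CREATE TEXT INDEX and CONTAINS support as described above; the table, column, and index names are hypothetical and the exact clause options are assumptions.

```python
# Hedged sketch: build a text index and search it with a CONTAINS predicate.
import pyodbc

conn = pyodbc.connect("DSN=my_iq_dsn;UID=dba;PWD=secret")  # hypothetical DSN
cur = conn.cursor()

cur.execute("CREATE TEXT INDEX docs_text_idx ON documents (body)")  # assumed syntax
cur.execute("""
    SELECT doc_id
    FROM documents
    WHERE CONTAINS (body, 'column store')
""")
print([r.doc_id for r in cur.fetchall()])
conn.close()
```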

In-Database Analytics / Extensibility Framework

In-database analytics are built upon the fundamental concept of keeping analytics algorithms close to the data for higher performance. The extensibility framework, called “in-database analytics”, enables embedding of analytic functions inside the SAP IQ database engine, moving analytics into the database instead of into a specialized environment outside it, a process that is slower and more error prone. Pre-built functions are available natively and via partners of SAP IQ who provide specialized statistical and data mining libraries that plug into SAP IQ. This framework increases SAP IQ's power to do advanced processing and analysis, as the data does not have to be moved out of the database into a specialized analytics environment. All data and results can be shared through the DBMS and easily acquired through a SQL interface. Through user-defined functions (UDFs), partners can extend the DBMS with custom computations.
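
As a hedged illustration, the sketch below calls an in-database analytic function from SQL so the data never leaves the DBMS; 'median_udf' stands in for a partner-supplied or custom UDF and is purely hypothetical.

```python
# Hedged sketch: invoke an in-database analytic function via plain SQL.
import pyodbc

conn = pyodbc.connect("DSN=my_iq_dsn;UID=dba;PWD=secret")  # hypothetical DSN
cur = conn.cursor()

cur.execute("SELECT region, median_udf(revenue) FROM sales GROUP BY region")  # hypothetical UDF
for region, med in cur.fetchall():
    print(region, med)
conn.close()
```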

Security

SAP IQ provides several features, both included in the base product and licensable separately, to help protect the security of the user's data. A new feature introduced in IQ 16 is role-based access control (RBAC), which enables separation of duties and upholds the principle of least privilege by allowing privileged operations to be broken down into fine-grained sets that can be individually granted to users. Included as part of the base product are: users, groups and permissions, database administration authorities, user login policies, database encryption, transport-layer security, IPv6 support, role-based access control, and database auditing. Additional features are part of a licensable option called the Advanced Security Option: FIPS encryption, Kerberos authentication, LDAP authentication, and database column encryption.
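
A hedged sketch of RBAC-style administration follows, assuming CREATE ROLE and GRANT statements in the spirit of IQ 16's role-based access control; the role, user, and privilege names are illustrative, and the exact grammar should be checked against the documentation.

```python
# Hedged sketch: define a role, grant it a privilege, and assign it to a user.
import pyodbc

conn = pyodbc.connect("DSN=my_iq_dsn;UID=dba;PWD=secret")  # hypothetical DSN
cur = conn.cursor()

cur.execute("CREATE ROLE sales_reader")                 # assumed syntax
cur.execute("GRANT SELECT ON sales TO sales_reader")
cur.execute("GRANT ROLE sales_reader TO analyst_user")  # assumed syntax
conn.commit()
conn.close()
```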

Information Lifecycle Management (ILM)

As part of ILM, SAP IQ allows users to create multiple user DBSpaces (logical units of storage that act as containers for database objects) for organizing data. These can be used to separate structured from unstructured data, group data according to age and value, or partition table data. DBSpaces can also be marked as read-only to enable one-time consistency checking and backup. Another application of ILM is the ability to partition tables and move portions of them across the storage fabric; this enables a storage management process in which data cycles through tiered storage, moving from faster, more expensive storage to slower, cheaper storage as it ages, partitioning data according to value.
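
The sketch below illustrates tiered storage with DBSpaces and range partitioning, assuming CREATE DBSPACE and PARTITION BY RANGE clauses of the kind described above; file paths, object names, sizes, and partition bounds are illustrative assumptions only.

```python
# Hedged sketch: one dbspace on fast storage for current data, one on cheaper
# storage for history, with a range-partitioned table spread across them.
import pyodbc

conn = pyodbc.connect("DSN=my_iq_dsn;UID=dba;PWD=secret")  # hypothetical DSN
cur = conn.cursor()

cur.execute("CREATE DBSPACE hot_dbsp  USING FILE hot_f  '/fast/hot.iq'  SIZE 100 GB")   # assumed syntax
cur.execute("CREATE DBSPACE cold_dbsp USING FILE cold_f '/slow/cold.iq' SIZE 1000 GB")  # assumed syntax

cur.execute("""
    CREATE TABLE sales_history (
        sale_date DATE,
        region    VARCHAR(16),
        revenue   DECIMAL(12,2)
    )
    PARTITION BY RANGE (sale_date) (
        p2020 VALUES <= ('2020-12-31') IN cold_dbsp,
        p2021 VALUES <= ('2021-12-31') IN hot_dbsp
    )
""")  # assumed partition syntax
conn.commit()
conn.close()
```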

High availability and Disaster Recovery

The Multiplex setup provides scalability and high availability for compute nodes because a multiplex coordinator node can fail over to an alternate coordinator node.

SAP IQ virtual backup also allows users to quickly back up data; combined with storage replication technology, data is continuously copied so backups can occur quickly and “behind the scenes”. Once virtual backups are completed, they can be verified through test and restore, and enterprise data can be copied for development and testing. The backup is then completed at a transactionally consistent point in time. SAP claims that disaster recovery is easier with a shared-everything approach to MPP. The SAP Sybase PowerDesigner modelling tool enables users to build an ILM model that can be deployed with SAP IQ. Storage types, DBSpaces, and lifecycle phases can be defined in an ILM model, and the tool can be used to generate reports and create partition creation and movement scripts.

Hadoop Integration

SAP IQ provides federation with the Hadoop Distributed File System (HDFS), a popular framework for big data, so that enterprise users can continue to store data in Hadoop and utilize its benefits. Integration is achieved in four different ways, depending on the user's needs: client-side federation, ETL, data federation, and query federation. Client-side federation joins data from IQ and Hadoop at the client application level, while ETL federation lets the user load Hadoop data into the column store schemas of IQ. HDFS data can also be joined with IQ data on the fly through SQL queries issued from IQ, and finally, results of MapReduce jobs can be combined with IQ data, also on the fly.
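
An illustrative sketch of the first option, client-side federation, follows: results from SAP IQ and data exported from HDFS (here, a local CSV copy of a MapReduce output) are joined in the client application. File names and schemas are hypothetical.

```python
# Illustrative sketch: join IQ query results with Hadoop-derived data in the client.
import csv
import pyodbc

conn = pyodbc.connect("DSN=my_iq_dsn;UID=dba;PWD=secret")  # hypothetical DSN
cur = conn.cursor()
cur.execute("SELECT customer_id, lifetime_value FROM customers")
iq_rows = {cid: ltv for cid, ltv in cur.fetchall()}
conn.close()

# Clickstream aggregates produced by a MapReduce job and exported from HDFS.
with open("hdfs_clickstream_export.csv", newline="") as f:
    for customer_id, clicks in csv.reader(f):
        ltv = iq_rows.get(int(customer_id))
        if ltv is not None:
            print(customer_id, clicks, ltv)
```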

SAP Control Center

SAP Control Center replaces Sybase Central as a web-based graphical tool for administration and monitoring. SAP Control Center can be used to monitor SAP Sybase servers and resources (node, multiplex) from any location, as well as to monitor performance and spot usage trends. The web application has a multi-tiered plug-in architecture made up of a server and product-based agents that report SAP Sybase performance back to the Control Center server.

Web-Enabled Analytics

SAP IQ comes with web-enabled application drivers facilitating access to SAP IQ from a variety of Web 2.0 programming and execution environments (Python, Perl, PHP, .NET, Ruby). Through query federation with other databases, developers can build applications that interact with multiple data sources at the same time (as well as with database platforms from other vendors). Federation proxy tables that map to tables in external databases can be created; these are materialized as in-memory tables that can be interacted with as though they resided within SAP IQ. In this way, data sources can be combined in a unified view.
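
The following is a hedged sketch of query federation through a proxy table, assuming CREATE SERVER and CREATE EXISTING TABLE ... AT statements; the server class, connection information, and object names are illustrative assumptions to verify against the documentation.

```python
# Hedged sketch: define a remote server, map a proxy table to it, and join it
# with a local SAP IQ table as if everything were local.
import pyodbc

conn = pyodbc.connect("DSN=my_iq_dsn;UID=dba;PWD=secret")  # hypothetical DSN
cur = conn.cursor()

cur.execute("CREATE SERVER remote_ora CLASS 'ODBC' USING 'DSN=oracle_dsn'")  # assumed syntax
cur.execute("CREATE EXISTING TABLE remote_orders AT 'remote_ora...orders'")  # assumed syntax

cur.execute("""
    SELECT c.customer_id, c.region, o.order_total
    FROM customers c JOIN remote_orders o ON o.customer_id = c.customer_id
""")
print(cur.fetchmany(5))
conn.close()
```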

Supported Platforms

SAP IQ also supports plugging in external algorithms written in C++ and Java. SQL queries can call these algorithms, allowing for the execution of in-database analytics, which provides better performance and scalability. Additionally, SAP IQ provides drivers for access via languages such as PHP, Perl, Python, and Ruby on Rails.

SAP IQ supports most major operating system platforms, including Microsoft Windows Server, Linux, and UNIX.

Customers

Sybase claims that Sybase IQ is currently installed in over 2000 customer sites. Notable customers include comScore Inc., [13] CoreLogic, Investment Technology Group (ITG), [14] and the U.S. Internal Revenue Service (IRS). [15]

While Sybase IQ has been widely used for focused, data-mart-style deployments, [16] it has also been deployed as an enterprise data warehouse.


References

  1. Stonebraker, Michael; et al. (2005), C-Store: A Column-oriented DBMS, Proceedings of the 31st VLDB Conference, Trondheim, Norway. Archived 2010-06-19 at the Wayback Machine.
  2. Cole, Barb (November 7, 1994), Sybase makes a data warehousing play via acquisition, Network World
  3. Moore, Trevor (2010), Sybase IQ Survival Guide, p. 16, ISBN 978-1-4466-5758-4
  4. "SAP Sybase IQ 16 for XLDB analytics now available! - SAP Blogs". scn.sap.com.
  5. "SAP and Partners Set New Record for World's Largest Data Warehouse". Press Release. SAP. March 5, 2014. Retrieved August 19, 2016.
  6. "Archived copy". Archived from the original on 2014-11-27. Retrieved 2014-05-27.{{cite web}}: CS1 maint: archived copy as title (link)
  7. 1 2 "How does all this work together – BW, BW on HANA, Suite on HANA, HANA Live….. Part 8 - SAP Blogs". scn.sap.com.
  8. MacNicol, Roger; French, Blaine (August 2004), Sybase IQ Multiplex – Designed For Analytics (PDF), Proceedings of the 30th VLDB Conference, Toronto, Canada
  9. Moore, Trevor (2010), Sybase IQ Survival Guide.
  10. "Archived copy". Archived from the original on 2014-05-28. Retrieved 2014-05-27.
  11. http://blasthemy.com/sap/TechEd13/1_Session_PDFs/RDP/RDP107/RDP107.pdf (PDF).
  12. "Dobler Consulting - Sybase - SQL Server - Oracle - MongoDB" (PDF). www.doblerconsulting.com. Archived from the original (PDF) on 2014-05-29.
  13. Henschen, Doug (November 24, 2010), ComScore's Big Data Deployment In Detail, Information Week
  14. Clark, Don (November 18, 2007), Start-Ups Mine Database Field — Nimble Software Helps Make Sense Of Information Tide (PDF), Wall Street Journal, archived from the original (PDF) on August 16, 2011
  15. Lai, Eric (March 22, 2008), Been audited lately? Blame the IRS's massive, superfast data warehouse, ComputerWorld
  16. Henschen, Doug (July 12, 2011), Sybase IQ Gains Beefier Analysis Capabilities, Information Week