Apache Drill

Apache Drill
Developer(s)	Apache Software Foundation
Initial release	May 19, 2015;9 years ago
Stable release	1.20.3 / January 7, 2023;2 years ago
Repository	Drill Repository
Written in	Java
Operating system	Cross-platform
License	Apache License 2.0
Website	drill.apache.org

Last updated May 19, 2025

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR,^[1]^[2] Drill is inspired by Google's Dremel system.^[3] Drill is an Apache top-level project.^[4]

Features

One explicitly stated design goal is that Drill is able to scale to 10,000 servers or more and to be able to process petabytes of data and trillions of records in seconds.^[8]

Schema-free JSON document model similar to MongoDB and Elasticsearch, without requiring a formal schema to be declared
Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs
Extremely user and developer friendly
Pluggable architecture enables connectivity to multiple datastores
Version 1.9 added dynamic user-defined functions
Version 1.11 added cryptographic-related functions and PCAP file format support

Back-end Support

Drill is primarily focused on non-relational datastores, including Apache Hadoop text files, NoSQL, and cloud storage. A notable feature also includes in situ querying of local JSON and Apache Parquet files. Some additional datastores that it supports include:

All Hadoop distributions (HDFS API 2.3+), including Apache Hadoop, MapR, CDH and Amazon EMR
NoSQL: MongoDB, Apache HBase, Apache Cassandra
Online Analytical Processing: Apache Kudu, Apache Druid, OpenTSDB
Cloud storage: Amazon S3, Google Cloud Storage, Azure Blob Storage, Swift, IBM Cloud Object Storage
Diverse data formats, including Apache Avro, Apache Parquet and JSON
RDBMs storage plugins (Using JDBC to connect to MySQL, PostgreSQL, and others)

A new datastore can be added by developing a storage plugin. Drill's "schema-free" JSON data model enables it to query non-relational datastores in-situ .^[9]

Front-end Support

Drill itself can be queried via JDBC, ODBC, or REST through a variety of methods and languages including Python and Java. The default install includes a web interface allowing end-users to execute ANSI SQL directly and export data tables as CSV files without any programming.

The dashboard library, Apache Superset,^[10] is particularly well suited for visualization of data queried with Drill.

References

↑ Friedman, Ellen (21 Sep 2015). "Apache Drill: Tracking its history as an open source community". Archived from the original on 18 March 2016.
↑ "Brief About The Differences between Apache Drill Vs Presto". HitechNectar. Retrieved 2023-04-13.
↑ "Spark SQL vs. Apache Drill-War of the SQL-on-Hadoop Tools". ProjectPro. Retrieved 2022-11-15.
↑ "The Apache Software Foundation Announces Apache Drill as a Top-Level Project". 2 December 2014. Retrieved 2014-12-02.
↑ "Apache Drill - Schema-free SQL for Hadoop, NoSQL and Cloud Storage". drill.apache.org. Retrieved 2015-12-29.
↑ Vizard, Michael (2021-09-01). "Apache Software Foundation updates Drill for broader SQL queries". VentureBeat. Retrieved 2022-10-20.
↑ "Apache Drill Eliminates ETL, Data Transformation for MapR Database". The New Stack. 2016-04-11. Retrieved 2022-11-15.
↑ "DrillProposal - INCUBATOR - Apache Software Foundation".
↑ "Frequently Asked Questions - Apache Drill". drill.apache.org. Retrieved 2015-12-29.
↑ Wayner, James R. Borck, Martin Heller, Steven Nuñez, Andrew C. Oliver, Ian Pointer and Peter (2020-10-05). "The best open source software of 2020". InfoWorld. Retrieved 2022-11-26.{{cite web}}: CS1 maint: multiple names: authors list (link)

Papers

Some papers influenced the birth and design. Here is a partial list:

2005 From Databases to Dataspaces: A New Abstraction for Information Management, the authors highlight the need for storage systems to accept all data formats and to provide APIs for data access that evolve based on the storage system's understanding of the data.
2010 Dremel: Interactive Analysis of Web-Scale Datasets

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Friedman, Ellen (21 Sep 2015). "Apache Drill: Tracking its history as an open source community". Archived from the original on 18 March 2016.

[2] "Brief About The Differences between Apache Drill Vs Presto". HitechNectar. Retrieved 2023-04-13.

[3] "Spark SQL vs. Apache Drill-War of the SQL-on-Hadoop Tools". ProjectPro. Retrieved 2022-11-15.

[announce-4] "The Apache Software Foundation Announces Apache Drill as a Top-Level Project". 2 December 2014. Retrieved 2014-12-02.

[5] "Apache Drill - Schema-free SQL for Hadoop, NoSQL and Cloud Storage". drill.apache.org. Retrieved 2015-12-29.

[6] Vizard, Michael (2021-09-01). "Apache Software Foundation updates Drill for broader SQL queries". VentureBeat. Retrieved 2022-10-20.

[7] "Apache Drill Eliminates ETL, Data Transformation for MapR Database". The New Stack. 2016-04-11. Retrieved 2022-11-15.

[8] "DrillProposal - INCUBATOR - Apache Software Foundation".

[9] "Frequently Asked Questions - Apache Drill". drill.apache.org. Retrieved 2015-12-29.

[10] Wayner, James R. Borck, Martin Heller, Steven Nuñez, Andrew C. Oliver, Ian Pointer and Peter (2020-10-05). "The best open source software of 2020". InfoWorld. Retrieved 2022-11-26.{{cite web}}: CS1 maint: multiple names: authors list (link)

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

v t e The Apache Software Foundation
Top-level projects	Accumulo ActiveMQ Airavata Airflow Allura Ambari Ant Aries Arrow Apache HTTP Server APR Avro Axis Axis2 Beam Bloodhound Brooklyn Calcite Camel CarbonData Cassandra Cayenne CloudStack Cocoon Cordova CouchDB cTAKES CXF Derby Directory Drill Druid Empire-db Felix Flex Flink Flume FreeMarker Geronimo Groovy Guacamole Gump Hadoop HBase Helix Hive Iceberg Ignite Impala Jackrabbit James Jena JMeter Kafka Kudu Kylin Lucene Mahout Maven MINA mod_perl MyFaces Mynewt NiFi NetBeans Nutch NuttX OFBiz Oozie OpenEJB OpenJPA OpenNLP OрenOffice ORC PDFBox Parquet Phoenix POI Pig Pinot Pivot Qpid Roller RocketMQ Samza Shiro SINGA Sling Solr Spark Storm SpamAssassin Struts 1 Subversion Superset SystemDS Tapestry Thrift Tika TinkerPop Tomcat Trafodion Traffic Server UIMA Velocity Wicket Xalan Xerces XMLBeans Yetus ZooKeeper
Commons	BCEL BSF Daemon Jelly Logging
Incubator	Taverna
Other projects	Batik FOP Ivy Log4j
Attic	Apex AxKit Beehive iBATIS Click Continuum Deltacloud Etch Giraph Hama Harmony Jakarta Marmotta MXNet ODE River Shale Slide Sqoop Stanbol Tuscany Wave XML
Licenses	Apache License
Category