Company type | Subsidiary |
---|---|
Industry | Computer software |
Founded | 2011 |
Headquarters | Santa Clara, California, United States |
Products | Hortonworks Data Platform, Hortonworks DataFlow, Hortonworks DataPlane |
Number of employees | ~1,110 (2017) [1] |
Parent | Cloudera |
Website | Hortonworks.com |
Hortonworks, Inc. was a data software company based in Santa Clara, California, that developed and supported open-source software (primarily around Apache Hadoop) designed to manage big data and associated processing.
Hortonworks software was used to build enterprise data services and applications such as IoT (connected cars, for example), single view of X (such as customer, risk, patient), and advanced analytics and machine learning (such as next best action and real-time cybersecurity). Hortonworks had three interoperable product lines: Hortonworks Data Platform, Hortonworks DataFlow, and Hortonworks DataPlane.
In January 2019, Hortonworks completed its merger with Cloudera. [3]
Hortonworks was formed in June 2011 as an independent company, funded by $23 million venture capital from Yahoo! and Benchmark Capital. Its first office was in Sunnyvale, California. [4] The company employed contributors to the open source software project Apache Hadoop. [5] The Hortonworks Data Platform (HDP) product, first released in June 2012, [6] included Apache Hadoop and was used for storing, processing, and analyzing large volumes of data. The platform was designed to deal with data from many sources and formats. The platform included Hadoop technology such as the Hadoop Distributed File System, MapReduce, Pig, Hive, HBase, ZooKeeper, and additional components. [7]
Eric Baldeschweiler (formerly of Yahoo) was the initial chief executive, and Rob Bearden, formerly of SpringSource, was chief operating officer. Benchmark partner Peter Fenton was a board member. The company name refers to the character Horton the Elephant, since the elephant is the symbol for Hadoop. [4] [8]
In October 2018, Hortonworks and Cloudera announced they would be merging in an all-stock merger of equals. [9] After the merger, the Apache products of Hortonworks became Cloudera Data Platform.
Apache Hadoop is a collection of open-source software utilities for reliable, scalable, distributed computing. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.
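The MapReduce programming model mentioned above can be illustrated with a minimal, single-process sketch in Python. This is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration, and a real cluster would distribute each phase across many machines and handle hardware failures automatically.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `word_count(["big data", "big cluster"])` counts each word across both lines. In a distributed setting, the map and reduce functions would run in parallel on different nodes, with the framework performing the shuffle between them.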
Douglass Read Cutting is a software designer, advocate for, and creator of open-source search technology. He founded two technology projects, Lucene and Nutch, with Mike Cafarella. The Apache Software Foundation now manages both projects. Cutting and Cafarella were also co-founders of Apache Hadoop.
Horton the Elephant is a fictional character from the 1940 book Horton Hatches the Egg and 1954 book Horton Hears a Who!, both by Dr. Seuss. He is also featured in the short story Horton and the Kwuggerbug, first published for Redbook in 1951 and later rediscovered by Charles D. Cohen and published in the 2014 anthology Horton and the Kwuggerbug and More Lost Stories. In all books and other media, Horton is characterized as a kind, sweet-natured, and naïve elephant who manages to overcome hardships.
Solr is an open-source enterprise-search platform, written in Java. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features and rich document handling. Providing distributed search and index replication, Solr is designed for scalability and fault tolerance. Solr is widely used for enterprise search and analytics use cases and has an active development community and regular releases.
Christophe Bisciglia is an American entrepreneur known for his work with big data and cloud computing. He is known for helping to popularize the MapReduce programming model while working at Google, and he co-founded Cloudera and WibiData.
Aster Data Systems was a data management and analysis software company headquartered in San Carlos, California. It was founded in 2005 and acquired by Teradata in 2011.
Cloudera, Inc. is an American data lake software company.
Within database management systems, the record columnar file or RCFile is a data placement structure that determines how to store relational tables on computer clusters. It is designed for systems using the MapReduce framework. The RCFile structure includes a data storage format, a data compression approach, and optimization techniques for data reading. It is designed to meet four requirements of data placement: (1) fast data loading, (2) fast query processing, (3) highly efficient storage space utilization, and (4) strong adaptivity to dynamic data access patterns.
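The idea behind a record columnar layout can be sketched in a few lines of Python: the table is first split horizontally into row groups, and each row group is then stored column by column. This is a simplified illustration and not the actual RCFile on-disk format; the helper names `to_row_groups` and `read_column` are hypothetical, and compression and metadata are omitted.

```python
from typing import Any, List, Sequence


def to_row_groups(rows: Sequence[Sequence[Any]], group_size: int) -> List[List[List[Any]]]:
    """Split rows into groups, then store each group as a list of columns."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # Transpose the chunk so values of the same column sit together.
        columns = [list(col) for col in zip(*chunk)]
        groups.append(columns)
    return groups


def read_column(groups: List[List[List[Any]]], index: int) -> List[Any]:
    """Read one column across all row groups without touching other columns."""
    return [value for group in groups for value in group[index]]
```

Because values from the same column are adjacent within a row group, a query that needs only one column can skip the others, and similar values stored together tend to compress well.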
MapR was a business software company headquartered in Santa Clara, California. MapR software provides access to a variety of data sources from a single computer cluster, including big data workloads such as Apache Hadoop and Apache Spark, a distributed file system, a multi-model database management system, and event stream processing, combining analytics in real-time with operational applications. Its technology runs on both commodity hardware and public cloud computing services. In August 2019, following financial difficulties, the technology and intellectual property of the company were sold to Hewlett Packard Enterprise.
The Oracle data appliance consists of hardware and software from Oracle Corporation sold as a computer appliance. It was announced in 2011, and is used for consolidating and loading unstructured data into Oracle Database software.
Sqoop is a command-line interface application for transferring data between relational databases and Hadoop.
Apache Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Impala has been described as the open-source equivalent of Google F1, which inspired its development in 2012.
WibiData was a software company that developed big data applications for enterprises to personalize their customer experiences. It developed applications based on open-source technologies Apache Hadoop, Apache Cassandra, Apache HBase, Apache Avro and the Kiji Project. WibiData was founded under the name Odiago in 2010 by Christophe Bisciglia, Aaron Kimball, and Garrett Wu. Based in San Francisco, California, WibiData was backed by investors such as Canaan Partners, New Enterprise Associates, SV Angel, and Eric Schmidt.
Platfora, Inc. is a big data analytics company based in San Mateo, California. The firm’s software works with the open-source software framework Apache Hadoop to assist with data analysis, data visualization, and sharing.
PSSC Labs is a California-based company that provides supercomputing solutions in the United States and internationally. Its products include "high-performance" servers, clusters, workstations, and RAID storage systems for scientific research, government and military, entertainment content creators, developers, and private clouds. The company has implemented clustering software from NASA Goddard's Beowulf project in its supercomputers designed for bioinformatics, medical imaging, computational chemistry and other scientific applications.
Apache Phoenix is an open source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. Phoenix provides a JDBC driver that hides the intricacies of the NoSQL store, enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; insert and delete rows singly and in bulk; and query data through SQL. Phoenix compiles queries and other statements into native NoSQL store APIs rather than using MapReduce, which enables the building of low-latency applications on top of NoSQL stores.
Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop and Alluxio supporting extremely large datasets.
Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most of the data processing frameworks around Hadoop. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.
Apache ORC is a free and open-source column-oriented data storage format. It is similar to the other columnar-storage file formats available in the Hadoop ecosystem such as RCFile and Parquet. It is used by most data processing frameworks, including Apache Spark, Apache Hive, Apache Flink, and Apache Hadoop.