Hortonworks

Hortonworks, Inc.
Type Subsidiary
Industry Computer software
Founded 2011
Headquarters Santa Clara, California, United States
Products Hortonworks Data Platform, Hortonworks DataFlow, Hortonworks DataPlane
Number of employees ~1,110 (2017) [1]
Parent Cloudera
Website Hortonworks.com

Hortonworks was a data software company based in Santa Clara, California that developed and supported open-source software (primarily around Apache Hadoop) designed to manage big data and associated processing.

Hortonworks software was used to build enterprise data services and applications such as IoT (connected cars, for example), single view of X (such as customer, risk, patient), and advanced analytics and machine learning (such as next best action and real-time cybersecurity). Hortonworks had three interoperable product lines: Hortonworks Data Platform (HDP), Hortonworks DataFlow, and Hortonworks DataPlane.

In January 2019, Hortonworks completed its merger with Cloudera. [3]

History

Hortonworks was formed in June 2011 as an independent company, funded by $23 million in venture capital from Yahoo! and Benchmark Capital. Its first office was in Sunnyvale, California. [4] The company employed contributors to the open-source software project Apache Hadoop. [5] The Hortonworks Data Platform (HDP) product, first released in June 2012, [6] included Apache Hadoop and was used for storing, processing, and analyzing large volumes of data. The platform was designed to handle data from many sources and in many formats. It included Hadoop technologies such as the Hadoop Distributed File System, MapReduce, Pig, Hive, HBase, and ZooKeeper, along with additional components. [7]

Eric Baldeschweiler (from Yahoo!) was the initial chief executive, and Rob Bearden, formerly of SpringSource, was chief operating officer. Benchmark partner Peter Fenton was a board member. The company name refers to the character Horton the Elephant, since the elephant is the symbol for Hadoop. [4] [8]

In October 2018, Hortonworks and Cloudera announced an all-stock merger of equals. [9] After the merger, the Apache products of Hortonworks were incorporated into Cloudera Data Platform.

Related Research Articles

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

Doug Cutting: American information theorist

Douglass Read Cutting is a software designer, advocate, and creator of open-source search technology. He created Lucene and, with Mike Cafarella, Nutch. Both projects are now managed through the Apache Software Foundation. Cutting and Cafarella are also the co-founders of Apache Hadoop.

Apache Solr: Open-source enterprise-search platform

Solr is an open-source enterprise-search platform, written in Java. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features and rich document handling. Providing distributed search and index replication, Solr is designed for scalability and fault tolerance. Solr is widely used for enterprise search and analytics use cases and has an active development community and regular releases.

WANdisco, plc. develops technology that moves large Internet of Things (IoT) datasets, edge data, and Hadoop on-premises data lakes at scale to the cloud so organizations can activate their data for machine learning, artificial intelligence, and data analytics on modern cloud platforms, including Microsoft Azure, Amazon Web Services, Google, Oracle, Databricks, and Snowflake.

Cloudera, Inc. is an American software company providing enterprise data management systems that make significant use of Apache Hadoop. As of January 31, 2021, the company had approximately 1,800 customers.

In database management systems, the RCFile is a data placement structure that determines how to store relational tables on computer clusters. It is designed for systems using the MapReduce framework. The RCFile structure includes a data storage format, a data compression approach, and optimization techniques for data reading. It is designed to meet four requirements of data placement: (1) fast data loading, (2) fast query processing, (3) highly efficient storage space utilization, and (4) strong adaptivity to dynamic data access patterns.

MapR

MapR was a business software company headquartered in Santa Clara, California. MapR software provided access to a variety of data sources from a single computer cluster, including big data workloads such as Apache Hadoop and Apache Spark, a distributed file system, a multi-model database management system, and event stream processing, combining real-time analytics with operational applications. Its technology ran on both commodity hardware and public cloud computing services. In August 2019, following financial difficulties, the technology and intellectual property of the company were sold to Hewlett Packard Enterprise.

Sqoop is a command-line interface application for transferring data between relational databases and Hadoop.

Apache Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Impala has been described as the open-source equivalent of Google F1, which inspired its development in 2012.

WibiData was a software company that developed big data applications for enterprises to personalize their customer experiences. It developed applications based on the open-source technologies Apache Hadoop, Apache Cassandra, Apache HBase, Apache Avro, and the Kiji Project. WibiData was founded under the name Odiago in 2010 by Christophe Bisciglia, Aaron Kimball, and Garrett Wu. Based in San Francisco, California, WibiData was backed by investors such as Canaan Partners, New Enterprise Associates, SV Angel, and Eric Schmidt.

Platfora, Inc. is a big data analytics company based in San Mateo, California. The firm’s software works with the open-source software framework Apache Hadoop to assist with data analysis, data visualization, and sharing.

PSSC Labs

PSSC Labs is a California-based company that provides supercomputing solutions in the United States and internationally. Its products include "high-performance" servers, clusters, workstations, and RAID storage systems for scientific research, government and military, entertainment content creators, developers, and private clouds. The company has implemented clustering software from NASA Goddard's Beowulf project in its supercomputers designed for bioinformatics, medical imaging, computational chemistry and other scientific applications.

Big Data Partnership: English big data professional services company

Big Data Partnership was a specialist big data professional services company based in London, UK. It provided consultancy, certified training, and support to enterprises based in Europe, the Middle East, and Africa.

Apache Phoenix is an open-source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. Phoenix provides a JDBC driver that hides the intricacies of the NoSQL store, enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; insert and delete rows singly and in bulk; and query data through SQL. Phoenix compiles queries and other statements into native NoSQL store APIs rather than using MapReduce, which enables low-latency applications to be built on top of NoSQL stores.

Apache Kylin: Open-source distributed analytics engine

Apache Kylin is an open-source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop and Alluxio, supporting extremely large datasets.

Apache NiFi

Apache NiFi is a software project from the Apache Software Foundation designed to automate the flow of data between software systems. Leveraging the concept of extract, transform, load (ETL), it is based on the "NiagaraFiles" software previously developed by the US National Security Agency (NSA), which is also the source of a part of its present name – NiFi. It was open-sourced as a part of NSA's technology transfer program in 2014.

Reynold Xin is a computer scientist and engineer specializing in big data, distributed systems, and cloud computing. He is a co-founder and Chief Architect of Databricks. He is best known for his work on Apache Spark, a leading open-source Big Data project. He was designer and lead developer of the GraphX, Project Tungsten, and Structured Streaming components and he co-designed DataFrames, all of which are part of the core Apache Spark distribution; he also served as the release manager for Spark's 2.0 release.

Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most of the data processing frameworks around Hadoop. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.

Apache ORC: Column-oriented data storage format

Apache ORC is a free and open-source column-oriented data storage format. It is similar to the other columnar-storage file formats available in the Hadoop ecosystem, such as RCFile and Parquet. It is used by most data processing frameworks, including Apache Spark, Apache Hive, Apache Flink, and Apache Hadoop.

References

  1. "Hortonworks: Quick Facts - Hortonworks". Retrieved 15 July 2017.
  2. "Hortonworks upgrades DataPlane Services". 17 April 2018. Retrieved 30 April 2018.
  3. "Feb 2019 Cloudera Hortonworks completed planned merger".
  4. Charles Babcock (June 29, 2011). "Hadoop Big Data Startup Spins Out Of Yahoo". Information Week. Archived from the original on July 4, 2011. Retrieved February 21, 2017.
  5. Sarah McBride; Alistair Barr (April 20, 2012). "Big-data investors look for the next Splunk". Reuters. Retrieved February 21, 2017.
  6. "Hortonworks Announces General Availability of Hortonworks Data Platform". Hortonworks. 12 June 2012. Archived from the original on 22 September 2012.
  7. Joab Jackson (November 1, 2011). "HortonWorks Hones a Hadoop Distribution". PC World. Retrieved October 14, 2013.
  8. Cade Metz (June 28, 2011). "Yahoo! seeds Hadoop startup on open source dream: Hortonworks hears a Big Data revolution". The Register. Retrieved February 21, 2017.
  9. "Cloudera and Hortonworks Announce Merger to Create World's Leading Next Generation Data Platform and Deliver Industry's First Enterprise Data Cloud". BusinessWire. 3 October 2018. Retrieved 3 October 2018.