Hector (API)

Hector
Original author(s)	Ran Tavory
Final release	2.0 / July 16, 2014
Repository	github.com/hector-client/hector
Written in	Java
Type	Column-oriented DBMS
License	MIT License
Website	prettyprint.me/2010/02/23/hector-a-java-cassandra-client/

Last updated November 18, 2021

Hector is a high-level client API for Apache Cassandra. Named after Hector, a warrior of Troy in Greek mythology, it is a substitute for the raw Thrift-based Cassandra Java client,^[2] whose API Hector fully encapsulates.^[3] The dependencies needed to use Cassandra through Hector are also available from a Maven repository.^[4]

History

Cassandra ships with the low-level Thrift protocol, which left room for a more convenient interface for application developers. Hector was developed by Ran Tavory as a high-level interface that covers the shortcomings of the raw Thrift client. It is released under the MIT License, which permits use, modification and redistribution.

Features

The high-level features of Hector are:^[2]

A high-level object-oriented interface to Cassandra: it is mainly inspired by the Cassandra-java-client. The API is defined in the Keyspace interface.
Connection pooling: in high-scale applications, DAOs typically issue a large number of reads and writes. Opening a new connection for each request is too expensive, and a client that operates fast enough can easily run out of available sockets. Hector provides connection pooling and a framework that manages the details.
Failover support: Cassandra is a distributed data store in which hosts (nodes) may go down, so Hector provides its own failover policies:

Type	Comment
`FAIL_FAST`	If an error occurs, it fails
`ON_FAIL_TRY_ONE_NEXT_AVAILABLE`	Tries one more host before giving up
`ON_FAIL_TRY_ALL_AVAILABLE`	Tries all available hosts before giving up

JMX support: Hector exposes many important runtime metrics through JMX, such as the number of available connections, idle connections and error statistics.
Load balancing: simple load balancing is available in newer versions.^[5]
Command design pattern: support for the command pattern lets clients concentrate on their business logic while Hector takes care of the required plumbing.
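
How the command pattern and the failover types in the table above might fit together can be shown with a minimal sketch. It is not Hector's implementation: `FailoverMode`, `HostCommand` and `FailoverExecutor` are hypothetical names, hosts are plain strings, and the only assumption taken from the table is how many hosts each type tries before giving up.

```java
import java.util.List;

/** Illustrative only: constant names mirror the table, the behaviour is an assumption. */
enum FailoverMode {
    FAIL_FAST,
    ON_FAIL_TRY_ONE_NEXT_AVAILABLE,
    ON_FAIL_TRY_ALL_AVAILABLE;

    /** Maximum number of hosts to attempt, given how many hosts are known. */
    int maxAttempts(int hostCount) {
        switch (this) {
            case FAIL_FAST:
                return Math.min(1, hostCount);
            case ON_FAIL_TRY_ONE_NEXT_AVAILABLE:
                return Math.min(2, hostCount);
            default:
                return hostCount;
        }
    }
}

/** A unit of work that runs against one host (hypothetical stand-in for a command). */
interface HostCommand<T> {
    T execute(String host) throws Exception;
}

/** Runs commands and hides the retry plumbing from the caller. */
final class FailoverExecutor {
    private final List<String> hosts;
    private final FailoverMode mode;

    FailoverExecutor(List<String> hosts, FailoverMode mode) {
        this.hosts = hosts;
        this.mode = mode;
    }

    /** Tries successive hosts until one succeeds or the allowed attempts run out. */
    <T> T execute(HostCommand<T> command) throws Exception {
        Exception last = new IllegalStateException("no hosts configured");
        int attempts = mode.maxAttempts(hosts.size());
        for (int i = 0; i < attempts; i++) {
            try {
                return command.execute(hosts.get(i));
            } catch (Exception e) {
                last = e; // remember the failure and move on to the next host
            }
        }
        throw last;
    }
}
```

The caller supplies only the business logic as a `HostCommand`; which host is used and how many are tried stays inside the executor.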

Availability metrics

Hector exposes availability counters and statistics through JMX.^[6]
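
As a rough illustration of how a Java client can publish counters through JMX, the sketch below registers a standard MBean with the platform MBean server. The bean, its attributes (`ReadFailures`, `IdleConnections`) and the object name passed in by the caller are hypothetical and are not the names Hector uses; only the `javax.management` calls are real JDK APIs.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Illustrative only: bean and attribute names are hypothetical, not Hector's. */
final class JmxCounterSketch {

    /** Standard MBean interface: each getter becomes a read-only JMX attribute. */
    public interface ClientStatsMBean {
        long getReadFailures();
        int getIdleConnections();
    }

    /** Holds the counters; the client code updates them as events occur. */
    public static final class ClientStats implements ClientStatsMBean {
        private final AtomicLong readFailures = new AtomicLong();
        private volatile int idleConnections;

        void recordReadFailure() { readFailures.incrementAndGet(); }
        void setIdleConnections(int idle) { idleConnections = idle; }

        @Override public long getReadFailures() { return readFailures.get(); }
        @Override public int getIdleConnections() { return idleConnections; }
    }

    /** Registers the bean with the platform MBean server under the given name. */
    static ObjectName register(ClientStats stats, String name) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName objectName = new ObjectName(name);
        server.registerMBean(stats, objectName);
        return objectName;
    }
}
```

Once registered, the attributes can be read by any JMX console attached to the JVM, or programmatically through `MBeanServer.getAttribute`.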

Load balancing

Hector provides two load balancing policies through the LoadBalancingPolicy interface. The default, RoundRobinBalancingPolicy, is a simple round-robin distribution algorithm. LeastActiveBalancingPolicy routes requests to the pools with the lowest number of active connections, ensuring a good spread of utilisation across the cluster.^[7]
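
The two selection strategies can be sketched in a few lines. This is a simplified illustration rather than Hector's code: `HostPool`, `PoolSelector`, `RoundRobinSelector` and `LeastActiveSelector` are hypothetical names, and the signature of Hector's actual LoadBalancingPolicy interface is not reproduced here.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: a stand-in for a per-host connection pool. */
final class HostPool {
    final String host;
    final AtomicInteger active = new AtomicInteger(); // connections currently in use

    HostPool(String host) {
        this.host = host;
    }
}

/** Hypothetical selection strategy; Hector's own interface may differ. */
interface PoolSelector {
    HostPool select(List<HostPool> pools);
}

/** Cycles through the pools in order, one request at a time. */
final class RoundRobinSelector implements PoolSelector {
    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public HostPool select(List<HostPool> pools) {
        // floorMod keeps the index non-negative even after the counter overflows
        int index = Math.floorMod(counter.getAndIncrement(), pools.size());
        return pools.get(index);
    }
}

/** Picks the pool with the fewest active connections (the first one wins ties). */
final class LeastActiveSelector implements PoolSelector {
    @Override
    public HostPool select(List<HostPool> pools) {
        HostPool best = pools.get(0);
        for (HostPool pool : pools) {
            if (pool.active.get() < best.active.get()) {
                best = pool;
            }
        }
        return best;
    }
}
```

Round-robin ignores the current load and is cheap to compute; the least-active strategy needs each pool to track its active connections but adapts when one host is slower than the others.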

Pooling

The ExhaustedPolicy determines how the underlying client connection pools are controlled. Currently, three options are available:^[8]

Type	Comment
`WHEN_EXHAUSTED_FAIL`	Fails acquisition when no more clients are available
`WHEN_EXHAUSTED_GROW`	The pool is automatically increased to react to load increases
`WHEN_EXHAUSTED_BLOCK`	Blocks on acquisition until a client becomes available (the default)
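
The difference between the three options can be illustrated with a toy pool. The sketch assumes nothing about Hector's internals: `ExhaustedMode` and `SketchPool` are hypothetical names, integers stand in for client connections, and blocking is implemented with plain `wait`/`notify`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

/** Illustrative only: mirrors the three options in the table, not Hector's pool. */
enum ExhaustedMode { WHEN_EXHAUSTED_FAIL, WHEN_EXHAUSTED_GROW, WHEN_EXHAUSTED_BLOCK }

final class SketchPool {
    private final Deque<Integer> idle = new ArrayDeque<>();
    private final ExhaustedMode mode;
    private int created;

    SketchPool(int initialSize, ExhaustedMode mode) {
        this.mode = mode;
        for (int i = 0; i < initialSize; i++) {
            idle.push(++created); // integers stand in for client connections
        }
    }

    /** Borrows a client, applying the configured mode when none are idle. */
    synchronized Integer borrow() throws InterruptedException {
        while (idle.isEmpty()) {
            switch (mode) {
                case WHEN_EXHAUSTED_FAIL:
                    throw new NoSuchElementException("pool exhausted");
                case WHEN_EXHAUSTED_GROW:
                    return ++created; // create a new client on demand
                default:
                    wait(); // block until release() signals a free client
            }
        }
        return idle.pop();
    }

    /** Returns a client to the pool and wakes one blocked borrower. */
    synchronized void release(Integer client) {
        idle.push(client);
        notify();
    }

    /** Number of clients created so far, including those added by growing. */
    synchronized int totalCreated() {
        return created;
    }
}
```

Failing fast surfaces overload to the caller immediately, growing trades memory and sockets for latency, and blocking applies back-pressure at the cost of stalled threads.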

Code examples

The following example implements a simple distributed hash table over Cassandra.

    /**
     * Insert a new value keyed by key
     * @param key Key for the value
     * @param value the String value to insert
     */
    public void insert(final String key, final String value) throws Exception {
        execute(new Command<Void>() {
            public Void execute(final Keyspace ks) throws Exception {
                ks.insert(key, createColumnPath(COLUMN_NAME), bytes(value));
                return null;
            }
        });
    }

    /**
     * Get a string value.
     * @return The string value; null if no value exists for the given key.
     */
    public String get(final String key) throws Exception {
        return execute(new Command<String>() {
            public String execute(final Keyspace ks) throws Exception {
                try {
                    return string(ks.getColumn(key, createColumnPath(COLUMN_NAME)).getValue());
                } catch (NotFoundException e) {
                    return null;
                }
            }
        });
    }

    /**
     * Delete a key from cassandra
     */
    public void delete(final String key) throws Exception {
        execute(new Command<Void>() {
            public Void execute(final Keyspace ks) throws Exception {
                ks.remove(key, createColumnPath(COLUMN_NAME));
                return null;
            }
        });
    }

References

[1] "Releases · hector-client/Hector". GitHub.
[2] Ran Tavory. "Hector – a Java Cassandra client". PrettyPrint.me. Retrieved 2011-03-23. Out of the box Cassanra provides a raw thrift client, which is OK, but lacks many features essential to real world clients. I’ve built Hector to fill this gap. Here are the high level features of Hector, currently hosted at github. A high-level object oriented interface to cassandra. Failover support. Connection pooling. JMX support. Support for the Command design pattern to allow clients to concentrate on their business logic and let hector take care of the required plumbing.
[3] "Hector Client for Apache Cassandra: Encapsulation of Thrift API" (PDF). DataStax. Retrieved 2011-04-12. Hector now completely encapsulates the Thrift API so developers have to deal only with the Hector client using familiar design patterns. The original API is still available for existing users to transition their current projects as well as for those who are comfortable working with Thrift.
[4] "Hector Client for Apache Cassandra: Fully Mavenized" (PDF). DataStax. Retrieved 2011-04-12. Since the beta release of Cassandra 0.7.0, Riptano has been offering maven repository access for dependencies required for Cassandra usage via Hector.
[5] Ran Tavory. "Load balancing and improved failover in Hector". PrettyPrint.me. Retrieved 2011-03-23. I’ve added a very simple load balancing feature, as well as improved failover behavior to Hector. Hector is a Java Cassandra client, to read more about it please see my previous post Hector – a Java Cassandra client. In version 0.5.0-6 I added poor-man’s load balancing as well as improved failover behavior.
[6] "Hector Client for Apache Cassandra: Availability of Metrics" (PDF). DataStax. Retrieved 2011-04-12. To facilitate smoother operations and better awareness of performance characteristics, Hector exposes both availability counters and, optionally, performance statistics through JMX.
[7] "Hector Client for Apache Cassandra: Basic Load Balancing" (PDF). DataStax. Retrieved 2011-04-12. Hector provides for plugable load balancing through the LoadBalancingPolicy interface. Out of the box, two basic implementations are provided: LeastActiveBalancingPolicy (the default) and RoundRobinBalancingPolicy. LeastActiveBalancingPolicy routes requests to the pools with the lowest number of active connections. This ensures a good spread of utilization across the cluster by sending requests to the machine that has the fewest connections. RoundRobinBalancingPolicy implements a simple round-robin distribution algorithm.
[8] "Hector Client for Apache Cassandra: Configuration of Pooling" (PDF). DataStax. Retrieved 2011-04-12. The behavior of the underlying pools of client connections can be controlled by the ExhaustedPolicy. […]
