Hazelcast

Developer(s): Hazelcast
Stable release: 5.3.6 / November 9, 2023 [1]
Written in: Java
Type: In-memory data grid, data structure store
License: Hazelcast: Apache 2.0 [2]; Hazelcast Enterprise: Proprietary
Website: hazelcast.com

In computing, Hazelcast is a unified real-time data platform [3] based on Java that combines a fast data store with stream processing. It is also the name of the company developing the product. The Hazelcast company is funded by venture capital and headquartered in Palo Alto, California. [4] [5] [6]

In a Hazelcast grid, data is evenly distributed among the nodes of a computer cluster, allowing for horizontal scaling of processing and available storage. Backups are also distributed among nodes to protect against failure of any single node. Hazelcast provides central, predictable scaling of applications by giving them in-memory access to frequently used data across an elastically scalable data grid. These techniques reduce the query load on databases and improve speed.
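
The distribution described above can be illustrated with a minimal sketch. It is not Hazelcast's actual partitioning code: the class and method names are hypothetical, and the round-robin assignment with a single backup on the next node is a simplifying assumption. The only value taken from Hazelcast itself is its default partition count of 271.

```java
/** Illustrative only: a simplified model of partitioned data with one backup copy. */
public class PartitionSketch {
    // Hazelcast's default partition count; everything else here is an assumption.
    public static final int PARTITION_COUNT = 271;

    /** Maps a key to a partition by hashing (simplified stand-in for the real hash). */
    public static int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    /** Node holding the primary copy of a partition (round-robin assignment). */
    public static int primaryNode(int partitionId, int nodeCount) {
        return partitionId % nodeCount;
    }

    /** Node holding the backup copy: the next node, so it differs from the primary
     *  whenever the cluster has at least two nodes. */
    public static int backupNode(int partitionId, int nodeCount) {
        return (partitionId + 1) % nodeCount;
    }

    /** True if every partition still has a copy after the given node fails. */
    public static boolean survivesFailureOf(int failedNode, int nodeCount) {
        for (int p = 0; p < PARTITION_COUNT; p++) {
            boolean primaryLost = primaryNode(p, nodeCount) == failedNode;
            boolean backupLost = backupNode(p, nodeCount) == failedNode;
            if (primaryLost && backupLost) {
                return false; // both copies lived on the failed node
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int nodes = 3;
        int[] primaries = new int[nodes];
        for (int p = 0; p < PARTITION_COUNT; p++) {
            primaries[primaryNode(p, nodes)]++;
        }
        for (int n = 0; n < nodes; n++) {
            System.out.println("node " + n + " holds " + primaries[n] + " primary partitions");
        }
        System.out.println("data survives loss of node 1: " + survivesFailureOf(1, nodes));
    }
}
```

Because each node holds roughly the same number of primary partitions, adding a node spreads both storage and processing, and keeping each backup on a different node than its primary is what allows the cluster to tolerate the loss of any single node.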

Hazelcast can run on-premises, in the cloud (Amazon Web Services, Microsoft Azure, Cloud Foundry, OpenShift), virtually (VMware), and in Docker containers. Hazelcast offers technology integrations for multiple cloud configuration and deployment technologies, including Apache jclouds, Consul, etcd, Eureka, Kubernetes, and ZooKeeper. The Hazelcast Cloud Discovery Service Provider Interface (SPI) enables cloud-based or on-premises nodes to auto-discover each other.
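
The idea behind auto-discovery can be sketched as follows. This is a hypothetical, in-process illustration and not the Hazelcast Discovery SPI: the class `DiscoverySketch` and its methods are invented for this example, and the in-memory set stands in for an external registry such as Consul, etcd, or ZooKeeper. The addresses use port 5701, Hazelcast's default member port.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical sketch of registry-based member discovery; not the Hazelcast SPI. */
public class DiscoverySketch {
    // Stand-in for an external registry shared by all cluster members.
    private final Set<String> registry = new ConcurrentSkipListSet<>();

    /** A starting member announces its own address. */
    public void register(String address) {
        registry.add(address);
    }

    /** A stopping member withdraws its address. */
    public void deregister(String address) {
        registry.remove(address);
    }

    /** Returns every registered address except the caller's own, in sorted order. */
    public Set<String> discoverPeers(String self) {
        Set<String> peers = new TreeSet<>(registry);
        peers.remove(self);
        return peers;
    }

    public static void main(String[] args) {
        DiscoverySketch discovery = new DiscoverySketch();
        discovery.register("10.0.0.1:5701");
        discovery.register("10.0.0.2:5701");
        discovery.register("10.0.0.3:5701");
        // prints [10.0.0.2:5701, 10.0.0.3:5701]
        System.out.println(discovery.discoverPeers("10.0.0.1:5701"));
    }
}
```

In a real deployment the registry lookup would be a call to the cloud provider or coordination service, which is the part a discovery plug-in is meant to abstract away.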

The Hazelcast platform can manage memory for many types of applications. It offers an Open Binary Client Protocol that allows client APIs to be written in any programming language. Hazelcast and open-source community members have created client APIs for programming languages that include Java, .NET, C++, Python, Node.js and Go. [7]

Usage

Typical use-cases for Hazelcast include:

Vert.x utilizes it for shared storage. [9]

Hazelcast is also used in academia and research as a framework for distributed execution and storage.

References

  1. "Release v5.3.6". GitHub. Retrieved 2023-12-20.
  2. "Licensing". Hazelcast Reference Manual.
  3. "Streaming and IMDG Coming Together: Hazelcast Platform 5.0 is Released!". Hazelcast. Retrieved 2021-07-14.
  4. "Home". Hazelcast. Retrieved 2022-08-16.
  5. Penchikala, Srini (2013-09-18). "Java In-Memory Grid Hazelcast gets VC Funding from Bain Capital". infoq.com. Retrieved 2013-12-11.
  6. Novet, Jordan (2014-09-18). "Hazelcast adds $11M to grow its business based on an open-source in-memory data grid". VentureBeat. Retrieved 2020-12-28.
  7. "Hazelcast Clients". Hazelcast Platform Reference Manual.
  8. "Memcache Client". Hazelcast IMDG Reference Manual.
  9. Kim, Jaehong (2017-06-16). "Understanding Vert.x Architecture - Part II". Retrieved 2020-12-28.
  10. Kathiravelu, Pradeeban; Veiga, Luís (9 September 2014). Concurrent and Distributed CloudSim Simulations. IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS). Paris. pp. 490–493. CiteSeerX 10.1.1.714.4924. doi:10.1109/MASCOTS.2014.70.
  11. Kathiravelu, Pradeeban; Veiga, Luís (8 December 2014). An Adaptive Distributed Simulator for Cloud and MapReduce Algorithms and Architectures. IEEE/ACM 7th International Conference on Utility and Cloud Computing (UCC), 2014. London. pp. 79–88. doi:10.1109/UCC.2014.16.
  12. Dixit, Advait Abhay; Hao, Fang; Mukherjee, Sarit; Lakshman, T. V.; Kompella, Ramana (20 October 2014). ElastiCon: An Elastic Distributed SDN Controller. Tenth ACM/IEEE Symposium on Architectures for Networking and Communications Systems. pp. 17–28. Retrieved 2020-12-28.
  13. Kathiravelu, Pradeeban; Galhardas, Helena; Veiga, Luís (28 October 2015). ∂u∂u Multi-Tenanted Framework: Distributed Near Duplicate Detection for Big Data. On the Move to Meaningful Internet Systems: OTM 2015 Conferences. Rhodes, Greece. pp. 237–256. doi:10.1007/978-3-319-26148-5_14.