Cassandra as a hybrid system


Cassandra is one of the most widely used NoSQL databases. It was originally developed at Facebook with the goal of creating a DBMS that has no single point of failure and provides maximum availability. Cassandra is primarily a column-oriented (wide-column) store. Some studies describe it as a hybrid of Google BigTable, a column-store database, and Amazon Dynamo, a key-value store. Keys in Cassandra point to a set of column families, following BigTable's data model, while data distribution and availability follow Dynamo's design (a distributed hash table).
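To make the BigTable-style layout concrete, here is a minimal Python sketch, assuming a purely in-memory model: each row key maps to column families, and each family maps column names to values. The names `put`, `get_family`, and the sample data are hypothetical; Cassandra's real storage engine is far more involved.

```python
# Hypothetical in-memory model of the layout described above:
# row key -> column family -> {column name: value}.
Store = dict[str, dict[str, dict[str, str]]]


def put(store: Store, row_key: str, family: str, column: str, value: str) -> None:
    """Write one cell; the row and the column family are created on first use."""
    store.setdefault(row_key, {}).setdefault(family, {})[column] = value


def get_family(store: Store, row_key: str, family: str) -> dict[str, str]:
    """Return one column family of a row, with columns sorted by name."""
    columns = store.get(row_key, {}).get(family, {})
    return dict(sorted(columns.items()))


store: Store = {}
put(store, "user:42", "profile", "name", "Ada")
put(store, "user:42", "profile", "email", "ada@example.com")
put(store, "user:42", "activity", "last_login", "2024-01-01")
print(get_family(store, "user:42", "profile"))  # {'email': 'ada@example.com', 'name': 'Ada'}
```

A single key therefore reaches several independent groups of columns, and different rows may carry different columns in the same family.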

The main characteristics of Cassandra include:

  • No single point of failure. To achieve this, Cassandra runs on a cluster of nodes rather than on a single machine. This does not mean that every node holds the same data. When a node fails, the data stored only on that node becomes inaccessible, but the other nodes and their data remain available.
  • Distributed (consistent) hashing. This scheme provides the functionality of a hash table in such a way that adding or removing one slot (node) does not significantly change the mapping of keys to slots. This lets the load be distributed across servers or nodes according to their capacity and minimizes downtime when the cluster changes.
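  • The client interface is relatively easy to use. Cassandra originally used Apache Thrift for its client interface, which provides RPC clients in several languages, but most developers preferred higher-level open-source client libraries built on Apache Thrift, such as Hector for Java. Later versions replaced Thrift with CQL and a native binary protocol, and Thrift support was removed in Cassandra 4.0.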
  • Data replication. Cassandra copies data to other nodes in the cluster. Replica placement can be simple or deliberately chosen to maximize data protection, for example, by placing a replica on a node in another data center.
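  • The partitioning policy decides on which node each key is placed. Placement can be random (hash-based) or ordered (order-preserving). A random policy spreads keys evenly across nodes, which balances load, while an ordered policy keeps adjacent keys together, which makes range queries efficient; choosing between the two lets Cassandra trade load balance against query performance.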
  • Consistency. Replication makes consistency difficult, because every replica of a value must be brought up to date with the most recent value, either when the write happens or later, during a read operation.
  • Read/write actions. The client sends a request to one node, which acts as the coordinator. For a write, that node forwards the data to the nodes chosen by the replication policy. Each of these nodes first appends the change to its commit log and then updates its in-memory table (memtable); both steps are performed synchronously as part of the write. A read request is routed to a node that holds the data according to the partitioning/allocation policy.
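The hashing and replication bullets above can be illustrated with a small consistent-hashing ring. This is a sketch under simplifying assumptions: each node gets a single position on a 2^32 ring (real Cassandra clusters typically give each machine many virtual nodes), MD5 stands in for the partitioner's hash function, and replicas are simply the next nodes clockwise, similar in spirit to Cassandra's SimpleStrategy. The `Ring` class and the node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Place a string on a 2**32 ring. MD5 is an illustrative choice only."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class Ring:
    """Hypothetical consistent-hash ring: one position per node, no virtual nodes."""

    def __init__(self, nodes: list[str]) -> None:
        self._ring = sorted((token(n), n) for n in nodes)

    def add(self, node: str) -> None:
        bisect.insort(self._ring, (token(node), node))

    def remove(self, node: str) -> None:
        self._ring.remove((token(node), node))

    def replicas(self, key: str, rf: int = 1) -> list[str]:
        """Owner = first node whose token is >= the key's token (wrapping around);
        further replicas are the next distinct nodes clockwise."""
        tokens = [t for t, _ in self._ring]
        start = bisect.bisect_left(tokens, token(key)) % len(self._ring)
        count = min(rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


keys = [f"user:{i}" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.replicas(k)[0] for k in keys}

ring.add("node-d")  # grow the cluster by one node
after = {k: ring.replicas(k)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]

# Only keys in the new node's range change owner, and all of them move to node-d.
print(f"{len(moved)} of {len(keys)} keys moved")
print("replicas of user:7 (rf=3):", ring.replicas("user:7", rf=3))
```

Adding `node-d` only reassigns the keys that fall between its token and its predecessor's; every other key keeps its owner, which is the property the hashing bullet describes. A topology-aware strategy would additionally prefer nodes in other racks or data centers when choosing replicas.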
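The trade-off in the partitioning bullet can be shown by placing the same keys with a hash-based scheme and an order-preserving scheme. In Cassandra these roles correspond to partitioners such as Murmur3Partitioner or RandomPartitioner (hashing) and ByteOrderedPartitioner (ordering); the sketch below only mimics the idea. The four nodes, the range boundaries, and the helper names are assumptions for illustration.

```python
import bisect
import hashlib
from collections import Counter

NODES = ["node-a", "node-b", "node-c", "node-d"]
# Hypothetical range boundaries: node i owns tokens up to boundaries[i];
# the last node owns everything above the final boundary.
RANDOM_BOUNDS = [0.25, 0.5, 0.75]  # equal slices of the hash space [0, 1)
ORDERED_BOUNDS = ["g", "n", "t"]   # alphabetical slices of the key space


def owner(token, boundaries):
    return NODES[bisect.bisect_left(boundaries, token)]


def random_place(key: str) -> str:
    """Random partitioning: the token is a hash of the key, so neighbours scatter."""
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return owner(h / 2**64, RANDOM_BOUNDS)


def ordered_place(key: str) -> str:
    """Ordered partitioning: the token is the key itself, so sort order is kept."""
    return owner(key, ORDERED_BOUNDS)


def nodes_touched(keys, lo, hi, place):
    """Nodes a range scan over [lo, hi] would have to contact."""
    return {place(k) for k in keys if lo <= k <= hi}


keys = [f"user:{i:03d}" for i in range(1000)]
random_load = Counter(random_place(k) for k in keys)
ordered_load = Counter(ordered_place(k) for k in keys)
print("random load :", sorted(random_load.items()))
print("ordered load:", sorted(ordered_load.items()))  # [('node-d', 1000)]: every key starts with "u"

print("range scan, random :", sorted(nodes_touched(keys, "user:100", "user:199", random_place)))
print("range scan, ordered:", sorted(nodes_touched(keys, "user:100", "user:199", ordered_place)))
```

With sequential keys, the ordered scheme puts all of them on one node (a hotspot) but answers the range scan from that single node; the hashed scheme spreads the keys across the cluster at the cost of contacting several nodes for the same scan.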
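Cassandra lets clients choose a consistency level per request (for example ONE, QUORUM, or ALL), in the Dynamo tradition. A common rule of thumb: if a read consults R replicas and a write is acknowledged by W replicas out of a replication factor RF, then R + W > RF guarantees that the two sets share at least one replica, so the read can see the most recent acknowledged write. The sketch below checks that rule by brute force; it ignores node failures, timestamps, and read repair, and the function names are hypothetical.

```python
from itertools import combinations


def quorum(rf: int) -> int:
    """Majority of the replicas, as used by a QUORUM-style consistency level."""
    return rf // 2 + 1


def overlap_guaranteed(rf: int, r: int, w: int) -> bool:
    """Rule of thumb: every read set meets every write set when R + W > RF."""
    return r + w > rf


def overlap_brute_force(rf: int, r: int, w: int) -> bool:
    """Check the rule by trying every possible pair of read and write replica sets."""
    replicas = range(rf)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


rf = 3
for r, w in [(1, 1), (quorum(rf), quorum(rf)), (1, rf), (rf, 1)]:
    print(f"RF={rf} R={r} W={w}: read overlaps latest write -> {overlap_guaranteed(rf, r, w)}")
```

Reading and writing at QUORUM (2 of 3 replicas) satisfies the rule, while ONE/ONE does not; the latter trades consistency for lower latency and higher availability.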
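The write path in the last bullet, commit log first and in-memory table second, can be sketched for a single node as follows. This assumes a JSON-lines file as the commit log and a plain dict as the memtable; real Cassandra also flushes memtables to immutable SSTables on disk, and a coordinator forwards each write to every replica, both of which are omitted here. The sketch fsyncs on every write for simplicity, whereas Cassandra makes commit log syncing configurable. `ToyNode` is a hypothetical name.

```python
import json
import os
import tempfile
from typing import Optional


class ToyNode:
    """Hypothetical single-node write path: commit log first, then memtable."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.memtable: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        # 1. Durability: append the mutation to the commit log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Apply the same mutation to the in-memory table that serves reads.
        self.memtable[key] = value

    def read(self, key: str) -> Optional[str]:
        return self.memtable.get(key)

    def recover(self) -> None:
        """Rebuild the memtable after a crash by replaying the commit log in order."""
        self.memtable = {}
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                self.memtable[entry["key"]] = entry["value"]


with tempfile.TemporaryDirectory() as tmp:
    node = ToyNode(os.path.join(tmp, "commitlog"))
    node.write("user:42", "Ada")
    node.write("user:42", "Ada Lovelace")
    node.memtable.clear()  # simulate losing in-memory state in a crash
    node.recover()
    print(node.read("user:42"))  # Ada Lovelace
```

Because the log is written before the memtable, a crash after an acknowledged write loses only the in-memory copy, which replaying the log restores.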