Using NoSQL Archives - ID-2Sbo

Indexing structures

Rice John — Wed, 26 Oct 2022 13:30:00 +0000

Indexing is the process of associating a key with the location of the corresponding data record in the database. Many indexing structures are used in NoSQL databases. The B-Tree is one of the most common index structures in a DBMS: its internal nodes can have a variable number of children within a fixed range.

One of the main differences from other tree structures, such as the AVL tree, is that a B-Tree node can have a variable number of children. This means less frequent rebalancing, at the cost of some wasted space in partially filled nodes. The B+Tree is one of the most popular variants of the B-Tree. Unlike a B-Tree, it stores all keys and their records in the leaves, while internal nodes hold only copies of keys that guide the search.
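To make the B+Tree idea concrete, here is a minimal Python sketch of lookup and range scan over a small hand-built B+Tree. It assumes a static tree (insertion and node splitting are omitted), and the names `Leaf`, `Internal`, `search` and `range_scan` are hypothetical. Because every key sits in a leaf and the leaves are linked, a range query descends the tree once and then walks sideways.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list                      # sorted keys; every key of the tree appears in some leaf
    values: list                    # the record (or record pointer) for each key
    next: Optional["Leaf"] = None   # link to the right sibling, used by range scans


@dataclass
class Internal:
    keys: list       # separator keys copied from the leaves; they only guide the search
    children: list   # len(children) == len(keys) + 1


def find_leaf(node, key):
    """Descend from the root to the leaf whose key range covers `key`."""
    while isinstance(node, Internal):
        # a key equal to a separator belongs to the right-hand subtree
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i]
    return None


def range_scan(root, lo, hi):
    """Yield (key, value) pairs with lo <= key <= hi by following leaf links."""
    leaf = find_leaf(root, lo)
    i = bisect_left(leaf.keys, lo)
    while leaf is not None:
        while i < len(leaf.keys):
            if leaf.keys[i] > hi:
                return
            yield leaf.keys[i], leaf.values[i]
            i += 1
        leaf, i = leaf.next, 0


# Hypothetical hand-built tree: one internal node with separators 20 and 40.
l1 = Leaf([5, 10], ["a", "b"])
l2 = Leaf([20, 30], ["c", "d"])
l3 = Leaf([40, 50], ["e", "f"])
l1.next, l2.next = l2, l3
root = Internal([20, 40], [l1, l2, l3])
```

For example, `list(range_scan(root, 10, 40))` returns `[(10, 'b'), (20, 'c'), (30, 'd'), (40, 'e')]`: the scan starts in the first leaf and then only follows `next` links.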

The T-Tree data structure was developed by combining features of AVL trees and B-Trees. AVL trees are self-balancing binary search trees, while B-Trees are balanced multiway trees in which each node can have a variable number of children.

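The structure of a T-Tree is therefore similar to both. Like a B-Tree node, each T-Tree node stores several ordered entries (a key value and a pointer to the record). Like an AVL node, it has at most two children. A search compares the key with a node's smallest and largest entries to choose a direction, then uses binary search inside the node whose range covers the key. Packing several entries into each node improves memory usage and performance.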

A T-Tree has three types of nodes: internal nodes with both left and right children, leaf nodes with no children, and half-leaf nodes with only one child. T-Trees are generally considered to give good overall performance as in-memory indexes.
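The T-Tree search described above can be sketched in Python as follows. This is a simplified illustration, not a complete T-Tree. It assumes the tree is already built, stores bare keys instead of record pointers, and omits insertion, deletion, and AVL-style rotations. `TNode` and `t_search` are hypothetical names.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional


@dataclass
class TNode:
    keys: list                        # several sorted entries per node (the B-Tree idea)
    left: Optional["TNode"] = None    # at most two children (the AVL idea)
    right: Optional["TNode"] = None

    def kind(self):
        """Classify the node as internal, half-leaf, or leaf."""
        children = (self.left is not None) + (self.right is not None)
        return {2: "internal", 1: "half-leaf", 0: "leaf"}[children]


def t_search(node, key):
    """Return True if `key` is stored in the T-tree rooted at `node`."""
    while node is not None:
        if key < node.keys[0]:        # below this node's minimum: go left
            node = node.left
        elif key > node.keys[-1]:     # above this node's maximum: go right
            node = node.right
        else:                         # this node bounds the key: binary search inside it
            i = bisect_left(node.keys, key)
            return node.keys[i] == key
    return False


# Hypothetical example tree (insertion and AVL rotations are omitted).
root = TNode([40, 45, 50],
             left=TNode([10, 20, 30]),
             right=TNode([60, 70], right=TNode([80, 90])))
```

In the example tree, the root is an internal node, its left child is a leaf, and its right child is a half-leaf. `t_search(root, 20)` returns `True`. `t_search(root, 47)` returns `False` because 47 falls inside the root's range but is not stored there, so the search stops without visiting any child.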

The post Indexing structures appeared first on ID-2Sbo.

The evolution of NoSQL

Rice John — Thu, 20 Oct 2022 13:10:00 +0000

NoSQL refers to data stores that do not conform to the relational model and its characteristics. They typically have no fixed schema, do not support joins, and do not guarantee ACID properties. NoSQL systems scale horizontally and make extensive use of main memory, which helps them handle large amounts of information.

NoSQL databases emerged as a new approach to building non-relational databases, developed by large companies to meet their own needs. Examples are Google's BigTable, often considered the first NoSQL system, and Amazon's Dynamo. The success of these systems led to a number of similar open-source and proprietary databases, the most popular of which are Hypertable, Cassandra, MongoDB, and DynamoDB.

The scalability limits of SQL databases were recognized by Web 2.0 companies with huge, fast-growing data and infrastructure needs, such as Google, Amazon, and Facebook. These companies built their own solutions: BigTable, Dynamo, and Cassandra, respectively. Growing interest then led to a number of NoSQL database management systems (DBMS) focused on performance, reliability, and consistency. Many existing indexing structures were reused and adapted to improve search and read performance.

The term was coined by Carlo Strozzi in 1998. It was revived in 2009 by Rackspace employee Eric Evans for databases that address the problems of web companies with large volumes of transactions and data.

One key difference between NoSQL databases and traditional relational databases is that NoSQL databases are a form of unstructured storage.

As a result, NoSQL databases do not have a fixed table structure like a relational system. This table provides a brief comparison of NoSQL and SQL capabilities.

Note that the table compares the two database models, not the individual DBMSs that implement them. Those systems provide their own proprietary mechanisms to overcome some of the problems and shortcomings of each model, which greatly improves performance and reliability.

The post The evolution of NoSQL appeared first on ID-2Sbo.

How to design non-relational databases

Rice John — Wed, 07 Sep 2022 13:33:00 +0000

Each type of NoSQL database has its own peculiarities. I will focus on Key-Value, the most common type of non-relational database, which I also use in one of my projects (DynamoDB). Choosing a database, like choosing any other technology, is rarely a purely rational decision. Many factors influence it, from business requirements to customer preferences. Even so, it is still worth making a conscious evaluation of the project.

How do you know if you need a non-relational database?

Large dataset. When you have more than 1 TB of data and its index does not fit in RAM, it is better to abandon the relational model.
High update rate. If you need more than 100,000 updates per second, choose NoSQL.
Availability. A non-relational database can keep latency at a stable level of about 2 ms, provide high read and write throughput, and avoid locks. In SQL databases, this is possible only on small datasets.
Stable access patterns. NoSQL is ideal for business applications where clear access patterns can be identified, i.e. queries that do not change or change slowly. For example, a list of certain products in an online store.
Transactions prevail over analytics. This is the OLTP approach mentioned above, which is tied to access patterns.

After choosing the right type of database, proceed to design. With NoSQL, the ways in which data can be accessed are limited. You therefore need to tailor the model to frequent and important queries. To do this, analyze the use cases and all product requirements in detail. This lets you determine how your data will be accessed, which forms the basis of the future data structure.
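As an illustration of designing keys around queries, the following Python sketch models a table with a partition key and a sort key, in the spirit of DynamoDB but entirely in memory. The `Table` class, the `CUSTOMER#`/`ORDER#` key format, and the two access patterns are hypothetical assumptions. The patterns are: fetch a customer profile, and list a customer's orders newest first. This is not the DynamoDB API.

```python
from collections import defaultdict


class Table:
    """In-memory stand-in for a table with a partition key (pk) and a sort key (sk)."""

    def __init__(self):
        self._items = defaultdict(dict)  # pk -> {sk: item}

    def put(self, pk, sk, item):
        self._items[pk][sk] = item

    def get(self, pk, sk):
        return self._items.get(pk, {}).get(sk)

    def query(self, pk, sk_prefix="", newest_first=False):
        """Items from one partition whose sort key starts with sk_prefix, ordered by sk."""
        partition = self._items.get(pk, {})
        keys = sorted(sk for sk in partition if sk.startswith(sk_prefix))
        if newest_first:
            keys.reverse()
        return [partition[sk] for sk in keys]


# Access pattern 1: get a customer's profile -> pk=CUSTOMER#<id>, sk=PROFILE
# Access pattern 2: list a customer's orders, newest first -> sk=ORDER#<ISO date>
table = Table()
table.put("CUSTOMER#42", "PROFILE", {"name": "Alice"})
table.put("CUSTOMER#42", "ORDER#2022-09-01", {"total": 30})
table.put("CUSTOMER#42", "ORDER#2022-09-05", {"total": 12})
```

`table.query("CUSTOMER#42", "ORDER#", newest_first=True)` returns the two orders with the 2022-09-05 order first. ISO dates sort lexicographically, so putting the date in the sort key turns the "newest orders" query into a simple ordered read within one partition.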

The post How to design non-relational databases appeared first on ID-2Sbo.

Types of information storages

Rice John — Mon, 20 Jun 2022 13:14:00 +0000

The NoSQL key-value database type uses a hash table in which a unique key points to an item. Keys can be organized into logical groups and only need to be unique within their group, so identical keys can be used in different groups. Some implementations provide caching mechanisms that significantly increase performance.
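Here is a minimal Python sketch of the grouping described above. Each logical group is its own hash table (a Python `dict`), so a key only has to be unique within its group. The class and group names are hypothetical. Real key-value stores add persistence, distribution, and caching on top of this idea.

```python
class KeyValueStore:
    """Minimal key-value store: each logical group is a separate hash table."""

    def __init__(self):
        self._groups = {}  # group name -> {key: value}

    def put(self, group, key, value):
        # `key` only has to be unique inside `group`
        self._groups.setdefault(group, {})[key] = value

    def get(self, group, key, default=None):
        return self._groups.get(group, {}).get(key, default)

    def delete(self, group, key):
        self._groups.get(group, {}).pop(key, None)


store = KeyValueStore()
store.put("users", "42", "Alice")
store.put("sessions", "42", "token-abc")  # same key, different group: no conflict
```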

All you need to work with an item stored in the database is its key. The value is stored as a JSON string or a BLOB (binary large object). One of the biggest disadvantages of this model is that consistency is not enforced at the database level. Programmers can add such checks in their own application code, but this takes extra effort because of the implementation complexity and time involved. The best-known NoSQL database built on a key-value store is Amazon DynamoDB.
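The database does not check the values it stores, so consistency rules have to live in application code. The sketch below assumes a plain dictionary as the key-value store. It applies two hypothetical rules for a user record: a string `email` containing `@` and a non-negative integer `age`. The value is checked before it is serialized to a JSON string.

```python
import json

users = {}  # hypothetical key-value store: key -> JSON string; the store never inspects values


def put_user(key, user):
    """Validate in application code, then store the value as a JSON string."""
    if not isinstance(user.get("email"), str) or "@" not in user["email"]:
        raise ValueError("user needs a string 'email' containing '@'")
    if not isinstance(user.get("age"), int) or user["age"] < 0:
        raise ValueError("user needs a non-negative integer 'age'")
    users[key] = json.dumps(user)


def get_user(key):
    """Fetch by key and decode; returns None for a missing key."""
    raw = users.get(key)
    return None if raw is None else json.loads(raw)


put_user("user:1", {"email": "alice@example.com", "age": 30})
```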

Document stores are similar to key-value stores in that they have no schema and are based on a key-value model, so both types share the same advantages and disadvantages. Both lack consistency at the database level, which prevents applications from relying on the database for stronger guarantees. There are, however, key differences. In a document store, the value (the document) is encoded in a self-describing format such as XML, JSON, or BSON (binary JSON), so the database can work with its contents. The most popular database that uses a document store is MongoDB.
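The practical difference between the two models can be sketched in Python. In a key-value store the value is opaque, while a document store can match on fields inside each document. The collection below and the `find` helper are hypothetical. The helper scans every document, whereas a real document store such as MongoDB would typically use indexes for such queries.

```python
import json

# Hypothetical product collection: document id -> JSON document.
collection = {
    "p1": json.dumps({"name": "Laptop", "category": "electronics", "price": 900}),
    "p2": json.dumps({"name": "Mug", "category": "kitchen", "price": 8}),
    "p3": json.dumps({"name": "Phone", "category": "electronics", "price": 600}),
}


def find(coll, **criteria):
    """Return the ids of documents whose fields equal every given criterion.

    This looks inside each value, which a plain key-value store cannot do.
    """
    matches = []
    for doc_id, raw in coll.items():
        doc = json.loads(raw)
        if all(doc.get(name) == value for name, value in criteria.items()):
            matches.append(doc_id)
    return sorted(matches)


print(find(collection, category="electronics"))  # ['p1', 'p3']
```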

In a column-family database, data is stored by columns rather than by rows, as most relational database management systems do. A column store consists of one or more column families, each of which logically groups related columns. A row key identifies a set of columns, and a keyspace defines the scope of the column families. Each column is a name-value tuple, and columns are kept sorted by name.

Column stores provide fast read/write access to the stored data. The columns of a row that belong to the same column family are stored together as a single disk record, which speeds up reads and writes. The most popular column-family databases include Google BigTable, HBase, and Cassandra.
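The column-family layout can be pictured as nested maps: row key, then column family, then column name to value. The Python sketch below is a hypothetical in-memory model of that layout, not a client for BigTable, HBase, or Cassandra. Reading one family for one row returns its columns sorted by name.

```python
from collections import defaultdict

# Hypothetical keyspace: row key -> column family -> {column name: value}
keyspace = defaultdict(lambda: defaultdict(dict))


def write(row_key, family, column, value):
    keyspace[row_key][family][column] = value


def read_family(row_key, family):
    """Return one column family of one row as (name, value) pairs sorted by name."""
    columns = keyspace.get(row_key, {}).get(family, {})
    return sorted(columns.items())


write("user:42", "profile", "name", "Alice")
write("user:42", "profile", "email", "alice@example.com")
write("user:42", "stats", "logins", 17)
print(read_family("user:42", "profile"))
# [('email', 'alice@example.com'), ('name', 'Alice')]
```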

A NoSQL graph database uses a directed graph to represent data. A graph consists of nodes and edges.
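A directed graph is commonly represented as an adjacency list. The Python sketch below stores a hypothetical social graph, where an edge `alice -> bob` could mean "alice follows bob". It finds every node reachable from a start node with a breadth-first traversal, the kind of relationship query that graph databases are built for.

```python
from collections import deque

# Hypothetical directed graph: node -> set of nodes it has edges to.
edges = {
    "alice": {"bob"},
    "bob": {"carol"},
    "carol": {"alice", "dave"},
    "dave": set(),
}


def reachable(graph, start):
    """Breadth-first traversal following edge directions; returns every reachable node."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


print(sorted(reachable(edges, "bob")))  # ['alice', 'bob', 'carol', 'dave']
```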

The post Types of information storages appeared first on ID-2Sbo.

Principle of database operation

Rice John — Sun, 09 Jan 2022 13:18:00 +0000

A NoSQL database works like a file in which all the data is stored. It lets you handle huge amounts of information and organize it so that users can access it whenever they need. There are currently different types of NoSQL databases, each working differently, and many are written in C++. NoSQL databases are built around the following principles:

Horizontal scalability. The database can grow, gaining storage space as nodes are added, without compromising performance.
Cloud technology. Many NoSQL databases keep their storage in the cloud to free up local space, and they use multiple nodes to replicate information.
Efficient use of resources. Companies are currently going through a technological transition, so they need a database that lets them adopt new technological tools. NoSQL databases serve this purpose: a flexible model makes it easy to adapt to new tools.
Schema-free operation. NoSQL databases have no rigid schema, so programmers are free to change data as needed. Changing the definition of a field or its data type is no problem, unlike in SQL databases, where such changes are difficult.
Response speed. Database speed is measured by latency, i.e. response time, and NoSQL databases aim to minimize it.
Use of indexes. Both SQL and NoSQL databases need indexes, because without one a query may have to scan millions of records. Many NoSQL databases implement indexes as B-Trees. B-Trees are balanced, so a lookup visits a number of nodes proportional to the logarithm of the number of keys.
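The effect of an index can be sketched in Python by comparing a full scan with a lookup in a sorted index. Here a sorted list of `(key, position)` pairs searched with `bisect` stands in for the ordered keys of a B-Tree. This is a simplification, since a real B-Tree splits the keys into nodes, but the logarithmic lookup is the same idea. The data is hypothetical.

```python
from bisect import bisect_left

N = 100_000
# Hypothetical records in insertion order; multiplying by 7919 (coprime with N)
# shuffles the ids deterministically so the data is not already sorted.
emails = [f"user{(i * 7919) % N:06d}@example.com" for i in range(N)]


def full_scan(values, target):
    """No index: check records one by one. Returns (position, comparisons made)."""
    for steps, value in enumerate(values, start=1):
        if value == target:
            return steps - 1, steps
    return None, len(values)


# Sorted (key, position) pairs stand in for the ordered keys of a B-Tree index.
index = sorted((value, pos) for pos, value in enumerate(emails))
index_keys = [key for key, _ in index]


def index_lookup(target):
    """With an index: binary search over the sorted keys, about log2(N) comparisons."""
    i = bisect_left(index_keys, target)
    if i < len(index_keys) and index_keys[i] == target:
        return index[i][1]
    return None
```

A full scan may need up to 100,000 comparisons for this data, while a binary search over the sorted index needs about 17 (roughly log2 of 100,000).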

The post Principle of database operation appeared first on ID-2Sbo.

Cassandra hybrid system

Rice John — Thu, 23 Dec 2021 13:27:00 +0000

One of the most widely used NoSQL databases is Cassandra, originally developed at Facebook. The goal of Cassandra was a DBMS with no single point of failure and maximum availability. Cassandra is primarily a columnar (column-family) database. Some studies describe it as a hybrid of Google BigTable, a column-store database, and Amazon Dynamo, a key-value store. Keys in Cassandra point to sets of column families following BigTable's data model, while data distribution and availability follow Dynamo's design (a distributed hash table).

The main characteristics of Cassandra include:

No single point of failure. To achieve this, Cassandra runs on a cluster of nodes rather than on a single machine. This does not mean that every node holds the same data. When a node fails, data stored only on that node becomes inaccessible, but the other nodes and their data remain available.
Distributed hashing. Cassandra uses consistent hashing, a scheme that provides the functionality of a hash table in such a way that adding or removing one slot does not significantly change the mapping of keys to slots. This allows load to be distributed across servers or nodes according to their capacity and minimizes downtime.
Client interface. The client interface is relatively easy to use. It is based on Apache Thrift, which provides RPC clients in several languages, but most developers prefer higher-level open-source clients built on Thrift, such as Hector.
Data replication. Replication copies data to other nodes in the cluster. Replica placement can be random or chosen to maximize data protection, for example by placing a replica on a node in another data center.
Partitioning. The partitioning policy decides on which node a key is placed. It can be random or ordered. With either policy, Cassandra can balance load distribution against query performance.
Consistency. Replication makes consistency difficult, because all replicas must be brought up to date with the most recent value, either at write time or during a read operation.
Read/write actions. The client sends a request to one node, which stores the data in the cluster according to the replication policy. Each replica first appends the change to its commit log and then updates an in-memory table structure; both steps are performed synchronously as part of the write. A read request is sent to a node that holds the data according to the partitioning policy.
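Distributed hashing, replication, and partitioning come together in a hash ring. The Python sketch below is a simplified model, not Cassandra's implementation. Its assumptions are:

- Node names are hypothetical.
- MD5 is used only as a convenient stable hash (hash-based, random partitioning).
- Each key belongs to the first node at or after its token, going clockwise.
- Replicas go to the next distinct nodes on the ring.

Real deployments also use virtual nodes and data-center-aware replica placement.

```python
import hashlib
from bisect import bisect_left


def token(key: str) -> int:
    """Place a key (or node name) on the ring using a stable hash."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self._ring = sorted((token(n), n) for n in nodes)  # (token, node), ordered by token

    def add(self, node):
        self._ring.append((token(node), node))
        self._ring.sort()

    def remove(self, node):
        self._ring = [(t, n) for t, n in self._ring if n != node]

    def owners(self, key):
        """Primary owner (first node clockwise from the key's token) plus replicas."""
        tokens = [t for t, _ in self._ring]
        start = bisect_left(tokens, token(key)) % len(self._ring)  # wrap past the end
        result = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in result:
                result.append(node)
            if len(result) == self.replicas:
                break
        return result


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.owners("user:42"))  # three distinct nodes: primary first, then replicas
```

Adding a node moves only the keys whose tokens fall on the arc the new node takes over. Every other key keeps its owner, which is the property of distributed hashing described in the list above.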

The post Cassandra hybrid system appeared first on ID-2Sbo.