Database Management/NoSQL

This lesson introduces NoSQL databases.

Objectives and Skills

Objectives and skills for this lesson include:

  • Understand the features and uses of document databases
  • Understand the features and uses of key-value databases
  • Understand the features and uses of graph databases
  • Practice using a variety of NoSQL databases

Readings

  1. Wikipedia: NoSQL
  2. Wikipedia: Document-oriented database
  3. Wikipedia: Key–value database
  4. Wikipedia: Graph database

Multimedia

  1. YouTube: An Introduction To NoSQL Databases
  2. YouTube: SQL vs NoSQL
  3. YouTube: An introduction to Wikidata

Activities

  1. Review Wikipedia: MongoDB. Identify the main features and uses of MongoDB.
  2. Practice using MongoDB.
    • Review MongoDB.
    • Use Docker Playground to implement a MongoDB database environment.
    • Use the mongo command (mongosh in current MongoDB releases) to connect to MongoDB.
    • Create and query a MongoDB collection.
    • Insert, update, and remove documents.
  3. Review Wikipedia: Redis. Identify the main features and uses of Redis.
  4. Practice using Redis.
    • Review Redis.
    • Use Docker Playground to implement a Redis database environment.
    • Use the redis-cli command to connect to Redis.
    • Create and query a Redis database.
    • Create, update, and delete Redis keys.
  5. Review Wikipedia: Wikidata. Identify the main features and uses of Wikidata.
  6. Practice using Wikidata.

Lesson Summary

NoSQL

  • A NoSQL database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases.[1]
  • NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages or sit alongside SQL databases.[2]
  • NoSQL databases are increasingly used in big data and real-time web applications.[3]
  • NoSQL databases offer simplicity of design, simpler "horizontal" scaling to clusters of machines, finer control over availability, and a reduced object-relational impedance mismatch.[4]
  • The data structures used by NoSQL databases (e.g. key–value pair, wide column, graph, or document) are different from those used by default in relational databases, making some operations faster in NoSQL. The particular suitability of a given NoSQL database depends on the problem it must solve.[5]
  • Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability, partition tolerance, and speed.[6]
  • Barriers to the greater adoption of NoSQL stores include the use of low-level query languages (instead of SQL, for instance), lack of ability to perform ad hoc joins across tables, lack of standardized interfaces, and huge previous investments in existing relational databases.[7]
  • Most NoSQL stores lack true ACID transactions. Instead, most NoSQL databases offer a concept of "eventual consistency", in which database changes are propagated to all nodes "eventually" (typically within milliseconds), so queries for data might not return updated data immediately or might result in reading data that is not accurate, a problem known as stale reads. Additionally, some NoSQL systems may exhibit lost writes and other forms of data loss.[8]
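The stale-read behavior described in the last point can be sketched with a toy model. This is a minimal illustration, not how any particular NoSQL product replicates data: the `Primary` and `Replica` classes, the `pending` queue, and the sample key are all hypothetical names invented for this example, and replication is triggered by hand rather than over a network.

```python
from collections import deque


class Primary:
    """Accepts writes and queues them for later delivery to a replica."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # changes the replica has not applied yet

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))


class Replica:
    """Serves reads from its own copy, which may lag behind the primary."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)

    def apply(self, primary):
        # Drain the primary's queue in the order the writes were made.
        while primary.pending:
            key, value = primary.pending.popleft()
            self.data[key] = value


primary, replica = Primary(), Replica()
primary.write("stock:widget", 10)
replica.apply(primary)

primary.write("stock:widget", 9)
stale = replica.read("stock:widget")  # replication has not run yet
replica.apply(primary)
fresh = replica.read("stock:widget")  # the replica has now converged
print(stale, fresh)
```

Between the second write and the second call to `apply`, the replica still answers with the old value. Once the queued change is applied, both copies agree again, which is the "eventual" part of eventual consistency.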

Document-Oriented Database

  • A document-oriented database, or document store, is a computer program and data storage system designed for storing, retrieving and managing document-oriented information, also known as semi-structured data.[9]
  • Document-oriented databases are one of the main categories of NoSQL databases. Graph databases are similar, but add another layer, the relationship, which allows them to link documents for rapid traversal.[10]
  • Document-oriented databases are inherently a subclass of the key-value store, another NoSQL database concept. The difference lies in the way the data is processed; in a key-value store, the data is considered to be inherently opaque to the database, whereas a document-oriented system relies on internal structure in the document in order to extract metadata that the database engine uses for further optimization.[11]
  • Document databases contrast strongly with the traditional relational database (RDB). Relational databases generally store data in separate tables that are defined by the programmer, and a single object may be spread across several tables. Document databases store all information for a given object in a single instance in the database, and every stored object can be different from every other. This eliminates the need for object-relational mapping while loading data into the database.[12]
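The points above can be illustrated with a small in-memory sketch. It assumes documents are JSON-compatible dictionaries; the `DocumentStore` class, its `insert` and `find` methods, and the sample records are hypothetical and are not the API of MongoDB or any other product. Unlike a plain key-value store, the store looks inside each document to answer queries, and no two documents need to share the same fields.

```python
import json


class DocumentStore:
    """Stores whole objects as documents and queries their internal fields."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, doc):
        doc_id = self._next_id
        self._next_id += 1
        # Round-trip through JSON to keep an independent, JSON-compatible copy.
        self._docs[doc_id] = json.loads(json.dumps(doc))
        return doc_id

    def find(self, **criteria):
        """Return documents whose top-level fields match every criterion."""
        return [
            doc
            for doc in self._docs.values()
            if all(
                field in doc and doc[field] == value
                for field, value in criteria.items()
            )
        ]


store = DocumentStore()
store.insert({"type": "book", "title": "Example Book", "authors": ["A. Author"]})
store.insert({"type": "film", "title": "Example Film", "runtime_minutes": 90})
store.insert({"type": "book", "title": "Another Book", "publisher": {"name": "Example Press"}})

books = store.find(type="book")
print(len(books))  # the two book documents, each with different fields
```

Each object is stored in one piece, including nested data such as the publisher, so nothing has to be split across tables or reassembled on load.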

Key-Value Database

  • A key–value database, or key–value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary or hash table. Dictionaries contain a collection of objects, or records, which in turn have many different fields within them, each containing data. These records are stored and retrieved using a key that uniquely identifies the record and is used to find the data within the database.[13]
  • Key–value databases work in a very different fashion from the better-known relational databases (RDB). RDBs predefine the data structure in the database as a series of tables containing fields with well-defined data types. Exposing the data types to the database program allows it to apply a number of optimizations. In contrast, key–value systems treat the data as a single opaque collection, which may have different fields for every record.[14]
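The opaque-value idea described above can be shown in a few lines. This is a sketch under stated assumptions: the `KeyValueStore` class and its `set`, `get`, and `delete` methods are hypothetical names loosely modeled on the operations such stores commonly expose, not the interface of Redis or any other system. The store only ever sees keys and bytes; decoding a value is left entirely to the application.

```python
import json


class KeyValueStore:
    """Maps keys to opaque byte strings; the store never inspects a value."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("values are stored as opaque bytes")
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        """Remove a key and report whether it existed."""
        return self._data.pop(key, None) is not None


kv = KeyValueStore()
kv.set("user:1", json.dumps({"name": "example", "visits": 3}).encode("utf-8"))
kv.set("session:abc", b"\x00\x01\x02")  # arbitrary bytes are equally valid

# The application, not the store, knows that "user:1" holds JSON.
user = json.loads(kv.get("user:1").decode("utf-8"))
deleted = kv.delete("session:abc")
missing = kv.get("session:abc")
print(user["visits"], deleted, missing)
```

Because the store cannot see inside a value, it cannot answer a question such as "which users have more than two visits" without the application fetching and decoding records itself.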

Graph Database

  • A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data.[15]
  • A key concept of the system is the graph (or edge or relationship). The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation.[16]
  • Graph databases hold the relationships between data as a priority. Querying relationships is fast because they are perpetually stored in the database. Relationships can be intuitively visualized using graph databases, making them useful for heavily inter-connected data.[17]
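The node, edge, and property model described above can be sketched as follows. The `Graph` class, the `FOLLOWS` and `BLOCKS` relationship names, and the sample people are hypothetical; real graph databases use their own storage formats and query languages. Relationships are stored directly with each node, so following them is a lookup rather than a join.

```python
from collections import deque


class Graph:
    """Nodes carry properties; edges are typed, directed relationships."""

    def __init__(self):
        self.nodes = {}  # node id -> properties dict
        self.edges = {}  # node id -> list of (relationship, target id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, relationship, target):
        self.edges[source].append((relationship, target))

    def neighbors(self, node_id, relationship):
        return [t for rel, t in self.edges[node_id] if rel == relationship]

    def reachable(self, start, relationship, max_hops):
        """Breadth-first traversal along one relationship type."""
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for nxt in self.neighbors(node, relationship):
                if nxt not in seen:
                    seen.add(nxt)
                    found.append(nxt)
                    queue.append((nxt, hops + 1))
        return found


g = Graph()
for name in ("alice", "bob", "carol", "dave"):
    g.add_node(name, kind="person")
g.add_edge("alice", "FOLLOWS", "bob")
g.add_edge("bob", "FOLLOWS", "carol")
g.add_edge("carol", "FOLLOWS", "dave")
g.add_edge("alice", "BLOCKS", "dave")

direct = g.neighbors("alice", "FOLLOWS")
within_two = g.reachable("alice", "FOLLOWS", max_hops=2)
print(direct, within_two)
```

The two-hop query walks stored edges outward from one node and ignores edges of other types, which is the kind of relationship-first access that suits heavily interconnected data.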

Key Terms

ACID (Atomicity, Consistency, Isolation and Durability)
A set of properties of database transactions, commonly provided by an RDBMS, intended to guarantee validity even in the event of errors.[18]
BASE model (Basically Available, Soft state, Eventual consistency)
A set of properties used by many NoSQL databases that trades immediate consistency for availability in order to achieve improved scalability with larger amounts of data.[19]
CAP theorem (Brewer's theorem)
States that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees: Consistency, Availability of resources and Partition tolerance.[20]
NoSQL
Provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases.[21]
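As a point of contrast with the NoSQL models, the atomicity and consistency parts of the ACID term above can be demonstrated with Python's built-in sqlite3 module. This is a minimal sketch: the `accounts` table, the `transfer` function, and the balances are hypothetical, chosen only to show a transaction that either applies completely or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move amount between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "a", "b", 30)       # both updates are committed
failed = transfer(conn, "a", "b", 500)  # would overdraw "a", so the credit to "b" is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits "b" before the CHECK constraint rejects the debit from "a". The rollback removes that credit, so the data never reflects a half-finished transfer.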

See Also

References