Designed for all purposes
ACID
Strong consistency, concurrency, recovery
Mathematical background - set theory
Structured Query Language (SQL)
Many supporting tools, e.g., Reporting Services, Entity Framework
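To make the ACID bullet concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for any relational DBMS, and the account table and transfer() helper are hypothetical. A transfer either applies both updates or neither: when the CHECK constraint fails, the whole transaction is rolled back.

```python
import sqlite3

# In-memory SQLite database standing in for a relational DBMS (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Hypothetical helper: move `amount` from src to dst as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if any statement raises
            # Credit first, so a later failure has something to undo.
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # CHECK (balance >= 0) failed: the credit to dst is rolled back too.
        return False

ok = transfer(conn, "A", "B", 30)       # both updates are committed
failed = transfer(conn, "A", "B", 500)  # would overdraw A, so nothing changes
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

Atomicity and consistency show up together here: the constraint keeps the data valid, and the rollback guarantees no half-finished transfer is ever visible.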
Era of distributed computing
...but relational databases were not built for distributed applications
Because...
Joins are expensive
Hard to scale horizontally
Object-relational impedance mismatch occurs
Expensive (product cost, hardware, maintenance)
And...
It's weak in:
The spread of web applications and services handling Big Data
Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making
Mobile use of internet
Cloud computing
Collaboration
IP-based communication
Social media
Video streaming & media distribution
Visibility - making big data accessible in a timely fashion to relevant stakeholders
Discover and analyze information - data fusion to generate new information and patterns
Segmentation and customizations - creating highly specific segmentations and tailoring products and services, e.g., segmentation of customers
Aid decision making - improve decision making, minimize risks, and unearth valuable insights, e.g., automated fraud alert systems in credit card processing
Innovation - innovation of new ideas in the form of products and services
Policies and procedures - compliance with data privacy, security, and intellectual property requirements, and protection of big data
Access to data - access to third-party data can pose legal and contractual challenges
Technology and techniques - inadequacy of legacy systems to deal with Big Data & lack of experienced resources in newer technologies
Structure of Big Data - unstructured data such as images, videos, logs, etc.
Data storage & processing - greater memory needs & new analysis algorithms and analytics software
Provides:
NoSQL avoids:
Data that requires a flexible schema
When ACID support is not really necessary
Object-relational impedance mismatch - conceptual and technical difficulties
Need for distributed or scalable applications
Logging data from distributed sources
Storing events/temporal data - shopping carts, wish lists etc.
Polyglot persistence
i.e., using the best data store for the nature of each kind of data
Financial data
Data requiring strict ACID compliance
Business-critical data
In NoSQL Databases
NoSQL database models
Each database has its own query language
BASE - Basically Available, Soft state, Eventually consistent - allows replicated computer nodes to temporarily hold diverging data versions and only be updated with a delay
CAP theorem by Eric Brewer - states that in any massive distributed data management system, at most two of the three properties consistency, availability, and partition tolerance can be guaranteed at the same time
But we need a distributed database system with these features: fault tolerance, high availability, consistency, and scalability
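A toy model of BASE behavior, assuming two hypothetical replicas n1 and n2 and a manual replicate() step that stands in for asynchronous, delayed propagation. It is not how any particular product implements replication; it only shows a replica temporarily returning stale data (soft state) and converging later (eventual consistency).

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)

class EventuallyConsistentStore:
    """Hypothetical toy: writes land on one replica at once, others catch up on replicate()."""

    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}
        self.pending = []  # (key, value) updates not yet propagated

    def write(self, via, key, value):
        # Basically available: the contacted replica accepts the write immediately.
        self.replicas[via].data[key] = value
        self.pending.append((key, value))

    def read(self, via, key):
        # Soft state: a replica that has not caught up returns an old value (or None).
        return self.replicas[via].data.get(key)

    def replicate(self):
        # Eventually consistent: after propagation every replica holds the same data.
        for key, value in self.pending:
            for replica in self.replicas.values():
                replica.data[key] = value
        self.pending.clear()

store = EventuallyConsistentStore(["n1", "n2"])
store.write("n1", "cart:42", ["book"])
before = store.read("n2", "cart:42")  # n2 has not seen the write yet
store.replicate()
after = store.read("n2", "cart:42")
```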
ACID | BASE |
---|---|
Consistency is the top priority (strong consistency) | Consistency is ensured only eventually (weak consistency) |
Mostly pessimistic concurrency control methods with locking protocols | Mostly optimistic concurrency control methods with nuanced setting options |
Availability is ensured for moderate volumes of data | High availability and partition tolerance for massive distributed data storage |
Some integrity constraints (e.g., referential integrity) are ensured by the database schema | Some integrity constraints (e.g., referential integrity) are ensured by the database schema |
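The concurrency-control row of the table can be sketched with a hypothetical VersionedRecord class. Pessimistic control holds a lock for the whole read-modify-write; optimistic control reads a version number without locking and commits only if that version is unchanged, otherwise the caller must retry. Real systems offer many variants, so treat this purely as an illustration.

```python
import threading

class VersionedRecord:
    """Hypothetical record supporting both concurrency-control styles."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()

    # Pessimistic: lock for the entire read-modify-write, so conflicts cannot happen.
    def update_pessimistic(self, fn):
        with self._lock:
            self.value = fn(self.value)
            self.version += 1

    # Optimistic, step 1: read value and version without holding a lock.
    def read(self):
        return self.value, self.version

    # Optimistic, step 2: commit only if nobody else committed in the meantime.
    def try_commit(self, new_value, expected_version):
        with self._lock:  # guards only this short check-and-set
            if self.version != expected_version:
                return False  # conflict: caller re-reads and retries
            self.value = new_value
            self.version += 1
            return True

rec = VersionedRecord(10)
v1, ver1 = rec.read()                   # client 1 reads version 0
v2, ver2 = rec.read()                   # client 2 also reads version 0
first = rec.try_commit(v1 + 5, ver1)    # succeeds, version becomes 1
second = rec.try_commit(v2 * 2, ver2)   # rejected, it worked on a stale version
rec.update_pessimistic(lambda v: v + 1)
```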
At the hardware level, CPUs work with registers based on this model
Programming languages use the same concept in associative arrays
Simplest database model possible - a data store that stores a data object (the value) under another data object (the key)
Uses simple commands, e.g., SET, GET
SET User:U17547:firstname John
SET User:U17547:lastname Nzue
SET User:U17547:email john.nzue@jkuat.net
GET User:U17547:email
>>john.nzue@jkuat.net
Key-value stores do not support any kind of structure, neither nesting nor references
Structure can only be emulated in the keys themselves, using special characters such as colons or slashes as separators (e.g., User:U17547:email)
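Because the store has no notion of records, a "user" exists only by key-naming convention. The sketch below uses a plain Python dict in place of a real key-value store; set_value(), get_value() and fields_of() are hypothetical helpers that mirror the SET/GET commands above, with fields_of() regrouping a record on the client side by key prefix.

```python
store = {}  # stand-in for a key-value store: flat keys, opaque values

def set_value(key, value):
    """Hypothetical equivalent of SET key value."""
    store[key] = value

def get_value(key):
    """Hypothetical equivalent of GET key (None if the key is absent)."""
    return store.get(key)

set_value("User:U17547:firstname", "John")
set_value("User:U17547:lastname", "Nzue")
set_value("User:U17547:email", "john.nzue@jkuat.net")

def fields_of(prefix):
    """Rebuild a 'record' from keys sharing a prefix; the store itself knows nothing about it."""
    return {k[len(prefix):]: v for k, v in store.items() if k.startswith(prefix)}

email = get_value("User:U17547:email")
user = fields_of("User:U17547:")
```

The colon convention is purely an agreement between applications; nothing stops another client from writing a key that breaks it.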
Key-value store properties
Example: Amazon DynamoDB, Redis
Data has no required format; values may have any format
Data model: (key, value) pairs
Basic Operations: Insert(key, value), Fetch(key), Update(key, value), Delete(key)
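The four basic operations can be sketched as a small in-memory class. The class and method names are hypothetical, and rejecting inserts of existing keys and updates of missing ones is an assumption made for clarity; some stores (Redis SET, for example) simply overwrite.

```python
from typing import Any, Dict, Hashable

class KeyValueStore:
    """Hypothetical in-memory (key, value) store; values are opaque and may have any format."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, Any] = {}

    def insert(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            raise KeyError(f"key already exists: {key!r}")
        self._data[key] = value

    def fetch(self, key: Hashable) -> Any:
        return self._data[key]  # raises KeyError if the key is absent

    def update(self, key: Hashable, value: Any) -> None:
        if key not in self._data:
            raise KeyError(f"no such key: {key!r}")
        self._data[key] = value

    def delete(self, key: Hashable) -> None:
        del self._data[key]  # raises KeyError if the key is absent

kv = KeyValueStore()
kv.insert("session:9", {"cart": ["book"]})  # a dict value
kv.insert("counter", 1)                     # an int value: no required format
kv.update("counter", 2)
kv.delete("session:9")
```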
Often, the data matrix needs to be structured with a schema
Column-family stores enhance the key-value concept by providing additional structure
The column is the smallest unit of data
It is a tuple that contains a name, a value and a timestamp
Stores data not in tables but in enhanced, structured multidimensional key spaces called column families
Example: Google Bigtable, Cassandra
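A minimal sketch of the column-family layout, assuming a nested mapping row key -> column family -> column name, where each write appends a new (name, value, timestamp) column version and reads return the newest one. The family names profile and contact and the helpers put() and get_latest() are hypothetical; real stores such as Cassandra differ considerably in detail.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

class Column(NamedTuple):
    """The smallest unit of data: a name, a value and a timestamp."""
    name: str
    value: str
    timestamp: int

# row key -> column family -> column name -> list of Column versions
table = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

def put(row: str, family: str, name: str, value: str, ts: int) -> None:
    """Hypothetical write: add a new version instead of overwriting the old one."""
    table[row][family][name].append(Column(name, value, ts))

def get_latest(row: str, family: str, name: str) -> Optional[str]:
    """Hypothetical read: return the value with the highest timestamp, or None."""
    # .get() on a defaultdict does not create missing entries.
    versions = table.get(row, {}).get(family, {}).get(name, [])
    if not versions:
        return None
    return max(versions, key=lambda c: c.timestamp).value

put("U17547", "profile", "firstname", "John", ts=1)
put("U17547", "contact", "email", "john.nzue@jkuat.net", ts=2)
put("U17547", "contact", "email", "j.nzue@example.com", ts=5)  # hypothetical newer value
```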
Column-family store properties
Pair each key (document ID) with a complex data structure known as a document, e.g., in JSON/BSON format such as {"hello":"world"}
Indexes are typically implemented as B-trees
Document stores are completely schema-free - the drawback of having no schema is the lack of referential integrity & normalization
Documents can contain many different key-value pairs, or key-array pairs, or even nested documents
Document store properties
Examples: MongoDB, CouchDB, JSON stores
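A document store can be sketched as a collection of JSON-like documents with no enforced schema. In this hypothetical example, documents in the same collection carry different fields, one holds a nested sub-document (the city value is made-up sample data), and find() is a hypothetical helper, not MongoDB's API, that matches on a dot-separated path.

```python
import json

_MISSING = object()  # marks a path that does not exist in a document

# A collection is just a list of documents; nothing forces them to share fields.
collection = [
    {"_id": 1, "hello": "world"},
    {"_id": 2, "author": "mike", "text": "...", "address": {"city": "Nairobi"}},
    {"_id": 3, "author": "eliot", "text": "...", "tags": ["mongodb"]},
]

def get_path(doc, path):
    """Follow a dot-separated path into nested documents."""
    node = doc
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node

def find(coll, path, value):
    """Return the documents whose field at `path` equals `value`."""
    return [doc for doc in coll if get_path(doc, path) == value]

by_city = find(collection, "address.city", "Nairobi")
by_author = find(collection, "author", "eliot")
as_json = json.dumps(collection)  # documents serialize directly to JSON
```

Without a schema, a misspelled field name simply matches nothing rather than raising an error, which is the flip side of the flexibility noted above.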
The graph model has a structuring schema, as opposed to the first three models, which forgo database schemas and referential integrity for the sake of easier fragmentation (sharding)
Data is stored as nodes & edges, which belong to a node type or edge type, respectively, and contain data in the form of attribute-value pairs
The relationships between data objects are explicitly present as edges, and referential integrity is ensured by the DBMS
Based on Graph Theory
Scale vertically, no clustering
You can use graph algorithms easily
Supports transactions
Observes ACID
Graph data model properties
The advantage of the graph database is the index-free adjacency property - for every node, the database system can find its direct neighbors without having to consider all edges
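Index-free adjacency can be illustrated by storing each node's outgoing edges on the node itself, so finding neighbors never scans the full edge set. The Node class, connect(), neighbors() and shortest_hops() are hypothetical, as are the sample nodes; breadth-first search stands in for the graph algorithms mentioned above.

```python
from collections import deque

class Node:
    """Hypothetical graph node: a type, attribute-value pairs, and its own edge list."""

    def __init__(self, node_type, **attrs):
        self.node_type = node_type
        self.attrs = attrs
        self.out_edges = []  # (edge_type, target Node), kept on the node itself

def connect(src, edge_type, dst):
    src.out_edges.append((edge_type, dst))

def neighbors(node):
    # Index-free adjacency: follow this node's own edge list, no global edge scan.
    return [dst for _, dst in node.out_edges]

def shortest_hops(start, goal):
    """Breadth-first search: number of edges on the shortest path, or None if unreachable."""
    seen = {id(start)}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node is goal:
            return dist
        for nxt in neighbors(node):
            if id(nxt) not in seen:
                seen.add(id(nxt))
                queue.append((nxt, dist + 1))
    return None

alice = Node("Person", name="Alice")
bob = Node("Person", name="Bob")
carol = Node("Person", name="Carol")
course = Node("Course", title="Databases")
connect(alice, "KNOWS", bob)
connect(bob, "KNOWS", carol)
connect(carol, "ENROLLED_IN", course)
```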
Business intelligence - decision making based on facts gathered from the analysis of the available data
Data analysis is often complex due to the heterogeneity, volatility, and fragmentation of the data across applications
Business intelligence makes 3 demands on the data to be analyzed
Data warehouse - a data warehouse or DWH is a distributed information system with the following properties:
Integrated - data from various sources and applications (source systems) is periodically integrated and filed in a uniform schema
Read only - data in the data warehouse is not changed once it is written
Historicized - thanks to a time axis, data can be evaluated for different points in time
Analysis-oriented - all data on different subject areas like customers, contracts, or products is fully available in one place
Decision support - the information in data cubes serves as a basis for management decisions
Classification
Selection
Prognosis
Knowledge acquisition
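The read-only and historicized properties of a data warehouse can be sketched with an append-only history that is queried as of a given date. The HistoricizedAttribute class and the customer-segment values are hypothetical; a real DWH would keep such time axes in its dimension or fact tables.

```python
from bisect import bisect_right
from datetime import date

class HistoricizedAttribute:
    """Hypothetical append-only history: old versions are never changed, new ones are added."""

    def __init__(self):
        self._dates = []   # valid-from dates, strictly increasing
        self._values = []

    def load(self, valid_from, value):
        # Periodic load from a source system; earlier entries stay untouched (read only).
        if self._dates and valid_from <= self._dates[-1]:
            raise ValueError("loads must arrive in time order")
        self._dates.append(valid_from)
        self._values.append(value)

    def as_of(self, day):
        # Historicized: return the value that was valid on `day`, or None before the first load.
        i = bisect_right(self._dates, day)
        return self._values[i - 1] if i else None

segment = HistoricizedAttribute()
segment.load(date(2020, 1, 1), "retail")
segment.load(date(2022, 7, 1), "premium")
```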
MongoDB properties:
Flexible schema, e.g., {"author":"mike","text":"..."} can change to {"author":"eliot","text":"...","tags":["mongodb"]}
MongoDB is less good at:
Courtesy of …