Lecture 8: Distributed Database Management

Database Systems

J Mwaura

DDBMS

Distributed database management system (DDBMS)

  • A DBMS that supports a database distributed across several different sites
  • A DDBMS governs the storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites

An example of a Centralized Database Management System

Centralized Data Access: Problems

  • Performance degradation because of a growing number of remote locations over greater distances
  • High costs associated with maintaining and operating large central (mainframe) database systems and physical infrastructure
  • Reliability problems created by dependence on a central site (single point of failure syndrome) and the need for data replication
  • Scalability problems associated with the physical limits imposed by a single location, such as physical space, temperature conditioning, and power consumption
  • Organizational rigidity imposed by the database, which means it might not support the flexibility and agility required by modern global organizations

Distributed Data Access: Factors

Growing acceptance of the Internet as the platform for data access and distribution

Mobile wireless revolution: users access data from geographically dispersed locations and require varied data exchanges in multiple formats (data, voice, video, music, and pictures)

Accelerated growth of companies using applications as a service

Increased focus on mobile business intelligence: as companies use social networks to get closer to customers, the need for on-the-spot decision making increases

Emphasis on Big Data analytics: today's organizations are investing in ways to harvest data to discover new ways to reach customers effectively and efficiently

Distributed Processing & Distributed Databases

Distributed processing

  • Sharing the logical processing of a database over two or more sites connected by a network
  • A distributed processing system uses only a single-site database but shares the processing tasks among several sites

Distributed database

  • A logically related database that is stored in two or more physically independent sites
  • In a distributed database system, a database is composed of several parts known as database fragments

Database fragment

  • A subset of a distributed database. Although the fragments may be stored at different sites within a computer network, the set of all fragments is treated as a single database
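
The idea that the set of all fragments is treated as a single database can be sketched in a few lines of Python. This is only an illustration: the site names, the CUSTOMER rows, and the helper functions are hypothetical and do not come from the lecture. The sketch assumes a horizontal split (whole rows per site), which is one possible way to fragment a table.

```python
# Hypothetical data: a CUSTOMER table split horizontally by region.
# Site names and rows are illustrative assumptions, not from the lecture.
fragments = {
    "site_nairobi": [
        {"cust_id": 1, "name": "Achieng", "region": "East"},
        {"cust_id": 2, "name": "Kamau", "region": "East"},
    ],
    "site_lagos": [
        {"cust_id": 3, "name": "Bello", "region": "West"},
    ],
}


def global_customers(fragments):
    """Reconstruct the single logical table as the union of all fragments."""
    rows = []
    for site_rows in fragments.values():
        rows.extend(site_rows)
    return sorted(rows, key=lambda r: r["cust_id"])


def find_customer(fragments, cust_id):
    """Look up a row without the caller naming the site that stores it."""
    for row in global_customers(fragments):
        if row["cust_id"] == cust_id:
            return row
    return None
```

A caller asks `find_customer(fragments, 3)` and never mentions `site_lagos`; where the row is stored stays hidden behind the lookup.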

Distributed Processing Environment

Distributed Database Environment

Characteristics of DDBMS

Application interface to interact with the end user, application programs, and other DBMSs within the distributed database

Validation to analyze data requests for syntax correctness

Transformation to decompose complex requests into atomic data request components

Query optimization to find the best access strategy (which database fragments must be accessed by the query, and how must data updates, if any, be synchronized?)

Mapping to determine the data location of local and remote fragments

I/O interface to read or write data from or to permanent local storage

Formatting to prepare the data for presentation to the end user or to an application program

Security to provide data privacy at both local & remote databases

Backup and recovery to ensure the availability and recoverability of the database in case of a failure

DB administration features for the database administrator

Concurrency control to manage simultaneous data access and to ensure data consistency across database fragments in the DDBMS

Transaction management to ensure that the data moves from one consistent state to another
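
Three of the functions listed above (validation, transformation, and mapping) can be pictured as stages of one small pipeline. The sketch below is a simplified illustration only: the request format, the `CATALOG` dictionary, and all function names are assumptions made for this example, not part of any real DDBMS.

```python
# Hypothetical catalog: which site stores which fragment (assumed names).
CATALOG = {"CUSTOMER_EAST": "site_a", "CUSTOMER_WEST": "site_b"}


def validate(request):
    """Validation: reject requests that are structurally malformed."""
    fragments = request.get("fragments")
    if not isinstance(fragments, list) or not fragments:
        raise ValueError("request must name at least one fragment")
    return request


def transform(request):
    """Transformation: split a request into one atomic request per fragment."""
    return [{"op": request["op"], "fragment": f} for f in request["fragments"]]


def map_to_sites(atomic_requests, catalog=CATALOG):
    """Mapping: attach the site that holds each fragment."""
    mapped = []
    for req in atomic_requests:
        if req["fragment"] not in catalog:
            raise KeyError(f"unknown fragment: {req['fragment']}")
        mapped.append({**req, "site": catalog[req["fragment"]]})
    return mapped


def plan(request):
    """Run the three stages in order and return the per-site requests."""
    return map_to_sites(transform(validate(request)))
```

Calling `plan({"op": "read", "fragments": ["CUSTOMER_EAST", "CUSTOMER_WEST"]})` yields two atomic requests, one tagged for each site. Query optimization, security, and the remaining functions are left out to keep the sketch short.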

A Fully Distributed Database Management System

DDBMS Components

Computer workstations or remote devices (sites or nodes) that form the network system

  • DDBMS must be independent of the computer system hardware

Network hardware and software components that reside in each workstation or device

  • The network components allow all sites to interact and exchange data

Communications media that carry the data from one node to another

  • DDBMS must be communications media-independent

Transaction processor (TP) is the software component found in each computer or device that requests data

  • The transaction processor receives and processes the application’s remote and local data requests
  • The TP is also known as the application processor (AP) or the transaction manager (TM)

Data processor (DP) is the software component residing on each computer or device that stores and retrieves data located at the site

  • The DP is also known as the data manager (DM)
  • A data processor may even be a centralized DBMS
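
The division of labour between TP and DP can be sketched as two small Python classes. This is a toy model under stated assumptions: the class names mirror the lecture's terms, but the key-value storage, the `has` and `read` methods, and the "local site first" lookup order are hypothetical choices made for illustration.

```python
class DataProcessor:
    """DP: stores and retrieves the data located at one site."""

    def __init__(self, site, rows):
        self.site = site
        self._rows = dict(rows)  # key -> value, local to this site

    def has(self, key):
        return key in self._rows

    def read(self, key):
        return self._rows.get(key)


class TransactionProcessor:
    """TP: receives an application's request and forwards it to the right DP."""

    def __init__(self, local_site, data_processors):
        self.local_site = local_site
        self.dps = {dp.site: dp for dp in data_processors}

    def read(self, key):
        # Assumption: try the local DP first, then the remote ones.
        ordered = sorted(self.dps.values(),
                         key=lambda dp: dp.site != self.local_site)
        for dp in ordered:
            if dp.has(key):
                return dp.site, dp.read(key)
        return None, None
```

The application talks only to the TP; whether the answer came from the local DP or a remote one is visible here only because `read` returns the site name for demonstration purposes.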

Distributed Database System Components

Levels of Data and Process Distribution

Single-site processing, single-site data (SPSD)

  • A scenario in which all processing is done on a single host computer and all data is stored on the host computer's local disk

Multiple-site processing, single-site data (MPSD)

  • A scenario in which multiple processes run on different computers sharing a single data repository

Multiple-site processing, multiple-site data (MPMD)

  • A scenario describing a fully distributed database management system with support for multiple data processors and transaction processors at multiple sites
  • Classifications:
      • Homogeneous DDBMS - A system that integrates only one type of centralized database management system over a network
      • Heterogeneous DDBMS - A system that integrates different types of centralized database management systems over a network
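
The three levels and the two classifications above reduce to simple counting rules, sketched below. The function names and the way a setup is described (two integers, a list of product names) are assumptions for this example. The combination of single-site processing with multiple-site data is not covered in the lecture, so the sketch rejects it rather than guessing a label.

```python
def distribution_level(processing_sites, data_sites):
    """Classify a setup by how many sites process and how many store data."""
    if processing_sites < 1 or data_sites < 1:
        raise ValueError("need at least one processing site and one data site")
    if data_sites == 1:
        return "SPSD" if processing_sites == 1 else "MPSD"
    if processing_sites == 1:
        # Not one of the three levels described in the lecture.
        raise ValueError("single-site processing with multiple-site data "
                         "is not covered here")
    return "MPMD"


def ddbms_kind(dbms_products):
    """Homogeneous if every site runs the same DBMS product, else heterogeneous."""
    if not dbms_products:
        raise ValueError("need at least one site")
    return "homogeneous" if len(set(dbms_products)) == 1 else "heterogeneous"
```

For instance, `distribution_level(3, 1)` describes several computers sharing one data repository, and `ddbms_kind(["DBMS-X", "DBMS-Y"])` describes sites running different products.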

Distributed Database Transparency Features

Distribution transparency

  • Allows a distributed database to be treated as a single logical database

Transaction transparency

  • Ensures that the transaction will be either entirely completed or aborted, thus maintaining database integrity
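
One common way to obtain this all-or-nothing behaviour across sites is a two-phase commit. The lecture does not name a protocol, so the sketch below is an assumption brought in for illustration, and it is heavily simplified: it has no logging, timeouts, or coordinator recovery. The `Site` class and its `can_commit` flag are hypothetical.

```python
class Site:
    """A hypothetical participant site holding one local value."""

    def __init__(self, name, value, can_commit=True):
        self.name = name
        self.value = value
        self.can_commit = can_commit  # simulates a site that must refuse
        self._pending = None

    def prepare(self, new_value):
        """Phase 1: stage the change and vote on whether it can be committed."""
        if not self.can_commit:
            return False
        self._pending = new_value
        return True

    def commit(self):
        """Phase 2 (all voted yes): make the staged change permanent."""
        self.value = self._pending
        self._pending = None

    def abort(self):
        """Phase 2 (any vote was no): discard the staged change."""
        self._pending = None


def run_transaction(updates):
    """Apply every (site, new_value) update, or none of them."""
    votes = [site.prepare(new_value) for site, new_value in updates]
    if all(votes):
        for site, _ in updates:
            site.commit()
        return True
    for site, _ in updates:
        site.abort()
    return False
```

If any one site votes no in the first phase, every site discards its staged change, so no site ends up holding a partial update.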

Failure transparency

  • Ensures that the system will continue to operate in the event of a node or network failure. Functions that were lost because of the failure will be picked up by another network node
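
The "picked up by another network node" behaviour can be sketched as a read that falls back to the next node when one is down. The sketch assumes each node holds a full copy of the data (replication), which is an assumption made here so that another node is able to answer. The `Node` class, the `NodeDown` exception, and the simple in-order retry are all hypothetical.

```python
class NodeDown(Exception):
    """Raised by a node that is unreachable (simulated)."""


class Node:
    def __init__(self, name, data, up=True):
        self.name = name
        self.data = dict(data)
        self.up = up  # set to False to simulate a node failure

    def read(self, key):
        if not self.up:
            raise NodeDown(self.name)
        return self.data[key]


def read_with_failover(nodes, key):
    """Try each replica in order; return (node_name, value) from the first that answers."""
    for node in nodes:
        try:
            return node.name, node.read(key)
        except NodeDown:
            continue  # this node failed; the next node picks up the request
    raise NodeDown("all replicas unavailable")
```

The caller issues one read and receives a value as long as at least one replica is up; the failed node is skipped without the caller handling the failure itself.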

Performance transparency

  • The system should be able to scale out in a transparent manner or increase performance capacity by adding more transaction or data-processing nodes, without affecting the overall performance of the system

Heterogeneity transparency

  • Allows the integration of several different local DBMSs (relational, network, and hierarchical) under a common, or global, schema

Read on

Distributed Database Design

  • Data Fragmentation
  • Data Replication
  • Data Allocation

The CAP Theorem

C. J. Date's 12 Commandments for Distributed Databases

End of Lecture 8

Database Systems

That's it!

Please send queries about this lesson to: jmwaura@jkuat.ac.ke

*References*

  • Database Systems: Design, Implementation, and Management, 12th ed. Carlos Coronel & Steven Morris
  • Database Modeling and Design: Logical Design, 5th ed. Toby Teorey et al.
  • Fundamentals of Database Systems, 6th ed. Ramez Elmasri & Shamkant B. Navathe