Lecture 2: Data or Database Models

Database Systems

J Mwaura

Data Models

Database design focuses on how the database structure will be used to store and manage end-user data

Data model is a relatively simple representation, usually graphical, of more complex real-world data structures

  • A model is an abstraction of a more complex real-world object or event
  • A model helps to understand the complexities of the real-world environment
  • Data model represents data structures and their characteristics, relations, constraints, transformations, and other constructs with the purpose of supporting a specific problem domain

Data Models

Data modeling is an iterative, progressive process. You start with a simple understanding of the problem domain, and as your understanding increases, so does the level of detail of the data model

Good data modeling, yields a blueprint whose elements are;

  • A description of the data structure that will store the end-user data
  • A set of enforceable rules to guarantee the integrity of the data
  • A data manipulation methodology to support the real-world data transformations

Components of Data Models

An entity is a person, place, thing, or event about which data will be collected and stored

  • An entity represents a particular type of object in the real world, which means an entity is distinguishable i.e., each entity occurrence is unique and distinct e.g. Customer

An attribute is a characteristic of an entity

  • e.g. a CUSTOMER entity would be described by attributes such as customer last name, customer first name, customer phone number, customer address, and customer credit limit

Components of Data Models

Relationships describes an association among entities e.g. a relationship exists between customers and agents that can be described as follows: an agent can serve many customers, and each customer may be served by one agent

  1. One-to-many (1:M or 1..*) relationship a customer (the one) may generate many invoices, but each invoice (the many) is generated by only a single customer. The CUSTOMER generates INVOICE relationship is labeled 1:M
  2. Many-to-many (M:N or *..*) relationship a student can take many classes and each class can be taken by many students. The STUDENT takes CLASS relationship is labeled M:N
  3. One-to-one (1:1 or 1..1) relationship a retail company's management structure may require that each of its stores be managed by a single employee. In turn, each store manager, who is an employee, manages only a single store. The EMPLOYEE manages STORE relationship is labeled 1:1

Components of Data Models

Constraints is a restriction placed on the data

Constraints are important because they help to ensure data integrity

Constraints are normally expressed in the form of rules:

  • An employee's salary must have values that are between 6,000 and 350,000
  • A student's grade must be between 0 and 12
  • Each class must have one and only one teacher

Data Models: Business Rules

Business rule description of a policy, procedure, or principle within an organization

  • Helps to identify entities, attributes, relationships, and constraints

Examples of business rules are as follows:

  1. A customer may generate many invoices & an invoice is generated by only one customer
  2. A training session cannot be scheduled for fewer than 10 employees or for more than 30 employees
    • the 1st business rules establish two entities (CUSTOMER and INVOICE) and a 1:M relationship between those two entities
    • The 2nd business rule establishes a constraint (no fewer than 10 people and no more than 30 people), two entities (EMPLOYEE and TRAINING), and also implies a relationship between EMPLOYEE and TRAINING

Translating Business Rules into Data Model Components

Generally, nouns translate into entities

Verbs translate into relationships among entities

Relationships are bidirectional

Two questions to identify the relationship type:

  • How many instances of B are related to one instance of A?
  • How many instances of A are related to one instance of B?

Naming Conventions

Naming occurs during translation of business rules to data model components

Names should make the object unique and distinguishable from other objects

Names should also be descriptive of objects in the environment and be familiar to users

Proper naming:

  • Facilitates communication between parties
  • Promotes self-documentation

Sources of Business Rules

  • Organisation managers
  • Policy makers
  • Department managers
  • Written documentation
    • Procedures
    • Standards
    • Operations manuals
  • Direct interviews with end users

Importance of Business Rules

Standardize company's view of data

Useful as a communications tool between users and designers

Allows the designer to

  • understand the nature, role, and scope of data
  • understand business processes
  • develop appropriate relationship participation rules and constraints

Promotes the creation of an accurate data model

Evolution of Data Models

The quest for better data management has led to several models that attempt to resolve the previous model's critical shortcomings and to provide solutions to ever-evolving data management needs

  • File system
  • Hierarchical and Network Models
  • Relational
  • Object-oriented Object/relational (O/R)
  • XML Hybrid DBMS
  • Key-value store & Column store

Evolution of Data Models

data-models

Data Models

Hierarchical model

  • Contains levels, or segments
  • Depicts a set of one-to-many (1:M) relationships between a parent and its children segments - Each parent can have many children, but each child has only one parent

Network model

  • An early data model that represented data as a collection of record types in 1:M relationships
  • It uses following concepts - still used today;
  • Schema - a logical grouping of database objects, such as tables, indexes, views, and queries, that are related to each other
  • Subschema - a portion of the database that interacts with application programs

Data Models

Network model

  • Data Manipulation Language (DML)- a set of commands that allows an end user to manipulate the data in the database, such as SELECT, INSERT, UPDATE, DELETE, COMMIT, and ROLLBACK
  • Data Definition Language (DDL)- the language that allows a database administrator to define the database structure, schema, and subschema

Relational model

  • It's based on mathematical set theory and represents data as independent relations
  • Each relation (table) is conceptually represented as a two dimensional structure of intersecting rows and columns
  • The relations are related to each other through the sharing of common entity characteristics (values in columns)

Data Models

Relational model

  • Tuple- In the relational model, a table row
  • Relational database management system (RDBMS)- A collection of programs that manages a relational database. The RDBMS software translates a user's logical requests (queries) into commands that physically locate and retrieve the requested data
  • Relational diagram- A graphical representation of a relational database's entities, the attributes within those entities, and the relationships among the entities
  • Entity Relationship (ER) model (ERM)- A data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams

Data Models: Relational Model

data-models

Data Models: Relational Model

data-models

Data Models

Relational model

  • Entity Relationship Diagram (ERD)- A diagram that depicts an entity relationship model's entities, attributes, and relations
  • Entity instance (entity occurrence)- A row in a relational table
  • Entity set- A collection of like entities
  • Connectivity- The type of relationship between entities. Classifications include 1:1, 1:M, and M:N
  • Types of relationships using three ER notations:
    1. The original Chen notation
    2. The Crow's Foot notation
    3. The newer class diagram notation, which is part of the Unified Modeling Language (UML)

Relational Model: ER Model Notations

data-models

Data Models: Object-Oriented (OO) model

Class - A collection of similar objects with shared structure (attributes) and behavior (methods). It encapsulates an object's data representation and a method's implementation

Class hierarchy - The organization of classes in a hierarchical tree in which each parent class is a superclass and each child class is a subclass

Inheritance - In the object-oriented data model, its the ability of an object to inherit the data structure and methods of the classes above it in the class hierarchy

Data Models: Object-Oriented (OO) model

Method - is a named set of instructions to perform an action Methods represent real-world actions, and are invoked through messages

Unified Modeling Language (UML) - A language based on object- oriented concepts that provides tools such as diagrams and symbols to graphically model a system

Class diagram - A diagram used to represent data and their relationships in UML object notation

Comparison: OO, UML & ER Models

oo-models

Data Models: Object/Relational & XML

Extended Relational Data Model (ERDM) - A model that includes the object-oriented model's best features in an inherently simpler relational database structural environment

Object/relational Database Management System (O/R DBMS)- A DBMS based on the extended relational model (ERDM). The ERDM constitutes the relational model's response to the OODM

Extensible Markup Language (XML) - A metalanguage used to represent and manipulate data elements. It permits the manipulation of a document's data elements & facilitates documents sharing over the Internet

Emerging Data Models: Big Data

The need to find new, better ways to manage large amounts of web-generated data & derive business insight from it, while providing high performance & scalability at a reasonable cost

3 Vs - 3 basic characteristics of Big Data databases: volume, velocity, and variety

Big Data technologies; Hadoop, MapReduce, & NoSQL databases

MapReduce - An open-source application programming interface (API) that provides fast data analytics services

Big Data: Hadoop

Hadoop - A Java based, open source, high speed, fault-tolerant distributed storage and computational framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to store and process data

Hadoop Distributed File System (HDFS) - A highly distributed, fault- tolerant file storage system designed to manage large amounts of data at high speeds. Consists of name, data & client nodes

  • Name nodes- stores all the metadata about the file system
  • Data nodes- stores fixed-size data blocks (it's a replicative node)
  • client nodes- acts as the interface between the user application and the HDFS

Big Data: NoSQL

A new generation of database management systems that is not based on the traditional relational database mode.Its general characteristics includses;

  • Are based on key-value data model - hence the name NoSQL
  • Support distributed database architectures
  • Provide high scalability, high availability, & fault tolerance
  • Support very large amounts of sparse data
  • Are geared toward performance rather than transaction consistency

NoSQL umbrella; document databases, graph stores, column stores, and key-value stores

Some successful of products; Amazon's SimpleDB, Google's BigTable & Apache's Cassandra

Big Data: NoSQL

Key-value - A data model based on a structure composed of two data elements: a key and a value, in which every key has a corresponding value or set of values

  • The key-value data model is also called the associative or attribute-value data model

Sparse data - A case in which the number of table attributes is very large but the number of actual data instances is low

Eventual consistency - A model for database consistency in which updates to the database will propagate through the system so that all data copies will be consistent eventually

NoSQL: A Key-Value Representation

kv-models

Wrap Up: Data Models Evolution

oo-models

Data model: (De)merits

dems-models

Data Model: Assignments

Design a conceptual model for COETEC college

Design a conceptual model for GEGIS student registration

Design a conceptual model for GEGIS student Class scheduling

Design a conceptual model for GEGIS department student's project management

End of Lecture 2

Database Systems

That's it!

Queries about this Lesson, please send them to: jmwaura@jkuat.ac.ke

*References*

  • Database Systems: Design, Implementation, and Management, 12th ed. Carlos Coronel & Steven Morris
  • Database Modeling and Design; Logical Design, 5th ed. Taby Teorey et.al
  • Fundamentals of database systems, 6th ed. Ramez Elmasri & Shamkant B. Navathe
Courtesy of
Database Systems