Lecture 2: Data or Database Models

GIS Database Systems

J Mwaura

Data Models

Database design focuses on how the database structure will be used to store and manage end-user data

Data model is a relatively simple representation, usually graphical, of more complex real-world data structures

  • A model is an abstraction of a more complex real-world object or event
  • A model helps to understand the complexities of the real-world environment
  • Data model represents data structures and their characteristics, relations, constraints, transformations, and other constructs with the purpose of supporting a specific problem domain

Data Models

Data modeling is an iterative, progressive process. You start with a simple understanding of the problem domain, and as your understanding increases, so does the level of detail of the data model

Good data modeling, yields a blueprint whose components are;

  • A description of the data structure that will store the end-user data
  • A set of enforceable rules to guarantee the integrity of the data
  • A data manipulation methodology to support the real-world data transformations

Data Models: Basic Building Blocks

An entity is a person, place, thing, or event about which data will be collected and stored

  • An entity represents a particular type of object in the real world, which means an entity is distinguishable i.e., each entity occurrence is unique and distinct e.g. Customer

An attribute is a characteristic of an entity

  • e.g. a CUSTOMER entity would be described by attributes such as customer last name, customer first name, customer phone number, customer address, and customer credit limit

Data Models: Basic Building Blocks

Relationships describes an association among entities e.g. a relationship exists between customers and agents that can be described as follows: an agent can serve many customers, and each customer may be served by one agent

  1. One-to-many (1:M or 1..*) relationship a customer (the one) may generate many invoices, but each invoice (the many) is generated by only a single customer. The CUSTOMER generates INVOICE relationship is labeled 1:M
  2. Many-to-many (M:N or *..*) relationship a student can take many classes and each class can be taken by many students. The STUDENT takes CLASS relationship is labeled M:N
  3. One-to-one (1:1 or 1..1) relationship a retail company's management structure may require that each of its stores be managed by a single employee. In turn, each store manager, who is an employee, manages only a single store. The EMPLOYEE manages STORE relationship is labeled 1:1

Data Models: Basic Building Blocks

Constraints is a restriction placed on the data

Constraints are important because they help to ensure data integrity

Constraints are normally expressed in the form of rules:

  • An employee's salary must have values that are between 6,000 and 350,000
  • A student's grade must be between 0 and 12
  • Each class must have one and only one teacher

Data Models: Business Rules

Business rule description of a policy, procedure, or principle within an organization

  • Helps to identify entities, attributes, relationships, and constraints

Examples of business rules are as follows:

  1. A customer may generate many invoices & an invoice is generated by only one customer
  2. A training session cannot be scheduled for fewer than 10 employees or for more than 30 employees
    • the 1st business rules establish two entities (CUSTOMER and INVOICE) and a 1:M relationship between those two entities
    • The 2nd business rule establishes a constraint (no fewer than 10 people and no more than 30 people), two entities (EMPLOYEE and TRAINING), and also implies a relationship between EMPLOYEE and TRAINING

Sources of Business Rules

Company managers

Policy makers

Department managers

Written documentation; procedures, standards & operations manuals

Direct interviews with end users

Importance of Business Rules

  1. Standardize company's view of data
  2. Communication tool between users & designers
  3. Allows the designers to
    • understand the nature, role, & scope of data
    • understand business processes
    • develop appropriate relationships, participation, rules & constraints
  4. Promotes the creation of an accurate data model

Data Models: Business Rules

Read on;

  • Translating Business Rules into Data Model Components
  • Naming Conventions

Evolution of Data Models

The quest for better data management has led to several models that attempt to resolve the previous model’s critical shortcomings and to provide solutions to ever-evolving data management needs

  • File system
  • Hierarchical and Network Models
  • Relational
  • Object-oriented Object/relational (O/R)
  • XML Hybrid DBMS
  • Key-value store Column store

Evolution of Data Models

data-models

Data Models

Hierarchical model

  • Contains levels, or segments
  • Depicts a set of one-to-many (1:M) relationships between a parent and its children segments - Each parent can have many children, but each child has only one parent

Network model

  • An early data model that represented data as a collection of record types in 1:M relationships
  • It uses following concepts - still used today;
  • Schema - a logical grouping of database objects, such as tables, indexes, views, and queries, that are related to each other
  • Subschema - a portion of the database that interacts with application programs

Data Models

Network model

  • Data Manipulation Language (DML)- a set of commands that allows an end user to manipulate the data in the database, such as SELECT, INSERT, UPDATE, DELETE, COMMIT, and ROLLBACK
  • Data Definition Language (DDL)- the language that allows a database administrator to define the database structure, schema, and subschema

Relational model

  • It's based on mathematical set theory and represents data as independent relations
  • Each relation (table) is conceptually represented as a two dimensional structure of intersecting rows and columns
  • The relations are related to each other through the sharing of common entity characteristics (values in columns)

Data Models

Relational model

  • Tuple- In the relational model, a table row
  • Relational database management system (RDBMS)- A collection of programs that manages a relational database. The RDBMS software translates a user's logical requests (queries) into commands that physically locate and retrieve the requested data
  • Relational diagram- A graphical representation of a relational database's entities, the attributes within those entities, and the relationships among the entities
  • Entity Relationship (ER) model (ERM)- A data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams

Data Models: Relational Model

data-models

Data Models: Relational Model

data-models

Data Models

Relational model

  • Entity Relationship Diagram (ERD)- A diagram that depicts an entity relationship model's entities, attributes, and relations
  • Entity instance (entity occurrence)- A row in a relational table
  • Entity set- A collection of like entities
  • Connectivity- The type of relationship between entities. Classifications include 1:1, 1:M, and M:N
  • Types of relationships using three ER notations:
    1. The original Chen notation
    2. The Crow's Foot notation
    3. The newer class diagram notation, which is part of the Unified Modeling Language (UML)

Relational Model: ER Model Notations

data-models

Data Models: Object-Oriented (OO) model

Class - A collection of similar objects with shared structure (attributes) and behavior (methods). It encapsulates an object's data representation and a method's implementation

Class hierarchy - The organization of classes in a hierarchical tree in which each parent class is a superclass and each child class is a subclass

Inheritance - In the object-oriented data model, its the ability of an object to inherit the data structure and methods of the classes above it in the class hierarchy

Data Models: Object-Oriented (OO) model

Method - is a named set of instructions to perform an action Methods represent real-world actions, and are invoked through messages

Unified Modeling Language (UML) - A language based on object- oriented concepts that provides tools such as diagrams and symbols to graphically model a system

Class diagram - A diagram used to represent data and their relationships in UML object notation

Comparison: OO, UML & ER Models

oo-models

Emerging Data Models: Big Data

The need to find new, better ways to manage large amounts of web-generated data & derive business insight from it, while providing high performance & scalability at a reasonable cost

3 Vs - 3 basic characteristics of Big Data databases: volume, velocity, and variety

Big Data technologies; Hadoop, MapReduce, & NoSQL databases

MapReduce - An open-source application programming interface (API) that provides fast data analytics services

Big Data: Hadoop

Hadoop - A Java based, open source, high speed, fault-tolerant distributed storage and computational framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to store and process data

Hadoop Distributed File System (HDFS) - A highly distributed, fault- tolerant file storage system designed to manage large amounts of data at high speeds. Consists of name, data & client nodes

  • Name nodes- stores all the metadata about the file system
  • Data nodes- stores fixed-size data blocks (it's a replicative node)
  • client nodes- acts as the interface between the user application and the HDFS

Big Data: NoSQL

A new generation of database management systems that is not based on the traditional relational database mode.Its general characteristics includses;

  • Are based on key-value data model - hence the name NoSQL
  • Support distributed database architectures
  • Provide high scalability, high availability, & fault tolerance
  • Support very large amounts of sparse data
  • Are geared toward performance rather than transaction consistency

NoSQL umbrella; document databases, graph stores, column stores, and key-value stores

Some successful of products; Amazon's SimpleDB, Google's BigTable & Apache's Cassandra

Big Data: NoSQL

Key-value - A data model based on a structure composed of two data elements: a key and a value, in which every key has a corresponding value or set of values

  • The key-value data model is also called the associative or attribute-value data model

Sparse data - A case in which the number of table attributes is very large but the number of actual data instances is low

Eventual consistency - A model for database consistency in which updates to the database will propagate through the system so that all data copies will be consistent eventually

NoSQL: A Key-Value Representation

kv-models

Wrap Up: Data Models Evolution

oo-models

Data model: (De)merits

dems-models

End of Lecture 2

GIS Database Systems

Data Model: Assignments

Design a conceptual model for COETEC college

Design a conceptual model for GEGIS department student's project management

That's it!

Queries about this Lesson, please send them to: jmwaura@jkuat.ac.ke

*References*

  • Database Systems: Design, Implementation, and Management, 12th ed. Carlos Coronel & Steven Morris
  • Database Modeling and Design; Logical Design, 5th ed. Taby Teorey et.al
  • Fundamentals of database systems, 6th ed. Ramez Elmasri & Shamkant B. Navathe
Courtesy of
GIS Database Systems