Learning the Relational Model – The Model (Part III)

In the previous two parts we have seen the concept of Set theory and predicate logic. In this third part we will discuss how these two theories combined to form the relational model.

The path to relational model

Here is a fictional conversation between two people met in the cafeteria of the office floor.

“Excuse me! I’m Fred. Do you work in Accounts?”

“No I’m in Purchase Dept.”

“I haven’t seen you before. I’m in Marketing”

“I’m Mark and I just joined last week on 16th as the manager”

“Oh! So how do you feel about the new job?”

… and the conversation continues…

There some data points hiding in the above conversation. There is a set of department like Accounts, Purchase, and Marketing etc. and there is a set of employee like Fred, Mark etc. Every employee is related to any one of the department from the set of departments. Each employee has a name and a data of join.

In natural language we make assertions about entities of interest by statements of fact—or, in logic, by propositions. For example, this is a proposition:

“The employee with ID number 215 is named Mark, works in Purchase department, and was hired on December 16th, 2013.”

Generalized forms of propositions are predicates. For example, this is a predicate :

P: The employee with ID number (Emp#) is named (Name), works in department (Dept#), and was joined on (Joindate).

The four terms in parentheses are placeholders or parameters that correspond to the four values in the preceding proposition. We can substitute values to the parameters. When you substitute parameters with specific values, a predicate reduces to an individual proposition. For the above predicate let me substitute the below parameter values respectively:

 (215; Mark; Purchase; December 16th, 2013)

You can immediately see that the above set of parameters form a tuple! (We have discussed about tuple in Part I). Let us call this set as Employee relation E. The relation E can be a set of tuple as below.

E =       {

            (215; Mark; Purchase; December 16th, 2013),

            (107; Fred; Marketing; April 08th, 2005),

            (183; Bob; Sales; October 10th, 2010),

            }

Each tuple’s values when replaced with the parameters in the above predicate satisfy the proposition or we can say the proposition is true for the above relation E. In other words a relation E expressing the predicate P would be the set of all sets of values which satisfy the predicate truthfully (in our enterprise or universe).

So how do you define your data model? Just describe a business problem, find predicates, and write them down— the data to the predicate parameters forms a set – you have your data model. Now we can relate the set theory and predicate logic to describe data.

 However there are still missing pieces to make the relational model. Is the 215 for the parameter Emp#, a numeric value or a literal text value? Is the Joindate December 16th, 2013 represents a date or a text? So a formal definition of type of each parameter is required now.

 Type and Domain

 The type that usually called as a data type is a classification identifying the various types of data like integer, real valued, string, date, Boolean (representing true or false) etc., the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored. For example an int data type in a relational database system represents an exact numeric data type and this type supports addition, subtraction, multiplication, and division on the values of that type.

 In addition to data type, there can be a possible list of value applicable to support by the data type or the business requirement. A data domain refers to all the unique values which a data element may contain. For example the Emp# parameter can have values ranging from 1 to 500 for a medium organization. So type of Emp# is int and domain ranges from 1 to 500.

Type is the basic building block of relational model. Domain is usually confused with the term data type. To make the distinction let us consider an example of Age. The age can be represented using the data type int. However the domain of age for a company can have values ranging from 18, the minimum age for employment and 60, the retirement age.

The parameters Emp#, Name, Joineddate etc. we can call it attribute names. Each attribute name has a data type and a domain. A set of an attribute name, corresponding data type, and associated domain is called an attribute. So an attribute A for storing gender information can be represented by the below set:

 A = { (Gender, CHAR, {‘M’, ‘F’}) }

 In other words an attribute is an ordered set of attribute name, type and domain.

Each attribute is associated with a value. The value can be a scalar value (single value) and can have a complex values. For example the value of attribute Gender will be ‘M’.

A set of attribute is called a heading. So the heading for Employee E can be represented as the below set:

 HE =     {

             (Emp#, int, {1,2,3,4…..498,499,500}),

             (Name, char, {alphabets and space}),

             (Dept, char, {Purchase, Marketing, Sales, IT, Admin}),

             (Joindate, date, {01/01/2002 to 12/31/2022})

            }

Note that the header mathematically represents the predicate.

A set of n-Tuple is called a body: so the body B of employee can be represented by the below set:

B = {E1, E2, ….. En}

A relation consists of a heading and a body. The heading of the relation is also the heading of each of its tuples. A named relation is called a relational variable or relvar.

 Here is a pictorial representation of the relation named R, we have learned so far.

Image 

Strength of the relational model

 Several times I have mentioned that the relational model’s power lies in the mathematical foundations of set theory and predicate logic. From the granular level to higher levels this is visible.

 Attribute           = a set of attribute name, type and domain

Tuple                = a set of attributes and values

Body                = a set of tuples

Heading           = a set of attributes

Relation            = a set of heading and body

Based on predicate logic;

Relation            = AND (^) of tuples;

The employee with ID 215 is named Mark, works in department Purchase, was joined on December 16th, 2013 AND

The employee with ID 107 is named Fred, works in department Marketing, was joined on April 8th , 2005 AND

The employee with ID 283 is named Bob, works in department Sales, was joined on October 10th, 2010 AND

…………

This can also be extended to any level, viz. a tuple is the logical AND of all the attributes and values, an attribute is a logical AND of attribute name, type and domain etc.

 If each tuple represents an axiom (a statement of truth or a fact about our universe) we can combine these axioms with mathematical operators, and using logical rules of inference and we can derive new facts that are mathematically provable.

Extracting information from relational model 

E.F Codd introduced two different languages to query the relational model. One is relational algebra, which is a procedural language and the second one is relational calculus, which is a declarative language. In relational algebra a collection of relational operators are applied to the relations, resulting in another relation. The following are the basic relational operators 

Set Operators 

  1. Union (U) – it takes two relations, say, Bike B and Car C, both of which must have the same set of attributes. The union, B U C, consists of all the tuples in B and all the tuples in C and result in a relation Vehicle V. 
  1. Cartesian Product (X) – Cartesian product of A X B, result in a relation in which all possible ordered pairs (a,b) where a is a member of A and b is a member of B. 
  1. Intersection (∩) – The intersection, A ∩ B, consists of those tuples that are both in A and B. 
  1. Difference (-) – The difference, A – B, consists of those tuples in A which are not in B.

 Relational Operators

   5. Selection (σ) – (also called restrict) operator is used to choose a subset of tuples.

          σDept = Sales (E), will return a relation from Employee E in which Dept is Sales. 

  1. Projection (π) – when applied to a relation, the projection operator do the following
  • Removes all attributes from the relation that do not appear in the attribute list
  • Reorder the attribute based on the order in attribute list
  • Eliminate any duplicate rows 

     πEmp#, Name(Employee), will return a relation from Employee E, in which only the Emp# and Name available in that order 

  1. Join (|><|) – Join is a binary operator that combines Cartesian product and Selection operator in a single operation. The result of A |><| B is combination of tuples a from A and b from B where certain conditions are satisfied. 
  1. Division (/) – Extract rows whose column values match those in the second table, but only returns columns that don’t exist in the second table. Suitable for queries that include “for all”. Example, let A represents all the assignments, C represented completed assignments of students, then C / A represents the students who completed “all” the assignments.

 By combining these basic operators we can retrieve information from relational model.

Next

In the next part of this series we will how the above described concepts are actually implemented in relational database systems.

 

Advertisements

Leave a comment

Filed under Basics, Theory

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s