In this fourth part of the series we will briefly discuss how the relational model has been implemented in relational databases.
We are now familiar with the conceptual relational model. The conceptual model deals with unordered sets. In practical aspect, specifically in computer, we cannot physically represent an unordered set. So a practical implementation is a bit different from the relational model.
- A named relation (relvar) is implemented as Table. A table has a predefined internal structure. The structure has variance in implementation by the different RDBMS vendors
- An attribute is formed by a name, data type (can be different for different vendors), and constraints (enforcing domain integrity) and is called a column. The lists of columns form the heading of the relation (table).
- A tuple is implemented as a row. Each row in a table also has fixed format (still vendor specific) and has some order (“… row has some order” is actually a misnomer. A row is usually identified by a row pointer or RID. Though the pointers or RIDs have some order, due to the implementations of data manipulation statements and data concurrency models, we cannot expect an order for row when we fetch data from a table)
Data is stored in a table and in respective columns as a series of rows.
How do you manipulate the data? How do you save and retrieve information from tables? We have already a seen a language to retrieve information from the relational model; i.e. relational algebra! But for the practical purposes, a new special purpose language is required to query the relational model. So IBM has developed a query language called SEQUEL (Structured English Query Language) based upon the relational algebra for their System R database system that is considered to be the first implementation of Codd’s relational model. This name was later changed to SQL (Structured Query Language).
SQL is still based on relational model and relational algebra (but not fully). However some features are added to the language. For example, based on set theory, the elements in a set are unique. When applying some combinations of relational operators on relations, the resulting relation may contain duplicate tuples. Avoiding duplicates is not practical since this may degrade performance of database applications. So another mathematical concept called Multiset (bag) is introduced to solve this problem. A multiset is very much similar to set; the difference is elements in a multiset may not necessarily be unique. The SQL language is then extend to support multisets and introduced more operators to manage the multisets and other real world functionalities. Some of them are;
- Rename operator – to rename and attribute (column) name
- Extended projection operator – new attributes are creating from existing attributes of relations. This can be arithmetic operations or applying function on attributes.
- Aggregate operator (grouping operator) –produces new relation by summarizing attributes. Usually this grouping is followed by applying aggregate functions on attributes such as sum, average, count, minimum, and maximum to get new insight of data.
- Duplicate elimination (δ) – Returns relation by elimination all but exactly one copy of each tuple. SQL is using DISTINCT and UNION language operators to eliminate duplicates
- Outer join – The outer join is defined in such a way that every tuple from either of the original relations appears in some way in the joined relation. The outer join operator addresses the fact that some tuples in a relation are not represented in a natural join involving the relation. This is because there is no matching tuple in the relation being joined. This is called ‘dangling tuple’
- Data modification operators – These are used for inserting, updating and deleting data.
Based on these basic and extended operators, the SQL language is standardized by ANSI and ISO and still expanding to add new language features. Some features include, Common Table Expressions, read-only INFORMATION_SCHEMA tables for metadata (data dictionary) querying, temporal data retrieval etc. Each RDBMS vendor is actually may or may not implement all the ANSI/ISO version of SQL and add their own features to the SQL language implementation for user’s ease-of-use. So this is up to the individual to select a particular RDBMS based on the requirement and the features.
There is no commercial RDBMS product that we can call as truly or fully relational model complaint. There are some implementation challenges that hinder the RDBMS product become fully relational. Hope I can write another post on this subject in the future.
One more important point is relational algebra and SQL is declarative; i.e. you are asking what you want from the data stored instead of how you want to retrieve the data stored. This is a big paradigm change for a procedural language developer. Thinking in sets and relation is important when working with RDBMS and SQL, though procedural elements are included in RDBMS by many vendors in the form of cursors and loops.
- RDBMS products are built on top of strong mathematical foundations on set and predicate theory
- By applying basic and extended relational operators we can retrieve many complex information from data stored in a relational model
- SQL is used to query and manipulate data from RDBMS is based on strong mathematical language called relational algebra and extended to support practical applications
- SQL is based on multiset (bag) theory and hence SQL is a multiset theoretic language
- A multiset result can be converted to a set by applying duplicate elimination operators like DISTINCT or UNION. However performance may affect
The below table lists the relational algebra operators and SQL syntax similar to that
|No.||Relational Operator||SQL syntax|
|1||SELECT||SELECT * FROM Employee WHERE Dept_ID = 18|
|2||PROJECT||SELECT JoinDate, F_Name FROM Employee WHERE Dept_ID = 18|
|3||RENAME||SELECT F_Name AS FirstName FROM Employee|
|4||UNION||SELECT F_Name FROM Contract_Emp UNION ALL SELECT F_Name FROM Permenant_Emp|
|5||INTERSECT||SELECT F_Name FROM Employee WHERE Skill = ‘C#’ INTERSECT SELECT F_Name FROM Employee WHERE Skill = ‘SQL’|
|6||DIFFERENCE/MINUS||SELECT F_Name FROM Employee WHERE Skill = ‘C#’ EXCEPT SELECT F_Name FROM Employee WHERE Skill = ‘SQL’|
|7||PRODUCT||SELECT Y.YearID, S.SeasonFROM Season S CROSS JOIN Year Y|
|8||JOIN||SELECT E.Name, E.EmpID, E.DeptName, D.ManagerName FROM Employee E INNER JOIN Dept D ON D.DeptName = E.DeptName|
|9||DIVISON||There is no SQL operator that translates to a DIVISION operator. However we can implement relational division using many SQL statements available. A full discussion of relational division is out of the scope of this article. Read Joe Celko’s excellent article on this subject here (https://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/)|
|10||Extended Project||SELECT F_Name + ‘ ‘ + L_Name AS EmpName, BasicPay * (12.0 / 100.0) AS TravelAllowance FROM Employee|
|11||Aggregate||SELECT DeptName, AVG(Salary) AS AverageSalary FROM EmpSalary GROUP BY DeptName|
|12||Duplicate removal||SELECT DISTINCT Dept_ID FROM Employee or SELECT EmpName FROM Permenant_Emp UNION SELECT EmpName FROM ManagementCouncil|
If you learn the relational model and relational algebra and the operators, you can design good databases and make the queries simple and portable. When you execute a query, the RDBMS’ query optimizer is converting the SQL dialect to a relational algebra tree and retrieving the data from the underlying physical tables based on the algebra tree (though I’ve simply said it, the query optimization and creating efficient and simple algebra tree is a big topic and still researches are going on in this area). Hope you enjoyed the series.