Random Notes On Database Design Just some notes I'm jotting-down for myself, that the principles will "stick" better and so I can use this page for a cheat-sheet from wherever I might happen to be :). One-To-Many Relationships Say you have an employee telephone extension database that lists all the extensions for all the employees in multiple facilities. Each facility will have multiple extension numbers, but each extension number can "belong to" only one facility. In this case: separate tables for facilities and extensions. The extensions table has a column for facility table primary keys. Many-To-Many Relationships Using the employee telephone extension database example again: Multiple employees use some of the same extension numbers and some employees may have more than one extension. Create: . An employee table, . A separate extension table and . An association table that has columns for employee_id and extension_id. These together will form composite primary keys in this table. The Normalization Guidelines "Every nonkey attribute must be functionally dependent upon the key, (1NF) the whole key, and (2NF) nothing but the key, (3NF) so help you Codd." Following Quoted Bits From: "The Practical Guide SQL Handbook: Using Structured Query Language, Third Ed.," Bowman, et al. First Normal Form "At each row-and-column intersection, there must be one and only one value, and that value must be atomic: there can be no repeating groups in a table that satisfies the first normal form." As I Understand It, This Means: You can't have multiple values in cells. You can't have repeating columns in a table. You can't have rows that repeat all the data, save a single column. Second Normal Form "Every nonkey column must depend on, and provide a fact about, the primary key, the whole key, and nothing but the key. Therefore, a table must not contain a nonkey column that pertains to only part of a composite primary key. [All] non-primary key columns (the columns that give information about the subject but do not uniquely define it) [must] relate to the entire primary key and not just to one of its components." For example, given the following record definition ------------------------------------ | PART | WHSE | QTY | WHSE-ADDRESS | ===============--------------------- The key consists of the PART and WAREHOUSE fields together, but WAREHOUSE-ADDRESS is a fact about the WAREHOUSE alone. This needs to be --------------------- -------------------- | PART | WHSE | QTY | | WHSE | WHSE-ADDR | ===============------ ========------------ Third Normal Form "... requires that no nonkey column depend on another nonkey column. Each nonkey column must be a fact about the primary key column." For example Using the employee phone extension example: each employee may or may not have voicemail. Suppose this is dependent on the telephone extension number, rather than the particular employee that may be using it? In such a case: a column containing such information must be placed in a table relating to the extension numbers themselves, as opposed to the employee table. See Also: http://www.troubleshooters.com/littstip/ltnorm.html Forth Normal Form "[This form] forbids independent one-to-many relationships between primary key columns and nonkey columns." "[A] record type should not contain two or more independent multi-valued facts about an entity." For example An employee may have several skills and several languages. Bad ------------------------------- | EMPLOYEE | SKILL | LANGUAGE | =============================== Creates a many-to-many relationship *within* the table. Good -------------------- ----------------------- | EMPLOYEE | SKILL | | EMPLOYEE | LANGUAGE | ==================== ======================= Fifth Normal Form This form demands that tables be broken up "into the smallest possible pieces in order to eliminate all redundancy within a table." They go on to note that an advantage of this form is that "...you are assured that each piece of nonkey data ... is likely to occur only once in the database ..." See Also: http://www.databasejournal.com/sqletc/article.php/1428511 http://www.bkent.net/Doc/simple5.htm SELECT Statements SELECT expr(s) WHERE expr(s) qualifies rows FROM table(s) GROUP BY col(s) emits sets HAVING expr(s) qualifies sets ORDER BY col(s) Difference between WHERE and HAVING (tricky, this) select pub_id, type, advance from titles; pub_id | type | advance --------+--------------+---------- 1389 | popular_comp | 8000.00 1389 | business | 5000.00 0736 | psychology | 4000.00 0736 | psychology | 2000.00 1389 | business | 5000.00 0877 | mod_cook | 0.00 *** 0877 | trad_cook | 8000.00 0877 | trad_cook | 4000.00 1389 | popular_comp | 7000.00 0736 | business | 10125.00 *** 0736 | psychology | 2275.00 0736 | psychology | 6000.00 0877 | mod_cook | 15000.00 *** 0877 | trad_cook | 7000.00 0877 | | 1389 | business | 5000.00 0736 | psychology | 7000.00 1389 | popular_comp | select pub_id, type, count(advance) from titles where advance > 10000 group by pub_id, type; pub_id | type | count --------+--------------+------- 0736 | business | 1 0877 | mod_cook | 1 select pub_id, type, count(advance) from titles group by pub_id, type having advance > 10000; pub_id | type | count --------+--------------+------- 0736 | business | 1 0877 | mod_cook | 2 Why are there two 0877/mod_cook's in the 2nd example? (Which is invalid in most SQLs, btw.) In the first (valid) example: All rows not having an advance over 10000 are eliminated, then the results are grouped. In the second example: The results are grouped, then all result sets that contain *any* advance over 10000 are selected. Since the group 0822/mod_cook has a row with an advance over 10000, the group qualifies, and the advances in both rows are counted. Outer Joins PostgreSQL does OUTER JOINs in the FROM clause, as opposed to the WHERE clause. "FROM table1 LEFT OUTER JOIN table2 ON (expr = expr)" is equivalent to "FROM table1, table2 WHERE expr *= expr "FROM table1 RIGHT OUTER JOIN table2 ON (expr = expr)" is equivalent to "FROM table1, table2 WHERE expr =* expr Non-Correlated vs. Corrrelated Sub-Queries table "publishers" pub_id char(4) not null, pub_name varchar(40) null, ... table "titles" ... type char(12) null, // "business," "computer," etc. pub_id char(4) null, ... Non-Correlated Query: select pub_name from publishers where pub_id in (select pub_id from titles where type = 'business') order by pub_name ; Correlated Query: select pub_name from publishers p where 'business' in (select type from titles where pub_id = p.pub_id) order by pub_name ; The non-correlated query will be much faster, as a rule, as only a single query is executed on the table titles. In the correlated query, above, "tables" will be re-queried for each row (pub_id) in "publishers." Sometimes, however, a correlated query is the only way to get the result set you want in a single query. Created: Apr. 2, 2003 Updated: Sep. 21, 2006