Random Notes On Database Design

    Just some notes I'm jotting-down for myself, that the principles
    will "stick" better and so I can use this page for a cheat-sheet
    from wherever I might happen to be :).

    One-To-Many Relationships

	Say you have an employee telephone extension database that lists
	all the extensions for all the employees in multiple facilities.
	Each facility will have multiple extension numbers, but each
	extension number can "belong to" only one facility.  In this case:
	separate tables for facilities and extensions.  The extensions
	table has a column for facility table primary keys.

    Many-To-Many Relationships

	Using the employee telephone extension database example again:
	Multiple employees use some of the same extension numbers and some
	employees may have more than one extension.

	Create:

	    . An employee table,

	    . A separate extension table and

	    . An association table that has columns for
	      employee_id and extension_id.  These together will form
	      composite primary keys in this table.

    The Normalization Guidelines

	    "Every nonkey attribute must be functionally dependent upon
	     the key, (1NF)
	     the whole key, and (2NF)
	     nothing but the key, (3NF)
	     so help you Codd."

	Following Quoted Bits From: "The Practical Guide SQL Handbook:
				     Using Structured Query Language,
				     Third Ed.," Bowman, et al.

	First Normal Form

	    "At each row-and-column intersection, there must be one and
	     only one value, and that value must be atomic: there can be
	     no repeating groups in a table that satisfies the first
	     normal form."

	    As I Understand It, This Means:

		You can't have multiple values in cells.
		You can't have repeating columns in a table.
		You can't have rows that repeat all the data, save
		  a single column.

	Second Normal Form

	    "Every nonkey column must depend on, and provide a fact about,
	     the primary key, the whole key, and nothing but the key.
	     Therefore, a table must not contain a nonkey column that
	     pertains to only part of a composite primary key.  [All]
	     non-primary key columns (the columns that give information
	     about the subject but do not uniquely define it) [must]
	     relate to the entire primary key and not just to one of its
	     components."

	    For example, given the following record definition

		------------------------------------
		| PART | WHSE | QTY | WHSE-ADDRESS |
		===============---------------------

		The key consists of the PART and WAREHOUSE fields
		together, but WAREHOUSE-ADDRESS is a fact about the
		WAREHOUSE alone.

		This needs to be

		---------------------  -------------------- 
		| PART | WHSE | QTY |  | WHSE | WHSE-ADDR |
		===============------  ========------------


	Third Normal Form

	    "... requires that no nonkey column depend on another
	     nonkey column.  Each nonkey column must be a fact about the
	     primary key column."

	    For example

		Using the employee phone extension example: each employee
		may or may not have voicemail.  Suppose this is dependent
		on the telephone extension number, rather than the
		particular employee that may be using it?  In such a case:
		a column containing such information must be placed in a
		table relating to the extension numbers themselves, as
		opposed to the employee table.

	See Also: http://www.troubleshooters.com/littstip/ltnorm.html

	Forth Normal Form

	    "[This form] forbids independent one-to-many relationships
	     between primary key columns and nonkey columns."

	    "[A] record type should not contain two or more independent
	     multi-valued facts about an entity."

	    For example

		An employee may have several skills and several languages.

		Bad

		    -------------------------------
		    | EMPLOYEE | SKILL | LANGUAGE |
		    ===============================

		    Creates a many-to-many relationship *within* the
		    table.

		Good

		    --------------------  -----------------------
		    | EMPLOYEE | SKILL |  | EMPLOYEE | LANGUAGE |
		    ====================  =======================

	Fifth Normal Form

	    This form demands that tables be broken up "into the smallest
	    possible pieces in order to eliminate all redundancy within a
	    table."  They go on to note that an advantage of this form is
	    that "...you are assured that each piece of nonkey data ... is
	    likely to occur only once in the database ..."

    See Also:

	http://www.databasejournal.com/sqletc/article.php/1428511
	http://www.bkent.net/Doc/simple5.htm


    SELECT Statements

	SELECT expr(s)
	 WHERE expr(s)     qualifies rows
	 FROM table(s)
	  GROUP BY col(s)  emits sets
	  HAVING expr(s)   qualifies sets
	ORDER BY col(s)


	Difference between WHERE and HAVING (tricky, this)

	    select pub_id, type, advance from titles;
	     pub_id |     type     | advance  
	    --------+--------------+----------
	     1389   | popular_comp |  8000.00
	     1389   | business     |  5000.00
	     0736   | psychology   |  4000.00
	     0736   | psychology   |  2000.00
	     1389   | business     |  5000.00
	     0877   | mod_cook     |     0.00 ***
	     0877   | trad_cook    |  8000.00
	     0877   | trad_cook    |  4000.00
	     1389   | popular_comp |  7000.00
	     0736   | business     | 10125.00 ***
	     0736   | psychology   |  2275.00
	     0736   | psychology   |  6000.00
	     0877   | mod_cook     | 15000.00 ***
	     0877   | trad_cook    |  7000.00
	     0877   |              |         
	     1389   | business     |  5000.00
	     0736   | psychology   |  7000.00
	     1389   | popular_comp |         

	    select pub_id, type, count(advance) from titles 
	    where advance > 10000
	    group by pub_id, type;
	     pub_id |     type     | count 
	    --------+--------------+-------
	     0736   | business     |     1
	     0877   | mod_cook     |     1


	    select pub_id, type, count(advance) from titles 
	    group by pub_id, type
            having advance > 10000;
	     pub_id |     type     | count 
	    --------+--------------+-------
	     0736   | business     |     1
	     0877   | mod_cook     |     2


	    Why are there two 0877/mod_cook's in the 2nd example?  (Which
	    is invalid in most SQLs, btw.)

	    In the first (valid) example: All rows not having an advance
	    over 10000 are eliminated, then the results are grouped.
	    
	    In the second example: The results are grouped, then all
	    result sets that contain *any* advance over 10000 are
	    selected.  Since the group 0822/mod_cook has a row with an
	    advance over 10000, the group qualifies, and the advances
	    in both rows are counted.

	Outer Joins

	    PostgreSQL does OUTER JOINs in the FROM clause, as opposed to
	    the WHERE clause.

	    "FROM table1 LEFT OUTER JOIN table2 ON (expr = expr)" is
		equivalent to "FROM table1, table2 WHERE expr *= expr

	    "FROM table1 RIGHT OUTER JOIN table2 ON (expr = expr)" is
		equivalent to "FROM table1, table2 WHERE expr =* expr

	Non-Correlated vs. Corrrelated Sub-Queries

	    table "publishers"

		pub_id   char(4) not null,
		pub_name varchar(40) null,
		...

	    table "titles"

		...
		type      char(12) null,    // "business," "computer," etc.
		pub_id    char(4) null,
		...

	    Non-Correlated Query:

		select pub_name
		from publishers
		where pub_id in
		 (select pub_id
		  from titles
		  where type = 'business')
		order by pub_name
		;

	    Correlated Query:

		select pub_name
		from publishers p
		where 'business' in
		 (select type
		  from titles
		  where pub_id = p.pub_id)
		order by pub_name
		;

	    The non-correlated query will be much faster, as a rule, as
	    only a single query is executed on the table titles.  In the
	    correlated query, above, "tables" will be re-queried for each
	    row (pub_id) in "publishers."

	    Sometimes, however, a correlated query is the only way to get
	    the result set you want in a single query.


Created: Apr.  2, 2003
Updated: Sep. 21, 2006