Text example: CAP database comprising 4 tables: fig. 2.1, fig. 2.2
Notation:
CAP= {CUSTOMERS, AGENTS, PRODUCTS, ORDERS}
Head(CUSTOMERS)= cid cname city discnt
Domain of City: all the city names in the U.S.
Domain of DIscnt: all float #s between 0.00 and 20.00.
Column type (Column Domain) A table is DECLARED in SQL. Columns have certain TYPES as in SQL: float4, integer, char(13) [This concept is not implemented in commercial systems today. Only have types such as char(13) and float4. ]
Mathematical Interpretation:
Say CID = Domain(cid), CNAME = Domain(cname), CITY and DISCNT are the domains for CUSTOMERS table columns, then consider:
CID x CNAME x CITY x DISCNT (Cartesian Product of Domains)
consisting of all tuples: (w, x, y, z), w in CID, x in CNAME, . . .
ALL POSSIBLE tuples: (c999, Beowulf, Saskatoon, 0.01),…
A superkey is a set of columns that has the uniqueness property, and a key is a minimal superkey: no subset of columns also has uniqueness property.
A superkey for CUSTOMERS is: cid, cname; A key is: cid (alone)
Rigorous definitions on pg. 37, Def 2.4.1. (This is a rephrasing.)
· Let A1, A2, . . . , An be the attributes of a table T (subset of Head(T)). Let S be a subset of these attributes, Ai1, Ai2, . . . , Aik. Then S is a SUPERKEY for T if the set S distinguishes rows of T by designer intention, i.e. if u and v are two rows then for some Aim in S, u[Aim] ? v[Aim] (new notation).
· A set of attributes K is a KEY (a CANDIDATE KEY) if it is a minimal superkey, i.e., no proper subset of K will also act as a superkey.
The Relational Algebra defines some operations that may be performed on compatible tables (homogeneous tuple collections). These operations are UNION, INTERSECTION, DIFFERENCE, SELECTION, PRODUCT (TIMES), PROJECTION, JOIN, and DIVISION.
· The first four produce a compatible (same heading) or empty table with the fourth, SELECTION, being unary.
R where Condition. The Condition is a logical condition that can be determined from the values of a single row of C.
Ai µ Aj or Ai µ a, where Ai and Aj are attributes of R, a is a constant.
µ is any of: <, >, =, <=, >=, <>
If C, C', also get C and C', C or C', and finally: not C.
Example: Give the cid and cname of all customers living in Dallas with discount greater than 8.
(CUSTOMERS where city = 'Dallas' and discnt > 8) [cid, cname]
· The PRODUCT (TIMES) is a binary operation that builds the Cartesian product of the two tuple collections, resulting in a table whose head in the concatenation of the heads of the two tables.
· PROJECTION is a unary operation that reduces the head of a table (drops coordinates or columns).
· JOIN is like product in that the head of the resulting table is the concatenation of the two heads, but it contains only a subset of the tuples of PRODUCT.
o The theta-JOIN operation specifies a coordinate in each table and selects from the PRODUCT only those tuples that show the same value in those two coordinates (or satisfy a relationship other than equal).
o A JOIN without specifying paired columns is interpreted as a join over all of the columns with identical names in the head of each table.
Given two tables R and S, Head(R) = AB, Head(S) = BC,
R S
|
A |
B |
|
B |
C |
|
a1 |
b1 |
|
b1 |
c1 |
|
a2 |
b2 |
|
b2 |
c2 |
|
a3 |
b5 |
|
b3 |
c3 |
|
|
|
|
b4 |
c4 |
R JOIN S
|
A |
B |
C |
|
a1 |
b1 |
c1 |
|
a2 |
b2 |
c2 |
§ A JOIN of two tables with no common column names is the same as the PRODUCT of the two tables (as if the JOIN were performed using default columns in each table containing a single identical value).
§ If the two tables are compatible, their JOIN is the same as their INTERSECTION since the JOINed table does not duplicate the coordinates involved in determining the JOIN (which, by definition would be equal).
§ An OUTER JOIN will insure that a tuple is created for every value in either column by mating and unmatched value to a null. This matching could be restricted to preserving unmatched values on only the right side or the left side of the operation.
o
DIVISION (DIVIDEBY)
of table R by table S requires that the head of R include the head of S,
and the resulting table T has the tuples required so that T TIMES S is
contained in R. This means that T is
the “largest” (has the most number of rows) table such that T TIMES S is a
subset of R, but the resulting PRODUCT may not be identical to R.
Example 2.7.9. (In book) Start with the table R given by:
R
|
A |
B |
C |
|
a1 |
b1 |
c1 |
|
a2 |
b1 |
c1 |
|
a1 |
b2 |
c1 |
|
a1 |
b2 |
c2 |
|
a2 |
b1 |
c2 |
|
a1 |
b2 |
c3 |
|
a1 |
b2 |
c4 |
|
a1 |
b1 |
c5 |
We list a number of possible tables S and the resulting table T := R ÷ S.
S T
|
C |
|
|
A |
B |
|
c1 |
|
|
a1 |
b1 |
|
|
|
|
a2 |
b1 |
|
|
|
|
a1 |
b2 |
Note that all of the rows in S JOIN T are in R, and there couldn't be any larger a set of rows in T for which that is true because there are only three rows in R with a c1 value in column C.
S T
|
C |
|
|
A |
B |
|
c1 |
|
|
a1 |
b2 |
|
c2 |
|
|
a2 |
b1 |
All of the rows in S JOIN T are in R, and there couldn't be any larger a set of rows in T with this true: look at rows in R with C values c2.
S T
|
C |
|
|
A |
B |
|
c1 |
|
|
a1 |
b2 |
|
c2 |
|
|
|
|
|
c3 |
|
|
|
|
|
c4 |
|
|
|
|
S x T in R, and by looking at the rows in table R with C value c3 and c4, we see why T has the maximal content with this property.
S T
|
B |
C |
|
|
A |
|
b1 |
c1 |
|
|
a1 |
|
|
|
|
|
a2 |
Example of dividing R with three columns by a table S with two columns, resulting in a table T with a single column. T is maximal.
S T
|
B |
C |
|
|
A |
|
b1 |
c1 |
|
|
a1 |
|
b2 |
c1 |
|
|
|
The Relational Algebra is a system for combining tables to form other tables. Therefore each operation defines a new table: what is its head and what are it rows, in terms of the head and rows of the operand table(s). For convenience, each operation has a symbol
SET THEORETIC OPERATIONS
KEYBOARD
NAME SYMBOL FORM EXAMPLE
UNION UNION R UNION S
INTERSECTION INTERSECT R INTERSECT S
DIFFERENCE - or MINUS R - S, or R MINUS S
PRODUCT x orTIMES R x S, or R TIMES S
SPECIAL OPERATIONS
KEYBOARD
NAME SYMBOL FORM EXAMPLE
PROJECT R [ ] R [ ] R [Ai1. . .Aik]
SELECT R where C R where C R where A1 = 5
JOIN |><| JOIN R |><|S, or R JOIN S
DIVISION -:- DIVIDEBY R DIVIDEBY S
(p. 42) and there is a precedence of execution of the operations when more than one appears in an expression (fig. 2.7): PROJECT before SELECT before TIMES before JOIN or DIVIDEBY before INTERSECTION before UNION or DIFFERENCE. In addition, it is convenient to define an assignment operator, := and a variable called an alias so that we can refer to the table that is the partial result of an expression evaluation, or so that we can produce a copy of a table with a different name. As with the algebra of mathematics, the definitions of the operations imply the equivalence of certain expressions (different expressions ALWAYS produce the same table, hence one expression may ALWAYS be substituted for the other when convenient). For example, (R x S) x T = R x (S x T) (TIMES is associative).
SQL is a standard syntax for expressing statements of the Relational Algebra. Actually, SQL is motivated by the Relational Algebra but is derived to express queries that extract values from a database of tables. Therefore SQL has features that were not deemed necessary in the abstract Relational Algebra that make writing query statements easier. SQL builds all of the Relational Algebra operations into various phases of the SELECT statement.
Figure 3.1 describes the formation of a SELECT operation as a subquery expression optionally followed by an ORDER BY phrase that would list one or more column names and the term ASC (ending) or DESC (ending). A subquery expression is constructed with two or more phrases, the first of which begins SELECT, and the second of which is FROM.
The search condition of a WHERE phrase can include certain subqueries. The conditions stated in the WHERE phrase can be Boolean expressions or the predicates (fig. 3.19) IN (fig. 3.6) comparision and quantified comparision (fig. 3.7), BETWEEN, IS NULL, LIKE, or EXISTS (fig 3.8), each with a NOT option and whose values are TRUE, FALSE, or UNKNOWN (fig. 3.21), or a scalar subquery.
v A row is ruled out in a WHERE clause if the search_condition is not T, i.e., if it is F or U. However, U is not equivalent to F.
|
AND |
T |
F |
U |
|
OR |
T |
F |
U |
|
NOT |
|
|
T |
T |
F |
U |
|
T |
T |
T |
T |
|
T |
F |
|
F |
F |
F |
F |
|
F |
T |
F |
U |
|
F |
T |
|
U |
U |
F |
U |
|
U |
T |
U |
U |
|
U |
U |
U just acts like there's doubt, might be T or F, and remains U in result if doubt remains. But of course F AND U is not in doubt: it results in F, since this is the case whether U is T or U is F.
The expressions that may be used to define a search condition include numeric valued expressions (fig 3.16a), string valued expressions (fig 3.16b), date expressions, and various built-in mathematical functions.
|
NameArgument type Result type Description |
|
Count any (can be *) numeric count of occurrences |
|
sum numeric numeric sum of arguments |
|
avg numeric numeric average of arguments |
|
max char or numeric same as arg maximum value |
|
min char or numeric same as arg minimum value |
Figure 3.12 The set functions in SQL (pg. 124)
Note max and min return char max and min.
select min(city) from products;
Note avg(x) has same value as sum(x)/count(x); probably more efficient.
Note that these three MIGHT NOT give same values:
select count(*) from customers;
select count(cid) from customers; /* null values not counted */
/* but cid not null in customers */
select count(city) from customers; /* only if no null city values */
Surprising? Doesn't insist on different city values. But:
select count(distinct city) from customers;
is how we get count of distinct city names. Could also do:
select avg(distinct dollars) from orders;
But it would be VERY UNUSUAL if you really wanted this.
It is not legal to use an aggregate function directly in a where clause.
E.g., try to list all customers with maximum discount.
select cid, cname from customers c where c.discnt = max(c.discnt);
/*** NOT LEGAL ***/
select cid, cname from customers where discnt =
(select max(discnt) from customers);
The PROJECT operation is denoted by appending a [ ] containing the head of the final table to a table specification at the end of the SELECT statement.
The UNION operation joins the results of subqueries but cannot appear in the subquery expressions within predicates.
v The UNION operation removes duplicate rows unless the ALL option is specified forcing the production of a “nonstandard” table as a result.
v INTERSECION and EXCEPT perform the same connective role as UNION with the same ALL prerogative (fig. 3.10). With ALL the INTERSECTION will return the common duplicates instead of just 0 (no duplicates) or 1 (one or more duplicates). EXCEPT ALL returns the difference in the common duplicates instead of 0 for common and 1 for no common rows.
In SQL a JOIN is done by specifying the matching columns in the WHERE phrase (using aliases to be able to designate the same column name in different tables:
select distinct cname, pname from customers c, orders o, products p
where c.cid = o.cid and o.pid = p.pid;
In SQL a PRODUCT is implied by the FROM phrase where the sequence of table references and their aliases is presented from which the select list is drawn.
SQL is non-procedural to the greatest extent possible, so a statement can't depend on an earlier statement.
Need qualifier when column names are ambiguous from one table to another; not otherwise (above cname, pname, but o.cid, c.cid). But it never hurts to qualify column names.
Conceptual process in Select.
(1) Take product of tables in from clause,
(2) (2) apply selection in where clause,
(3) (3) project on attributes in select-list and report.
But this in not necessarily (usually not) the actual order of events. Query optimization is applied. If customers has 1000 rows, orders 100,000, and products 1000, then Cartesian product has 100 billion rows. But eventual answer will have at most 100,000 rows.
Can make this much more efficient: index customers by cid and products by pid; now go through orders one row at a time and put out cname, pname into temporary file. Finally, sort cname, pname pairs and cast out duplicates.
This is what the query optimizer might do in its query plan. Basically, a query plan is like a program that retrieves the data you want. You don't have to worry about it. Only type SQL, query optimizer makes the program.
Can perform calculations in select.
Example 3.3.5. retrieve a table based on orders, with columns ordno, cid, aid, pid, and profit on order.
select ordno, x.cid, x.aid, x.pid, 0.40*(x.qty*p.price)
- 0.01*(c.discnt + a.percent)*(x.qty*p.price) as profit
from orders x, customers c, agents a, products p
where c.cid = x.cid and a.aid = x.aid and p.pid = x.pid;
(Profit is quantity * price, where subtract off 60% of cost, discnt for customer, and percent commission for agent. Expressions allow: +, -, *, / **, and functions: upper( ) or ceil( ) (Oracle, smalles integer greater than float -- functions with numeric arguments are not part of standard).
Without the "as profit" clause, Oracle will head the column with the expression. After clause put in, heading will be "profit" In ORACLE, this is called a column alias, comparable to the way table aliases are created.
-------------------------------------------------Jan 27-----------------------------------------------------------
Example 3.6.3. Retrieve all customer names where the customer places at least two orders for the same product. Here's a new way:
select cname from
(select o.cid as spcid from orders o, orders x
where o.cid = x.cid and o.pid = x.pid and o.ordno <> x.ordno) as y,
customers c where y.spcid = c.cid;
This is of the form SELECT ….FROM (subquery) as y, customers c WHERE …
Note that c and y are table aliases (the table y has the one column spcid) while spcid is a column alias.
GROUP BY phrase:
select pid, sum(qty) --aggregate function will be applied to groups
from orders
where aid = 'a03'
group by pid; --collapses rows with same pid value to a single output row
ORDER: (1) Take Cartesian product; (2) cross out rows not selected in WHERE clause; (3) Group remaining rows in accordance with GROUP BY clause; (4) evaluate expressions in target list (aggregate function values depend on groups).
Would see contributions from four rows of the table on pg. 130, where aid = a03, but two of these are both for product p03 and so the sum of their qty values would be reported for the single row with p03 in this result.
Example 3.8.4 (variant). Get ids of all products purchased by more than two customers.
select pid from orders group by pid having count(distinct cid) > 2;
Subquery General Form, see pg. 135.
SELECT [ALL|DISTINCT] expr [[AS] c_alias] {, expr [[AS] c_alias]}
FROM tableref {, tabletrf}
[WHERE search_condition]
[GROUP BY column {, column}]
[HAVING search_condition]
| subquery [UNION [ALL] | INTERSECT [ALL] | EXCEPT [ALL]]
[CORRESPONDING [BY] (colname {, colname . . .})] subquery;
Recall tableref in FROM clause was defined in Section 3.6, Fig. 3.11, pg. 117. (BUT NOT ALL SYNTAX IS PRESENT IN ANY PRODUCT.) The most basic SQL form (supported by all databases) is just
Tableref := tablename [corr_name]
Square brackets means don't have to have the item. WHERE, GROUP BY, etc, all optional.
Thus [ALL|DISTINCT]: the phrase is optional; if we specify, we must choose exactly one form, ALL (duplicates allowed in target-list) or DISTINCT (no duplicates. But since ALL is underlined, it is the default, so if the phrase is not specified, it will be as if ALL was specified.
search_condition in the WHERE clause is a complex object; most of what follows gives details on this, all the various Subqueries and predicates, with a few new ones.
OK, this is a Subquery form: means can occur in subquery (part of search_condition), also part of full select as follows.
Full Select General Form
subquery
[ORDER BY result_col [ASC|DESC] {, result_col [ASC|DESC]}]
The ORDER BY clause is new, and allows us to order the rows output by a succession of result-column values, leftmost first.
See Figure 3.15. Conceptual order of evaluation of a Selectstatement.
o First the Cartesian product of all tables in the FROM clause is formed.
o From this, rows not satisfying the WHERE condition are eliminated.
o The remaining rows are grouped in accordance with the GROUP BY clause.
o Groups not satisfying the HAVING clause are then eliminated.
o The expressions of the Selectclause target list are evaluated.
o If the key word DISTINCT is present, duplicate rows are now eliminated.
o The UNION, INTERSECT, EXCEPT is taken after each subquery is evaluated.
o Finally, the set of all selected rows is sorted if an ORDER BY is present.
Example 3.9.1. List all customers, agents, and the dollar sales for pairs of customers and agents, and order the result from largest to smallest sales totals. Retain only those pairs for which the dollar amount is at least 900.00.
select c.cname, c.cid, a.aname, a.aid, sum(o.dollars) as casales
from customers c, orders o, agents a where c.cid = o.cid and o.aid = a.aid
group by c.cname, c.cid, a.aname, a.aid
having sum(o.dollars) >= 900.00
order by casales desc;
In addition to posing queries there are other tasks to maintaining a database:
v Create table. Result is empty table.
create table customers (cid char(4) not null, cname varchar(13),
city varchar(20), discnt real, primary key(cid));
Names and gives types to each of columns: Explain char(4), varchar(20).
Primary key clause means can't duplicate in table, foreign keys will refer to this column by default.
Note "not null" is an integrity constraint: must specify value on insert, can't leave null. More integrity constraints in Chapter 7, on DBA stuff.
v Next, load from OS file into table. (Called sqlldr in ORACLE) No-one wants to type in one line at a time: everything is moved around on tape or disk (say from one DBMS to another). Note expected format:
c001,TipTop,Duluth,10.00 <-- comma separated, no spaces
c002,Basics,Dallas,12.00
You have to create a .ctl table for each table, custs.ctl, . . . Then use "sqlldr" command (or sqlload in Oracle version 7) from the UNIX prompt (not from within SQL*PLUS).
v After think have table loaded, use the sqlplus command and within that environment (page 727)
v Insert, Update, and Delete statements perform data modifications to existing tables. Called as a group the "update statements".
Need update privilege on a table to update it, different than read privilege. You always have all privileges if you created the table.
v insert into tablename [(column {, column…})]
{values (expr | null {expr | null…})] | subquery};
One of the two forms must be used, values … or subquery. Note that null itself does not qualify as an expression, so it has to be explicitly allowed as an alternative here. Columns named and expressions must match in number and type.
Example 3.10.1. insert into orders (ordno, month, cid, aid, pid)
values('1107', 'aug', 'c006', 'a04', 'p01');
Notice, no qty or dollars, so on this new row they are null. But if we are inserting ALL the columns in a table, can leave out column list (note it's optional?)
Create new table swcusts, like customers, then write:
insert into swcusts select * from customers where
city in ('Dallas', 'Austin');
v update tablename [corr_name]
set colname = {expr | null | (subquery)} {, {column = expr |null | (subquery)}…}
[where search_condition];
update agents set percent = 1.1*percent where city = 'New York';
update swcusts set discnt = (select discnt from customers c
where c.cid = swcusts.cid);
v delete from tablename
[where search_condition];
delete from agents where city = 'New York';
delete from agents where aid in (select aid from orders group by
aid having sum(dollars) < 600);