Data Management Class Notes for 10/3/2014

Scribe: Hanieh Soleimani & Michele Chersich

Relational Algebra and SQL

SQL is based on the Relational Algebra(RA), also adds some extension to RA, some of which are necessary and some are not.
Relational Algebra is based on sets and operations.

BASIC OPERATORS : IF A AND B are sets, then we will have:

These operations form an algebra : the output of the operations is another relation. This means that we can write expressions, not just programs
For example: (A X B) ∩ C

RELATIONS IN RELATIONALS ALGEBRA

  • Relations are set of tuples (i.e: "rows" in the table).
  • Relational Algebra deals with relations, which look like tables with a fixed number of columns.
  • We assume that each attribute (column) comes from some domain.
  • We assume the following operators are defined for the domain: x = y , x < y , x > y. With these operators we are able to compare values in a column.
  • We assume these operations are meaningful even if the values come from different columns.
  • In the following example since we know B and F are integers then we can compare them.

    A B C D
    (...) (int) (...) (...)

    E F G
    (...) (int) (...)

  • In sets of relations the ordering of the attributes does not matter:
  • R A B
    10 1000
    20 2000

    R B A
    1000 10
    2000 20

  • In set theory there is one empty set. For us, a different empty set for every relation, because relations are different from one another (e.g. they have a different number of columns).
  • Relational Algebra vs SQL

    Relational Algebra is restricted to querying the database.

  • No Support for :
  • PRIMARY KEYS, FOREIGN KEYS, INDEXING, INSERTING, DELETING, RECOVERY, CONCURRENCY, SECURITY.

  • Relational Algebra is not concerned with efficiency, only concerned with specification (i.e. how to get data out of the database).
  • Other operators

  • PROJECTION
  • Usually written by π (greek pi), it allows to choose columns (i.e. the input is a relations and the attributes (columns) we want to keep; output is a relation where undesired columns have been discarded).
    FORMALLY : πA,B(R) = projection of R with desired attributes A and B.
    Example :
    The output of the projection πA,B(R) is a relation over attributes A and B:

    R A B C
    a 1 100
    b 2 200
    c 3 300
    d 4 400

    R A B
    a 1
    b 2
    c 3
    d 4

    as we can see, column C has been discarded.
    In SQL we write the following:

    "SELECT  A , B  FROM  R"

    SELECT and FROM are keywords in SQL.
    Programmatically, projection can be described the following way:

    for row : table

    keep attributes you want

    write new row


  • SELECTION
  • Usually written by σ (greek sigma), it allows to choose rows by evaluation of a predicate.
    FORMALLY : σC > 200(R) = selection of R with desired rows the ones that satisfy C > 200.
    Example :
    The output of selection is the following (the input relation being the same as in the projection example) :

    R A B C
    c 3 300
    d 4 400

    In SQL :

    "SELECT *  FROM  R  WHERE  c > 200"

    SELECT, FROM and WHERE are keywords in SQL and * means everything.
    Programmatically, selection can be described the following way:

    for row : relation

    if predicate is true :

    then write row

    else :

    do nothing

    N.B.: any operation performed on the relations produces a relation. That is why it's called an algebra.

    EXAMPLE : composing projection and selection
    If we take both projection and selection, we have: πA,BC > 200(R))
    In SQL :

    "SELECT A , B FROM R WHERE C > 200"

    N.B.: reverse (i.e. σC > 200A,B(R))) is incorrect, as the projection yields a relation that has no attribute C.


  • UNION
  • Example :

    R1 A B
    a 1
    b 2

    R2 A B
    a 1
    c 3

    R1∪ R2 A B
    a 1
    b 2
    c 3

    In SQL :

    "SELECT  *  FROM  R1  UNION  SELECT  *  FROM  R2"

    In the union the attributes and the 'arity' MUST be the same : this is called "Union Compatibility".
    ARITY: means the number of columns.
    "Probably" require that the columns are drawn from the same domain (i.e. same data type: both attributes are of type Integer, for instance).
    Good idea if the columns have the same names.


  • DIFFERENCE
  • Example :

    R A B
    a 1
    b 2

    S A B
    b 2
    c 3

    R - S A B
    a 1

    In SQL :

    "SELECT  * FROM  R  MINUS  SELECT  *  FROM  S"

    N.B.: keyword MINUS in SQLite would be EXCEPT
    The output is a new table and we also here require "Union Compatibility" between R and S.


  • INTERSECTION
  • Example (input relations are R and S from the previous example):

    R ∩ S A B
    b 2

    In SQL :

    "SELECT  *  FROM  R  INTERSECT  SELECT  *  FROM  S"

  • CARTESIAN PRODUCT
  • Example (again, inputs are R and S):

    R × S A B A' B'
    a 1 b 2
    a 1 c 3
    b 2 b 2
    b 2 c 3

    In SQL :

    "SELECT  *  FROM  R , S"

    Or we can also write:

    "SELECT  *  FROM  R  CROSS  S"


  • JOIN
  • FORMALLY : R ⋈R.B = S.B S

    In SQL :

    "SELECT  R , S   WHERE  R.B = S.B"

    We will see it in more details later.

    The rest of the class is about SQLite3 in the command line: see "sqlite.txt"