Introduction to Relational Databases, SQL

Lecture Notes for CS 142
Fall 2010
John Ousterhout

  • Readings for this topic:
    • If you would like to learn more about relational databases, a good textbook is Database Management Systems, Third Edition, by Ramakrishnan and Gehrke. Even better, take CS 145.
  • Relational database: collection of tables (also called relations)
  • Table:
    • Collection of rows (also called tuples or records).
    • Each row in a table contains a set of columns (also called fields or attributes).
    • Each column has a type:
      • String: VARCHAR(20)
      • Integer: INTEGER
      • Floating-point: FLOAT, DOUBLE
      • Date/time DATE, TIME, DATETIME
      • Several others...
    • Primary key: provides a unique identifier for each row (need not be present, but almost always is in practice).
    • Schema: the structure of the database, including for each table
      • The table name
      • The names and types of its columns
      • Various optional additional information (constraints, etc.)
      • The number of rows in a table is not part of the schema
  • SQL: a language for creating and manipulating relational databases
    • Initially created at IBM as part of System-R
    • Implemented with modifications in numerous products: Oracle, Sybase, DB-2, SQL Server, MySQL
    • Vendor versions are not completely compatible
    • Has been (partially) standardized
    • Examples that follow use MySQL syntax
  • Create a table for the students:
    CREATE TABLE students (
        id INT AUTO_INCREMENT,
        name VARCHAR(30),
        birth DATE,
        gpa FLOAT,
        grad INT,
        PRIMARY KEY(id));
    
  • Add rows to the students table:
    INSERT INTO students(name, birth, gpa, grad)
         VALUES ('Anderson', '1987-10-22', 3.9, 2009);
    INSERT INTO students(name, birth, gpa, grad)
         VALUES ('Jones', '1990-4-16', 2.4, 2012);
    INSERT INTO students(name, birth, gpa, grad)
        VALUES ('Hernandez', '1989-8-12', 3.1, 2011);
    INSERT INTO students(name, birth, gpa, grad)
        VALUES ('Chen', '1990-2-4', 3.2, 2011);
    
  • Delete row(s):
    DELETE FROM students WHERE name='Anderson';
    
  • Delete table:
    DROP TABLE students;
    
  • Queries: the strength of relational databases
    • Lots of ways to extract information
    • You specify what you want
    • The database system figures out how to get it efficiently
    • Refer to data by contents, not just name
    • Show entire contents of a table:
      SELECT * FROM students;
      +----+-----------+------------+------+------+
      | id | name      | birth      | gpa  | grad |
      +----+-----------+------------+------+------+
      |  1 | Anderson  | 1987-10-22 |  3.9 | 2009 |
      |  2 | Jones     | 1990-04-16 |  2.4 | 2012 |
      |  3 | Hernandez | 1989-08-12 |  3.1 | 2011 |
      |  4 | Chen      | 1990-02-04 |  3.2 | 2011 |
      +----+-----------+------------+------+------+
      
    • Show just a few columns from a table:
      SELECT name, gpa FROM students;
      +-----------+------+
      | name      | gpa  |
      +-----------+------+
      | Anderson  |  3.9 |
      | Jones     |  2.4 |
      | Hernandez |  3.1 |
      | Chen      |  3.2 |
      +-----------+------+
      
    • Filtering: only display a subset of the rows:
      SELECT name, gpa
          FROM students
          WHERE gpa > 3.0;
      +-----------+------+
      | name      | gpa  |
      +-----------+------+
      | Anderson  |  3.9 |
      | Hernandez |  3.1 |
      | Chen      |  3.2 |
      +-----------+------+
      
    • Sorting:
      SELECT gpa, name, grad
          FROM students
          WHERE gpa > 3.0
          ORDER BY gpa DESC;
      +------+-----------+------+
      | gpa  | name      | grad |
      +------+-----------+------+
      |  3.9 | Anderson  | 2009 |
      |  3.2 | Chen      | 2011 |
      |  3.1 | Hernandez | 2011 |
      +------+-----------+------+
      
  • Updates:
    UPDATE students
        SET gpa = 2.6, grad = 2013
        WHERE id = 2;
    
  • Joins: how to manage relationships between tables?
    • Join: a query that merges the contents of 2 or more tables, displays information from the results.
    • Can produce the equivalent of a linked list in a programming language, and many other effects.
  • Join example: many-to-one relationship
    • Students have advisors; introduce new table describing faculty.
      +----+----------+-----------+
      | id | name     | title     |
      +----+----------+-----------+
      |  1 | Fujimura | assocprof |
      |  2 | Bolosky  | prof      |
      +----+----------+-----------+
      
    • Add new column advisor_id to the students table. This is a foreign key.
      +----+-----------+------------+------+------+------------+
      | id | name      | birth      | gpa  | grad | advisor_id |
      +----+-----------+------------+------+------+------------+
      |  1 | Anderson  | 1987-10-22 |  3.9 | 2009 |          2 |
      |  2 | Jones     | 1990-04-16 |  2.4 | 2012 |          1 |
      |  3 | Hernandez | 1989-08-12 |  3.1 | 2011 |          1 |
      |  4 | Chen      | 1990-02-04 |  3.2 | 2011 |          1 |
      +----+-----------+------------+------+------+------------+
      
    • Example query:
      SELECT s.name, s.gpa
          FROM students s, advisors p
          WHERE s.advisor_id = p.id AND p.student = 'Fujimura';
      +-----------+------+
      | name      | gpa  |
      +-----------+------+
      | Jones     |  2.4 |
      | Hernandez |  3.1 |
      | Chen      |  3.2 |
      +-----------+------+
      
    • A join creates the cross-product of 2 or more tables
      • Potentially very expensive!
      • In practice, optimized carefully by the database system.
  • Join example: many-to-many relationship
    • Courses: students take many courses, courses have many students
    • Add new table describing courses:
      +----+--------+-----------------+-------------+
      | id | number | name            | quarter     |
      +----+--------+-----------------+-------------+
      |  1 | CS142  | Web stuff       | Winter 2009 |
      |  2 | ART101 | Finger painting | Fall 2008   |
      |  3 | ART101 | Finger painting | Winter 2009 |
      |  4 | PE204  | Mud wrestling   | Winter 2009 |
      +----+--------+-----------------+-------------+
      
    • Create a join table courses_students describing which students took which courses.
      +-----------+------------+
      | course_id | student_id |
      +-----------+------------+
      |         1 |          1 |
      |         3 |          1 |
      |         4 |          1 |
      |         1 |          2 |
      |         2 |          2 |
      |         1 |          3 |
      |         2 |          4 |
      |         4 |          4 |
      +-----------+------------+
      
    • Find all students who took a particular course:
      SELECT s.name, c.quarter
          FROM students s, courses c, courses_students cs
          WHERE c.id = cs.course_id AND s.id = cs.student_id
          AND c.number = 'ART101';
      +----------+-------------+
      | name     | quarter     |
      +----------+-------------+
      | Jones    | Fall 2008   |
      | Chen     | Fall 2008   |
      | Anderson | Winter 2009 |
      +----------+-------------+
      
  • Additional database features:
    • Indexes: used to speed up searches (exact matches, ranges)
    • Transactions: used to group operations together to provide predictable behavior even when there are concurrent operations on the database.
    • Many other things.