Joining Tables

Understanding Joins

Relational tables are designed so that information is split into multiple tables, one for each data type. The tables are related to each other through common values (and thus the relational in relational design).

Using INNER JOIN

Most of the time, to join tables together, you will use INNER JOIN. When connecting two tables with INNER JOIN, only the rows from the tables that match on the joining columns will show up in the results. If you join the customer and order tables, the query will return only the customers who have placed orders, along with the orders that have been placed. Only the rows where the customer ID is common in both tables will show up in the results

Creating a Join

Joining table is easy, you just add the table after the JOIN keyword. Here is the syntax for joining two tables (the keyword INNER is optional):

SELECT <select list>
FROM <table1>
[INNER] JOIN <table2> ON <table1>.<col1> = <table2>.<col2>

Joining Tables

Here is an example join on the two tables :

SELECT s.SalesOrderID, s.OrderDate, s.TotalDue, d.SalesOrderDetailID,
    d.ProductID, d.OrderQty
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS d ON s.SalesOrderID = d.SalesOrderID;

Creating a join is very simple. You must specify all the tables to be included and how they are related to each other. Look at the following example

The SELECT list may contain columns from either of the tables. In the FROM clause, you list one of the tables followed by the words INNER JOIN and the second table name. To define how the two tables join together, use the keyword ON and an equality expression.

If a row exists in the Sales.SalesOrderHeader table with no matches in the Sales.SalesOrderDetail table, the Sales.SalesOrderHeader row will not show up in the results.

Because the column name, SalesOrderID, is the same in both tables, it must be fully qualified with the table name anywhere it is used in the query.

Here is another example form the small Training database :

SELECT vend_name, prod_name, prod_price
FROM Vendors, Products
WHERE Vendors.vend_id = Products.vend_id;

The WHERE clause acts as a filter to only include rows that match the specified filter condition—the join condition, in this case. Without the WHERE clause, every row in the first table will be paired with every row in the second table, regardless of if they logically go together or not.

Cartesian Product

The results returned by a table relationship without a join condition. The number of rows retrieved will be the number of rows in the first table multiplied by the number of rows in the second table.

SELECT vend_name, prod_name, prod_price
FROM Vendors, Products;

Cross Joins

Sometimes you’ll hear the type of join that returns a Cartesian Product referred to as a cross join.

Inner Join

The join you have been using so far is called an equijoin—a join based on the testing of equality between two tables. This kind of join is also called an inner join. In fact, you may use a slightly different syntax for these joins, specifying the type of join explicitly. The following SELECT statement returns the exact same data as the preceding example :

SELECT vend_name, prod_name, prod_price
FROM Vendors INNER JOIN Products
 ON Vendors.vend_id = Products.vend_id;

The SELECT in the statement is the same as the preceding SELECT statement, but the FROM clause is different. Here the relationship between the two tables is part of the FROM clause specified as INNER JOIN. When using this syntax the join condition is specified using the special ON clause instead of a WHERE clause. The actual condition passed to ON is the same as would be passed to WHERE.

Joining Multiple Tables

SQL imposes no limit to the number of tables that may be joined in a SELECT statement. The basic rules for creating the join remain the same. First list all the tables, and then define the relationship between each. Here is an example:

SELECT prod_name, vend_name, prod_price, quantity
FROM OrderItems, Products, Vendors
WHERE Products.vend_id = Vendors.vend_id
AND OrderItems.prod_id = Products.prod_id
AND order_num = 20007;

Joining on a Different Column Name

You can join tables even if they have different column names. The data only needs to be related, e.g the Person.Person and Sales.Customer are related on the BusinessEntityID and CustomerID columns.

Joining on different column names

SELECT c.CustomerID, c.PersonID, p.BusinessEntityID, p.LastName
FROM Sales.Customer AS c
INNER JOIN Person.Person AS p ON c.PersonID = p.BusinessEntityID;

Joining on More Than One Column

Although a JOIN frequently involves joining on a single column, sometimes you might need to join on multiple columns.

Joining in multiple columns

The Sales.SpecialOfferProduct table has a composite primary key composed of SpecialOfferID plus ProductID. To identify a row in this table, you must use both columns. When joining Sales.SalesOrderDetail to the Sales.SpecialOfferProduct table, you specify both columns in the join.

To determine which row matches the rows from Sales.SalesOrderDetail, both columns are used in the join condition. If the join contained only one of the columns, the results would be similar to the incorrect results

Joining Three or More Tables

You would join more than two tables in situations that involve a many to many relationship.

Joining multiple tables

Here is the syntax to add more tables to the JOIN clause :

SELECT <SELECT list> FROM <table1>
[INNER] JOIN <table2> ON <table1>.<col1> = <table2>.<col2>
[INNER] JOIN <table3> ON <table2>.<col2> = <table3>.<col3>

Here is an example query joining multiple tables :

SELECT soh.SalesOrderID, soh.OrderDate, p.ProductID, p.Name
FROM Sales.SalesOrderHeader as soh
INNER JOIN Sales.SalesOrderDetail AS sod ON soh.SalesOrderID = sod.SalesOrderID
INNER JOIN Production.Product AS p ON sod.ProductID = p.ProductID
ORDER BY soh.SalesOrderID;

Notice that even though the query joins three tables, the query displays columns from only two of the tables.

Exercises

Use the AdventureWorks database to complete this exercise.

  1. The HumanResources.Employee table does not contain the employee names. Join that table to the _Person.Person _table on the BusinessEntityID column. Display the job title, birth date, first name, and last name.

  2. The customer names also appear in the Person.Person table. Join the_ Sales.Customer_ table to the Person.Person table. The BusinessEntityID column in the Person.Person table matches the PersonID column in the Sales.Customer table. Display the_ CustomerID, StoreID_, and TerritoryID columns along with the name columns.

  3. Extend the query written in question 2 to include the Sales.SalesOrderHeader table. Display the SalesOrderID column along with the columns already specified. The Sales.SalesOrderHeader table joins the Sales.Customer table on CustomerID.

  4. Write a query that joins the Sales.SalesOrderHeader table to the Sales.SalesPerson table. Join the BusinessEntityID column from the Sales.SalesPerson table to the SalesPersonID column in the Sales.SalesOrderHeader table. Display the SalesOrderID along with the SalesQuota and Bonus.

  5. Add the name columns to the query written in question 4 by joining on the Person.Person table. See whether you can figure out which columns will be used to write the join.

  6. The catalog description for each product is stored in the Production.ProductModel table. Display the columns that describe the product such as the color and size, along with the catalog description for each product.

  7. Write a query that displays the names of the customers along with the product names they have purchased. Hint: Five tables will be required to write this query!

Solution

Use the AdventureWorks database to complete this exercise.

1.

The HumanResources.Employee table does not contain the employee names. Join that table to the Person.Person table on the BusinessEntityID column. Display the job title, birth date, first name, and last name.

SELECT E.JobTitle, E.BirthDate, P.FirstName, P.LastName
FROM HumanResources.Employee AS E
INNER JOIN Person.Person AS P ON
          E.BusinessEntityID = P.BusinessEntityID;

2.

The customer names also appear in the Person.Person table. Join the Sales.Customer table to the Person.Person table. The BusinessEntityID column in the Person.Person table matches the PersonID column in the Sales.Customer table. Display the CustomerID, StoreID, and TerritoryID columns along with the name columns.

SELECT C.CustomerID, C.StoreID, C.TerritoryID,
          P.FirstName, P.MiddleName, P.LastName
FROM Sales.Customer AS C
INNER JOIN Person.Person AS P
          ON C.PersonID = P.BusinessEntityID;

3.

Extend the query written in question 2 to include the Sales.SalesOrderHeader table. Display the SalesOrderID column along with the columns already specified. The Sales.SalesOrderHeader table joins the Sales.Customer table on CustomerID.

SELECT C.CustomerID, C.StoreID, C.TerritoryID,
          P.FirstName, P.MiddleName,
          P.LastName, S.SalesOrderID
FROM Sales.Customer AS C
INNER JOIN Person.Person AS P
          ON C.PersonID = P.BusinessEntityID
INNER JOIN Sales.SalesOrderHeader AS S
          ON S.CustomerID = C.CustomerID;

4.

Write a query that joins the Sales.SalesOrderHeader table to the Sales.SalesPerson table. Join the BusinessEntityID column from the Sales.SalesPerson table to the SalesPersonID column in the Sales.SalesOrderHeader table. Display the SalesOrderID along with the SalesQuota and Bonus.

SELECT S.SalesOrderID, SP.SalesQuota, SP.Bonus
FROM Sales.SalesOrderHeader AS S
INNER JOIN Sales.SalesPerson AS SP
          ON S.SalesPersonID = SP.BusinessEntityID;

5.

Add the name columns to the query written in question 4 by joining on the Person.Person table. See whether you can figure out which columns will be used to write the join.

SELECT SalesOrderID, SalesQuota, Bonus, FirstName,
          MiddleName, LastName
FROM Sales.SalesOrderHeader AS S
INNER JOIN Sales.SalesPerson AS SP
          ON S.SalesPersonID = SP.BusinessEntityID
INNER JOIN Person.Person AS P
          ON SP.BusinessEntityID = P.BusinessEntityID;

6.

The catalog description for each product is stored in the Production.ProductModel table. Display the columns that describe the product such as the color and size, along with the catalog description for each product.

SELECT PM.CatalogDescription, P.Color, P.Size
FROM Production.Product AS P
INNER JOIN Production.ProductModel AS PM
          ON P.ProductModelID = PM.ProductModelID;

7.

Write a query that displays the names of the customers along with the product names that they have purchased. Hint: Five tables will be required to write this query!

SELECT FirstName, MiddleName, LastName, Prod.Name
FROM Sales.Customer AS C
INNER JOIN Person.Person AS P
          ON C.PersonID = P.BusinessEntityID
INNER JOIN Sales.SalesOrderHeader AS SOH
          ON C.CustomerID = SOH.CustomerID
INNER JOIN Sales.SalesOrderDetail AS SOD
          ON SOH.SalesOrderID = SOD.SalesOrderID
INNER JOIN Production.Product AS Prod
          ON SOD.ProductID = Prod.ProductID;

Subqueries vs Joins

From the last section we ran the following query :

SELECT cust_name, cust_contact
FROM Customers
WHERE cust_id IN (SELECT cust_id
                  FROM Orders
                  WHERE order_num IN (SELECT order_num
                                      FROM OrderItems
                                      WHERE prod_id =
                                            'RGAN01'));

this query can be re-written using a JOIN as follows :

SELECT cust_name, cust_contact
FROM Customers, Orders, OrderItems
WHERE Customers.cust_id = Orders.cust_id
AND OrderItems.order_num = Orders.order_num
AND prod_id = 'RGAN01';

Using Table Aliases

You are allowed to aliases table names in addition to aliasing columns as well. The advantages of table aliases is :

  • To shorten the SQL syntax
  • To enable multiple uses of the same table within a single SELECT statement

Here is an example of using a table alias :

SELECT cust_name, cust_contact
FROM Customers AS C, Orders AS O, OrderItems AS OI
WHERE C.cust_id = O.cust_id
 AND OI.order_num = O.order_num
 AND prod_id = 'RGAN01';

Using Different Join Types

They are other types of joins in addition to the equi-join/ INNER JOIN :

  • SELF JOIN
  • CROSS JOIN
  • NATURAL JOIN

Self Joins

A self-join is a special type of query that joins a table back to itself.

Suppose you wanted to send a mailing to all the customer contacts who work for the same company for which Jim Jones works. This query requires that you first find out which company Jim Jones works for, and next which customers work for that company. The following is one way to approach this problem:

SELECT cust_id, cust_name, cust_contact
FROM Customers
WHERE cust_name = (SELECT cust_name
                   FROM Customers
                   WHERE cust_contact = 'Jim Jones');

This first solution uses subqueries. The inner SELECT statement does a simple retrieval to return the cust_name of the company that Jim Jones works for. That name is the one used in the WHERE clause of the outer query so that all employees who work for that company are retrieved. The same query can be accomplished used an inner join :

SELECT c1.cust_id, c1.cust_name, c1.cust_contact
FROM Customers AS c1, Customers AS c2
WHERE c1.cust_name = c2.cust_name
 AND c2.cust_contact = 'Jim Jones';

Self-Joins Instead of Subqueries

Self-joins are often used to replace statements using subqueries that retrieve data from the same table as the outer statement. Although the end result is the same, many DBMSs process joins far more quickly than they do subqueries. It is usually worth experimenting with both to determine which performs better.

Natural Joins

Whenever tables are joined, at least one column will appear in more than one table (the columns being used to create the join). Standard joins (the inner joins that you learned about in the last lesson) return all data, even multiple occurrences of the same column. A natural join simply eliminates those multiple occurrences so that only one of each column is returned.

A natural join is a join in which you select only columns that are unique. This is typically done using a wildcard (SELECT *) for one table and explicit subsets of the columns for all other tables. The following is an example:

SELECT C.*, O.order_num, O.order_date,
       OI.prod_id, OI.quantity, OI.item_price
FROM Customers AS C, Orders AS O,
     OrderItems AS OI
WHERE C.cust_id = O.cust_id
 AND OI.order_num = O.order_num
 AND prod_id = 'RGAN01';

The truth is, every inner join you have created thus far is actually a natural join, and you will probably never need an inner join that is not a natural join.

Outer Joins

Most joins relate rows in one table with rows in another. But occasionally, you want to include rows that have no related rows. For example, you might use joins to accomplish the following tasks:

  • Count how many orders were placed by each customer, including customers that have yet to place an order.
  • List all products with order quantities, including products not ordered by anyone.
  • Calculate average sale sizes, taking into account customers that have not yet placed an order.

Using LEFT OUTER JOIN

When writing OUTER JOIN, you must specify either LEFT or RIGHT. If the main table, the table you want to see all the rows from even if there is not a match, is on the left side of the join, you will specify LEFT.

The syntax for the LEFT OUTER JOIN is as follows :

SELECT <SELECT list>
FROM <table1>
LEFT [OUTER] JOIN <table2> ON <table1>.<col1> = <table2>.<col2>

Here is an example of the LEFT OUTER JOIN :

SELECT c.CustomerID, s.SalesOrderID, s.OrderDate
FROM Sales.Customer AS c
LEFT OUTER JOIN Sales.SalesOrderHeader AS s ON c.CustomerID = s.CustomerID
WHERE c.CustomerID IN (11028,11029,1,2,3,4);

All the rows from the table on the left side of the join, the Sales.Customer table, that meet the criteria in the WHERE clause show up in the results. The query returns rows from the right side of the join, the Sales.SalesOrderHeader table, only if they match on CustomerID.

To retrieve a list of all customers including those who have placed no orders, you can do the following from the Training database.

SELECT Customers.cust_id, Orders.order_num
FROM Customers LEFT OUTER JOIN Orders
 ON Customers.cust_id = Orders.cust_id;

Here is a diagram showing the relation between the tables in the AdventureWorks database :

Left outer join

The tan area of the Venn diagram illustrates how all the CustomerID values in the Sales.Customer table will be returned whether or not there is a matching CustomerID value in the Sales.SalesOrderHeader table.

When using OUTER JOIN syntax you must use the RIGHT or LEFT keywords to specify the table from which to include all rows (RIGHT for the one on the right of OUTER JOIN, and LEFT for the one on the left). The previous example uses LEFT OUTER JOIN to select all the rows from the table on the left in the FROM clause (the Customers table).

Using RIGHT OUTER JOIN

RIGHT OUTER JOIN differs from LEFT OUTER JOIN in just the location of the tables. If the main table, the table in which you want to see all the rows, even if there is not a match, is on the right side of the join, you will specify RIGHT. Here is the syntax:

SELECT <SELECT list>
FROM <table2>
RIGHT [OUTER] JOIN <table1> ON <table1>.<col1> = <table2>.<col2>

Here is a query that will give the same result but with table position re-arranged :

SELECT c.CustomerID, s.SalesOrderID, s.OrderDate
FROM Sales.SalesOrderHeader AS s
RIGHT OUTER JOIN Sales.Customer AS c ON c.CustomerID = s.CustomerID
WHERE c.CustomerID IN (11028,11029,1,2,3,4);

The only difference between this query and the previous one is the order of the tables within the FROM clause and the direction keyword.

To select all the rows from the table on the right, you use a RIGHT OUTER JOIN as seen in this next example:

SELECT Customers.cust_id, Orders.order_num
FROM Customers RIGHT OUTER JOIN Orders
ON Orders.cust_id = Customers.cust_id;

Using OUTER JOIN to Find Rows with No Match

Sometimes it’s useful to find all the rows in one table that don’t have corresponding rows in another table. For example, you may want to find all the customers who have never placed an order. Because the columns from the nonmatching rows contain NULL values, you can use OUTER JOIN to find rows with no match by checking for NULL. The syntax is as follows:

SELECT <SELECT list>
FROM <table1>
LEFT [OUTER] JOIN <table2> ON <table1>.<col1> = <table2>.<col2>
WHERE <col2> IS NULL

Here is the query that returns a list of all customers who have not placed an order :

SELECT c.CustomerID, s.SalesOrderID, s.OrderDate
FROM Sales.Customer AS c
LEFT OUTER JOIN Sales.SalesOrderHeader AS s ON c.CustomerID = s.CustomerID
WHERE s.SalesOrderID IS NULL;

Adding a Table to the Right Side of a LEFT JOIN

The next step is to understand what to do when additional tables are added to the query. For example, you might want to display all the customers and their orders even if an order has not been placed, along with the ProductID from those orders that were placed. To keep the customers with no orders from dropping out of the results, you must continue to use LEFT JOIN.

SELECT C.CustomerID, SOH.SalesOrderID, SOD.SalesOrderDetailID, SOD.ProductID
FROM Sales.Customer AS C
LEFT OUTER JOIN Sales.SalesOrderHeader AS SOH ON C.CustomerID = SOH.CustomerID
LEFT OUTER JOIN Sales.SalesOrderDetail AS SOD ON SOH.SalesOrderID = SOD.SalesOrderID
WHERE C.CustomerID IN (11028,11029,1,2,3,4);

Full outer join

Retrieves all rows from both tables and relates those that can be related. Unlike a left outer join or right outer join, which includes unrelated rows from a single table, the full outer join includes unrelated rows from both tables.

Here is a query from the Training database :

SELECT Customers.cust_id, Orders.order_num
FROM Orders FULL OUTER JOIN Customers
 ON Orders.cust_id = Customers.cust_id;

Using Joins with Aggregate Functions

Aggregate functions can be used with joins as well. To demonstrate this, let’s look at an example. You want to retrieve a list of all customers and the number of orders that each has placed. The following code uses the COUNT() function to achieve this:

SELECT Customers.cust_id,
       COUNT(Orders.order_num) AS num_ord
FROM Customers INNER JOIN Orders
ON Customers.cust_id = Orders.cust_id
GROUP BY Customers.cust_id;

Aggregate functions can be used just as easily with other join types. See the following example:

SELECT Customers.cust_id,
       COUNT(Orders.order_num) AS num_ord
FROM Customers LEFT OUTER JOIN Orders
ON Customers.cust_id = Orders.cust_id
GROUP BY Customers.cust_id;

Exercise

Use the AdventureWorks database to complete this exercise. You can find the solutions at the end of the chapter.

  1. Write a query that displays all the products along with the SalesOrderID even if an order has never been placed for that product. Join to the Sales.SalesOrderDetail table using the ProductID column.

  2. Change the query written in question 1 so that only products that have not been ordered show up in the query.

  3. Write a query that returns all the rows from the Sales.SalesPerson table joined to the Sales.SalesOrderHeader table along with the SalesOrderID column even if no orders match. Include the SalesPersonID, SalesYTD and SalesOrderID columns in the results.

  4. Change the query written in question 3 so that the salesperson’s name also displays from the Person.Person table.

  5. The Sales.SalesOrderHeader table contains foreign keys to the Sales.CurrencyRate and Purchasing.ShipMethod tables. Write a query joining all three tables, and make sure it contains all rows from Sales.SalesOrderHeader. Include the CurrencyRateID, AverageRate, SalesOrderID, and ShipBase columns.

  6. Write a query that returns the BusinessEntityID column from the Sales.SalesPerson table along with every ProductID from the Production.Product table.

Solution

Use the AdventureWorks database to complete this exercise.

1.

Write a query that displays all the products along with the SalesOrderID even if an order has never been placed for that product. Join to the Sales.SalesOrderDetail table using the ProductID column.

SELECT SalesOrderID, P.ProductID, P.Name
FROM Production.Product AS P
LEFT OUTER JOIN Sales.SalesOrderDetail
          AS SOD ON P.ProductID = SOD.ProductID;

2.

Change the query written in question 1 so that only products that have not been ordered show up in the query.

SELECT SalesOrderID, P.ProductID, P.Name
FROM Production.Product AS P
LEFT OUTER JOIN Sales.SalesOrderDetail
         AS SOD ON P.ProductID = SOD.ProductID
WHERE SalesOrderID IS NULL;

3.

Write a query that returns all the rows from the Sales.SalesPerson table joined to the Sales.SalesOrderHeader table along with the SalesOrderID column even if no orders match. Include the SalesPersonID, SalesYTD and SalesOrderID columns in the results.

SELECT SalesOrderID, SalesPersonID, SalesYTD, SOH.SalesOrderID
FROM Sales.SalesPerson AS SP
LEFT OUTER JOIN Sales.SalesOrderHeader AS SOH
      ON SP.BusinessEntityID = SOH.SalesPersonID;

4.

Change the query written in question 3 so that the salesperson’s name also displays from the Person.Person table.

SELECT SalesOrderID, SalesPersonID, SalesYTD, SOH.SalesOrderID,
            FirstName, MiddleName, LastName
FROM Sales.SalesPerson AS SP
LEFT OUTER JOIN Sales.SalesOrderHeader AS SOH
        ON SP.BusinessEntityID = SOH.SalesPersonID
LEFT OUTER JOIN Person.Person AS P
          ON P.BusinessEntityID = SP.BusinessEntityID;

5.

The Sales.SalesOrderHeader table contains foreign keys to the Sales.CurrencyRate and Purchasing.ShipMethod tables. Write a query joining all three tables, and make sure it contains all rows from Sales.SalesOrderHeader. Include the CurrencyRateID, AverageRate, SalesOrderID, and ShipBase columns.

SELECT CR.CurrencyRateID, CR.AverageRate,
          SM.ShipBase, SalesOrderID
FROM Sales.SalesOrderHeader AS SOH
LEFT OUTER JOIN Sales.CurrencyRate AS CR
          ON SOH.CurrencyRateID = CR.CurrencyRateID
LEFT OUTER JOIN Purchasing.ShipMethod AS SM
          ON SOH.ShipMethodID = SM.ShipMethodID;

6.

Write a query that returns the BusinessEntityID column from the Sales.SalesPerson table along with every ProductID from the Production.Product table.

SELECT SP.BusinessEntityID, P.ProductID
FROM Sales.SalesPerson AS SP
CROSS JOIN Production.Product AS P;