In the last post (Demystifying T-SQL Subqueries – Part I) we looked at how to use scalar subqueries. Let’s continue our subquery adventure.
As with Part I, the following T-SQL query samples are using the Chinook database.
Subqueries which return a single list of values
Result |
Description |
Sample |
B |
Single Column |
Single list subquery in FROM
A subquery in a FROM clause is also often referred to as a "derived table". The scope of the derived table is only at the query level; once the query is done executing you cannot access the records from this derived table anymore. The important thing to remember whenever you are embedding a subquery in a FROM clause is to put aliases in your columns (especially calculated columns or expressions), and to also assign an alias to your derived table.
-- average number of albums -- of artists SELECT AVG(x.NumAlbums) FROM ( SELECT COUNT(DISTINCT AlbumId) NumAlbums FROM Album GROUP BY ArtistId )x |
Single list subquery in WHERE
The most common operators you can use with single list subqueries in the WHERE clause is IN or NOT IN. You can also use the SOME, ANY or ALL keywords with the regular comparison operators, like > ANY, < ALL
-- get all invoices for Johann -- Sebastian Bach's tracks SELECT InvoiceId, Quantity, UnitPrice FROM InvoiceLine WHERE TrackId IN ( SELECT TrackId FROM Track WHERE Composer = 'Johann Sebastian Bach' ) |
Single list subquery in HAVING
We can use the single list subquery in the HAVING similar to how we use it in the WHERE clause.
-- get all artist IDs that -- have more albums than Audioslave -- and AC/DC SELECT ArtistId, COUNT(DISTINCT AlbumId) NumAlbums FROM Album GROUP BY ArtistId HAVING COUNT(DISTINCT AlbumId) > ALL ( SELECT COUNT(DISTINCT al.AlbumId) FROM Album al INNER JOIN Artist ar ON al.ArtistId = ar.ArtistId WHERE ar.Name IN ('Audioslave', 'AC/DC') GROUP BY ar.ArtistId ) |
Subqueries which return a table
Subqueries that return a table are often called derived table.
Result |
Description |
Sample |
C |
Table |
Derived table in FROM
As mentioned in previous sections, when using subqueries in the FROM clause, you MUST remember to provide aliases to your columns, especially the calculated ones. You MUST also provide an alias to the derived table.
-- first customers per country -- who bought tracks SELECT i.BillingCountry, i.InvoiceDate, c.CustomerId, c.FirstName, c.LastName FROM Invoice i INNER JOIN Customer c ON i.CustomerId = c.CustomerId INNER JOIN ( SELECT BillingCountry, MIN(InvoiceDate) MinInvoiceDate FROM Invoice GROUP BY BillingCountry )x ON i.BillingCountry = x.BillingCountry AND i.InvoiceDate = x.MinInvoiceDate ORDER BY i.BillingCountry |
Derived table in WHERE
If a subquery returns a table, it can still be used in the WHERE clause, but you have to use it with an EXISTS or NOT EXISTS operator. EXISTS is a special operator – it simply tests the existence of a row or rows – that’s why it does not care whether one column or multiple columns are returned by the subquery. EXISTS does not really look at that. If there are any rows returned, then EXISTS evaluates to true. Otherwise, it evaluates to false. In most cases, you will need the subquery to be correlated, ie the subquery needs to refer back to the outer query.
You have to be careful when using correlated subqueries though. Correlated subqueries will execute once for each record in the outer query, therefore acting similar to a foreach loop. If you have a million records in your outer query, the inner query will execute a million times (hence, makes for interesting times whether you’re a DBA or a developer).
-- get all invoices of -- customers who live in Brazil SELECT * FROM Invoice i WHERE EXISTS ( SELECT * FROM Customer c -- note in where clause -- we refer to outer table alias i WHERE Country = 'Brazil' AND i.CustomerId = c.CustomerID ) |
What about performance of JOINs vs subqueries?