Craig S. Mullins 

Return to Home Page

September 2000

 

The "Top Ten" Problem
By Craig S. Mullins

A commonly occurring problem for application developers is the desire to retrieve a limited number of qualifying rows from the database. For example, maybe you need to list the top ten best selling items from inventory, or a list of the ten employees with the highest salary. There are several ways to accomplish this using SQL.

The first reaction is to simply use the WHERE clause to eliminate non-qualifying rows. But this is simplistic, and often is not sufficient to produce the results desired in an optimal manner.

For example, what if the program only requires that the top ten results be returned? This can be a somewhat difficult request to formulate using SQL alone. Consider, for example, an application that needs to retrieve only the top ten highest priced book titles from the pubs database. You could simply issue a SQL request that retrieves all of the books in order by price, but only use the first ten retrieved. That is easy, for example:

SELECT TITLE, PRICE

FROM   TITLES

ORDER  BY PRICE DESC

Be sure to specify the ORDER BY clause with the DESC key word. This sorts the results into descending order, instead of the default, which is ascending. Without the DESC key word, the "top ten" would be at the very end of the results set.

But that does not satisfy the requirements of retrieving only the top ten. It merely sorts the results into descending sequence. So the results would still be all pubs in the table, but in the correct order so you can view the "top ten" prices very easily. But once again, this is not a complete solution. The ideal solution should return only the top ten book titles with the highest price and not merely a sorted list of all titles. There are several Transact-SQL solutions that can be used to produce the correct solution.

Using SET ROWCOUNT

The first Transact-SQL solution is to use the SET ROWCOUNT command. This command is used to limit the number of rows returned by a query. It is a separate command that must be issued before the query is run. By specifying a numeric to ROWCOUNT, SQL Server will limit the results set for subsequent queries to that number of rows.

So, we could extend the previous example as follows:

SET ROWCOUNT 10

SELECT TITLE, PRICE

FROM TITLES

ORDER BY PRICE DESC

SET ROWCOUNT 0

Be sure to set the ROWCOUNT both before and after the query. Setting the ROWCOUNT to zero indicates that all rows should be returned. If you do not reset the ROWCOUNT after issuing the query, subsequent queries will also limit the number of rows returned according to the value of ROWCOUNT.

Using the TOP Key Word

Using ROWCOUNT, however, probably is not the best solution. It requires multiple commands and might be confusing. Instead, you should consider using another Transact-SQL extension, the TOP key word. The TOP key word is specified right in the SELECT statement. Its primary benefit is that it is including in the SQL code of the query and will not impact any other subsequent queries.

The TOP Transact-SQL feature can be used as follows:

SELECT TOP 10 TITLE, PRICE

FROM TITLES

ORDER BY PRICE DESC

Of course, the TOP value can be any number, not just 10. For example, to return the only the top five highest-priced titles simply change the query as follows:

SELECT TOP 5 TITLE, PRICE

FROM TITLES

ORDER BY PRICE DESC

This is probably the simplest and most elegant solution for limiting the number of rows returned by a SQL Server query.

Avoiding Transact-SQL Extensions

But, you might want to avoid using Transact-SQL extensions altogether. This might be important if you want a solution that is portable across multiple RDBMS products. The following SQL will also return the top ten titles by price:

SELECT TITLES, PRICE

FROM TITLES A

WHERE 10 > (SELECT COUNT(*)

            FROM TITLES B

            WHERE A.PRICE < B.PRICE)

AND PRICE IS NOT NULL

ORDER BY PRICE DESC

 

For the query, no Transact-SQL extensions are used, so the SQL is portable from SQL Server to other database servers, such as DB2 and Oracle. And, of course, you can change the constant 10 to any number you wish, thereby retrieving the top 20, or top 5, as deemed necessary by the needs of your application. Since the PRICE column is nullable in the pubs database, you must remove the nulls from the results set. And the ORDER BY is required to sort the results in the right order. If it is removed from the query, the results will still contain the top ten, but they will be in no particular order.

There is one difference between this query and the others we have examined, and that is the way "ties" are handled. A tie occurs when more than one row contains the same value. This query may return more than 10 rows if there are multiple rows with the same value for price within the top ten. The previous two queries that used Transact-SQL extensions always will limit the number of rows returned to ten, even if there are other rows with the same value for price as the number ten row in the results set. The needs of your application will dictate whether ties are to be ignored or included in the result set. If ties should not be included in the results set, do not use the last SQL because it will always include ties.

Want to Return the Bottom Ten?

Any of these queries can be modified to return the bottom ten instead of the top ten. With the Transact-SQL versions, simply remove the DESC from the ORDER BY clause. This will cause the rows to be sorting into ascending sequence, which is the default. Then the ROWCOUNT or TOP formulations will return the bottom ten results.

For the final query, using standard SQL alone, simply reverse the less than sign (>) to a greater than sign (<) in the subquery, and remove the DESC from the ORDER BY clause, as follows:

SELECT TITLES, PRICE

FROM TITLES A

WHERE 10 > (SELECT COUNT(*)

            FROM TITLES B

            WHERE A.PRICE > B.PRICE)

AND PRICE IS NOT NULL

ORDER BY PRICE

 

Bottom Line

SQL is a very flexible language, and Transact-SQL provides many extensions to make the job of querying SQL Server databases using nothing but SQL simple and efficient.

The "top ten" request is a common application requirement. Any application that needs to return an ordered subset of a given entity can take advantage of the "top ten" query. Consider using SQL to return only the results you need instead of writing an application program that reads query results to limit the results set. "SQL only" solutions can be more efficient and easier to use than bulky application programs.

 

From SQL Server Update (Xephon) September 2000.

2000 Craig S. Mullins, All rights reserved.
Home.