Home Services Articles Presentations Books Speaking 2021 Social Media Database Links Contact Us

An Introduction to Recursive SQL

by Craig S. Mullins

If you are a SQL programmer, learning recursive SQL techniques can be a boon to your productivity. A recursive query is one that refers to itself. I think the best way to quickly grasp the concept of recursion is to think about a mirror that is reflected into another mirror and when you look into it you get never-ending reflections of yourself. This is recursion in action.

Different DBMS products implement recursive SQL in different ways. Recursion is implemented in standard SQL-99 using common table expressions (CTEs). DB2, Microsoft SQL Server, Oracle and PostgreSQL all support recursive queries using CTEs. Note that Oracle also offers an alternative syntax using the CONNECT BY construct, which we will not discuss here.

A CTE can be thought of as a named temporary table within a SQL statement that is retained for the duration of that statement. There can be many CTEs in a single SQL statement but each must have a unique name. A CTE is defined at the beginning of a query using the WITH clause.

Now before we dive into recursion, let’s first look at some data that would benefit from being read recursively. Figure 1 shows a hierarchic organization chart.

A table holding this data could be
set up as follows:

CREATE TABLE ORG_CHART

(MGR_ID SMALLINT,

EMP_ID SMALLINT,

EMP_NAME CHAR(20)

);

Of course, this is a simple implemen-tation and many more columns would likely be needed for a production hierarchy. But the simplicity of this table will suit our purposes for learning recursion. To make the data in this table match our diagram, we would load the table as shown in Table 1.

Figure 1. A sample hierarchy

Table 1. Sample data

The MGR_ID for the top-most node is set to some value indicating that there is no parent for this row, in this case –1 is used. Now that we have loaded the data we can code a query to walk the hierarchy using recursive SQL. Suppose we need to report on the entire organizational structure under LIL BOSS. The following recursive SQL using a CTE will do the trick:

WITH EXPL (MGR_ID, EMP_ID, EMP_NAME) AS

(

SELECT ROOT.MGR_ID, ROOT.EMP_ID, ROOT.EMP_NAME

FROM ORG_CHART ROOT

WHERE ROOT.MGR_ID = 3

UNION ALL

SELECT CHILD.MGR_ID, CHILD.EMP_ID, CHILD.EMP_NAME

FROM EXPL PARENT, ORG_CHART CHILD

WHERE PARENT.EMP_ID = CHILD.MGR_ID

)

SELECT DISTINCT MGR_ID, EMP_ID, EMP_NAME

FROM EXPL

ORDER BY MGR_ID, EMP_ID;

The results of running this query would be as follows:

MGR_ID EMP_ID EMP_NAME

1 3 LIL BOSS

3 6 TEAM LEAD

6 7 LOW MAN

6 8 SCRUB

Let’s break this somewhat complex query down into its constituent pieces to help understand what is going on. First of all, a recursive query is implemented using the WITH clause (using a CTE). The CTE is named EXPL. The first SELECT primes the pump to initialize the “root” of the search. In our case, to start with EMP_ID 3, that is LIL BOSS.

The next SELECT is an inner join combining the CTE with the table upon which the CTE is based. This is where the recursion comes in. A portion of the CTE definition refers to itself. Finally, we SELECT from the CTE. Similar queries can be written to completely explode the hierarchy to retrieve all the descendants of any given node.

Recursive SQL can be very elegant and efficient. However, because of the difficulty developers can have understanding recursion, it is sometimes thought of as “too inefficient to use frequently.” But, if you have a business need to walk or explode hierarchies in your database, recursive SQL will likely be your most efficient option. What else are you going to do? You can create pre-exploded tables, but this requires denormalization and a lot of pre-processing which will not be efficient. Or you might write your own code to walk a hierarchy. This, too, is fraught with potential problems. You will probably retrieve more data than you need, causing inefficient I/O. And how will you assure that your code is more efficient than the DBMS?

If every row processed by the query is required in the answer set (“find all employees who work for LIL BOSS”), then recursion will most likely be quite efficient. If only a few of the rows processed by the query are actually needed (“find all flights from Houston to Pittsburgh, but show only the three fastest”) then a recursive query can be quite costly. The bottom line is that you should consider recursive SQL when business requirements call for it. But be sure that suitable indexes are available and always examine your access paths.

Real-time analytics is another interesting issue to consider. The adjective “real-time” refers to a level of responsiveness that is immediate or nearly immediate. Market forces, customer requirements, governmental regulations, and technology changes collectively conspire to ensure that data that is not up-to-date is not acceptable. As a result, today’s leading organizations are constantly working to improve operations with access to, and analysis of, real-time data.

As good as real-time analytics sounds, it is not without its challenges to implement. One such challenge is reducing the latency between data creation and when it is recognized by analytics processes. Time-to-market issues can be another potential pitfall of an advanced analytics project. A large part of any analytical process is the work involved with gathering, cleansing, and manipulating data required as input to the final model or analysis. However, this up-front work is essential to the overall success of any advanced analytics project.

ADVANCED ANALYTICS DEPLOYMENTS OFFER PRODUCTIVITY GAINS

The end result of deploying advanced analytics is increased productivity with the ability to gather and analyze large volumes of data to deliver faster, more-effective business decisions.

Indeed, the whole big data phenomenon is largely focused on the analytical processing that is conducted on the data. But data need not be “big” in order to implement advanced analytics at your organization. And, ignoring the value that advanced analytics can provide your organization is not an option … because you can bet that your competition is not ignoring it!

From Database Trends and Applications, May 2014.

May 2014

MGR_ID	EMP_ID	EMP_NAME
–1	1	BIG BOSS
1	2	LACKEY
1	3	LIL BOSS
1	4	BOOTLICKER
2	5	GRUNT
3	6	TEAM LEAD
6	7	LOW MAN
6	8	SCRUB

Consulting Writing and Analysis Education

DBA Corner The Database Report Data Perspectives The Buffer Pool Db2 Articles Mainframe Articles Database Administration Articles Big Data Articles Articles 2020 Articles 2019 Articles 2018 Articles 2017 Articles 2016 Articles 2015 Articles 2012 thru 2014 Articles 2010 thru 2011 Articles 2005 thru 2009 Articles 2000 thru 2004 Articles 1996 thru 1999 Articles 1990 thru 1995 Craig's Quotes and Interviews

A Guide to Db2 Application Performance for Developers Db2 Developer's Guide Database Administration

Speaking 2020 Speaking 2019 Speaking 2018 Speaking 2017 Speaking 2016 Speaking 2015