Logic, e-Business and the Procedural eDBA

Craig S. Mullins

October 2001

The eDBA Series... as published in:

Logic, e-Business and the Procedural eDBA

by Craig S. Mullins

Until recently, the domain of a database management system was, appropriately enough, to store, manage, and access data. Although these core capabilities are still required of a modern DBMS, additional procedural functionality is becoming not just a nice-to-have feature, but a necessity.

A modern DBMS has the ability to define business rules to the DBMS instead of in a separate, application program. Specifically, all of the most popular RDBMS products support an array of complex features and components to facilitate procedural logic. Procedural DBMS facilities are being driven by organizations as they move to become e-businesses.

As the DBMS adapts to support more procedural capabilities organizations must modify and expand the way they handle database management and administration. Typically, as new features are added, the administrative, design, and management of these features is assigned to the database administrator (DBA) by default. But simply dumping these new administrative burdens on the already overworked DBA staff may not be the best approach. But “DBA-like duties” are required to effectively manage these procedural elements.

The Classic Role of the DBA

Every database programmer has their favorite “curmudgeon DBA” story. You know, those famous anecdotes that begin with “I have a problem...” and end with “...and then he told me to stop bothering him and read the manual.” DBAs simply do not have a “warm and fuzzy” image. This probably has more to do with the nature and scope of the job than anything else. The DBMS spans the enterprise, effectively placing the DBA on call for the applications of the entire organization.

To make matters worse, the role of the DBA has expanded over the years. In the pre-relational days, both database design and data access was complex. Programmers were required to code program logic to navigate through the database and access data. Typically, the pre-relational DBA was assigned the task of designing the hierarchic or network database design. This process usually consisted of both logical and physical database design, although it was not always recognized as such at the time. After the database was designed and created, and the DBA created backup and recovery jobs, little more than space management and reorganizations were required. I do not want to belittle these tasks. Pre-relational DBMS products (such as IMS and IDMS) require a complex series of utility programs to be run in order to perform backup, recovery, and reorganization. This can consume a large amount of time, energy, and effort.

As RDBMS products gained popularity, the role of the DBA expanded. Of course, DBAs still designed databases, but increasingly these were generated from logical data models created by data administrators and data modelers. Now the DBA has become involved in true logical design and must be able to translate a logical design into a physical database implementation. Relational database design still requires physical implementation decisions such as indexing, denormalization, and partitioning schemes. But, instead of merely concerning themselves with physical implementation and administration issues, relational DBAs must become more intimately involved with procedural data access. This is so because the RDBMS creates data access paths.

As such, the DBA must become more involved in the programming of data access routines. No longer are programmers navigating through data; now the RDBMS does that. Optimizer technology embedded in the RDBMS is responsible for creating the access paths to the data. And these optimization choices must be reviewed – usually by the DBA. Program and SQL design reviews are now a vital component of the DBA’s job. Furthermore, the DBA must tackle additional monitoring and tuning responsibilities. Backup, recover, and REORG are just a start. Now, DBAs use EXPLAIN, performance monitors, and SQL analysis tools to proactively administer RDBMS applications.

Oftentimes, DBAs are not adequately trained in these areas. It is a distinctly different skill to program than it is to create well-designed relational databases. DBAs must understand application logic and programming techniques to succeed.

And now the role of the DBA expands even further with the introduction of database procedural logic.

The Trend of Storing Process With Data

Today’s modern RDBMS stores procedural logic in the database, further complicating the job of the DBA. The popular RDBMSs of today support database-administered procedural logic in the form of stored procedures, triggers, and user-defined functions (UDFs).

Stored procedures can be thought of as programs that are maintained, administered, and executed through the RDBMS. The primary reason for using stored procedures is to move application code off of a client workstation and on to the database server to reduce overhead. A client can invoke the stored procedure and then the procedure invokes multiple SQL statements. This is preferable to the client executing multiple SQL statements directly because it minimizes network traffic thereby enhancing performance. A stored procedure can access and/or modify data in one or more tables. Basically, stored procedures work like "programs" that "live" in the RDBMS.

Triggers are event-driven specialized procedures that are stored in, and executed by, the RDBMS. Each trigger is attached to a single, specified table. Triggers can be thought of as an advanced form of "rule" or "constraint" written using procedural logic. A trigger can not be directly called or executed; it is automatically executed (or "fired") by the RDBMS as the result of an action—usually a data modification to the associated table. Once a trigger is created it is always executed when its "firing" event occurs (update, insert, delete, time, etc.).

A user-defined function, or UDF, is procedural code that works within the context of SQL statements. Each UDF provides a result based upon a set of input values. UDFs are programs that can be executed in place of standard, built-in SQL scalar or column functions. A scalar function transforms data for each row of a result set; a column function evaluates each value for a particular column in each row of the results set and returns a single value. Once written, and defined to the RDBMS, a UDF can be used in SQL statements just like any other built-in functions.

Stored procedures, triggers, and UDFs are just like other database objects such as tables, views, and indexes, in that they are controlled by the DBMS. These objects are often collectively referred to as database code objects, or DBCOs, because they are actually program code that is stored and maintained by a database server as a database object. Depending upon the particular RDBMS implementation, these objects may or may not “physically” reside in the RDBMS. They are, however, always registered to, and maintained in conjunction with, the RDBMS.

Database Code Objects and e-Business

The drive to develop Internet-enabled applications has led to increased usage of database code objects. DBCOs can reduce development time and everyone knows that web-based projects are tasked out in web time – there is a lot to do but little time in which to do it. DBCOs help because using they promote code reusability. Instead of replicating code on multiple servers or within multiple application programs, DBCOs enable code to reside in a single place: the database server. DBCOs can be automatically executed based upon context and activity or can be called from multiple client programs as required. This is preferable to cannibalizing sections of program code for each new application that must be developed. DBCOs enable logic to be invoked from multiple processes instead of being re-coded into each new process every time the code is required.

An additional benefit of DBCOs is increased consistency. If every user and every database activity (with the same requirements) is assured of using the DBCO instead of multiple, replicated code segments, then you can assure that everyone is running the same, consistent code. If each individual user used his or her own individual and separate code, no assurance could be given that the same business logic was being used by everyone. Actually, it is almost a certainty that inconsistencies will occur. Additionally, DBCOs are useful for reducing the overall code maintenance effort. Because DBCOs exist in a single place, changes can be made quickly without requiring propagation of the change to multiple workstations.

Another common reason to employ DBCOs to enhance performance. A stored procedure, for example, may result in enhanced performance because it may be stored in parsed (or compiled) format thereby eliminating parser overhead. Additionally, stored procedures reduce network traffic because multiple SQL statements can be invoked with a single execution of a procedure instead of sending multiple requests across the communication lines.

UDFs in particular are used quite often in conjunction with multimedia data. And many e-business applications require multimedia instead of static text pages. UDFs can be coded to manipulate multimedia objects that are stored in the database. For example, UDFs are available that can play audio files, search for patterns within image files, or manipulate video files.

Finally, DBCOs can be coded to support database integrity constraints, implement security requirements, and support remote data access. DBCOs are useful for creating specialized management functionality for the multimedia data types required of leading-edge e-business applications. Indeed, there are many benefits provided by DBCOs.

Database Code Object Programming Languages

Being application logic, most server code objects must be created using some form of programming language. Check constraints and assertions do not require procedural logic as they can typically be coded with a single predicate. Although different RDBMS products provide different approaches for DBCO development, there are three basic tactics employed:

Use a proprietary dialect of SQL extended to include procedural constructs
Use a traditional programming language (either a 3GL or a 4GL)
Use a code generator to create DBCOs

The most popular approach is to use a procedural SQL dialect. One of the biggest benefits derived from moving to a RDBMS is the ability to operate on sets of data with a single line of code. Using a single SQL statement, multiple rows can be retrieved, modified, or removed. But this very capability limits the viability of using SQL to create server code objects. All of the major RDBMS products support procedural dialects of SQL that add looping, branching, and flow of control statements. The Sybase and Microsoft language is known as Transact-SQL, Oracle provides PL/SQL, and DB2 uses a more ANSI standard language simply called SQL procedure language. Procedural SQL has major implications on database design.

Procedural SQL will look familiar to anyone who has ever written any type of SQL or coded using any type of programming language. Typically, procedural SQL dialects contain constructs to support looping (while), exiting (return), branching (goto), conditional processing (if...then...else), blocking (begin...end), and variable definition and usage. Of course, the procedural SQL dialects (Transact-SQL, PL/SQL, and SQL Procedure Language) are incompatible and can not interoperate with one another.

The second approach is one supported by DB2 for OS/390: using a traditional programming languages to develop for stored procedures. Once coded the program is registered to DB2 and can be referenced by SQL procedure calls.

A final approach is to use a tool to generate the logic for the server code object. Code generators can be used for any of RDBMS that supports DBCOs, as long as the code generator supports the language required by the RDBMS product being used. Of course, code generators can be created for any programming language.

Which is the best approach? Of course, the answer is “It depends!” Each approach has its strengths and weaknesses. Traditional programming languages are more difficult to use but provide standards and efficiency. Procedural SQL is easier to use and more likely to be embraced by non-programmers, but is non-standard from product to product and can result in sub-optimal performance.

It would be nice if the developer had an implementation choice, but the truth of the matter is that he must live with the approach implemented by the RDBMS vendor.

The Duality of the DBA

Once DBCOs are coded and made available to the RDBMS, applications and developers will begin to rely upon them. Although the functionality provided by DBCOs is unquestionably useful and desirable, DBAs are presented with a major dilemma. Now that procedural logic is being stored in the DBMS, DBAs must grapple with the issues of quality, maintainability, and availability. How and when will these objects be tested? The impact of a failure is enterprise-wide, not relegated to a single application. This increases the visibility and criticality of these objects. Who is responsible if they fail? The answer must be—a DBA.

With the advent of DBCOs, the role of the DBA is expanding to encompass far too many duties for a single person to perform the role capably. The solution is to split the DBA’s job into two separate parts based upon the database object to be supported: data objects or database code objects.

Administering and managing data objects is more in line with the traditional role of the DBA, and is well-defined. But DDL and database utility experts can not be expected to debug procedures and functions written in C, COBOL, or even procedural SQL. Furthermore, even though many organizations rely upon DBAs to be the SQL experts in the company, often times they are not – at least not DML experts. Simply because the DBA knows the best way to create a physical database design and DDL, does not mean he will know the best way to access that data.

The role of administering the procedural logic in the RDBMS should fall upon someone skilled in that discipline. A new type of DBA must be defined to accommodate DBCOs and procedural logic administration. This new role can be defined as a procedural DBA.

The Role of the Procedural DBA

The procedural DBA should be responsible for those database management activities that require procedural logic support and/or coding. Of course, this should include primary responsibility for DBCOs. Whether DBCOs are actually programmed by the procedural DBA will differ from shop-to-shop. This will depend on the size of the shop, the number of DBAs available, and the scope of DBCO implementation. At a minimum, the procedural DBA should participate in and lead the review and administration of DBCOs. Additionally, he should be on call for DBCO failures.

Other procedural administrative functions that should be allocated to the procedural DBA include application code reviews, access path review and analysis (from EXPLAIN or show plan), SQL debugging, complex SQL analysis, and re-writing queries for optimal execution. Off-loading these tasks to the procedural DBA will enable the traditional, data-oriented DBAs to concentrate on the actual physical design and implementation of databases. This should result in much better designed databases.

The procedural DBA should still report through the same management unit as the traditional DBA and not through the application programming staff. This enables better skills sharing between the two distinct DBA types. Of course, there will need to be a greater synergy between the procedural DBA and the application programmer/analyst. In fact, the typical job path for the procedural DBA should come from the application programming ranks because this is where the coding skill-base exists.

Synopsis

As organizations begin to implement more procedural logic using the capabilities of the RDBMS, database administration will become increasingly more complicated. The role of the DBA is rapidly expanding to the point where no single professional can be reasonably expected to be an expert in all facets of the job. It is high time that the job be explicitly defined into manageable components.

From DBAzine, October 2001.

Home.