Database Trends: "Database Fragmentation & Disorganization"

Craig S. Mullins Database Performance Management
Return to Home Page
December 2001
	The DBA Corner by Craig S. Mullins Database Fragmentation and Disorganization Relational technology and SQL make data modification easy. Just issue an insert, update, or delete statement with the appropriate WHERE clause, and the RDBMS takes care of the actual data navigation and modification. In order to provide this level of abstraction, the RDBMS handles the physical placement and movement of data on disk. Theoretically, this makes everyone happy. The programmer's interface is simplified and the RDBMS takes care of the hard part — manipulating the data and coordinating its actual placement on disk. However, things are not quite that simple. The manner in which the RDBMS physically manages data can cause subsequent performance problems to arise. Every DBA has experienced a situation where an application slows down after it has been in production for awhile. But the reason why is not always evident. Perhaps the number of transactions issued has increased or maybe the volume of data has increased. But for some problem these factors may not seem to be cause enough for such a large performance degradation. The problem might be with disorganized data in the database. Database disorganization occurs when a database's logical and physical storage allocations contain many scattered areas of storage that are too small, not physically contiguous, or too disorganized to be used productively. To understand how performance can be impacted by database disorganization, let's examine a “sample” database as modifications are made to data. Assume that a table space exists that consists of three tables across multiple blocks. As we begin our experiment each table is contained in contiguous blocks on disk. No table shares a block with any other. Of course, the actual operational specifics will depend on the DBMS being used as well as the type of table space, but the scenario is generally applicable to any database at a high level. Now let's make some data changes to the tables in this table space. First, let's add six rows to the second table. But no free space exists into which these new rows can be stored. How can the rows be added? Oracle requires that another extent is taken into which the new rows can be placed. For the second change, let's update a row in the first table to change a variable character column; for example, let's change the LASTNAME column from "SMITH" to "DOSTOEVSKY". This update results in an expanded row size because the value for LASTNAME is longer in the new row: " DOSTOEVSKY" contains 10 characters whereas "SMITH" only consists of 5. After these changes the resultant table space most likely will be disorganized. The type of data changes that were made can result in fragmentation, row chaining, and declustering. Fragmentation is a condition in which there are many scattered areas of storage in a database that are too small to be used productively. It results in wasted space, which can hinder performance. When updated data does not fit in the space it currently occupies Oracle will find space for the row using either row chaining or row migration. With row chaining Oracle will move a part of the new, larger row to a location within the table space where free space exists. With row migrations the full row is placed elsewhere in the segment. In each case a block-resident pointer is used to locate either the rest of the row or the full row. Both row chaining and row migration will result in multiple I/Os being issued to read a single row. This will cause performance to suffer because multiple I/Os are more expensive than a single I/O. Finally, declustering occurs when there is no room to maintain the order of the data on disk. When clustering is used a clustering key is specified composed of one or more columns. When data is inserted to the table the DBMS attempts to insert the data in sequence by the values of the clustering key. If no room is available the DBMS will insert the data where it can find room. Of course, this declusters the data which can significantly impact the performance of sequential I/O operations. Reorganizing Table Spaces To minimize fragmentation and row chaining, as well as to re-establish clustering, database objects need to be restructured on a regular basis. This process is also known as reorganization. The primary benefit is the resulting speed and efficiency of database functions because the data is organized in a more optimal fashion on disk. In short, reorganization is useful for any database because data inevitably becomes disorganized as it is used and modified. DBAs can reorganize “manually” by completely rebuilding databases. But such a reorganization requires a complex series of steps to accomplish, for example: · Backup the database · Export the data · Delete the database object(s) · Re-create the database object(s) · Sort the exported data (by the clustering key) · Import the data Reorganization usually requires the database to be down. The high cost of downtime creates pressures both to perform and to delay preventive maintenance — a familiar quandary for DBAs. Third party tools are available that automate the manual process of reorganizing tables, indexes, and entire table spaces — eliminating the need for time- and resource-consuming database rebuilds. In addition to automation, these type of tools typically can analyze whether a reorganization is needed at all. Furthermore, ISV reorg tools operate at very high speeds to reduce the duration of outages. Some modern reorganization tools enable database structures to be reorganized while the data is up and available. To accomplish an online reorganization, the database structure being reorganized is copied to another location on disk. Then this “shadow” copy is reorganized. When the shadow reorganization is complete, the reorg tool “catches up” by reading the log to apply any changes that were made during the online reorganization process. Performing database maintenance online in this manner enhances availability. And that is crucial for database objects that support e-business applications. Synopsis Reorganizations can be costly in terms of downtime and computing resources. And it can be difficult to determine when a reorganization will actually create performance gains. However, the performance gains that can be accrued are tremendous when fragmentation and disorganization exist. The wise DBA will plan for regular database reorganization based on an examination of the data to determine if the above types of disorganization exist within their corporate databases. From Database Trends and Applications, November 2001. © 2001 Craig S. Mullins, All rights reserved. Home.