Craig S. Mullins Database Performance Management |
|
December 2001 | |
The DBA Corner by Craig S. Mullins Database
Fragmentation and Disorganization Relational
technology and SQL make data modification easy. Just issue an insert,
update, or delete statement with the appropriate WHERE clause, and the
RDBMS takes care of the actual data navigation and modification. In
order to provide this level of abstraction, the RDBMS handles the
physical placement and movement of data on disk. Theoretically, this
makes everyone happy. The programmer's interface is simplified and the
RDBMS takes care of the hard part — manipulating the data and
coordinating its actual placement on disk. However, things are not
quite that simple. The manner in which the RDBMS physically manages
data can cause subsequent performance problems to arise. Every
DBA has experienced a situation where an application slows down after
it has been in production for awhile. But the reason why is not always
evident. Perhaps the number of transactions issued has increased or
maybe the volume of data has increased. But for some problem these
factors may not seem to be cause enough for such a large performance
degradation. The problem might be with disorganized data in the
database. Database disorganization occurs when a database's logical
and physical storage allocations contain many scattered areas of
storage that are too small, not physically contiguous, or too
disorganized to be used productively. To
understand how performance can be impacted by database
disorganization, let's examine a “sample” database as
modifications are made to data. Assume that a table space exists that
consists of three tables across multiple blocks. As we begin our
experiment each table is contained in contiguous blocks on disk. No
table shares a block with any other. Of course, the actual operational
specifics will depend on the DBMS being used as well as the type of
table space, but the scenario is generally applicable to any database
at a high level. Now
let's make some data changes to the tables in this table space. First,
let's add six rows to the second table. But no free space exists into
which these new rows can be stored. How can the rows be added? Oracle
requires that another extent is taken into which the new rows can be
placed. For
the second change, let's update a row in the first table to change a
variable character column; for example, let's change the LASTNAME
column from "SMITH" to "DOSTOEVSKY". This update
results in an expanded row size because the value for LASTNAME is
longer in the new row: " DOSTOEVSKY" contains 10 characters
whereas "SMITH" only consists of 5. After
these changes the resultant table space most likely will be
disorganized. The type of data changes that were made can result in
fragmentation, row chaining, and declustering. Fragmentation
is a condition in which there are many scattered areas of storage in a
database that are too small to be used productively. It results in
wasted space, which can hinder performance. When
updated data does not fit in the space it currently occupies Oracle
will find space for the row using either row chaining or row
migration. With row chaining Oracle will move a part of the new,
larger row to a location within the table space where free space
exists. With row migrations the full row is placed elsewhere in the
segment. In each case a block-resident pointer is used to locate
either the rest of the row or the full row. Both row chaining and row
migration will result in multiple I/Os being issued to read a single
row. This will cause performance to suffer because multiple I/Os are
more expensive than a single I/O. Finally,
declustering occurs when there is no room to maintain the order of the
data on disk. When clustering is used a clustering key is specified
composed of one or more columns. When data is inserted to the table
the DBMS attempts to insert the data in sequence by the values of the
clustering key. If no room is available the DBMS will insert the data
where it can find room. Of course, this declusters the data which can
significantly impact the performance of sequential I/O operations.
Reorganizing Table Spaces To
minimize fragmentation and row chaining, as well as to re-establish
clustering, database objects need to be restructured on a regular
basis. This process is also known as reorganization. The primary
benefit is the resulting speed and efficiency of database functions
because the data is organized in a more optimal fashion on disk. In
short, reorganization is useful for any database because data
inevitably becomes disorganized as it is used and modified. DBAs
can reorganize “manually” by completely rebuilding databases. But
such a reorganization requires a complex series of steps to
accomplish, for example: · Backup the database ·
Export the data ·
Delete the database object(s) ·
Re-create the database object(s) ·
Sort the exported data (by the clustering key) ·
Import the data Reorganization
usually requires the database to be down. The high cost of downtime
creates pressures both to perform and to delay preventive maintenance
— a familiar quandary for DBAs. Third party tools are available that
automate the manual process of reorganizing tables, indexes, and
entire table spaces — eliminating the need for time- and
resource-consuming database rebuilds. In addition to automation, these
type of tools typically can analyze whether a reorganization is needed
at all. Furthermore, ISV reorg tools operate at very high speeds to
reduce the duration of outages. Some
modern reorganization tools enable database structures to be
reorganized while the data is up and available. To accomplish an
online reorganization, the database structure being reorganized is
copied to another location on disk. Then this “shadow” copy is
reorganized. When the shadow reorganization is complete, the reorg
tool “catches up” by reading the log to apply any changes that
were made during the online reorganization process. Performing
database maintenance online in this manner enhances availability. And
that is crucial for database objects that support e-business
applications. Synopsis Reorganizations
can be costly in terms of downtime and computing resources. And it can
be difficult to determine when a reorganization will actually create
performance gains. However, the performance gains that can be accrued
are tremendous when fragmentation and disorganization exist. The wise
DBA will plan for regular database reorganization based on an
examination of the data to determine if the above types of
disorganization exist within their corporate databases.
From Database
Trends and Applications, November 2001. © 2001 Craig S. Mullins, All rights reserved. |