Craig S. Mullins
Database Performance Management
April 1998
The Age of the VHDB
By Craig S. Mullins
Databases are growing in size. There is no denying that simple fact of life. I have talked to hundreds of DBAs and visited many sites, and not a single one of them reports that their databases are getting smaller. This has been true for many years, but the pace at which database sizes are growing is at an all-time high. For several years now, database experts have used the acronym VLDB, or very large database, to refer to the largest databases in production environments. However, some production databases are now approaching a petabyte in size. Refer to Figure 1 for an idea of how large a petabyte is and where we will inevitably go from there. It demeans a database of this size to call it "large"; this is a "huge" database. Hence the updated terminology: VHDB, or very huge database.
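To put the size of a petabyte in perspective, a minimal Python sketch can print the byte-unit ladder from kilobyte up through yottabyte. It assumes decimal (SI) prefixes, so a petabyte is 10^15 bytes, and shows binary multiples (powers of 1,024) alongside for comparison; the helper `unit_ladder` and the unit list are illustrative and not taken from the article.

```python
# Illustrative sketch: the byte-unit ladder up to and beyond a petabyte.
# Assumption: decimal (SI) prefixes, 1 kilobyte = 1,000 bytes;
# binary multiples (1 unit = 1,024 of the previous) are shown for comparison.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def unit_ladder(base: int) -> dict[str, int]:
    """Hypothetical helper: map each unit name to its size in bytes for the given base."""
    return {name: base ** (i + 1) for i, name in enumerate(UNITS)}


if __name__ == "__main__":
    decimal = unit_ladder(1000)
    binary = unit_ladder(1024)
    for name in UNITS:
        # Scientific notation for SI sizes, digit grouping for binary sizes
        print(f"{name:>9}: {decimal[name]:.3e} bytes (SI), {binary[name]:,} bytes (binary)")
    # A petabyte expressed in gigabytes under the SI convention
    print(decimal["petabyte"] // decimal["gigabyte"], "gigabytes per petabyte")
```

Run as a script, it prints one line per unit and then reports 1000000 gigabytes per petabyte under the SI convention; each step up the ladder multiplies the size by another thousand.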
Many factors are driving organizations to support and foster this growth. People are creating data warehouses to enable analytical processing on vast amounts of historical data, and they are creating many indexes on this data to enable rapid data access. Indexes require even more storage space, further increasing the overall size of the database. Data mining is fast becoming a requirement: heuristic algorithms are applied to historical data to automatically discover patterns in the data that can be exploited for competitive advantage. The value of a data mining application depends on how much data is available, how good that data is, and how good the pattern discovery algorithms are. So people are inclined to store more data for a longer period of time.
Hardware improvements also spur this growth along. The hard drive in my laptop computer is bigger than the first mainframe hard drives I worked with years ago. The ability to store multiple gigabytes of information cheaply enables the creation, storage, and access of these VHDBs. Since the cost is so minimal, why not store more data?
Unfortunately, the speed of access has not kept up with the volume of storage available. The storage capacity of a disk drive has grown by nearly three orders of magnitude in the past 25 years, but the data transfer rate has improved by only one order of magnitude in that same time. Because storage capacity has vastly outpaced disk access speed, hardware and DBMS vendors have had to compensate with additional main storage, in-memory data caching, parallel data access, and other techniques. All of this complicates database administration. The net result of increasing database size is that the largest production databases are unmanageable even with the best tools that money can buy. Manageability includes, but is not limited to:
From Computing News and Review, April 1998. © 1999 Mullins Consulting, Inc. All rights reserved.