Craig S. Mullins Database Performance Management |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
September 1999 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
What's
In A Name By
Craig S. Mullins The
establishment and enforcing of naming conventions is often one of the
first duties to be tackled when implementing new software. Adequate thought and preparation is required in order for
such a task to be successful. What
amount of effort should be extended in the creation of appropriate
database naming standards? Are
current industry standards acceptable?
Shakespeare, many, many years ago, may have said it best when
he wrote: "What's in a name?
That which we call a rose by any other name would
smell as sweet." But,
if that is true, then why do those of us in the DP industry spend so
much time and effort developing and enforcing naming standards? Maybe what something is called is not quite so trivial a
matter after all! Well,
we know what Shakespeare was trying to convey:
the name by which we call something has no affect upon the
actual object. Calling a desk a lamp will not turn
it into a lamp. It is
still a desk. Sometimes
DP professionals, and database developers in particular, forget this. This article will provide a novel approach to database naming
standards by ignoring all of the basic generalizations that were
foisted upon us in Database 101 class and taking a fresh look at the
naming of relational database objects.
The article will concentrate on DB2 objects, but the arguments
made in the article are just as valid for other relational databases
as well. Tables
and Their Cousins In
a relational database management system the primary data store is the
table. A table consists
of multiple rows, each with a fixed and unchanging number of defined
columns. However, most RDBMS provide alternate means of accessing data
from these tables. For
example, DB2 allows the following:
Each
of these alternate means of access are similar in one way:
they all present data by means of values in rows and
columns. An end
user need not know whether he is querying a table, an alias, a
synonym, or a view. To
the user the results are the same – data represented by values
in rows and columns. This
brings us to our first revision to the common wisdom for
database naming standards: Revision #1:
Use the exact same naming convention for tables, views,
aliases, and synonyms. These
four objects all logically refer to the same thing in the
relational model – a representation of data in terms of
columns and rows. It
is common for most shops to implement different naming
conventions for each of these objects.
Does it really make sense for them to named differently?
Let's examine the pros and cons. Consider
your current table naming conventions.
If your shop is typical, you will find a convention that
is similar to the one depicted in Figure 1.
If they are different, pause for a moment to ask yourself
why. This is almost
an industry standard for table naming.
You most surely do not want force every DB2 table to
begin with a T (or have a strategically embedded T within the
table name). The
name of a DB2 table should accurately and succinctly convey the
contents of the data it contains.
The naming convention displayed in Figure 1 accomplishes
this. Figure
1. Typical Table Naming Standard Prototype:
loc.cr.aaa_tab Each
table name is composed of the following segments:
loc.
Location Identification;
Unique Remote or Local
Note separator (.) between location and creator.
cr.
Creator Identification;
creator or owner.
aaa_
Application System;
3 characters.
Note separator ( _ ) between appl system and table
identification.
tab
Table Identification;
limited to 14 characters. So
this brings us to our second rule for revising database naming
conventions: Revision #2:
Avoid embedding a "T", or any other character
, into table names to indicate that the object is a table.
Likewise, indicator characters should be avoided for
any other table-like
object (alias, synonym, view). Although
most shops avoid embedding a "T" in table name, many
of these same shops do embed a character into alias, synonym,
and view names. The
primary reason given is that the character makes it easy to
determine what type of object is being accessed just by looking
at the name. There
are two reasons why this is a bad idea.
The first is a semantic reason; the second a flexibility
issue. In
semantic terms, an object's name need only identify the object,
not the object's type. Consider
the following arguments:
How are people named?
Usually one can ascertain the gender of someone simply by
knowing their name but would you banish all males named Chris,
Pat, or Terry? Or
maybe all females named Chris, Pat, and Terry?
After all, men and women are different.
Shouldn't we make sure that all
men's names are differentiated from women's names? Maybe we should start all men's names with an M and all
women's names with a W? If
we did, we'd sure have a lot of Marks and Wendys, wouldn't we?
The point here is that context enables us to
differentiate men from women, when it is necessary.
The same can be said of database objects. How
are COBOL program variables named?
Do you name your 01, 05, and 77 level variable names
differently in your COBOL programs?
For example, do all 01 levels start with "O"
(for one), all 05 levels start with "F", and
all 77 levels start with "S"?
No? Why not?
Isn't this the same as forcing views to start with V (or
having a strategically imbedded V within the name)?
What about the naming of pets?
Say I have a dog, a cat, and a bird.
Now, I wouldn't want to get them confused, so I'll make
sure that I start all of my dog names with a D, cat names with a
C, and bird names with a B.
So, I'll feed C_FELIX in the morning, take D_ROVER for a
walk after work, and make sure I cover B_TWEETY's cage before I
go to bed. Sound
ridiculous, doesn't it? The
whole point of this tirade is that if we don't manufacture hokey
names in the real world, why would we want to do it with our DB2
objects? There is
really no reason to embed special characters into DB2 objects
names to differentiate them from one another.
It is very practical and desirable to name DB2 objects in
a consistent manner, but that consistent manner should be well
thought out and should utilize the system to its fullest
capacity wherever possible. The
second reason for this rule is to increase flexibility. Say, for example, that we have a table that for some reason
is significantly altered, dropped, or renamed.
If views are not constrained by rigid naming conventions
requiring an embedded "V" in the name, then a view can
be constructed that resembles the way the table used to look.
Furthermore, this view can be given the same name as the
old table. This
increases system flexibility. Most
users don't care whether they are using a table, view, synonym,
or alias. They
simply want the data. And,
in a relational database, tables, views, synonyms, and aliases
all logically appear
to be identical to the end user: as rows and columns.
It is true that there are certain operations that can not
be performed on certain types of views, and users who need to
know this will generally be sophisticated users.
For example, very few shops allow end users to update any
table they want using QMF, SPUFI, or some other tool that uses
dynamic SQL. Updates,
deletions, and insertions (the operations which are not
available to some views) are generally coded into application
programs and executed in batch or via on-line transactions.
The end user does need to query tables dynamically.
Now you tell me, which name will your typical end user
remember more readily when he needs to access his marketing
contacts: MKT_CONTACT
or VMKTCT01? Further
Arguments For Indicators Some
shops believe they have very valid reasons for embedding an
object type indicator character into database objects; view
names, in particular. Let's
examine these arguments: Point
Embedding a V into our view names enables our DBAs to
quickly determine which objects are views and which are tables. Counterpoint
Many of these shops do not embed a T into the table name,
but feel that a V in the view name is necessary.
It is believed that the DBA will be able to more easily
discern views from tables.
But, rarely do these shops force an S into synonym names
or an A into alias names. Even
if they do, it is usually overkill. Any good DBA already knows which objects are tables and which
are views, and if s/he doesn't, a simple query against the
system catalog will clarify the matter.
For
example, in DB2, this query will list all table-like objects:
SELECT NAME,
CREATOR, "TABLE"
FROM SYSIBM.SYSTABLES
WHERE TYPE = "T"
UNION ALL
SELECT NAME, CREATOR, "ALIAS"
FROM SYSIBM.SYSTABLES
WHERE TYPE = "A"
UNION ALL
SELECT NAME, CREATOR, "SYNONYM"
FROM SYSIBM.SYSSYNONYMS
UNION ALL
SELECT NAME, CREATOR, "VIEW"
FROM SYSIBM.SYSVTREE
ORDER BY 3, 1 Point
It is necessary to code view names differently so that
users understand that they are working with a view and not all
operations can be performed on the view. Counterpoint
But all operations can be performed on some views.
And not all operations can be performed on all tables
either! What if the
user does not have the security to perform the operation?
For example, what is the difference, from the user's
perspective, between accessing a non-updateable view and accessing
a table where only the SELECT privilege has been granted? Use
It or Lose It Another
common problem with database naming conventions is unnecessary
size restrictions. Using
DB2 as an example, most objects can have a name up to 18
characters long. But,
in many instances, shops establish naming standards that do not
utilize all of the characters available.
This is usually unwise. Revision #3:
Unless a compelling reason exits, ensure that your
standards allow for the length of database object names to
utilize every character available. Refer
to Figure 2 for maximum and recommended DB2 object name lengths.
Notice that except for indexes, the recommended length is
equal to the maximum length for each object.
Figure 2.
DB2 Object Reference
* where TYPE = T
** where
TYPE = A Why
are indexes singled out in DB2?
This is an example of a compelling reason to
bypass the general recommendation.
Developers can explicitly name DB2 indexes,
but they can not explicitly name DB2 index spaces. Yet, every DB2 index requires an index space name.
The index space name will be implicitly
generated by DB2 from the index name.
If the index name is 8 characters or less in
length, then the index space name will be the same
as the index name.
However, if the index name is greater than 8
characters long, DB2 will use an internal,
proprietary algorithm to generate a unique, 8 byte
index space name.
As this name can not be determined prior to
index creation, it is wise to limit the length of
index names to 8 characters.
This is a good example of the maxim that
there are exceptions to every rule. Embedded
Meaning One
final troublesome naming convention is embedding
specialized meaning into database object names.
The name of an object should reflect what
that object is or represents. However, it should not attempt to define the object.
Revision #4:
Do not embed specialized meaning into
database object names. Let's
examine this revised rule by means of an example. Some shops enforce DB2 index naming conventions such that the
type of index is embedded in the index name.
Consider the standard shown in Figure 3. Figure
3. Typical Index Naming Standard
(Not Recommended) Prototype:
cr.aaaXtttl Each
index name is composed of the following segments:
cr.
Creator Identification;
creator or owner.
Example:
PTMKTP01
(for
Marketing system
database)
aaa
Application System;
3 characters.
Example:
MKT
(for
Marketing system)
X
Index Indicator; constant X.
ttt
Table Name Identification;
3 characters.
abbreviation of corresponding table name.
Example:
CON
(for
MKT_CONTACT)
l
Index Letter;
1 character.
P
for index corresponding to Primary Key
F
for index corresponding to a Foreign Key
U
for unique index
N
for non-unique index Note
two potential problem areas with this standard: 1.
An embedded X identifies this object as an
index. 2.
Embedded meaning in the form of indicators
detailing the type of index.
The
embedded indicator character "X", although
unnecessary, is not as evil as indicator characters
embedded in table-like objects.
Indexes are not explicitly accessed by users.
Therefore, obscure or difficult to remember
naming conventions are not a problem. The same arguments hold true for table space names.
In
fact, indicator characters may actually be helpful
to ensure that table spaces and indexes are never
named the same.
Table spaces and indexes, within the same
database, can never be named the same.
DB2 uses a name generation algorithm to
enforce uniqueness if it is attempted. This
exception results in another revised rule: Revision #5:
If you must use indicator characters in
database names, use them only in objects which are
never explicitly accessed by end users. The
second potential problem area poses quite a bit of
trouble. Consider the following cases which would cause the embedded
meaning in the index name to be incorrect: 1.
The primary key is dropped. 2.
A foreign key is dropped. 3.
The index is altered from non-unique to
unique (or vice versa) using a database alteration
tool. 4.
What if an index is defined for a foreign
key, but is also unique?
Should we use an "F" or a
"U"?
Or do we need another character? I
have also seen naming conventions which indicate
whether the index is clustering ("C") or
not ("N").
This is not a good idea either.
Misconceptions can occur.
For example, in DB2, if no clustering index
is explicitly defined, DB2 will use the first index
created as a clustering index.
Should this index be named with an embedded
"C" or not? Let's
look at one final example from the real world to
better illustrate why it is a bad idea to embed
specialized meaning into names.
Consider what would happen if we named
corporations based upon what they produce.
When IBM began, they produced typewriters.
If we named corporation like we name database
objects, the company could have been named based
upon the fact that they manufactured typewriters
when they began.
IBM might have been called TIBM (the
"T" is for typewriters).
And guess what, they don't make typewriters
any longer. What would we do? Rename
TIBM or live with a name that is no longer relevant? Synopsis Naming
conventions evoke a lot of heated discussion.
Everybody has their opinion as to what is the
best method for naming database objects.
Remember, though, that it is best to keep an
open mind. Johnny
Cash may have been upset that his father gave him a
girl's name in the hit song "A Boy Named
Sue," but that was before he knew why.
If this article caused you to think about
naming conventions from a different perspective,
then I will consider it a success. From Database
Trends, September 1999. |