Nuggets
of Data Rambling Through My Mind
I
constantly have random database-related thoughts that
build up over time. I thought it would be useful and
interesting to group some of these thoughts together
into a column, sort of like Andy Rooney meets IT.
Do
you ever wonder why e-mail systems don't use database
management technology? Do you store some of your
e-mails for long periods of time? Do you group them
into folders? Are they hard to find later? And the
substance of the message isn't usually obvious from
the folder name you gave it (which made sense at the
time) or the subject of the e-mail (which might not
have anything to do with the actual content). I'd love
to be able to use SQL against my e-mail system,
writing something like:
SELECT
TEXT
FROM ALL OF MY E-MAIL FOLDERS
WHERE TEXT CONTAINS "NEW DB2 PRODUCT
ROLLOUT"
I
bet a DBMS that interfaces to or integrates with
e-mail would be a rousing success.
Have
you noticed that relational technology is taking a
beating these days? With new "data models"
purported to be better than relational, it seems that
the industry is all too ready to stick a fork in
relational technology.
There
are no true relational database management systems.
DB2 and Oracle and SQL Server, though based on the
relational model, do not support all of the features
that would make them truly relational. They are better
termed "SQL DBMS products." Before sounding
the death knell for relational, shouldn't we at least
try a fully relational DBMS first? Keep in mind that
the three largest software companies in the world
today are the three biggest providers of
relational--well, SQL--database technology: IBM,
Microsoft, and Oracle. These companies’ database
products are successful because they are useful, and,
in fact, help companies manage and access their data.
Why
is it important for my DBMS to be relational? My DBMS
works just fine as it is, how could relational help?
With SQL, you have a language that is non-orthogonal.
You can write the same data request in many ways. This
can cause performance problems and confuse
programmers. Another problem--current SQL database
systems permit the creation of tables that can contain
duplicate rows. The relational model is based on set
theory, and a set cannot contain duplicate members.
And there are significant features that current SQL
database systems omit, such as domains. A domain is
basically the set of all valid values for a particular
column (or attribute). It is flexible, so you can
define the valid values more precisely than a mere
data type. But a domain actually defines the valid
operations and comparisons that can be performed on
columns assigned to that domain. Because SQL DBMS
products lack domains, it is completely legal to code
the following predicates:
WHERE
AVG(SALARY) > PHONE_NUMBER
WHERE SHOE_SIZE = IQ
WHERE AGE = FIRST_NAME
These
are not useful predicates. But if each of these
columns were assigned to a different domain,
comparisons between them would be disallowed, because
comparisons between incompatible domains would be
disallowed. And values outside the scope of the domain
could not be assigned to the column. And these would
be good things because it would boost data integrity
and data quality.
|