Craig S. Mullins |
|
September 1999 |
|
|
Using Nulls in SQL Serverby
Craig S. Mullins A
null is the way in which a relational database records missing or unknown
information. When you assign a null to a column instance, it means that a value
currently does not exist for that column. It is important to understand that a
column assigned to null logically means one of two things: 1.
the column does not apply to this row, or; 2.
the column applies to this row, but the information is not known at
present Nulls
are not the same as the value 0 or the value blank. Null means that no entry has
been made. Quite literally, a null is the lack of a value. Nulls
sometimes are inappropriately referred to as null values. Using the term value
to describe a null column is incorrect because the term null
implies the lack of a value. Even the SQL Server documentation uses the term
“null value,” which just serves to confuse users. By
using null, instead of a default value, you can distinguish between a deliberate
entry of 0 (for numerical columns) or a blank (for character columns) or any
actual, valid value from an unknown or inapplicable entry. For
example, suppose that a table contains information on the hair color of
employees. The HAIR_COLOR column is defined in the table as being capable of
accepting nulls. Three new employees are added today: a man with black hair, a
woman with unknown hair color, and a bald man. The woman with the unknown hair
color and the bald man both could be assigned a null HAIR_COLOR, but for
different reasons. The hair column color for the woman would be null because she
has hair but the color presently is unknown. The hair color column for the bald
man would be null also, but this is because he has no hair and so hair color
does not apply. SQL
Server does not differentiate between nulls that signify unknown data and those
that signify inapplicable data. This distinction must be made by the program
logic of each application. The relational model, as defined by its creator, E.F.
Codd, provides for the notion of i-marks and a-marks to differentiate between
inapplicable (i-mark) and applicable but unknown (a-mark). Codd introduced
“marks” instead of nulls late in the life cycle of the relational model in
what he refers to as RMV2 (Relational Model Version 2). However, no current
relational DBMS implementation supports these marks. Null or Not Null? For
each column defined in each table you create you must decide whether nulls
should be allowed. This is done in the
CREATE TABLE statement by specifying either NULL or NOT NULL after the data type
specification for the column. You can use the sp_help system procedure to find
out which columns are nullable, and which are not. So,
defining a column’s value to be null provides a place holder for data
that is not (yet) known. For example, in the titles table (of the sample pubs
database), price, advance, royalty, and ytd_sales are set up to allow nulls.
However, title_id and title do not allow nulls because the title_id is the key
which defines the row and hence can not be null and storing data about a book
where the title in unknown would not be a sound business practice. A price
without a title makes no sense, but a title without a price could indicate that
the book is just about to be published but the price has yet to be determined. In
general, when you create tables, define columns to be NOT NULL when the value
for the column is required to provide meaning to the other columns in the row. Nulls
should only be allowed where column values can be legitimately unknown and yet,
still cause the row to make sense. You must have specific knowledge of how the
absence of a value will impact the applications and queries using the data. In
practice, you may want to create nullable columns only in rare circumstances
because of the confusing behavior of nulls in relational queries. When
inserting data, the DEFAULT keyword can be used with nullable columns. This is
so even if there is no default defined. By default, the default for a nullable
column is NULL. Rules
and check constraints treat nulls as special conditions. A
column definition that allows nulls can have nulls inserted even if the rule or
constraint does not explicitly allow nulls. For example, consider a columns that
allows nulls and has a rule defined for it as follows:
(@variable > 10) This
would seem to indicate that only legitimate values greater than the number 10
can be stored in the column. However, a null can be stored as the default if no
value is specified for the column in an insert statement. The
Behavior of Nulls Avoid
nulls in columns that must participate in arithmetic logic (for example, money
values). The AVG, COUNT DISTINCT, SUM, MAX, and MIN functions omit column
occurrences set to null. The COUNT(*) function, however, does not omit columns
set to null because it operates on rows. Thus, AVG is not equal to SUM/COUNT(*)
when the average is being computed for a column that can contain nulls. For
example, the price column in the titles table is nullable, so the result of the
following query: select
avg(price is
not the same as for this query if any price is actually set to NULL: select sum(price)/count(*) even
though you may logically assume it to be equivalent. For this reason, avoid
nulls in columns involved in math functions. One
other note: the aggregate functions MIN, MAX, SUM, and AVG return NULL if no
rows are returned. So, even if none of your columns are nullable, you may need
to prepare for the possibility of a null result if you use these functions and no
rows satisfy the WHERE clause. Another
“apparent” quirk of nulls is that two nulls are not equivalent to one
another. This confuses many developers who test columns for equivalency in a
WHERE clause and expect two columns set to null to evaluate to true. But, just
to be difficult, there is an exception: nulls are
considered to be duplicates in a GROUP BY clause or an ORDER BY clause. Also,
nulls are considered to be equal when duplicates are eliminated by SELECT
DISTINCT or COUNT (DISTINCT column). However,
any expression using a comparison operator evaluates to false if any
of the operands is null. This means that a null never matches another value (not
even another null). Instead, the syntax IS [NOT] NULL is used to find nulls in
queries. So, for example, you can
specify:
where price is null to
check for instances where the price columns is set to null. This formulation
must be used. The following syntax
is not valid:
where price = null Nulls
and Time When
date and time columns can be unknown, it is usually best to assign the columns
as nullable. SQL Server checks to ensure that only valid dates and times are
placed in columns defined as such. If the column can be unknown, it must be
defined to be nullable because the only other option is to define a default for
these columns from the valid domain of dates and times. And this can be
confusing at best, or at worst cause problems similar to what the industry is
experiencing with the Year 2000 crisis. Avoid
Special Sequencing for Nullable Columns Treat
nullable columns the same as you would any other column in terms of their
“positioning” in your SQL Server tables. Some DBAs place nullable columns
after non-nullable columns in the table. This is supposed to assist in
administering the null columns, but in my opinion it does not. Sequencing
nullable columns in this manner provides no clear benefit and should be avoided. Synopsis Nulls
are one of the most potentially confusing concepts in relational technology.
Confusing or not, as a developer or DBA you need to understand how nulls work so
that your databases will be designed properly and your applications will
function correctly. As this article has demonstrated, nulls may need to be dealt
with even if none of your columns are defined as nullable, so there is no clear
and certain path to avoiding nulls. This
column has touched upon the basics of nulls: what they are, their functionality
and quirks, and guidelines for their implementation. If you are using nulls
extensively, it is wise to understand how they differ from traditional database
values. And please, don’t call a null a “null value,” if anything it is a
“null lack of a value.” ©
1999, Craig S, Mullins From SQL Server Update (Xephon) September 1999. © 1999 Craig S. Mullins, All rights reserved.
|