Chapter 4. More Nuts and Bolts

1. Data Types Revisited

Let's begin by reviewing a few definitions:

A data type is a category of data values.
A literal is a data value written directly in a program, such as 42, 3.5, or "hello".

When I introduced data types two chapters ago, I mentioned the int, double, and string data types. It's time to give you the complete picture.

1.1. Numeric Data

C# has not one, but several different data types for integer data (see Table 4.1). They differ in the amount of memory they require and the range of integers each can represent: as the amount of memory increases, so does the range of values that can be represented. A short takes only two bytes of memory, but if you need to represent the value 200,000, you have to use an int or a long.

Current computer architectures are designed to work most efficiently with int data. There's no reason to use byte or short unless you are working with a very large array (collection) of integers and memory becomes a concern. Since you don't know how to use arrays yet, you only need to think about using int and long. Unless you need the range of a long, you should normally use the int data type.

Table 4.1. Integer Data Types

data type	memory consumed	range of values
byte	1 byte	0 to 255
short	2 bytes	-32,768 to 32,767
int	4 bytes	approx. -2.1 billion to +2.1 billion
long	8 bytes	approx. -9.2 quintillion to +9.2 quintillion
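If you'd like to check these limits yourself, every C# integer type exposes MinValue and MaxValue fields, and the sizeof operator reports how many bytes a built-in type occupies. Here's a short sketch; the variable names are arbitrary choices for the example.

```csharp
using System;

// Print the exact range of each integer type in Table 4.1.
Console.WriteLine($"byte:  {byte.MinValue} to {byte.MaxValue}");
Console.WriteLine($"short: {short.MinValue} to {short.MaxValue}");
Console.WriteLine($"int:   {int.MinValue} to {int.MaxValue}");
Console.WriteLine($"long:  {long.MinValue} to {long.MaxValue}");

// 200,000 is too big for a short (max 32,767), so it needs an int.
int population = 200000;
// short tooSmall = 200000;   // would not compile: 200000 is out of range for short

// sizeof reports the memory each type consumes, in bytes.
int intSize = sizeof(int);
int longSize = sizeof(long);
Console.WriteLine($"{population} fits in an int ({intSize} bytes); a long uses {longSize} bytes.");
```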

You also have more than one option when it comes to floating-point numbers (see Table 4.2). Although the float data type requires less memory than double, it also provides a narrower range and, more importantly, less precision. There's no reason to use the float type unless you are working with a large array and memory becomes a concern.

Table 4.2. Floating-Point Data Types

data type	memory consumed	range of values	significant digits of precision
float	4 bytes	approx. +/- 3.4 × 10³⁸	approx. 6 to 9
double	8 bytes	approx. +/- 1.7 × 10³⁰⁸	approx. 15 to 17
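To make the precision difference concrete, the sketch below stores the number 16,777,217 (that is, 2²⁴ + 1, chosen because it sits just past what a float's 24 significant bits can hold exactly) in both types, and also compares 1/3 computed each way. The specific values are illustrative choices.

```csharp
using System;

// A float keeps only 24 significant bits, so 16,777,217 rounds to the
// nearest representable float, 16,777,216. A double stores it exactly.
float f = 16777217;
double d = 16777217;

Console.WriteLine(f == 16777216f);   // True: the float lost the last digit
Console.WriteLine(d == 16777217.0);  // True: the double kept it

// The same division carries more correct digits as a double.
float thirdF = 1.0f / 3.0f;
double thirdD = 1.0 / 3.0;
Console.WriteLine($"float:  {thirdF:G9}");
Console.WriteLine($"double: {thirdD:G17}");
```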

1.2. Character Data

C# offers two data types for working with character data (letters, digits, and other symbols). You are already familiar with the string data type. A string literal is a sequence of zero or more characters enclosed in double quotes. In addition to normal characters, a string literal may contain escape sequences. An escape sequence is a sequence of two characters, beginning with a backslash (\), which is used to embed a special character in the string. Here are the common escape sequences:

Table 4.3. Escape Sequences

\n	newline
\r	carriage return (no line feed)
\t	tab
\'	single quote
\"	double quote
\\	backslash
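Although each escape sequence takes two characters to type, it produces just one character in the resulting string. A quick sketch confirms this by checking string lengths; the variable names are hypothetical.

```csharp
using System;

// Each escape sequence is two characters in the source code,
// but only one character in the string value.
string newline = "\n";
string backslash = "\\";
string quote = "\"";

Console.WriteLine(newline.Length);    // 1
Console.WriteLine(backslash.Length);  // 1
Console.WriteLine(quote.Length);      // 1

// \t inserts a single tab character, handy for lining up columns.
string row = "name\tage";
Console.WriteLine(row);
Console.WriteLine(row.Length);        // 8: n, a, m, e, tab, a, g, e
```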

Here are some examples of using strings with embedded escape sequences:

Console.WriteLine("Suzie said, \"I love C#.\"");

outputs:

Suzie said, "I love C#."

Console.WriteLine("This string is split\ninto \\two\\ lines.");

outputs:

This string is split
into \two\ lines.

Thus, there are two symbols you have to be careful about including in a string value: backslash (\) and double quote ("). Since a backslash signals the beginning of an escape sequence, if you want a string value with a regular backslash in it, you must type two backslashes. Since a double quote usually indicates the end of a string value, if you want a string value with a double quote in it, you must precede the quote with a backslash. For example, the following statement is illegal, because it contains an unescaped backslash and unescaped double quotes:

Console.WriteLine("I say, "This \ won't compile.""); // THIS IS AN ILLEGAL STRING
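For comparison, here's a sketch of a legal version of that statement: each inner double quote is written as \" and the lone backslash is doubled. I've changed the wording (since it now compiles) and added a small counting loop, purely for illustration, to show that the string holds exactly two quote characters and one backslash.

```csharp
using System;

// Inner quotes escaped as \" and the backslash doubled as \\.
string legal = "I say, \"This \\ now compiles.\"";
Console.WriteLine(legal);   // I say, "This \ now compiles."

// Count the special characters actually stored in the string.
int quotes = 0, backslashes = 0;
foreach (char c in legal) {
    if (c == '"') quotes++;
    if (c == '\\') backslashes++;
}
Console.WriteLine($"{quotes} quotes, {backslashes} backslash");   // 2 quotes, 1 backslash
```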

In addition to the string type, C# provides the char data type. The char type differs from the string type in the following ways:

A char literal is written using single quotes, like this:
'A'
A char value is always a single character. A string can contain any number of characters; it can even be empty. You can't have an empty char value. If necessary, you can use a space as a char value, like this:
' '
If you ever attempt to write an "empty" char value with '', or a char value with more than one character ('fred'), the C# compiler will report an error.

The char type has the following advantages:

It takes much less memory than a string (even an empty string!)
char values can be easily converted to numbers for special processing needs
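The second advantage is easy to see in code: a char converts automatically to its numeric character code (65 for 'A'), and an explicit cast turns a number back into a char. A minimal sketch, with hypothetical variable names:

```csharp
using System;

char letter = 'A';
char space = ' ';        // a space is a valid char; '' is not
string empty = "";       // an empty string, by contrast, is fine

// A char converts implicitly to int, giving its character code.
int code = letter;                 // 65
// Converting back to char requires an explicit cast.
char next = (char)(code + 1);      // 'B'

Console.WriteLine($"{letter} has code {code}; the next letter is {next}.");
Console.WriteLine($"space code: {(int)space}, empty string length: {empty.Length}");
```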

By the way, knowing when to use quotes, and which kind to use, is an important matter in C# programs. As an example, the values

0

'0'

"0"

mean three very different things in C#. The first is an int, the second a char, and the third a string. You can do arithmetic with the first, but not the others. (Actually, C# does let you perform arithmetic with chars, because it automatically converts a char to its numeric character code, but you won't get the results you expect. I won't discuss that further.)
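For the curious, a quick sketch shows what each of the three values does with + 1. The char '0' has character code 48, and + on a string means concatenation, not addition. The variable names are arbitrary.

```csharp
using System;

int number = 0 + 1;       // int arithmetic: 1
int charCode = '0' + 1;   // '0' is converted to its code, 48, giving 49
string text = "0" + 1;    // + with a string concatenates: "01"

Console.WriteLine(number);     // 1
Console.WriteLine(charCode);   // 49
Console.WriteLine(text);       // 01
```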

1.3. Logical Data

C# provides a data type for working with logical values: bool. The bool data type has only two values, whose literals, true and false, are written just like that: in lowercase, without quotes. For example, the following code fragment shows how you could define a bool variable and store a value in it:

bool ok = true; 

// do some processing, possibly set ok to false if a problem occurs
// ...

if (ok == true) {
  // do some stuff...
}

The bool type is often used for variables that indicate success or failure. It is also the type of the conditions tested by if statements and while loops.
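Here's a sketch of both uses together: a hypothetical loop that repeats until a bool flag becomes true. Notice that a comparison such as attempts >= 3 produces a bool value you can store, and that if (done) means the same thing as if (done == true).

```csharp
using System;

int attempts = 0;
bool done = false;

// The while loop keeps going as long as its condition is true.
while (!done) {
    attempts++;
    done = attempts >= 3;   // a comparison yields a bool; true on the third pass
}

// Testing a bool directly is equivalent to comparing it with true.
if (done) {
    Console.WriteLine($"Finished after {attempts} attempts.");
}
```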

1.4. Other Data

All of the data types we've discussed in this section, except string, have something in common: they are C#'s fundamental data types, called "value" types. Many of C#'s other data types, including string, are called "reference" types. Reference types are different from the value types in several ways. We'll explore some of these differences later in this chapter, but as an introduction, let me explain why they are called "reference" types.

A reference is something that refers to something else. In the Bible, for example, references (like John 3:16) refer to specific verses. If I say "One of the most important verses in the Bible about Salvation is John 3:16," you would know where to go to find the verse I'm referring to. "John 3:16" is not the verse; it is something that tells you where the verse is located.

In computer science, a reference is the address of a memory cell. Recall that RAM consists of a collection of memory cells, each with a unique numeric address. When you store a value in a value-type variable with an assignment statement, C# puts the value in the variable's memory space. But when you store a value in a reference-type variable, C# doesn't do that. Instead of storing the value in the variable, it puts a reference to the value in the variable. In other words, reference variables hold the memory address of the value, not the value itself. The actual value is stored somewhere else in memory.

For example, consider these two statements:

int x = 5;
string msg = "I am not a primitive value.";

After these statements execute, the variable x holds the value 5. But the variable msg does not hold text. Instead, it holds the address of a memory area that holds the text "I am not a primitive value." As an example of what this might look like, see Figure 4.1, “Reference vs. Value Variables”.

Figure 4.1. Reference vs. Value Variables

Often in diagrams showing variables that hold references, rather than inventing dummy addresses, a graphical pointer notation is used instead: an arrow drawn from the variable to the value it refers to.

If this reference business seems odd to you, I agree. It is rather strange. But it does have certain advantages. For example, consider what happens when you copy a reference value:

string msg, msg2;

msg = "I am not a primitive value.";
msg2 = msg;

Focus on the last line: msg2 = msg. The computer copies the reference in msg into msg2. This causes msg2 to refer to the same string that msg refers to.

References are rather small (4 bytes in a 32-bit program, 8 bytes in a 64-bit program), and it's much faster to copy a reference than to copy an entire string value. Also, msg2 and msg now share the same value, rather than having separate copies, so less memory is used.
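You can confirm that msg and msg2 share one string with object.ReferenceEquals, which returns true only when two references point to the same object. The sketch below also copies an int for contrast: a value-type copy is independent of the original.

```csharp
using System;

string msg = "I am not a primitive value.";
string msg2 = msg;   // copies the reference, not the text

// Both variables refer to the same string object in memory.
bool shared = object.ReferenceEquals(msg, msg2);
Console.WriteLine(shared);   // True

// A value-type copy holds its own value, so changing it leaves x alone.
int x = 5;
int y = x;
y = 10;
Console.WriteLine(x);        // 5
Console.WriteLine(y);        // 10
```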

This technical difference between the way value variables and reference variables hold data is one that you usually don't have to think about much when you're writing code. But I introduced it here because it helps to explain why C# works the way it does in certain situations, and I'll point them out as we go along.
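If you're ever unsure which kind of type you're dealing with, .NET can tell you: the Type object for any type (obtained with typeof) has an IsValueType property. Here's a quick check of the types from this section; the variable names are arbitrary.

```csharp
using System;

// Type.IsValueType is true for value types, false for reference types.
bool intIsValue = typeof(int).IsValueType;
bool doubleIsValue = typeof(double).IsValueType;
bool charIsValue = typeof(char).IsValueType;
bool boolIsValue = typeof(bool).IsValueType;
bool stringIsValue = typeof(string).IsValueType;

Console.WriteLine($"int: {intIsValue}, double: {doubleIsValue}, char: {charIsValue}, bool: {boolIsValue}");
Console.WriteLine($"string: {stringIsValue}");   // False: string is a reference type
```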

1.5. A Word on Constants

A constant is a name created by a programmer that denotes a fixed value. Programmers create constants to increase the readability of their code, and to make it easier to change key values in a program.

Constants have a data type and a value, and are created much like variables. Here are some examples of constant definition statements:

const int MAX_ITEMS = 3;
const char TEMP_KELVINS = 'K', TEMP_CELSIUS = 'C';

Note the use of the const keyword at the beginning of each constant definition statement. This tells the compiler that the constant's value is permanent and cannot be altered (in contrast to a variable, whose value can be changed with an assignment statement). Also, notice that the constant names are capitalized. Capitalizing constant names is not required by the compiler; it is a widely used convention, although Microsoft's C# naming guidelines recommend PascalCase (for example, MaxItems) for constants.

After a constant has been defined, it can be used in an expression like a regular variable:

int itemsUsed;
// ... set itemsUsed to some value ...
int itemsLeft = MAX_ITEMS - itemsUsed;

char tempType;
// ... get tempType from user ...
if (tempType == TEMP_KELVINS) { /* ... */ }

A constant's value cannot be changed after it has been defined. A constant name may not appear on the left-hand side of an assignment statement.
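Putting the pieces together, here's a sketch of a complete program that uses the constants defined above. The values assigned to itemsUsed and tempType are hard-coded stand-ins (assumptions for the example) for the processing and user input that the fragments leave out.

```csharp
using System;

const int MAX_ITEMS = 3;
const char TEMP_KELVINS = 'K', TEMP_CELSIUS = 'C';

int itemsUsed = 1;     // hypothetical value standing in for real processing
int itemsLeft = MAX_ITEMS - itemsUsed;
Console.WriteLine($"Items left: {itemsLeft}");   // Items left: 2

char tempType = 'K';   // hypothetical stand-in for input read from the user
if (tempType == TEMP_KELVINS) {
    Console.WriteLine("Temperature is in kelvins.");
} else if (tempType == TEMP_CELSIUS) {
    Console.WriteLine("Temperature is in degrees Celsius.");
}

// MAX_ITEMS = 4;      // would not compile: a constant cannot be assigned to
```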