3. Processing Data

Let's look at another example.

Example 2.2. Area.cs

// Area.cs
// Calculates the area of a rectangle

using System;

class Area { 
  static void Main() { 

    int width, height, area;
    string units;
    
    width = 20;
    height = 30;
    units = "cm";
    
    area = width * height;
        
    Console.Write("A rectangle ");
    Console.Write(width);
    Console.Write(units);
    Console.Write(" by ");
    Console.Write(height);
    Console.Write(units);
    Console.Write(" has an area of ");
    Console.Write(area);
    Console.WriteLine(units);    

  } // Main
} // Area

When you run the program, you see the following output:

A rectangle 20cm by 30cm has an area of 600cm

This program does some useful work and uses several language features. Let's tackle them one by one.

3.1. Using Comments, Punctuation, and Spacing

Programmers often leave notes to themselves in source code. These notes, called comments, are ignored by the compiler, because they begin with a special marker -- two slashes //. Often comments appear on a line by themselves, but since the compiler ignores everything from the // marker to the end of the line, sometimes they occur at the end of a C# statement.

You may have wondered about the use of punctuation in the program. I've already explained the curly brackets that mark the beginning and end of the Main method and the class. If you look at the program as the compiler does, removing the comments, you'll notice that every statement in the program ends with a semicolon. In C#, the semicolon functions like the period does in English: it terminates each statement.

C# is a free-form language. That means that the compiler is very permissive about how you use white space in your program. You can insert blank lines wherever you want; you can split a single statement across multiple lines; you can even put multiple statements together on the same line. In general, you can break a line before or after a piece of punctuation. For example, the statement "Console.Write("A rectangle ");" could be written

  Console
.
        Write
   (
"A rectangle "
)
    ;

This example isn't very readable, but it demonstrates the flexibility available to you. Text inside double quotes ("A rectangle ") must remain together on one line, but aside from that, the compiler allows you to lay out your program however you like. Your instructor will have certain style guidelines for you to follow in this matter.

3.2. Creating Variables

Programs that process information use variables to hold the data. Recall that a variable is a memory location that holds a value and has a name.

Before you can store information in a variable, you must first create one by writing a variable definition statement that looks like this:

data-type variable-name-list;

where variable-name-list is a comma-separated list of names for the variables you want to create, and data-type is the name of a valid C# data type. I will explain data types in just a minute. For now, look at the program above to find the variable definition statements near the beginning:

int width, height, area;
string units;

You already know that variables have names. A programmer usually tries to choose meaningful names, so that other programmers reviewing the code will have an idea of what sort of value is stored in a particular variable. Technically, a variable name can contain characters other than letters: digits and the underscore are legal, as long as the name does not begin with a digit, so fred26 and ax_13r are legal names. Spaces are not permitted, and neither are symbols such as $. Still, I strongly suggest you stick with letters, which will keep you out of trouble for the most part. There are a few names you can't use (things like int and string come to mind, because they already have assigned meanings), and the compiler will let you know if you try to define a variable with a name that's not permissible.
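
Here is a small sketch of these naming rules in action. The class name NamesDemo and the variable names are ones I made up for illustration; the lines that the compiler would reject are commented out so that the program still compiles.

```csharp
// NamesDemo.cs
// Illustrates legal and illegal variable names (all names are hypothetical)

using System;

class NamesDemo {
  static void Main() {

    int fred26;           // legal: letters followed by digits
    int row_count;        // legal: the underscore is allowed
    int countOfBlocks;    // legal, and it describes what it holds

    // int 26fred;        // not legal: a name cannot begin with a digit
    // int row count;     // not legal: spaces are not permitted
    // int int;           // not legal: int already has an assigned meaning

    fred26 = 1;
    row_count = 2;
    countOfBlocks = 3;

    Console.WriteLine(fred26 + row_count + countOfBlocks);   // displays 6

  } // Main
} // NamesDemo
```

If you remove the slashes from any of the commented-out definitions, the compiler should refuse to compile the program and point you to the offending line.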

Now, about data types. Based on my description of a variable definition statement, you can see from the example above that int and string are two examples of something called a data type, but more explanation is needed. Let's begin with a little activity. Consider the following list of data values:

26, crumb, to be, -15.2, 0, 25.3, -3, -2.6, Alice hit a ball

Imagine that each value is written on a separate index card, and you have them spread out on the table in front of you. Take a moment and organize the cards into three groups, putting similar kinds of values together in the same group. Now, I know you don't actually have index cards to work with, but why don't you write down on a piece of paper the list of items you would group together. When you're finished, come back and continue reading. Go ahead and try it! Remember to organize the values into exactly three groups.

Done? Now, compare your groups to mine:

  • integers: 26, -3, 0

  • decimal numbers: 25.3, -15.2, -2.6

  • text values: crumb, to be, Alice hit a ball

You might have chosen different categories. I chose these groups because these are the categories of data found in C#. C# considers integers, decimal numbers, and text values to be three different types of data ("data types"). It probably makes sense that C# draws a distinction between text values and numbers, because you process numbers differently from text. For example, you can multiply two numbers together, but it doesn't make sense to multiply two pieces of text together. However, you might wonder why C# treats integers and decimal numbers differently. The reason is that computers store integers in memory differently from decimal values, and use different techniques to perform arithmetic on them. In particular, computers can often perform calculations on integers faster than on decimal values. To a computer, a decimal number is a very different kind of data from an integer.

In C#, when you define a variable, you have to tell the compiler what category of data values (integer, decimal number, text value) you will be storing in the variable, so that it knows how much memory the variable will require, and how to store the information in the variable. These categories of data values are called data types. A data type, then, is simply a category (or group, or set) of data values.

In C#, each data type has a name. Here are the C# data type names that correspond to the categories I have been discussing:

  • int is the category of integers

  • double is the category of decimal numbers (called "floating point" numbers by computer scientists)

  • string is the category of text values

    (Note that string values can, and often do, contain digits, punctuation, and other symbols in addition to text. But most people think of strings primarily as text values, and I will continue to refer to them that way.)

In C#, a variable's value can change during the course of a program, but its data type is fixed and cannot change. When you create a variable, you tell the compiler which category of values you plan to store in that variable, and then the compiler will only allow you to place values of that type in the variable.
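
To make the idea concrete, here is a minimal sketch that creates one variable of each of the three data types. The class name TypesDemo and the variable names count, price, and label are my own inventions for this illustration, and the values come from the index-card list above.

```csharp
// TypesDemo.cs
// One variable of each data type (names are hypothetical)

using System;

class TypesDemo {
  static void Main() {

    int count;         // can hold only integers
    double price;      // can hold decimal ("floating point") numbers
    string label;      // can hold only text values

    count = 26;
    price = 25.3;
    label = "crumb";

    count = -3;             // fine: the value changes, the type is still int
    // count = "to be";     // rejected by the compiler: a string is not an int

    Console.WriteLine(count);
    Console.WriteLine(price);
    Console.WriteLine(label);

  } // Main
} // TypesDemo
```

Notice that count changes its value partway through the program, but it remains an int variable from the moment it is created.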

Now that I have described what a data type is and how you use it, take another look at the variable definition statements from the example above:

int width, height, area;
string units;

The variables width, height, and area can hold only integers, because we used the int data type when we created them; on the other hand units can hold only text values.

C# has several more data types for values such as logical values and individual symbols. As it happens, dealing with all these data types is a rather complicated subject -- one that I will defer to the next chapter. For the rest of this chapter, you only need to remember the two data types int and string.

3.3. Using Variables

Now that you know how to create a variable, let's talk about how you use one. You can do two basic things with a variable:

  • store a value in it

  • use its value

To store a value in a variable, you write an assignment statement like this:

variable-name = expression;

where variable-name is the name of a variable you have previously defined with a variable definition statement, and expression is a formula that produces a value. When the computer executes an assignment statement, it determines the value of the expression (programmers say it "evaluates the expression"), and stores the value in the variable. Since a variable can hold only a single value, the new value replaces whatever value the variable held previously.

Here's an example of an assignment statement from the program above:

area = width * height;

The expression in this statement, width * height, is a formula that produces a value. The * symbol is the multiplication operator in C#. When the computer executes this statement, it multiplies the values in the variables width and height together, and stores the resulting value in the variable area.

C# has several arithmetic operators, but for now, all you need to learn are the four basic ones: + (addition), - (subtraction), * (multiplication), and / (division). You should know that C# follows standard mathematical rules of precedence when evaluating expressions, and you can override the precedence using parentheses when you need to. You'll see examples of this later.
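
As a quick sketch of precedence, the following program evaluates a few expressions with and without parentheses. The class name PrecedenceDemo and the variable result are made up for this example.

```csharp
// PrecedenceDemo.cs
// Shows how precedence and parentheses affect an expression
// (class and variable names are hypothetical)

using System;

class PrecedenceDemo {
  static void Main() {

    int result;

    result = 2 + 3 * 4;          // multiplication first: 2 + 12
    Console.WriteLine(result);   // displays 14

    result = (2 + 3) * 4;        // parentheses force the addition first: 5 * 4
    Console.WriteLine(result);   // displays 20

    result = 20 - 8 / 4;         // division first: 20 - 2
    Console.WriteLine(result);   // displays 18

  } // Main
} // PrecedenceDemo
```

When you are unsure which operation happens first, adding parentheses costs nothing and makes your intent obvious to the next reader.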

Take a look at some more of the assignment statements from the program above:

height = 30;
units = "cm";

The expressions in these two examples may not look like formulas, but they are, technically. The expressions 30 and "cm" consist of what programmers call literals. A literal is a data value that appears directly in a C# statement. In C#, text values must be written enclosed in double quotes, but numeric values are written without quotes. Thus, "cm" is recognized by the compiler as a string literal, because of the double quotes, but 30 is recognized as an int, because it meets the rules for int literals.

Something you should know about assignment statements is that C# requires that the data type of the value produced by the expression must be compatible with the data type of the variable. This is an important rule, and I'll try to explain using some examples. In the statement "height = 30", the data type of the expression 30 is int. The data type of height is also int, according to the variable definition statement where height was created. Thus, the statement is legal. It would not be legal to write "units = 30", because units is a string variable, and it would not be valid to attempt to store an int value in a string variable.

Note

The values 30 and "30" are treated differently by the compiler. The value "30" is recognized as a string, even though we do not think of it as text, because it is surrounded by quotes. This is a subtle point, but an important one. For example, the compiler would not accept the statement

height = "30";

as valid. It would report an error, telling you in effect that "you can't put a string value in an int variable."
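
Here is a short sketch of the rule, using the height and units variables from the Area program. The class name LiteralDemo is my own, and the two illegal statements are commented out so that the program compiles.

```csharp
// LiteralDemo.cs
// int literals versus string literals (class name is hypothetical)

using System;

class LiteralDemo {
  static void Main() {

    int height;
    string units;

    height = 30;         // int literal into an int variable: legal
    units = "30";        // string literal into a string variable: legal

    // height = "30";    // error: a string cannot be stored in an int variable
    // units = 30;       // error: an int cannot be stored in a string variable

    Console.WriteLine(height);   // displays 30
    Console.WriteLine(units);    // displays 30

  } // Main
} // LiteralDemo
```

Both output statements display 30 on the screen, yet to the compiler the two values are entirely different kinds of data.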

The data type of the expression "width * height" is int, because the * operator produces an int value when given int operands. Thus, it is legal to use that expression in the assignment statement

area = width * height;

since the data type of the expression (int) is the same as the data type of area, the variable being changed.

If width and height were double variables, the expression "width * height" would produce a double, because the * operator produces a double when given double operands. The complete rules for determining the data type of a mathematical expression are too involved to cover here, but I will summarize the basic rule for expressions involving int and double values:

  • Rule: If a mathematical expression contains only int operands, the result is an int. Otherwise, when at least one double value is involved, the result is a double.

    For example, the result of 2 * 3 is 6, but 2.0 * 3.0 yields 6.0, as does 2 * 3.0.

  • Corollary: Division involving int operands yields an int result (ex. 7 / 3 yields 2). Further, the result is always truncated (the decimal portion is dropped, not rounded). Thus, 11 / 3 yields 3, not 4, as you might (reasonably) expect.

    Tip: If you want a division involving int variables to yield a double result, multiply the first value by 1.0, like this:

    1.0 * intvalue1 / intvalue2

    Because 1.0 is a double, the multiplication produces a double, and so does the division that follows.

    I will discuss rounding and other math techniques later.
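
The following sketch tries out the rule, the corollary, and the tip. The class name DivisionDemo and the variables total, count, and average are hypothetical names chosen for this illustration.

```csharp
// DivisionDemo.cs
// int division versus double division (names are hypothetical)

using System;

class DivisionDemo {
  static void Main() {

    int total, count;
    double average;

    total = 11;
    count = 3;

    Console.WriteLine(7 / 3);            // int / int: displays 2
    Console.WriteLine(total / count);    // int / int: displays 3, not 4

    // Multiplying by 1.0 first makes the whole expression a double
    average = 1.0 * total / count;
    Console.WriteLine(average);          // roughly 3.67; the exact digits
                                         // shown depend on your system

  } // Main
} // DivisionDemo
```

The first two output statements drop the decimal portion entirely, while the third keeps it because a double value is involved.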

The assignment statement is a flexible and powerful command. It can be used to copy a value from one variable to another, like this:

width = height;

Can you figure out which variable is being changed in this example?

Remember that the right-hand side of an assignment statement always produces the value that is stored in the variable on the left-hand side. This statement takes the value of height and stores it in width.

The assignment statement can also be used to increment a variable, like this:

width = width + 1;

The computer takes the current value of width, adds 1, and stores the resulting value back into width. This example highlights the difference between C#'s use of the = sign and its use in mathematical equations. In C#, = does not express that one quantity is equal to another quantity; instead, it means "take the value on the right side and store it in the memory associated with the variable on the left side." It may take some getting used to.
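
To see copying and incrementing side by side, consider this small sketch. It reuses the width and height variables from the Area program; the class name AssignDemo is made up.

```csharp
// AssignDemo.cs
// Copying and incrementing with assignment statements
// (class name is hypothetical)

using System;

class AssignDemo {
  static void Main() {

    int width, height;

    width = 20;
    height = 30;

    width = height;              // copy: width changes, height does not
    Console.WriteLine(width);    // displays 30
    Console.WriteLine(height);   // displays 30

    width = width + 1;           // increment: read 30, add 1, store 31
    Console.WriteLine(width);    // displays 31

  } // Main
} // AssignDemo
```

Notice that the old value of width (20) is gone after the copy; a variable holds only one value at a time.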

Before I leave this section, I want to mention that the variable definition statement allows you to assign an initial value to a variable, like this:

data-type variable = expression;

This way, you can define and initialize a variable in one statement.
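
For instance, the variables in the Area program could be defined and initialized like this. This is only a sketch of the alternative form, and the class name InitDemo is hypothetical.

```csharp
// InitDemo.cs
// Defining and initializing variables in one statement
// (class name is hypothetical)

using System;

class InitDemo {
  static void Main() {

    int width = 20;
    int height = 30;
    int area = width * height;   // the initial value can be any expression
    string units = "cm";

    Console.Write(area);
    Console.WriteLine(units);    // the two statements display 600cm

  } // Main
} // InitDemo
```

Whether to initialize a variable where you define it or in a separate assignment statement is largely a matter of style.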

3.4. Producing Output

Take a moment to recall our definition of an expression--a formula that produces a value. Expressions can range in complexity from simple literals to complicated mathematical formulas involving variables. They can be used in many ways in a C# program. One of the places you can use expressions is in output statements like this one:

Console.Write("A rectangle ");

This statement prints a message on the screen: "A rectangle ". Now read the following sentence very carefully. The message printed is the value of the expression that appears inside the parentheses. To put it another way, when you write an output statement, you must put an expression between the parentheses; when the computer executes the statement, it evaluates the expression and displays the resulting value on the screen. In this example, the expression is a string literal.

Now that you know this, answer this question: Does the following statement display the word "width" on the screen?

Console.Write(width);

If you're not sure of the answer, compare the statement with this one:

Console.Write("width");

Both statements are valid, but the result is quite different. Can you tell what the difference is?

In the first statement, the expression width is a variable, so the value that is displayed is the value of the variable. In the second statement, the expression "width" is a string literal, so the text width is displayed.

Now, according to what you know, do you think this next example is valid?

Console.Write(width * height);

Answer: Yes! Since width * height is a valid expression, you can use it in an output statement.
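
If you would like to check these answers yourself, here is a minimal sketch that runs all three output statements. The class name OutputDemo is my own, and I use WriteLine (which appears at the end of the Area program) so that each value lands on its own line.

```csharp
// OutputDemo.cs
// A variable, a string literal, and a formula as output expressions
// (class name is hypothetical)

using System;

class OutputDemo {
  static void Main() {

    int width, height;

    width = 20;
    height = 30;

    Console.WriteLine(width);            // variable: displays 20
    Console.WriteLine("width");          // string literal: displays width
    Console.WriteLine(width * height);   // formula: displays 600

  } // Main
} // OutputDemo
```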

How about this next one?

Console.Write(A rectangle);

Answer: No. What appears between the parentheses is not a valid expression, for two reasons. First, since no quotes are used, C# would attempt to interpret A and rectangle as variable names, but neither A nor rectangle has been defined as a variable. Second, even if they were defined as variables, you can't write two variables side by side with no operator between them. Sometimes beginning programmers write a statement like this one when they want the computer to display the text "A rectangle" on the screen, forgetting that text expressions must be enclosed in double quotes.

Note

If the compiler won't accept an output statement in your program as valid, ask yourself if what you put in parentheses is a valid expression -- a valid formula you could have written on the right-hand side of an assignment statement. That kind of reasoning should help you to spot the problem and fix it.

Did you notice that two different methods are used to produce output in this program? Take a look and see if you can determine what the two are. Go ahead -- I'll wait.

The methods are Write() and WriteLine(), and they are similar: they both output the value of an expression. The difference is that the WriteLine() method causes the cursor to advance to the beginning of the next line after the value is output; the Write() method leaves the cursor on the same line. To understand this, imagine the output of the program is being typed by a robot on a typewriter. The WriteLine() method causes the robot to press the Enter key after typing the output value; the Write() method does not cause the robot to press the Enter key. Let's have a short example:

Console.Write("I");
Console.Write("like");
Console.Write("C#");

... would result in this output:

IlikeC#

because the robot didn't press Enter after typing each output value. If we had used the WriteLine() method instead, the robot would have typed this output:

I
like
C#

An output statement is a special case of something called a method call statement. Remember that a method is a named group of statements defined in a class. When programmers want to use a method, they write a method call statement. A method call statement causes the computer to execute the statements defined in the method; programmers refer to this as "calling" the method. In this case, we are calling a predefined method named Write() that is defined in a class named Console.

Method call statements have the form

object-or-class-name . method-name ( expression-list );

where object-or-class-name is the name of a class or object that contains the method, method-name is the name of a method inside the class, and expression-list is a comma-separated list of expressions that provide values for the method to use. In the form used in this chapter, the Write and WriteLine methods each take a single expression. Later, we'll use methods that require more than one expression.

By the way, even though we give Write and WriteLine only one expression, it is possible to output several values with a single output statement. Instead of writing all the individual statements in the program above, we could have written:

Console.WriteLine("A rectangle " + width + units + " by " + height + units + " has an area of " + area + units);

This single command displays exactly the same output as the multiple commands used in the program. Although the + operator is normally used to perform addition, when either of its operands is a string, the + operator performs concatenation instead: it converts the other value to a string and joins the two together. Because the expression above begins with a string, each + in turn produces a longer string, and the final result is a single string value.
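
One detail worth trying for yourself is that C# works through a chain of + operators from left to right, so where the first string appears in the expression matters. The sketch below illustrates this; the class name ConcatDemo and the message text are hypothetical.

```csharp
// ConcatDemo.cs
// Concatenation versus addition with the + operator
// (class name and messages are hypothetical)

using System;

class ConcatDemo {
  static void Main() {

    int width, height;
    string units;

    width = 20;
    height = 30;
    units = "cm";

    // A string comes first, so every + concatenates
    Console.WriteLine("Width: " + width + units);    // displays Width: 20cm
    Console.WriteLine("Sum: " + width + height);     // displays Sum: 2030

    // Parentheses make the addition happen before the concatenation
    Console.WriteLine("Sum: " + (width + height));   // displays Sum: 50

    // Two ints come first, so the first + adds and the second concatenates
    Console.WriteLine(width + height + units);       // displays 50cm

  } // Main
} // ConcatDemo
```

If an output statement ever displays two numbers run together when you expected their sum, a missing pair of parentheses is a likely culprit.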

3.5. The using statement

There's just one last detail about the program I need to mention. Compare the output statement in the HelloWorld example in Section 2, “A First Program” with the output statements in the Area program. The HelloWorld example used System.Console.WriteLine, and the Area example used Console.Write (except for the last statement, which used Console.WriteLine).

Let me explain first about the System prefix used in HelloWorld. In C#, classes are organized into namespaces. Or, to put it another way, a namespace is a named collection of classes.

In just the same way as statements are grouped together in a method, and methods are grouped together in a class, classes are grouped together in a namespace. It's just another way to help programmers organize large programs (I told you that C# was designed to help manage big projects!). The Console class is in a namespace named System.

When you use a class, you must tell the compiler which namespace it is located in. One way to do this is to write its fully qualified name (namespace-name - dot - class-name, as in "System.Console"). If you use the class in several statements, that can get tedious to type, and tends to clutter the program with long names. Imagine, for example, each output statement in the program above starting "System.Console.Write(...".

The C# language provides the using statement to address this issue (the official C# documentation calls this particular form a using directive). Here's an example of the using statement, as it appears in the Area program:

using System;

This statement allows you to use any classes defined in the System namespace in your program without having to write the namespace name each time you use the class. Instead of having to write "System.Console.Write" each time we want to use the Write() method, we only have to write "Console.Write".

Unlike most other C# statements, the using statement must be written at the beginning of your program, before you start your class definition. Take another look at the Area program to see what I mean. We'll use the using statement for most of the programs in the rest of this book, because it reduces the visual clutter of the source code, thus enhancing the readability of the code.
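
Here is a minimal sketch showing both ways of naming the Console class in one program. The class name UsingDemo and the messages are made up for this illustration.

```csharp
// UsingDemo.cs
// Fully qualified names versus short names
// (class name and messages are hypothetical)

using System;

class UsingDemo {
  static void Main() {

    // The fully qualified name works with or without "using System;"
    System.Console.WriteLine("fully qualified name");

    // The short name works only because of "using System;" above
    Console.WriteLine("short name");

  } // Main
} // UsingDemo
```

If you delete the using System; line, the compiler should accept the first output statement but reject the second, because it no longer knows where to find Console.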

3.6. Capitalization and Style

C# is picky about a lot of things, including capitalization. When you type the name of a namespace, class, method, or variable, you must use the same capitalization that was used when the name was defined. For example, if you try to use the method "Console.Write" but type "console.Write" or "Console.write", the compiler will report an error because you used the wrong capitalization. Also, after you define a variable with a particular capitalization, you must use the same capitalization every time you use the variable in your program. For example, if you define a variable with the name "count", and later in the program you try to use it by typing "Count = 0", the compiler will complain that "Count" is undefined. To the compiler, "count" and "Count" are two different names. In programmer lingo, we call C# a "case-sensitive" language.

To help reduce the likelihood of capitalization-related errors, the creators of C# have defined standard "style guidelines." The style guidelines state that you should use only lower-case letters for variable names, unless the name is a multi-word name, in which case you should capitalize the first letter of each word after the first, like this: countOfRedBlocks. The style guidelines also state that class and method names should always begin with an initial capital letter (ex. Console and Write). If you are consistent about following this guideline when you create your own variables, methods, and classes, you won't have as much difficulty remembering the correct capitalization to use. Also, other programmers reading your code will be able to understand it more easily if you follow the guidelines.
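
The sketch below shows case sensitivity and the naming guidelines together. The class name CaseDemo is hypothetical, and the statements with the wrong capitalization are commented out so that the program compiles.

```csharp
// CaseDemo.cs
// Capitalization matters in C# (class name is hypothetical)

using System;

class CaseDemo {               // class names begin with a capital letter
  static void Main() {         // so do method names

    int count;                 // single-word variable: all lower case
    int countOfRedBlocks;      // multi-word variable: capitalize later words

    count = 0;
    countOfRedBlocks = 12;

    // Count = 0;                     // error: Count has not been defined
    // console.WriteLine(count);      // error: the class is Console
    // Console.writeline(count);      // error: the method is WriteLine

    Console.WriteLine(count);              // displays 0
    Console.WriteLine(countOfRedBlocks);   // displays 12

  } // Main
} // CaseDemo
```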

3.7. Recap

Assignment statements, data types, expressions, defining and using variables, method call statements -- I've introduced a lot in this section. These are all key ideas, and I suggest you review them now before forging ahead into the next section.