2. Simple Objects

Lists give us one way to create a collection of data. They work well for situations where you need to manage a collection of objects. But sometimes we need a way to group together a collection of variables and treat them as a unit. That calls for a different approach.

Consider a program that does calculations involving points on a plane. A point is a logical unit of information that consists of an x coordinate and a y coordinate. To keep track of the information for a single point, you would probably create two variables, like this:

int x, y;

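This seems perfectly natural, and often works quite well. However, having to create multiple variables for a single logical unit of data has its drawbacks. A point stored this way cannot be passed to a method as a single parameter, and a method cannot return it as a single return value.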

What we need is a way to create a variable that contains other variables. If we could do that, we could treat the main variable as a logical unit of data -- pass it as a parameter, return it as a return value -- while still being able to access its component variables when necessary. In other words, it would be nice to be able to do something like this:

// NOTE: This is a "dream" of what we would like to do, NOT WORKING CODE

Point p1, p2;    // define two variables that each contain an x and a y coordinate
p1.x = 15;       // set p1's x component
p1.y = 100;      // set p1's y component

p2.x = -15;      // set p2's x component
p2.y = 350;      // set p2's y component

double x = computeDistance(p1, p2); // compute the distance between p1 and p2

C# gives us a way to do this using classes.

Back in Section 6, “Using Instance Methods”, we talked about the fact that the class mechanism in C# has two distinct uses: grouping related variables together, and defining objects. Let's start with the first.

Classes can be used to group related variables together. For example, we could define a Point class and define some static variables in it, like this:

class Point {
  public static int x;
  public static int y;
}

Elsewhere in the program, we could use these variables by prefixing them with the class name, like this:

Point.x = 5;
Point.y = 10;

But this does not achieve our goal. We can't pass Point as a parameter to a method, because it isn't a variable itself. Worse, we can't create more than one Point. All the Point class does here is group together some variables. We need more than that; we need to be able to create variables that each contain an x and a y variable inside them.
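To see the "only one Point" problem in action, here is a small sketch using the static version of Point. The class name StaticDemo is just a name chosen for this illustration. Because x and y belong to the class itself, storing a second point overwrites the first.

```csharp
using System;

class Point {
  public static int x;
  public static int y;
}

class StaticDemo {
  static void Main() {
    // try to store a "first" point
    Point.x = 15;
    Point.y = 100;

    // try to store a "second" point; this overwrites the first one
    Point.x = -15;
    Point.y = 350;

    // there is only one x and one y, so the first point is gone
    Console.WriteLine(Point.x + ", " + Point.y);  // displays -15, 350
  }
}
```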

The solution is to use classes in their second role: to define objects. Think of an object as a variable that contains data and methods that operate on that data. If you simplify by leaving out methods, you get simple objects -- objects that contain only data.

2.1. Creating Simple Objects

A simple object contains only variables. You create a simple object by

  1. Defining a class containing only non-static variables

  2. Instantiating the class

An example should help to explain what I mean. Let's define the Point class this way:

class Point {
  public int x;
  public int y;
}

Now, we can no longer access Point.x and Point.y, because x and y are not static. But we can create Point objects, like this:

Point pt1 = new Point();
Point pt2 = new Point();

This code creates two Point objects named pt1 and pt2. Each object contains two member variables: x and y. We call them "instance variables" because they are part of an object (remember that one definition of an object is "an instance of a class"). Because x and y are instance variables, each object has its own copy of x and y, so each object can represent a different Point on a plane. We can access x and y this way:

pt1.x = 50; 
pt1.y = 100;

pt2.x = 10;
pt2.y = 30;

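It might help to visualize this. Recall that object variables like pt1 and pt2 each contain a reference to an object. Here, pt1 refers to one Point object holding x = 50 and y = 100, and pt2 refers to a separate Point object holding x = 10 and y = 30.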

Removing the word "static" from the member variable definitions in Point does a lot for us. Now, we can create as many Point objects as we want. Each one of them contains its own set of x and y variables with their own values. We can pass Point objects as parameters to a method, and return a Point object from a method as its return value. In other words, we can now create variables that contain other variables inside them -- just what we wanted to be able to do.
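Here is a short sketch that shows both ideas at once: each object has its own x and y, and an object variable holds a reference rather than the object itself. The names InstanceDemo and samePoint are made up for this illustration.

```csharp
using System;

class Point {
  public int x;
  public int y;
}

class InstanceDemo {
  static void Main() {
    Point pt1 = new Point();
    Point pt2 = new Point();

    pt1.x = 50;
    pt1.y = 100;
    pt2.x = 10;
    pt2.y = 30;

    // changing pt2 leaves pt1 alone: each object has its own x and y
    pt2.x = 99;
    Console.WriteLine(pt1.x);  // displays 50
    Console.WriteLine(pt2.x);  // displays 99

    // copying an object variable copies the reference, not the object,
    // so samePoint and pt1 now refer to the same Point
    Point samePoint = pt1;
    samePoint.y = 7;
    Console.WriteLine(pt1.y);  // displays 7
  }
}
```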

By the way, you may have noticed the use of the word "public" in the instance variable definitions in the Point class. We'll discuss the public keyword more in the next chapter. For now, I'll just say that marking the instance variables public allows code in the Main() program to access them.

Here's a complete program that demonstrates some of the possibilities.

Example 9.2. PointDemo.cs

using System;

class Point {
  public int x;
  public int y;
}

class PointDemo {
  static void Main() {
    Point here = new Point();
    here.x = 5;
    here.y = 10;

    Point there = new Point();
    there.x = 3;
    there.y = 98;

    ShowPoint(here);   // displays Point: (5, 10)
    ShowPoint(there);  // displays Point: (3, 98)

    SetPoint(here, 3, 5);
    ShowPoint(here);   // displays Point: (3, 5)

    Point where = MakePoint(15, 25);
    ShowPoint(where);  // displays Point: (15, 25)

  }

  // displays the coordinates in <place>
  static void ShowPoint(Point place) {
    Console.WriteLine("Point: (" + place.x + ", " + place.y + ")");
  }

  // changes <place>'s coordinates to (<newx>, <newy>)
  static void SetPoint(Point place, int newx, int newy) {
    place.x = newx;
    place.y = newy;
  }
  
  // returns a new Point with x=<newx>, y=<newy>
  static Point MakePoint(int newx, int newy) {
    Point place = new Point();
    place.x = newx;
    place.y = newy;
    return place;
  }

}

In the sections that follow, we'll dissect this program so you understand how it all works.

2.2. Objects as Method Parameters

In Example 9.2, “PointDemo.cs”, look at the definitions of ShowPoint() and SetPoint(). Both receive a Point object as a parameter. This allows the main program to send a complete Point object as a logical unit of data, without having to pass its component parts individually. For example, in the following method call in Main():

ShowPoint(here);

the here object is handed to the ShowPoint method for processing as a single unit. Inside the ShowPoint method body, the code is able to access the components of the object using the formal parameter name, place:

Console.WriteLine("Point: (" + place.x + ", " + place.y + ")");

This causes the data inside here to appear on the screen.

When ShowPoint is called again, this time with there as the actual parameter, it is the values of there's copy of x and y that appear on the screen.
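The same idea lets us write the distance calculation we dreamed about at the start of the chapter. What follows is a sketch, not the only way to do it: the method name ComputeDistance comes from the "dream" code, while the class name DistanceDemo and the sample coordinates are assumptions made for this illustration.

```csharp
using System;

class Point {
  public int x;
  public int y;
}

class DistanceDemo {
  static void Main() {
    Point p1 = new Point();
    p1.x = 0;
    p1.y = 0;

    Point p2 = new Point();
    p2.x = 3;
    p2.y = 4;

    // each Point travels to the method as one unit
    Console.WriteLine(ComputeDistance(p1, p2));  // displays 5
  }

  // returns the straight-line distance between <a> and <b>
  public static double ComputeDistance(Point a, Point b) {
    double dx = b.x - a.x;
    double dy = b.y - a.y;
    return Math.Sqrt(dx * dx + dy * dy);
  }
}
```

Notice that the method takes two parameters instead of four. The caller never has to pull the coordinates apart.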

Now, consider the SetPoint method call:

SetPoint(here, 3, 5);

This method call alters the x and y values in here. After the call returns, here.x is 3, and here.y is 5. This requires a bit of explanation, because normally a method cannot alter the values of the actual parameters passed to it. Follow closely, because this gets a little complicated.

The following program demonstrates the rule that "changes to formal parameters do not affect actual parameters":

using System;

class WontChange {
  static void Main() {
    int x = 0;
    inc(x);
    Console.WriteLine(x);
  }
  
  static void inc(int num) {
    num = num + 1;
  }
}

In this program, when Main() calls

inc(x);

the value of x is copied into num. When num's value is changed in inc() by the assignment statement

num = num + 1;

only num is affected; the change does not affect x in Main().

Now, review the definition of SetPoint:

  // changes <place>'s coordinates to (<newx>, <newy>)
  static void SetPoint(Point place, int newx, int newy) {
    place.x = newx;
    place.y = newy;
  }

When SetPoint is called, place receives a copy of the value of the actual parameter, here. Now, do you remember what here contains?

If you said, "a reference," you're right. place receives a copy of the reference in here. During the call to SetPoint, both here and place refer to the same object: there are two variables, but only one Point.

The assignment statements in SetPoint aren't really changing the value of place; rather, they are changing the state of the object that place refers to. Since place refers to the same object that here refers to, changes to the object using place.x and place.y simultaneously affect here's object, since they are the same. The basic rule about changes to formal parameters not affecting actual parameters still holds. But passing objects gives a method a "loophole" to reach back into the caller and change something passed in.
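To put the two cases side by side, here is a sketch of a method that receives both an int and a Point. The names LoopholeDemo and Bump are invented for this illustration.

```csharp
using System;

class Point {
  public int x;
  public int y;
}

class LoopholeDemo {
  static void Main() {
    int n = 0;
    Point here = new Point();
    here.x = 5;
    here.y = 10;

    Bump(n, here);

    Console.WriteLine(n);       // displays 0: only the method's copy changed
    Console.WriteLine(here.x);  // displays 6: the shared object changed
  }

  // adds 1 to its copy of <num> and to the x coordinate of <place>'s object
  public static void Bump(int num, Point place) {
    num = num + 1;          // changes the formal parameter only
    place.x = place.x + 1;  // changes the object the caller also refers to
  }
}
```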

By the way, this situation has implications for methods that receive a List as a parameter. Think about this method for a moment:

static void AddItem(List<string> list, string item) {
  list.Add(item);
}

Now, let's say the main program calls AddItem() like this:

List<string> stuff = new List<string>();
AddItem(stuff, "Frank");
Console.WriteLine("[" + string.Join(", ", stuff) + "]"); // displays [Frank]

Since AddItem() receives a copy of stuff's reference to the List, anything it does to the List persists after the method returns: while AddItem() is executing, list refers to the same object as stuff. Here's how it works:

  1. When AddItem() begins executing, list and stuff both refer to the same empty List.

  2. After AddItem() adds the item, there is still only one List, and it now contains "Frank". Both list and stuff refer to it, so stuff's List was affected.

However, consider this method:

static void ClearList(List<string> list) {
  list = new List<string>();
}

Now, the main program creates a List and calls ClearList():

List<string> stuff = new List<string>();
stuff.Add("Frank");
ClearList(stuff);
Console.WriteLine("[" + string.Join(", ", stuff) + "]"); // displays [Frank]

The call to ClearList() does not affect stuff, because instead of operating on the stuff List, the method creates a new List. Here's what happens when ClearList() is called:

  1. When ClearList() begins executing, list and stuff both refer to the same List, which contains "Frank".

  2. After ClearList() creates a new List, list refers to the new, empty List, while stuff still refers to the original one. Inside ClearList(), list refers to an empty List, but stuff's List remains unaffected.

The difference between AddItem() and ClearList() is that AddItem() operates on the object that list refers to, while ClearList() changes list to refer to a different object. The one affects the object in the caller; the other does not.
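Here is a complete sketch you can run to compare the two methods. I've added a third method, ReallyClearList(), which is a name made up for this illustration; it empties the caller's List by operating on the object (using the List's Clear() method) instead of reassigning the parameter.

```csharp
using System;
using System.Collections.Generic;

class ListParamDemo {
  static void Main() {
    List<string> stuff = new List<string>();

    AddItem(stuff, "Frank");
    Console.WriteLine(stuff.Count);  // displays 1

    ClearList(stuff);
    Console.WriteLine(stuff.Count);  // displays 1: stuff's List is untouched

    ReallyClearList(stuff);
    Console.WriteLine(stuff.Count);  // displays 0
  }

  // operates on the object that <list> refers to
  public static void AddItem(List<string> list, string item) {
    list.Add(item);
  }

  // only makes <list> refer to a different object; the caller never sees it
  public static void ClearList(List<string> list) {
    list = new List<string>();
  }

  // operates on the object that <list> refers to, so the caller sees it
  public static void ReallyClearList(List<string> list) {
    list.Clear();
  }
}
```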

2.3. Objects as Return Values

In Example 9.2, “PointDemo.cs”, look at the definition of MakePoint():

  // returns a new Point with x=<newx>, y=<newy>
  static Point MakePoint(int newx, int newy) {
    Point place = new Point();
    place.x = newx;
    place.y = newy;
    return place;
  }

It creates and returns a Point object whose x and y coordinates are determined by the parameters supplied by the caller. The return statement

return place;

returns a reference to the newly created object. Back in Main(), look at the method call:

Point where = MakePoint(15, 25);

The assignment statement copies the reference returned from MakePoint() into where, so where now refers to the object containing the location (15,25).

This technique can be used any time a method needs to return more than one value. The method can package up the values into an object and return the object. In practice, you should not use this technique to return just any old group of values. An object should always represent a logical unit of information, not a group of unrelated values.
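As an illustration, here is a sketch of a method that needs to hand back two values, an x and a y, and does so by returning a single Point. The method name Midpoint and the class name MidpointDemo are assumptions made for this example. Because the coordinates are ints, the division discards any fractional part.

```csharp
using System;

class Point {
  public int x;
  public int y;
}

class MidpointDemo {
  static void Main() {
    Point a = new Point();
    a.x = 0;
    a.y = 0;

    Point b = new Point();
    b.x = 10;
    b.y = 20;

    Point mid = Midpoint(a, b);
    Console.WriteLine("Midpoint: (" + mid.x + ", " + mid.y + ")");  // displays Midpoint: (5, 10)
  }

  // returns a new Point halfway between <a> and <b>
  // (integer division, so any fractional part is dropped)
  public static Point Midpoint(Point a, Point b) {
    Point mid = new Point();
    mid.x = (a.x + b.x) / 2;
    mid.y = (a.y + b.y) / 2;
    return mid;
  }
}
```

The two results belong together as one location, which is what makes a Point a sensible return value here.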

2.4. More about References

You know that Point variables like here and there contain references to Point objects. But what you haven't seen yet is a Point variable that doesn't reference any object at all. Consider this code:

Point here;
here = new Point();

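What does an object variable like here contain before an object has been assigned to it?

The answer: it does not refer to any object yet. If here is a member variable of a class, it starts out containing a null reference. If here is a local variable inside a method, the C# compiler will not let you use its value until you have assigned one. A null reference is a reference that indicates the absence of an object.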

You can explicitly initialize a Point variable to the null reference, like this:

Point here = null;

When a variable contains the null reference, it is illegal to attempt to access data through it, because there is no object to access. A line like this:

here.x = 5;  // illegal operation when here is null

would trigger a runtime exception known as a NullReferenceException.

If you're not sure whether an object variable has been initialized or not, you can test for the null reference using code like this:

if (here == null) {
  // not safe to access here.x
} else {
  // safe to access here.x
}
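Here is a small sketch that puts the null test to work. The names NullDemo and Describe are made up for this illustration; Describe() checks its parameter before touching x or y, so it is safe to call with a null reference.

```csharp
using System;

class Point {
  public int x;
  public int y;
}

class NullDemo {
  static void Main() {
    Point here = null;
    Console.WriteLine(Describe(here));  // displays (no point)

    here = new Point();
    here.x = 5;
    here.y = 10;
    Console.WriteLine(Describe(here));  // displays (5, 10)
  }

  // returns the coordinates in <place> as text,
  // or "(no point)" if <place> is the null reference
  public static string Describe(Point place) {
    if (place == null) {
      return "(no point)";
    }
    return "(" + place.x + ", " + place.y + ")";
  }
}
```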

You might ask, "When would I ever have a reason to use the null reference?" Well, consider the MakePoint method from the previous section. Suppose you want to design MakePoint so that it refuses to create an object if it is given negative coordinates. You might write it like this:

  // returns a new Point with x=<newx>, y=<newy>
  // returns null if <newx> or <newy> is negative
  static Point MakePoint(int newx, int newy) {
    Point place;
    if (newx < 0 || newy < 0) {
      place = null;
    } else {
      place = new Point();
      place.x = newx;
      place.y = newy;
    }
    return place;
  }

This version of MakePoint() returns null if given negative coordinates; otherwise, it instantiates and returns a new Point object.

When the main program calls MakePoint, it might do so like this:

  Point somewhere = MakePoint(userx, usery);
  if (somewhere != null) {
    // we have a new point
  } else {
    // user supplied invalid coordinates
  }
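Putting the pieces together, a complete program might look like the following sketch. The class name MakePointDemo and the helper Report() are invented for this illustration, and the coordinates are hard-coded where a real program would use values such as userx and usery.

```csharp
using System;

class Point {
  public int x;
  public int y;
}

class MakePointDemo {
  static void Main() {
    Console.WriteLine(Report(MakePoint(15, 25)));  // displays Point: (15, 25)
    Console.WriteLine(Report(MakePoint(-1, 25)));  // displays Invalid coordinates
  }

  // returns a new Point with x=<newx>, y=<newy>
  // returns null if <newx> or <newy> is negative
  public static Point MakePoint(int newx, int newy) {
    Point place;
    if (newx < 0 || newy < 0) {
      place = null;
    } else {
      place = new Point();
      place.x = newx;
      place.y = newy;
    }
    return place;
  }

  // returns a message describing <somewhere>, checking for null first
  public static string Report(Point somewhere) {
    if (somewhere != null) {
      return "Point: (" + somewhere.x + ", " + somewhere.y + ")";
    } else {
      return "Invalid coordinates";
    }
  }
}
```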