3. Pointers - Part 3

Having introduced character arrays, we learn the basics of the low-level strings provided in C and C++, then examine more examples of the pointer arithmetic and comparison touched on last time, then some tricky things to do with pointers.

Strings (repeating a little from the last lecture)

null-terminated strings

Character arrays are also known as strings. In contrast to string literals, character arrays are not necessarily null-terminated (i.e., the last element isn't necessarily '\0'), but character arrays of static storage class usually are: if default initialized, the whole array is zeroed out, and with explicit initialization by a string literal, the terminating '\0' on the string literal is copied into the character array. Don't forget that last '\0' when calculating the size of a string literal:

// outside of any block
char q[] = "dog"; // q has 4 elements
char r[3] = "dog"; // **** error *** too many initializers
char s[] = {'a','b','c','d'}; // clumsier, but doesn't include '\0';

Note that initializing with a brace-enclosed list, the form used for arrays of other fundamental types, does not add a final null character automatically.

String pooling

With character pointers, in contrast to pointers to other fundamental types, comes a new problem:

int *x = &4; // ***illegal***, because operand of & must be lvalue
char *q = "dog"; // legal 
char *r = "dog"; // -- note - same string used
q[0] = 'c'; // implementation dependent result - does r[0] get changed?

The definition of q is legal, because string literals have type char[], which is automatically converted to a char * pointing to the first element of the array. The problem occurs because although a string literal is an object (has memory associated with it), one common compiler optimization is to pool strings. In the above example, it is possible that q and r both point to the same location.

The reason C++ historically did not do the obvious thing (define string literals to be of type const char[], which would mean that q and r would have to be declared as const char*, so that the possibly shared string could not be changed anyway) was compatibility with pre-ANSI C. Standard C++ has since made string literals const char[]; the conversion to char* was deprecated in C++98 and removed in C++11.

String manipulations

Several standard low-level string operations are declared in the header file <string.h>; they work in C++ but actually belong to the ANSI C standard library. They include copying one null-terminated string to another (strcpy, strncpy), getting the length of a string (strlen), and so on. You can get more information on them under UNIX with "man string". The proposed C++ standard would incorporate this and other ANSI C header files with some renaming: <string.h> will become <cstring>, <stdlib.h> will become <cstdlib>, and so on.

They're also easy enough to write; here's strlen(), which returns the length of a null-terminated string, not including the terminal '\0':

#include <stddef.h> // get definition of size_t
size_t strlen(const char *s) 
{
	const char *p = s;
	while (*p)
		++p;
	return p-s;
}

Pointer arithmetic and operators

Pointers can be manipulated in several ways, including pointer arithmetic and comparison.

Pointer arithmetic

An integer i can be added to a pointer q of any type T, resulting in a pointer of the same type pointing sizeof(T)*i bytes beyond q - forward or backward in memory, as i is positive or negative:

double a[20], *q = a; // q points to a[0]
double *r = q+10;     // r points to a[10]
double *s = q++;      // s points to a[0], q now points to a[1]

Strictly speaking, pointer arithmetic is undefined except for pointers pointing to the same array, either to an element within the array, or one element beyond the end. The following code will yield unpredictable results:

double x, y;  // is y located right after x?
double *q = &x; // q points to x
*++q = 4.0;  // try (who knows?) to set y to 4.0

Pointers cannot be added to each other, but pointers of the same type pointing to the same array (within or one element beyond) can be subtracted from each other, yielding an integral type ptrdiff_t, which is defined in <stddef.h> (or <cstddef> for more modern C++ compilers). As with pointer-integer addition, pointer difference is in units of the pointed-to object:

#include <cstddef> // to get typedef for ptrdiff_t
int z[10], *q = z, *r = z+10;
ptrdiff_t diff1 = q-r, diff2 = r-q; // diff1 == -10, diff2 == +10

Pointer comparison

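Pointers can be compared with each other, though as with arithmetic, the results of the ordering comparisons (<, >, <=, >=) are not guaranteed unless the two pointers each point either within or to one element beyond the same array. The == and != operators work on any two pointers of the same type.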

char s[20], *q = s;
int testeq = (q == s); // testeq == 1
char *r = s+10;
int testr = (r > q); // testr == 1, since r points to s[10], q to s[0]

Because an integral constant evaluating to 0 can be automatically converted to a pointer of any type, a pointer can be compared with 0. Typically the comparison is restricted to == and !=, known as testing a pointer against null:

int *i=0, j, *k=&j;
int q = (i != 0); // q==0
int r = (k > i); // r's value is uncertain, but probably 1

My compiler (and I suspect most) apparently assigns a literal value of 0 to the null pointer, so that the null pointer is always less than any non-null pointer; this is strictly implementation dependent, and should not be relied on.

Pointer assignment

Since pointer-integer arithmetic is allowed, so obviously are the += and -= operators:

double a[20], *r=a;
r += 20; // r now points just beyond end of a[]

Assignment (operator=) requires both pointers involved be of the same type, or have explicit cast(s) applied to bring them to the same type (see discussion below on void*, however). A constant integral 0 can be assigned to any pointer:

char *a, *b, *c;
a = b = c = 0; // a,b,c now null pointers

Miscellaneous pointer operators

Conditional operator ?:

The conditional operator ?: can take pointers (possibly including 0 converted to the null pointer) as its second and third operands, the two possible results:

char *r="r string", *s="s string";
cout << (1 != 0 ? r : s) << '\n'; // outputs "r string\n";

sizeof() operator

The sizeof() operator yields the size of the operand in "bytes" - where a byte is defined as the size of a character, which may or may not be the same as 8 bits.

This is one place where an array name is not automatically converted to a pointer of the corresponding type:

// outputs  10 4 4 on my machine, where sizeof(char *)==4
char q1[10], *q2 = q1, *q3 = "dog";
cout << sizeof(q1) << ' '<<sizeof(q2)<<' '<<sizeof(q3)<<'\n';

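Be careful not to expect sizeof(q3) to yield the size of the string it points to. Here "dog" happens to occupy 4 bytes as well, but sizeof(q3) is the size of the pointer, whatever the length of the string.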

Pointers in flow control

A pointer expression, used as the condition of a test (if, while, for), is converted to 0 if it evaluates to the null pointer and to 1 otherwise.

Testing against null is usually worth the bother:

// char *q defined somewhere above...
if (q)
	cout << q <<'\n';
else
	cerr << "q was null!\n";

Tricky pointer points

Watch out for dangling pointers: a pointer value (the address it holds) is valid only during the lifetime of the object it points to:

// function fragment
int *foo()
{
	int x= 4;
	return &x; // ****error**** x's "value" not likely to be 4
               // after foo() returns to the statement calling it
}

Pointer conversions (except for 0 -> null, and const pointer types, which we'll consider in a later lecture) are not automatic, with the major exception of conversions to void*. The latter is intended to hold a pointer to any type, thus the following is allowed:

double x;
int i;
char a;
void *r1= &x, *r2 = &i, *r3 = &a;

On the other hand, in C++ (as opposed to ANSI C), pointers of type void* are not converted to other types without an explicit cast (ANSI C allows this type-safety loophole).

Because void cannot be an operand of sizeof() (void has no size), void* arithmetic is impossible, but comparisons between void* type expressions are allowed.
