[Phy 405/905 Home Page] [ Lecturer ]

3. Pointers - Part 4

We learn about the dangers and uses of pointer conversion in an example involving the ellipsis operator. We peripherally touch on typedef, macro definitions, and an odd use of the comma operator.

Variable arguments

Pointers and pointer casting are simultaneously dangerous and essential when it comes to functions declared with an ellipsis argument. Before embarking on a discussion of this feature (inherited by C++ from ANSI C), let me note that the ellipsis argument is something like the ultimate evil as far as type-checking is concerned. Ditto regarding the example used - a simple implementation of the ANSI C general printing function, printf(), which is declared in the ANSI C header <stdio.h>. The ellipsis argument allows a function to be declared like:
void my_printf(char *format, ...);
where my_printf() can be called with one argument (a char *), or an arbitrary number of arguments of arbitrary type. The ellipsis is typed in literally as "...", i.e., three periods in a row. The following calls are legal:
const double x = 3.4;
const int q = 4;
const char *a = "hey guy!";
my_printf("%f %d %s\n", x, q, a);
If the real printf() were called, the result of the output would be something like
3.400000 4 hey guy!
including a terminating '\n'. The idea is that my_printf() allows for an arbitrary combination of items to be output (how could we possibly write a version of my_printf() for every such combination?).

The first argument to my_printf() is the format string, which lets the code inside my_printf() know what to expect for successive arguments. It interprets the format string as follows:

%f - next argument to be processed is of type double
%d - next argument is of type int, printed in base 10
%s - next argument is of type char* (null-terminated)
any other character(s) - printed as is

Since type-checking goes out the window, you're on your own if you write code like this:

my_printf("%s", q); // integer passed in, where a char* is expected!!
What's one likely result of this call (recall that string arguments are supposed to be null-terminated)?

Note the similarity of printf() to Fortran's formatted write statements. In C++, one generally eschews printf() and its ilk (defined in <stdio.h> for ANSI C compilers) in favor of the very type-safe operator<<:

cout << x << q << a << '\n';
If you're already grumbling about requiring the kind of control over output afforded by the Fortran FORMAT statement or printf()'s format argument, don't worry - the same sorts of things exist for C++ "standard" i/o, and I'll go over all those in a future lecture. For the moment, pretend you must use the ellipsis argument, in which case you'll have to use the method outlined here.
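As a small preview of the stream-based formatting controls, here is a sketch only, using the standard <iomanip> manipulators. The helper name format_row() is made up for illustration; it mimics what a printf() format such as "%8.3f %4d %s" would produce.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// format_row is a hypothetical helper: it mimics "%8.3f %4d %s"
// using stream manipulators instead of a format string.
std::string format_row(double x, int q, const char *a)
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(3) // 3 digits after the point (sticky)
        << std::setw(8) << x << ' '           // field width 8, applies to x only
        << std::setw(4) << q << ' '           // field width 4, applies to q only
        << a;
    return out.str();
}
```

Calling format_row(3.4, 4, "hey guy!") yields the text with x right-justified in 8 columns and q in 4. Unlike with printf(), passing an argument of the wrong type is caught by the compiler.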

Using the macros in <cstdarg> (or <stdarg.h>)

You'll need the following functions (macro functions, actually), located in <stdarg.h> (<cstdarg> in standard C++):
va_start(va_list argl, lastarg)
where va_list is a type defined in <stdarg.h> or <cstdarg>, and is used to iterate over the argument list, and lastarg is the name of the last named formal argument to the function;
va_arg(va_list argl, arg_type)
where arg_type specifies how the next argument to be read should be cast; and
va_end(va_list argl)
which performs any "clean-up" - such as resetting the stack, and so on. Here's an implementation of my_printf():
///////////////////////////////////////////////////////////////////////////
// va_arg.cc
// shows how pointers and pointer casting can be used to implement
// a non-portable version of variable argument handling (the stuff
// defined in <cstdarg>)
//
///////////////////////////////////////////////////////////////////////////
#include <iostream> // for i/o library
#include <stdarg.h> // for variable argument library

using std::cout; // standard headers put cout in namespace std

void my_printf(char *fmt, ...);
void my_printf_and_va(char *fmt, ...); // same as my_printf(),
// except uses my va_arg stuff

int main(int argc, char *argv[])
{
    const double x = 3.4;
    const int q = 4;
    char a[] = "hey guy!";
    char fmt[] = "%f %d %s\n"; // writable array: standard C++ will not
                               // convert a string literal to char*
    cout << "Here's version with standard va_arg:\n";
    my_printf(fmt, x, q, a);
    cout << "Here's version with my_va_arg:\n";
    my_printf_and_va(fmt, x, q, a);
    return 0;
}

void my_printf(char *fmt, ...)
{
    // initialize pass over arguments
    va_list argl;
    va_start(argl, fmt); // beginning after last named argument: fmt

    for (char *p = fmt; *p; ++p) // format string controls parsing of args
    {
        if (*p == '%')
        {
            // each case gets its own braces: a jump to a later case label
            // may not skip over an initialized declaration
            switch (*++p)
            {
            case 'f':
            {
                double fval = va_arg(argl, double); // get arg as double
                cout << fval;
                break;
            }
            case 'd':
            {
                int ival = va_arg(argl, int);
                cout << ival;
                break;
            }
            case 's':
            {
                char *sval = va_arg(argl, char *);
                cout << sval;
                break;
            }
            case '\0':
                --p; // lone '%' at end of format: let the loop see the '\0'
                break;
            default:
                cout << *p;
                break;
            }
        }
        else
            cout << *p;
    }
    va_end(argl);
}
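
The same three macros serve any variadic function, not just a printf() clone. Here is a minimal sketch, assuming the caller passes the number of unnamed arguments in the named one (the function sum_ints() is made up for illustration):

```cpp
#include <cassert>
#include <cstdarg>

// sum_ints is a hypothetical example: adds up `count` int arguments.
// The callee cannot discover how many arguments follow, so the caller
// must say so through the named argument.
int sum_ints(int count, ...)
{
    assert(count >= 0);
    va_list argl;
    va_start(argl, count);          // start just after the last named argument
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(argl, int); // each unnamed argument is read as an int
    va_end(argl);
    return total;
}
```

For example, sum_ints(3, 1, 2, 3) adds the three trailing values. Passing a double where an int is read, or a count larger than the number of arguments actually supplied, is not diagnosed by the compiler and gives undefined behavior.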

Implementing our own (non-portable) version of <cstdarg> to illustrate pointer casting

What does this have to do with pointers, which, aside from the format string, haven't made an appearance? The answer is in the behind-the-scenes magic that makes va_arg() work. A typical system-dependent implementation of va_start() might work as follows.

typedef

The main assumption (which is not always true, thus the code below is not portable) is that the arguments to a function - which are passed by value - are located on the "stack" in the order they were provided in the actual function call. Thus in the above example, the first argument (a char * pointing to the first character of the format string "%f %d %s\n") is followed in memory by a double (with value 3.4), then an int (equal to 4), then another char * (pointing to a[0]). Pulling each of these variables off the stack involves setting a pointer to each of the arguments in turn, and dereferencing the pointer to get at the respective value. To point to objects of different types, the pointer in question will have to be of type void* for the reasons discussed above. So we "define" a new type, using typedef:
typedef void *my_va_list;
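A typedef merely introduces a new name for an existing type; it does not create a distinct type. It is handy for keeping a design decision in one place. For example, given
// defs.hh
typedef double Mass;

// main.cc
Mass x; // exactly the same as declaring x to be type double
you can change all variables of type Mass from being double to, say, float, by changing the single typedef in defs.hh. You could also accomplish this using preprocessor macros, but that would not be a good idea, as we'll see below.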
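
To check that a typedef creates only a new name, not a new type, here is a minimal sketch. It assumes a C++11 compiler (for static_assert and <type_traits>), and the function twice() is made up for illustration.

```cpp
#include <type_traits>

typedef double Mass; // the alias from defs.hh

// twice() is a hypothetical function taking a Mass; since Mass is double,
// it accepts and returns plain doubles without any conversion.
double twice(Mass m) { return 2.0 * m; }

// Compile-time check that the alias introduces no new type.
static_assert(std::is_same<Mass, double>::value,
              "Mass is just another name for double");
```

Because the two names denote one type, you cannot overload a function on Mass versus double. The compiler sees the same signature twice.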

Macros and the preprocessor

Now let's turn to defining our own versions of va_start(), va_arg() and va_end(). The first obstacle to confront is that our version of va_arg() cannot be sensibly declared as a C++ function, because the second argument to the function is a type name (what's the type of a type name?). So we must fall back on using macros, in particular, a macro function.

Macros are implemented by the preprocessor, which is the beast that handles the #include statements that include header files as well as the #ifndef, #define, and #endif that we've already used in header files to prevent them from being #include'd more than once.

The preprocessor is usually automatically invoked whenever you compile a source file - it is simply a text filter, applied to the source code, with the result passed on to the compiler. The preprocessor makes simple text-substitutions, and (aside from recognizing both C and C++ comments) knows nothing about the structure of the language. Anything beginning with a # and continuing up to the end of the line (possibly continued by '\'s) is handled by the preprocessor, and never seen by the compiler. There are several preprocessor commands; here I'll just discuss the #define command, which is followed by a macro name and, optionally, the text to be substituted for it:

#define BIG_DOG     sin(x) * 4.0
#define PI atan(1.0)*4.0
#define Q
// code fragment
cout << BIG_DOG <<'\n'; // BIG_DOG --> "sin(x) * 4.0"
cout << Q '\n'; // Q --> "" (nothing)
cout << "pi == " << PI <<'\n';
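Some comments:
Do not end a #define with a semicolon, since it becomes part of the substituted text:
#define BIG_BUF 1024;
// code fragment below
char q[BIG_BUF]; // parses to q[1024;]; !!!
Comments on a #define line are stripped before the substitution is made:
#define SILLY_SUB "(this is illegal?)" // silly sub
cout << SILLY_SUB << '\n'; // don't worry,
// the stuff following SILLY_SUB is
// not commented out
For constants, a const variable is typed and scoped, unlike a macro:
const double pi = atan(1.0)*4.0; // preferred to macro substitution above
A long definition can be continued over several lines by ending each line with a '\':
// multi-line #define
#define LONG_MACRO \
"it is important to make sure that " \
"the '\\' is followed immediately " \
"by a newline: like this " \
"last line of macro"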

Here we need a slightly more complicated version of the macro - the macro function, which is defined as follows:
#define MAX( arg1, arg2 ) \
( (arg1) > (arg2) ? (arg1) : (arg2) )
// use of MAX() macro
double x = MAX( 3.4, 4.0);
// bad (but legal preprocessor-wise) use of MAX:
double y = MAX("hey dude", this is a nonsense argument);
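Several notes on macro functions:
Parenthesize every use of an argument in the macro body, and the body as a whole; otherwise operator precedence can change the meaning after substitution:
#define bad_one(x) x / x // supposedly always == 1
const int x = 4;
const int z = bad_one(x+x); // expands to x+x / x+x == 2x + 1
Beware of arguments with side effects, because the argument text is substituted (and hence evaluated) once for each appearance in the body:
#define double_arg(INT_ARG) INT_ARG+INT_ARG
int i=20, j = double_arg(i++); // what does j equal? (i++ + i++ is undefined)
The argument names in the body must match those in the parameter list exactly; the preprocessor will not notice a mismatch:
#define MAX( arg, arg2 ) \
( (arg1) > (arg2) ? (arg1) : (arg2) )
cout << MAX(1,2); // ERROR!
// expands to ( (arg1) > (2) ? (arg1) : (2) )
In C++, an ordinary function is often the better choice:
double max(double x, double y) { return x > y ? x : y ; }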
This function has type-checking, which is nice, but then again the macro automatically "worked" for all types (convenient), even for nonsensical comparisons like MAX("hey guy", 3.0). That is not so nice: the compiler will choke on the expanded text, but since macro function names are never seen by the compiler, its error message will probably not be easy to interpret.
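
One more difference between the macro and the function shows up when an argument has a side effect. The following sketch counts how often an argument expression is evaluated. The names max_int(), next_value(), calls_via_macro() and calls_via_function() are made up for illustration.

```cpp
// The macro from the text, with both arguments fully parenthesized.
#define MAX(arg1, arg2) ((arg1) > (arg2) ? (arg1) : (arg2))

// Function version: each argument is evaluated exactly once.
int max_int(int x, int y) { return x > y ? x : y; }

// Hypothetical helper that counts how often it is called.
static int calls = 0;
int next_value() { ++calls; return 5; }

// Number of times next_value() runs when it is passed to the macro.
int calls_via_macro()
{
    calls = 0;
    int m = MAX(next_value(), 3); // text appears twice in the expansion
    (void)m;
    return calls;
}

// Number of times next_value() runs when it is passed to the function.
int calls_via_function()
{
    calls = 0;
    int m = max_int(next_value(), 3);
    (void)m;
    return calls;
}
```

With the macro, next_value() runs once for the comparison and once more to produce the result. With the function it runs once.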

Non-portable code for va_start(), va_arg(), va_end()

So (taking a big breath) here's the code for va_start(), va_arg(), va_end():
//////////////////////////////////////////////////////////////////////////
// my_va_start(): initializes VA_LIST variable to point to 1st unnamed arg
#define my_va_start(VA_LIST,LAST_NAMED_ARG) \
((VA_LIST) = ( (char *)(&(LAST_NAMED_ARG)) + sizeof(LAST_NAMED_ARG) ))
This macro merely sets our void* to point just past the last named argument, i.e., at the first argument hidden in the ... formal argument of my_printf(). Note the address arithmetic, where the cast to char* ensures that the address pointed to is sizeof(LAST_NAMED_ARG) bytes beyond the last named arg's beginning.

What we've assumed (which is true in some, but not all implementations, thus the non-portability of the code) is that the function arguments, which are all passed by value, are contiguous and sequential, so by taking the address of the last named argument, we can use sizeof() to move from argument to argument (assuming we really know the type of each argument, which is the big if for printf()).
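
The same address arithmetic can be tried out portably on a buffer we own, rather than on the real argument area. This is a minimal sketch: the function name round_trip() and the buffer layout are made up for illustration, and memcpy() is used instead of a pointer cast to stay clear of alignment problems.

```cpp
#include <cstring>

// Packs a double and an int back to back into a byte buffer, then walks
// over them with a char* that is advanced by sizeof() each time.
// Returns true if both values are recovered unchanged.
bool round_trip()
{
    const double x_in = 3.4;
    const int q_in = 4;
    char buffer[sizeof(double) + sizeof(int)];

    // write: copy each value in, then step past it
    char *w = buffer;
    std::memcpy(w, &x_in, sizeof(x_in));
    w += sizeof(x_in);
    std::memcpy(w, &q_in, sizeof(q_in));

    // read: the same walk; we must know the type at every step
    const char *r = buffer;
    double x_out;
    std::memcpy(&x_out, r, sizeof(x_out));
    r += sizeof(x_out); // char* arithmetic counts in bytes
    int q_out;
    std::memcpy(&q_out, r, sizeof(q_out));

    return x_out == x_in && q_out == q_in;
}
```

As with the variable argument list, nothing in the buffer records what type sits where. Reading the int first would silently produce garbage.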

//////////////////////////////////////////////////////////////////////////// 
// my_va_arg(): uses comma operator to:
// (1) increment the VA_LIST variable,
// (2) return the value VA_LIST previously pointed to
#define my_va_arg(VA_LIST, SOME_TYPE) \
((VA_LIST) = (char *)(VA_LIST) + sizeof(SOME_TYPE), \
*(SOME_TYPE*)((char *)(VA_LIST) - sizeof(SOME_TYPE)))
This trickily uses the comma operator to first bump up our argument pointer to the argument after the one whose value we're retrieving, then (subtracting the size again to get back to where the pointer just was) to pull out the value, using the appropriate cast. This code relies on two facts: the left operand of the comma operator is evaluated before the right one, and the value of the whole expression is that of the right-hand operand.
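
The comma-operator idiom can be seen in isolation in the following sketch. take_next() is a hypothetical function, written only to mirror the shape of my_va_arg() on an ordinary int array.

```cpp
// Hypothetical helper mirroring the shape of my_va_arg(): the left operand
// advances the index, the right operand yields the element just passed.
int take_next(const int *values, int &index)
{
    return (index = index + 1,  // evaluated first, its result is discarded
            values[index - 1]); // value of the whole comma expression
}
```

Starting with index 0, successive calls return values[0], values[1], and so on, leaving index one past the element returned.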

In this case, there's no real clean-up to be performed by my_va_end(), so it does nothing:

////////////////////////////////////////////////////////////////////////////
// my_va_end():
// does nothing (nothing left to do)
#define my_va_end(VA_LIST) ((void)0) /* does nothing */

Code using our implementation of va_arg(),...

This is just a copy of the previous my_printf(), except with calls to my_va_arg() instead of va_arg(), etc.
void my_printf_and_va(char *fmt, ...)
{
    // initialize pass over arguments
    my_va_list argl;
    my_va_start(argl, fmt); // beginning after last named argument: fmt

    // this loop has the same structure as that in my_printf() except
    // that it uses the non-portable my_va_start(), my_va_arg(), my_va_end()
    for (char *p = fmt; *p; ++p) // format string controls parsing of args
    {
        if (*p == '%')
        {
            // each case gets its own braces: a jump to a later case label
            // may not skip over an initialized declaration
            switch (*++p)
            {
            case 'f':
            {
                double fval = my_va_arg(argl, double); // get arg as double
                cout << fval;
                break;
            }
            case 'd':
            {
                int ival = my_va_arg(argl, int);
                cout << ival;
                break;
            }
            case 's':
            {
                char *sval = my_va_arg(argl, char *);
                cout << sval;
                break;
            }
            case '\0':
                --p; // lone '%' at end of format: let the loop see the '\0'
                break;
            default:
                cout << *p;
                break;
            }
        }
        else
            cout << *p;
    }
    my_va_end(argl); // our own (empty) clean-up, not the standard va_end()
}

Notes on ellipsis "..." operator

Some final notes - the ellipsis terminates the argument list, so no arguments can follow it:
void bad_func(int i, ..., char q); // ***illegal*** cannot have 
// arg q follow ...
Also, the ellipsis can be the only argument of a function, in which case va_start() cannot be used (there are no named arguments, let alone a last named argument), and you'd have to rely on fairly unportable means, as discussed above, to get at the unknown arguments.

The ARM's comment (§8.3) on the arbitrary argument mechanism: "Don't use it." There are type-safer ways to accomplish much the same thing in C++. On the other hand, you may find yourself depending on it, especially when writing utility functions that will be interfaced to programs written in other languages - Fortran or C, for example.
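
As one illustration of a type-safer route, here is a sketch that assumes a C++11 compiler, which supports variadic templates (a feature that postdates the ARM). The names print_all() and to_text() are made up for illustration. Each argument is printed through its own operator<<, so no format string is needed and every type is checked at compile time.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Base case: no arguments left to print.
inline void print_all(std::ostream &) {}

// Recursive case: print the first argument with its own operator<<,
// then recurse on the remaining ones.
template <typename First, typename... Rest>
void print_all(std::ostream &out, const First &first, const Rest &... rest)
{
    out << first;
    print_all(out, rest...);
}

// Hypothetical convenience wrapper returning the text as a string.
template <typename... Args>
std::string to_text(const Args &... args)
{
    std::ostringstream out;
    print_all(out, args...);
    return out.str();
}
```

A call such as to_text(3.4, ' ', 4, ' ', "hey guy!") takes an arbitrary mix of arguments, just like my_printf(), but an argument with no operator<< is rejected by the compiler instead of misbehaving at run time.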
