3. Pointers - Part 4

We learn about the dangers and uses of pointer conversion in an example involving the ellipsis operator. We peripherally touch on typedef, macro definitions, and an odd use of the comma operator.

Variable arguments

Pointers and pointer casting are simultaneously dangerous and essential when it comes to functions declared with an ellipsis argument. Before embarking on a discussion of this feature (inherited by C++ from ANSI C), let me note that the ellipsis argument is something like the ultimate evil as far as type-checking is concerned. Ditto regarding the example used - a simple implementation of the ANSI C general printing function, printf(), which is declared in the ANSI C header <stdio.h>. The ellipsis argument allows a function to be declared like this:

void my_printf(char *format, ...);

where my_printf() can be called with a single argument (a char *), or with a char * followed by an arbitrary number of arguments of arbitrary type. The ellipsis is typed in literally as "...", i.e., three periods in a row. The following call is legal:

const double x = 3.4;
const int q = 4;
const char *a = "hey guy!";
my_printf("%f %d %s\n", x, q, a);

If the real printf() were called, the result of the output would be something like

3.400000 4 hey guy!

including a terminating '\n'. The idea is that my_printf() allows for an arbitrary combination of items to be output (how could we possibly write a version of my_printf() for every such combination?).

The first argument to my_printf(), the format string, lets the code inside my_printf() know what to expect for successive arguments. my_printf() interprets the control string as follows:

%f: next argument to be processed is of type double
%d: next arg is type int, print out in base 10
%s: next arg is type char* (null-terminated)
any other character(s): print any non-control characters

Since type-checking goes out the window, you're on your own if you write code like this:

my_printf("%s", q); // integer passed in, where a char* is expected!!

What's one likely result of this call (recall that string arguments are supposed to be null-terminated)?

Note the similarity of printf() to Fortran's formatted write statements. In C++, one generally eschews printf() and its ilk (declared in <stdio.h> for ANSI C compilers) in favor of the very type-safe operator<<:

cout << x << q << a << '\n';

If you're already grumbling about requiring the kind of control over output afforded by the Fortran FORMAT statement or printf()'s format argument, don't worry - the same sorts of things exist for C++ "standard" i/o, and I'll go over all those in a future lecture. For the moment, pretend you must use the ellipsis argument, in which case you'll have to use the method outlined here.
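As a small preview, here is a minimal sketch of getting the exact output of printf("%f %d %s\n", x, q, a) from the type-safe streams, assuming the standard manipulators std::fixed and std::setprecision from <iomanip>. The function name format_like_printf is hypothetical; it writes to a std::ostringstream instead of cout only so the result is easy to inspect.

```cpp
#include <cassert>
#include <iomanip>   // std::fixed, std::setprecision
#include <sstream>   // std::ostringstream
#include <string>

// Hypothetical helper: formats x, q, a the way printf("%f %d %s\n", x, q, a)
// would, but every item goes through a type-checked operator<<.
std::string format_like_printf(double x, int q, const char *a)
{
    std::ostringstream out;
    // "%f": fixed-point notation with 6 digits after the decimal point
    out << std::fixed << std::setprecision(6) << x << ' '
        << q << ' '      // "%d": int in base 10 (the stream default)
        << a << '\n';    // "%s": null-terminated string
    return out.str();
}

// Usage check with the values from the example above.
void check_format_like_printf()
{
    assert(format_like_printf(3.4, 4, "hey guy!") == "3.400000 4 hey guy!\n");
}
```

Unlike my_printf("%s", q), passing an int where a string is expected here is simply printed as an int; there is no format string that can disagree with the arguments.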

Using the macros in <cstdarg> (or <stdarg.h>)

You'll need the following functions (macro functions, actually), located in <stdarg.h> (in standard C++, also available as <cstdarg>):

* void va_start( va_list argl, lastarg)

where va_list is a type defined in <stdarg.h> or <cstdarg> and used to iterate over the argument list, and lastarg is the name of the last named formal argument to the function.

* arg_type va_arg(va_list argl, arg_type)

where arg_type specifies the type the next argument should be read as; va_arg() returns that argument as a value of type arg_type.

* void va_end(va_list argl)

which performs any "clean-up" - such as resetting the stack, and so on. Here's an implementation of my_printf():

///////////////////////////////////////////////////////////////////////////
// va_arg.cc
// shows how pointers and pointer casting can be used to implement
// a non-portable version of variable argument handling (stuff defined
// in <cstdarg>)
//
///////////////////////////////////////////////////////////////////////////
#include <iostream> // for i/o library (std::cout)
#include <cstdarg>  // for variable argument library (va_list, va_start, ...)
using std::cout;

void my_printf(char *fmt, ...);   
void my_printf_and_va(char *fmt, ...);  // same as my_printf(), 
                                        // except uses my va_arg stuff

int main(int argc, char *argv[])
{
 
 const double x = 3.4;
 const int q = 4;
 char a[] = "hey guy!";
 cout << "Here's version with standard va_arg:\n";
 my_printf("%f %d %s\n", x, q, a);
 cout << "Here's version with my_va_arg:\n";
 my_printf_and_va("%f %d %s\n", x, q, a);
 return 0;
}

void my_printf(char *fmt, ...)
{
 // initialize pass over arguments
 va_list argl;
 va_start(argl, fmt); // beginning after last named argument: fmt
 
 for (char *p = fmt; *p; ++p) // format string controls parsing of args
 {
  if (*p == '%' && *(p + 1) != '\0') // a lone '%' at the end is printed as-is
  {
   switch (*++p)
   {
    case 'f': { // braces give each case's variable its own scope
     double fval = va_arg(argl, double); // get arg as double
     cout << fval;
     break;
    }
    case 'd': {
     int ival = va_arg(argl, int);
     cout << ival;
     break;
    }
    case 's': {
     char *sval = va_arg(argl, char *);
     cout << sval;
     break;
    }
    default: // unknown directive (e.g. "%%"): print the character itself
     cout << *p;
     break;
   }
  }
  else
   cout << *p;
 }
 va_end(argl);

}
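The same three macros work with any convention the function and its callers agree on, not just a format string. Here is a minimal sketch, using the hypothetical function name average_of, in which the named argument gives the count of unnamed arguments. It also illustrates a rule the format-string design relies on: arguments matching the ellipsis undergo the default argument promotions, so a float arrives as a double (and a char or short as an int). That is why my_printf() reads %f with va_arg(argl, double); reading with va_arg(argl, float) would be wrong.

```cpp
#include <cassert>
#include <cstdarg>   // va_list, va_start, va_arg, va_end

// Hypothetical example: averages `count` values passed through the ellipsis.
// Callers may pass float or double; floats are promoted to double on the
// way in, so every value is read back with va_arg(argl, double).
double average_of(int count, ...)
{
    if (count <= 0)
        return 0.0;

    va_list argl;
    va_start(argl, count);          // begin after the last named argument
    double total = 0.0;
    for (int i = 0; i < count; ++i)
        total += va_arg(argl, double);
    va_end(argl);                   // always pair va_start() with va_end()
    return total / count;
}

void check_average_of()
{
    float f = 1.5f;                 // promoted to double when passed
    assert(average_of(3, 1.0, f, 3.5) == 2.0);
    assert(average_of(0) == 0.0);
}
```

Note that average_of(2, 1, 3) would compile without complaint, yet the int arguments would be read as doubles: the count convention, like the format string, is not checked by the compiler.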

Implementing our own (non-portable) version of <cstdarg> to illustrate pointer casting

What does this have to do with pointers, which, aside from the format string, haven't made an appearance? The answer is in the behind-the-scenes magic that makes va_arg() work. A typical system-dependent implementation of va_start() would do something like the following.

typedef

The main assumption (which is not always true, thus the code below is not portable) is that the arguments to a function - which are passed by value - are located on the "stack" in the order they were provided in the actual function call. Thus in the above example, the first argument (a char * pointing to the first character of the format string "%f %d %s\n") is followed in memory by a double (with value 3.4), then an int (equalling 4), then another char * (pointing to a[0]). Pulling each of these variables off the stack involves setting a pointer to each of the arguments in turn and dereferencing the pointer to get at the respective value. To point to objects of different types, the pointer in question will have to be of type void*, for the reasons discussed above. So we "define" a new type, using typedef:

typedef void *my_va_list;

* The typedef just makes the identifier "my_va_list" a synonym for the type "void*".
* Typedef is used here (and for the real va_list) to shield the user from the details of implementation, but typedef is also used to break complicated declarations down into simpler steps, such as for function pointer variables (see the upcoming lectures on functions).
* typedefs are used in one style of programming (not my own, however) to move the choice of a variable's actual type out of the source files containing the variable definitions and into a header file. That way, if you have two files, defs.hh and main.cc:

// defs.hh
typedef double Mass;

// main.cc
#include "defs.hh"
Mass x; // exactly the same as declaring x to be type double

you can change all variables of type Mass from being double to, say, float, by changing the single typedef in defs.hh. You could also accomplish this using preprocessor macros, but that would not be a good idea, as we'll see below.

* typedef as used here does not actually introduce a new type into the language, though when we consider structures and their generalization (classes), we'll be able to do exactly that.
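A small sketch of the points above, using the document's my_va_list and Mass names plus a hypothetical function-pointer typedef UnaryFn and hypothetical functions apply_twice and square_root. The static_assert checks (C++11 and later) confirm at compile time that a typedef name is only a synonym for an existing type.

```cpp
#include <cassert>
#include <cmath>        // std::sqrt
#include <type_traits>  // std::is_same

typedef void *my_va_list;
typedef double Mass;

// A typedef introduces a synonym, not a distinct type:
static_assert(std::is_same<my_va_list, void *>::value, "my_va_list is void*");
static_assert(std::is_same<Mass, double>::value, "Mass is double");

// typedef breaking a function-pointer declaration into a simpler step:
// UnaryFn means "pointer to function taking double and returning double".
typedef double (*UnaryFn)(double);

double apply_twice(UnaryFn f, double x) { return f(f(x)); }

// Wrapper, since taking the address of std::sqrt itself is not guaranteed
// to work (it is overloaded and a standard library function).
double square_root(double x) { return std::sqrt(x); }

void check_typedefs()
{
    Mass m = 16.0;   // exactly the same as declaring a double
    double d = m;    // no conversion involved: same type
    assert(apply_twice(square_root, d) == 2.0);
}
```

Because Mass and double are the same type, declaring both void f(Mass); and void f(double); declares one function twice rather than two overloads.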

Macros and the preprocessor

Now let's turn to defining our own versions of va_start(), va_arg() and va_end(). The first obstacle to confront is that our version of va_arg() cannot be sensibly declared as a C++ function, because its second argument is a type name, and a type name is not a value that can be passed to a function (what would the type of such a parameter be?). So we must fall back on using macros, in particular, a macro function.

Macros are implemented by the preprocessor, which is the beast that handles the #include statements that include header files as well as the #ifndef, #define, and #endif that we've already used in header files to prevent them from being #include'd more than once.

The preprocessor is usually invoked automatically whenever you compile a source file - it is simply a text filter, applied to the source code, with the result passed on to the compiler. The preprocessor makes simple text substitutions and (aside from recognizing both C and C++ comments) knows nothing about the structure of the language. Anything beginning with a # and continuing up to the end of the line (possibly continued by '\'s) is handled by the preprocessor and never seen by the compiler. There are several preprocessor commands; here I'll just discuss the #define command, which is followed by

* one or more spaces or tabs
* then by an identifier (which will be substituted for in the rest of the file)
* then by more spaces or tabs, which separate the identifier from the replacement text
* then by the replacement text (which may include spaces)
* then by zero or more spaces/tabs, then the newline.

For the rest of the file (preprocessor defines essentially have "file scope", but are never hidden by local scope identifiers), any instances of the identifier are replaced by the replacement text. For example:

#define BIG_DOG     sin(x) * 4.0
#define PI          atan(1.0)*4.0
#define Q
// code fragment
 cout << BIG_DOG <<'\n'; // BIG_DOG --> "sin(x) * 4.0"
 cout << Q '\n'; // Q --> "" (nothing)
 cout << "pi == " << PI <<'\n';
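Here's a small sketch of why an unparenthesized replacement text is risky, using the PI definition above (renamed PI_MACRO, a hypothetical name, to keep it apart from a const version). Since replacement is purely textual, 1.0 / PI_MACRO becomes 1.0 / std::atan(1.0)*4.0, which divides first and then multiplies. A const double (the alternative recommended below) has no such problem. The block also checks that an empty #define such as Q still counts as "defined" for #ifdef.

```cpp
#include <cassert>
#include <cmath>    // std::atan, std::fabs

#define PI_MACRO   std::atan(1.0)*4.0  // no parentheses around the replacement
#define Q                              // defined, but expands to nothing

const double pi = std::atan(1.0) * 4.0; // a typed, named constant

// 1.0 / PI_MACRO expands to 1.0 / std::atan(1.0)*4.0, which parses as
// (1.0 / std::atan(1.0)) * 4.0 == 16/pi, not 1/pi.
double one_over_pi_macro() { return 1.0 / PI_MACRO; }
double one_over_pi_const() { return 1.0 / pi; }

#ifdef Q
const bool q_is_defined = true;   // this branch is kept: Q is defined
#else
const bool q_is_defined = false;
#endif

void check_macro_pitfall()
{
    const double real_pi = 3.141592653589793;
    assert(std::fabs(one_over_pi_const() - 1.0 / real_pi) < 1e-12);
    assert(std::fabs(one_over_pi_macro() - 16.0 / real_pi) < 1e-12);
    assert(q_is_defined);
}
```

Writing the macro as #define PI (atan(1.0)*4.0) would fix this particular expansion, but the const avoids the whole class of problems.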

Some comments:

* Macros are often named in all-capital letters, which helps remind the programmer of their origin in the mists of programming prehistory and of the existence of preferred alternatives in C++.
* the whitespace surrounding the replacement text in the #define is not included in replacements. Replacement is on a "token" as opposed to "character" basis, so that "f[PI]" will become "f[atan(1.0)*4.0]", whereas "f[PIguy]" will be unchanged. Also, if the identifier appears in a string literal, it is not replaced.
* semicolons in the replacement text are included (the preprocessor knows next to nothing about the language, it's just a mindless text-filter), so the following is bound to give the wrong results:

#define BIG_BUF 1024;
// code fragment below
 char q[BIG_BUF]; // expands to char q[1024;];  a syntax error!

* comments, however, are stripped out:

#define SILLY_SUB "(this is illegal?)" // silly sub
cout << SILLY_SUB << '\n'; // don't worry, 
                           // the stuff following SILLY_SUB is 
                           // not commented out

* I myself restrict #define statements to the use we've already seen, namely preventing multiple header file inclusion. If you want a constant, use the C++ const, which, for one thing, will have a name associated with it if you have to use a debugger. Preprocessor #defines can fairly easily get you in trouble, especially if you have a lot of them (resulting in lots of unintended text substitution).

const double pi = atan(1.0)*4.0; // preferred to macro substitution above

* The #define Q statement above will replace instances of Q by nothing, but note that as far as the preprocessor is concerned, Q is defined so that a section of text following the #define that begins with "#ifndef Q" (which tests whether Q has been defined by the preprocessor) will be ignored. This is how our standard trick of preventing header file multiple-inclusion works.
* A macro does not extend over more than one "line"; however, the preprocessor's idea of a single line includes several actual lines, all but the last of which end in a '\' immediately followed by a '\n', so the following is okay:

// multi-line #define
#define LONG_MACRO \
      "it is important to make sure that " \
      "the '\\' is followed immediately " \
      "by a newline: like this " \
      "last line of macro"

Here we need a slightly more complicated version of the macro - the macro function, which is defined as follows:

#define MAX( arg1, arg2 ) \
        ( (arg1) > (arg2) ? (arg1) : (arg2) )
// use of MAX() macro
double x = MAX( 3.4, 4.0);
// bad (but legal preprocessor-wise) use of MAX:
double y = MAX("hey dude", this is a nonsense argument);

Several notes on macro functions

* macro functions are notorious for mindless substitution of arguments (which is sometimes intended). If the actual argument of a macro function is in turn a different macro that was previously defined, it is itself replaced. No checks are made on the types of arguments, which is why we have to rely on a macro function to write our version of va_arg (which takes a type name as an argument, something no C++ function can accept).
* in the definition of the macro, the opening parenthesis of the parameter list must immediately follow the macro name, with no space in between; otherwise the preprocessor treats the parenthesized list as the start of the replacement text
* Carefully written macro functions include parentheses around all arguments in the replacement expression, to ward off unintended run-ons:

#define bad_one(x) x / x // supposedly always == 1
const int x = 4;
const int z = bad_one(x+x); // expands to x+x / x+x == 2x + 1 (9 here), not 1

* watch out for side-effects:

#define double_arg(INT_ARG) INT_ARG+INT_ARG
int i=20, j = double_arg(i++); // what does j equal?

* Watch out for errors like the following (where we misspelled the name of the first formal argument):

#define MAX( arg, arg2 ) \
        ( (arg1) > (arg2) ? (arg1) : (arg2) )
cout << MAX(1,2); // ERROR! 
                  // evaluates to ( (arg1) > (2) ? (arg1) : (2) )

* the main strength of macro functions is also their weakness - they do no type-checking, because they don't know about types. Compare the macro MAX() with a true function:

double max(double x, double y) { return x > y ? x : y ; }

This function has type-checking, which is nice, but then again the macro automatically "worked" for all types (convenient), even for nonsensical comparisons like MAX("hey guy", 3.0). That is not so nice: the compiler will choke on the expanded text, but since macro function names never reach the compiler, the error message will probably not be easy to interpret.
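A minimal sketch contrasting the parenthesized MAX() macro with an ordinary function when an argument has a side effect; max_int and good_one are hypothetical names (an int version of max() above, and a parenthesized bad_one()). The macro pastes its argument text in twice, so i++ can run twice, while the function evaluates each argument exactly once. The conditional operator finishes evaluating its condition before the chosen branch, so this particular expansion is well defined, merely surprising.

```cpp
#include <cassert>

#define MAX( arg1, arg2 ) \
        ( (arg1) > (arg2) ? (arg1) : (arg2) )

#define bad_one(x)  x / x          // unparenthesized
#define good_one(x) ((x) / (x))    // parenthesized: 1 for any nonzero x

inline int max_int(int a, int b) { return a > b ? a : b; }

void check_macro_vs_function()
{
    int i = 5, j = 3;
    // Expands to ( (i++) > (j) ? (i++) : (j) ): i++ runs twice.
    int m = MAX(i++, j);
    assert(m == 6 && i == 7);

    i = 5;
    int f = max_int(i++, j);       // argument evaluated once
    assert(f == 5 && i == 6);

    const int x = 4;
    assert(bad_one(x+x) == 9);     // x+x / x+x == x + 1 + x
    assert(good_one(x+x) == 1);    // ((x+x) / (x+x))
}
```

Parentheses fix the precedence problem, but nothing inside a macro can fix the double evaluation; only a real function (or a C++ inline function) does that.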

Non-portable code for va_start(), va_arg(), va_end()

So (taking a big breath) here's the code for my_va_start(), my_va_arg() and my_va_end():

//////////////////////////////////////////////////////////////////////////
// my_va_start(): initializes VA_LIST variable to point to 1st unnamed arg
#define my_va_start(VA_LIST,LAST_NAMED_ARG) \
 ((VA_LIST) = ( (char *)(&(LAST_NAMED_ARG)) + sizeof(LAST_NAMED_ARG) ))

This macro merely sets our void* to point just past the last named argument, i.e., at the first argument hidden in the ... formal argument of my_printf(). Note the address arithmetic, where the cast to char* ensures that the address pointed to is sizeof(LAST_NAMED_ARG) bytes beyond the last named arg's beginning.

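What we've assumed (which is true in some, but not all, implementations, thus the non-portability of the code) is that the function arguments, which are all passed by value, are contiguous and sequential in memory, so by taking the address of the last named argument, we can use sizeof() to move from argument to argument (assuming we really know the type of each argument, which is the big if for printf()). Among the things that break this assumption: on many platforms (most 64-bit ones, for example) the first several arguments are passed in registers rather than on the stack, and arguments may be padded for alignment.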
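The reason for the char* cast can be seen on memory whose layout is guaranteed, such as an array. This sketch (step_past and check_byte_arithmetic are hypothetical names) shows that adding sizeof(double) to a char* moves exactly one double forward, whereas arithmetic on a double* is already scaled by sizeof(double). Arithmetic on a void* itself is not allowed in standard C++, which is why the macro converts to char* first and only then stores the result in the void*.

```cpp
#include <cassert>
#include <cstddef>   // std::ptrdiff_t

// Returns a void* pointing one double past p, computed in bytes, the same
// way my_va_start() steps past the last named argument.
void *step_past(double *p)
{
    return (char *)p + sizeof(*p);   // char* arithmetic counts bytes
}

void check_byte_arithmetic()
{
    double d[3] = {3.4, 4.0, 5.0};
    void *next = step_past(&d[0]);
    assert(next == &d[1]);            // exactly sizeof(double) bytes later
    assert(*(double *)next == 4.0);   // cast back before dereferencing
    assert(&d[0] + 1 == &d[1]);       // double* arithmetic is already scaled
    assert((char *)&d[1] - (char *)&d[0]
           == static_cast<std::ptrdiff_t>(sizeof(double)));
}
```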

//////////////////////////////////////////////////////////////////////////// 
// my_va_arg(): uses the comma operator to
// (1) increment the VA_LIST variable past the current argument, then
// (2) return the value VA_LIST previously pointed to
#define my_va_arg(VA_LIST, SOME_TYPE) \
 ((VA_LIST) = (char *)(VA_LIST) + sizeof(SOME_TYPE), \
  *(SOME_TYPE*)((char *)(VA_LIST) - sizeof(SOME_TYPE)))

This trickily uses the comma operator to first bump up our argument pointer to the argument after the one whose value we're retrieving, then (subtracting sizeof(SOME_TYPE) to get back to where it just was) pull out the value, using the appropriate cast. This code relies on the fact that the value of a comma expression is the value of its right-hand operand.
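To see the comma-operator trick in isolation, this sketch applies the same my_va_arg() macro to a cursor walking an int array, where the elements really are contiguous, so the result is well defined. The array, variable, and function names are hypothetical.

```cpp
#include <cassert>

// The macro from the text, unchanged: bump the cursor, then read the
// value one SOME_TYPE behind the new position.
#define my_va_arg(VA_LIST, SOME_TYPE) \
 ((VA_LIST) = (char *)(VA_LIST) + sizeof(SOME_TYPE), \
  *(SOME_TYPE*)((char *)(VA_LIST) - sizeof(SOME_TYPE)))

void check_comma_cursor()
{
    // The value of (a, b) is b; a is evaluated first for its side effect.
    int n = 0;
    int v = (n += 5, n * 2);
    assert(n == 5 && v == 10);

    int values[3] = {10, 20, 30};
    void *cursor = values;          // plays the role of my_va_list
    int first  = my_va_arg(cursor, int);
    int second = my_va_arg(cursor, int);
    int third  = my_va_arg(cursor, int);
    assert(first == 10 && second == 20 && third == 30);
    assert(cursor == values + 3);   // left just past the last element
}
```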

In this case, there's no real clean-up to be performed by my_va_end(), so it does nothing:

//////////////////////////////////////////////////////////////////////////// 
// my_va_end(): 
// does nothing (nothing left to do)
#define my_va_end(VA_LIST) ((void)0) /* does nothing, but is a valid statement */

Code using our implementation of va_arg(),...

This is just a copy of the previous my_printf(), except with calls to my_va_arg() instead of va_arg(), etc.

void my_printf_and_va(char *fmt, ...)
{
 // initialize pass over arguments
 my_va_list argl;
 my_va_start(argl, fmt);  // beginning after last named argument: fmt
 
 // this loop is identical to that in my_printf() except that it
 // uses the non-portable my_va_start(), my_va_arg(), my_va_end()
 for (char *p = fmt; *p; ++p) // format string controls parsing of args
 {
  if (*p == '%' && *(p + 1) != '\0') // a lone '%' at the end is printed as-is
  {
   switch (*++p) 
   {
    case 'f': { // braces give each case's variable its own scope
     double fval = my_va_arg(argl, double); // get arg as double
     cout << fval;
     break;
    }
    case 'd': {
     int ival = my_va_arg(argl, int);
     cout << ival;
     break;
    }
    case 's': {
     char *sval = my_va_arg(argl, char *);
     cout << sval;
     break;
    }
    default: // unknown directive (e.g. "%%"): print the character itself
     cout << *p;
     break;
   }
  }
  else
   cout << *p;
 }
 my_va_end(argl); // pairs with my_va_start(), not the standard va_end()

}

Notes on ellipsis "..." operator

Some final notes - the ellipsis terminates the argument list, so no arguments can follow it:

void bad_func(int i, ..., char q); // ***illegal*** cannot have 
                                   // arg q follow ...

Also, the ellipsis can be the only argument of a function, in which case va_start() cannot be used (there are no named arguments, let alone a last named argument), and you'd have to rely on fairly unportable means like those discussed above to get at the unknown arguments.

The ARM's comment (§8.3) on the arbitrary argument mechanism: "Don't use it." There are type-safer ways to accomplish much the same thing in C++. On the other hand, you may find yourself depending on it, especially when writing utility functions that will be interfaced to programs written in other languages - Fortran or C, for example.
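For comparison, here is a minimal sketch of one such type-safer way, a variadic template, which requires C++11 or later and so postdates these notes. print_all is a hypothetical name. Each argument keeps its real type and is printed with its own operator<<, so there is no format string to get out of step with the arguments, and a type with no operator<< is rejected at compile time instead of being misread at run time.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

// Base case: nothing left to print.
inline void print_all(std::ostream &) {}

// Recursive case: print the first argument with its own operator<<,
// then recurse on the remaining arguments.
template <typename First, typename... Rest>
void print_all(std::ostream &os, const First &first, const Rest &... rest)
{
    os << first;
    print_all(os, rest...);
}

void check_print_all()
{
    std::ostringstream out;
    const double x = 3.4;
    const int q = 4;
    const char *a = "hey guy!";
    print_all(out, x, ' ', q, ' ', a, '\n');
    assert(out.str() == "3.4 4 hey guy!\n");
}
```

In a real program you would write print_all(std::cout, x, ' ', q, ' ', a, '\n'); the ostringstream is used here only to make the output easy to check.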
