0. Writing, compiling, and running C++ programs

In the introductory lecture, I gave a rapid (maybe terrifyingly so) overview of C and C++, and some examples of the object-oriented features of the latter. Here we begin a much slower, thorough introduction to the subject. We start with the basics of compiling programs under VMS (msupa.pa.msu.edu) and UNIX (pads1.pa.msu.edu); then a simple program that prints out the number of command-line arguments introduces identifiers, literals, and functions (and function overloading).

Also bear in mind that I'll sprinkle definitions of terms which will sometimes seem rather elementary to you, but not to others - due to the very large spread in computer-programming experience in our class.

C++ compilation basics

Files and filenames

C++ is usually a compiled language (in contrast to BASIC, which is traditionally an interpreted language). This means that after you have written your program and typed it into one or more files, you will need to compile each of the files (together or separately) and then link the results together. Here are some definitions in case you're fuzzy on the concepts:

compile: To translate a file containing C++ code into machine language, the form the computer can execute. The original file is a source code file, while the end result is an object file.
link: To combine several object files together into a single executable file - a "binary". This is less trivial than it sounds, as we shall see when we discuss the scope of objects.

The details of this process are different in UNIX (e.g., pads1.pa.msu.edu) and VMS (e.g., msupa.pa.msu.edu), but the smallest unit of compilation for both is a single file.

Unix:

Any filename legal under UNIX is okay, though it is convenient for our compiler (GNU g++) if the name is of the form "xxx.cc" - for example, "my_program.cc". Upper/lower-case is distinguished, and underscores ('_') are useful for more intelligible filenames (e.g., "main.cc", "io_routines.cc", "diff_eq.cc", ...)
g++ -c xxx1.cc xxx2.cc compiles one or more files xxx1.cc, xxx2.cc, .... They are not linked together, however.
g++ -o Prog xxx1.o xxx2.o links together the previously compiled files (compiled using the previous command) into the executable file "Prog". Exactly one of the files must contain the main() function.
g++ -o Prog xxx1.cc xxx2.cc compiles and links together the files into the executable "Prog". Again, exactly one of the files must contain the main() function.
Prog my name is Fred runs the program "Prog", with command line arguments "my name is Fred" (command line arguments are introduced below). Type the key-combination control-C to stop the program prematurely.

VMS:

Any filename legal under VMS is okay, though it is convenient for the VMS compiler if names are of the form "xxx.cxx" - for example, "myprog.cxx". Of course, VMS doesn't distinguish upper/lower case in names.
cxx xxx1.cxx,xxx2.cxx,xxx3.cxx compiles one or more C++ source files. If the extension is in fact ".cxx", the extension can be omitted (cxx xxx1)
link xxx1,xxx2,xxx3 links together one or more previously compiled object files (e.g. "xxx1.obj" produced by "cxx xxx1") into one executable file "xxx1.exe", where the base name of the binary comes from the first file (which must contain the main() function).
run xxx1 runs the program. This executable, however, is not ready to handle command-line arguments (unless you're running under POSIX, which I'll talk about later), so go on to the next step to be able to use them.
prog == "$dept1:[davechao]xxx1.exe performs the VMS magic to make command line arguments (like in UNIX!) possible, where the " is absolutely necessary, and $dept1:[davechao]xxx1.exe is the complete file name. After this step, you can run the program via Prog my name is Fred (just like in UNIX!). Type the key-combination control-z to stop the program prematurely.

Here we are discussing having the shell (csh or Bourne under UNIX, DCL under VMS) run your program for you, but of course it could be called directly by the operating system at a lower level, or by an application program such as Mathematica. Just bear the possibility in mind.

Editing

I suggest Emacs under UNIX (it could be installed on our VMS machine as well, but EVE looks like it has some of the same functionality there); you could alternatively edit the source files in WordPerfect on your Mac at home, and just keep ftp'ing the files to pads1.pa.msu.edu for compilation and execution.
If you do transport files back and forth from your Mac or PC, ftp files in "TEXT" mode to ensure that carriage return characters ('\r') on the Mac, for example, get properly mapped to newlines ('\n') for UNIX.

Programming Basics

As will be the pattern this semester, I'll present a couple of concepts in detail, then take a look at a simple illustrative example. In this lecture, we'll next define identifier and literal, then learn a little about printing to the screen.

Identifier

identifier: Basically a "legal" name in C++, be it for words reserved for use by the language (keyword) or as the name of a variable, function, etc.

Besides the wide variety of punctuation marks used by C++ (&, *, ;, etc.), identifiers are the "words" making up the "sentences" in your source code.

Identifiers:

must begin with a letter (which is defined as any of : a-z, A-Z and _), with successive characters being either letters or digits. Note that the name _foo is allowed, because '_' is a letter as far as identifiers are concerned.
Upper/lower case distinction is made, so "MyVariable" != "myvariable"
Names of the form __xxxxx (e.g., __myvar) are discouraged, because they are reserved for use by the system (or system libraries). A single '_' is alright, however (_myvar is fine). Underscores in the middle of names are more than okay; they're quite useful for informative identifiers: four_vector, decision_tree, particle_code, Asteroid_and_Comet_Database
An arbitrary (finite) length is permitted by the standard, but in practice there is a limit to the length of identifiers imposed by the linker, which varies from system to system.

Literal

Next up is the literal, which I'm bringing up because

we use them in the simple program that follows
they provide a convenient warm-up to discussion of the basic types in C++.

Literals are what are often called "constants" in other languages, but since C++ has a precise idea of what a constant is, let's stick with literal.

literal: An explicit value (no associated identifier) of one of the types described below.

There are different kinds of literals, roughly corresponding to the basic types (numeric and character) as well as an array type (character arrays). There are:

integer literals
floating point literals
character literals
character array ("string") literals

The first three types are broken down into more specific categories according to the number of bits used to represent the literal, and whether they are signed or not.

Just a reminder -

a bit is the smallest unit in computer memory, capable of holding a single binary digit (with value 0 or 1)
a byte is a sequence of 8 bits (and 1 K is 1024 bytes, 1M is 1024*1024 bytes)
a machine word is the "natural" size for a particular machine (which these days is likely to be either 4 bytes or 8 bytes).

Integer literals

are interpreted as one of the types {int, unsigned int, long int, unsigned long int}
can be specified in hexadecimal (0xFF, -0X1e), octal (+017, 07777), or decimal (24, -10), with or without a + or - sign. The (0x or 0X)/(0)/(no) prefix selects the number base (16/8/10).
If no - is present, the integer literal can be made unsigned with a (u or U) suffix: 0x10u, 24u
The integer literal can be forced to type long int with the suffix (l or L). A long int is at least as wide as an int, and on many machines it has more bits and therefore a larger range of representable numbers. The textbook rightly recommends 'L', however, to avoid confusing 'l' with '1'. Examples: -0x44e0L, 045UL.
Despite what the textbook says, (unsigned) integer literals do not default to type (unsigned) int, but rather are assigned to one of the basic integer types {int, unsigned int, long int, unsigned long int} on a "smallest-shoe to fit" basis.
the type short int is not available for integer literals (though this is not really a problem; see discussion in a later lecture on casting)
see <limits.h> for ranges of these types

Floating-point literals

are of type {float, double, long double}
of the form: -3.0E4 (double), 0.0241 (double), 3e4 (double), 4.0e-4f (float), -23e-43L (long double)
only base-10
in contrast to integer literals, the default is simple: with no suffix they are of type double, whether or not an exponent (e or E) is present; a suffix selects float (f or F, single-precision) or long double (l or L).
see <float.h> for ranges of these types

Character literals

are simply type char (not unsigned char, etc.)
delimited by single forward quote: 'a', '\0', '\n', 'd'
whitespace (newlines, carriage return, tab, ...) are specified by '\n', '\r', '\t'. Note that "new lines" are forced by '\n' on UNIX/DOS and '\r' on Macs. Also, the tab character '\t' is useful for exporting program output to spreadsheet programs.
ASCII codes can be used, by giving the octal value: \ooo, where ooo is an octal integer literal (up to three digits).
multi-character constants do exist (but we won't use them)

The last type of literal doesn't correspond to a basic type but instead is an array of a basic type:

Strings (Character array literals)

always an array of char, normally handled through a pointer to its first character (char *; standard C++ makes the characters const)
consist of a sequence of characters delimited by double quotes: "dog", "baby", "house", "neutrino"
adjacent (separated only by whitespace) string literals are concatenated:
"dog" "baby" is equivalent to "dogbaby"
includes a hidden '\0'. So "dog\n" consists of the character sequence {'d','o','g','\n','\0'} - 5 characters long.

First Program

So, armed with the concept of a literal, here's a simple program (our first!) that outputs the various types of literals (as well as the odd variable):

// first_prog.cc 
//
#include <iostream.h> // to get definitions for cout, operator<<, etc.

// int main()                    //   (1) alternative version of main()

int main(int argc, char *argv[]) //   (2) version of main() needed here
                                 //    argc-1 = # of command-line args
{


	cout << 34.0 << '\n'; /* output double, then char literal */
	cout << "Hello world\n";
	cout << "# of command-line args = " << argc << '\n'; // output variable!
	cout << 'H' << "i there, my " << 0xFF << " friends\n"; // mix and match
	return 0; // return statement

}

Things to note:

main() is our first function, complete with definition of its arguments (int argc, char* argv[]) and return value (int). There's an alternative version which takes no arguments with the same name - our first example of function overloading, where functions different in at least one argument but similar in meaning can be assigned the same "name".
the function main() is "the" program. Any other functions you want executed must be called directly or indirectly from main(). Conversely, main() cannot be called by other functions (including itself).
braces {} enclose the body of the program
Output to the screen (standard output) is performed by "cout << (some literal)". You can output a series of things in a program line by repeating the pattern (expressions are output left-to-right, just as the code reads). Don't forget a '\n' to force newlines.
The use of the output operator << is our first example of operator-overloading, where the same operator is used to output: char literals, string literals, float literals, int variables, .... This should be familiar to Fortran programmers (e.g. the sqrt() function), though unlike Fortran, the programmer is allowed to add her own operator-overloading.
the #include <iostream.h> sucked in the definitions of cout and the operator <<, which are not built in to the language. They're just part of a library, despite the way that code using the library looks so "built-in".
I'll postpone discussion of the argument argv, which is an array of char* (character strings), till later. Just note that running the above code as "prog foo1 foo2" would print 3 as the "# of command-line args", because argc counts the program name itself along with the two arguments.
main() returns a value (an integer). UNIX shell programs usually take a return value of 0 to signal everything's okay, and non-zero to signal program failure
Comments begin at a '//' and end at the end of the same line in the source file. One can also use the C-style '/*' (to begin a comment) and '*/' (to end the comment), which has the advantage of being able to span an arbitrary number of lines. But it's rather easy to prematurely end one C-style comment by accidentally nesting another, which is not a problem with the C++-style comment '//', so I recommend the latter.
