[Phy 405/905 Home Page] [ Lecturer ]
In the introductory lecture, I gave a rapid (maybe terrifyingly so) overview of
C and C++, and some examples of the object-oriented features of the latter.
Here we begin a much slower, thorough introduction to the subject. Starting
with the basics of compiling programs under VMS (msupa.pa.msu.edu) and UNIX
(pads1.pa.msu.edu), a simple program to print out the command-line arguments
introduces identifiers, literals, functions (and
function overloading).
Also bear in mind that I'll sprinkle definitions of terms which will
sometimes seem rather elementary to you, but not to others - due to the very
large spread in computer-programming experience in our class.
C++ is usually a compiled language (in contrast to BASIC, which is
traditionally an interpreted language). This means that after you have written
your program and typed it into one or more files, you will need to compile
each of the files (together or separately) and then link all the files
together. Here are some definitions in case you're fuzzy on the concepts:
- compile
- To reduce a file containing C++ code into a form the computer can directly
execute, i.e., machine language. The original file is a source code
file while the end result is an object file.
- link
- To combine several object files together into a single executable file - a
"binary". This is less trivial than it sounds, as we shall see when we
discuss the scope of objects.
The details of this process are different in
UNIX (e.g., pads1.pa.msu.edu) and VMS (e.g., msupa.pa.msu.edu), but the
smallest unit of compilation for both is a single file.
Unix:
- Any filename legal under UNIX is okay, though it is convenient for our
compiler (GNU g++) if the name is of the form "xxx.cc" - for example,
"my_program.cc". Upper/lower-case is distinguished, and underscores ('_') are
useful for more intelligible filenames (e.g., "main.cc", "io_routines.cc",
"diff_eq.cc", ...)
- g++ -c xxx1.cc xxx2.cc compiles one or more files xxx1.cc,
xxx2.cc, .... They are not linked together, however.
- g++ -o Prog xxx1.o xxx2.o links together the previously
compiled files (compiled using the previous command) into the executable file
"Prog". Exactly one of the files must contain the main() function.
- g++ -o Prog xxx1.cc xxx2.cc compiles and links together
the files into the executable "Prog". Again, exactly one of the files must
contain the main() function.
- Prog my name is Fred runs the program "Prog", with command
line arguments "my name is Fred" (command line arguments are introduced below).
Type the key-combination control-C to stop the program prematurely.
VMS:
- Any filename legal under VMS is okay, though it is convenient for the VMS
compiler if names are of the form "xxx.cxx" - for example, "myprog.cxx". Of
course, VMS doesn't distinguish lower/uppercase names.
- cxx xxx1.cxx,xxx2.cxx,xxx3.cxx compiles one or more C++
source files. If the extension is in fact ".cxx", the extension can be omitted
(cxx xxx1)
- link xxx1,xxx2,xxx3 links together one or more previously
compiled object files (e.g. "xxx1.obj" produced by "cxx xxx1") into one
executable file "xxx1.exe", where the base name of the binary comes from the
first file (which must contain the main() function). run xxx1
runs the program. This executable, however, is not ready to handle command-line
arguments (unless you're running under POSIX, which I'll talk about later), so
go on to the next step to be able to use them.
- prog == "$dept1:[davechao]xxx1.exe performs the VMS magic
to make command line arguments (like in UNIX!) possible, where the " is
absolutely necessary, and $dept1:[davechao]xxx1.exe is the complete file name.
After this step, you can run the program via Prog my name is
Fred (just like in UNIX!). Type the key-combination control-z to stop
the program prematurely.
We are discussing here having the shell (csh or Bourne under UNIX, DCL under
VMS) run your program for you, but of course it could be called directly by the
operating system at a lower level, or by an application program such as
Mathematica. Just bear the possibility in mind.
- For editing source files, I suggest Emacs under UNIX (it could be installed
on our VMS machine as well, but EVE looks like it has some of the same
functionality there); you could alternatively edit the source files in
WordPerfect on your Mac at home, and just keep ftp'ing the files to
pads1.pa.msu.edu for compilation and execution.
- If you do transport files back and forth from your Mac or PC, ftp files in
"TEXT" mode to ensure that carriage return characters ('\r') on the Mac, for
example, get properly mapped to newlines ('\n') for UNIX.
As will be the pattern this semester, I'll present a couple of concepts in
detail, then take a look at a simple illustrative example. In this lecture,
we'll next define identifier and literal, then learn a little about printing to
the screen.
- identifier
- Basically a "legal" name in C++, be it for words reserved for use by the
language (keyword) or as the name of a variable, function, etc.
Besides the wide variety of punctuation marks used by C++ (&, *, %, ...),
etc.), identifiers are the "words" making up the "sentences" in your source
code.
Identifiers:
- must begin with a letter (which is defined as any of : a-z, A-Z and _),
with successive characters being either letters or digits. Note that the name
_foo is allowed, because '_' is a letter as far as identifiers are
concerned.
- Upper/lower case distinction is made, so "MyVariable" != "myvariable"
- Names of the form __xxxxx (e.g., __myvar) are
discouraged, because they are reserved for use by the system (or system
libraries). A single leading '_' followed by a lowercase letter is alright for
local names (_myvar is fine), though names beginning with '_' and an uppercase
letter are also reserved. Underscores in the middle of names are more than
okay; they're quite useful for informative identifiers: four_vector,
decision_tree, particle_code, Asteroid_and_Comet_Database
- An arbitrary (finite) length is permitted by the standard, but in practice
there is a limit to the length of identifiers imposed by the linker, which
varies from system to system.
Next up is the literal, which I'm bringing up because
- we use them in the simple program that follows
- they provide a convenient warm-up to discussion of the basic types
in C++.
Literals are what are often called "constants" in other languages, but since
C++ has a precise idea of what a constant is, let's stick with literal.
- literal
- An explicit value (no associated identifier) of one of the types
described below.
There are different kinds of literals, roughly corresponding to the basic types
(numeric and character) as well as an array type (character arrays). There are:
- integer literals
- floating point literals
- character literals
- character array ("string") literals
The first three types are broken
down into more specific categories according to the number of bits used
to represent the literal, and whether they are signed or not.
Just a reminder -
- a bit is the smallest unit in computer memory, capable of holding a
single binary digit (with value 0 or 1)
- a byte is a sequence of 8 bits (and 1 K is 1024 bytes, 1M is 1024*1024
bytes)
- a machine word is the "natural" size for a particular machine (which these
days is likely to be either 4 bytes or 8 bytes).
Integer literals:
- are interpreted as of type {int, unsigned int, long int, unsigned long
int}
- can be specified in hexadecimal (0xFF, -0X1e), octal
(+017, 07777), or decimal (24, -10), with or without a
+ or - sign. The (0x or 0X)/(0)/(no) prefix selects the
number base (16/8/10).
- If no - is present, the integer literal can be made unsigned with a (u
or U) suffix: 0x10u, 24u
- The integer literal can be forced to type long int with the
suffix (l or L); on many machines long int has more bits than int, and hence
a larger range of representable numbers. The textbook rightly recommends use
of 'L' however to avoid confusion of 'l' with '1'. Examples: -0x44e0L, 045UL.
- Despite what the textbook says, (unsigned) integer literals do not default
to type (unsigned) int, but rather are assigned to one of the basic
integer types {int, unsigned int, long int, unsigned long int} on a
"smallest-shoe to fit" basis.
- the type short int is not available for integer literals (though
this is not really a problem; see discussion in a later lecture on casting)
- see <limits.h> for ranges of these types
Floating point literals:
- are of type {float, double, long double}
- of the form: -3.0E4 (double), 0.0241 (double), 3e4 (double), 4.0e-4f
(float), -23e-43L (long double)
- only base-10
- in contrast to integer literals, the default is simple: with no suffix (with
or without an exponent), they are type double; otherwise the suffix
specifies float (f or F, single-precision) or long double (l or L).
- see <float.h> for ranges of these types
Character literals:
- are simply type char (not unsigned char, etc.)
- delimited by single forward quote: 'a', '\0', '\n', 'd'
- whitespace characters (newline, carriage return, tab, ...) are specified by
'\n', '\r', '\t'. Note that line ends in text files are marked by '\n' on UNIX,
by '\r' followed by '\n' on DOS, and by '\r' on Macs. Also, the tab character
'\t' is useful for exporting program output to spreadsheet programs.
- ASCII codes can be used, by giving the octal value: \ooo, where
ooo is an octal integer literal (up to three digits).
- multi-character constants do exist (but we won't use them)
The last type of literal doesn't correspond to a basic type but instead is an
array of a basic type:
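- of type array of char (array of const char in standard C++), usually handled
through a pointer of type char * (const char * in standard C++)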
- consists of sequence of character literals delimited by forward double
quotes: "dog", "baby", "house", "neutrino"
- adjacent (separated only by whitespace) string literals are concatenated:
"dog" "baby" is equivalent to "dogbaby"
- includes a hidden '\0'. So "dog\n" consists of the
character sequence {'d','o','g','\n','\0'} - 5 characters long.
So, armed with the concept of a literal, here's a simple program (our first!)
that outputs the various types of literals (as well as the odd variable):
// first_prog.cc
//
#include <iostream.h>  // to get definitions for cout, operator<<, etc.

// int main()                     // (1) alternative version of main()
int main(int argc, char *argv[])  // (2) version of main() needed here
                                  // argc-1 = # of command-line args
{
    cout << 34.0 << '\n';   /* output floating point, then char literal */
    cout << "Hello world\n";
    cout << "# of command-line args = " << argc << '\n';    // output variable!
    cout << 'H' << "i there, my " << 0xFF << " friends\n";  // mix and match
    return 0;               // return statement
}
Things
to note:
- main() is our first function, complete with definition of its
arguments (int argc, char* argv[]) and return value (int). There's an
alternative version with the same name which takes no arguments. A program
defines only one of the two, but this is a first taste of function
overloading, where functions different in at least one argument but similar in
meaning can be assigned the same "name".
- the function main() is "the" program. Any other functions you want
executed must be called directly or indirectly from main(). Conversely, main()
cannot be called by other functions (including itself).
- braces {} enclose the body of the program
- Output to the screen (standard output) is performed by "cout << (some
literal)". You can output a series of things in a program line by repeating the
pattern (expressions are output left-to-right, just as the code reads). Don't
forget a '\n' to force newlines.
- The use of the output operator << is our first example of
operator-overloading, where the same operator is used to output: char
literals, string literals, float literals, int variables, .... This should be
familiar to Fortran programmers (e.g. the sqrt() function), though unlike
Fortran, the programmer is allowed to add her own operator-overloading.
- the #include <iostream.h> sucked in the definitions of the operator
<<, which are not built-in to the language. They're just part of a
library, despite the way that the code using the library looks so "built-in".
- I'll postpone discussion of the argument argv, which is of type char*[] (an
array of char*), till later. Just note that running the above code as "prog
foo1 foo2" would yield 3 as the "# of command-line args", since argc counts the
program name as well as the two arguments.
- main() returns a value (an integer). UNIX shell programs usually take a
return value of 0 to signal everything's okay, and non-zero to signal program
failure
- Comments begin at a '//' and end at the end of the same line in the source
file. One can also use the C-style '/*' (to begin a comment) and '*/' (to end
the comment), which has the advantage of being able to span an arbitrary number
of lines. But it's rather easy to prematurely end one C-style comment by
accidentally nesting another, which is not a problem with the C++-style comment
'//', so I recommend the latter.