Identifiers

An identifier is a sequence of characters that represents an entity such as a function or a data object.

Syntax

identifier ::= nondigit
            identifier nondigit
            identifier digit
            identifier dollar-sign
 
 
nondigit ::= any character from the set:
            _ a b c d e f g h i j k l m n o p
            q r s t u v w x y z A B C D E F G
            H I J K L M N O P Q R S T U V W X
            Y Z
 
digit ::= any character from the set:
         0 1 2 3 4 5 6 7 8 9
 
dollar-sign ::= the $ character

Description

An identifier must start with a nondigit character (a letter or an underscore). That character can be followed by any sequence of digits and nondigit characters. Internal and external names may have up to 255 significant characters.

Identifiers are case sensitive. The compiler considers upper- and lowercase characters to be different. For example, the identifier CAT is different from the identifier cat. This is true for external as well as internal names.

An HP extension to the language in compatibility mode allows $ as a valid character in an identifier as long as it is not the first character.

The following are examples of legal and illegal identifiers:

Legal Identifiers

Sub_Total
X
aBc
Else
do_123

Illegal Identifiers

     3xyz           First character is a digit.
     const          Conflicts with a reserved word.
     #note          First character is not alphabetic or _.
     Num'2          Contains an illegal character.

All identifiers that begin with the underscore (_) character are reserved for system use. If you define identifiers that begin with an underscore, the compiler may interpret them as internal system names. The resulting behavior is undefined.

Finally, identifiers cannot have the same spelling as reserved words. For example, int cannot be used as an identifier because it is a reserved word, but INT is a valid identifier because it differs in case.

Identifier Scope

The scope of an identifier is the region of the program in which the identifier has meaning. There are four kinds of scope:

File Scope — Identifiers declared outside of any block or list of parameters have scope from their declaration point until the end of the translation unit.
Function Prototype Scope — If the identifier is part of the parameter list in a function declaration, then it is visible only inside the function declarator. This scope ends with the function prototype.
Block Scope — Identifiers declared inside a block or in the list of parameter declarations in a function definition have scope from their declaration point until the end of the associated block.
Function Scope — Statement labels have scope over the entire function in which they are defined. Labels cannot be referenced outside of the function in which they are defined. Labels do not follow the block scope rules. In particular, goto statements can reference labels that are defined inside iteration statements. Label names must be unique within a function.

A preprocessor macro is visible from the #define directive that declares it until either the end of the translation unit or an #undef directive that undefines the macro.

Identifier Linkage

An identifier is bound to a physical object by the context of its use. The same identifier can be bound to several different objects at different places in the same program. This apparent ambiguity is resolved through the use of scope and name spaces. The term name spaces refers to various categories of identifiers in C (for more information, see “Name Spaces”).

Similarly, an identifier declared in different scopes or in the same scope more than once can be made to refer to the same object or function by a process called linkage. There are three kinds of linkage:

Internal — within a single translation unit, each instance of an identifier with internal linkage denotes the same object or function.
External — within all the translation units and libraries that constitute an entire program, each instance of a particular identifier with external linkage denotes the same object or function.
None — identifiers with no linkage denote unique entities.

If an identifier is declared at file scope using the storage-class specifier static, it has internal linkage.

If an identifier is declared using the storage-class specifier extern, it has the same linkage as any visible declaration of the identifier with file scope. If there is no visible declaration with file scope, the identifier has external linkage.

If the declaration of an identifier for a function has no storage-class specifier, its linkage is determined exactly as if it were declared with the storage-class specifier extern. If the declaration of an identifier for an object has file scope and no storage-class specifier, its linkage is external.

The following identifiers have no linkage: an identifier declared to be anything other than an object or a function; an identifier declared to be a function parameter; and a block scope identifier for an object declared without the storage-class specifier extern.

For example:

        extern int i;           /* External linkage */
        static float f;         /* Internal linkage */
        struct Q { int z; };    /* Q and z both have no linkage */

        static int func(void)   /* Internal linkage */
        {
           extern int temp;     /* External linkage */
           static char c;       /* No linkage */
           int j;               /* No linkage */
           extern float f;      /* Internal linkage; refers to */
                                /* float f at file scope */
           return 0;
        }

Two identifiers that have the same scope and share the same name space cannot be spelled the same way. Two identifiers that are not in the same scope or same name space can have the same spelling and will bind to two different physical objects. For example, a formal parameter to a function may have the same name as a structure tag in the same function. This is because the two identifiers are not in the same name space.

If one identifier is defined in a block and another is defined in a nested (subordinate) block, both can have the same spelling.

For example:

     {
        int i;       <-A
          .
          .          <-B
          .
        {
           float i;  <-C
              .      <-D
              .
              .
        }            <-E
        .
        .            <-F
        .
     }               <-G

In the example above, the identifier i is bound to two physically different objects. One object is an integer and the other is a floating-point number. Both objects, in this case, have block scope. At location A, identifier i is declared. Its scope continues until the end of the block in which it is defined (point G). References to i at location B refer to an integer object.

At point C, a second identifier named i is declared, this time as a float. It hides the previous declaration of i until the end of the block in which the new i is declared, so references to i at point D refer to a floating-point object. At the end of the inner block (point E), the scope of the floating-point i ends. The previous declaration of i becomes visible again, and references to i at point F refer to an int.

Storage Duration

Identifiers that represent variables have a real existence at run time, unlike identifiers that represent abstractions like typedef names or structure tags. The duration of an object's existence is the period of time in which the object has storage allocated for it. There are two different durations for C objects:

Static — An object whose identifier is declared with external or internal linkage, or with the storage-class specifier static, has static storage duration. Objects with static storage duration have storage allocated to them when the program begins execution. The storage remains allocated until the program terminates.
Automatic — An object whose identifier is declared with no linkage, and without the storage-class specifier static, has automatic storage duration. Storage for such an object is allocated on entry to the block in which it is declared and released when execution of that block ends; for objects declared at the outermost level of a function body, this means entry to and exit from the function. If you do not explicitly initialize such an object, its contents when allocated are indeterminate. Further, if a block that declares an initialized automatic object is entered by a jump to a label inside the block rather than through the top of the block, the object is not initialized.

Name Spaces

In any given scope, you can use an identifier for only one purpose. An exception to this rule is caused by separate name spaces. Different name spaces allow the same identifier to be overloaded within the same scope. This is to say that, in some cases, the compiler can determine from the context of use which identifier is being referred to. For example, an identifier can be both a variable name and a structure tag.

Four different name spaces are used in C:

Labels — The definition of a label is always followed by a colon (:). A label is referenced only as the target of a goto statement. Labels, therefore, can have the same spelling as any nonlabel identifier.
Tags — Tags are part of structure, union, and enumeration declarations. All tags for these constructs share the same name space (even though a preceding struct, union or enum keyword could clarify their use). Tags can have the same spelling as any non-tag identifier.
Members — Each structure or union has its own name space for members. Two different structures can have members with exactly the same names. Members are therefore tightly bound to their defining structure. For example, a pointer to structure of type A cannot reference members from a structure of type B. (You may use unions or a cast to accomplish this.)
Other names — All other names are in the same name space, including variables, functions, typedef names, and enumeration constants.

Conceptually, the macro prepass occurs before the compilation of the translation unit. As a result, macro names are independent from all other names. Use of macro names as ordinary identifiers can cause unwanted substitutions.

Types

The type of an identifier defines how the identifier can be used. The type defines a set of values and operations that can be performed on these values. There are three major categories of types in C — object type, function type, and incomplete type.

Object Type
There are three object types: scalar, aggregate, and union. These are further subdivided (see Figure 2-1 “C Types”).
1. Scalar — Scalar types are those whose objects the computer can manipulate directly. They include pointers, numeric objects, and enumeration types.
  1. Pointer — These types include pointers to objects and functions.
  2. Arithmetic — These types include floating and integral types.
    Floating: The floating types include the following:
      float — A 32-bit floating-point number.
      double — A 64-bit double-precision floating-point number.
      long double — A 128-bit quad-precision floating-point number.
    Integral: The integral types include all of the integer types that the computer supports: type char, the signed and unsigned integer types, and the enumerated types.
      char — An object of type char is large enough to store an ASCII character. Internally, a char is represented as a signed integer.
      Integer — Integers can be short, int, long, or long long. They are signed by default but can be made unsigned by using the keyword unsigned with the type. In C, a computation involving unsigned operands can never overflow; high-order bits that do not fit in the result are simply discarded without warning, so the result wraps around modulo 2^N for an N-bit type. A short int is a 16-bit integer, int and long int are 32-bit integers, and a long long int is a 64-bit integer. The integer types also include signed char and unsigned char (but not "plain" char).
      Enumerated — Enumerated types are explicitly listed by the programmer; they name specified integer constant values. For example, an enumerated type color might define the constants red, blue, and green, and an object of type enum color could then have the value red, blue, or green. As an HP C extension, you can override the default allocation of four bytes for enumerated variables by specifying a type in the declaration. For example, a short enum is two bytes long and a char enum is one byte long.
2. Aggregate — Aggregate types are types that are composed of other types. With some restrictions, aggregate types can be composed of members of all of the other types including (recursively) aggregate types. Aggregate types include:
  1. Structures — Structures are collections of heterogeneous objects. They are similar to Pascal records and are useful for defining special-purpose data types.
  2. Arrays — Arrays are collections of homogeneous objects. C arrays can be multidimensional with conceptually no limit on the number of dimensions.
3. Unions — Unions, like structures, can hold different types of objects. However, all members of a union are "overlaid"; that is, they begin at the same location in memory. This means that the union can contain only one of the objects at any given time. Unions are useful for manipulating a variety of data within the same memory location.
Function Type
A function type specifies the type of the object that a function returns. A function that returns an object of type T can be referred to as a "function returning T", or simply, a T function.
Incomplete Type
The void type is an incomplete type. It comprises an empty set of values, so no object can be declared with type void; instead, void is used as the return type of a function or as the referenced type of a pointer. A function that returns void is a function that returns nothing. A pointer to void establishes a generic pointer.

Figure 2-1 illustrates the C types.

Figure 2-1 C Types