C syntax

The syntax of the C programming language is the set of rules governing the writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

C syntax makes use of the maximal munch principle.
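
As a minimal sketch of maximal munch: the tokenizer always takes the longest run of characters that forms a valid token, so +++ is read as ++ followed by +. The helper name sum_then_increment is made up for this illustration.

```c
/* Hypothetical helper: adds *a and b, then increments *a.
   "+++" is tokenized as "++" then "+", so the expression
   parses as ((*a)++) + b, never as (*a) + (++b). */
int sum_then_increment(int *a, int b)
{
    return (*a)+++b;
}
```

With a holding 1, the call sum_then_increment(&a, 10) returns 11 and leaves a equal to 2.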

Operators

Control structures

C is a free-form language.

Bracing style varies from programmer to programmer and can be the subject of debate. See Indentation style for more details.

Compound statements

In the items in this section, any <statement> can be replaced with a compound statement. Compound statements have the form:

{
    <optional-declaration-list>
    <optional-statement-list>
}

and are used as the body of a function or anywhere that a single statement is expected. The declaration-list declares variables to be used in that scope, and the statement-list contains the actions to be performed. Braces define their own scope, and variables with automatic storage duration declared inside them cease to exist at the closing brace. Since C99, declarations and statements can be freely intermixed within a compound statement (as in C++).
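
The scope rule can be sketched as follows. The function scope_demo and its variable names are invented for this example, which also places a declaration after a statement, something that requires C99 or later.

```c
/* Hypothetical helper showing block scope. */
int scope_demo(void)
{
    int total = 1;
    {
        int inner = 41;       /* exists only inside these braces */
        total += inner;
    }
    /* "inner" is out of scope here; referring to it would not compile */
    int doubled = total * 2;  /* C99: a declaration after a statement */
    return doubled;
}
```

Here scope_demo() returns 84: the inner block adds 41 to total before inner goes out of scope.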

Selection statements

C has two types of selection statements: the if statement and the switch statement.

The if statement is in the form:

if (<expression>)
    <statement1>
else
    <statement2>

In the if statement, if the <expression> in parentheses is nonzero (true), control passes to <statement1>. If the else clause is present and the <expression> is zero (false), control will pass to <statement2>. The else <statement2> part is optional and, if absent, a false <expression> will simply result in skipping over the <statement1>. An else always matches the nearest previous unmatched if; braces may be used to override this when necessary, or for clarity.
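
The matching rule for else can be illustrated with a short sketch. Both function names are made up, and each returns a small code instead of printing. Some compilers warn about the first function, which is exactly the ambiguity being shown.

```c
/* Hypothetical helper: the else pairs with the nearest unmatched if. */
int nearest_if(int a, int b)
{
    int result = 0;
    if (a)
        if (b)
            result = 1;
        else            /* belongs to "if (b)", whatever the indentation */
            result = 2;
    return result;
}

/* Hypothetical helper: braces attach the else to the outer if instead. */
int with_braces(int a, int b)
{
    int result = 0;
    if (a) {
        if (b)
            result = 1;
    } else {
        result = 2;
    }
    return result;
}
```

For a = 0 and b = 1, nearest_if returns 0 because the whole inner if-else is skipped, while with_braces returns 2.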

The switch statement causes control to be transferred to one of several statements depending on the value of an expression, which must have integer type. The substatement controlled by a switch is typically compound. Any statement within the substatement may be labeled with one or more case labels, which consist of the keyword case followed by a constant expression and then a colon (:). The syntax is as follows:

switch (<expression>)
{
    case <label1> :
        <statements 1>
    case <label2> :
        <statements 2>
        break;
    default :
        <statements 3>
}

No two of the case constants associated with the same switch may have the same value. There may be at most one default label associated with a switch. If none of the case labels are equal to the expression in the parentheses following switch, control passes to the default label or, if there is no default label, execution resumes just beyond the entire construct.

Switches may be nested; a case or default label is associated with the innermost switch that contains it. Switch statements can "fall through", that is, when one case section has completed its execution, statements will continue to be executed downward until a break; statement is encountered. Fall-through is useful in some circumstances, but is usually not desired. In the preceding example, if <label2> is reached, the statements <statements 2> are executed and nothing more inside the braces. However, if <label1> is reached, both <statements 1> and <statements 2> are executed since there is no break to separate the two case statements.
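
A concrete version of the fall-through behaviour, as a minimal sketch; the function name and the values it returns are chosen only for illustration.

```c
/* Hypothetical helper mirroring the layout of the syntax example:
   case 1 has no break, so it continues into case 2. */
int fall_through_demo(int n)
{
    int result = 0;
    switch (n) {
    case 1:
        result += 1;
        /* falls through */
    case 2:
        result += 10;
        break;
    default:
        result = -1;
    }
    return result;
}
```

An argument of 1 yields 11 (both sections run), 2 yields 10, and any other value yields -1.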

It is possible, although unusual, to insert the switch labels into the sub-blocks of other control structures. Examples of this include Duff's device and Simon Tatham's implementation of coroutines in PuTTY.^[1]
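
The following sketch shows the idea behind Duff's device in a memory-to-memory copying variant. The function name duff_copy and its parameter names are assumptions made for this example, not part of any library.

```c
#include <stddef.h>

/* Copies count ints from src to dst, eight per pass of the do-while
   loop. The switch jumps into the middle of the loop body so that the
   count % 8 leftover elements are handled during the first pass. */
void duff_copy(int *dst, const int *src, size_t count)
{
    if (count == 0)
        return;                 /* the loop body always runs at least once */

    size_t passes = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

The case labels sit inside the do-while body, which is legal because case labels are ordinary statement labels within the enclosing switch.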

Iteration statements

C has three forms of iteration statement:

do
    <statement>
while ( <expression> ) ;

while ( <expression> )
    <statement>

for ( <expression> ; <expression> ; <expression> )
    <statement>

In the while and do statements, the sub-statement is executed repeatedly so long as the value of the expression remains non-zero (equivalent to true). With while, the test, including all side effects from <expression>, occurs before each iteration (execution of <statement>); with do, the test occurs after each iteration. Thus, a do statement always executes its sub-statement at least once, whereas while may not execute the sub-statement at all.
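
The difference in when the test happens can be seen in a small sketch. Both helper names are invented here, and each simply counts how many times its loop body runs.

```c
/* Hypothetical helper: the condition is tested before each iteration. */
int body_runs_while(int n)
{
    int runs = 0;
    while (n > 0) {
        runs++;
        n--;
    }
    return runs;
}

/* Hypothetical helper: the body runs once before the first test. */
int body_runs_do(int n)
{
    int runs = 0;
    do {
        runs++;
        n--;
    } while (n > 0);
    return runs;
}
```

For n = 3 both functions return 3, but for n = 0 body_runs_while returns 0 while body_runs_do returns 1.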

The statement:

for (e1; e2; e3)
    s;

is equivalent to:

e1;
while (e2)
{
    s;
cont:
    e3;
}

except for the behaviour of a continue; statement (which in the for loop jumps to e3 instead of e2). If e2 is blank, it would have to be replaced with a 1.

Any of the three expressions in the for loop may be omitted. A missing second expression makes the while test always non-zero, creating a potentially infinite loop.
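
The different treatment of continue can be sketched with two hypothetical helpers that compute the same sum of odd numbers below a limit. In the for version, continue still runs the third expression. In the while version, the increment has to be repeated by hand, or the loop would never advance.

```c
/* Hypothetical helper: continue jumps to i++ and then re-tests i < limit. */
int sum_odd_for(int limit)
{
    int sum = 0;
    int i;
    for (i = 0; i < limit; i++) {
        if (i % 2 == 0)
            continue;
        sum += i;
    }
    return sum;
}

/* Hypothetical helper: continue jumps straight to the test, so the
   increment must be performed before it. */
int sum_odd_while(int limit)
{
    int sum = 0;
    int i = 0;
    while (i < limit) {
        if (i % 2 == 0) {
            i++;
            continue;
        }
        sum += i;
        i++;
    }
    return sum;
}
```

Both functions return 25 for a limit of 10 (1 + 3 + 5 + 7 + 9).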

Since C99, the first expression may take the form of a declaration, typically including an initializer, such as:

for (int i = 0; i < limit; ++i) {
    // ...
}

The declaration's scope is limited to the extent of the for loop.

Jump statements

Jump statements transfer control unconditionally. There are four types of jump statements in C: goto, continue, break, and return.

The goto statement looks like this:

goto <identifier> ;

The identifier must be a label (followed by a colon) located in the current function. Control transfers to the labeled statement.

A continue statement may appear only within an iteration statement and causes control to pass to the loop-continuation portion of the innermost enclosing iteration statement. That is, within each of the statements

while (expression)
{
    /* ... */
    cont: ;
}

do
{
    /* ... */
    cont: ;
} while (expression);

for (expr1; expr2; expr3) {
     /* ... */
     cont: ;
}

a continue not contained within a nested iteration statement is the same as goto cont.

The break statement is used to end a for loop, while loop, do loop, or switch statement. Control passes to the statement following the terminated statement.
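
A typical use of break is to leave a search loop as soon as a match is found. The function find_first below is a made-up example, not a library routine.

```c
/* Hypothetical helper: returns the index of the first element equal to
   target, or -1 if there is none. */
int find_first(const int *values, int length, int target)
{
    int found = -1;
    for (int i = 0; i < length; i++) {
        if (values[i] == target) {
            found = i;
            break;      /* control passes to the return statement below */
        }
    }
    return found;
}
```

For the array {4, 8, 15, 16}, searching for 15 returns 2, and searching for 23 returns -1.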

A function returns to its caller by the return statement. When return is followed by an expression, the value is returned to the caller as the value of the function. Encountering the end of the function is equivalent to a return with no expression. In that case, if the function is declared as returning a value and the caller tries to use the returned value, the result is undefined.

Storing the address of a label

GCC extends the C language with a unary && operator that returns the address of a label. This address can be stored in a variable of type void * and may be used later in a goto statement. For example, the following prints "hi " in an infinite loop:

    void *ptr = &&J1;

J1: printf("hi ");
    goto *ptr;

This feature can be used to implement a jump table.
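
A jump table of this kind might look like the sketch below. It relies on the same GCC extension (Clang also accepts it) and is not standard C. The three opcodes are invented for this example: 0 halts, 1 adds one, 2 doubles. The sketch assumes the program contains only these opcodes and ends with 0.

```c
/* Hypothetical interpreter: dispatches on each opcode through a table
   of label addresses instead of a switch statement. */
int run_program(const unsigned char *program)
{
    static void *table[] = { &&op_halt, &&op_add_one, &&op_double };
    int acc = 0;

    goto *table[*program++];    /* dispatch on the first opcode */

op_add_one:
    acc += 1;
    goto *table[*program++];

op_double:
    acc *= 2;
    goto *table[*program++];

op_halt:
    return acc;
}
```

Running the opcode sequence 1, 1, 2, 0 computes (0 + 1 + 1) * 2 and returns 4.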

Functions

Syntax

A C function definition consists of a return type (void if no value is returned), a unique name, a list of parameters in parentheses, and various statements:

<return-type> functionName( <parameter-list> )
{
    <statements>
    return <expression of type return-type>;
}

A function with non-void return type should include at least one return statement. The parameters are given by the <parameter-list>, a comma-separated list of parameter declarations, each item in the list being a data type followed by an identifier: <data-type> <variable-identifier>, <data-type> <variable-identifier>, ....

The return type cannot be an array type or function type.

int f()[3];    // Error: function returning an array
int (*g())[3]; // OK: function returning a pointer to an array.

void h()();    // Error: function returning a function
void (*k())(); // OK: function returning a function pointer

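If there are no parameters, the <parameter-list> may be left empty or specified with the single word void. Before C23, an empty list in a declaration that is not a definition leaves the parameters unspecified, so void is the unambiguous way to declare that a function takes no arguments.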

It is possible to define a function as taking a variable number of parameters by providing an ellipsis (...) as the last parameter instead of a data type and variable identifier. A commonly used function that does this is the standard library function printf, which has the declaration:

int printf (const char*, ...);

Manipulation of these parameters can be done by using the routines in the standard library header <stdarg.h>.
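
As a minimal sketch of the <stdarg.h> routines, the made-up function sum_ints below takes a count followed by that many int arguments. The caller is responsible for making the count and the argument types match, since the compiler cannot check them.

```c
#include <stdarg.h>

/* Hypothetical helper: returns the sum of `count` int arguments. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);              /* begin after the last named parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);     /* fetch the next argument as an int */
    va_end(args);

    return total;
}
```

For example, sum_ints(3, 1, 2, 3) returns 6 and sum_ints(0) returns 0.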

Function pointers

A pointer to a function can be declared as follows:

<return-type> (*<pointer-name>)(<parameter-list>);

The following program shows use of a function pointer for selecting between addition and subtraction:

#include <stdio.h>

int (*operation)(int x, int y);

int add(int x, int y)
{
    return x + y;
}

int subtract(int x, int y)
{
    return x - y;
}

int main(int argc, char* args[])
{
   int  foo = 1, bar = 1;

   operation = add;
   printf("%d + %d = %d\n", foo, bar, operation(foo, bar));
   operation = subtract;
   printf("%d - %d = %d\n", foo, bar, operation(foo, bar));
   return 0;
}

Global structure

After preprocessing, at the highest level a C program consists of a sequence of declarations at file scope. These may be partitioned into several separate source files, which may be compiled separately; the resulting object modules are then linked along with implementation-provided run-time support modules to produce an executable image.

The declarations introduce functions, variables and types. C functions are akin to the subroutines of Fortran or the procedures of Pascal.

A definition is a special type of declaration. A variable definition sets aside storage and possibly initializes it; a function definition provides its body.

An implementation of C providing all of the standard library functions is called a hosted implementation. Programs written for hosted implementations are required to define a special function called main, which is the first function called when a program begins executing.

Hosted implementations start program execution by invoking the main function, which must be defined following one of these prototypes (using different parameter names or spelling the types differently is allowed):

int main() {...}
int main(void) {...}
int main(int argc, char *argv[]) {...}
int main(int argc, char **argv) {...} // char *argv[] and char **argv have the same type as function parameters

The first two definitions are equivalent (and both are compatible with C++). Which one is used is largely a matter of individual preference (the current C standard contains two examples of main() and two of main(void), but the draft C++ standard uses main()). The return value of main (which should be int) serves as the termination status returned to the host environment.

The C standard defines return values 0 and EXIT_SUCCESS as indicating success and EXIT_FAILURE as indicating failure (EXIT_SUCCESS and EXIT_FAILURE are defined in <stdlib.h>). Other return values have implementation-defined meanings; for example, under Linux, shells conventionally report a program killed by a signal with a status equal to the numerical value of the signal plus 128.

A minimal correct C program consists of an empty main routine, taking no arguments and doing nothing:

int main(void){}

Because no return statement is present, main returns 0 on exit.^[2] (This is a special-case feature introduced in C99 that applies only to main.)

The main function will usually call other functions to help it perform its job.

Some implementations are not hosted, usually because they are not intended to be used with an operating system. Such implementations are called free-standing in the C standard. A free-standing implementation is free to specify how it handles program startup; in particular it need not require a program to define a main function.

Functions may be written by the programmer or provided by existing libraries. Interfaces for the latter are usually declared by including header files—with the #include preprocessing directive—and the library objects are linked into the final executable image. Certain library functions, such as printf, are defined by the C standard; these are referred to as the standard library functions.

A function may return a value to its caller (usually another C function, or the hosting environment for the function main). The printf function mentioned above returns how many characters were printed, but this value is often ignored.

Argument passing

In C, arguments are passed to functions by value while other languages may pass variables by reference. This means that the receiving function gets copies of the values and has no direct way of altering the original variables. For a function to alter a variable passed from another function, the caller must pass its address (a pointer to it), which can then be dereferenced in the receiving function. See Pointers for more information.

void incInt(int *y)
{
    (*y)++;  // Increase the value of 'x', in 'main' below, by one
}

int main(void)
{
    int x = 0;
    incInt(&x);  // pass the address of 'x' (a pointer to it)
    return 0;
}

The function scanf works the same way:

int x;
scanf("%d", &x);

To let a function modify a pointer that belongs to the caller (for example, to return an allocated array to the calling code), the caller must pass a pointer to that pointer, that is, its address.

#include <stdio.h>
#include <stdlib.h>

void allocate_array(int ** const a_p, const int A) {
    /*
     * Allocate an array of A ints.
     * Assigning to *a_p alters the 'a' in main().
     */
    *a_p = malloc(sizeof(int) * A);
}

int main(void) {
    int * a; /* a pointer to one or more ints; this will refer to the array */

    /* pass the address of 'a' */
    allocate_array(&a, 42);

    /* 'a' now points to an array of 42 ints (or is NULL if the allocation
       failed) and can be used and freed here */

    free(a);
    return 0;
}

The parameter int **a_p is a pointer to a pointer to an int; in this case it holds the address of the pointer a defined in the main function.

Array parameters

Function parameters of array type may at first glance appear to be an exception to C's pass-by-value rule. The following program will print 2, not 1:

#include <stdio.h>

void setArray(int array[], int index, int value)
{
    array[index] = value;
}

int main(void)
{
    int a[1] = {1};
    setArray(a, 0, 2);
    printf ("a[0]=%d\n", a[0]);
    return 0;
}

However, there is a different reason for this behavior. In fact, a function parameter declared with an array type is treated like one declared to be a pointer. That is, the preceding declaration of setArray is equivalent to the following:

void setArray(int *array, int index, int value)

At the same time, C rules for the use of arrays in expressions cause the value of a in the call to setArray to be converted to a pointer to the first element of array a. Thus, in fact this is still an example of pass-by-value, with the caveat that it is the address of the first element of the array being passed by value, not the contents of the array.

Since C99, the programmer can specify that a function takes an array of at least a certain size by using the keyword static. In void setArray(int array[static 4], int index, int value) the first parameter must be a pointer to the first element of an array of length at least 4. It is also possible to add qualifiers (const, volatile and restrict) to the pointer type that the array is converted to by putting them between the brackets.
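
Both forms can be sketched as follows. The function names sum_first_four and fill are made up for this example, which requires a C99 or later compiler (the syntax is not accepted in C++).

```c
/* Hypothetical helper: callers must pass a pointer to at least 4
   readable ints; passing fewer is undefined behavior. */
int sum_first_four(const int values[static 4])
{
    return values[0] + values[1] + values[2] + values[3];
}

/* Hypothetical helper: "int array[const]" is adjusted to
   "int *const array", so the elements may be modified but the
   pointer itself may not be reassigned. */
void fill(int array[const], int length, int value)
{
    for (int i = 0; i < length; i++)
        array[i] = value;
}
```

Given int a[4] = {1, 2, 3, 4}, sum_first_four(a) returns 10; after fill(a, 4, 7) it returns 28.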

Anonymous functions

Anonymous functions are not supported by the standard C programming language, but they are supported by some C dialects, such as those of GCC^[3] and Clang.

GCC

The GNU Compiler Collection (GCC) supports anonymous functions by combining nested functions and statement expressions. An anonymous function has the form:

( { return_type anonymous_functions_name (parameters) { function_body } anonymous_functions_name; } )

The following example works only with GCC. Because of how macros are expanded, the l_body cannot contain any commas outside of parentheses; the preprocessor treats the comma as a delimiter between macro arguments. The argument l_ret_type can be removed if __typeof__ is available; in the example below using __typeof__ on array would return testtype *, which can be dereferenced for the actual value if needed.

#include <stdio.h>

/* this is the definition of the anonymous function */
#define lambda(l_ret_type, l_arguments, l_body)        \
  ({                                                   \
   l_ret_type l_anonymous_functions_name l_arguments   \
   l_body                                              \
   &l_anonymous_functions_name;                        \
   })

#define forEachInArray(fe_arrType, fe_arr, fe_fn_body)                                    \
{                                                                                         \
  int i=0;                                                                                \
  for(;i<sizeof(fe_arr)/sizeof(fe_arrType);i++) {  fe_arr[i] = fe_fn_body(&fe_arr[i]); }  \
}

typedef struct
{
  int a;
  int b;
} testtype;

void printout(const testtype * array)
{
  int i;
  for ( i = 0; i < 3; ++ i )
    printf("%d %d\n", array[i].a, array[i].b);
  printf("\n");
}

int main(void)
{
  testtype array[] = { {0,1}, {2,3}, {4,5} };

  printout(array);
  /* the anonymous function is given as function for the foreach */
  forEachInArray(testtype, array,
    lambda (testtype, (void *item),
    {
      int temp = (*( testtype *) item).a;
      (*( testtype *) item).a = (*( testtype *) item).b;
      (*( testtype *) item).b = temp;
      return (*( testtype *) item);
    }));
  printout(array);
  return 0;
}

Clang (C, C++, Objective-C, Objective-C++)

Clang supports anonymous functions, called blocks,^[4] which have the form:

^return_type ( parameters ) { function_body }

The type of the blocks above is return_type (^)(parameters).

Using the aforementioned blocks extension and Grand Central Dispatch (libdispatch), the code could look simpler:

#include <stdio.h>
#include <dispatch/dispatch.h>

int main(void) {
  void (^count_loop)() = ^{
    for (int i = 0; i < 100; i++)
      printf("%d\n", i);
    printf("ah ah ah\n");
  };

/* Pass as a parameter to another function */
  dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), count_loop);

/* Invoke directly */
  count_loop();

  return 0;
}

Code that uses blocks should be compiled with -fblocks and linked with -lBlocksRuntime.

Miscellaneous

Reserved keywords

The following words are reserved, and may not be used as identifiers:

_Alignas
_Alignof
_Atomic
auto
_Bool
break
case
char
_Complex
const
continue
default
do
double
else
enum
extern
float
for
_Generic
goto
if
_Imaginary
inline
int
long
_Noreturn
register
restrict
return
short
signed
sizeof
static
_Static_assert
struct
switch
_Thread_local
typedef
union
unsigned
void
volatile
while

Implementations may reserve other keywords, such as asm, although implementations typically provide non-standard keywords that begin with one or two underscores.

Case sensitivity

C identifiers are case sensitive (e.g., foo, FOO, and Foo are the names of different objects). Some linkers may map external identifiers to a single case, although this is uncommon in most modern linkers.

Comments

Text starting with the token /* is treated as a comment and ignored. The comment ends at the next */; it can occur within expressions, and can span multiple lines. Accidental omission of the comment terminator is problematic in that the next comment's properly constructed comment terminator will be used to terminate the initial comment, and all code in between the comments will be considered as a comment. C-style comments do not nest; that is, accidentally placing a comment within a comment has unintended results:

/*
This line will be ignored.
/*
A compiler warning may be produced here. These lines will also be ignored.
The comment opening token above did not start a new comment,
and the comment closing token below will close the comment begun on line 1.
*/
This line and the line below it will not be ignored. Both will likely produce compile errors.
*/

C++ style line comments start with // and extend to the end of the line. This style of comment originated in BCPL and became valid C syntax in C99; it is not available in the original K&R C nor in ANSI C:

// this line will be ignored by the compiler

/* these lines
   will be ignored
   by the compiler */

x = *p/*q;  /* this comment starts after the 'p' */

Command-line arguments

The parameters given on a command line are passed to a C program through two parameters of main: the count of the command-line arguments in argc and the individual arguments as character strings in the pointer array argv. So the command:

myFilt p1 p2 p3

results in something like:

m  y  F  i  l  t  \0 p  1  \0 p  2  \0 p  3  \0
argv[0]              argv[1]  argv[2]  argv[3]

While individual strings are arrays of contiguous characters, there is no guarantee that the strings are stored as a contiguous group.

The name of the program, argv[0], may be useful when printing diagnostic messages or for making one binary serve multiple purposes. The individual values of the parameters may be accessed with argv[1], argv[2], and argv[3], as shown in the following program:

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("argc\t= %d\n", argc);
    for (int i = 0; i < argc; i++)
        printf("argv[%i]\t= %s\n", i, argv[i]);
}

Evaluation order

In any reasonably complex expression, there arises a choice as to the order in which to evaluate the parts of the expression: (1+1)+(3+3) may be evaluated in the order (1+1)+(3+3), (2)+(3+3), (2)+(6), (8), or in the order (1+1)+(3+3), (1+1)+(6), (2)+(6), (8). Formally, a conforming C compiler may evaluate expressions in any order between sequence points (this allows the compiler to do some optimization). Sequence points are defined by:

Statement ends at semicolons.
The sequencing operator: a comma. However, commas that delimit function arguments are not sequence points.
The short-circuit operators: logical and (&&, which can be read and then) and logical or (||, which can be read or else).
The ternary operator (?:): This operator evaluates its first sub-expression first, and then its second or third (never both of them) based on the value of the first.
Entry to and exit from a function call (but not between evaluations of the arguments).

Expressions before a sequence point are always evaluated before those after a sequence point. In the case of short-circuit evaluation, the second expression may not be evaluated depending on the result of the first expression. For example, in the expression (a() || b()), if the first argument evaluates to nonzero (true), the result of the entire expression cannot be anything else than true, so b() is not evaluated. Similarly, in the expression (a() && b()), if the first argument evaluates to zero (false), the result of the entire expression cannot be anything else than false, so b() is not evaluated.
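
Short-circuit evaluation is often used to guard an operation that would otherwise be unsafe. The sketch below uses two made-up helpers: probe counts its own calls so that a skipped evaluation becomes visible, and is_positive relies on && to avoid dereferencing a null pointer.

```c
#include <stddef.h>

/* Hypothetical call counter, incremented by probe(). */
int probe_calls = 0;

/* Hypothetical helper: records that it was called and returns value. */
int probe(int value)
{
    probe_calls++;
    return value;
}

/* Returns nonzero if p is non-null and points to a positive value.
   *p is read only after p != NULL has been found to be true. */
int is_positive(const int *p)
{
    return p != NULL && *p > 0;
}
```

Evaluating probe(1) || probe(1) or probe(0) && probe(1) increases probe_calls by only one, because the right-hand operand is skipped in both cases; is_positive(NULL) safely returns 0.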

The arguments to a function call may be evaluated in any order, as long as they are all evaluated by the time the function is entered. The following expression, for example, has undefined behavior:

 printf("%s %s\n", argv[i = 0], argv[++i]);

Undefined behavior

An aspect of the C standard (not unique to C) is that the behavior of certain code is said to be "undefined". In practice, this means that the program produced from this code can do anything, from working as the programmer intended, to crashing every time it is run.

For example, the following code produces undefined behavior, because the variable b is modified more than once with no intervening sequence point:

#include <stdio.h>

int main(void)
{
    int b = 1;
    int a = b++ + b++;
    printf("%d\n", a);
}

Because there is no sequence point between the modifications of b in "b++ + b++", it is possible to perform the evaluation steps in more than one order, resulting in an ambiguous statement. This can be fixed by rewriting the code to insert a sequence point in order to enforce an unambiguous behavior, for example:

a = b++;
a += b++;

Notes

References

[1] Tatham, Simon (2000). "Coroutines in C". Retrieved 2017-04-30.
[2] Klemens, Ben (2012). 21st Century C. O'Reilly Media. ISBN 978-1449327149.
[3] "Statement Exprs (Using the GNU Compiler Collection (GCC))". gcc.gnu.org. Retrieved 2022-01-12.
[4] "Language Specification for Blocks — Clang 13 documentation". clang.llvm.org. Retrieved 2022-01-14.

General

Kernighan, Brian W.; Ritchie, Dennis M. (1988). The C Programming Language (2nd ed.). Upper Saddle River, New Jersey: Prentice Hall PTR. ISBN 0-13-110370-9.
American National Standard for Information Systems - Programming Language - C - ANSI X3.159-1989
"ISO/IEC 9899:2018 - Information technology -- Programming languages -- C". International Organization for Standardization.
"ISO/IEC 9899:1999 - Programming languages - C". Iso.org. 2011-12-08. Retrieved 2014-04-08.
