
Undefined behavior

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Helios2k6 (talk | contribs) at 19:15, 26 January 2016 (Copy-editing introductory sentence to be clearer and more concise.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer programming, undefined behavior (UB) is the result of executing computer code whose behavior is not prescribed by the language specification to which the code adheres. This happens when the translator of the source code makes certain assumptions, but these assumptions are not satisfied during execution.

The behavior of some programming languages - most famously C and C++ - is undefined in some cases.[1] In the standards for these languages, the semantics of certain operations is undefined. An implementation is allowed to assume that such operations never occur in standard-conforming program code; the implementation will be considered correct whatever it does in such cases, analogous to don't-care terms in digital logic. This assumption can make various program transformations valid or simplify their proof of correctness, giving flexibility to the implementation. As a result, the compiler can often make more optimizations. It is the responsibility of the programmer to write code that never invokes undefined behavior, although compiler implementations are allowed to issue diagnostics when this happens.

Undefined behavior is often unpredictable and a frequent cause of software bugs. In the C community, undefined behavior may be humorously referred to as "nasal demons", after a comp.std.c post that explained undefined behavior as allowing the compiler to do anything it chooses, even "to make demons fly out of your nose".[2] Under some circumstances there can be specific restrictions on undefined behavior. For example, the instruction set specifications of a CPU might leave the behavior of some forms of an instruction undefined, but if the CPU supports memory protection then the specification will probably include a blanket rule stating that no user-accessible instruction may cause a hole in the operating system's security; so an actual CPU would be permitted to corrupt user registers in response to such an instruction, but would not be allowed to, for example, switch into supervisor mode.

Benefits from undefined behavior

Documenting an operation as undefined behavior allows compilers to assume that this operation will never happen in a conforming program. This gives the compiler more information about the code and this information can lead to more optimization opportunities.

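int foo(unsigned char x)
{
     int value = 2147483600;
     value += x;
     if (value < 2147483600)
        bar();
     return value;
}

The parameter is an unsigned char, so it is promoted to int and the addition is performed in signed arithmetic. The value of x cannot be negative and, given that signed integer overflow is undefined behavior in C, the compiler can assume that value >= 2147483600 when the if check is reached (the example assumes a 32-bit int, so that the sum could otherwise exceed INT_MAX). Thus the if and the call to the function bar can be removed by the compiler, since the test has no side effects and its condition will never be satisfied. The code above is therefore semantically equivalent to:

int foo(unsigned char x)
{
     int value = 2147483600;
     value += x;
     return value;
}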

Such optimizations become hard to spot by humans when the code is more complex and other optimizations, like inlining, take place.

Examples in C and C++

In C the use of any automatic variable before it has been initialized yields undefined behavior, as does division by zero or indexing an array outside of its defined bounds (see buffer overflow). In general, any instance of undefined behavior leaves the abstract execution machine in an unknown state, and any subsequent behavior is also undefined. Because the compiler is not required to diagnose undefined behavior, programs that invoke it may compile and run, producing correct results, incorrect results, or any other behavior. As a consequence, undefined behavior can create errors that are difficult to detect.

Attempting to modify a string literal causes undefined behavior:[3]

char *p = "wikipedia"; // ill-formed C++11, deprecated C++98/C++03
p[0] = 'W'; // undefined behavior

One way to avoid this is to define p as an array instead of a pointer, so that the literal initializes modifiable storage:

char p[] = "wikipedia"; // RIGHT
p[0] = 'W';

In C++, one can use a standard string as follows:

std::string s = "wikipedia"; // RIGHT
s[0] = 'W';

Integer division by zero results in undefined behavior:[4]

int x = 1;
return x / 0; // undefined behavior

Certain pointer operations may result in undefined behavior:[5]

int arr[4] = {0, 1, 2, 3};
int *p = arr + 5;  // undefined behavior

Reaching the end of a value-returning function (other than main()) without a return statement results in undefined behavior if the value of the function call is used by the caller:[6]

int f()
{
}  /* undefined behavior if the value of the function call is used*/

The first edition of The C Programming Language cites the following examples of code which “can (and does) produce different results on different machines”[7] (which could be considered merely unspecified or implementation-defined behavior in today's terms)[citation needed]:

printf("%d %d\n", ++n, power(2, n));    /* WRONG */
a[i] = i++;

The later ANSI C standard chose to leave similar constructions undefined, e.g. “This paragraph renders undefined statement expressions such as i = ++i + 1; while allowing i = i + 1;”.[8]

Risks of undefined behavior

HTML versions 4 and earlier left error handling undefined. Over time, pages started relying on unspecified error recovery implemented in popular browsers. This caused difficulties for vendors of less popular browsers, who were forced to reverse-engineer and implement bug-compatible error recovery. The result was a de facto standard that was much more complicated than it would have been had this behavior been specified from the start.[citation needed]

Undefined behavior in server programs can cause security issues. When GCC's developers changed their compiler in 2008 such that it omitted certain overflow checks that relied on undefined behavior, CERT issued a warning against the newer versions of the compiler.[9] Linux Weekly News pointed out that the same behavior was observed in PathScale C, Microsoft Visual C++ 2005 and several other compilers;[10] the warning was later amended to warn about various compilers.[11]


References

  1. Lattner, Chris (May 13, 2011). "What Every C Programmer Should Know About Undefined Behavior". LLVM Project Blog. LLVM.org. Retrieved May 24, 2011.
  2. "nasal demons". The Jargon File. Retrieved 12 June 2014.
  3. ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §2.13.4 String literals [lex.string] para. 2
  4. ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §5.6 Multiplicative operators [expr.mul] para. 4
  5. ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages - C++ §5.7 Additive operators [expr.add] para. 5
  6. ISO/IEC (2007). ISO/IEC 9899:2007(E): Programming Languages - C §6.9 External definitions para. 1
  7. Kernighan, Brian W.; Ritchie, Dennis M. (February 1978). The C Programming Language (1st ed.). Englewood Cliffs, NJ: Prentice Hall. p. 50. ISBN 0-13-110163-3.
  8. ANSI X3.159-1989 Programming Language C, footnote 26
  9. "Vulnerability Note VU#162289 - gcc silently discards some wraparound checks". Vulnerability Notes Database. CERT. 4 April 2008. Archived from the original on 9 April 2014.
  10. Jonathan Corbet (16 April 2008). "GCC and pointer overflows". Linux Weekly News.
  11. "Vulnerability Note VU#162289 - C compilers may silently discard some wraparound checks". Vulnerability Notes Database. CERT. 8 October 2008 [4 April 2008].
