C string handling
In computer programming, a null-terminated string is a character string stored as an array containing the characters and terminated with a null character ('\0', called NUL in ASCII). The name refers to the C programming language which uses this string representation. Alternative names are C string and ASCIIZ (note that C strings do not imply the use of ASCII).
The length of a C string is found by searching for the (first) NUL byte. This can be slow as it takes O(n) (linear time) with respect to the string length. It also means that a NUL cannot be inside the string, as the only NUL is the one marking the end.
The term string is used in C to describe a contiguous sequence of characters terminated by and including the first null byte.[1] A common misconception is that a string is an array, because string literals are converted to arrays during the compilation (or translation) phase.[2] A string ends at the first NUL byte, so an array or string literal that contains a null byte before its last byte contains a string, or possibly several strings, but is not itself a string.[3]
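The difference between an array and the strings it contains can be sketched in C. The helper count_strings below is hypothetical (it is not a library function) and assumes that the last byte of the region it is given is a null byte:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the C library): counts how many
   null-terminated strings are stored back to back in the first n bytes
   of buf. Assumes buf[n - 1] is a null byte. */
size_t count_strings(const char *buf, size_t n)
{
    size_t count = 0;
    size_t pos = 0;

    while (pos < n) {
        pos += strlen(buf + pos) + 1; /* skip one string and its terminator */
        count++;
    }
    return count;
}
```

For an array declared as char data[] = "abc\0def"; the array occupies 8 bytes, strlen(data) is 3, and count_strings(data, sizeof data) reports two strings.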
The term pointer to a string is used in C to describe a pointer to the initial (lowest addressed) byte of a string.[4] Because strings are passed to functions in C by pointer, documentation (including this page) often says string where the strictly correct term is pointer to a string.
The term length of a string is used in C to describe the number of bytes preceding the null character.[5] strlen is the standard function used to determine the length of a string.
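As an illustration of why finding the length takes linear time, the following is a minimal sketch of what a strlen-like function has to do. The name my_strlen is hypothetical, and real library implementations are usually more heavily optimized:

```c
#include <stddef.h>

/* Illustrative reimplementation of strlen (hypothetical name my_strlen):
   walks the bytes until the first null byte, so the cost grows linearly
   with the length of the string. */
size_t my_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s); /* number of bytes before the terminator */
}
```

Because the scan stops at the first null byte, any bytes stored after it are not counted.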
Function overview
Most of the functions that operate on C strings are declared in the string.h header (cstring in C++). This header contains declarations of functions and types used not only for handling C strings but also for various memory handling functions; the name is thus something of a misnomer.
Functions declared in string.h are widely used, since as a part of the C standard library they are guaranteed to work on any platform which supports C. However, some security issues exist with these functions, such as buffer overflows, leading programmers to prefer safer, possibly less portable variants. Also, the string functions only work with character encodings made of bytes, such as ASCII and UTF-8. In historical documentation the term "character" was often used instead of "byte", which if followed literally would mean that multi-byte encodings such as UTF-8 were not supported. The BSD documentation has been fixed to make this clear, but POSIX, Linux, and Windows documentation still uses "character" in many places. Character encodings made up of code units larger than bytes, such as UTF-16, are generally handled through the functions in wchar.h.
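The buffer overflow risk arises because functions such as strcpy keep writing until they reach the end of the source string, whatever the size of the destination. A minimal sketch of a checked copy follows; the helper copy_checked is hypothetical and is only one of several possible ways to handle a destination that is too small:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if the whole string,
   including its terminating null byte, fits in dstsize bytes.
   Returns 0 on success. Returns -1 if dst is too small, in which case
   dst is set to the empty string when dstsize > 0. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    size_t len = strlen(src);

    if (len >= dstsize) {
        if (dstsize > 0)
            dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1); /* len bytes plus the terminator */
    return 0;
}
```

Reporting failure instead of silently truncating is a design choice made for this sketch; truncating variants are also in common use.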
Constants and types
Name | Notes |
---|---|
NULL | macro expanding to the null pointer constant; that is, a constant representing a pointer value which is guaranteed not to be a valid address of an object in memory. |
size_t | an unsigned integer type which is the type of the result of the sizeof operator. |
Functions
- String manipulation
- - strcpy - copies one string to another
- - strncpy - writes exactly n bytes to a string, copying from src and padding with null bytes if src is shorter
- - strcat - appends one string to another
- - strncat - appends no more than n bytes from one string to another
- - strxfrm - transforms a string according to the current locale
- String examination
- - strlen - returns the length of a string
- - strcmp - compares two strings
- - strncmp - compares a specific number of bytes in two strings
- - strcoll - compares two strings according to the current locale
- - strchr - finds the first occurrence of a byte in a string
- - strrchr - finds the last occurrence of a byte in a string
- - strspn - finds the length of the initial segment consisting only of bytes in a set of bytes
- - strcspn - finds the length of the initial segment consisting only of bytes not in a set of bytes
- - strpbrk - finds the first occurrence of a byte in a set of bytes
- - strstr - finds the first occurrence of a substring
- - strtok - finds the next occurrence of a token
- Miscellaneous
- - strerror - returns a string error message derived from the error code
- Memory manipulation
- - memset - fills a buffer with a repeated byte
- - memcpy - copies one buffer to another
- - memmove - copies one buffer to another, possibly overlapping, buffer
- - memcmp - compares two buffers
- - memchr - finds the first occurrence of a byte in a buffer
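As an example of combining the examination functions, the sketch below counts delimiter-separated tokens with the standard functions strspn and strcspn. The helper count_tokens is hypothetical. Unlike strtok, this approach does not modify the string being scanned:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in s that are separated by any
   byte listed in delims. Uses strspn/strcspn so s is left unmodified. */
size_t count_tokens(const char *s, const char *delims)
{
    size_t count = 0;

    s += strspn(s, delims);      /* skip leading delimiters */
    while (*s != '\0') {
        s += strcspn(s, delims); /* skip the token itself */
        s += strspn(s, delims);  /* skip the delimiters after it */
        count++;
    }
    return count;
}
```

With the delimiter set ",;" the string "a,b;;c" yields three tokens, since runs of adjacent delimiters are treated as a single separator.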
Extensions to ISO C
Name | Notes | Specification |
---|---|---|
void *memccpy(void *dest, const void *src, int c, size_t n); | copies up to n bytes between two memory areas, which must not overlap, stopping when the byte c is found | SVID, POSIX[6] |
void *mempcpy(void *dest, const void *src, size_t n); | variant of memcpy returning a pointer to the byte following the last written byte | GNU |
errno_t strcat_s(char *dest, size_t n, const char *src); | variant of strcat that clears the destination if it is too small | ISO/IEC WDTR 24731, but currently only supported by Microsoft Visual C++. Warning messages produced by Microsoft's compilers suggesting programmers use these functions instead of standard ones have been speculated by some to be a Microsoft attempt to lock developers to its platform.[7][8] |
errno_t strcpy_s(char *dest, size_t n, const char *src); | variant of strcpy that clears the destination if it is too small | same as strcat_s |
char *strdup(const char *src); | allocates and duplicates a string into memory | POSIX; originally a BSD extension |
int strerror_r(int, char *, size_t); | puts the result of strerror() into the provided buffer in a thread-safe way | IEEE Std 1003.1, also known as POSIX 1 |
char *strerror_r(int, char *, size_t); | returns strerror() in a thread-safe way; the provided buffer is used only if necessary (incompatible with the POSIX version) | GNU |
size_t strlcat(char *dest, const char *src, size_t n); | variant of strcat that truncates so the NUL always fits in the destination[9] | Originally OpenBSD, now also FreeBSD, Solaris, Mac OS X. Developed by Todd C. Miller and Theo de Raadt. Very popular, but notably not on Linux due to objections by GNU C Library maintainer Ulrich Drepper.[10] Also criticized for lacking documentation other than source code.[11] |
size_t strlcpy(char *dest, const char *src, size_t n); | variant of strcpy that truncates so the NUL always fits in the destination[9] | same as strlcat |
char *strsignal(int sig); | by analogy to strerror, returns a string representation of the signal sig (not thread safe) | POSIX:2008[12] |
char *strtok_r(char *, const char *delim, char **saveptr); | thread-safe and reentrant version of strtok[13] | POSIX |
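Because strlcpy is not available on every platform, its truncating behaviour can be illustrated with a portable sketch. The function sketch_strlcpy below is a hypothetical stand-in written for this example, not the OpenBSD implementation. It assumes the usual strlcpy contract: at most n - 1 bytes are copied, the result is null-terminated whenever n is greater than 0, and the return value is the length of src:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in illustrating strlcpy-style semantics.
   Copies at most n - 1 bytes of src into dest and null-terminates the
   result when n > 0. Returns strlen(src), so a return value >= n tells
   the caller that the copy was truncated. */
size_t sketch_strlcpy(char *dest, const char *src, size_t n)
{
    size_t srclen = strlen(src);

    if (n > 0) {
        size_t copylen = srclen < n - 1 ? srclen : n - 1;

        memcpy(dest, src, copylen);
        dest[copylen] = '\0';
    }
    return srclen;
}
```

Copying "hello" into a 4-byte buffer with this sketch stores "hel" and returns 5, which the caller can compare against the buffer size to detect truncation.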
Numeric conversions
- atof - C/C++ - converts a string to a floating-point value
- atoi, atol, atoll (C99/C++11) - C/C++ - converts a string to an integer
- strtof, strtod, strtold - C/C++ - converts a string to a floating-point value
- strtol, strtoll - C/C++ - converts a string to a signed integer
- strtoul, strtoull - C/C++ - converts a string to an unsigned integer
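The strtol family reports where parsing stopped and whether the value was out of range, which atoi cannot do. A minimal sketch of strict parsing follows; the helper parse_long is hypothetical, and its policy of rejecting trailing characters is an assumption made for this example:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses a base-10 long from s.
   Returns 1 and stores the value in *out on success. Returns 0 if s
   contains no digits, has trailing characters after the number, or
   holds a value that does not fit in a long. */
int parse_long(const char *s, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE)
        return 0;
    *out = value;
    return 1;
}
```

By contrast, atoi("12abc") returns 12 and gives the caller no indication that part of the input was ignored.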
References
- [1] "The C99 standard draft + TC3" (PDF). Section 7.1.1p1. Retrieved 7 January 2011.
- [2] "The C99 standard draft + TC3" (PDF). Section 6.4.5p7. Retrieved 7 January 2011.
- [3] "The C99 standard draft + TC3" (PDF). Section 6.4.5 footnote 66. Retrieved 7 January 2011.
- [4] "The C99 standard draft + TC3" (PDF). Section 7.1.1p1. Retrieved 7 January 2011.
- [5] "The C99 standard draft + TC3" (PDF). Section 7.1.1p1. Retrieved 7 January 2011.
- [6] http://pubs.opengroup.org/onlinepubs/009695399/functions/memccpy.html
- [7] Danny Kalev. "They're at it again". InformIT.
- [8] "Security Enhanced CRT, Safer Than Standard Library?".
- [9] Todd C. Miller (1999). "strlcpy and strlcat - consistent, safe, string copy and concatenation". USENIX '99.
- [10] libc-alpha mailing list, selected messages from 8 August 2000 thread: 53, 60, 61
- [11] Antill, James. Security with string APIs: Security relevant things to look for in a string library API
- [12] http://pubs.opengroup.org/onlinepubs/9699919799/functions/strsignal.html
- [13] http://pubs.opengroup.org/onlinepubs/009695399/functions/strtok.html