
Naming convention (programming)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Eelis.net (talk | contribs) at 23:31, 22 May 2005. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer programming, a naming convention is a set of rules for choosing identifiers.

Naming conventions are commonly used for various purposes:

  • to embed information about the entity, such as its type and intended use, in the identifier (see Hungarian notation);
  • to work around restrictions imposed by the language on what an identifier may look like (see CamelCase notation);
  • to ensure clarity (for example, by disallowing overly long names or abbreviations).

Programmers often hold strong opinions about naming conventions, and the choice of convention can become highly controversial, with partisans of each holding theirs to be the best and the others inferior.

Multiple-word identifiers

As most programming languages do not allow spaces in identifiers, some system must be devised when a programmer wishes to use a name containing multiple words. There are several in widespread use; each has a significant following, though sometimes one dominates amongst users of a particular programming language. There are also some programmers who eschew multiple-word names entirely, and so use none of these systems (see the section below on the amount of information in identifiers).

One approach is to replace spaces with another character. The two characters commonly used for this purpose are the hyphen ('-') and the underscore ('_'), so the two-word name "two words" would be represented as two-words or two_words. The hyphen is arguably the easier to type and the more readable of the two, and it is used by nearly all programmers of Lisp, Scheme, and other languages that permit hyphens in identifiers. Many other languages, however, reserve the hyphen for use as the subtraction operator and so do not permit it in identifiers; some programmers of these languages use underscores instead. Underscores are somewhat harder to type because of their location on most keyboards, so this solution has not been universally adopted, although it is in fairly widespread use among programmers of C, Perl, and many scripting languages.
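The effect of reserving the hyphen can be seen in a minimal sketch. Python is used here only as one example of a language that treats the hyphen as subtraction; the helper name classify is hypothetical and chosen for illustration.

```python
import ast

def classify(text):
    """Report how Python's parser reads a piece of source text."""
    node = ast.parse(text, mode="eval").body
    if isinstance(node, ast.Name):
        return "identifier"
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Sub):
        return "subtraction"
    return "other"

# The underscore form is a single identifier.
print(classify("two_words"))
# The hyphen form is parsed as the expression: two - words.
print(classify("two-words"))
# str.isidentifier() gives the same verdict without parsing.
print("two_words".isidentifier(), "two-words".isidentifier())
```

In a language such as Lisp or Scheme, the hyphenated form would instead be read as a single symbol.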

Another approach, developed mostly as an alternative to the underscore in languages that do not permit hyphens, is to omit the space and indicate word boundaries using capitalization, thus rendering "two words" as either twoWords or TwoWords. This is called CamelCase, among other names.

There are several methods of writing multi-word identifier names in computer languages that tokenize on whitespace. For example, a variable called "my favorite variable" could be written as:

  • myFavoriteVariable (lower camel case)
  • MyFavoriteVariable (upper camel case)
  • my_favorite_variable (underscored)
  • my-favorite-variable (dashed)
  • MY_FAVORITE_VARIABLE (all caps, underscored)
  • myfvvbl (heavily abbreviated)
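The mechanical styles in the list above can be produced from the same phrase with a few small functions. This is only a sketch: the function names are hypothetical, and the abbreviated form is left out because it follows no fixed rule.

```python
def split_words(phrase):
    """Split a space-separated phrase into lowercase words."""
    return phrase.lower().split()

def lower_camel(phrase):
    words = split_words(phrase)
    if not words:
        return ""
    # First word stays lowercase; later words get an initial capital.
    return words[0] + "".join(w.capitalize() for w in words[1:])

def upper_camel(phrase):
    return "".join(w.capitalize() for w in split_words(phrase))

def underscored(phrase):
    return "_".join(split_words(phrase))

def dashed(phrase):
    return "-".join(split_words(phrase))

def all_caps(phrase):
    return underscored(phrase).upper()

name = "my favorite variable"
for style in (lower_camel, upper_camel, underscored, dashed, all_caps):
    print(style.__name__, "->", style(name))
```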

Information in identifiers

There is significant disagreement over how much information to put in identifiers. This was driven initially by technical constraints, as some early programming languages limited the length of identifiers. Thus in the standard C library (C was initially one of those languages), one finds atoi as the name of a function that converts ASCII strings to integers. In Lisp, one would be more likely to encounter the same function named ascii-to-integer or similar. The use of shorter identifiers has outlived those technical restrictions, partly as heritage (it remains more common in the languages that once had the restrictions) and partly for ease of use: shorter identifiers are simply easier to type, especially when an identifier is used frequently. Those who prefer longer identifiers argue that the difficulty of typing them is outweighed by the ease of reading code that is descriptive rather than peppered with impenetrable acronyms and abbreviations.

In addition to the question of how long a descriptive identifier should be, there are several systems for encoding specific technical aspects of an identifier in its name. Perhaps the best known is Hungarian notation, which encodes the type of a variable in its name. Several more minor conventions are widespread; one example is the convention of naming variables in C and C++ with an initial lowercase letter, and naming user-defined datatypes with an initial capital letter.
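A rough sketch of how such conventions can be read back out of a name follows. The prefix table is an illustrative assumption (Hungarian dialects differ between projects), and all function names are hypothetical.

```python
# Illustrative prefix table only; real projects define their own prefixes.
PREFIX_MEANINGS = {
    "n": "integer",
    "sz": "zero-terminated string",
    "b": "boolean",
}

def split_hungarian(name):
    """Split a Hungarian-style name into (prefix, base name).

    The prefix is taken to be everything before the first capital letter.
    """
    for i, ch in enumerate(name):
        if ch.isupper():
            return name[:i], name[i:]
    return "", name  # no capital letter, so no prefix is assumed

def describe_name(name):
    prefix, base = split_hungarian(name)
    meaning = PREFIX_MEANINGS.get(prefix, "unknown")
    return f"{base}: {meaning}"

def looks_like_type_name(name):
    """Initial capital suggests a user-defined type under the C/C++ convention."""
    return name[:1].isupper()

print(describe_name("nCount"))
print(describe_name("szName"))
print(looks_like_type_name("Widget"), looks_like_type_name("widget"))
```

The check is purely lexical: nothing in the language enforces that a variable named nCount actually holds an integer, which is one reason such conventions remain a matter of discipline rather than correctness.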