Schlemiel the Painter's algorithm
In software development, a Schlemiel the Painter's algorithm (or Schlemiel the Painter algorithm) is a methodology that is inefficient because the programmer has overlooked some fundamental issues at the very lowest levels of software design. The term was coined in 2001 by software engineer and essayist Joel Spolsky.
The algorithm should not be confused with the Painter's algorithm of image compositing; the two are unrelated.
Spolsky's analogy
Spolsky used a Yiddish joke to illustrate a certain poor programming practice. In the joke, Schlemiel (also rendered Shlemiel) has a job painting the dotted lines down the middle of a road. Each day, Schlemiel paints less than he painted the day before. When he is asked why, Schlemiel complains that it is because each day he gets farther away from the paint can.[1]
The inefficiency to which Spolsky was drawing an analogy was the poor programming practice of repeated concatenation of C-style null-terminated character arrays (that is, strings) in which the position of the destination string has to be recomputed from the beginning of the string each time because it is not carried over from a previous concatenation.
Spolsky condemned such inefficiencies as typical of programmers who had not been taught basic programming techniques before they began programming in higher-level languages: "Generations of graduates are descending on us and creating Schlemiel The Painter algorithms right and left and they don't even realize it, since they fundamentally have no idea that strings are, at a very deep level, difficult."[1]
Spolsky's essays have been cited as examples of good writing by programmers "about their insular world in a way that wins the respect of their colleagues and the attention of outsiders."[2]
Spolsky's example
The programming practice that Spolsky used to make his point was repeated concatenation of null-terminated character arrays ("strings").[1]
The first step in any implementation of the standard C library function for concatenating strings, strcat(), is to find the end of the string being appended to by scanning the array from the beginning until the terminating null character is reached. The second string is then copied to that position, which concatenates the two. When the function returns to the calling code, the position of the end of the combined string is discarded.
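The two steps described above can be sketched as follows. This is a minimal illustration, not the code of any particular C library: the name sketch_strcat is hypothetical, and real implementations are usually more heavily optimized, although they still have to locate the terminating null character first.

```c
/* Hypothetical illustration of how a strcat()-style function works.
   Real library implementations are typically more optimized. */
char *sketch_strcat(char *dest, const char *src)
{
    char *end = dest;

    /* Step 1: walk from the start of dest to its terminating null. */
    while (*end != '\0')
        end++;

    /* Step 2: copy src, including its terminating null, to that position. */
    while ((*end = *src) != '\0') {
        end++;
        src++;
    }

    /* The end of the combined string is known here (end), but only the
       start of the buffer is returned, so the caller never learns it. */
    return dest;
}
```

Like the standard function, the sketch returns the start of the destination buffer, so the end position found in step 1 is lost as soon as the call returns.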
In Spolsky's example, the "Schlemiels" occur when multiple strings are being concatenated together:
strcat( buffer, "John" );    /* Here, the string "John" is appended to the buffer */
strcat( buffer, "Paul" );    /* Now the string "Paul" is appended to that */
strcat( buffer, "George" );  /* ... and the string "George" is appended to that */
strcat( buffer, "Ringo" );   /* ... and the string "Ringo" is appended to that */
After "Paul" has been appended to "John", the length of "JohnPaul" (or, more precisely, the position of the terminating null character) is known within the scope of strcat() but is discarded when the function returns, before "George" is appended. When strcat() is then told to append "George" to "JohnPaul", it starts at the very first character of the array (which is 'J') all over again just to find the terminating null character. Each subsequent call to strcat() has to compute the length again before concatenating another name to the buffer.
Just as Schlemiel does not carry the paint can with him, the code does not carry the string's length forward, so every subsequent strcat() call has to "walk" the length of the string again to determine where the second string should be copied. As more data is added to buffer, the terminating null character gets farther from the beginning with each call to strcat(), so more checks must be performed to find it and each call is slower than the last, just as Schlemiel's path to his paint can keeps getting longer.
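The growing cost can be made concrete with a small sketch. The helper names below (append_at, build_with_strcat, build_with_end_pointer) are hypothetical and chosen only for illustration. The first builder uses strcat() and tallies how many characters each call must skip over; the second carries a pointer to the end of the string between calls, which is one common way to avoid the rescanning. The caller is assumed to supply a buffer of at least 20 bytes.

```c
#include <stddef.h>
#include <string.h>

static const char *const NAMES[] = { "John", "Paul", "George", "Ringo" };
#define NAME_COUNT (sizeof NAMES / sizeof NAMES[0])

/* Hypothetical helper: copies src to the position `end`, which must point
   at the destination's terminating null, and returns the new end. */
char *append_at(char *end, const char *src)
{
    while ((*end = *src) != '\0') {
        end++;
        src++;
    }
    return end;
}

/* Concatenates the names with strcat() and returns the total number of
   characters that had to be skipped to find the end across all calls.
   The extra strlen() exists only to count that work for the illustration. */
size_t build_with_strcat(char *buffer)
{
    size_t rescanned = 0;

    buffer[0] = '\0';
    for (size_t i = 0; i < NAME_COUNT; i++) {
        rescanned += strlen(buffer);  /* distance strcat() walks first */
        strcat(buffer, NAMES[i]);
    }
    return rescanned;
}

/* Concatenates the names while carrying the end position forward, so no
   call has to walk over characters that were already written. */
char *build_with_end_pointer(char *buffer)
{
    char *end = buffer;

    *end = '\0';
    for (size_t i = 0; i < NAME_COUNT; i++)
        end = append_at(end, NAMES[i]);
    return end;
}
```

For the four names in the example, the strcat() version skips 0 + 4 + 8 + 14 = 26 characters in total, and that figure grows roughly with the square of the final length as more strings are appended. The end-pointer version writes each of the 19 characters once and never rescans.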
The problems illustrated by Spolsky's example can go unnoticed by a programmer who uses a high-level language and has little or no knowledge of its underlying principles and functions. In Spolsky's words: "Some of the biggest mistakes people make even at the highest architectural levels come from having a weak or broken understanding of a few simple things at the very lowest levels."[1]
References
1. Spolsky, Joel (December 11, 2001), "Back to Basics", Joel on Software, joelonsoftware.com.
2. Rosenberg, Scott (December 9, 2004), "The Shlemiel way of software", salon.com.