Jump to content

Schlemiel the Painter's algorithm

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Wookie2u (talk | contribs) at 05:31, 11 May 2008 (Supplied a "stub" article for Schlemiel the painters algorithm... which is (IMHO) one of the fifty-five million things that every software engineer should be on the lookout for, no excuses.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
(diff) ← Previous revision | Latest revision (diff) | Newer revision → (diff)

Introduction

In computer science "Schlemiel The Painter's Algorithm" is a name for any algorithm which repeatedly reprocesses content. This reprocessing is extremely inefficient. The problem most commonly appears in string and buffer processing code.

So who was Schlemiel?

So who was Schlemiel The Painter? Well, Schlemiel got a job painting the white line down the middle of the road. On his first day Schlemiel painted miles of white line, and his boss rejoiced. On his second day, Schlemiel was "a bit too slow". By the end of the third day Schlemiel was going nowhere. On the morning of Schlemiel's fourth day the boss called Schlemiel into the office for some "counselling".

The boss said "Schlemiel, I think you are good worker. You are bright and you work hard. So why is that the longer you work the less you get done?". "Well" Schlemiel repied "yesterday afternoon it took me two hours to run from my paint tin to the end of the line, and two hours to run back again to replenish my brush, so I knocked off and went to the pub." The boss thought for a few seconds, and managed to say (quit calmly) "Ah Schlemiel, you need to take your paint with you."

The Problem

Repeated calls to ANSI-C's strcat function is a classic example of Schlemiel's algorithm. An implementation of strcat might look something like this:

char *strcat(char *dest, const char *src) {
  // remember the start of the destination string
  char *startOfDest = dest;
  // make dest point at the null terminator at the end of the destination string
  while(*dest) {                                         // <-- EXPENSIVE
    dest++; 
  }
  // append the source string to the destination string
  while(*src) {
    dest++ = src++;
  }
  // null terminate the destination string
  dest = '\0';
  // return the start of the destination string
  return startOfDest;
}

The strcat function looks innocious enough, and it is, as long as you don't call it __repeatedly__ to build a large string.

So lets examine an example of how __not__ to use strcat:

char *testPerformanceOfStrcat() {
  // produces a 1Meg string of "word, word, word, ....";
  final int BUFFER_SIZE = 1024*1024;
  char word[] = "word"; 
  char buffer[BUFFER_SIZE] = "";
  while( strlen(buffer) < BUFFER_SIZE-strlen(word) ) {   // <-- LUDICROUSLY EXPENSIVE
    if(*buffer) strcat(buffer, ", ");                    // <-- LUDICROUSLY EXPENSIVE
    strcat(buffer, word);                                // <-- LUDICROUSLY EXPENSIVE
  }
}

The total cost of testPerformanceOfStrcat is:

  • NOTE: I am ignoring "ancillary expenses" like the cost of making a function call.
  • The number of times through the loop is int(1024*1024/6) = 174,762 // strlen("word, ")==6
  • The cost of strlen(buffer) in the while condition is fib(174,762) = 15,270,965,703
  • The cost of finding the end of the buffer (again) in strcat(buffer, ", ") is 15,270,965,703 (again).
  • The cost of finding the end of the buffer (again) in strcat(buffer, word) is 15,270,965,703 (again).
  • So the TOTAL COST is 15,270,965,710 * 3 + = 45,812,897,130 or about 45 billion clock ticks.
  • 45 billion cycles at 2 billion cycles per second (my CPU is 2.01GHz) is about 22 seconds.
  • Ouch !!!!! !!!!! !!!!! !!!!! !!

The Solution

Yep, Schlemiel would be well advised to "take your paint with you"... So how can we apply that principle to strcat? Easy, we just have to remember the end of the string, to avoid the cost of finding the end of the string every time we append to it. First lets write another version of strcat called szcat which returns the-new-end-of-the-destination-string (nb: the mnemonic sz denotes a null terminated string).

char *szcat(char *dest, const char *src) {
  // make dest point at the null terminator at the end of the destination string
  while(*dest) {                                         // <-- EXPENSIVE, SO AVOID RELYING IT (REPEATEDLY).
    dest++; 
  }
  // append the source string to the destination string
  while(*src) {
    dest++ = src++;
  }
  // null terminate the destination string
  dest = '\0';
  // return a pointer to the null terminator at the end of the destination string
  return dest;
}

Our new szcat function is basically the same, it just returns the-new-end-of-the-destination-string instead of the-start-of-the-destination-string... So what? Well so the next time we call szcat we can pass the-end-of-the-destination-string, so szcat doesn't have to waste time looping through the destination string in order to find it. Pretty clever huh!

Noew lets write another test harness, and then we'll examine szcat's (hopefully improved) performance characteristics.

char *testPerformanceOfSzcat() {
  // produces a 1Meg string of "word, word, word, ....";
  final int BUFFER_SIZE = 1024*1024;
  char word[] = "word"; 
  char buffer[BUFFER_SIZE] = "";
  char *last = buffer+BUFFER_SIZE-strlen(word)-strlen(", "); // <-- Do this once!
  char *endOfBuffer = szcat(buffer, word);                   // <-- Do this once!
  while( endOfBuffer < last ) {                              // <-- 15 billion times cheaper
    endOfBuffer = strcat(endOfBuffer, word);                 // <-- 30 billion times cheaper
  }
}

The total cost of testPerformanceOfStrcat is:

  • The number of times through the loop is still int(1024*1024/6) = 174,762 // strlen("word, ")==6
  • We've eliminated the overhead of finding the-end-of-the-destination-string in the while condition.... so we're down to one operation per loop = 174,762
  • We've eliminated the overhead of finding the-end-of-the-destination-string in strcat(endOfBuffer, word), which leaves is with just the cost of copying the string, which is 6 cycles per loop = 174,762 * 6 = 1,048,572
  • about 1.2 billion cycles at 2 billion cycles per second (my CPU speed is 2.01GHz) is about a tenth of a second.
  • Much better !

Conclusion

"Schlemiel The Painter" algorithms are a common cause of slow unscalable software implementations. "Take your paint with you" is up there in the panthion of things every budding computer scientist needs to know, along with [[KISS_principle|the KISS principle], [guts of character encoding], and Sunscreen.