Talk:Boyer–Moore–Horspool algorithm

More Typical Wiki Crap

Well, we have an implentation which does not produce correct results using two compilers. And the original poster does not perform the shift correctly. Do not use this code.

151.196.6.139 (talk)noloader

cross-platform

I'm not native to the C programming language and I don't have time to start sifting through the code in this article to develop a pseudocode representation of this algorithm.

Can someone who knows C better than I please read through and put the code into something cross-platform?

--Oddb411 13:01, 17 September 2006 (UTC)[reply]

C is actually a cross-platform language when written properly. The sample I wrote is meant to be of that kind. I'll see if I can edit it to use less of the kind of syntax that is inherent to C to make it better understood for those who don't have experience of languages of C origin. --Bisqwit 21:00, 17 September 2006 (UTC)[reply]

variable names

Many programmers (Wiki:MeaningfulName) now recommend "Always, always, always use good, unabbreviated, correctly-spelled meaningful names." Is there some more meaningful name we could use instead of "hpos" ?

In the Boyer-Moore algorithm article, the "hpos" variable tracks the position where a copy of the needle might possibly start in the haystack, although we generally look at some later position in the haystack. (What would be a more meaningful name for this?)

In this Boyer–Moore–Horspool algorithm article, the "hpos" variable tracks the actual letter in the haystack that we look at. (What would be a more meaningful name for this?)

(The difference is precisely

   BMH_hpos = BM_hpos + npos

).

I think doing it the same way in both articles would be less confusing. Which way (the current BM way, or the current BMH way) would make clearer, easier-to-understand code?

--68.0.120.35 02:32, 25 March 2007 (UTC)[reply]

Possible miscalculation of comparisons

The article says "For instance a 32 byte needle ending in "z" searching through a 255 byte haystack which doesn't have a 'z' byte in it would take 3 byte comparisons."

I think it meant 7 byte comparisons, since each one skips 32 bytes until there's less that 32 bytes remaining.

Can anybody confirm? —Preceding unsigned comment added by 155.210.218.53 (talk) 18:21, 18 January 2008 (UTC)[reply]

Optimising the table size

I might be wrong, but it looks like space can be saved in the bad character skip table by using a hash of the character instead of its actual value. In the case of a collision, the result will just be a smaller shift.

Not a particularly useful idea when the table's only 256 chars long, but it would allow for much better storage requirements if you were using, say, a 32-bit character set. In a case like that, probably only a small fraction of the character set would be seen, and so chances of collision would be low for a reasonably sized hash. CountingPine (talk) 08:27, 17 July 2009 (UTC)[reply]

KMP

How is this related to KMP? If anything, the other heuristic in Boyer-Moore (which is not in this algorithm) is closely related to KMP's table (e.g. the compute_prefix code in http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm is exactly the pre-processing in KMP). —Preceding unsigned comment added by Raduberinde (talk • contribs) 23:24, 17 August 2010 (UTC)[reply]