Sparse binary polynomial hashing

From Wikipedia, the free encyclopedia

Sparse binary polynomial hashing (SBPH) is a generalization of Bayesian spam filtering that can match mutating phrases as well as single words.

SBPH automatically generates a large number of features from an incoming text, then uses statistics to assign each feature a weight according to its predictive value for classifying the text as spam or non-spam.
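
The feature-generation step can be illustrated with a minimal sketch. It rests on several assumptions that the text above does not state: tokens are split on whitespace, a sliding window of five tokens is used, every feature keeps the newest token in the window, and each earlier token is either kept or replaced by a placeholder. The names `sbph_features`, `SKIP` and `window` are hypothetical, and the hashing of each feature to an integer (suggested by the technique's name) is omitted for readability.

```python
from itertools import product

SKIP = "<skip>"  # hypothetical placeholder marking a dropped token


def sbph_features(text, window=5):
    """Return SBPH-style phrase features for `text` (illustrative sketch).

    Assumptions: whitespace tokenization, a sliding window of `window`
    tokens, and features kept as readable strings rather than hashes.
    """
    tokens = text.split()
    features = []
    for i, newest in enumerate(tokens):
        # Tokens that precede the newest one inside the current window.
        earlier = tokens[max(0, i - window + 1):i]
        # Each earlier token is either kept or skipped, so a full window
        # yields 2 ** (window - 1) features, all ending in the newest token.
        for mask in product((False, True), repeat=len(earlier)):
            parts = [tok if keep else SKIP for tok, keep in zip(earlier, mask)]
            parts.append(newest)
            features.append(" ".join(parts))
    return features


if __name__ == "__main__":
    for feature in sbph_features("buy cheap pills", window=3):
        print(feature)
```

Because skipped positions are preserved, a feature such as `buy <skip> pills` matches a phrase whose middle word has been changed, which is one way to read the "mutating phrases" mentioned above. A statistical classifier would then estimate a weight for each feature from labeled spam and non-spam text.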

  • A paper on the subject as it relates to spam (some article text comes from this document, which is under the GFDL)
  • Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification. No Starch Press. 2005. p. 108. ISBN 978-1-59327-052-0.