User:Atonix/sandbox
Part of a series on |
Probabilistic data structures |
---|
Random trees |
Related |
In computer science, a Flower filter is a space-efficient time-decaying approximate membership filter that is used to approximately determine if an element has recently been added to a set. This data structure is similar in construction to a Bloom filter; however, membership in the set is degraded over time and false negatives and false positives are possible. A Flower Filter remembers newer data with a higher probability and older data with a lower probability, with a low memory overhead. A common application of this data structure is to track large streams of recent user events and behavior.
Algorithm description
[edit]
s= array size n= number of hash functions
There are two sets of hash functions used. The fingerprint hash function, F, reduces the input element (of arbitrary size) to b bits uniformly. Then, there is a set of n position hash functions, H, that each map uniformly to indexes in the storage array.
An empty Flower filter is an array of size s, with all entries set to 0.
To add an element, it is fed to the fingerprint hash function, F, which reduces the element to b bits. Then, QQQ hash functions are randomly chosen from H. The element is fed to each of these QQQ hash functions to get QQQ array positions. Set all of these positions to F(n).
To query for an element, compute the fingerprint of the element, q = F(element). Compute positions for all hash functions in H, and check if the value in the storage array is equal to q. If more than ZZZ positions are equal to q, then assume the element is a member of the storage array.
Notice that over time, fingerprints overwrite each other, which causes membership to be degraded over time.
Probability of false positives
[edit]Comparison to Bloom filters
[edit]Other similar data structures
[edit]Implementation
[edit]Example
[edit]Often, it is valuable to collect a large event stream to improve user experience, and later determine if an event has occured recently. For example, if a user has previously performed a certain action, a web application may change to accomidate the user if it can determine that the action has recently been performed. Unfortunately, with a large web application servicing millions of users, this event stream may be impractical to index and search. A Flower filter can be used to quickly determine if a specific event has likely occured recently.