
Judy array

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Jason Quinn at 21:32, 20 January 2015 (Terminology: removed close paraphrasing copyright violation and tried to reword and used previously uncited source as reference for definitions).

In computer science and software engineering, a Judy array is a data structure that implements an associative array with high performance and low memory usage. Unlike normal arrays, Judy arrays may be sparse; that is, they may have large ranges of unassigned indices. They can be used to store and look up values using integer or string keys. The key benefits of using a Judy array are its scalability, high performance, memory efficiency and ease of use.[1]

Judy arrays are both speed- and memory-efficient,[clarification needed] with no tuning or configuration required. Therefore they can sometimes replace common in-memory dictionary implementations (such as red-black trees or hash tables) and work better with very large data sets.[dubious – discuss][citation needed]

Roughly speaking, Judy arrays are highly optimised 256-ary radix trees.[2] Judy arrays use over 20 different compression techniques on trie nodes to reduce memory usage.
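To make the "256-ary radix tree" description concrete, here is a minimal C sketch of an uncompressed radix tree for 32-bit integer keys. Each level consumes one byte of the key, most significant byte first, so any lookup visits at most four nodes. This is not the Judy implementation. The type `Node` and the functions `trie_insert`, `trie_get` and `trie_free` are hypothetical names chosen for illustration, and real Judy arrays replace these fixed 256-slot nodes with the compressed node types mentioned above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, uncompressed 256-ary radix tree for 32-bit keys.
 * Not the Judy implementation: real Judy nodes are compressed.
 * Each level consumes one byte of the key, most significant byte first. */
#define LEVELS 4

typedef struct Node {
    void *slot[256]; /* child Node* on levels 0-2, stored value on level 3 */
} Node;

/* Byte of `key` used at `level` (0 = most significant byte). */
static unsigned key_byte(uint32_t key, int level) {
    return (key >> (8 * (LEVELS - 1 - level))) & 0xFFu;
}

/* Store a non-NULL `value` under `key`. Returns 0 on allocation failure. */
static int trie_insert(Node **root, uint32_t key, void *value) {
    if (*root == NULL && (*root = calloc(1, sizeof(Node))) == NULL)
        return 0;
    Node *n = *root;
    for (int level = 0; level < LEVELS - 1; level++) {
        unsigned b = key_byte(key, level);
        /* create the child node for this byte on first use */
        if (n->slot[b] == NULL && (n->slot[b] = calloc(1, sizeof(Node))) == NULL)
            return 0;
        n = n->slot[b]; /* descend one level */
    }
    n->slot[key_byte(key, LEVELS - 1)] = value;
    return 1;
}

/* Return the value stored under `key`, or NULL if absent (at most 4 node visits). */
static void *trie_get(const Node *root, uint32_t key) {
    const Node *n = root;
    for (int level = 0; n != NULL && level < LEVELS - 1; level++)
        n = n->slot[key_byte(key, level)];
    return n != NULL ? n->slot[key_byte(key, LEVELS - 1)] : NULL;
}

/* Free the subtree rooted at `n`, which sits at depth `level`. */
static void trie_free(Node *n, int level) {
    if (n == NULL)
        return;
    if (level < LEVELS - 1)
        for (int i = 0; i < 256; i++)
            trie_free(n->slot[i], level + 1);
    free(n);
}
```

A fixed 256-slot node costs 256 pointers even when it holds a single key. This per-node overhead is what Judy's node compression is meant to avoid.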

The Judy array was invented by Douglas Baskins and named after his sister.[3]

Terminology

  1. Expanse is a range of possible keys, e.g. 200...300.[4]
  2. Population is the number of keys actually present within an expanse, e.g. a population of 5 could be the keys 200, 360, 400, 512, and 720.[4]
  3. Density equals Population/Expanse and is a measure of how sparse the keys are.[4]
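As a worked example, assume the expanse is taken to be the inclusive range 200...720 spanned by the five example keys, so that it contains 720 - 200 + 1 = 521 possible keys. The density is then:

```latex
\text{Density} = \frac{\text{Population}}{\text{Expanse}}
               = \frac{5}{720 - 200 + 1}
               = \frac{5}{521}
               \approx 0.0096
```

A value this close to zero indicates a very sparse set of keys.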

Benefits

Memory allocation

Judy arrays are dynamic and can grow or shrink as elements are added to, or removed from, the array. The maximum size of a Judy array is bounded by machine memory.[5] The memory used by Judy arrays is nearly proportional to the number of elements (population) in the Judy array.
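Growing and shrinking on demand can be illustrated with a minimal C sketch of a hypothetical two-level 256-ary trie for 16-bit keys. `Trie16`, `t16_set`, `t16_get` and `t16_delete` are illustrative names, not part of any Judy API. Leaf nodes are allocated when the first key in their range is inserted and freed when the last one is removed, so allocated memory follows the keys actually present. Unlike Judy, the sketch does not compress nodes, so a leaf holding a single key still costs 256 slots.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical two-level 256-ary trie for 16-bit keys (not Judy's API).
 * The high byte selects a leaf; the low byte selects a slot in it.
 * Leaves are allocated on first use and freed when they become empty. */
typedef struct Leaf {
    int used;          /* number of non-NULL slots in this leaf */
    void *value[256];
} Leaf;

typedef struct {
    Leaf *leaf[256];   /* indexed by the high byte of the key */
    size_t leaves;     /* leaf nodes currently allocated */
} Trie16;

/* Insert or overwrite; `value` must be non-NULL. Returns 0 on allocation failure. */
static int t16_set(Trie16 *t, uint16_t key, void *value) {
    Leaf **lp = &t->leaf[key >> 8];
    if (*lp == NULL) {
        if ((*lp = calloc(1, sizeof(Leaf))) == NULL)
            return 0;
        t->leaves++;   /* memory grows only when a new range is touched */
    }
    void **slot = &(*lp)->value[key & 0xFF];
    if (*slot == NULL)
        (*lp)->used++;
    *slot = value;
    return 1;
}

/* Look up `key`; returns NULL if it is absent. */
static void *t16_get(const Trie16 *t, uint16_t key) {
    const Leaf *l = t->leaf[key >> 8];
    return l != NULL ? l->value[key & 0xFF] : NULL;
}

/* Remove `key` if present, freeing its leaf once the leaf is empty. */
static void t16_delete(Trie16 *t, uint16_t key) {
    Leaf **lp = &t->leaf[key >> 8];
    if (*lp == NULL || (*lp)->value[key & 0xFF] == NULL)
        return;
    (*lp)->value[key & 0xFF] = NULL;
    if (--(*lp)->used == 0) {
        free(*lp);     /* memory shrinks as keys are removed */
        *lp = NULL;
        t->leaves--;
    }
}
```

For example, starting from `Trie16 t = {0};` and inserting keys 0x0102, 0x0103 and 0xFF00 leaves `t.leaves` at 2; deleting all three brings it back to 0.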

Speed

Judy arrays are designed to keep the number of processor cache-line fills as low as possible, and the implementation is internally complex as a result of pursuing this goal. Because of these cache optimizations, Judy arrays are fast, sometimes even faster than a hash table, especially for very large datasets. Even though Judy arrays are a type of trie, a structure often associated with high memory overhead, they consume much less memory than hash tables. Also, because a Judy array is a trie, its keys can be traversed sequentially in sorted order, which hash tables do not support.
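Ordered traversal follows directly from the trie layout: visiting child slots in index order at every level produces keys in ascending numeric order. The minimal C sketch below assumes a hypothetical two-level 256-ary key set for 16-bit keys (`KeySet16`, `ks_add`, `ks_ordered` and `ks_free` are illustrative names) and skips empty subtrees entirely. A hash table, by contrast, places keys by hash value, so iterating over it does not yield sorted order without a separate sort.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical two-level 256-ary key set for 16-bit keys.
 * The high byte selects a child; the low byte selects a presence flag.
 * Scanning children and flags in index order yields ascending keys. */
typedef struct {
    unsigned char *present[256]; /* NULL, or an array of 256 presence flags */
} KeySet16;

/* Add `key` to the set. Returns 0 on allocation failure. */
static int ks_add(KeySet16 *s, uint16_t key) {
    unsigned char **p = &s->present[key >> 8];
    if (*p == NULL && (*p = calloc(256, 1)) == NULL)
        return 0;
    (*p)[key & 0xFF] = 1;
    return 1;
}

/* Write up to `max` keys to `out` in ascending order; return how many. */
static size_t ks_ordered(const KeySet16 *s, uint16_t *out, size_t max) {
    size_t n = 0;
    for (unsigned hi = 0; hi < 256; hi++) {
        if (s->present[hi] == NULL)
            continue; /* skip empty subtrees without scanning them */
        for (unsigned lo = 0; lo < 256 && n < max; lo++)
            if (s->present[hi][lo])
                out[n++] = (uint16_t)(hi << 8 | lo);
    }
    return n;
}

/* Release all child arrays. */
static void ks_free(KeySet16 *s) {
    for (unsigned hi = 0; hi < 256; hi++) {
        free(s->present[hi]);
        s->present[hi] = NULL;
    }
}
```

Adding the example keys 720, 200, 512, 360 and 400 in that order and then calling `ks_ordered` returns them as 200, 360, 400, 512, 720.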

Drawbacks

The HP (SourceForge) implementation of Judy arrays appears to be the subject of US patent 6735595.[6]

References

  1. libjudy-dev package, Debian wheezy. http://packages.debian.org/wheezy/libjudy-dev
  2. Silverstein, Alan (2002). "Judy IV Shop Manual".
  3. Judy project home page. http://judy.sourceforge.net/
  4. Baskins, Doug (October 16, 2001). "A 10-Minute Description of How Judy Arrays Work and Why They Are So Fast".
  5. Ramamohanarao, Kotagiri. Advances in Databases: Concepts, Systems and Applications.
  6. US patent 6735595. http://www.google.com/patents/US6735595