Content Addressable File Store

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Nicholsr (talk | contribs) at 21:21, 6 February 2006. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The Content Addressable File Store (CAFS) was a 1982 device developed by International Computers Ltd (ICL) that provided disk storage with built-in search capability. The motivation for the device was the discrepancy between the high speed at which a disk could deliver data and the much lower speed at which a general-purpose processor could filter that data for records matching a search condition.

Development of CAFS started in ICL's Research and Advanced Development Centre under Gordon Scarrott in the late 1960s. In its initial form, the search logic was built into the disk head. During the 1970s, a standalone CAFS device was installed with a few customers, including BT Directory Enquiries.

The device was subsequently productised and incorporated as a standard feature within ICL's 2900 series and Series 39 mainframes. By this stage, to reduce costs and to take advantage of increased hardware speeds, the search logic was incorporated into the disk controller. A query expressed in a high-level query language could be compiled into a search specification that was then sent to the disk controller for execution. Initially this capability was integrated into ICL's own Querymaster query language, which worked in conjunction with the IDMS database; subsequently it was integrated into the VME port of the Ingres relational database.
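The division of labour described above, where a query is compiled into a search specification and the filtering happens where the data is read, can be illustrated with a minimal sketch. Everything here is hypothetical: the names `Condition`, `compile_query` and `controller_scan`, the record layout, and the form of the specification are assumptions made for illustration and do not reflect ICL's actual search specification format or the Querymaster language.

```python
import operator
from dataclasses import dataclass

# Comparison operators the hypothetical search specification supports.
OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

@dataclass(frozen=True)
class Condition:
    field: str
    op: str
    value: object

def compile_query(conditions):
    """Turn (field, op, value) tuples into a search specification.

    Assumption: all conditions must hold (a simple AND of comparisons).
    """
    return [Condition(f, op, v) for f, op, v in conditions]

def controller_scan(records, spec):
    """Stand-in for the disk controller: stream records as they are
    read and pass on only those that satisfy the specification."""
    for record in records:
        if all(OPS[c.op](record[c.field], c.value) for c in spec):
            yield record

# Illustrative data standing in for records read from disk.
disk = [
    {"name": "Smith", "town": "Leeds", "age": 41},
    {"name": "Jones", "town": "York", "age": 35},
    {"name": "Patel", "town": "Leeds", "age": 29},
    {"name": "Brown", "town": "Leeds", "age": 52},
]

spec = compile_query([("town", "==", "Leeds"), ("age", ">", 30)])
matches = list(controller_scan(disk, spec))
```

In this sketch the host only ever sees the records in `matches`; the non-matching records are discarded inside `controller_scan`, which is the point of placing the search logic next to the disk.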

ICL also produced a version of CAFS for its DRS minicomputer range. Unlike its mainframe cousin, this was implemented using custom firmware running on an industry-standard microprocessor. The device eventually became obsolete as processor speeds increased, making it easy for a processor-based search to keep pace with data read from the disks.

The term content addressable is also applied to systems that index files by a hash of their content. The EMC Centera is a commercial product that does this; EMC calls this approach content addressed storage, or CAS.

The term content addressable network (CAN) is related, except that the hash of the content is used to locate a node that holds the content. Such systems are also called distributed hash tables.
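As a rough sketch of the idea of using a content hash to locate a node, the following maps each piece of content to one of a fixed set of nodes. The node names, the `TinyDHT` class, and the modulo placement rule are all assumptions chosen for brevity; real distributed hash tables partition the key space in more elaborate ways so that nodes can join and leave without relocating most of the data.

```python
import hashlib

# Hypothetical node names; a real network would discover these dynamically.
NODES = ["node-a", "node-b", "node-c", "node-d"]

def content_key(content: bytes) -> int:
    """Derive an integer key from the content itself."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big")

def locate_node(content: bytes, nodes=NODES) -> str:
    """Pick the node responsible for this content (simplified: key mod N)."""
    return nodes[content_key(content) % len(nodes)]

class TinyDHT:
    """In-memory stand-in for a set of nodes, each holding a key/value map."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.storage = {node: {} for node in self.nodes}

    def _node_for(self, key: int) -> str:
        return self.nodes[key % len(self.nodes)]

    def put(self, content: bytes) -> int:
        key = content_key(content)
        self.storage[self._node_for(key)][key] = content
        return key

    def get(self, key: int):
        return self.storage[self._node_for(key)].get(key)

dht = TinyDHT(NODES)
key = dht.put(b"hello, world")
```

Because the node is computed from the key alone, any participant that knows the key can work out where to ask for the content without consulting a central index.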

Content Addressable Storage offers efficient access to fixed or archival data that should not change over time. Rather than treating data as a file and leaving storage to a file system, the data is annotated with metadata and treated as an "object", which is then assigned a unique designator (a content address) and written to a permanent location on hard disk. Since each object's identifier is based solely on its content, storing the same content again yields the same address rather than a second copy, so duplicate data is eliminated and the total storage requirement is reduced.

The content address is typically derived from a hash of the file's content, computed with a function such as MD5; hence the name. If the content changes, the content address changes.
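The behaviour described above (identical content yields one address, changed content yields a new one) can be shown with a small in-memory sketch. The `ContentStore` class and its method names are hypothetical and are not the interface of any particular product. The sketch defaults to SHA-256 rather than the MD5 mentioned above, because MD5 is no longer considered collision resistant; passing `"md5"` as the algorithm reproduces the scheme named in the text.

```python
import hashlib

class ContentStore:
    """Minimal in-memory content-addressed store (illustrative only)."""

    def __init__(self, algorithm: str = "sha256"):
        self.algorithm = algorithm
        self.objects = {}  # content address -> content

    def address_of(self, content: bytes) -> str:
        """The content address is the hex digest of the content."""
        return hashlib.new(self.algorithm, content).hexdigest()

    def put(self, content: bytes) -> str:
        address = self.address_of(content)
        # Identical content maps to the same address, so it is stored once.
        self.objects.setdefault(address, content)
        return address

    def get(self, address: str) -> bytes:
        return self.objects[address]

store = ContentStore()
a1 = store.put(b"annual report 2005")
a2 = store.put(b"annual report 2005")  # duplicate: same address, no new object
a3 = store.put(b"annual report 2006")  # changed content: different address
```

After these three calls the store holds two objects: `a1` and `a2` are equal, while `a3` differs because the content differs.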

See also