Cryptographic hash function

A cryptographic hash function is a function that takes an input (or 'message') of any length and returns a fixed-size string, called the hash value (also known as a message digest, a digital fingerprint, or a checksum).
The ideal hash function has three main properties:
- It is extremely easy to calculate a hash for any given data.
- It is computationally very difficult to find a message that has a given hash.
- It is extremely unlikely that two slightly different messages will have the same hash.
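The properties above can be observed with a short sketch. It assumes SHA-256 from Python's standard hashlib module as the hash function; the helper names sha256_hex and differing_bits are made up for this illustration. Changing one letter of the message should change roughly half of the digest bits.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg1 = b"The quick brown fox jumps over the lazy dog"
msg2 = b"The quick brown fox jumps over the lazy cog"  # one letter changed

d1 = hashlib.sha256(msg1).digest()
d2 = hashlib.sha256(msg2).digest()

# Hashing is cheap, and the same input always gives the same digest.
print(sha256_hex(msg1))
print(sha256_hex(msg2))

# The two digests look unrelated even though the messages are nearly equal.
print("bits that differ:", differing_bits(d1, d2), "of", len(d1) * 8)
```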
Functions with these properties are used as hash functions for a variety of purposes, not only in cryptography. Practical applications include message integrity checks, digital signatures, authentication, and various information security applications.
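A message integrity check, the first application listed above, might look like the following minimal sketch. It assumes SHA-256 and a digest that was published through a trusted channel; the function names digest_stream and is_intact and the sample data are hypothetical.

```python
import hashlib
import hmac
import io

def digest_stream(stream, chunk_size: int = 65536) -> str:
    """Hash a binary stream in chunks so large inputs need little memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

def is_intact(stream, expected_hex: str) -> bool:
    """Return True if the stream's digest matches the expected digest."""
    # compare_digest compares the two strings in constant time.
    return hmac.compare_digest(digest_stream(stream), expected_hex.lower())

# Hypothetical data standing in for a downloaded file.
original = b"release-1.0 contents" * 1000
published = hashlib.sha256(original).hexdigest()

tampered = original[:-1] + b"!"  # same length, last byte altered

print(is_intact(io.BytesIO(original), published))  # True
print(is_intact(io.BytesIO(tampered), published))  # False
```

A plain digest only detects changes if the expected digest itself cannot be altered by the attacker; authentication uses usually combine the hash with a secret key.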
A hash function takes a string of any length as input and produces a fixed-length string that acts as a kind of "signature" for the data provided. A person who knows only the hash value cannot feasibly work out the original message, but only a person who knows the original message can prove that the hash value was created from that message.
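The following sketch illustrates both points of this paragraph, assuming SHA-256 as the hash function. The names fingerprint and proves_knowledge and the sample message are invented for the example.

```python
import hashlib
import hmac

def fingerprint(message: bytes) -> bytes:
    """Return the 32-byte SHA-256 digest of a message of any length."""
    return hashlib.sha256(message).digest()

# Inputs of very different lengths all map to digests of the same length.
for message in (b"", b"a", b"a" * 1_000_000):
    print(len(message), "bytes in ->", len(fingerprint(message)), "bytes out")

# Someone publishes only the fingerprint of a message they know.
secret_message = b"The merger will be announced on Friday."
published = fingerprint(secret_message)

def proves_knowledge(claimed_message: bytes, published_digest: bytes) -> bool:
    """Check a revealed message against a previously published digest."""
    return hmac.compare_digest(fingerprint(claimed_message), published_digest)

# Revealing the original message later shows that it matches the fingerprint.
print(proves_knowledge(secret_message, published))
print(proves_knowledge(b"Some other message.", published))
```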
A cryptographic hash function should behave as much as possible like a random function while still being deterministic and efficiently computable. A cryptographic hash function is considered "insecure" from a cryptographic point of view if either of the following is computationally feasible:
- Finding a (previously unseen) message that matches a given digest.
- Finding "collisions", in which two different messages have the same message digest.
An attacker who can carry out either of the above computations can use it to substitute an unauthorized message for an authorized one.
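To give a feel for what "computationally feasible" means, the sketch below searches for a collision in a deliberately weakened hash: SHA-256 truncated to 3 bytes (24 bits). The truncation and the names truncated_hash and find_collision are assumptions made for this illustration only. With so few output bits a collision typically turns up after a few thousand attempts, whereas the same search against a full 256-bit digest is far out of reach.

```python
import hashlib
from itertools import count

def truncated_hash(message: bytes, n_bytes: int = 3) -> bytes:
    """A deliberately weak hash: the first n_bytes of the SHA-256 digest."""
    return hashlib.sha256(message).digest()[:n_bytes]

def find_collision(n_bytes: int = 3):
    """Return two different messages with the same truncated hash."""
    seen = {}  # maps each truncated digest to the message that produced it
    for i in count():
        message = b"message-%d" % i
        tag = truncated_hash(message, n_bytes)
        if tag in seen:
            return seen[tag], message
        seen[tag] = message

m1, m2 = find_collision()
print(m1, m2, truncated_hash(m1).hex())
```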
Ideally, it should be impossible to find two different messages whose digests ("hash values") are similar; nor should an attacker be able to learn anything useful about a message given only its digest. Of course, the attacker learns at least one piece of information, the digest itself, which lets the attacker recognise when the same message occurs again.
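The last point can be sketched as follows, assuming SHA-256 and made-up messages. Because hashing is deterministic, equal digests reveal a repeated message, and an attacker who can guess a small set of likely messages can test each guess against an observed digest.

```python
import hashlib

def digest(message: bytes) -> str:
    """Return the SHA-256 digest of a message as a hexadecimal string."""
    return hashlib.sha256(message).hexdigest()

# Digests an eavesdropper observes, without seeing the messages themselves.
observed = [digest(m) for m in (b"attack at dawn", b"hold position", b"attack at dawn")]

# Equal digests show that the first and third messages were the same.
repeated = observed[0] == observed[2]
print("first message repeated:", repeated)

# If the set of plausible messages is small, every guess can be hashed and
# compared with the observed digests.
candidates = [b"retreat", b"hold position", b"attack at dawn"]
lookup = {digest(c): c for c in candidates}
recovered = [lookup.get(d) for d in observed]
print(recovered)
```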
In various standards and applications, the two most commonly used hash functions are MD5 and SHA-1.
In 2005, security defects were identified in both algorithms, showing that mathematical weaknesses might make attacks possible, and a move to a stronger hash function was recommended.
In 2007, the National Institute of Standards and Technology (NIST) announced a contest to design a hash function that would be given the name SHA-3 and be the subject of a FIPS standard.[1]
Further reading
- Bruce Schneier. Applied Cryptography. John Wiley & Sons, 1996. ISBN 0-471-11709-9.
References
Other websites
- Hash'em all! – free online text and file hashing with different algorithms
- The Hash function lounge – a list of hash functions and known attacks
- Hash functions: Theory, attacks, and applications – a survey by Ilya Mironov (Microsoft Research)
- Helger Lipmaa's links on hash functions
- Diagrams explaining cryptographic hash functions
- An Illustrated Guide to Cryptographic Hashes by Steve Friedl
- Cryptanalysis of MD5 and SHA: Time for a New Standard by Bruce Schneier
- Hash collision Q&A
- Attacking hash functions by poisoned messages (construction of multiple sensible Postscript messages with the same hash value)
- What is a hash function? from RSA Laboratories
- Password Hashing in PHP by James McGlinn at the PHP Security Consortium
- The code monkey's guide to cryptographic hashes by Val Henson, "in language that any programmer (and even some managers) can understand."
- File Hash for Windows with various algorithms