Security of cryptographic hash functions

In cryptography, we can divide cryptographic hash functions into two main categories. In the first category are those functions, where their security is based on rigorous mathematical prove, complexity theory and formal reduction. These functions are called Provably Secure Cryptographic Hash Functions. To construct such a function is very difficult and only a few examples were introduced. Their practical use is limited.

In the second category are functions that were or are believed to be hard to break, but no such formal prove was given. Almost all widely-spread hash functions fall in this category. Some of these functions that are already broken and are no longer in use.

Types of Security of Hash Functions

Generally, the basic security of cryptographic hash functions can be seen from three different angles: pre-image resistance, second pre-image resistance and collision resistance.

Pre-image resistance: given a hash h it should be hard to find any message m such that h = hash(m). This concept is related to that of one way function. Functions that lack this property are vulnerable to pre-image attacks.
Second pre-image resistance: given an input m1, it should be hard to find another input, m2 (not equal to m1) such that hash(m1) = hash(m2). This property is sometimes referred to as weak collision resistance. Functions that lack this property are vulnerable to second pre-image attacks.
Collision resistance: it should be hard to find two different messages m1 and m2 such that hash(m1) = hash(m2). Such a pair is called a (cryptographic) hash collision, and this property is sometimes referred to as strong collision resistance. It requires a hash value at least twice as long as what is required for pre-image resistance, otherwise collisions may be found by a birthday attack.

The Meaning of "Hard"

The basic question is the meaning of "hard". There are two approaches to answer this question. First is the intuitive/practical approach: "hard means that it is almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important."

The second approach is theoretical and is based on the complexity theory. If problem A is hard, there exists a formal security reduction from a problem which is widely supposed to be unsolvable in polynomial time, such as integer factorization problem or discrete logarithm problem.

Classical Hash Functions - Practical Approach to Security

Most hash functions are built on an ad hoc basis, where the bits of the message are nicely mixed to produce the hash. Various bitwise operations (e.g. rotations), modular additions and compression functions are used in iterative mode to ensure high complexity and pseudo-randomness of the output. In this way, the security is very hard to prove and the proof is usually not done. Only a few years ago, one of the most popular hash functions, SHA-1, was shown to be less secure than its length suggested: collisions could be found in only 2^{69} tests, rather than the brute-force number of 2^{80}. The search for a "good: hash function has thus become a hot topic.

In other words, most of the hash functions in use nowadays, are not provable collision-resistant. These hashes are not based on purely mathematical functions. This approach results generally in more effective hashing algorithms, but with the risk that a weakness of such a function will be eventually used to find collisions. The famous case is MD5.

Meaning of "hard to find collision" in these cases means "almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important." The meaning of the term is therefore somewhat dependent on the application, since the effort that a malicious agent may put into the task is usually proportional to his expected gain. However, since the needed effort usually grows very quickly with the digest length, even a thousand-fold advantage in processing power can be neutralized by adding a few dozen bits to the latter.

Provable Secure Hash Functions

In this approach, we base the security of hash function on some hard mathematical problem and we prove that finding collisions is as hard as breaking the underlying problem. This gives much stronger security than just relying on complex mixing of bits as in the classical approach.

A cryptographic hash has provable security against collision attacks if finding collisions is provably polynomial-time reducible from problem P which is supposed to be unsolvable in polynomial time.

It means that if finding collisions would be feasible in polynomial time by algorithm A, we could find and use polynomial time algorithm R (reduction algorithm) that would use algorithm A to solve problem P, which is widely supposed to be unsolvable in polynomial time. That is a contradiction. This means, that finding collisions cannot be easier than solving P.

Hash functions with the proof of their security are based on mathematical functions.

Hard problems

Examples of problems, that are assumed to be unsolvable in polynomial time

Downsides of Provable Approach

Construct a hash functions with provable security is much more difficult than using a intuitive approach where we just hope that the complexity of hashing algorithm is high enough to prevent adversary from finding collisions.
Current collision-resistant hash algorithms that have provable security reductions are too inefficient to be used in practice.

Example of (unpractical) Provable Hash Function

Hash(m) = x^m mod n where n is hard to factor composite number, and x is some prespecified base value. A collision x^m1 congruent to x^m2 reveals a multiple m1 - m2 of the order of x. Such information can be used to factor n in polynomial time assuming certain properties of x.

But the algorithm is quite inefficient because it requires on average 1.5 multiplications modulo n per message-bit.

More practical Provable Hash Functions

VSH - Very Smooth Hash function - a provably secure collision-resistant hash function assuming the hardness of finding nontrivial modular square roots modulo composite number (this is assumed to be as hard as integer factorization).
ECOH - Elliptic Curve Only hash function
FSB - Fast Syndrome-Based hash function
SWIFT - Software Implemented Fault Tolerance
Chaum, van Heijst, Pfitzmann hash function
Knapsack-based hash functions
The Zémor-Tillich hash function