Verifiable computing
Verifiable computing (also called "verified computation", "delegated computation", or "checking computations") enables a computer to offload a computation to another computer and then check whether the result was computed correctly. The problem was first introduced in 1991 by Babai, Fortnow, Levin, and Szegedy.[1]
Motivation and overview
This classic problem is particularly relevant today, as much computation is now outsourced: it is performed by machines that are rented, remote, or both. For example, service providers (SPs) now offer storage, computation, managed desktops, and more. As a result, relatively weak devices (phones, tablets, laptops, and PCs) can run computations (storage, image processing, data analysis, video encoding, and so on) on banks of machines controlled by someone else.
This arrangement is known as cloud computing, and its promise is enormous. A lone graduate student with an intensive analysis of genome data can now rent a hundred computers for 12 hours for less than $200. And many companies now run their core computing tasks (websites, application logic, storage) on machines owned by SPs, which automatically replicate applications to meet demand. Without cloud computing, these examples would require buying hundreds of physical machines when demand spikes ... and then selling them back the next day.
But with this promise comes risk. SPs are complex and large-scale, making it unlikely that execution is always correct. Moreover, SPs do not necessarily have strong incentives to ensure correctness. Finally, SPs are black boxes, so faults—which can include misconfigurations, corruption of data in storage or transit, hardware problems, malicious operation, and more—are unlikely to be detectable. This raises a central question, which goes beyond cloud computing: How can we ever trust results computed by a third-party, or the integrity of data stored by such a party?
Verifiability via replication
A common answer is to replicate computations. However, replication assumes that failures are uncorrelated, which may not be a valid assumption: the hardware and software platforms in cloud computing are often homogeneous. Another answer is auditing—checking the responses in a small sample—but this assumes that incorrect outputs, if they occur, are relatively frequent. The largest distributed computing system, SETI@home, uses replication. Its verification process involves one client machine and many worker machines: the client machine sends identical workunits to multiple computers (at least two).
When too few results are returned in a reasonable amount of time—because machines were accidentally turned off, communication broke down, and so on—or when the returned results disagree—because of computation errors, or cheating by submitting false data without actually doing the work—the client machine sends identical workunits to additional worker machines. Once a minimum quorum (often two) of the results agree, the client assumes those results (and any other identical results for that workunit) are correct, and grants credit to all machines that returned them.
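The quorum logic above can be sketched as follows. This is a toy illustration; the function names and structure are assumptions for clarity, not the actual SETI@home/BOINC code:

```python
import random
from collections import Counter

def verify_by_quorum(workunit, workers, min_quorum=2):
    """Toy sketch of quorum-based replication: dispatch the same
    workunit to workers one at a time, and accept a result once at
    least `min_quorum` returned results agree."""
    pool = list(workers)
    random.shuffle(pool)
    results = []
    while pool:
        # Dispatch one more replica of the identical workunit.
        worker = pool.pop()
        results.append(worker(workunit))
        # Accept as soon as a minimum quorum of results agree.
        value, count = Counter(results).most_common(1)[0]
        if count >= min_quorum:
            return value
    return None  # not enough agreeing results in time

# Usage: two honest workers compute the sum; one cheater returns 0.
honest = lambda wu: sum(wu)
cheater = lambda wu: 0
print(verify_by_quorum([1, 2, 3], [cheater, honest, honest]))  # -> 6
```

Note the assumption this sketch makes explicit: a wrong answer is detected only if cheaters are a minority and their wrong answers do not themselves agree.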
Proof-based verifiable computation
Given the previous approach's apparent drawback, what if the third party returned its results along with a proof that the results were computed correctly? And what if the proof were inexpensive to check, compared to the cost of redoing the computation? Then few assumptions would be needed about the kinds of faults that can occur: either the proof would check, or not. This vision is called proof-based verifiable computation, and the question now becomes: Can this vision be realized for a wide class of computations?
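To make "inexpensive to check" concrete, a classic standalone example (not one of the proof systems discussed below, but the same flavor) is Freivalds' algorithm for checking matrix multiplication: the verifier checks the claim A·B = C using only matrix-vector products, far cheaper than redoing the multiplication:

```python
import random

def freivalds_check(A, B, C, trials=20):
    """Freivalds' algorithm: probabilistically check the claim A*B == C
    in O(n^2) time per trial, versus roughly O(n^3) to redo the
    multiplication. A wrong C is accepted with probability <= 2**-trials."""
    n = len(A)
    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        # Compute A(Br) and Cr using only matrix-vector products.
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False  # caught a wrong result
    return True  # correct with high probability

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
good = [[19, 22], [43, 50]]   # the true product A*B
bad = [[19, 22], [43, 51]]    # one wrong entry
print(freivalds_check(A, B, good), freivalds_check(A, B, bad))
```

The systems below generalize this idea from one specific problem to arbitrary computations, using proofs supplied by the prover rather than recomputation of any kind.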
Deep results in complexity theory and cryptography tell us that in principle the answer is “yes.” Probabilistic proof systems—which include interactive proofs,[2][3] probabilistically checkable proofs,[4][5] efficient arguments (PCPs coupled with cryptographic commitments),[6][7] and Micali’s CS proofs[8]—consist of two parties: a verifier and a prover. In these protocols, the prover can efficiently convince the verifier of a mathematical assertion. In fact, the acclaimed PCP theorem, together with refinements, implies that a verifier only has to check three randomly chosen bits in a suitably encoded proof!
Meanwhile, the claim “this program, when executed on this input, produces that output” can be represented as a mathematical assertion of the necessary form. The only requirement is the verifier knows the program, the input (or at least a digest, or fingerprint, of the input), and the purported output. And this requirement is met in many uses of outsourced computing; examples include MapReduce-style text processing, scientific computing and simulations, database queries, and Web request-response. Indeed, although the modern significance of PCPs lies elsewhere, an original motivation was verifying the correctness of remotely executed computations: the paper quoted in the introduction above was one of the seminal works that led to the PCP theorem.
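As a toy illustration of how "this program, on this input, produces that output" becomes a checkable mathematical assertion, a small computation can be rewritten as a set of arithmetic constraints over its intermediate values. This is a deliberately simplified sketch: real systems compile to constraint formalisms such as arithmetic circuits and QAPs over a finite field, and the verifier never re-executes the gates itself:

```python
def check_claim(x, y):
    """Toy arithmetization of the claim "y = x**3 + x + 5 on input x".
    The prover supplies a transcript of intermediate values ("wires");
    the verifier checks each gate constraint locally."""
    # Prover's side: run the computation and record the wires.
    w1 = x * x           # multiplication gate: w1 = x^2
    w2 = w1 * x          # multiplication gate: w2 = x^3
    out = w2 + x + 5     # addition gates
    transcript = (x, w1, w2, out)

    # Verifier's side: check each constraint against the transcript,
    # and check that the final wire equals the claimed output y.
    x_, w1_, w2_, out_ = transcript
    constraints = [
        w1_ == x_ * x_,
        w2_ == w1_ * x_,
        out_ == w2_ + x_ + 5,
    ]
    return all(constraints) and out_ == y

print(check_claim(3, 35))   # 27 + 3 + 5 = 35
print(check_claim(3, 36))   # wrong claimed output
```

In the real protocols, the constraint system is what gets probabilistically checked, so the verifier touches only a few random positions of an encoded transcript rather than every gate.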
However, for decades these approaches to verifiable computation were purely theoretical. Interactive protocols were prohibitive (exponential-time) for the prover and did not appear to save the verifier work. The proofs arising from the PCP theorem (despite asymptotic improvements) were so long and complicated that it would have taken thousands of years to generate and check them, and would have needed more storage bits than there are atoms in the universe.
But beginning around 2007, a number of theoretical works achieved results that were especially relevant to the problem of verifying cloud computations. Goldwasser et al., in their influential Muggles paper,[9] refocused the theory community’s attention on verifying outsourced computations, in the context of an interactive proof system that required only polynomial work from the prover, and that applied to computations expressed as certain kinds of circuits. Ishai et al. (CCC 2007) proposed a novel cryptographic commitment to an entire linear function, and used this primitive to apply simple PCP constructions to verifying general-purpose outsourced computations. A couple of years later, Gentry’s breakthrough protocol for fully homomorphic encryption (FHE) led to work (GGP[10]) on non-interactive protocols for general-purpose computations. These developments were exciting, but, as with the earlier work, implementations were thought to be out of the question. So the theory remained theory—until recently.
Practical verifiable computing
The last few years have seen a number of projects overturn the conventional wisdom about the hopeless impracticality of proof-based verifiable computation. These projects aim squarely at building real systems based on the theory mentioned earlier, specifically PCPs and Muggles (FHE-based protocols still seem too expensive). The improvements over naive theoretical protocols are dramatic; it is not uncommon to read about factor-of-a-trillion speedups. The projects take different approaches, but broadly speaking, they apply both refinements of the theory and systems techniques. Some projects include a full pipeline: a programmer specifies a computation in a high-level language, and then a compiler (a) transforms the computation to the formalism that the verification machinery uses and (b) outputs executables that implement the verifier and prover. As a result, achieving verifiability is no harder for the programmer than writing the code in the first place.
Here are the four projects that are tackling practical verifiable computation.
- The Pepper project: The line of work on Pepper,[11] Ginger, and Zaatar instantiates an argument system (an interactive protocol that assumes a computationally bounded prover) due to Ishai, Kushilevitz, and Ostrovsky (CCC 2007). Through theoretical and practical refinements, the authors reduced the cost of this argument system by over 20 orders of magnitude. In another line of work, Allspice, they built on an interactive proof system due to Goldwasser, Kalai, and Rothblum (STOC 2008), as refined and implemented by Cormode, Mitzenmacher, and Thaler (ITCS 2012); a chief advantage of this approach is that it avoids expensive cryptographic operations. One might wonder: which protocol is “best”? The authors’ experience has been that it depends on the computation. Accordingly, Allspice includes a compiler that takes programs written in a high-level language and performs static analysis to compile each computation to the best available protocol. The computational model in the above works does not support stateful computations (those that work over RAM) or computations for which the verifier does not have the full input. Pantry overcomes this limitation and applies the verification machinery to MapReduce jobs, queries against remote databases, and applications for which the prover’s state is private.
- CMT: Refines a non-cryptographic interactive proof protocol. Can achieve much lower overhead than the other approaches but requires that computations be naturally parallelizable.
- GGPR and Pinocchio: Introduces the remarkable QAP/QSP encoding, which is borrowed by many of the projects in the area. Pinocchio and Zaatar have similar performance except that Pinocchio has much better amortization behavior.
- BCTV, BCGTV, and TinyRAM: Introduces an innovative program representation and combines it with an optimized implementation of Pinocchio's verification engine. This approach brings excellent programmability (enabling support of the full C programming language) and the best amortization behavior in the literature, but its overhead is very high. Compared to Buffet, for example, its programmability is slightly better but its performance is orders of magnitude worse.
References
- ^ L. Babai, L. Fortnow, L. Levin, and M. Szegedy (1991). "Checking computations in polylogarithmic time." In Proceedings of the ACM Symposium on Theory of Computing (STOC)
- ^ L. Babai (1985). "Trading group theory for randomness." In Proceedings of the ACM Symposium on Theory of Computing (STOC), New York, NY, USA, pp. 421–429
- ^ S. Goldwasser, S. Micali, and C. Rackoff (1989). "The knowledge complexity of interactive proof-systems." SIAM Journal on Computing, 18(1), pp. 186–208
- ^ S. Arora and S. Safra (1998). "Probabilistically checkable proofs: a new characterization of NP." Journal of the ACM, 45, pp. 501–555
- ^ O. Goldreich (1998). Modern Cryptography, Probabilistic Proofs and Pseudorandomness. Algorithms and Combinatorics series, 17, Springer
- ^ J. Kilian (1992). "A note on efficient zero-knowledge proofs and arguments (extended abstract)." In Proceedings of the ACM Symposium on Theory of Computing (STOC)
- ^ J. Kilian (1995). "Improved efficient arguments (preliminary version)." In Proceedings of Crypto, London, UK, pp. 311–324. Springer-Verlag
- ^ S. Micali (1994). "CS proofs (extended abstract)." In Proceedings of the IEEE Symposium on Foundations of Computer Science, pp. 436–453
- ^ S. Goldwasser, Y. T. Kalai, and G. N. Rothblum (2008). "Delegating computation: interactive proofs for Muggles." In Proceedings of the ACM Symposium on Theory of Computing (STOC)
- ^ R. Gennaro, C. Gentry, and B. Parno (2010). "Non-interactive verifiable computing: outsourcing computation to untrusted workers." In Proceedings of CRYPTO 2010
- ^ S. Setty, R. McPherson, A. J. Blumberg, and M. Walfish (2012). "Making argument systems for outsourced computation practical (sometimes)." In Proceedings of the Network and Distributed System Security Symposium (NDSS)