Web cache

A Web cache (or HTTP cache) is a system for optimizing the World Wide Web. It is implemented both client-side and server-side. The caching of multimedia and other files can result in less overall delay when browsing the Web.^[1]^[2]

Parts of the system

Forward and reverse

A forward cache is a cache outside the web server's network, e.g. in the client's web browser, in an ISP, or within a corporate network.^[3] A network-aware forward cache only caches heavily accessed items.^[4] A proxy server sitting between the client and web server can evaluate HTTP headers and choose whether to store web content.
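As a rough illustration of how a shared proxy might evaluate headers before storing a response, the sketch below checks only the request method and two Cache-Control directives. The function names and the plain-dict header representation are assumptions made for this example, and the GET-only rule is a simplification; a real proxy applies many more conditions.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_may_store(method, response_headers):
    """Return True if this simplified shared cache would store the response.

    Simplifications (assumptions of this sketch): only GET responses are
    considered, and only the no-store and private directives are checked.
    """
    if method.upper() != "GET":
        return False
    directives = parse_cache_control(response_headers.get("Cache-Control", ""))
    return "no-store" not in directives and "private" not in directives


print(shared_cache_may_store("GET", {"Cache-Control": "public, max-age=60"}))
print(shared_cache_may_store("GET", {"Cache-Control": "private"}))
```

The private directive is rejected here because the sketch models a shared cache; a browser's own cache would be allowed to keep such a response.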

A reverse cache sits in front of one or more web servers, accelerating requests from the Internet and reducing peak server load. This is usually a content delivery network (CDN) that retains copies of web content at various points throughout a network.

HTTP options

The Hypertext Transfer Protocol (HTTP) defines three basic mechanisms for controlling caches: freshness, validation, and invalidation.^[5] These are specified in the headers of HTTP response messages from the server.

Freshness allows a response to be used without re-checking it on the origin server, and can be controlled by both the server and the client. For example, the Expires response header gives a date when the document becomes stale, and the Cache-Control: max-age directive tells the cache how many seconds the response is fresh for.
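The freshness check can be sketched in a few lines. The example below is a simplified illustration, not a complete implementation: the function names are hypothetical, headers are assumed to arrive as a plain dict, and the time the response was received stands in for the Date header and the age calculation that a real cache would use.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, response_time):
    """Return the freshness lifetime in seconds, or None if none is stated.

    max-age takes precedence over Expires. `response_time` must be a
    timezone-aware datetime (an assumption of this sketch).
    """
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match:
        return int(match.group(1))
    if "Expires" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
        except (TypeError, ValueError):
            return 0  # an unparseable date is treated as already expired
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)
        return max(0.0, (expires - response_time).total_seconds())
    return None


def is_fresh(headers, response_time, now):
    """Return True if the stored response may be reused without revalidation."""
    lifetime = freshness_lifetime(headers, response_time)
    if lifetime is None:
        return False  # no explicit freshness information: revalidate
    return (now - response_time).total_seconds() < lifetime


received = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
later = datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc)
print(is_fresh({"Cache-Control": "public, max-age=60"}, received, later))  # True
```

Treating a response without explicit freshness information as stale is a conservative choice made for this sketch; caches are also permitted to estimate a lifetime heuristically.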

Validation can be used to check whether a cached response is still good after it becomes stale. For example, if the response has a Last-Modified header, a cache can make a conditional request using the If-Modified-Since header to see if it has changed. The ETag (entity tag) mechanism also allows for both strong and weak validation.
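The validation exchange can be illustrated from the cache's point of view. The following sketch assumes a stored response is kept as a (headers, body) pair and uses hypothetical function names; it shows how validators become conditional request headers, how strong and weak entity-tag comparison differ, and how a 304 (Not Modified) reply lets the cache keep its stored body.

```python
def conditional_headers(stored_headers):
    """Build conditional request headers from a stored response's validators."""
    headers = {}
    if "ETag" in stored_headers:
        headers["If-None-Match"] = stored_headers["ETag"]
    if "Last-Modified" in stored_headers:
        headers["If-Modified-Since"] = stored_headers["Last-Modified"]
    return headers


def etags_match(a, b, weak_comparison=True):
    """Compare two entity tags.

    Weak comparison ignores the W/ prefix. Strong comparison additionally
    requires that neither tag is marked weak.
    """
    if not weak_comparison and (a.startswith("W/") or b.startswith("W/")):
        return False

    def opaque(tag):
        return tag[2:] if tag.startswith("W/") else tag

    return opaque(a) == opaque(b)


def revalidate(stored, status, new_headers, new_body):
    """Return the (headers, body) pair to keep after a validation response."""
    stored_headers, stored_body = stored
    if status == 304:
        # Not Modified: keep the stored body and refresh the stored headers.
        return {**stored_headers, **new_headers}, stored_body
    return new_headers, new_body  # a full response replaces the stored one


stored = ({"ETag": '"abc"'}, b"cached body")
print(conditional_headers(stored[0]))
print(revalidate(stored, 304, {"Cache-Control": "max-age=60"}, b""))
```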

Invalidation is usually a side effect of another request that passes through the cache. For example, if a URL associated with a cached response subsequently gets a POST, PUT or DELETE request, the cached response will be invalidated. Many CDNs and manufacturers of network equipment have replaced this standard HTTP cache control with dynamic caching.
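The invalidation side effect described above can be sketched as a small in-memory cache that watches the traffic passing through it. The class and method names are hypothetical. Restricting invalidation to non-error (2xx or 3xx) responses follows the HTTP caching specification rather than the text above.

```python
UNSAFE_METHODS = {"POST", "PUT", "DELETE"}


class InvalidatingCache:
    """A toy URL-keyed cache that drops entries when an unsafe request succeeds."""

    def __init__(self):
        self._store = {}

    def get(self, url):
        return self._store.get(url)

    def put(self, url, response):
        self._store[url] = response

    def observe(self, method, url, status):
        """Record a request/response pair that passed through the cache."""
        if method.upper() in UNSAFE_METHODS and 200 <= status < 400:
            self._store.pop(url, None)  # the stored response is now invalid


cache = InvalidatingCache()
cache.put("/articles/1", b"old representation")
cache.observe("PUT", "/articles/1", 204)
print(cache.get("/articles/1"))  # None
```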

Legality

In 1998, the DMCA added rules to the United States Code (17 U.S.C. § 512) that exempt system operators from copyright liability for the purposes of caching.

Caching algorithms

Many admission and eviction algorithms have been designed over the years for web caches.

Because DRAM provides high bandwidth and low latency but has limited capacity, AdaptSize uses a Markov model to decide the size threshold for admitting objects into the DRAM hot-object cache.^[6]
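A size-aware admission policy of this kind can be illustrated with a short sketch. It assumes a probabilistic rule in which an object of a given size is admitted with probability exp(-size / c), so that c acts as a soft size threshold. The function names, the parameter c, and its value are assumptions of this example, and the Markov-model tuning of the threshold is not shown.

```python
import math
import random


def admission_probability(size_bytes, c):
    """Probability of admitting an object of the given size (assumed rule)."""
    return math.exp(-size_bytes / c)


def admit(size_bytes, c, rng=random.random):
    """Decide whether to admit an object into the hot-object cache.

    `rng` returns a float in [0, 1); it is injectable so the decision can be
    made deterministic in tests.
    """
    return rng() < admission_probability(size_bytes, c)


# With a hypothetical c of 64 KiB, small objects are almost always admitted
# and very large ones almost never.
for size in (1_024, 65_536, 1_048_576):
    print(size, round(admission_probability(size, 65_536), 4))
```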

LRB is an eviction algorithm that uses machine learning (gradient-boosted trees) to estimate each object's future access time. Because of the overhead of machine learning, it targets content delivery network caches.^[7]

S3-FIFO is an eviction algorithm designed in 2023.^[8] Whereas most existing algorithms build on LRU (least recently used), S3-FIFO is the first algorithm to demonstrate that FIFO queues are sufficient to design efficient and scalable eviction algorithms. Compared to LRU and LRU-based algorithms, S3-FIFO can achieve 6x higher throughput. In addition, on web cache workloads, S3-FIFO achieves the lowest miss ratio among the 11 state-of-the-art algorithms its authors compared it with.

SIEVE is a simple eviction algorithm designed specifically for web caches, e.g., key-value caches and content delivery networks. It is simpler than LRU, yet it achieves lower miss ratios than LRU, on par with state-of-the-art eviction algorithms. Moreover, on stationary skewed workloads, SIEVE is the best known algorithm.^[9]
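The following sketch shows one way SIEVE can be written, based on its published description: objects sit in a FIFO-ordered list, a hit only sets a "visited" bit, and a "hand" pointer sweeps from the oldest object toward the newest, clearing visited bits until it finds an unvisited object to evict. The class layout and method names are choices made for this example, as is marking an object visited when its value is overwritten.

```python
class _Node:
    __slots__ = ("key", "value", "visited", "prev", "next")

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.visited = False
        self.prev = None  # neighbour toward the head (newer)
        self.next = None  # neighbour toward the tail (older)


class SieveCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._nodes = {}
        self._head = None  # newest object
        self._tail = None  # oldest object
        self._hand = None  # where the next eviction sweep resumes

    def __contains__(self, key):
        return key in self._nodes

    def get(self, key):
        node = self._nodes.get(key)
        if node is None:
            return None
        node.visited = True  # a hit never moves the object
        return node.value

    def put(self, key, value):
        node = self._nodes.get(key)
        if node is not None:
            node.value, node.visited = value, True
            return
        if len(self._nodes) >= self.capacity:
            self._evict()
        node = _Node(key, value)
        node.next = self._head  # new objects enter at the head
        if self._head is not None:
            self._head.prev = node
        self._head = node
        if self._tail is None:
            self._tail = node
        self._nodes[key] = node

    def _evict(self):
        node = self._hand if self._hand is not None else self._tail
        while node.visited:  # give visited objects a second chance
            node.visited = False
            node = node.prev if node.prev is not None else self._tail
        self._hand = node.prev  # resume from the next newer object
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self._head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self._tail = node.prev
        del self._nodes[node.key]


cache = SieveCache(3)
for key in "abc":
    cache.put(key, key.upper())
cache.get("a")       # marks "a" as visited
cache.put("d", "D")  # the sweep skips "a" and evicts "b"
print("a" in cache, "b" in cache)  # True False
```

Unlike LRU, a hit does not reorder the list, so the read path needs no list manipulation.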

Server-side software

This is a list of server-side web caching software.

Name	OS: Windows	OS: Unix-like	OS: Other	Forward mode	Reverse mode	License
Apache HTTP Server	Yes	OS X, Linux, Unix, FreeBSD, Solaris, Novell NetWare	OS/2, TPF, OpenVMS, eComStation	Yes		Apache 2.0
aiScaler Dynamic Cache Control	No	Linux	No			Proprietary
ApplianSys CACHEbox	No	Linux	No			Proprietary
Blue Coat ProxySG	No	No	SGOS	Yes	Yes	Proprietary
Nginx	Yes	Linux, BSD, OS X, Solaris, AIX, HP-UX	Yes	Yes	Yes	2-clause BSD-like
Microsoft Forefront Threat Management Gateway	Yes	No	No	Yes	Yes	Proprietary
Polipo	Yes	OS X, Linux, OpenWrt, FreeBSD	?	Yes	Yes	MIT License
Squid	Yes	Linux	?	Yes	Yes	GPL
Traffic Server	?	Linux	?	Yes	Yes	Apache 2.0
Untangle	No	Linux	No	Yes	Yes	Proprietary
Varnish	No	Linux	No	Needs a VMOD	Yes	BSD
WinGate	Yes	No	No	Yes	Yes	Proprietary (Free for 8 users)
Nuster	No	Linux	No	Yes	Yes	GPL
McAfee Web Gateway	No	McAfee Linux Operating System	No	Yes	Yes	Proprietary

References

^ Fountis, Yorgos (4 May 2017). "How does the browser cache work?".
^ Messaoud, S.; Youssef, H. (2009). "An analytical model for the performance evaluation of stack-based Web cache replacement algorithms". International Journal of Communication Systems. 23: 1–22. doi:10.1002/dac.1036. S2CID 46507769.
^ Shinder, Thomas (2 September 2008). "Understanding Web Caching Concepts for the ISA Firewall". ISA Server. TechGenix Ltd. Archived from the original on 23 July 2011. Retrieved 27 February 2011.
^ Erman, Jeffrey; Gerber, Alexandre; Hajiaghayi, Mohammad T.; Pei, Dan; Spatscheck, Oliver (2008). "Network-Aware Forward Caching" (PDF). AT&T Labs: 291–300. CiteSeerX 10.1.1.159.1786. Archived from the original (PDF) on 1 April 2011. Retrieved 11 March 2019.
^ Kelly, Mike; Hausenblas, Michael. "Using HTTP Link: Header for Gateway Cache Invalidation" (PDF). WS-REST. p. 20. Archived from the original (PDF) on 10 July 2010. Retrieved 14 June 2013.
^ Berger, Daniel S.; Sitaraman, Ramesh K.; Harchol-Balter, Mor (2017). "AdaptSize: Orchestrating the Hot Object Memory Cache in a Content Delivery Network". pp. 483–498. ISBN 978-1-931971-37-9.
^ Song, Zhenyu; Berger, Daniel S.; Li, Kai; Lloyd, Wyatt (2020). "Learning Relaxed Belady for Content Distribution Network Caching". pp. 529–544. ISBN 978-1-939133-13-7.
^ Yang, Juncheng; Zhang, Yazhuo; Qiu, Ziyue; Yue, Yao; Vinayak, Rashmi (2023-10-23). "FIFO queues are all you need for cache eviction". Proceedings of the 29th Symposium on Operating Systems Principles. SOSP '23. New York, NY, USA: Association for Computing Machinery: 130–149. doi:10.1145/3600006.3613147. ISBN 979-8-4007-0229-7.
^ Zhang, Yazhuo; Yang, Juncheng; Yue, Yao; Vigfusson, Ymir; Rashmi, K. V. (2024). "SIEVE is Simpler than LRU: an Efficient Turn-Key Eviction Algorithm for Web Caches". pp. 1229–1246. ISBN 978-1-939133-39-7.