Module talk:RSPTest/sandbox

Some notes on the test data, made while implementing

Noting these comments:

There is no RSPSHORTCUT information

There is no "blacklisted" indicator for the status

There is no name sort value

Only the first entry for RSPUSES is present in the data
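Gaps like these are easier to track as the parser is iterated on if each parsed entry is checked against the fields the module expects. The sketch below assumes the parser output is a list of dicts. The field names (`shortcut`, `status`, `sort_name`, `uses`) and the sample entries are hypothetical and are not the real parser schema.

```python
# Hypothetical field names: the real parser output may use different keys.
EXPECTED_FIELDS = ("shortcut", "status", "sort_name", "uses")

def missing_fields(entries):
    """Return {entry name: [expected fields that are absent or empty]}."""
    report = {}
    for entry in entries:
        missing = [field for field in EXPECTED_FIELDS if not entry.get(field)]
        if missing:
            report[entry.get("name", "<unnamed>")] = missing
    return report

# Illustrative entries only; not taken from the real data.
sample = [
    {"name": "Source A", "shortcut": "WP:SOURCEA", "status": "gr",
     "sort_name": "Source A", "uses": ["example.com"]},
    {"name": "Source B", "status": "gr", "uses": ["example.org"]},
]
print(missing_fields(sample))
```

A check like this only flags absent or empty fields. Gaps such as a missing "blacklisted" status value or an RSPUSES list cut down to its first entry would need value-level checks.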

Aidan9382, much appreciated. I'm not surprised, but also not (overly) concerned at this point, as this is all at a very early, proof-of-concept stage. The parsed data we got from Audiodude's offline Python parser, which I then converted into Lua-accessible data via regex (not finding a native module for that here; is there one?), could have oversights or errors in the parse step, the conversion step, or the module. These will all get resolved through iteration as we discover them. At this point I'm just hacking together something that reproduces the table without the pesky templates that blew PEIS out of the water, and if it lacks certain fields, that is tolerable for the moment. Still, it is great to know, because I hadn't discovered it yet, and at some point we will need to deal with these gaps. Your comments are very welcome as we go! Mathglot (talk) 09:34, 14 October 2025 (UTC)

Yeah, I just wanted to make sure any missing parameters were noted, since that makes it harder to generate implementation examples for them. Also, interestingly, regarding my RSPSHORTCUT point: it doesn't seem to exist in the data for MAILONSUNDAY, but does for MASHABLE. No clue what's happened there; just interesting to note. I'll keep trying to implement some of the details, and hopefully the edit conflicts won't be too annoying to deal with.

which I then converted into Lua-accessible data via regex (not finding a native module for that here; is there one?) - Not quite sure what you mean, but parsing normal JSON data is something Scribunto can do; see mw.loadJsonData. Aidan9382 _(talk) 09:54, 14 October 2025 (UTC)
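If the offline parser is written in Python, the regex conversion step could be skipped by having the parser write its results as JSON. A module could then read that JSON with mw.loadJsonData from a page that uses the JSON content model. This is a minimal sketch that assumes the parsed rows are already plain dicts and lists. The `rows` wrapper key and the field names are hypothetical choices.

```python
import json

def to_json_page(rows):
    """Serialize parsed rows as JSON text for a JSON-content-model page."""
    # The {"rows": [...]} wrapper is a hypothetical layout choice.
    # ensure_ascii=False keeps non-ASCII names readable in the page source;
    # sort_keys=True gives stable output, so diffs between parser runs stay small.
    return json.dumps({"rows": rows}, ensure_ascii=False, indent=2, sort_keys=True)

# Illustrative row; field names are assumptions, not the real parser schema.
rows = [{"name": "Example source", "status": "gr", "shortcut": "WP:EXAMPLE"}]
text = to_json_page(rows)
print(text)
```

On the Lua side, the module would pass that page's title to mw.loadJsonData and get back a read-only table with the same nesting.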

Great news! This will make our lives easier. Mathglot (talk) 22:33, 14 October 2025 (UTC)

PEIS difference

@Mathglot: I've been working on a row-builder-focused version in a personal sandbox of mine (so that I don't disrupt any work here) to get an early view of what difference it would make to PEIS. It's by no means done, but for comparison, I have a test case at User:Aidan9382/sandbox3 (invoked at User:Aidan9382/sandbox2 for raw PEIS testing). So far, the difference between the original raw table and the module recreation (which isn't finished yet; it abuses preprocess in a few places for simplicity, which is bad for PEIS) is ~16% (2598 bytes raw vs 2172 bytes row builder). Is what I've implemented there the kind of solution you were looking for, and do these initial numbers look significant enough to be useful? Aidan9382 _(talk) 21:13, 14 October 2025 (UTC)

Oh, this sounds great, thank you! I am about to go look at those, but I didn't want to delay a moment in notifying WhatamIdoing (whose time zone I don't know), who I'm sure will be very interested in this, as it is a key piece of information needed to write a good RfC question, which we hope to do Friday. Thanks, Mathglot (talk) 22:43, 14 October 2025 (UTC)

Yes, thanks for the ping. (I'm in California.) WhatamIdoing (talk) 01:53, 15 October 2025 (UTC)

Housekeeping note: I realize I started a Talk page on a sandbox. This should really be at Module talk:RSPTest instead, with this page turned into a redirect, and I'll do that at some point. Mathglot (talk) 22:43, 14 October 2025 (UTC)

The numbers (your module and test cases sight unseen) will be very useful in contextualizing the RfC question for community input, regardless of how they turn out. They predict how long that solution might last before the table has to be revisited again for being near PEIS capacity, so they are useful whatever values come out of the process. Mathglot (talk) 22:47, 14 October 2025 (UTC)

@Mathglot and WhatamIdoing: I have probably the best estimate I can give for what a row builder would accomplish. Using 7 semi-randomly picked entries from table 5, the PEIS reduction between raw table (aka what is currently being used) and a mostly-complete row builder is roughly 18% (38,621 vs 31,656). Here are my test pages for the raw and row-built versions if you want to take a look yourself. Aidan9382 _(talk) 08:34, 15 October 2025 (UTC)

Aaron Liu, please see the wikicode at User:Aidan9382/sandbox3. As near as I can tell from your earlier comments, this is the kind of row builder you had in mind, for example in this comment. Do this and Aidan's results jibe with your expectations and/or testing?

Aidan9382, do you think it matters with respect to PEIS whether the table is generated by N stacked invocations of a module that builds one row per invocation, which seems to be your approach in User:Aidan9382/sandbox3, or by a single module call that loops internally to generate the table from a Lua data structure (or a JSON structure, using the loader you mentioned earlier) containing N rows, as in the prototype at Module:RSPTest? One thing that seems clear, and not surprising, is that avoiding a template call is key. For example, invoking the RSPTest module via {{#invoke:RSPTest|main}} with 61 rows from data_5 gives a PEIS of 33,626 bytes, whereas invoking it via the template {{RSPTest}} roughly doubles that, to 67,679. (edit conflict) Mathglot (talk) 10:00, 15 October 2025 (UTC)

My quick testing suggests that there is no difference between a fully in-module approach and standard usage, whether two rows are produced in one invocation or in two separate ones. That's because, as I have it set up, the input for every parameter is just raw wikitext, so the inputs themselves won't inflate PEIS. The only possible saving I've noticed so far from having one module run produce multiple rows is that we could avoid emitting the TemplateStyles for RSPSHORTCUT every time, which saves only 44 bytes per use. Aidan9382 _(talk) 10:47, 15 October 2025 (UTC)
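To put the 44-byte figure in perspective: if one invocation emits the RSPSHORTCUT TemplateStyles tag once instead of once per row, the saving grows linearly with the number of rows that use a shortcut. This back-of-envelope sketch assumes the saving is exactly 44 bytes for every use after the first. The row counts 61 (the data_5 test) and 500 are only illustrative.

```python
BYTES_PER_STYLES_TAG = 44  # per-use figure from the testing above (assumed constant)

def styles_saving(rows_with_shortcut):
    """PEIS bytes saved by emitting the styles tag once instead of once per row."""
    return BYTES_PER_STYLES_TAG * max(rows_with_shortcut - 1, 0)

for n in (61, 500):
    print(n, styles_saving(n))
```

For comparison, the post-expand include size limit is 2,097,152 bytes. Even at 500 shortcut rows, this saving is about 1% of that budget.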

@Mathglot: I did just have an alternative idea, which is something of a middle ground between the two approaches, and I'm curious whether it'd be relevant or usable here. I tested what would happen if I replaced all of the {{WP:RSPXYZ}} templates with pure Lua implementations but left the table in a raw format, and got some pretty nice numbers (~27-28% improvement, depending on how far you take it). The exact PEIS byte counts in my testing are 38,621 (current) vs 31,644 (row builder) vs 28,040 (Lua replacement) vs 27,786 (Lua replacement, even for listing discussions). The third of these four options would be by far the easiest to implement over the current table, if that's at all important here. Aidan9382 _(talk) 12:46, 16 October 2025 (UTC)
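For a side-by-side view, here is each variant's saving as a percentage of the current 38,621-byte baseline. The byte counts are the ones reported above. The variant labels are just descriptive names.

```python
BASELINE = 38_621  # current raw table, PEIS bytes

variants = {
    "row builder": 31_644,
    "Lua replacement": 28_040,
    "Lua replacement (incl. listing discussions)": 27_786,
}

def reduction_pct(baseline, new):
    """Percentage of PEIS saved relative to the baseline size."""
    return 100 * (baseline - new) / baseline

for name, size in variants.items():
    print(f"{name}: {reduction_pct(BASELINE, size):.1f}%")
```

This gives roughly 18.1%, 27.4%, and 28.1%. The same function applied to the earlier small test (2,598 vs 2,172 bytes) gives about 16.4%.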

Hi, Aidan. The new test is interesting, but I want to clarify: does this mean we would basically take an editing pass through the entire table, converting each RSP template call into a module invocation, which I guess could be one module with one entry point per previous template? So my understanding is that we'd convert {{WP:RSPSHORTCUT|WP:BBC}} ⟶ {{#invoke: RSProw|shortcut|WP:BBC}} and {{WP:RSPSTATUS|gr}} ⟶ {{#invoke: RSProw|status|gr}}, and so on for all the other RSP templates, and that "left the table in a raw format" means "apart from these template-to-module replacements, left the rest of the table as it is now"? And that this is the version with 28,040 bytes, vs 38,621 (current)? Mathglot (talk) 17:10, 16 October 2025 (UTC)
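The editing pass described here is mechanical enough to script. This minimal sketch assumes a hypothetical module named RSProw whose entry points are the lowercased template suffix (RSPSHORTCUT → shortcut, RSPSTATUS → status). It also assumes the template calls contain no nested templates. Calls that do contain nested templates are left untouched and would be better handled by a real wikitext parser such as mwparserfromhell.

```python
import re

# Matches {{WP:RSPXYZ|...}} with no braces inside the call.
# Assumption: RSP template names are all-uppercase after "WP:RSP".
RSP_CALL = re.compile(r"\{\{\s*WP:RSP([A-Z]+)\s*(\|[^{}]*)?\}\}")

def to_invoke(match, module="RSProw"):  # "RSProw" is a hypothetical module name
    entry_point = match.group(1).lower()  # e.g. RSPSHORTCUT -> "shortcut"
    args = match.group(2) or ""           # keeps the leading "|" when present
    return "{{#invoke:%s|%s%s}}" % (module, entry_point, args)

def convert(wikitext):
    """Replace each simple {{WP:RSP...}} call with an equivalent #invoke."""
    return RSP_CALL.sub(to_invoke, wikitext)

print(convert("| {{WP:RSPSTATUS|gr}} || {{WP:RSPSHORTCUT|WP:BBC}}"))
```

Running it over a copy of the table and diffing the result before saving would catch any calls the pattern misses.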

Yes, correct. You can check the source of the 28,040 permalink to see what exactly that implementation looks like. Aidan9382 _(talk) 17:20, 16 October 2025 (UTC)

And that improvement percentage comes from 100 − (100 × 28,040 / 38,621) ≈ 27.4, so, projecting that percentage into the future: if we have 500 rows at 99% of the PEIS limit now, this might gain us roughly 500 × 27.4% ≈ 137 more rows before we hit 99% again? (edit conflict) Mathglot (talk) 17:22, 16 October 2025 (UTC)

I think the maths might be the other way around, though frankly I never remember which way is correct. If I instead scale the current row count by the ratio of old to new size (500 × 38,621 / 28,040 ≈ 688.7 rows in total), I get an estimated 188 extra rows on top of the current 500, instead of 137. Aidan9382 _(talk) 17:31, 16 October 2025 (UTC)
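The two figures differ because the percentage saved is measured against the old size, while the number of rows that fit depends on the new per-row cost. Under the simplifying assumption that PEIS grows linearly with row count, a budget that holds N rows at the old cost holds N × old / new rows at the new cost. A quick check with the figures above:

```python
def extra_rows(current_rows, old_bytes, new_bytes):
    """Whole extra rows that fit in the same PEIS budget after a size reduction.

    Assumes PEIS scales linearly with the number of rows (a simplification).
    """
    capacity = current_rows * old_bytes // new_bytes  # floor: only whole rows fit
    return capacity - current_rows

def naive_extra_rows(current_rows, old_bytes, new_bytes):
    """The percentage-saved shortcut, which underestimates the headroom."""
    return current_rows * (old_bytes - new_bytes) / old_bytes

print(extra_rows(500, 38_621, 28_040))
print(round(naive_extra_rows(500, 38_621, 28_040)))
```

This prints 188 and 137, matching the two estimates above. For the row-builder figure (31,644 bytes), the same formula gives 110 extra rows.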

Oops! Thanks for that correction! Mathglot (talk) 18:04, 16 October 2025 (UTC)