Template talk:Section sizes
Exclude references section in articles with list-defined refs
On articles such as Tapir!, which have list-defined references, the References section is far larger than other sections simply because it contains the wikitext for all references, which makes the fill colour for other sections almost indistinguishable in some cases (because the references section is so much larger). Would it be possible to detect if an article has the {{Use list-defined references}} tag (or have a parameter in this template for it) and ignore the size of the references section if so? Suntooooth, it/he (talk/contribs) 17:08, 3 January 2025 (UTC)
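One way the request above could work is sketched below in Python (not the module's Lua). It assumes the fill is scaled relative to the largest section; the function name `fill_fractions`, the section title "References", the tag-matching rule, and the way sizes are passed in are all hypothetical and are not taken from Module:Section sizes.

```python
import re

# Matches the {{Use list-defined references}} tag, allowing optional
# parameters such as |date=...; this matching rule is an assumption.
LDR_TAG = re.compile(
    r"\{\{\s*Use list-defined references\s*(\|[^{}]*)?\}\}", re.IGNORECASE
)


def fill_fractions(section_sizes: dict, wikitext: str,
                   refs_title: str = "References") -> dict:
    """Return each section's size as a fraction of the largest counted section.

    If the article carries the list-defined-references tag, the references
    section is left out when picking the largest section, so it no longer
    dwarfs the others. Every fraction is capped at 1.0.
    """
    skip_refs = bool(LDR_TAG.search(wikitext))
    counted = [size for title, size in section_sizes.items()
               if not (skip_refs and title == refs_title)]
    largest = max(counted, default=0)
    if largest == 0:
        return {title: 0.0 for title in section_sizes}
    return {title: min(size / largest, 1.0)
            for title, size in section_sizes.items()}


# Hypothetical byte sizes, for illustration only.
sizes = {"Lead": 2000, "Plot": 4000, "References": 40000}
with_tag = fill_fractions(
    sizes, "{{Use list-defined references|date=January 2025}}\nText")
without_tag = fill_fractions(sizes, "Text")
print(with_tag)
print(without_tag)
```

With the tag present, the other sections are scaled against each other instead of against the references section, so their fill levels become distinguishable.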
Added prose size columns
Following several requests from this talk page (from Chipmunkdavis, Tpbradbury, Femke and others), I added the prose size count to this template. The modified code is at Module:Sandbox/Ita140188/Section sizes. Here is an example:
If there is consensus to add this, I can move the new code to this module. Let me know what you think. --Ita140188 (talk) 14:09, 28 August 2025 (UTC)
- Thanks, looking through the example it seems like it would be really helpful. It's easy to see how the byte size and prose don't correlate very strongly, 7,338->6,317 in the same article as 6,347->3,494. CMD (talk) 14:23, 28 August 2025 (UTC)
- Thanks for the work! Prose size is still quite a technical quantity. Is it possible to have an option to only display word count? Or would that be too difficult to implement? —Femke 🐦 (talk) 14:39, 28 August 2025 (UTC)
- That's a good idea, and it could be added later. It should not be too hard. I implemented the prose size as this was the original request. Should I update this with the prose size for now, and add the word count later? Ita140188 (talk) 12:16, 29 August 2025 (UTC)
- ..Or should I change this now to display the word count only? This would be very easy to do (no need to add any extra fields) Ita140188 (talk) 12:18, 29 August 2025 (UTC)
- Here is the version with word count instead of bytes (I also changed the colors to avoid confusion):
- --Ita140188 (talk) 12:34, 29 August 2025 (UTC)
- thanks very much this is really useful, pls switch on in the template? would it be possible to have the option in the code of switching off the byte count as it's often less useful? Tom B (talk) 22:33, 29 August 2025 (UTC)
- This is fantastic, thanks Ita140188. I'd change the heading to say "Word count" rather than "Prose size", as that's the standard term for this. I'd love for the option to turn off byte count, or even for the default to be word counts only! —Femke 🐦 (talk) 09:06, 30 August 2025 (UTC)
- Hi @Ita140188, cc @Femke, I've noticed a discrepancy. The honours article you're using is a list article. The page size tool says, "Prose size (text only): 3379 B (496 words) "readable prose size"" i.e. about 500 words. But your tool says 35,000 words as it counts all the words in the bullets. There will be a smaller discrepancy for non-list articles. As the section sizes template is mainly used to manage readable prose size, it really needs to be brought into alignment with the existing page size tool, please? For the same reason please add a total that excludes the word count in references, which aren't included in prose size? Tom B (talk) 09:25, 30 August 2025 (UTC)
- Updated. There is still ~100 words discrepancy, I'm not sure where it's from, but I think it's acceptable for this use case. As for switching off the bytes count, I don't think it's worth the effort, you can just ignore it if not useful. It may be useful to other users. And for changing the heading to "word count", the problem is that it's really only counting prose words, without lists, captions, etc. Leaving only "word count" would be confusing I think. Ita140188 (talk) 01:54, 31 August 2025 (UTC)
- @Ita140188 This change seems to have caused pages to return generic script errors instead of the correct error messages. I believe I fixed it here, but you might want to take a closer look. --Ahecht (TALK PAGE) 01:34, 4 September 2025 (UTC)
- Thank you for catching this. Can you point me to examples of this happening? Ita140188 (talk) 03:13, 4 September 2025 (UTC)
- @Ita140188 There were about 300 pages added to Category:Pages with script errors, but I cleaned them up after making that change above. I also discovered that your version of the script was calculating the prose sizes even in the section_size_get() function, which doesn't actually use the prose size in any way. This increased its runtime by a factor of two and caused pages like Wikipedia:WikiProject Countries/Section size/Africa to fail. I refactored the code to not get the prose size in that function, and added a |getprose= parameter that can be used to override fetching the prose for the size() function. --Ahecht (TALK PAGE) 04:06, 4 September 2025 (UTC)
- Thank you so much. Sorry I was not aware of this use case for this module. I should have investigated better. Ita140188 (talk) 04:25, 4 September 2025 (UTC)
- @Ita140188, thank you, the tool is getting closer and closer. But i found a discrepancy in the first article i tried, Korean War: section sizes says it has 11,000 words, but the page size tool on the article itself says 15,000. LeBron James has 15,100 and 16,300. Vietnam War has 10,600 and 14,300. Please have a look? Tom B (talk) 13:58, 5 September 2025 (UTC)
- Yeah I noticed these discrepancies but I'm not sure where they are coming from. We need to create some test articles and try to get as close as possible (or port the javascript code exactly to Lua), but unfortunately I have no time right now Ita140188 (talk) 14:11, 5 September 2025 (UTC)
- Ok @Ita140188. For example in Korean War, this section sizes tool says the lead is 190 prose size, Names section 130 and Background 2,000. I used Wordcounter and got: 550, 205 and 2,735. So the most obvious place to spot the error could be in the lead section. Maybe @Ahecht or @Femke, can see the error in the code please? Tom B (talk) 14:32, 5 September 2025 (UTC)
- @Ita140188, Tpbradbury: There was a bug that caused it to skip text if a self-closing reference (e.g. <ref name="foo"/>) was used. I fixed that, bringing the count for the top section to 583 and the "Names" section to 182. The count in the "Names" section is going to be lower than Wordcounter because the module is processing the unparsed text and excluding all templates, whereas pasting into Wordcounter includes all the transliterations produced by templates. The issue with the lead section will be harder to fix, as it looks like the module didn't correctly remove the infobox and is counting portions of it. I don't think that will be as straightforward to fix. --Ahecht (TALK PAGE) 21:38, 5 September 2025 (UTC)
- Fixed the issue with nested templates. Now it's 547 and 182 words. With the old code, if you had {{A|{{B}} or {{C}}}}, the pattern {{.-}} was going to match {{A|{{B}}, leaving behind or {{C}}}}. I fixed it here by making sure we only remove templates without any nested ones --Ahecht (TALK PAGE) 22:39, 5 September 2025 (UTC)
- thanks very much, that looked like it closed the gap but there's an error now it says, "Lua error in Module:Section_sizes at line 131: assign to undeclared variable 'n'." Tom B (talk) 22:44, 5 September 2025 (UTC)
- @Tpbradbury Fixed --Ahecht (TALK PAGE) 00:32, 6 September 2025 (UTC)
- Thanks, a useful and welcome expansion. Cheers, · · · Peter Southwood (talk): 13:17, 5 September 2025 (UTC)
- Test cases, please; looks great, but there is no hurry; let's vet this first. Mathglot (talk) 02:08, 6 September 2025 (UTC)
- I just noticed the addition of this on WT:RSN and the prose numbers don't look right. Am I missing something? -- LCU ActivelyDisinterested «@» °∆t° 21:51, 6 September 2025 (UTC)
- As an example, the byte count for WP:RSN#this section on RSN is 11,596 but it's giving the word count as 21, which can't be right unless it's not counting what I expect it to be counting. -- LCU ActivelyDisinterested «@» °∆t° 21:57, 6 September 2025 (UTC)
- This tool's aim is to replicate the count in MediaWiki:Gadget-Prosesize.js, which gives approximately the same count for words in that page (2508 vs. 2345 for this module). The count is meant to only measure prose, excluding lists, text in templates, etc. so not necessarily useful for non-mainspace articles Ita140188 (talk) 14:13, 7 September 2025 (UTC)
- That's all understood, but it's also very wrong. -- LCU ActivelyDisinterested «@» °∆t° 14:56, 7 September 2025 (UTC)
- If it's not meant for use outside mainspace, is there some way to suppress it, rather than having very incorrect numbers displayed? -- LCU ActivelyDisinterested «@» °∆t° 14:57, 7 September 2025 (UTC)
- Seriously, the figures at WT:RSN are just nonsense, and are just a distraction. Having a way of only showing the byte size in this situation would be very helpful. -- LCU ActivelyDisinterested «@» °∆t° 18:04, 12 September 2025 (UTC)
- You can just ignore the prose size if you are not interested in that particular metric Ita140188 (talk) 01:19, 14 September 2025 (UTC)
- I know I can ignore it, but it takes up a large part of the chart and contains incorrect data. Suppressing it would therefore be useful in this situation. -- LCU ActivelyDisinterested «@» °∆t° 07:12, 14 September 2025 (UTC)
- It may not be relevant to your specific use case, but it's not incorrect. It is what it says it is: prose size. If the article is made up exclusively of lists or templates, that will not be counted, per definition (WP:RPS) Ita140188 (talk) 11:56, 14 September 2025 (UTC)
- The figures that are being displayed on the RSN (not RSP) talk page are wrong. Templates and lists can't make up for the discrepancy. -- LCU ActivelyDisinterested «@» °∆t° 18:25, 14 September 2025 (UTC)
- Can you point to a specific section that has a discrepancy that cannot be explained by removing lists, templates, tables, and other non-prose structures? I am open to fix any bug in the calculation of course. I think the calculation is mostly correct since it agrees with the prosesize tool within ~5%, but there still may be problems of course (and surely there are minor discrepancies since the two are not exactly the same). Ita140188 (talk) 06:58, 15 September 2025 (UTC)
- The first section on RSN has an apparent prose size of 33 words, this includes all subsections. Have a look at WP:RSN#RfC: Channel NewsAsia (CNA) and other Mediacorp-affiliated media and see if you agree. Or for a later example how about WP:RSN#RT exception that it states has a prose size of 90 words. It may be that the prosesize tool would agree with these figures, but that just means it doesn't work correctly in this case and so shouldn't be used (displayed) for this use case. -- LCU ActivelyDisinterested «@» °∆t° 13:34, 15 September 2025 (UTC)
- Seems correct to me. The only thing that should be counted is the first line and the first signature, which indeed are exactly 33 words. The rest of the section are lists (they start with *) which should not be counted as prose Ita140188 (talk) 15:58, 15 September 2025 (UTC)
- In this case they obviously should be counted, or the details are just worthless. If that's outside the prosesize use case that's fine, but then it doesn't need to be shown in this case. -- LCU ActivelyDisinterested «@» °∆t° 16:55, 15 September 2025 (UTC)
- As I said before, this template is meant to be used on mainspace articles, not other namespaces, since the requirements are different. You are free to use it somewhere else, but that's not what it's designed to do. If you are interested in counting words for specific purposes outside of this scope (for example, counting number of words in lists too), you can always create another module based on this one that counts words according to your own definition. It should not be hard to modify the code in this way. Ita140188 (talk) 17:01, 15 September 2025 (UTC)
- Ok nevermind, it's as simple as using |getprose=no [1], which was all it would have taken. Maybe it should be added to the documentation. -- LCU ActivelyDisinterested «@» °∆t° 17:02, 15 September 2025 (UTC)
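The nested-template problem described in this thread can be illustrated with a minimal sketch. It is written in Python rather than the module's Lua, so it is not the code in Module:Section sizes: the Lua pattern {{.-}} is approximated by a non-greedy regular expression, and the function names are hypothetical. The second function follows the idea of removing only templates that contain no nested ones, repeated until nothing is left to remove.

```python
import re

# Naive approach: shortest match from "{{" to the first "}}".
# This approximates the Lua pattern {{.-}} quoted in the discussion.
NAIVE = re.compile(r"\{\{.*?\}\}", re.DOTALL)

# Innermost-only approach: a template whose body contains no braces,
# i.e. a template without any nested template inside it.
INNERMOST = re.compile(r"\{\{[^{}]*\}\}")


def strip_templates_naive(text: str) -> str:
    """Remove templates with a single non-greedy pass (breaks on nesting)."""
    return NAIVE.sub("", text)


def strip_templates_nested(text: str) -> str:
    """Remove templates from the inside out, one nesting level per pass."""
    while True:
        text, count = INNERMOST.subn("", text)
        if count == 0:
            return text


sample = "Before {{A|{{B}} or {{C}}}} after"
print(repr(strip_templates_naive(sample)))   # stray "or" and "}}" remain
print(repr(strip_templates_nested(sample)))  # whole outer template removed
```

The naive pass stops at the first closing braces it finds, so part of the outer template's content and its closing braces are left in the text and would be counted as prose. Peeling innermost templates first avoids that, at the cost of one extra pass per nesting level.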
Edit request: roll back prose size changes
This edit request to Module:Section sizes has been answered. Set the |answered= parameter to no to reactivate your request.
Please roll back to revision 1281963968 of 15:09, 23 March 2025. Prose size values are off by almost an order of magnitude. See for example
which returns 2,414 total prose words for the #18 longest page (601kb). Mathglot (talk) 07:54, 27 September 2025 (UTC)
- Not an error: this is in line with the prose size calculated with the prosesize tool, which returns 2311 words. Ita140188 (talk) 19:18, 29 September 2025 (UTC)
- Note that tables and lists are not included in the prose size, which is the reason why the byte count and prose word count are so different for this page. Ita140188 (talk) 19:19, 29 September 2025 (UTC)
- Yes, I know about non-inclusion of tables and lists in the tool, but that is not the point. The point is what is best for the users of this module, and this change makes it worse.
- The prose size tool calculates 2,414 words for the #18 longest page, whereas in reality it has 25,054 words (according to my text editor count, counting text on the rendered page and excluding ToC, appendixes, and all rendered citation words; but including tables, lists, and captions). That is greater than a 10x discrepancy. This makes the count from the tool useless (or worse: misleading). Compare that article with, say, Transport in Iran, which per my text editor is 3,541 actual words: any reasonable person eyeballing the Iran article and Foreign relations of the United Kingdom side-by-side might guess that the latter was about ten times bigger; a reasonable guess. In prose size, the tool counts Iran as 1,526 words, or 2/3 as large as the UK article; a perfectly ridiculous and misleading value, useless for any real-world calculation that has any basis in reality.
- The point is that the addition of prose size to {{section sizes}} is in no way an improvement to the section sizes tool, nor a help to editors wondering whether to split an article or not. Let's be honest here: labeling Foreign relations of the United Kingdom as 2,414 words long—whether you call them prose words, Klingon words, or something else—is not remotely helpful to anybody for anything, when there are short articles that are greater than that that need to be greatly expanded. Please understand that I am a big fan of the {{section sizes}} module, and have promoted its usage in discussions, and would like to see it have a word count metric that has value for readers, but this is not it; it is the opposite: it is misleading, and it makes things worse. Please remove it, and get consensus for it first. I do not believe that the previous discussion represents consensus, unless participants can confirm that they are aware that prose size often counts small articles like the Iran article as larger than some of the top 25 longest pages in Wikipedia, and that they are okay with including prose size in {{section sizes}} with that understanding. Mathglot (talk) 20:18, 29 September 2025 (UTC)
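The gap between the two kinds of count discussed in this section can be shown with a rough sketch in Python. It is not the algorithm used by the module or by the prosesize gadget: the list of line prefixes treated as non-prose, the function name, and the sample text are all assumptions made for illustration.

```python
# Wikitext line prefixes treated as non-prose in this sketch (an assumption):
# "*" and "#" list items, ";" and ":" definition or indented lines,
# "{|", "|" and "!" table markup, and "=" headings.
NON_PROSE_PREFIXES = ("*", "#", ";", ":", "{|", "|", "!", "=")


def count_words(wikitext: str) -> tuple:
    """Return (prose_words, all_words) for a block of wikitext.

    Words are whitespace-separated tokens, so markup tokens such as "*"
    are included in the overall total.
    """
    prose = 0
    total = 0
    for line in wikitext.splitlines():
        words = len(line.split())
        total += words
        if not line.lstrip().startswith(NON_PROSE_PREFIXES):
            prose += words
    return prose, total


# Made-up sample: one prose sentence followed by a short list.
sample = """The country maintains relations with many states.
* France: embassy in Paris
* Japan: embassy in Tokyo
* Brazil: embassy in Brasilia
"""
prose_words, all_words = count_words(sample)
print(prose_words, all_words)
```

On a page that consists mostly of lists and tables, the first number stays small while the second grows with every row, which is the kind of divergence both sides of this discussion are describing.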
French comment?
There's a comment on line 440 that appears to be in French (I think)? Would someone who understands French mind replacing that with its English translation? jcgoble3 (talk) 02:55, 28 October 2025 (UTC)