Wikipedia:Bots/Requests for approval/DYK-Tools-Bot
Operator: RoySmith (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 00:35, Thursday, December 15, 2022 (UTC)
Function overview: A bot to assist in various tasks related to WP:DYK maintenance.
Automatic, Supervised, or Manual: Automatic
Programming language(s): Python
Source code available: https://github.com/roysmith/dyk-tools
Links to relevant discussions (where appropriate):
Edit period(s): Hourly
Estimated number of pages affected: Category:Pending DYK nominations currently has 268 entries
Namespace(s): Template
Exclusion compliant (Yes/No): No
Function details: This is a proposal for a new bot to help out at WP:DYK. A big part of the back-end work of DYK is building prep sets. Each set consists of 8 "hooks", which are chosen from those proposed in nominations. The selection of hooks needs to comply with an absurdly large number of rules. These rules include:
- The hook must be previously approved, indicated by a checkmark icon on the nomination template.
- Once approved, a hook can be unapproved by somebody raising an objection, requiring that it be re-approved
- If you are the author of a hook or have approved it, you can't promote it to a set yourself
- The first hook in a set must include an image (which in turn must be approved)
- Within a set, it is strongly discouraged to run two hooks that are biographies next to each other
- It is similarly strongly discouraged to run two hooks about American topics next to each other
- The total number of biography and/or American topic hooks in a set is capped
- Between sets, it is discouraged to have the lead hooks be of similar types
- Certain hooks are tagged to be run on particular dates
- And so on
In the current process, people building prep sets scan the list of pending hooks looking for ones that meet all the requirements. It would be good to have a tool which automates as much of this as possible and presents the human with a list of potential hooks that might fit a given slot. It would then be up to the human to confirm the suitability and pick from the suggestions presented (or ignore them completely).
A POC implementation of the evaluation system is currently running on toolforge. Source is available in github.
The next step is to repackage the nomination evaluation code as a bot which runs under cron on toolforge. This would:
- Run at some reasonable interval. Hourly seems like a good starting point. Based on some initial measurements, I estimate a run will take a couple of minutes to complete.
- Iterate over the articles in Category:Pending DYK nominations to find nominations to examine.
- For each unassessed nomination, evaluate it to determine if it's a biography and/or an American topic.
- Add Category:Pending DYK biographies and/or Category:Pending DYK American hooks to the nomination template as appropriate. The edit summary will include a link back to the bot's user page. A human can override the automatic assignments by adding or deleting classification templates manually.
- Keep track of which nominations it has processed so it doesn't keep reprocessing the same ones. Any nomination which already has any of the classification templates will be automatically skipped. Thus, if a human does a manual evaluation, the bot will never override the human.
- Iterate over Category:Pending DYK biographies and Category:Pending DYK American hooks to find any templates which are (no longer) in Category:Pending DYK nominations and remove the classification categories.
- An alternative would be to have the bot edit the {{DYKsubpage}} which is on every nomination, adding new parameters to indicate the categories. That would clean up the cats automatically when the {{DYKsubpage}} is substituted during the nomination close process.
- I'll implement some kind of emergency button so anybody can stop it if it goes haywire.
- Assert will be used to prevent logged-out editing (I need to figure out how that works in pywikibot).
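The steps above can be sketched as a single pass over the nominations. This is a minimal illustration, not the code in the dyk-tools repository: the `Nomination` class, the `run_once` function, and the `is_biography` / `is_american` flags are hypothetical in-memory stand-ins for the wiki pages and the real evaluation, and the `processed` set stands in for whatever state store is eventually used.

```python
from dataclasses import dataclass, field

PENDING = "Category:Pending DYK nominations"
BIOGRAPHY = "Category:Pending DYK biographies"
AMERICAN = "Category:Pending DYK American hooks"
CLASSIFICATIONS = {BIOGRAPHY, AMERICAN}


@dataclass
class Nomination:
    """Hypothetical in-memory stand-in for a nomination template page."""
    title: str
    categories: set = field(default_factory=set)
    is_biography: bool = False  # stand-in for the real evaluation
    is_american: bool = False   # stand-in for the real evaluation


def run_once(nominations, processed):
    """Do one pass and return the titles that were edited, in order."""
    edited = []
    for nom in nominations:
        if PENDING in nom.categories:
            # Skip anything already seen or already classified (e.g. by a
            # human), so a manual assessment is never overridden.
            if nom.title in processed or nom.categories & CLASSIFICATIONS:
                processed.add(nom.title)
                continue
            new = set()
            if nom.is_biography:
                new.add(BIOGRAPHY)
            if nom.is_american:
                new.add(AMERICAN)
            processed.add(nom.title)
            if new:
                nom.categories |= new
                edited.append(nom.title)
        elif nom.categories & CLASSIFICATIONS:
            # No longer pending: remove stale classification categories.
            nom.categories -= CLASSIFICATIONS
            edited.append(nom.title)
    return edited
```

Because `processed` is passed in rather than held globally, the same function works whether the state lives in memory for a test or is loaded from an external store between hourly runs.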
Future work will be to build a tool that a user can run (probably as part of the existing toolforge web service) to filter based on these categories and/or other criteria. I could also see additional classification categories being added in the future if needed.
The code that touches the wiki is pywikibot. The web app is Flask.
I don't anticipate the need to persist much data. What little bits of state I need, I'll probably use redis to keep things simple.
I've created User:DYK-Tools-Bot.
Discussion
So to clarify, this BRFA is about the addition and removal of Category:Pending DYK biographies and/or Category:Pending DYK American hooks to pages in Category:Pending DYK nominations? How does it make this assessment? I presume by the associated article/article talk containing certain categories (like biographies or america-related wikiprojects)? Or some other heuristic? ProcrastinatingReader (talk) 23:24, 15 December 2022 (UTC)
- The code is Article.is_biography() and Article.is_american(). The gist is:
- Biography: there's a birth year category or an infobox which descends from Category:People and person infobox templates.
- American: the word "american" appears in the intro, there's a category that ends in "in the united states", or there's a link anywhere in the article to a US State (or state-like area) page.
- These are probably not perfect, but they seem to be working. The heuristics can always be tweaked. Errors (in either direction) are not critical, since this is just an aid to a human who makes the final decision. -- RoySmith (talk) 23:48, 15 December 2022 (UTC)
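For readers who want a concrete picture of heuristics of this kind, here is a rough sketch. It is not the actual Article.is_biography() or Article.is_american() code: the `Article` fields, the birth-year category pattern, and the abbreviated `US_STATE_TITLES` set are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumed, heavily abbreviated list; a real check would cover every
# state and state-like area.
US_STATE_TITLES = {"Texas", "Ohio", "New York (state)", "Puerto Rico"}


@dataclass
class Article:
    """Hypothetical pre-parsed view of an article."""
    intro: str = ""
    categories: list = field(default_factory=list)
    links: list = field(default_factory=list)
    has_person_infobox: bool = False  # assumed to be computed elsewhere

    def is_biography(self):
        # Birth year categories are assumed to look like "1950 births".
        has_birth_year = any(
            re.fullmatch(r"\d+ births", c) for c in self.categories
        )
        return has_birth_year or self.has_person_infobox

    def is_american(self):
        if "american" in self.intro.lower():
            return True
        if any(c.lower().endswith("in the united states")
               for c in self.categories):
            return True
        return any(link in US_STATE_TITLES for link in self.links)
```

Note that a plain substring test on the intro also matches phrases such as "South American", which is one way a check like this can err on the side of false positives.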
- Will the bot also be differentiating between approved and non-approved noms, by the way? theleekycauldron (talk • contribs) (she/her) 03:06, 16 December 2022 (UTC)
- The existing code certainly has the ability to figure out if a nomination is approved. Ultimately I envision a front-end where you can say, for example, "Show me all the non-American biographies that are approved". But that's not something the bot part of this needs to know about when it's assigning categories. -- RoySmith (talk) 03:24, 16 December 2022 (UTC)
Approved for trial (7 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. ProcrastinatingReader (talk) 13:36, 16 December 2022 (UTC)
- OK, thanks. I haven't actually written the bot code yet; I assume the 7 days runs from whenever I turn it on? -- RoySmith (talk) 14:27, 16 December 2022 (UTC)
- Template:Did you know nominations/Rosa Diaz – fictional character articles are usually treated as biographical for the purposes of prep set building. theleekycauldron (talk • contribs) (she/her) 21:45, 18 December 2022 (UTC)
- I've done one full run, kicked off manually. Took longer than I expected:
2022-12-18 21:49:35,789 INFO dykbot Done. Processed 242 nominations in 0:40:22.708095
- but additional runs should take a lot less time since they will just be working on the new nominations. -- RoySmith (talk) 21:59, 18 December 2022 (UTC)
- I saw this on my watchlist. Roy, I think it would be best for soliciting feedback to have a link to this BRFA in both the bot's edit summaries and on its user page – I don't currently have anything to say on the task itself, but a user who wanted to would have to go to WP:BRFA (which is not linked on the user page) and find the specific subpage, which is not very accessible. Thanks, Sdrqaz (talk) 22:40, 18 December 2022 (UTC)
- Thanks for the suggestion. I've added a link to here from the bot's user page. I'll provide something better as things progress. -- RoySmith (talk) 00:40, 19 December 2022 (UTC)
- Users are breaking MOS:LISTGAP by continuing lists after {{DYK-Tools-Bot was here}} is placed without being integrated into the list. I would suggest either placing the template outside of {{DYKsubpage}} or integrating it into the template with a |DYK-Tools-Bot= parameter. Same goes for any relevant categories, although those could be made parameters of {{DYK-Tools-Bot was here}} as well. theleekycauldron (talk • contribs) (she/her) 08:42, 21 December 2022 (UTC)
- Yeah, I'll work on that, thanks. My template-fu is kind of weak, but I'll see what I can figure out. I'm not a fan of the whole "HTML comments as delimiters" thing; it really breaks the model of being able to parse wikitext in some structured way. -- RoySmith (talk) 14:03, 21 December 2022 (UTC)
- All right, works for me. Do you know of a way to sort these into the "Approved" and "Pending" categories as well? This could be as simple as checking whether it's transcluded to WP:DYKN or WP:DYKNA. Would be a huge help. theleekycauldron (talk • contribs) (she/her) 12:10, 22 December 2022 (UTC)
- There's already code which knows how to follow the chain of approvals and dis-approvals. I'm working on a fix for the issue you pointed out yesterday, I want to get that out the door before I look at other stuff. -- RoySmith (talk) 12:42, 22 December 2022 (UTC)
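The transclusion check suggested above could look roughly like the following. This is only a sketch under stated assumptions: it takes the raw wikitext of WP:DYKN and WP:DYKNA as plain strings, assumes nominations are transcluded as {{Did you know nominations/...}}, and the function names `transcluded_nominations` and `approval_status` are made up for illustration. It is not the existing approval-tracking code.

```python
import re

# Matches {{Did you know nominations/Some title}}, with either spaces or
# underscores in the template name, and captures the subpage name.
_TRANSCLUSION = re.compile(
    r"\{\{\s*(?:Template:)?Did[ _]you[ _]know[ _]nominations/([^|}]+?)\s*\}\}"
)


def transcluded_nominations(wikitext):
    """Return the set of nomination subpage names transcluded in wikitext."""
    return {
        m.group(1).replace("_", " ")
        for m in _TRANSCLUSION.finditer(wikitext)
    }


def approval_status(name, dykn_text, dykna_text):
    """Classify a nomination by which page transcludes it."""
    name = name.replace("_", " ")
    if name in transcluded_nominations(dykna_text):
        return "approved"
    if name in transcluded_nominations(dykn_text):
        return "pending"
    return "unknown"
```

A check like this only reflects where a nomination is currently listed; it does not follow the chain of approvals and objections within the nomination itself.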
- @Theleekycauldron How does Special:Diff/1129331559 look? Will that work? -- RoySmith (talk) 19:42, 24 December 2022 (UTC)
- @RoySmith: Sigh, my mistake – I was under the impression that {{DYKsubpage}} came with <noinclude> tags pre-installed. Now the categories are being transcluded onto WP:DYKN. I'd say probably your best bet is gonna be adding |american= and |biographical= templates to DYKsubpage. I'm happy to assist you with that, if you'd like :) theleekycauldron (talk • contribs) (she/her) 22:29, 24 December 2022 (UTC)
- I had an earlier version that wrapped the cats in noinclude tags, but I got rid of that in the latest go-round because it added a lot of complication to the code. I'm really hesitant to bury this in the DYKsubpage template because that will add its own layer of complication and cross-dependencies. What I'm thinking is {{Pending DYK biographies}} which would look something like:
<noinclude>Category:Pending DYK biographies</noinclude>
- but I haven't been able to find the right combination of tags that would let the category apply to the Template:Did you know nominations/... page, but not to the pages that include that. I'd certainly appreciate help figuring that out. -- RoySmith (talk) 22:43, 24 December 2022 (UTC)
- @Theleekycauldron OK, I think I've got this figured out. Take a look at:
- {{Did_you_know_nominations/East_Germany–Zanzibar_relations}} is in Category:Pending DYK biographies; the other two, which transclude the first, are not in the category. From my coding point of view, all the bot needs to do is add or remove {{Pending DYK biographies}}, so the code is relatively clean. Will that work for you?
- With only a small amount of encouragement, I could go off on a frothing-at-the-mouth rant about MediaWiki markup language, but I'll behave myself. -- RoySmith (talk) 02:11, 25 December 2022 (UTC)
- @RoySmith: Okay, I was definitely very wrong! Seems that it absolutely needs to go inside the {{DYKsubpage}} template, because otherwise the note persists after the nomination is closed. Other than that (and please please pretty please a category for noms transcluded to WP:DYKNA), looks good to me! theleekycauldron (talk • contribs) (she/her) 11:03, 25 December 2022 (UTC)
- @Theleekycauldron Let's take a step back. What is it that you actually are concerned about which moving the categories inside or outside {{DYKsubpage}} will solve? -- RoySmith (talk) 14:59, 25 December 2022 (UTC)
- Related question: is there some documentation which describes how the morass of DYK templates are supposed to work? Looking at the source for {{DYKsubpage}} I can't make heads or tails of what's supposed to be happening. Specifically, I've been trying to dig my way down to where the "(Review or comment . Article history)" text is produced and can't find it. -- RoySmith (talk) 15:40, 25 December 2022 (UTC)
- Ugh, the problem was that it's not in a template at all. It's in Module:DYK nompage links. -- RoySmith (talk) 15:46, 25 December 2022 (UTC)
@RoySmith: I'm concerned because when you have a list that looks like this:
: something
:: something
::: something
:::: something
{{DYK-Tools-Bot was here}}
::::: something
we end up with a LISTGAP problem for screenreaders. Moving it outside {{DYKsubpage}} would theoretically prevent users from talking around it. theleekycauldron (talk • contribs) (she/her) 06:18, 26 December 2022 (UTC)
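A pattern like the one described above can be spotted mechanically. The sketch below is a simplified illustration, not part of the bot: the helper names `is_list_item` and `find_list_gaps` are hypothetical, and it only flags a single non-list line (including a blank line) sandwiched between two list items.

```python
def is_list_item(line):
    """True if the line starts with a wikitext list marker."""
    return line.startswith((":", "*", "#", ";"))


def find_list_gaps(wikitext):
    """Return 1-based line numbers of non-list lines between list items."""
    lines = wikitext.splitlines()
    return [
        i + 1
        for i in range(1, len(lines) - 1)
        if not is_list_item(lines[i])
        and is_list_item(lines[i - 1])
        and is_list_item(lines[i + 1])
    ]
```

Run against a thread where a template sits on its own unindented line between two indented replies, this reports the line holding the template.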