Wikipedia:Bots/Requests for approval/DYK-Tools-Bot

This is an old revision of this page, as edited by RoySmith (talk | contribs) at 15:06, 26 December 2022 (Discussion: Reply). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Operator: RoySmith (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 00:35, Thursday, December 15, 2022 (UTC)

Function overview: A bot to assist in various tasks related to WP:DYK maintenance.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: https://github.com/roysmith/dyk-tools

Links to relevant discussions (where appropriate):

Edit period(s): Hourly

Estimated number of pages affected: Category:Pending DYK nominations currently has 268 entries

Namespace(s): Template

Exclusion compliant (Yes/No): No

Function details: This is a proposal for a new bot to help out at WP:DYK. A big part of the back-end work of DYK is building prep sets. Each set consists of 8 "hooks", which are chosen from those proposed in nominations. The selection of hooks needs to comply with an absurdly large number of rules. These rules include:

  • The hook must be previously approved, indicated by a checkmark icon on the nomination template.
  • Once approved, a hook can be unapproved by somebody raising an objection, requiring that it be re-approved
  • If you are the author of a hook or have approved it, you can't promote it to a set yourself
  • The first hook in a set must include an image (which in turn must be approved)
  • Within a set, it is strongly discouraged to run two hooks that are biographies next to each other
  • It is similarly strongly discouraged to run two hooks about American topics next to each other
  • The total number of biography and/or American topic hooks in a set is capped
  • Between sets, it is discouraged to have the lead hooks be of similar types
  • Certain hooks are tagged to be run on particular dates
  • And so on
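A few of the rules above are mechanical enough to sketch in code. The following is an illustrative sketch only, not the code in the repository: the `Hook` class, the `check_prep_set` function, and the cap defaults are all hypothetical, and it assumes separate caps for biographies and American topics, which the rules above do not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hook:
    """Hypothetical record for one hook; field names are assumptions."""
    title: str
    is_biography: bool = False
    is_american: bool = False
    has_image: bool = False


def check_prep_set(hooks, max_biographies=4, max_american=4):
    """Return warnings for a proposed prep set; an empty list means no rule tripped.

    The cap defaults are placeholders, not the real DYK limits.
    """
    warnings = []
    # The lead hook (first slot) must carry an image.
    if hooks and not hooks[0].has_image:
        warnings.append("lead hook has no image")
    # Compare each hook with the one that follows it.
    for prev, cur in zip(hooks, hooks[1:]):
        if prev.is_biography and cur.is_biography:
            warnings.append(f"adjacent biographies: {prev.title} / {cur.title}")
        if prev.is_american and cur.is_american:
            warnings.append(f"adjacent American topics: {prev.title} / {cur.title}")
    # Per-set caps (assumed to be counted separately).
    if sum(h.is_biography for h in hooks) > max_biographies:
        warnings.append("too many biographies")
    if sum(h.is_american for h in hooks) > max_american:
        warnings.append("too many American topics")
    return warnings
```

The function only reports warnings; since most of these rules are "strongly discouraged" rather than forbidden, the final call stays with the human building the set.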

In the current process, people building prep sets scan the list of pending hooks looking for ones that meet all the requirements. It would be good to have a tool which automates as much of this as possible and presents to the human a list of potential hooks that might fit a given slot. It would then be up to the human to confirm the suitability and pick from the suggestions presented (or ignore them completely).

A proof-of-concept (POC) implementation of the evaluation system is currently running on Toolforge. Source is available on GitHub.

The next step is to repackage the nomination evaluation code as a bot which runs under cron on toolforge. This would:

  • Run at some reasonable interval. Hourly seems like a good starting point. Based on some initial measurements, I estimate a run will take a couple of minutes to complete.
  • Iterate over the articles in Category:Pending DYK nominations to find nominations to examine.
  • For each unassessed nomination, evaluate it to determine if it's a biography and/or an American topic.
  • Add Category:Pending DYK biographies and/or Category:Pending DYK American hooks to the nomination template as appropriate. The edit summary will include a link back to the bot's user page. A human can override the automatic assignments by adding or deleting classification templates manually.
  • Keep track of which nominations it has processed so it doesn't keep reprocessing the same ones. Any nomination which already has any of the classification templates will be automatically skipped. Thus, if a human does a manual evaluation, the bot will never override the human.
  • Iterate over Category:Pending DYK biographies and Category:Pending DYK American hooks to find any templates which are (no longer) in Category:Pending DYK nominations and remove the classification categories.
    • An alternative would be to have the bot edit the {{DYKsubpage}} which is on every nomination, adding new parameters to indicate the categories. That would clean up the cats automatically when the {{DYKsubpage}} is processed during the nomination close process.
  • I'll implement some kind of emergency button so anybody can stop it if it goes haywire.
  • Assert will be used to prevent logged-out editing (I need to figure out how that works in pywikibot).
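The per-run logic described in the steps above can be sketched without touching the wiki. This is a minimal illustration under stated assumptions, not the bot's actual code: `plan_edits`, its parameters, and the in-memory `seen` set are hypothetical stand-ins (a real run would read the categories through pywikibot and persist `seen` somewhere durable), and `classify` stands in for the biography/American evaluation.

```python
BIO_CAT = "Category:Pending DYK biographies"
US_CAT = "Category:Pending DYK American hooks"
CLASSIFICATION_CATS = {BIO_CAT, US_CAT}


def plan_edits(pending, tagged, classify, seen):
    """Decide what one run would change, without performing any edits.

    pending:  dict mapping each title in Category:Pending DYK nominations
              to the set of classification categories already on it
    tagged:   set of titles currently in either classification category
    classify: callable, title -> (is_biography, is_american)  [hypothetical]
    seen:     mutable set of titles evaluated on earlier runs
    Returns (to_add, to_clean): categories to add per title, and titles
    whose classification categories should be removed.
    """
    to_add = {}
    for title, existing in pending.items():
        if title in seen:
            continue  # already evaluated on an earlier run
        if existing & CLASSIFICATION_CATS:
            continue  # classified already (possibly by a human); never override
        is_bio, is_american = classify(title)
        seen.add(title)
        cats = [cat for cat, flag in ((BIO_CAT, is_bio), (US_CAT, is_american)) if flag]
        if cats:
            to_add[title] = cats
    # Anything still tagged but no longer pending needs its categories removed.
    to_clean = sorted(title for title in tagged if title not in pending)
    return to_add, to_clean
```

Separating the planning step from the editing step keeps the decision logic testable offline, and the `seen` set covers nominations that are neither biographies nor American topics, which would otherwise be re-evaluated on every run because they never receive a category.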

Future work will be to build a tool that a user can run (probably as part of the existing toolforge web service) to filter based on these categories and/or other criteria. I could also see additional classification categories being added in the future if needed.

The code that touches the wiki is pywikibot. The web app is Flask.

I don't anticipate the need to persist much data. For what little state I need, I'll probably use Redis to keep things simple.

I've created User:DYK-Tools-Bot.

Discussion

So to clarify, this BRFA is about the addition and removal of Category:Pending DYK biographies and/or Category:Pending DYK American hooks to pages in Category:Pending DYK nominations? How does it make this assessment? I presume by the associated article/article talk containing certain categories (like biographies or america-related wikiprojects)? Or some other heuristic? ProcrastinatingReader (talk) 23:24, 15 December 2022 (UTC)

The code is Article.is_biography() and Article.is_american(). The gist is:
These are probably not perfect, but they seem to be working. The heuristics can always be tweaked. Errors (in either direction) are not critical, since this is just an aid to a human who makes the final decision. -- RoySmith (talk) 23:48, 15 December 2022 (UTC)
Will the bot also be differentiating between approved and non-approved noms, by the way? theleekycauldron (talk · contribs) (she/her) 03:06, 16 December 2022 (UTC)
The existing code certainly has the ability to figure out if a nomination is approved. Ultimately I envision a front-end where you can say, for example, "Show me all the non-American biographies that are approved". But that's not something the bot part of this needs to know about when it's assigning categories. -- RoySmith (talk) 03:24, 16 December 2022 (UTC)
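For illustration, a query like "all the non-American biographies that are approved" could look roughly like the sketch below. The `Nomination` record and the `select` function are hypothetical names invented for this example; nothing here reflects the actual front-end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nomination:
    """Hypothetical summary of one nomination; field names are assumptions."""
    title: str
    approved: bool
    is_biography: bool
    is_american: bool


def select(nominations, *, approved=None, biography=None, american=None):
    """Filter nominations; a criterion left as None means "don't care"."""
    def matches(value, wanted):
        return wanted is None or value == wanted

    return [
        n for n in nominations
        if matches(n.approved, approved)
        and matches(n.is_biography, biography)
        and matches(n.is_american, american)
    ]


# "Show me all the non-American biographies that are approved":
# select(noms, approved=True, biography=True, american=False)
```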

Approved for trial (7 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. ProcrastinatingReader (talk) 13:36, 16 December 2022 (UTC)

OK, thanks. I haven't actually written the bot code yet; I assume the 7 days runs from whenever I turn it on? -- RoySmith (talk) 14:27, 16 December 2022 (UTC)
Yeah ProcrastinatingReader (talk) 16:52, 16 December 2022 (UTC)
2022-12-18 21:49:35,789 INFO dykbot Done. Processed 242 nominations in 0:40:22.708095
but additional runs should take a lot less time since they will just be working on the new nominations. -- RoySmith (talk) 21:59, 18 December 2022 (UTC)

@RoySmith: I'm concerned because when you have a list that looks like this:

: something
:: something
::: something
:::: something
{{DYK-Tools-Bot was here}}
::::: something

we end up with a LISTGAP problem for screenreaders. Moving it outside {{DYKsubpage}} would theoretically prevent users from talking around it. theleekycauldron (talk · contribs) (she/her) 06:18, 26 December 2022 (UTC)

That makes sense. But it's at odds with your last statement, "Seems that it absolutely needs to go inside the DYKsubpage template". -- RoySmith (talk) 15:06, 26 December 2022 (UTC)
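The LISTGAP concern raised above can be checked mechanically. The following is a rough sketch, not part of the bot: `find_list_gaps` is a hypothetical helper that flags any line without a list marker (`:`, `*`, or `#`) sitting directly between two list lines, which is the pattern in the example thread.

```python
import re

# A wikitext list line starts with one or more of these markers.
LIST_PREFIX = re.compile(r"^[:*#]+")


def find_list_gaps(wikitext):
    """Return 1-based line numbers of non-list lines sandwiched between list lines."""
    lines = wikitext.splitlines()
    gaps = []
    for i in range(1, len(lines) - 1):
        if LIST_PREFIX.match(lines[i]):
            continue  # this line is itself part of a list
        if LIST_PREFIX.match(lines[i - 1]) and LIST_PREFIX.match(lines[i + 1]):
            gaps.append(i + 1)
    return gaps
```

Blank lines between list items are flagged too, since they split the list in the same way as a stray template line.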