Wikipedia:Moderator Tools/Automoderator
The Moderator Tools team is exploring a project to build an 'automoderator' tool for Wikimedia projects. It would allow moderators to configure automated prevention or reversion of bad edits based on scoring from a machine learning model. In simpler terms, we're looking to build software which performs a similar function to ClueBot NG, but to make it available to all language communities.
Our hypothesis is: If we enable communities to automatically prevent or revert obvious vandalism, moderators will have more time to spend on other activities.
We will be researching and exploring this idea during the rest of 2023, and expect to be able to start engineering work by early 2024.
Further details and centralised discussion can be found on MediaWiki.org, but we also wanted to create a discussion venue on the English Wikipedia to discuss how Automoderator might be used here, particularly given the existence of ClueBot NG. Below you'll find a summary of the MediaWiki.org overview, as well as some English Wikipedia-specific questions we have.
Summary
A substantial number of edits made to Wikimedia projects should unambiguously be undone, reverting a page to its previous state. Patrollers and administrators have to spend a lot of time manually reviewing and reverting these edits, which contributes to a feeling on many larger wikis that the amount of work requiring attention is overwhelming compared to the number of active moderators. We would like to reduce these burdens, freeing up moderator time to work on other tasks.
Our goals are:
- Reduce moderation backlogs by preventing bad edits from entering patroller queues.
- Give moderators confidence that automoderation is reliable and is not producing a significant number of false positives.
- Ensure that editors caught by a false positive have clear avenues to flag the error and have their edit reinstated.
Potential solution

We are envisioning a tool which a community's moderators could configure to automatically prevent or revert edits. Reverting edits is the more likely scenario: preventing an edit would require very fast scoring so as not to impact edit save times, and prevention also provides less oversight of which edits are being blocked, which may not be desirable, especially with respect to false positives. Moderators should be able to configure whether the tool is active, and have options for how strict the model should be.
A lower threshold would mean more edits are reverted but with a higher false positive rate, while a higher threshold would revert fewer edits with greater confidence.
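To make the trade-off concrete, below is a minimal sketch in Python of how a threshold-based revert decision might work. The configuration keys, score source, and threshold values are hypothetical illustrations rather than part of any agreed Automoderator design; the only assumption is that a machine learning model produces a score between 0 and 1 estimating how likely an edit is to be bad.

```python
# Hypothetical sketch only: the configuration format and thresholds below
# are illustrative and do not reflect an actual Automoderator design.

# Example per-wiki configuration that local moderators might control.
CONFIG = {
    "enabled": True,
    # A higher threshold reverts fewer edits, with fewer false positives;
    # a lower threshold reverts more edits, with more false positives.
    "revert_threshold": 0.95,
}


def should_revert(edit_score: float, config: dict = CONFIG) -> bool:
    """Decide whether to auto-revert, given a model score in [0, 1]."""
    if not config["enabled"]:
        return False
    return edit_score >= config["revert_threshold"]


# Usage: three hypothetical edit scores.
for score in (0.30, 0.90, 0.97):
    print(score, should_revert(score))
# At a threshold of 0.95 only the 0.97 edit is reverted; lowering the
# threshold to 0.85 would also revert the 0.90 edit, trading a larger
# reduction in patroller workload for a higher false positive rate.
```

In practice the score would come from whichever model the project adopts; how that model is chosen and evaluated is part of the research work described above.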
ClueBot NG
On the English Wikipedia this function is currently performed by ClueBot NG, a long-running, volunteer-maintained tool which automatically reverts vandalism based on scoring from a machine learning model. The bot is configured to operate at a 0.1% false positive rate, and editors can report false positives for review by the community.
Based on our analysis, ClueBot NG currently reverts approximately 150-200 edits per day, though it previously reverted considerably more: as many as 1,500-2,500 per day in 2010.
Questions
We'd like to know more about your experiences with ClueBot NG. Below are some questions we have, but any thoughts you want to share are welcome:
- Do you think ClueBot NG has a substantial impact on the volume of edits you need to review?
- Do you review ClueBot NG's edits or false positive reports? If so, how do you find this process?
- Are there any obvious improvements or feature requests you can think of for ClueBot NG?