Wikipedia:Edit filter/Requested

    Requested edit filters

    This page can be used to request edit filters, or changes to existing filters. Edit filters are primarily used to address common patterns of harmful editing.

    Private filters should not be discussed in detail. If you wish to discuss creating an LTA filter, or changing an existing one, please instead email details to wikipedia-en-editfilters@lists.wikimedia.org.

    Otherwise, please add a new section at the bottom using the following format:

    == Brief description of filter ==
    *'''Task''': What is the filter supposed to do? To what pages and editors does it apply?
    *'''Reason''': Why is the filter needed?
    *'''Diffs''': Diffs of sample edits/cases. If the diffs are revdelled, consider emailing their contents to the mailing list.
    ~~~~
    

    Please note the following:

    • Edit filters are used primarily to prevent abuse. Contributors are not expected to have read all 200+ policies, guidelines and style pages before editing. Trivial formatting mistakes and edits that at first glance look fine but go against some obscure style guideline or arbitration ruling are not suitable candidates for an edit filter.
    • Filters are applied to all edits on all pages. Problematic changes that apply to a single page are likely not suitable for an edit filter. Page protection may be more appropriate in such cases.
    • Non-essential tasks or those that require access to complex criteria, especially information that the filter does not have access to, may be more appropriate for a bot task or external software.
    • To prevent the creation of pages with certain names, the title blacklist is usually a better way to handle the problem - see MediaWiki talk:Titleblacklist for details.
    • To prevent the addition of problematic external links, please make your request at the spam blacklist.
    • To prevent the registration of accounts with certain names, please make your request at the global title blacklist.
    • To prevent the registration of accounts with certain email addresses, please make your request at the email blacklist.




    Edits adding raw text maintenance tags instead of standard templates

    • Task: Log edits meeting above criteria
    • Reason: Editors have been adding maintenance tags in raw text instead of using the correct templates
    • Diffs: Examples of fix: Special:Diff/1293287735, 74 JWB edits

    I wish to log edits where editors add Wikipedia:, WP:, Help: (case insensitive) inside of <sup></sup> to check the extent to which new editors use raw text instead of actual maintenance tags, and if a bot will be required for regular maintenance of this or not. Thanks! CX Zoom[he/him] (let's talk • {CX}) 21:35, 31 May 2025 (UTC)[reply]

    This might work:
    added_lines contains "<sup>[''[[Wikipedia:"
    
    『π』BalaM314〘talk〙 14:37, 11 June 2025 (UTC)[reply]
    The basics could be something like
    equals_to_any(page_namespace, 0) &
    added_lines irlike "<sup>(\[|&#x5B;)\'\'\[\[(Wikipedia|Help|WP|H)\:"
    
    I haven't checked yet for how many of those edits it works, but it could be a start. Nobody (talk) 06:10, 12 June 2025 (UTC)[reply]
    And instead of using equals_to_any(page_namespace, 0), we could just use page_namespace == 0 since there is only one namespace being checked here. – PharyngealImplosive7 (talk) 07:37, 12 June 2025 (UTC)[reply]
    I left it like that because I'm not sure if we should add the draftspace too. Nobody (talk) 07:40, 12 June 2025 (UTC)[reply]
    The regex looks to be working as intended. CX Zoom[he/him] (let's talk • {CX}) 01:21, 13 June 2025 (UTC)[reply]
    I'll test the regex with all of the hits from above. @CX Zoom looks like there are already some more according to this search. Want to start JWB again or wait until we've got the filter in place? Nobody (talk) 06:29, 18 June 2025 (UTC)
    It's been quite some time. Can anyone please implement the regex by 1AmNobody24? CX Zoom[he/him] (let's talk • {CX}) 13:19, 23 June 2025 (UTC)
    @CX Zoom I've started the list of edits to check against the regex at User:1AmNobody24/sandbox. Still missing around 2/3 of the total edits, but I'll get there eventually. Nobody (talk) 13:28, 23 June 2025 (UTC)
    I believe we can also exclude bots and check for !removed_lines as well:
    equals_to_any(page_namespace, 0) &
    !("bot" in user_groups) &
    (
        nope := "<sup>(?:\[|&#x5B;)''\[\[(?:Wikipedia|Help|WP|H):";
        
        added_lines rlike nope &
        !(removed_lines rlike nope)
    )
    
    Codename Noreste (talk · contribs) 22:34, 23 June 2025 (UTC)[reply]
    I've checked around 200 edits (some listed here, the others are part of this list). This code:
    pattern :="<sup>(?:\[|&#x5B;)''\[\[(?:Wikipedia|Help):";
    
    equals_to_any(page_namespace, 0, 2, 118) &
    !contains_any(user_groups, "sysop", "bot") &
    added_lines irlike pattern &
    !(removed_lines irlike pattern)
    
    worked for all I checked. There are some more variables that could be used, like user_editcount for example, but I'm not sure it really matters. Nobody (talk) 12:59, 24 June 2025 (UTC)
    Should we exempt sysops - they can make this mistake also? – PharyngealImplosive7 (talk) 15:46, 26 June 2025 (UTC)[reply]
    They are not likely to do it, but I am also not really in favour of increasing the complexity of the filter to exclude them when we just log the edits, not prevent or tag them. CX Zoom[he/him] (let's talk • {CX}) 15:51, 26 June 2025 (UTC)
    I think it's highly unlikely a sysop makes that kind of mistake. But I'm also unsure if a warn filter with a custom warning wouldn't be better. Nobody (talk) 15:54, 26 June 2025 (UTC)[reply]
    Any updates on this? Looks like there have been a few more edits of this kind. CX Zoom[he/him] (let's talk • {CX}) 20:07, 10 July 2025 (UTC)[reply]
    @1AmNobody24 and CX Zoom: Filter created and Monitoring... to see whether warnings, tagging, or just logging would be best. – PharyngealImplosive7 (talk) 20:32, 13 July 2025 (UTC)[reply]
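For anyone who wants to try the pattern against saved diffs before looking at the filter log, here is a rough offline sketch in Python. It is not the filter itself: `would_log` and its arguments are made-up names, the sample wikitext is invented, and `irlike` is approximated with `re.IGNORECASE`.

```python
import re

# Pattern from the 24 June comment, written for Python's `re` module.
PATTERN = re.compile(r"<sup>(?:\[|&#x5B;)''\[\[(?:Wikipedia|Help):", re.IGNORECASE)

WATCHED_NAMESPACES = {0, 2, 118}   # article, user, draft
EXEMPT_GROUPS = {"sysop", "bot"}


def would_log(namespace, user_groups, added_lines, removed_lines):
    """Mirror the four conditions of the proposed filter (hypothetical helper)."""
    return (
        namespace in WATCHED_NAMESPACES
        and not EXEMPT_GROUPS.intersection(user_groups)
        and PATTERN.search(added_lines) is not None
        and PATTERN.search(removed_lines) is None
    )


# Invented sample: a maintenance tag typed as raw text.
raw_tag = "Some claim.<sup>[''[[Wikipedia:Citation needed|citation needed]]'']</sup>"

print(would_log(0, ["*", "user"], raw_tag, ""))        # raw tag newly added
print(would_log(0, ["*", "user"], raw_tag, raw_tag))   # tag was already on the page
print(would_log(0, ["*", "bot"], raw_tag, ""))         # exempt user group
print(would_log(0, ["*", "user"], "Some claim.{{citation needed}}", ""))  # proper template
```

Only the first call should report a hit; the `removed_lines` check is what keeps edits that merely move or touch an existing raw tag out of the log.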

    Adding nonexistent templates

    • Task: Log (and possibly tag?) whenever the user adds a transclusion of a template that does not exist. Also, warn on certain edits. Warning on all such edits would be far too bitey in my opinion, but I think the following categories of edits would likely benefit from a warning. (Also, I think the warnings should mainly be restricted to mainspace or maybe talkspace; drafts and userpages should be free for now.) The categories of edits to warn on could, of course, be adjusted once the filter is in.
    • Edits by IPs and new users. In many cases, these are vandalism.
    • Edits made in the mobile Wikipedia app. (Since it doesn't have a visual editor, it's incredibly easy to forget to close your curly braces; a warning would really help with this.)
    • Malformed citations (this could be detected by looking for the word "cite" or a URL in the title).
    • Nonexistent "country data" templates; these can arise from misuse of templates like {{flag}} or {{flagicon}}
    • Nonexistent WikiProject tags in talkspace (which could both arise from vandalism and from AWB mistakes).
    (One possible approach could be to pair a log-only filter, which catches all nonexistent templates, with a separate warn filter. This two-filter approach has been tried before, for example in 1296 and 1297, so I don't see any reason why it wouldn't work here. Given the heterogeneity of these categories, it might be even better to have multiple warn filters, so we could show a different warning for each category.)
    • Diffs: For some of these categories:

    Duckmather (talk) 00:38, 3 June 2025 (UTC)[reply]

    @Duckmather: I'm commenting on the technical aspects here, to see if such a filter (or multiple) could exist. I'm not sure if the AbuseFilter can check whether a template exists or not; maybe it could check whether the page named in a template call exists (in the template namespace, of course) using page_last_edit_age (if it's not null, then the page exists), but I'm not sure. The mobile app constraint is fairly easy to check: the AbuseFilter extension has a variable, user_app, built into it for this exact purpose.
    I'm not sure if the "country data" thing is actually possible with an AbuseFilter - that would require crosschecking with whatever module supports the {{flag}} and {{flagicon}} templates, which isn't possible as far as I know. A similar issue occurs with the WikiProject tags you bring up; the AbuseFilter cannot cross-reference a module as far as I know. I also don't think checking whether someone has properly closed template brackets is feasible with the AbuseFilter; the logic would be pretty complex and would still produce a lot of FPs. – PharyngealImplosive7 (talk) 03:50, 3 June 2025 (UTC)
    @PharyngealImplosive7: The filter is in fact doable. The key idea is to use the new_html variable, which uses parsing to detect whether templates exist or not. For example, if I were to write {{fake example}}, this would translate into HTML as <a href="/w/index.php?title=Template:Fake_example&action=edit&redlink=1" class="new" title="Template:Fake example (page does not exist)">Template:Fake example</a>. You could in turn detect this redlink using the regex <a href="\/w\/index\.php\?title=Template:[^"]*\;redlink=1\" class=\"new\". Of course, this line of code by itself would generate false positives, as wikilinking a nonexistent template the usual way will also produce identical HTML, so it would need some refining. But I don't see any fundamentally technical barriers preventing you from pulling this off. Duckmather (talk) 04:37, 3 June 2025 (UTC)[reply]
    @Duckmather: That indeed is a smart approach; I didn't think of that. However, one more thing to note is that new_html is a large variable, so ideally it should be placed at the end of any filter for performance reasons. I believe my point still stands, though, that detecting whether someone has left brackets unclosed is unfeasible. – PharyngealImplosive7 (talk) 06:07, 3 June 2025 (UTC)
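The redlink idea can be tried outside the wiki with a small Python sketch. One assumption to flag: the sketch takes it that new_html carries the ampersands in the href as `&amp;`, which is what the `\;redlink=1` part of the regex relies on. The HTML strings are hand-written samples based on the example in the comment above, not captured parser output.

```python
import re

# The redlink regex from the comment above, unchanged apart from Python quoting.
REDLINK = re.compile(r'<a href="\/w\/index\.php\?title=Template:[^"]*\;redlink=1\" class=\"new\"')

# Assumed form of the parser output, with ampersands stored as "&amp;".
escaped_html = (
    '<a href="/w/index.php?title=Template:Fake_example&amp;action=edit&amp;redlink=1" '
    'class="new" title="Template:Fake example (page does not exist)">Template:Fake example</a>'
)
# The same link with bare ampersands, as the URL reads once rendered.
bare_html = escaped_html.replace("&amp;", "&")
# A link to a template that exists (hand-written sample).
existing_html = '<a href="/wiki/Template:Citation_needed" title="Template:Citation needed">cn</a>'

for label, html in [("escaped", escaped_html), ("bare", bare_html), ("existing", existing_html)]:
    print(label, bool(REDLINK.search(html)))
```

With these samples only the `&amp;` form matches, so the regex would need adjusting if new_html turns out to hold bare ampersands. The sketch cannot tell a transclusion from an ordinary wikilink to the same missing template, which is the false-positive source already noted.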
    After a moderate amount of thought and a lot of procrastination, I have some draft code. Define the following regular expressions:
    wikitext_template := "{{[^\||\n|}]*(\||\n|}})";
    common_template := "(?x){{(?:
    !
    |[Aa]nchor
    |[Aa]s\ of|[Aa]uthority\ control
    |[Bb]irth\ date(?:\ and\ age)?
    |[Bb]lockquote
    |[Cc]-SPAN
    |[Cc]bignore
    |[Cc]irca
    |[Cc]itation\ needed
    |[Cc]ite\ (?:AV\ media|book|conference|encyclopedia|interview|journal|magazine|news|press\ release|tweet|web)
    |[Cc]lear
    |[Cc]n
    |[Cc]oord
    |[Ee]fn
    |[Ee]?m(?:dash)?
    |[Ff]urther
    |[Gg]Burl
    |[Gg]loss
    |[Gg]oogle\ [Bb]ooks(?:\ URL)?
    |[Hh]arvnb
    |[Ii]PAc-(?:ar|cmn|en|hu|pl|yue)
    |(?:ISBN|isbn)\??
    |[Ii]nfobox\ (?:album|book|company|film|football\ biography|musical\ artist|NRHP|officeholder|person|settlement|song|television)
    |[Ll]angx?
    |[Ll]egend
    |[Mm]ain
    |[Mm]ath
    |[Mm]dash
    |[Mm]ultiple\ image
    |[Nn]bsp
    |[Nn]owrap
    |[Oo]fficial\ website
    |[Pp]lainlist
    |[Pp]p(?:-(blp|dispute|extended|semi-indef|sock|vandalism))?
    |[Pp]roQuest
    |[Rr]ef(?:begin|end|h|list)
    |[Ss]fnm?
    |[Ss]hort\ description
    |[Tt]OC\ limit
    |[Uu]se\ (dmy|mdy)\ dates
    |[Uu]se\ (American|Australian|British|Canadian|Hong\ Kong|Indian|Jamaican|Kenyan|Liberian|New\ Zealand|Nigerian|Pakistani|Philippine|Singapore|South\ African|Sri\ Lankan|Trinidad\ and\ Tobago|Ugandan)\ English
    |[Ww]ebarchive
    |[Ww]ikiProjectBannerShell

    |[Ww]ikiProject\ (Albums|Anthroponymy|Australia|Articles\ for\ creation|Biography|Canada|Cities|banner\ shell|Disambiguation|Football|Film|France|Germany|India|Lepidoptera|Lists|Military\ history|Olympics|Songs|Television|United\ States)
    )\s*(?:\||\n|}})";
    nonexistent_template := '<a href="\/w\/index\.php\?title=Template:([^" ]*)\;redlink=1\" class=\"new\"';
    (To explain: wikitext_template catches the use of any template in wikitext; common_template catches various commonly used templates; and nonexistent_template catches a HTML link to a nonexistent template. Part of why this took so long is that I had to try several different things before I could get a satisfactory list of common templates.)
    With these regular expressions in place, the logging filter could be defined as follows:
    added_lines rlike wikitext_template &
    rcount(common_template, added_lines) < rcount(wikitext_template, added_lines) &
    new_html rlike nonexistent_template
    and with the same regular expressions in place, the warning filter could be defined as follows:
    equals_to_any(page_namespace, 0, 1, 118) &
    rcount(common_template, added_lines) < rcount(wikitext_template, added_lines) &
    new_html rlike nonexistent_template &
    (
    !(contains_any(user_groups, "autoconfirmed", "bot", "confirmed"))
    | user_mobile
    | user_app
    | (summary rlike "^Created by translating the page" & page_id == 0)
    | new_html rlike '<a href="\/w\/index\.php\?title=Template:[^" ]*(cite|https?:\/\/|doi|isbn|(IPA|lang-)[\w-]+|\w+\ icon|wikiproject|[^\x00-\xFF\s–—])[^" ]*\;redlink=1\" class=\"new\"'
    ) &
    !(summary irlike "restor(?:ed?|ing)|revert(?:ed|ing)?|und(?:o|id)" & page_id != 0)
    Duckmather (talk) 05:43, 23 June 2025 (UTC)[reply]
    Wouldn't simply looking for class="new" title="Template: in the new_html also work? Nobody (talk) 05:57, 23 June 2025 (UTC)[reply]
    Also, another thing you could watch out for when it comes to malformed citations are DOIs (example) and ISBNs (example). Duckmather (talk) 05:14, 3 June 2025 (UTC)[reply]
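The rcount comparison in the draft filters above can be illustrated with a short Python sketch. The two regexes here are simplified stand-ins, not the draft's own: the common-template list is cut down to a handful of names, and `rcount` and `has_uncommon_template` are hypothetical helper names.

```python
import re

# Simplified stand-ins for wikitext_template and common_template (assumption:
# a much shorter allow-list than the one in the draft).
ANY_TEMPLATE = re.compile(r"\{\{[^|\n}]*(?:\||\n|\}\})")
COMMON_TEMPLATE = re.compile(
    r"\{\{(?:[Cc]ite (?:book|news|web)|[Mm]ain|[Rr]eflist)\s*(?:\||\n|\}\})"
)


def rcount(pattern, text):
    """Count non-overlapping regex matches in text."""
    return sum(1 for _ in pattern.finditer(text))


def has_uncommon_template(added_lines):
    """True when at least one added template is not on the common list."""
    return rcount(COMMON_TEMPLATE, added_lines) < rcount(ANY_TEMPLATE, added_lines)


print(has_uncommon_template("{{cite web|url=https://example.org}}"))
print(has_uncommon_template("{{cite web|url=https://example.org}} {{fake example}}"))
print(has_uncommon_template("{{reflist}}"))
```

Only the second sample contains a template outside the short list. Edits that add nothing but well-known templates fail this cheaper check, which fits the earlier advice to keep the new_html test last.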

    Changing countries

    • Task: This abuse filter would warn against edits that change country names in an infobox or in at least 30% of occurrences in a mainspace page by non-confirmed users and tag such edits if they are actually made.
    • Reason: I've seen 2A02:CB80:4225:3210:882:ABCE:48ED:C7E6 erroneously change country names in infoboxes multiple times without a source despite multiple warnings.
    • Diffs: Edits of 2A02:CB80:4225:3210:882:ABCE:48ED:C7E6

    Faster than Thunder (talk | contributions) Tamil speakers: Contribute here 03:31, 9 July 2025 (UTC)[reply]

    Could you give more examples of such disruption? This is because it is not worth it to use up conditions of the AbuseFilter for rare types of disruption. – PharyngealImplosive7 (talk) 22:58, 9 July 2025 (UTC)[reply]
    I agree with PI7: if it's just one IP, this should be deferred to WP:AIV. If not, more diffs or examples would be needed here. EggRoll97 (talk) 18:31, 13 July 2025 (UTC)

    AfC template filters


    Addition of spurious AfC decline templates

    • Task: This filter tracks the addition of pre-declined AfC templates, {{afc submission|d|...}}, that were not converted from a pending {{afc submission|...}} AfC template. The unaffected user groups (patroller and sysop) correspond to trusted users that would be expected to know their way around AfC reviewing and might need to write such code for technical reasons – as far as I know, there isn't a defined user group for AfC reviewers that can be used in the edit filter code.
    • Reason: Recently, LLMs such as ChatGPT have had a tendency to generate drafts with a pre-existing declined AfC template (with spurious or missing additional parameters), which causes issues for reviewers having to clean it up.
    • Diffs: Special:Diff/1300662022, Special:Diff/1300655195, Special:Diff/1296316116
    • Proposed code:
    page_namespace == 118 &
    !contains_any(user_groups, "patroller", "sysop") &
    added_lines irlike "\{\{afc(\ submission)?\|d" &
    !(removed_lines irlike "\{\{afc")

    Chaotic Enby (talk · contribs) 22:04, 15 July 2025 (UTC)[reply]

    Should we also track this in the user (2) namespace? Some people put their articles in their userpage or in their user sandbox. 🧙‍♀️ Children Will Listen (🐄 talk, 🫘 contribs) 22:58, 15 July 2025 (UTC)[reply]
    Good idea!
    equals_to_any(page_namespace, 2, 118) &
    !contains_any(user_groups, "patroller", "sysop") &
    added_lines irlike "\{\{afc(\ submission)?\|d" &
    !(removed_lines irlike "\{\{afc")
    
    Chaotic Enby (talk · contribs) 22:59, 15 July 2025 (UTC)[reply]
    [1] and [2] are some examples of this (both were originally in the user namespace before they were moved.) 🧙‍♀️ Children Will Listen (🐄 talk, 🫘 contribs) 23:05, 15 July 2025 (UTC)[reply]
    @Chaotic Enby: Any thoughts on using page_id == 0 for this suggested filter, as many of these editors add these decline templates when first creating the page with an LLM? It probably would also help reduce false positives where AFC reviewers decline pages. – PharyngealImplosive7 (talk) 23:52, 15 July 2025 (UTC)[reply]
    I thought so, but I have seen cases like Special:Diff/1299636907 where the template was added later. False positives should already be handled by the bottom line (which checks if the decline template didn't replace an existing AfC template), but I'm okay with adding the page_id check to be careful. Chaotic Enby (talk · contribs) 00:14, 16 July 2025 (UTC)[reply]
    Maybe a better check would be to use a boolean or with page_first_contributor == user_name and page_id == 0 to see if the template was added by the page creator or added when the page itself was created. – PharyngealImplosive7 (talk) 00:31, 16 July 2025 (UTC)[reply]
    That would work great, thanks! Chaotic Enby (talk · contribs) 00:36, 16 July 2025 (UTC)[reply]
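As a sanity check on how the suggested "page creator or new page" condition combines with the proposed code, here is a rough Python sketch. `spurious_decline` and its parameters are hypothetical names that merely echo the AbuseFilter variables mentioned above, and the sample wikitext and usernames are invented.

```python
import re

DECLINE = re.compile(r"\{\{afc(\ submission)?\|d", re.IGNORECASE)
ANY_AFC = re.compile(r"\{\{afc", re.IGNORECASE)


def spurious_decline(namespace, user_groups, user_name, page_id,
                     page_first_contributor, added_lines, removed_lines):
    """Sketch of the proposed filter plus the creator/new-page check."""
    by_creator_or_new_page = page_first_contributor == user_name or page_id == 0
    return (
        namespace in {2, 118}
        and not {"patroller", "sysop"} & set(user_groups)
        and by_creator_or_new_page
        and DECLINE.search(added_lines) is not None
        and ANY_AFC.search(removed_lines) is None
    )


declined = "{{AfC submission|d|v|u=Example|ns=118}}"
pending = "{{AfC submission|||u=Example|ns=118}}"

# Draft created with a pre-declined template already in it.
print(spurious_decline(118, ["user"], "Example", 0, "", declined, ""))
# Page creator adds the decline template in a later edit.
print(spurious_decline(118, ["user"], "Example", 123, "Example", declined, ""))
# Another editor converts a pending template into a decline.
print(spurious_decline(118, ["user"], "Reviewer", 123, "Example", declined, pending))
```

The first two cases are flagged. The third is excluded twice over: the editor is not the page creator, and the removed lines already contained an AfC template.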

    Removal of AfC decline templates

    • Task: This filter tracks the removal of declined AfC templates.
    • Reason: Removing declined AfC templates is disallowed as it can obscure valid issues pointed out by previous reviewers. Using the WP:AFCH script will not activate this filter, as it removes the templates immediately after moving the page to mainspace.
    • Diffs: Special:GoToComment/c-Asilvering-20250711151200-Chaotic_Enby-20250711102800
    • Proposed code:
    equals_to_any(page_namespace, 2, 118) &
    !contains_any(user_groups, "patroller", "sysop") &
    removed_lines irlike "\{\{afc(\ submission)?\|d" &
    !(added_lines irlike "\{\{afc(\ submission)?\|d")

    Chaotic Enby (talk · contribs) 22:04, 15 July 2025 (UTC)[reply]

    Addition of AfC template redirect

    • Task: This filter tracks the addition of {{afc}}.
    • Reason: The incorrectly formatted {{afc}} is also a recently observed LLM trend. While it is a redirect to {{afc submission}}, it is not correctly processed by WP:AFCH and can cause issues for reviewers.
    • Diffs: Special:Diff/1300159746
    • Proposed code:
    equals_to_any(page_namespace, 2, 118) &
    !contains_any(user_groups, "patroller", "sysop") &
    added_lines irlike "\{\{afc\ ?(\}\}|\|)" &
    !(removed_lines irlike "\{\{afc\ ?(\}\}|\|)")

    Chaotic Enby (talk · contribs) 22:29, 15 July 2025 (UTC)[reply]
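To see which spellings the redirect pattern picks up, a quick Python check may help. The sample strings are invented for illustration, and `irlike` is approximated with `re.IGNORECASE`.

```python
import re

AFC_REDIRECT = re.compile(r"\{\{afc\ ?(\}\}|\|)", re.IGNORECASE)

samples = [
    "{{afc}}",
    "{{AFC |ts=20250715}}",
    "{{afc|d}}",
    "{{AfC submission|d|v}}",
    "{{afc comment|text}}",
]
hits = [s for s in samples if AFC_REDIRECT.search(s)]
print(hits)
```

The three bare `{{afc...` forms are picked up, while `{{AfC submission|...}}` and `{{afc comment|...}}` are left alone, because the pattern requires `}}` or `|` right after the optional space.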

    @Chaotic Enby: For all three tasks, Filter created and Monitoring... – PharyngealImplosive7 (talk) 00:40, 16 July 2025 (UTC)

    Markdown

    • Task: This filter tracks the use of Markdown-formatted text (specifically, Markdown italics/bold and link syntax).
    • Reason: Markdown is a markup language distinct from (and incompatible with) wikitext, but often generated by language models such as ChatGPT, or used by newer users unfamiliar with wikitext syntax. This filter could help track these uses and fix them when needed. This is not an AI-specific filter (as new users might legitimately make the mistake of using Markdown) and shouldn't be combined with 1325 or 1346.
    • Diffs: Special:Diff/1300655195 (italics), Special:Diff/1300650900 (bold), Special:Diff/1300531575 (link syntax), Special:Diff/1300622204 (bold and link syntax in mainspace)
    • Proposed code:
    equals_to_any(page_namespace, 0, 118) &
    !("extendedconfirmed" in user_groups) &
    added_lines irlike "\*[^\*]*\*|\[.*\]\((https?\:\/\/|www.).*\)" &
    !(removed_lines irlike "\*[^\*]*\*|\[.*\]\((https?\:\/\/|www.).*\)")
    

    Chaotic Enby (talk · contribs) 22:20, 15 July 2025 (UTC)[reply]

    @Chaotic Enby: Filter created and Monitoring.... – PharyngealImplosive7 (talk) 23:02, 15 July 2025 (UTC)[reply]
    @PharyngealImplosive7: Just curious, isn't there a risk of \*.*\* catching indented bullet lists? I made a last-minute fix but not sure if it was caught in time. Chaotic Enby (talk · contribs) 23:04, 15 July 2025 (UTC)[reply]
    I fixed my fix, \*[^\*]+\*|\[.*\]\((https?\:\/\/|www.).*\) should work. Another source of false positives I'm thinking about is multi-element lists defined on one line (for example, in infobox parameters) as *first element<br>*second element, I'll have to code another fix to avoid those. Chaotic Enby (talk · contribs) 23:09, 15 July 2025 (UTC)[reply]
    Unless someone isn't using a newline for some reason (like in the *x<br>*y example you just gave), the code should work fine, but nice catch. We may want to see how common this actually is first (if people don't really use <br> to separate bullets, this isn't necessarily a problem). – PharyngealImplosive7 (talk) 23:15, 15 July 2025 (UTC)
    Funnily enough, I saw that false positive in the exact same example I gave above (Special:Diff/1300622204). \*(?![^\*]+\<br\s?\/?\>.*\*)[^\*]+\*|\[.*\]\s?\((https?\:\/\/|www.).*\) should do it, but might be unnecessarily complicated, as I'm less familiar with optimizing negative lookaheads. Chaotic Enby (talk · contribs) 23:25, 15 July 2025 (UTC)[reply]
    So it turns out that the first hit of the filter was a false positive. I've tweaked the regex a bit to exclude br tags as well as double bullet points. – PharyngealImplosive7 (talk) 23:46, 15 July 2025 (UTC)[reply]
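The three regex variants quoted in this thread can be compared side by side with a small Python sketch. It covers only the single-line cases discussed above, uses invented sample text, and does not include the later tweak made to the live filter, whose exact regex is not shown here.

```python
import re

FLAGS = re.IGNORECASE
# Originally proposed pattern, the "+" fix, and the negative-lookahead version.
FIRST = re.compile(r"\*[^\*]*\*|\[.*\]\((https?\:\/\/|www.).*\)", FLAGS)
SECOND = re.compile(r"\*[^\*]+\*|\[.*\]\((https?\:\/\/|www.).*\)", FLAGS)
THIRD = re.compile(
    r"\*(?![^\*]+\<br\s?\/?\>.*\*)[^\*]+\*|\[.*\]\s?\((https?\:\/\/|www.).*\)", FLAGS
)

samples = {
    "italics": "This is *very* notable.",
    "link": "See [the site](https://example.com) for details.",
    "nested bullet": "** second-level item",
    "br list": "*first element<br>*second element",
}

results = {
    name: [bool(p.search(text)) for p in (FIRST, SECOND, THIRD)]
    for name, text in samples.items()
}
for name, row in results.items():
    print(name, row)
```

On these samples all three versions catch the Markdown italics and link, the first alone trips on the nested bullet, and only the lookahead version skips the `<br>`-separated list.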

    Spurious oaicite/oai_citation syntax

    • Task: Tracks the use of the LLM-generated oaicite/oai_citation syntax.
    • Reason: This incorrect syntax is often generated by LLMs for citing sources. This filter element can be added to either the proposed Markdown filter (above) or 1346 which already tracks AI-sourced citations. The oai_citation syntax is often followed by a special character (e.g. oai_citation:0‡), although it is not distinctive enough to serve as a tell separately from oai_citation (cf. Moons of Saturn for its use as a list key).
    • Diffs: Special:Diff/1300622204, Special:Diff/1300270904
    • Proposed addition:
    added_lines irlike "oai_?cit(e|ation)" &
    !(removed_lines irlike "oai_?cit(e|ation)")

    Chaotic Enby (talk · contribs) 22:42, 15 July 2025 (UTC)[reply]
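A minimal Python sketch of the proposed addition, for checking sample text offline. `adds_oai_syntax` is a hypothetical helper name and the sample strings are invented for illustration.

```python
import re

OAI = re.compile(r"oai_?cit(e|ation)", re.IGNORECASE)


def adds_oai_syntax(added_lines, removed_lines):
    """True when the marker appears in added text but not in removed text."""
    return OAI.search(added_lines) is not None and OAI.search(removed_lines) is None


print(adds_oai_syntax("A claim. [oai_citation:0‡example.com](https://example.com)", ""))
print(adds_oai_syntax("A claim. [oaicite:0]", ""))
print(adds_oai_syntax("A claim.<ref>{{cite web|url=https://example.com}}</ref>", ""))
```

The optional underscore and the `(e|ation)` group cover both spellings named in the request, and the `removed_lines` check skips edits that only modify a line already containing the marker.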

    @Chaotic Enby: Done – PharyngealImplosive7 (talk) 22:53, 15 July 2025 (UTC)