
Wikipedia:Edit filter/Requested

From Wikipedia, the free encyclopedia
    Requested edit filters

    This page can be used to request edit filters, or changes to existing filters. Edit filters are primarily used to address common patterns of harmful editing.

    Private filters should not be discussed in detail. If you wish to discuss creating an LTA filter, or changing an existing one, please instead email details to wikipedia-en-editfilters@lists.wikimedia.org.

    Otherwise, please add a new section at the bottom using the following format:

    == Brief description of filter ==
    *'''Task''': What is the filter supposed to do? To what pages and editors does it apply?
    *'''Reason''': Why is the filter needed?
    *'''Diffs''': Diffs of sample edits/cases. If the diffs are revdelled, consider emailing their contents to the mailing list.
    ~~~~
    

    Please note the following:

    • Edit filters are used primarily to prevent abuse. Contributors are not expected to have read all 200+ policies, guidelines and style pages before editing. Trivial formatting mistakes and edits that at first glance look fine but go against some obscure style guideline or arbitration ruling are not suitable candidates for an edit filter.
    • Filters are applied to all edits on all pages. Problematic changes that apply to a single page are likely not suitable for an edit filter. Page protection may be more appropriate in such cases.
    • Non-essential tasks or those that require access to complex criteria, especially information that the filter does not have access to, may be more appropriate for a bot task or external software.
    • To prevent the creation of pages with certain names, the title blacklist is usually a better way to handle the problem - see MediaWiki talk:Titleblacklist for details.
    • To prevent the addition of problematic external links, please make your request at the spam blacklist.
    • To prevent the registration of accounts with certain names, please make your request at the global title blacklist.
    • To prevent the registration of accounts with certain email addresses, please make your request at the email blacklist.



    AfD closures by anonymous users


    Someone who's wrong on the internet (talk) 14:58, 17 March 2025 (UTC)[reply]

    !(user_type in [named])
    & page_namespace == 4
    & page_title contains "Articles for deletion"
    & added_lines contains "'''Please do not modify it.'''</span>"
    & !(removed_lines contains "'''Please do not modify it.'''</span>")
    I'm using "Please do not modify it" as it's the most consistent part of closure statements, but the style of the div could also be used, assuming there is no hatting template that generates the same style. That last line might be a bit unnecessary as IPs messing with closed discussions isn't something we'd want either, but that's probably another issue. I've futureproofed it by also including temporary accounts. Chaotic Enby (talk · contribs) 15:07, 17 March 2025 (UTC)[reply]
    Is it possible to look for substituted template use? It looks like they properly used {{subst:Afd top}}. Nobody (talk) 15:12, 17 March 2025 (UTC)
    That's the thing: they didn't really use it properly. Their close reads "The following discussion is an closed debate" instead of "The following discussion is an archived debate". Chaotic Enby (talk · contribs) 15:18, 17 March 2025 (UTC)
    Noting that user_type in [ip, temp] should be replaced with !("autoconfirmed" in user_groups). – PharyngealImplosive7 (talk) 16:45, 17 March 2025 (UTC)[reply]
    Why should it be? I thought IPs weren't allowed to close discussions, not non-autoconfirmed users. Chaotic Enby (talk · contribs) 17:02, 17 March 2025 (UTC)[reply]
    Because in your current set-up, this issue may arise: "Expressions like page_namespace in [14, 15] may not work as expected. This one will evaluate to true also if page_namespace is 1, 4, or 5." However, I agree that my set-up also excludes new users. – PharyngealImplosive7 (talk) 19:53, 17 March 2025 (UTC)
    I don't think that will be an issue, as the five values user_type can have are ip, temp, named, external, and unknown. None of them are substrings of ip or temp, so the code should work as expected. Chaotic Enby (talk · contribs) 20:58, 17 March 2025 (UTC)[reply]
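For illustration, here is a rough Python model of the pitfall quoted above. It assumes that `in` with an array on the right-hand side stringifies the array and then does a substring test; the newline-joined cast is an assumption about the engine, and the helper names `af_array_to_string` and `af_in` are hypothetical.

```python
def af_array_to_string(items):
    # Assumption: the array is stringified with one element per line.
    return "".join(f"{item}\n" for item in items)


def af_in(needle, haystack_items):
    # Model of `needle in [a, b, ...]`: a substring test on the stringified array.
    return str(needle) in af_array_to_string(haystack_items)


# The quoted pitfall: namespaces 1, 4 and 5 also "match" [14, 15]; 2 does not.
namespace_hits = [ns for ns in (1, 4, 5, 14, 15, 2) if af_in(ns, [14, 15])]

# The five user_type values checked against ["ip", "temp"].
user_type_hits = [
    t for t in ("ip", "temp", "named", "external", "unknown")
    if af_in(t, ["ip", "temp"])
]
```

Under this model only "ip" and "temp" match, which agrees with the reasoning above that none of the other values is a substring of either.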
    FYI, further discussion of this should continue on the edit filter mailing list, as this looks like an LTA. Codename Noreste (talk) 21:38, 17 March 2025 (UTC)
    It doesn't matter that this is an LTA. IPs are prohibited from closing AfDs regardless. Someone who's wrong on the internet (talk) 19:49, 18 March 2025 (UTC)[reply]
    Seconded. If an IP wants to start closing AfDs, they need to create an account, period. That is set in stone. BD2412 T 20:27, 18 March 2025 (UTC)[reply]
    Minor change here, but the double ampersands should be single ampersands for the and operators. I'm not sure if the abuse filter can tell the difference but it's better to be safe than sorry. – PharyngealImplosive7 (talk) 02:10, 19 March 2025 (UTC)[reply]
    How soon will this filter be activated? Someone who's wrong on the internet (talk) 00:46, 23 March 2025 (UTC)[reply]
    It's been a while since this filter has been requested and no EFM has reviewed it yet. @Daniel Quinlan: Do you have time to look at this? – PharyngealImplosive7 (talk) 17:52, 2 April 2025 (UTC)[reply]
    @PharyngealImplosive7: Wouldn't this work with !user_type in [named]? EggRoll97 (talk) 02:26, 30 March 2025 (UTC)[reply]
    Yeah, as I suppose that we won't be seeing much of external and unknown, and ip and temp are what we are aiming for (which covers all 5 options). – PharyngealImplosive7 (talk) 04:43, 30 March 2025 (UTC)[reply]
    @EggRoll97: Do you have the time to start testing this code and possibly create a filter? – PharyngealImplosive7 (talk) 18:02, 6 April 2025 (UTC)[reply]
    Apologies for the late response; my time isn't nearly as free as I'd like lately. As for the code, LGTM, except that user_type apparently wasn't working in any configuration I tried, whether exempting "named", catching only "ip, temp", or anything else. Someone else is welcome to try a fix for that at a later date, but I've used user_age = 0 as a replacement in the meantime, which should do largely the same thing. Filter created and monitoring for now. EggRoll97 (talk) 04:30, 7 April 2025 (UTC)

    IP editing of triple quoted text

    • Task: What is the filter supposed to do? To what pages and editors does it apply?

    The filter is meant to prevent vandalism of the bolded text at the start of an article, which is typically the article's name.

    • Reason: Why is the filter needed?

    I see this once in a while in vandalism by IPs.

    • Diffs: Diffs of sample edits/cases. If the diffs are revdelled, consider emailing their contents to the mailing list.

    https://en.wikipedia.org/w/index.php?diff=1281666529

    The way this filter would work is by detecting text enclosed in triple quotes at the start of the article (although probably after infoboxes) and doing something in that case. Wildfireupdateman :) (talk) 20:56, 22 March 2025 (UTC)

    I'm thinking about edge cases such as multiple bolded names being present in the title (like Cougar, or for a less extreme case most species with both a scientific name and a common name). Also, are you planning to just log or tag them? Chaotic Enby (talk · contribs) 21:17, 22 March 2025 (UTC)[reply]
    I think maybe we could test whether in the old wikitext, the bolded text that was changed is the same as the page title. I think the end goal should be tag/warn/captcha. – PharyngealImplosive7 (talk) 23:12, 22 March 2025 (UTC)[reply]
    Some filter code could include:
    page_namespace == 0 &
    !("confirmed" in user_groups) &
    edit_delta < 5 &
    (
        stringy := "(?s)^.*?'''.+?'''";
        added_lines rlike stringy &
        removed_lines rlike stringy
    )
    PharyngealImplosive7 (talk) 23:32, 22 March 2025 (UTC)[reply]
    Probably also should add !(added_lines rlike "'''" + page_title + "'''"), otherwise it might flag other changes to the same paragraph. Chaotic Enby (talk · contribs) 23:42, 22 March 2025 (UTC)[reply]
    Good catch. I will add it now. – PharyngealImplosive7 (talk) 23:44, 22 March 2025 (UTC)[reply]
    The page_title filter would not work for the example that was linked in the request. Many pages are like that. You'd probably want to match the first bolded term in the removed lines and check if it's still in the added lines. Ponor (talk) 23:55, 22 March 2025 (UTC)[reply]
    If we disabled the global flag, we probably could make the filter only match the first bolded text. I'll implement that in the sample above. I just realized that you can't modify the global flag because it is controlled by the engine's settings, so I modified the pattern slightly. – PharyngealImplosive7 (talk) 00:16, 23 March 2025 (UTC)[reply]
    I have a filter like that on another wiki and it works great, probably one of the best filters when it comes to casual vandals. It's set to prevent saving unless an edit summary (10ish characters) is given: most vandals don't bother to read the notice and eventually quit. Not all cases need to be covered, checking whether ^'''+(...) are the same in removed and added lines is sufficient. Ponor (talk) 21:53, 22 March 2025 (UTC)[reply]
    Do you have exceptions for summaries like "fixed typo" and "added content" (typical canned IP summaries)? Edits with those summaries should probably not be saved. Wildfireupdateman :) (talk) 00:58, 23 March 2025 (UTC)
    I have it in some other filters, though I can't say I see those canned responses very often. When asked for input, in a message that starts with "This action has been automatically identified as harmful, and therefore disallowed.", most vandals just quit. That's my experience IRL. Ponor (talk) 00:14, 24 March 2025 (UTC)[reply]

    First of all, I'd set some nice goals. These edits should pass:

    - '''Subject''' is
    + A '''subject''' is

    - '''Subject''' is
    + '''''Subject''''' is

    - In architecture, the '''subject''' is
    + The '''subject''' is

    - The '''subject''' is
    + In architecture, the '''Subject One''' is

    These edits should be prevented or challenged (✓ ask for edit summary? ✗ captcha?):

    - '''Subject''' is
    + '''Subject vandal''' is

    - '''Subject''' is
    + '''Vandal''' is

    - The '''subject''' is
    + The '''subject vandal''' is

    - The '''subject''' is something that
    + is something that

    So something along these lines should work for most articles:

    &
    action == "edit"
    &
    (
       subject := get_matches("(?:^|\n)(?:In [^,]{1,25}, )?(?:[Aa] |[Tt]he )?'''+([-–\w ]+)'''", removed_lines)[1];
       
       subject /*no action if subject was not found, for any reason*/
       & 
       ( lcase(subject) != lcase(get_matches("'''+([-–\w ]+)'''", added_lines)[1]) )
    )
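As a rough check of the condition above against the listed goals, here is a Python sketch. It assumes Python's `re` behaves like the wiki's regex engine for these patterns and that a failed `get_matches` lookup compares as an empty string; `first_group` and `subject_changed` are hypothetical names, not AbuseFilter functions.

```python
import re

# Patterns copied from the filter sketch above.
LEAD = re.compile(r"(?:^|\n)(?:In [^,]{1,25}, )?(?:[Aa] |[Tt]he )?'''+([-–\w ]+)'''")
BOLD = re.compile(r"'''+([-–\w ]+)'''")


def first_group(pattern, text):
    # Hypothetical stand-in for get_matches(...)[1]; returns "" when nothing matches.
    match = pattern.search(text)
    return match.group(1) if match else ""


def subject_changed(removed_lines, added_lines):
    subject = first_group(LEAD, removed_lines)
    if not subject:
        return False  # no action if the subject was not found
    return subject.lower() != first_group(BOLD, added_lines).lower()


should_pass = [
    ("'''Subject''' is", "A '''subject''' is"),
    ("'''Subject''' is", "'''''Subject''''' is"),
    ("In architecture, the '''subject''' is", "The '''subject''' is"),
    ("The '''subject''' is", "In architecture, the '''Subject One''' is"),
]
should_flag = [
    ("'''Subject''' is", "'''Subject vandal''' is"),
    ("'''Subject''' is", "'''Vandal''' is"),
    ("The '''subject''' is", "The '''subject vandal''' is"),
    ("The '''subject''' is something that", "is something that"),
]

pass_results = [subject_changed(old, new) for old, new in should_pass]
flag_results = [subject_changed(old, new) for old, new in should_flag]
```

Under this reading, all four "should be prevented" pairs are flagged and the first three "should pass" pairs are not. The fourth "should pass" pair is still flagged, because the bolded text itself changes from "subject" to "Subject One".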

    If you want to require an edit summary (anything between 15 and 250 characters, for example), set the filter to disallow (with a nice message) and add the following to the filter:

    &
    
    (/*change of subject needs to be explained, most vandals will quit*/
       summ := get_matches("(?:/\*[^*]+\*/)?(.*)", summary)[1];
       (length(summ) < 15)
      |(length(summ) > 250)
    )
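The summary-length condition can be sketched in Python as follows. This is only a model under the assumption that Python's `re` matches the wiki's regex behaviour here; `needs_explanation` and its parameter names are hypothetical.

```python
import re

# Optional leading section marker such as "/* History */", then the rest of the summary.
SECTION_PREFIX = re.compile(r"(?:/\*[^*]+\*/)?(.*)")


def needs_explanation(summary, min_len=15, max_len=250):
    # Text left after the section marker; a leading space, if any, is counted.
    remainder = SECTION_PREFIX.match(summary).group(1)
    return len(remainder) < min_len or len(remainder) > max_len


empty_summary = needs_explanation("")
canned_summary = needs_explanation("/* History */ fixed typo")
real_summary = needs_explanation("/* History */ corrected the subject's name per source")
overlong_summary = needs_explanation("x" * 300)
```

Here the empty, canned, and overlong summaries would all require an explanation, while the longer descriptive one would not.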

    I've had a filter like this running for a few years, and from the log I can tell it works perfectly fine. Ponor (talk) 00:08, 24 March 2025 (UTC)[reply]

    Significance-misleading edits

    • Task: Catch edit summaries usually associated with minor edits, but attached to major edits instead.
    • Reason: Misleading edit summaries are not allowed, and I've encountered them while patrolling recent changes.
    • Diffs: Special:Diff/1282174235 (many more edits than this one are targeted)
    • Code:
    sum := "typo|spelling|error|( |^)link( |$)|gramm[ae]r";
    significant := edit_delta > 15;
    significant & (summary rlike sum)

    Faster than Thunder (talk | contributions) 20:34, 24 March 2025 (UTC)[reply]

    I would bump the size up from 10 to maybe 25-50 (although it actually wouldn't be able to catch the example edit even at >10). Another idea might be to check IP edits for "typo" and see if they added any extra spaces (indicative of adding another word, which means they were not fixing typos). Wildfireupdateman :) (talk) 22:42, 24 March 2025 (UTC)[reply]
    For the typical "canned" summaries we can use the regex in 633 (hist · log): "^(?:/\* .* \*/\s?)?(?:Fixed typo|Fixed grammar|Added links|Added content)$". – PharyngealImplosive7 (talk) 23:32, 24 March 2025 (UTC)[reply]
    1. Done.
    2. Not "^...$", to prevent bypassing. Faster than Thunder (talk | contributions) 00:59, 25 March 2025 (UTC)[reply]
    Re "any extra spaces (indicative of adding another word, which means they were not fixing typos)": I recently corrected "atleast" to "at least", so we need to make sure the added spaces are outside of the word. The code should not match something like "sp, unsourced", where I'm both fixing a typo and removing an unsourced statement in one edit. That would have a high edit delta, but the presence of the major-edit keyword "unsourced" alongside the minor-edit keywords means it's a major edit. This could be done by adding ^( and )$ from the other filter. The synonyms at WP:ESL#Spelling, WP:ESL#Typo, WP:ESL#Grammar, and WP:ESL#Links: internal may be useful. Finally, I don't see why the "added content" part of added (links|content) is "usually associated with minor edits". 216.58.25.209 (talk) 06:42, 26 March 2025 (UTC)
    Filter 970 (hist · log) would have caught this edit, but the edit_delta was only 7. What you're really looking for is edit distance, which unfortunately AbuseFilter does not measure at the byte level. Not saying that 970 can't be improved in some other ways. Suffusion of Yellow (talk) 00:55, 25 March 2025 (UTC)[reply]
    Suggested at phab:T390508. Faster than Thunder (talk | contributions) 18:40, 30 March 2025 (UTC)[reply]
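For reference, the condition proposed at the top of this thread can be modeled roughly in Python. This sketch assumes that `rlike` is a case-sensitive, unanchored regex match and that `edit_delta` is the signed change in page size; `looks_misleading` is a hypothetical name.

```python
import re

# Same alternatives as the proposed `sum` pattern.
MINOR_SUMMARY = re.compile(r"typo|spelling|error|( |^)link( |$)|gramm[ae]r")


def looks_misleading(summary, edit_delta, threshold=15):
    significant = edit_delta > threshold
    return significant and MINOR_SUMMARY.search(summary) is not None


small_edit = looks_misleading("Fixed typo", 7)          # delta below the threshold
large_edit = looks_misleading("Fixed typo", 400)
capitalised = looks_misleading("Typo", 400)             # case-sensitive match
plural_links = looks_misleading("Added links", 400)     # "links" is not " link "
large_removal = looks_misleading("Fixed typo", -500)    # negative delta
```

In this model an edit with a delta of 7, like the one mentioned above, is not caught, and neither are large removals, capitalised keywords, or the plural "links".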

    Careless moves to mainspace


    There should be a filter to block moves from "User:Username/Foo" to "Username/Foo". This is a fairly common error, and never what we want. * Pppery * it has begun... 15:10, 26 March 2025 (UTC)[reply]

    I'll go ahead and make some sample code:
    exp := rescape(user_name) + "\/.+";
    action == "move" &
    (
        (
            moved_from_namespace == 2 &
            moved_to_namespace == 0 &
            (
                moved_from_title rlike exp &
                moved_to_title rlike exp
            )
        ) ^ (
            moved_from_namespace == 2 &
            moved_to_namespace == 2 &
            (
                moved_from_title rlike exp &
                moved_to_title rlike ".+" &
                moved_to_title != user_name
            )
        )
    )
    PharyngealImplosive7 (talk) 18:34, 26 March 2025 (UTC)[reply]
    We might want to start by looking at all page moves from namespace 2 to namespace 0 that are ultimately undone or result in a deletion. We have the log data. Daniel Quinlan (talk) 00:50, 27 March 2025 (UTC)[reply]
    Another thing to check for could be "User:Username/Foo" -> "User:Foo". * Pppery * it has begun... 19:04, 30 March 2025 (UTC)[reply]
    Added code for that scenario also. – PharyngealImplosive7 (talk) 17:22, 31 March 2025 (UTC)[reply]
    PharyngealImplosive7, I fixed the code above and added some more parentheses for the boolean OR logic. Codename Noreste (talk) 20:00, 31 March 2025 (UTC)
    I feel that here, it's better to use a boolean XOR ^ to exclude any weird edge cases, but otherwise thanks for the help. – PharyngealImplosive7 (talk) 20:08, 31 March 2025 (UTC)[reply]

    I think this can be simplified to something like this:

    self_page_pattern := rescape(user_name) + "\/";
    action == "move" &
    moved_from_namespace == 2 &
    moved_from_title rlike self_page_pattern &
    (
      (moved_to_namespace == 0 & moved_to_title rlike self_page_pattern) |
      (moved_to_namespace == 2 & !(moved_to_title rlike self_page_pattern))
    )

    Some notes:

    • The common conditions are moved to the top to simplify and improve the performance a bit. We can move them back inside later if needed.
    • a & b & (c & d) is the same as a & b & c & d and the latter is simpler.
    • moved_to_title rlike ".+" is always true for any title that's not an empty string. If it's always true, it can be dropped.
    • moved_to_title != user_name would be true for any title that's not the same as the username, warning people for reasonable moves. If we're planning to do a warning filter, I think we can warn people for any moves from their user space to any other user's space so I updated that term.
    • Boolean "or" is fine here. It's clearer and we'd have bigger issues to worry about if both conditions were ever true.
    • I renamed the regex variable.
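To make the simplified logic easier to try out, here is a rough Python model of it. It assumes the title variables hold the page title without the namespace prefix and that `rlike` is an unanchored regex match; `careless_move` and its parameters are hypothetical names, and `re.escape` stands in for `rescape`.

```python
import re


def careless_move(user_name, from_ns, from_title, to_ns, to_title):
    # Matches "<username>/" anywhere in a title, like self_page_pattern above.
    own_subpage = re.compile(re.escape(user_name) + "/")
    if from_ns != 2 or not own_subpage.search(from_title):
        return False
    to_mainspace_with_prefix = to_ns == 0 and bool(own_subpage.search(to_title))
    to_userspace_without_prefix = to_ns == 2 and not own_subpage.search(to_title)
    # The two branches need different target namespaces, so "or" and "xor" agree here.
    return to_mainspace_with_prefix or to_userspace_without_prefix


# "User:Example/Draft" -> "Example/Draft" (mainspace, prefix kept by mistake)
kept_prefix = careless_move("Example", 2, "Example/Draft", 0, "Example/Draft")
# "User:Example/Draft" -> "User:Draft" (lands in another user's space)
lost_username = careless_move("Example", 2, "Example/Draft", 2, "Draft")
# "User:Example/Draft" -> "Draft" (the intended move)
intended_move = careless_move("Example", 2, "Example/Draft", 0, "Draft")
# "User:Example/Draft" -> "User:Example/Draft2" (rename within own userspace)
own_rename = careless_move("Example", 2, "Example/Draft", 2, "Example/Draft2")
```

The first two moves are flagged and the last two are not, which matches the two error cases raised earlier in the thread.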

    I hope to carve out some time to do a basic analysis on page move logs to see if there are any other common cases to consider. I think there are probably additional namespaces that can be included in the moved_to_namespace terms. Daniel Quinlan (talk) 20:55, 31 March 2025 (UTC)[reply]

    Thanks for simplifying it. I don't know what I was thinking in terms of the XOR (someone should really trout me for that) - it's probably because I'm sleep-deprived. – PharyngealImplosive7 (talk) 21:58, 31 March 2025 (UTC)[reply]
    Filter created. I ended up making some changes based on testing the logic on the last one million page moves.
    • User talk space moves are handled.
    • The first check is expanded to more namespaces (0, 1, 4, 5, 8, 9, 10, 11, 118, 119, 828, 829) and matches more broadly for the username.
    • I added !contains_any(user_groups, "extendedconfirmed", "sysop", "bot") as an early condition.
    • Testing exposed some uncommon false positives. I added exceptions for those.
    I used MediaWiki:Abusefilter-warning-badmove for the disallow message after revising the message to be more generalized and a bit more AGF. Daniel Quinlan (talk) 04:38, 2 April 2025 (UTC)[reply]
     Done for the bot. – PharyngealImplosive7 (talk) 18:00, 6 April 2025 (UTC)[reply]