Wikipedia:Request a query
This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.
You may also be interested in the following:
- If you are interested in writing SQL queries or helping out here, visit our tips page.
- If you need to obtain a list of pages that meet certain criteria, consider using PetScan (user manual) or the default search. PetScan can generate lists of articles in subcategories, articles which transclude some template, etc.
- If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
- For long-term review and checking, database reports are available.
The database replicas do not have access to page content, so requests which require checking wikitext cannot be answered with database queries. In particular, there is no way to query for template parameters or anything related to references. However, someone may be able to assist by querying in another way (for example, checking for external links rather than references) or suggest an alternative tool.
Blocks w/ "copyright" in block log
[edit]Is it possible to generate a query for blocks / recent (most recent 100?) blocks that mention the word "copyright" in the block log?
(In case it's relevant, I would like to see any copyright blocks that don't get reported at the noticeboards for cleanup purposes and do the fancy thing some people do and put it in {{Database report}}... so like if somebody wants to say why that's an awful idea please feel free.) GreenLipstickLesbian💌🧸 10:20, 28 November 2025 (UTC)
- Here are the most recent 100: quarry:query/99310. The IP addresses unfortunately aren't in a very readable form but I've included them for completeness where they exist. You can change the number fetched by tweaking line 7. J11csd (talk) 12:36, 28 November 2025 (UTC)
- The human-readable ip is in bt_address. —Cryptic 12:47, 28 November 2025 (UTC)
- Oh, so it is! I glanced right over it :-S
- Query has been updated to use this field and also to do case-insensitivity (somewhat more) correctly. J11csd (talk) 12:57, 28 November 2025 (UTC)
- Some more thoughts:
- The explicit collation is unnecessary; converting to utf8 is enough.
- Probably want to find reasons matching %copyvio% too.
- It's not clear that the block table is the right tool for this; it'll omit expired ones. logging with log_type='block' and log_action!='unblock' will include those. Whether they should be included or not isn't clear. (On the other hand, extracting expiry, whether the block is partial, etc. from log_params is fraught; on yet a third hand, none of that's included in the query anyway.)
- This can probably be made smarter by omitting usernames/ips that already have a matching Wikipedia:Contributor copyright investigations/ subpage linked from Wikipedia:Contributor copyright investigations/Bottom. That'll miss the numbered ones like /20111108 (I take it those are for usernames that match real names? The ones I glanced at were.), but nothing to be done for those.
- —Cryptic 13:03, 28 November 2025 (UTC)
- Rather embarrassingly, I just realized I forgot to respond to this : thank you @J11csd and @Cryptic, this is exactly what I was dreaming of! Best early xmas present ever!
- And re: Cryptic's last point- yes, you guessed correctly, numbered CCIs are primarily for real or real-sounding names, to avoid accidentally harming the real person if somebody Googles them. The number is actually the date the CCI was opened, which, rather unfortunately, is not always the same day as the block. GreenLipstickLesbian💌🧸 11:47, 4 December 2025 (UTC)
- The human-readable ip is in bt_address. —Cryptic 12:47, 28 November 2025 (UTC)
Twinned WP–Help page titles
[edit]I am looking for a list of page titles which exist in both the Help and Wikipedia namespaces, where neither is a redirect. This arises out of this recent merge proposal regarding Wikipedia:Translation and Help:Translation, with the discussion turning partly on what would be appropriate to each page if they were not merged. I later realized there are other such pairs, such as Wikipedia:Substitution and Help:Substitution where the same question could be asked, and I wondered if there aren't many such WP-Help pairs, where it would be profitable to have a higher level discussion at VPI or somewhere about such twinned pages. Hence, this request.
As for SELECTion columns, page title (without namespace) in col 1, I would say, and then some paired columns with adjacent 'Help' and 'WP' stats, perhaps something like this:
- Title, Age-H, Age-W, Watchers-H, Watchers-WP, PgViews30days-H, PgVw30-WP, UniqueUsers-H, UqUsers-W, TotEdits-H, TotEdits-W
or whatever seems useful and not a pain in the neck to produce.
For the WHERE, I thought of exclusions for redirects, but probably there could be others that might be helpful, maybe non-disambig page (Template:Disambiguation), non-stub (Template:Stub), non-essay (Template:Essay), non-set index article (Template:Set index) whatever seems good here.
The tricky part, it seems to me, is how to sort them. It should be something that elicits those pairs that are 'most in need of community attention' at the top, admittedly a very wishy-washy description. Some proxies that come to mind for that are avg page views, avg page watchers, avg total number of unique editors, avg age; and do we take the max instead of avg, or do we just pick stats based on one NS? Maybe try a few sorts, and see how they compare. Anything that elicits 'Translation' and 'Substitution' towards the top are probably good; then again, I don't know what the competition is. Thanks! Mathglot (talk) 20:59, 12 December 2025 (UTC)
- Note to self: for VPI, html-ize, add cols for merge/Rfcs, remarks, etc. Mathglot (talk) 21:14, 12 December 2025 (UTC)
- The watchlist table isn't in the public replicas for privacy reasons, so we don't have access to watchers at all - the only way to get them is to go to Page Information onwiki for each of them, one by one. And page views isn't in the main database at all; the least painful way is to paste the results onto a page onwiki, feed that into https://pageviews.wmcloud.org/massviews/, and manually paste the results of that back onto the page. But the rest I can do. There's only a few. quarry:query/99985. Do you also want ones where one or the other (or both) is a redirect, but points somewhere besides the corresponding page in the other namespace, and not both them pointing at the same third page. There's 319 of those. —Cryptic 02:16, 13 December 2025 (UTC)
- Oh, this is great, thanks! And I am relieved that there aren't a lot more (although this is enough to provide great data for a VPI discussion, and keep us busy afterward). Thank you for the tip about massviews; with output limited to a few dozen, that sounds worth doing once I wikitable-ize it. Your refinement suggestion regarding redirects sounds like a good idea. In that case, do you envision the two pagename columns becoming four, or what would happen? I'm just thinking ahead to the sortable wikitableized version, and making sure we have a column we can sort on to push all the redirects to top or bottom. If you need a test item to peruse, I know that there is a Help:Table, and that WP:Table[noredir] is a redirect to a set index page. If there's 319, can we have that as a separate quarry id, so I can link the one you already have, and that one, too? Mathglot (talk) 02:30, 13 December 2025 (UTC)
Oh, I was suspicious of the rows with only 1 editor, and saw that Wikipedia:Content model for example is a soft redirect. Is it easy enough to exclude those? Otherwise, the '1 editor' is a good enough flag to just ignore them, or I can do it when I reprocess the file.Mathglot (talk) 02:35, 13 December 2025 (UTC)- Actually, never mind that. VPI readers may want to know about those. Mathglot (talk) 02:37, 13 December 2025 (UTC)
- quarry:query/99986 for the mismatched redirects. Some of them are simply a product of the previous query (WP:Visual editor and Help:Visual editor point to WP:VisualEditor and Help:VisualEditor respectively); some are distinct cases like Help:National varieties of English being a sort-of-a-disambig while Wikipedia:National varieties of English redirects to Wikipedia:Manual of Style#National varieties of English.Soft redirects aren't redirects so far as the software's concerned, just pages with a template on them; I could detect them by looking for transclusions of {{soft redirect}}, but wouldn't be able to tell what they soft-redirected to, like I can with normal redirs. —Cryptic 02:44, 13 December 2025 (UTC)
- Super useful; thanks for your efforts. Both of these should stimulate some interesting discussion. I've tagged this discussion as 'resolved' at the top. Much appreciated! Mathglot (talk) 03:05, 13 December 2025 (UTC)
- The watchlist table isn't in the public replicas for privacy reasons, so we don't have access to watchers at all - the only way to get them is to go to Page Information onwiki for each of them, one by one. And page views isn't in the main database at all; the least painful way is to paste the results onto a page onwiki, feed that into https://pageviews.wmcloud.org/massviews/, and manually paste the results of that back onto the page. But the rest I can do. There's only a few. quarry:query/99985. Do you also want ones where one or the other (or both) is a redirect, but points somewhere besides the corresponding page in the other namespace, and not both them pointing at the same third page. There's 319 of those. —Cryptic 02:16, 13 December 2025 (UTC)
For those who are curious/interested in following this further, see Wikipedia:Namespace/Help vs. Wikipedia. Mathglot (talk) 09:15, 13 December 2025 (UTC)
Percentage and raw numbers of article edits made by members of each user group
[edit]Hi, I'm posting here after being directed this way by the folks at the village pump. I'm interested in quantifying the amount and fractions of article edits (excluding Talk, policy pages, etc.) performed by each user group: unregistered (anonymous), new accounts, confirmed and autoconfirmed, and extended confirmed.
For the purposes of the query, I would like to exclude edits by bots and those made using scripts. The ideal timeframe for data would be a whole year (preferably calendar year 2025 to date), and historical data for past years if possible. A total tally of edits (as far as there is data available) would also be appreciated.
I believe that this would be a great addition to the pages Wikipedia: Wikipedians and Wikipedia:Who writes Wikipedia?, as well as being a test of my working hypothesis that the vast majority of article edits are performed by a minority more experienced users, as per the Pareto principle.
Thanks in advance for your attention to this. Please let me know what your thoughts are on the feasibility of this request and feel free to suggest alternative or additional criteria or to request any assistance or information that may be needed to procure this data. CVDX (talk) 22:02, 29 December 2025 (UTC)
- Not at all feasible. "New accounts" isn't defined; autoconfirmed isn't an actual usergroup, it's computed anew each time it needs to be checked, and that's not scalable to the 60-something million edits that happen each year; even when the group is real, there's no way to tell what groups a user was in at the time they made an edit, and that's significant even for edits in the past year when you're talking about extended confirmed and lower; there's no practical way to exclude bot edits from accounts that were formerly bots, and that becomes very significant for historical data; and excluding script edits is complex, and only possible when a script is either well-established-enough that its edits are tagged, or when it consistently marks its edit summaries (a much smaller minority than you'd think). —Cryptic 22:37, 29 December 2025 (UTC)
- Things I can do that come fairly close: distinguish between registered users and ips and temporary accounts; exclude accounts that are currently bot-flagged; bin edits made by registered and temporary accounts (but not ips) by the user's current total edit count, with arbitrary cutoffs - I'd suggest 1 edit, 10, 500, and a couple thousand, to start with; bin edits by registered accounts by time elapsed between registration and the edit, or by absolute registration time if you prefer (which would answer questions like "what percentage of edits are made by accounts that registered before 2010?"). —Cryptic 23:43, 31 December 2025 (UTC)
Pages in a category, but not linked on a corresponding page
[edit]Specifically, I'm looking for pages that are in Category:Game Boy Advance games but aren't linked on the List of Game Boy Advance games.
I gave it a try, but my SQL skills are lacking and I got stuck on the inside of the NOT EXISTS part: quarry:query/100504. --Retro (talk | contribs) 19:09, 31 December 2025 (UTC)
- You were pretty close. quarry:query/100518. —Cryptic 23:38, 31 December 2025 (UTC)
- Appreciate the help! Retro (talk | contribs) 00:33, 1 January 2026 (UTC)