Wikipedia talk:Manual of Style/Capital letters
Capitalization discussions ongoing (keep at top of talk page)
Add new items at top of list; move to Concluded when decided, and summarize the conclusion. Comment at them if interested. Please keep this section at the top of the page.
Current
(newest on top) Move requests:
- Talk:2025 Interim Constitution of Syria#Requested move 3 November 2025 – change to sentence case?
- Talk:Individual savings account#Requested move 2 November 2025 – change to title case?
- Talk:Wuhan Metropolitan Area intercity railway#Requested move 24 October 2025 – lowercase more words? uppercase more words?
- Talk:Circular line (Taipei Metropolitan Area)#Requested move 23 October 2025 – lowercase "Metropolitan Area"?
- Talk:Corpus Christi-Kingsville Combined Statistical Area#Requested move 20 October 2025 (six articles) – lowercase "Combined Statistical Area"?
Other discussions:
- Wikipedia talk:WikiProject UK Railways#Railway line article names
- Wikipedia:Naming conventions (UK railway lines) – a proposed naming convention guideline
- Wikipedia:Redirects for discussion/Log/2025 August 13#Hot coffee – could this be referring to a name or is it primarily the beverage?
- Talk:North Yemen civil war#Capitalising "26 September revolution" - in prose?
- Talk:Left-Bank uprising#Capitalization – Should "Left-Bank" be capped?
- Talk:Thirty Years' War/Archive 2#Imperial v imperial
Concluded
RfC on the meaning of "usually" as used in MOS:MILTERMS
Should the spirit and intent of "usually capitalized in sources" at MOS:MILTERMS be taken as consistent with the general advice on capitalisation given in the lead of MOS:CAPS, or is the spirit and intent to create a substantially different and lower threshold for capitalising the types of events named?
Cinderella157 (talk) 03:41, 4 June 2025 (UTC)
The subject text at MOS:MILTERMS is as follows:
Accepted names of wars, battles, revolts, revolutions, rebellions, mutinies, skirmishes, fronts, raids, actions, operations, and so forth are capitalized if they are usually capitalized in sources (Spanish Civil War, Battle of Leipzig, Boxer Rebellion, Action of 8 July 1716, Western Front, Operation Sea Lion).
The matter is discussed above in the section MOS:MILTERMS.
Please comment by indicating Consistent or Lower. Cinderella157 (talk) 03:42, 4 June 2025 (UTC)
Notified at MILHIST. Cinderella157 (talk) 03:48, 4 June 2025 (UTC)
- Consistent MOS:MILTERMS is part of MOS:CAPS. The opening paragraph of MOS:MILTERMS states:
The general rule is that wherever a military term is an accepted proper name, as indicated by consistent capitalization in sources, it should be capitalized.
The general advice in the lead paragraph of MOS:CAPS is often paraphrased as requiring consistent capitalisation. Some have argued that usually herein means any degree of usage just greater than 50%. As Firefangledfeathers notes above: "It's odd to see an unexplained clash between the general rule and the specific rule, and it's untenable to have the clash be open to interpretation". However, such an interpretation clashes not only with the general rule but also with the more proximate rule in the lead paragraph at MILTERMS. The issue is not just whether usually should reasonably be interpreted as greater than 50% but whether doing so reflects the spirit and intent of the guidance. At multiple places, we are told that the spirit of any P&G is paramount rather than skirting the spirit on some technicality - perceived or real (eg WP:P&G, WP:5P, WP:IAR?, WP:PRINCIPLE, WP:MR and WP:LAWYERING). If the spirit of using usually is intended to create a lower threshold then we would need a substantive reason for doing so.
- The Merriam-Webster definition for usually is: "according to the usual or ordinary course of things : most often : as a rule : customarily, ordinarily". This source collates linguistic studies on how various terms (including usually) are usually perceived as percentages - reporting that usually is perceived as 70 - 84 percent of the time. It also gives the definition from the OED: "In a usual or wonted manner; according to customary, established, or frequent usage; commonly, customarily, ordinarily; as a rule". Those arguing a lower threshold would seize on one part of the definition, most often, as being just greater than equal. As with any law, rule etc, the meaning of a definition should be read in the fuller context and a balance of all the parts. Seizing on one part in isolation is the epitome of a WP:PETTIFOGING argument. Considering the definition and linguistic interpretation of usually, the meaning is consistent with both the general advice in the lead of MOS:CAPS and the lead paragraph of MILTERMS. These are subject to the same conflicting views on whether these are proper names as any other name on WP which is descriptive and takes the definite article in prose - unless they are consistently capped in sources.
- Many editors are of a view that any name having a specific referent is a proper name that must be capitalised. While a specific referent is a property of a proper name, it is not a defining property since specificity of referent is also conveyed by the definite article (the). If there is anything that defines a proper name, it is that it is not descriptive. However, it is because of these different views that MOS:CAPS relies on consistent capitalisation in sources to determine what we capitalise rather than semantic arguments of what defines a proper name. This is the consensus of the broader community and is reflected by the consensus of a vast majority of (RM) discussions both generally and more specifically for battles, wars etc. As a group, the names identifying many battles, wars etc take the definite article in prose and are inherently descriptive - eg the Battle of Waterloo is a battle that occurred near Waterloo. As a group, these are commonly capitalised but there are a significant number of exceptions for specific battles, wars etc such as the Syrian civil war, where Syrian civil war is not consistently capped in sources.
There is no apparent substantive reason why these should be considered as a group as an exception from the general guidance, particularly when the lead paragraph at MILTERMS reinforces that the general guidance applies to MILTERMS.
- Asserting that usually creates a lesser standard than the general guidance is clearly contrary to the usual meaning and the spirit and intent, reading in the fuller context of MOS:CAPS as a whole and the more specific guidance at MILTERMS. Cinderella157 (talk) 03:43, 4 June 2025 (UTC)
"Many editors are of a view that any name having a specific referent is a proper name that must be capitalised."
Those are the rules of the English language: "Names of people, places and organisations are called proper nouns. We spell proper nouns with a capital letter"[1]
"While a specific referent is a property of a proper name, it is not a defining property since specificity of referent is also conveyed by the definite article (the)."
"The" is not necessary to make something a "specific referent", we say "Berlin", not "The Berlin"; adding "the" is an exception that arose through use, e.g. "The Grand Canyon". TurboSuperA+(connect) 09:50, 10 June 2025 (UTC)
- Yes, we do capitalise proper nouns. This is not disputed. However, because something is spelled with a capital letter, that does not make it ipso facto a proper noun|name. English often capitalises descriptive names for emphasis, significance or as a term of art. If you read the Merriam-Webster definition or our article proper noun you will see that proper nouns are also not descriptive. I was not saying that proper nouns must take the definite article (the) to be specific (as you would indicate with the example Berlin). What I was saying is that the definite article confers specificity and therefore, specificity of referent is not a defining property of a proper noun. Consequently, names such as the Crimean B|blockade or the Syrian C|civil W|war are not ipso facto proper nouns because they take the definite article in prose. Your example the Grand Canyon is considered a proper noun even though it might appear descriptive (the canyon which is grand). This is partly because it is common to capitalise descriptors such as canyon, bay, sea etc (but not all descriptors) in geographical names. Secondly, we should not be confused by the etymology of the name where somebody said this looks grand, let's call it the Grand Canyon, since they might just as easily have called it something else like Kings Canyon. The ngram for Grand Canyon here is pretty much always capped compared with Syrian civil war here [contextualised for prose]. However, because WP relies on usage in sources to determine capitalisation, we capitalise American Civil War because, even though it is not a true proper noun, it is consistently capitalised in sources (see here).
- If we remove usually in the sentence at MILTERMS, it begs the question as to what is an accepted name of wars etc, since clearly, not all wars, battles etc are proper nouns. They are descriptive in nature, they take the definite article in prose and not all are consistently capitalised in sources. For the rest of this, you can read my reply to Chicdat below. Cinderella157 (talk) 11:15, 10 June 2025 (UTC)
- Remove usually altogether – text was added without discussion by Dicklyon six years ago. I will copy my comment from a recent RM:
The operative word here is "accepted" – thus, the event has an actual, accepted common name, not a descriptive name (e.g. American Civil War is accepted, War in Afghanistan is [descriptive]).
This is putting into words common sense, something that has never really existed at MOS:CAPS. Accepted = proper name. Proper names are capitalized. Please find any grammar or style guide that contradicts that. 🐔 Chicdat Bawk to me! 11:06, 7 June 2025 (UTC)
- You attempted to remove usually but were reverted by another with the edit summary: "... if this text has been here for 6 years it has implicit consensus ..." Consequently, your comment is not a surprise. Removing usually begs the question: what is an accepted name - but you already answer this question: "Accepted equals proper name" [equals sign won't render here]. Therefore, we capitalise names of wars etc if they are proper names. WP (MOS:CAPS) treats those names which are consistently capitalised in sources as a proper name (per the lead). "Accepted equals proper name" represents the spirit and intent of the subject sentence. As you note, the names given to wars, battles, revolutions etc are not all proper names and the names of articles using these terms are not always correctly capitalised. Without usually, there is no conflict between the subject sentence and the lead paragraph of MILTERMS or the general advice in the lead. If usually is understood as synonymous with consistently, there is no conflict either. Such an understanding is consistent with reading the definition of usually on balance and the evidence of linguistic studies. Arguments that usually creates a lower threshold for caps than the general advice are based on an aberrant meaning of usually (by taking one part of the definition in isolation rather than on balance) such that the subject sentence would be inconsistent with the general advice. As you have identified, "accepted equals proper name", and such an argument is contrary to the spirit of the subject sentence as you have identified it. I see that adding usually affirms the consistency with the general advice and believe this was the intent of adding it. Perhaps Dicklyon can affirm this. With or without usually the intent of the subject sentence is to affirm the general guidance in the lead. Cinderella157 (talk) 01:57, 8 June 2025 (UTC)
- Revert and remove "usually" per Chicdat, proper names are uppercased on Wikipedia. To lessen that obvious commonsense view, the word "usually" (which means 'most often') was added without discussion and has since been used to lowercase proper names. An easy fix to bring the guideline back to its original meaning. As for the meaning of the word "usually", the only objective term used in dictionary meanings is "most often", which asserts a majority, or the name most commonly used, and nothing more. Randy Kryn (talk) 11:43, 7 June 2025 (UTC)
- Amending my statement, as people are actually saying "usually" doesn't mean what it means. Either "usually" is kept, which sets the standard of "most often" (i.e. either 50.1% or the name used more than any other) or the wording reverts to include all wars, battles, etc. "Usually" at least sets a bar for those who want to keep it, but it certainly doesn't mean "always" or "consistently", it means most often, and is maybe the best idea to use it for all title casings and not only MILTERMS. Randy Kryn (talk) 12:26, 10 June 2025 (UTC)
- Lower and remove "usually". Proper nouns are capitalised in the English language. TurboSuperA+(connect) 08:22, 9 June 2025 (UTC)
- Remove "usually" as a noise word. Proper nouns should be capitalised. Hawkeye7 (discuss) 08:37, 9 June 2025 (UTC)
- Remove "usually" per Chicdat specifically and above in general. The Kip (contribs) 19:04, 9 June 2025 (UTC)
- Consistent – If the spirit and intent of the MILHIST part of the MOS is to have a lower threshold for determining what's a proper name, that's problematic. Nobody will disagree with statements like Hawkeye7's that "Proper nouns should be capitalised", but the MOS tells us how to decide what is a proper noun/name. Having an editor assert "proper name" when it's commonly found lowercase in sources (as we see commonly in MIL RM discussions) is not the right answer here. Reject the attempt to have a lower threshold of capitalization in this one topic area. Dicklyon (talk) 23:14, 9 June 2025 (UTC)
- Please don't attempt to change the English definition of "usually". It means "most often", and nothing less or more. So no, it is not another word for "consistent". Randy Kryn (talk) 12:29, 10 June 2025 (UTC)
- Consistent, per Cinderella157 and Dicklyon. Gawaon (talk) 07:37, 10 June 2025 (UTC)
- Consistent based on reading the relevant sections of both policies. Seems pretty straightforward: follow abundant reliable sources.
- MOS:CAPS:
"Wikipedia relies on sources to determine what is conventionally capitalized; only words and phrases that are consistently capitalized in a substantial majority of independent, reliable sources are capitalized in Wikipedia."
- MOS:MILTERMS:
"[W]herever a military term is an accepted proper name, as indicated by consistent capitalization in sources, it should be capitalized."
- As far as Cinderella157's supporting comment, WP:TLDR. Penguino35 (talk) 14:07, 26 June 2025 (UTC)
- Consistent per above. There was never an agreement of intent to establish a lower threshold. That was a reinterpretation after the fact. And the word shouldn't be deleted, as the comments above show some desire for the absence of the word to be interpreted differently. — BarrelProof (talk) 21:26, 27 June 2025 (UTC)
- Broken RfC, this RfC is about the word "usually", not about replacing it with another word. Replacing it for another word with a different meaning falls outside the scope of the RfC question. It's either remove it or keep it as is. Wikipedia should not be changing the meaning of a word which is defined as "most often", and "spirit and intent" language is strange wording with no basis in guidelines or policy. "Usually" means what sources say it means, "most often". Either keep it or remove it, but don't redefine it. Randy Kryn (talk) 02:35, 28 June 2025 (UTC)
- There's nobody who could define English words once and for all, that's not how languages work. Words get their sense from their usage, and the usage can vary over time, region, and users. Dictionaries can help a lot, though of course they too will not always agree. I don't know from which dictionary you drew your "most often" description, but in Wiktionary I find the descriptions "Most of the time; less than always, but more than occasionally" and "Under normal conditions". But where in the "less than always, but more than occasionally" range do we want it to fall in this case? Or what are "normal conditions" and when do they no longer apply? Those are reasonable questions for an RfC to ask and as I understand this RfC, it's meant to do essentially just that – clarify the intended meaning of an inherently somewhat vague and ambiguous word for this specific case. Gawaon (talk) 06:56, 28 June 2025 (UTC)
- Bad RfC more or less per Randy Kryn. We have over the years developed an unfortunate habit of using words in ways that are different from and even contrary to their normal meaning, though this double usage of words as terms-of-art is by no means limited to us. I even bear some small share of blame for that. I suppose in many cases it isn't that bad because frequency of usage and context allows people to figure out the intended meanings without too much difficulty. However we really want to avoid future occurrences even if it leads to somewhat dry technical language being employed. Thus it is logical to propose a rewording for clarity, or to remove a word, or even to remove the whole paragraph. If the intent here is to say that this is not an exception or special case then it shouldn't be there at all; it is rather backwards to list something in an exceptions area only to say it is not an exception, please don't write guidelines that way. But what we should not be doing is having RfCs to redefine one specific instance of a word's appearance, well, unless you deliberately want to make projectspace even more confusing for new and casual editors. Assertions that we should draft imprecisely because semantic drift is inevitable are unconvincing and prove too much. If and when such shifts happen rewording can and will be done to maintain meaning, assuming practice doesn't shift, but we should strive to reduce ambiguity not create more of it. 184.152.65.118 (talk) 20:58, 12 July 2025 (UTC)
- Consistent is the best available option, since it reduces the impact that an unnecessary specific rule is having on a useful general rule. Better options would be to rework MILTERMS more significantly, or make a small change like replacing "usually" with "consistently". I oppose removing "usually", and I see the unexplained clash between it and both the MILTERMS opener and CAPS more generally to be untenable. Firefangledfeathers (talk / contribs) 14:10, 16 July 2025 (UTC)
- Remove usually per Chicdat's train of logic. If that means a "lower" standard, then so be it. ⇒SWATJester Shoot Blues, Tell VileRat! 22:07, 4 August 2025 (UTC)
RfC on the use of Google Ngram
RFCBEFORE: Wikipedia talk:Manual of Style/Capital letters#It is time we talked about Google Ngram
Discussion at RSN: Wikipedia:Reliable sources/Noticeboard#Google N-grams and 'consistent' answers
Should Google Ngram be deprecated in rename/move discussions?
- Yes
- No
@Cinderella157, Dicklyon, Sammy D III, Myceteae, Gawaon, Andy Dingley, Intothatdarkness, SchreiberBike, Hawkeye7, Blueboar, Rally Wonk, Stepwise Continuous Dysfunction, FactOrOpinion, NatGertler, Yesterday, all my dreams..., Randy Kryn, Chicdat, AjaxSmack, SMcCandlish, and Kowal2701: Pinging participants in the MOS:CAPS discussion, the RSN discussion, and those who might be interested in this RfC. I also left an rfc notice at Village Pump (policy), WikiProject English Language, WP:NCCAPS. If I forgot someone, I am terribly sorry. TurboSuperA+(connect) 13:58, 16 June 2025 (UTC)
- No, both because deprecating something from discussion is not a coherent suggestion, and because "useful" does not mean "perfect". If you want to put together an essay to be used in responding to people trying to use it as a definitive statement, go ahead. Nat Gertler (talk) 10:11, 17 June 2025 (UTC)
Clarification. This question and RfC applies specifically to MOS:CAPS move/title discussions where the move is done to lowercase or uppercase letters in the name. Google Ngram should never be used and should be ignored in determining consensus. 19:12, 16 June 2025 (UTC)
- Yes. There are simply far too many problems and too much uncertainty in the results. Here Dicklyon shows how self-published books skew Google Ngram results. Here I show that results from the British English and American English corpora are different, and Gawaon says that there is no reliable way to tell which type of English it is. Here Hawkeye7 points out that there is no way to see the context in which a word/term is used from Google Ngram results. Deprecating Google Ngram would also prevent low-effort move requests and editors would actually have to examine reliable sources. I believe this would cut down on the volume of requests and pave the way for actual discussion, rather than throwing up a Google Ngram link and thinking that that is all that is required. Ultimately, using Google Ngram is more trouble than it's worth. As to alternatives, we can always examine how the term is capitalised in the sources cited in the article, there's Google Scholar and good old-fashioned discussion. TurboSuperA+(connect) 14:02, 16 June 2025 (UTC)
- N-grams have value although not the final end-all of discussions. "Official names" should have much more influence, as they are usually the common names that the public uses and attributes as proper names. The ban on counting official names as "official" has always been a head-scratcher to me. Randy Kryn (talk) 14:12, 16 June 2025 (UTC)
- Randy, you mistake the policy. We don’t ban official names… we simply favor whatever is the more commonly used name. As you note, that often is the “official” name… but not always. Blueboar (talk) 14:50, 16 June 2025 (UTC)
- No - Ngrams are a useful data point in move discussions. They should not be the only data point, but they should be examined and “in the mix”. Blueboar (talk) 14:50, 16 June 2025 (UTC)
- Clarification requested:
- I take this means we would extend the definition from deprecated sources (which applies to citing sources in articles) to mean that Ngrams should almost never be used in RMs and related discussions. Further, that in assessing consensus, admins should ignore or give substantially less weight to Ngrams. Is this correct?
- Is the intended scope limited to MOS:CAPS-related moves? I would note that Ngram data features in other RM discussions. Ngram is one of the tools mentioned at WP:DPT (part of the Wikipedia:Disambiguation editing guideline) as a tool that may be helpful.
- --MYCETEAE 🍄🟫—talk 14:42, 16 June 2025 (UTC)
See responses and follow-up down the page here: [2] --MYCETEAE 🍄🟫—talk 18:43, 16 June 2025 (UTC)
- Answered here and subsequently updated in the main RFC question. --MYCETEAE 🍄🟫—talk 01:58, 17 June 2025 (UTC)
- No. I'm not sure why there's a mention of "rename / move discussions" as this is not WT:MOVE or WT:TALK but will guess it's because MOS:CAPS has been used in such discussions. I'm not sure why there's a mention of "deprecated" but will guess this is a use of the word in its typical English sense rather than the WP:RSP sense. I have no problem with TurboSuperA+ deprecating my use of any source in a discussion, but it should be up to me to decide whether I want to do it for a particular word in a discussion of that word (i.e. in context), and up to the other participants in that discussion whether they want to disagree. Peter Gulutzan (talk) 14:47, 16 June 2025 (UTC)
- Update: I guessed wrong. TurboSuperA+ has changed the RfC. So the word "deprecate" is not being used in the typical English sense, and the RfC affects only WT:MOSCAPS. Now I'm guessing that the sentence "Google Ngram should never be used and should be ignored in determining consensus." only (due to the previous sentence) means "Google Ngram should never be used and should be ignored in determining consensus -- if and only if Google Ngram Viewer results are used in WT:MOSCAPS in move/title discussions where the move is done to lowercase or uppercase letters in the name (i.e. article title)." If so, ignore my "No". Peter Gulutzan (talk) 19:58, 16 June 2025 (UTC)
- No per Blueboar. They are a data point that should be considered, but shouldn't be treated as the end-all and be-all. I would also note that this RFC would seemingly apply to all uses of ngrams, not just those involving capitalization. ~~ Jessintime (talk) 14:54, 16 June 2025 (UTC)
They are a data point that should be considered, but shouldn't be treated as the end-all and be-all.
- And yet. Not to mention it is easy to manipulate it. e.g.
- - This nomination is literally just a Google Ngram link. In that discussion it was suggested that recent results should be discounted because of a self-published book.
- - Here is another example. Notice, how Dicklyon said "in the last half-century", ignoring the results from before because they don't suit the goal.
- Why should something so easily manipulated to give a result one wants be used as the only argument to rename an article? It really doesn't make sense to me. TurboSuperA+(connect) 18:19, 16 June 2025 (UTC)
- Yes I don't think they should necessarily be removed from consideration, but they should NOT be considered the "be all and end all" in discussions, which seems to be the current standard. In my view RS, especially those used in the article in question, should always take precedence. In my experience this often does not happen, and the mighty Ngram is presented as gospel. Intothatdarkness 14:57, 16 June 2025 (UTC)
- No. One shouldn't rely on it exclusively, of course, but it can be useful as part of a larger picture to take into account. Gawaon (talk) 15:09, 16 June 2025 (UTC)
- Comment it's clearly not RS and I don't think the RSN thread was ambiguous or needed closure. As has already been pointed out, it is an arbitrary corpus interpreted by unreliable OCR that may or may not reflect actual usage trends and is virtually guaranteed to create ghost trends if enough comparisons are generated. Hence, it is reliable only for its own content which will almost always be UNDUE unless mentioned by an actual RS. But there's never been a hard-and-fast proscription against any discussion of GUNREL on talk pages. Situationally they can still be useful, it's not common, and the scope for use tends to be narrow, but they do have their purposes. From an RM perspective it should be treated as any other GUNREL would be. I don't see the need for a blanket proscription on bringing it up in discussions. 184.152.65.118 (talk) 15:12, 16 June 2025 (UTC)
- No. Although I'm awaiting a response to my questions above, I don't foresee a change in my position. Ngram is a useful, if crude, tool for assessing many usage questions relevant to move discussions. To the extent it's overused, misleading, or inappropriate in a particular context, such objections should be raised in RM discussions or elevated to a discussion about updating the MOS/naming conventions for a particular subject area. The guidance at Wikipedia:Search engine test and two BEFORE discussions linked above provide useful considerations for using Ngram. --MYCETEAE 🍄🟫—talk 15:53, 16 June 2025 (UTC)
- The problem is some of those discussions are bludgeoned to death by people who rely on Ngrams to the exclusion of all else. They may have their purpose, but in many instances they've exceeded that purpose and become definitive. Intothatdarkness 15:57, 16 June 2025 (UTC)
- Example? Dicklyon (talk) 16:05, 16 June 2025 (UTC)
- When folks are WP:BLUDGEONING they should be warned and subject to disciplinary processes. If the problem is editor conduct, giving Ngram the scarlet letter is an inappropriate remedy. --MYCETEAE 🍄🟫—talk 17:31, 16 June 2025 (UTC)
- Neither – it would be censorship to try to stop people from deprecating n-grams in rename/move discussions, and we should not be telling editors whether they should do so. If you can rephrase the RFC question to ask what you actually intended, I'll be happy to give a more in-depth answer. For now, look to the section you linked for info that refutes things like "there is no way to see the context how a word/term is used from Google Ngram results". Context is one of the most important things the n-gram stats can get you information about. Dicklyon (talk) 16:16, 16 June 2025 (UTC)
- Comment (Summoned by bot) Please have mercy on people who are unfamiliar with Google Ngram and provide a precis of just what it is. Some of us are summoned by the bot and would like to make an intelligent comment, which we can't do if we don't have sufficient information on the subject of the discussion. Thanks in advance. Coretheapple (talk) 16:24, 16 June 2025 (UTC)
- The relevant background is in the section #It is time we talked about Google Ngram, a bit higher on this talk page. This RFC is a premature fork of that conversation, and ought to be closed pending some discussion of what a sensible question might be, if any. Dicklyon (talk) 17:20, 16 June 2025 (UTC)
- Google Books Ngram Viewer provides a general overview of Ngram beyond the current context. In RM discussions, it is often invoked as evidence in resolving various style and usage questions (or attempting to). Whether or not a word or phrase is usually capitalized in books, and whether the threshold aligns with the wording of MOS:CAPS and WP:NCCAPS is a not uncommon point of discussion and can be contentious. It is also raised in WP:COMMONNAME and other non-capitalization discussions, such as which of two or more synonyms is most often used to name a subject and whether particular usage is sufficiently common to be appropriate for natural disambiguation or a descriptive title. It is mentioned in non-capitalization contexts at Wikipedia:Search engine test and WP:DPT to give a sense of some other ways it may be used on WP. --MYCETEAE 🍄🟫—talk 17:49, 16 June 2025 (UTC)
- @Coretheapple, Google N-grams is a specialized search engine. It searches book contents rather than webpages. If you enter one or more words/phrases, it searches the contents of a very large set of books that have been printed over a very large range of time (it includes several centuries of books, and if I'm remembering right, the corpus is ~7 million books), and the output of searches are graphs showing how the relative frequency of the chosen words/phrases changes over time. It's case sensitive, and you can set some variables, such as what time range you want to search. Here's their example, and you might want to quickly try out a couple of your own choices. If you want more info, click on "About Ngram Viewer" at the bottom of that page. FactOrOpinion (talk) 17:55, 16 June 2025 (UTC)
- Can you show me please the results of these three precise strings? I'm struggling. (I've lost track of all these conversations but there is a move discussion at Five Freedoms).
- Five Freedoms of Animal
- Five Freedoms of animal
- Five freedoms of animal
- I also say NO to deprecating ngram in general. YES to deprecating it from discussions based on capitals. Ngram cannot say where in a book, article or newspaper headline the words were used; their authors may have had good reason to capitalise something that shouldn't be capitalised here, or vice versa. Rally Wonk (talk) 18:14, 16 June 2025 (UTC)
YES to deprecating it from discussions based on capitals.
- That was the intent behind the RFC. I should have made it more clear, but for some reason thought the context of the discussion (and the linked RFCBEFORE) would give that clue. TurboSuperA+(connect) 18:20, 16 June 2025 (UTC)
- Sure, here's the result from 1900 forward. It shows no results at all for the second and third of the three, only the first, and no (or next to no) results prior to 2000. FactOrOpinion (talk) 18:38, 16 June 2025 (UTC)
- Thank you. So no results for "Five Freedoms of animal", yet when I click 'Search in Google Books', the third result returns use of "Five Freedoms of animal" written mid-sentence in prose. This tool cannot be used for caps discussion. YES to the clarified proposal. Rally Wonk (talk) 18:49, 16 June 2025 (UTC)
- In the case-insensitive n-grams for Five Freedoms of Animal, you get about equal numbers of "Five Freedoms of Animal" and "five freedoms of animal". Other combinations of capitalization fall below the threshold number of books to be counted, so don't show up. It's hard to tell how much less frequent "Five Freedoms of animal" is, just that it's less. The tool can be used for what it shows; understand its limits when using it. Dicklyon (talk) 21:46, 16 June 2025 (UTC)
- I appreciate the responses. Perhaps this RfC could be rephrased to guide the ignorant. Coretheapple (talk) 22:26, 16 June 2025 (UTC)
- Comment I'm not entirely clear on the question (e.g., what does it mean to "deprecate" in this context, where we're not talking about a source for article content?). Given that Google N-grams are sometimes used in discussions (whether for move discussions or something else), I think it would be helpful to have a statement somewhere in a guideline specifying that it is not a source, but is instead a search engine, and it searches a large corpus of books, but where we have zero way of knowing what percentage are books that we would deem to be RSs, and zero way of knowing how representative this corpus of books is among all books in English, nor whether the results of looking at word/phrase use in books are essentially the same as their use in other formats (e.g., newspapers). The search does not distinguish among capitalization in a chapter title, in the middle of a sentence, ... There are optical character recognition errors and date metadata errors. Because the graphs represent proportional occurrence, one can be misled about the frequency in American English vs. British English if the book corpus doesn't include the same # of words from each, and it's unclear how it treats English variations outside of the US and the UK (e.g., is the capitalization in Nigerian English, Indian English, etc., the same as British English?). Because the graphs represent proportional occurrence, the area under two graphs isn't as meaningful as it might be. There are limitations in assessing the context of a given N-gram. Is it worthless in assessing information for WP's purposes? Probably not. But it certainly shouldn't be held out as some definitive result. FactOrOpinion (talk) 17:02, 16 June 2025 (UTC)
- It seems to me that indeed nobody knows what exactly "deprecate" is supposed to mean in this context. That surely is a problem with this RfC. Gawaon (talk) 17:38, 16 June 2025 (UTC)
- @TurboSuperA+, several of us would appreciate your clarifying what you mean by "deprecate" in this context. FactOrOpinion (talk) 17:57, 16 June 2025 (UTC)
- It means that it would no longer be allowed to use it to argue one way or another in moves/renames when the rename is done to lowercase or uppercase the letters of the title. TurboSuperA+(connect) 18:13, 16 June 2025 (UTC)
- One other limitation that I didn't include earlier: it allows one to select English fiction as the corpus, but it does not allow one to select English non-fiction as the corpus. I don't know that they provide info anywhere about the relative sizes of these two subsets of their overall English corpus. FactOrOpinion (talk) 20:01, 16 June 2025 (UTC)
- @Myceteae To no.2, yes, I meant it for MOS:CAPS discussions. I thought that would be clear since we're on the MOS:CAPS talk page. To no.1, also yes. Ideally they would be ignored. Right now, many moves use Google Ngram results as the only argument for or against a rename. TurboSuperA+(connect) 18:23, 16 June 2025 (UTC)
- @TurboSuperA+ it might be worth making a brief, clearly identified update to the question to clarify your intended meaning: That Ngrams should be rarely (or never) used and should be discounted (or ignored) in determining consensus, and that this applies specifically to MOS:CAPS move/title discussions. I was pretty sure this is what you meant but wanted to confirm for myself and mainly for the benefit of others who are less familiar with these debates. I see that versions of these questions have been raised by a few others but it's still early and would benefit the rest of the discussion to clarify this up top while making it clear that this was added later. --MYCETEAE 🍄🟫—talk 18:52, 16 June 2025 (UTC)
- Done. Thank you for the advice. TurboSuperA+(connect) 19:13, 16 June 2025 (UTC)
- Like some other commenters here, I'm not sure what "deprecate" means in this context. How would MOS:CAPS "express disapproval" of Google Ngram results? A note to the effect of "Google Ngram results are not accurate for matters of capitalization and should not be the sole basis for determining article titles"? Or a stricter prohibition?
- I'm fine with editors mentioning or linking Google Ngram results; I do so myself sometimes with "peruse the results yourself". But I agree with some here that the results can be highly misleading. In addition to other problems already mentioned, Google Books sources can be highly imbalanced in certain instances. In one case, I saw a sudden spike in usage of a term in the 1950s and upon investigation found that that spike was entirely due to its use in UN documents. Google had a large number of these documents scanned, and they swamped usage of other print materials in the same era. — AjaxSmack 18:52, 16 June 2025 (UTC)
- Obviously no. WP is not going to ban the use of a tool that is frequently crucial, just because a handful of individuals do not know how to use it properly. This is an attempt to end-run around the first rule of MOS:CAPS ("only words and phrases that are consistently capitalized in a substantial majority of independent, reliable sources are capitalized in Wikipedia"). If proponents of unnecessary over-capitalization succeed in banning one of the primary means of establishing the capitalization rate in source material then the assessment for many topics would be difficult or impossible, so the RM results would probably come down to whoever screamed loudest (and we all already know that'll be the single-topic editors demanding unnecessary capitalization in their pet topic). — SMcCandlish ☏ ¢ 😼 19:44, 16 June 2025 (UTC)
reliable sources
- Google Ngram doesn't only check reliable sources, but self-published books as well. So it would seem that that particular sentence from MOS:CAPS precludes the use of Google Ngram by definition. TurboSuperA+(connect) 19:48, 16 June 2025 (UTC)
- In addition to not checking only reliable sources, it also doesn't check only independent sources. It doesn't filter out headlines, proper nouns, captions, indexes, etc. Thryduulf (talk) 20:02, 16 June 2025 (UTC)
- The SPS problem is avoided by constraining searches to 2019 and earlier. The expansion of the corpus after 2019 is when junk books were dumped into the data set. The latter problem, of distinguishing running-text usage from usage in title-case headlines, captions, etc., is avoided by using a series of carefully selected searches. Everyone familiar with these tools already understands this. They are not a be-all and end-all tool, and one is apt to get more useful results from a Google Scholar search, but that doesn't make the tool useless. — SMcCandlish ☏ ¢ 😼 20:18, 16 June 2025 (UTC)
- Is there an essay anywhere about what "everyone familiar with these tools already understands"? FactOrOpinion (talk) 20:41, 16 June 2025 (UTC)
The SPS problem is avoided by constraining searches to 2019 and earlier. The expansion of the corpus after 2019 is when junk books were dumped into the data set.
- So Google Ngram can never give us an accurate representation of capitalisation in contemporary sources. Another reason to stop using it. TurboSuperA+(connect) 21:12, 16 June 2025 (UTC)
- I agree I would not try to use it for "contemporary" usage, unless you intend to show how Wikipedia capitalization influences contemporary usage. It's much more meaningful to look at usage up to, for example, the time a wikipedia article was created, before WP has had time to feed back to stats by influencing usage. Dicklyon (talk) 21:41, 16 June 2025 (UTC)
- This plus the polluting of the corpus after 2019 may decrease Ngram's utility over time. There's been some acknowledgement of the trend towards capitalization in some of the revolution RMs that were deemed to not yet meet our threshold. With capitalization and many other RMs where Ngram data is raised, it often appears usage is headed in a particular direction but is not yet ripe for a particular title, and the suggestion is raised that we revisit in a year. This doesn't change my view that Ngram is often useful, to be clear. --MYCETEAE 🍄🟫—talk 02:09, 17 June 2025 (UTC)
- I dunno, the anti-capitalist editors scream pretty loud, too. Give yourself some credit! --MYCETEAE 🍄🟫—talk 01:46, 17 June 2025 (UTC)
- Yes for capitalisation, mostly for other uses. Ngrams are a single datapoint that is sometimes interesting, but the complete lack of any context to the usage and the mix of reliable and unreliable sources mean that it is essentially never actually useful in determining what capitalisation is appropriate. When the discussion is about which term is more commonly used and there is no evidence of a British/American English split and there is no evidence that usage differs in different contexts then it can be useful evidence but it is never conclusive on its own. Thryduulf (talk) 20:00, 16 June 2025 (UTC)
- No. Corpus-based data is useful for COMMONNAME-based RMs, since in many circumstances they can survey usage more broadly than an editor searching by hand would be able to. On those grounds alone, I think anything as sweeping as this RfC proposal is likely to do more harm than good. That's not to say that Ngrams should be used uncritically—they have limitations, like any potential source of data—but the ideal solution here would be much closer to "information page about the strengths and weaknesses of Ngram data" rather than a full proscription. Even for caps-related uses of Ngrams, which are one of the areas where the corpus' weaknesses are most profound, they can still have some informational value; for example, if Ngrams gives a 50:1 ratio for one capitalization or another, it's all the likelier that the presence of headlines and titles aren't significantly skewing the numbers. ModernDayTrilobite (talk • contribs) 21:07, 16 June 2025 (UTC)
- Yes per Thryduulf. Ngrams might possibly be helpful if they were used with greater caution and nuance, but I don't see a realistic path forward in which that happens, so I think deprecation is a preferable alternative to the status quo. LEPRICAVARK (talk) 23:31, 16 June 2025 (UTC)
- Yes - go the deprecation route. GoodDay (talk) 23:42, 16 June 2025 (UTC)
- No: Ngrams are flawed, but every simple way of answering a complex question is flawed. Discussions should include criticism of all methods including Ngrams. Capitalization is not an easy question. Every definition of proper noun fails when examined closely. Our consensus process is the best way we've found and it's not perfect, but it's working fine. SchreiberBike | ⌨ 11:17, 17 June 2025 (UTC)
- Yes as I said before, they have too much randomness. Yesterday, all my dreams... (talk) 16:38, 17 June 2025 (UTC)
- They do have some problems, but randomness is not among them. Dicklyon (talk) 00:04, 18 June 2025 (UTC)
- Do you know how they determined which books to include in their corpus? Was it a random sample, or a representative sample, a convenience sample, ...? FactOrOpinion (talk) 00:15, 18 June 2025 (UTC)
- Not exactly, but I know they started scanning all the books in about a half dozen major university libraries, and after that got direct feeds of already-digital books from publishers. Basically, anything they could get, even though in recent years so many books are just wiki-derived, self-published, or AI slop. Our article Google Books says, "As of October 2019, Google celebrated 15 years of Google Books and provided the number of scanned books as more than 40 million titles. Google estimated in 2010 that there were about 130 million distinct titles in the world ..." Since several of the libraries were in the US and UK (see Google Books#Initial partners), I presume they've covered a larger fraction of English-language books than other languages, and I presume that the libraries tend to bias the collection toward "reliable". But it's basically everything, not a sample. Dicklyon (talk) 02:53, 18 June 2025 (UTC)
- I've now hunted down a bit more info, and it's nowhere near "basically everything." The corpus for the Ngram viewer is a proper subset of the Google books corpus: "The first version of the data set, published in 2009, incorporates over 5 million books. These are, in turn, a subset selected for quality of optical character recognition and metadata—e.g., dates of publication—from 15 million digitized books, largely provided by university libraries. ... The second version, published in 2012, contains 8 million books." (source) I haven't been able to find the size of the 2019 corpus, but if the ratio is about the same as in the first corpus (1/3), then it's ~13.3M books, which is less than 9% of all books (based on this estimate of total current books). I'm guessing that the fraction of Google Books that came from university libraries has dropped over time. It's definitely not a random or representative sample. It's more of a large convenience sample.
- For WP's purposes, it doesn't even make sense to me to weight each book equally. FactOrOpinion (talk) 14:49, 18 June 2025 (UTC)
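The back-of-envelope estimate in the comment above can be checked with a few lines of Python. This is only a sketch under stated assumptions: the 5 million and 15 million figures come from the quoted description of the 2009 data set, the 40 million figure is the October 2019 Google Books count quoted earlier in this thread, and the 130 million figure is Google's 2010 estimate quoted earlier, which is not necessarily the estimate linked in the comment. All variable names are illustrative.

```python
# Figures quoted in the thread (see the lead-in for where each comes from).
ngram_books_2009 = 5_000_000        # books in the first Ngram data set
digitized_books_2009 = 15_000_000   # digitized books it was selected from
google_books_2019 = 40_000_000      # scanned titles reported in October 2019
world_titles_2010 = 130_000_000     # Google's 2010 estimate of distinct titles

# Assumption: the 2019 Ngram corpus is the same fraction of Google Books
# as the 2009 corpus was (about one third).
subset_ratio = ngram_books_2009 / digitized_books_2009
estimated_ngram_corpus_2019 = subset_ratio * google_books_2019

# Share of all titles if the 2010 world estimate is used as the denominator.
share_of_2010_estimate = estimated_ngram_corpus_2019 / world_titles_2010

# Total number of titles at which the share would be exactly 9%.
total_for_nine_percent = estimated_ngram_corpus_2019 / 0.09

print(f"Estimated 2019 Ngram corpus: {estimated_ngram_corpus_2019 / 1e6:.1f}M books")
print(f"Share of the 2010 estimate:  {share_of_2010_estimate:.1%}")
print(f"Total titles for a 9% share: {total_for_nine_percent / 1e6:.0f}M")
```

On these inputs the corpus estimate is about 13.3M books. Whether that is a bit under or a bit over a tenth of all books depends on which estimate of total titles is used, but either way it is a small fraction.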
- Thank you for correcting my mis-impressions. Dicklyon (talk) 16:43, 18 June 2025 (UTC)
- It's interesting that the main problem that paper highlights is the over-representation of specialist sources, as opposed to being representative of general usage. Not surprising, and likely another reason why capitalization is exaggerated therein. Dicklyon (talk) 16:48, 18 June 2025 (UTC)
- Any time you deal with huge data sets over long periods of time, randomness comes in, one way or another. But based on your response below, I think you know that now. But no worries... Yesterday, all my dreams... (talk) 22:03, 18 June 2025 (UTC)
- My real point was just that the unknowns and biases are not random, we just don't know exactly what they are. But if you want to think of them as random, that's fine too. Dicklyon (talk) 22:48, 18 June 2025 (UTC)
- No. It still has some value in determining the common name. We shouldn't only be using Ngrams, though. Mellk (talk) 20:55, 17 June 2025 (UTC)
- No way. Maybe a set of best practices on how it is used might be reasonable, but to exclude evidence entirely because of what seems like a dispute between the proposer and a single user, Dicklyon, on how it should be used is excessive to an extreme. If how it's used is believed to be incorrect, explain what the errors are and present better evidence. Alpha3031 (t • c) 07:42, 19 June 2025 (UTC)
explain what the errors are and present better evidence.
- WP:NGRAM (work in progress, links need to be made into permanent ones, etc.) TurboSuperA+(connect) 08:32, 19 June 2025 (UTC)
- You're clearly free to write whatever essay you want, but I think it would be much more useful to create an essay that addresses both the problems and what knowledgeable users of this tool have learned over time about how to improve the reliability of results (e.g., above, @SMcCandlish said "distinguishing running-text usage from usage in title-case headlines, captions, etc., is avoided by using a series of carefully selected searches. Everyone familiar with these tools already understands this," but didn't elaborate on what such a series of carefully selected searches looks like). I also suggest that you reread this entire thread to see what other relevant points have been made. FactOrOpinion (talk) 14:46, 19 June 2025 (UTC)
- It is just the beginning, note where I say "work in progress" meaning "not finished". It does include comments from this thread (and will continue to include them as they come in). I only linked it because the editor asked for evidence of problems/unreliability, and it was a convenient way to provide it. We can talk about the essay on my talk page or anywhere else other than this RfC, so we don't derail it. TurboSuperA+(connect) 16:05, 19 June 2025 (UTC)
- re: "the editor asked for evidence of problems/unreliability", I meant you should point out the specific issue at each specific RM, or find a better indicator. Alpha3031 (t • c) 08:48, 21 June 2025 (UTC)
- No The question of what we should cap on WP is essentially a statistical question, since it would be impossible/unreasonable to identify all sources using a particular term in prose. Consequently we look to identifying a sample of sources to determine the proportion of usage. Note that the proportion of usage is the key issue - it is not a source war determined by which side can produce the most sources. The virtue of ngrams is that they draw on a large sample which is free from observer bias originating with the WP editors interrogating the sample. However, even ngrams may not have a sufficiently large number of uses of a particular term (search string) to reasonably address the question, and a bias toward technical/academic sources does exist. Where I have seen objections to the use of ngrams in discussions, this largely occurs where editors supporting capitalisation see the results as being at odds with their perceptions of what the results should be. Yes, there are some imperfections with ngrams as a tool as identified herein but these imperfections tend to favour capitalisation. Understanding the strengths and limitations of a tool is the issue here. I believe that this RfC was premature and that the above discussion (#It is time we talked about Google Ngram) still had some way to go in providing useful comments on how ngrams can be effectively used and when not. Interrogating any sample set of sources for a particular usage on a time basis (eg by year) will show evidence that varies each year - sometimes greatly. This is what we see in any ngram. Ngrams provide a smoothing function to help deal with this randomness. It is not a fault of ngrams that the data can have a significant deal of randomness - it is the nature of the data (the beast). To the modified RfC question, there are reasonable instances where ngram evidence alone would be sufficient to initiate an RM, though it is good practice to confirm ngram results, at least against google book results.
The utility of ngrams in capitalisation discussions (RMs) is a matter to be determined on a case by case basis. Cinderella157 (talk) 00:33, 23 June 2025 (UTC)
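The two ideas in the comment above, proportion of usage and smoothing of year-to-year variation, can be illustrated with a short Python sketch. It assumes per-year counts for a capitalized and a lowercase form are already in hand. The function names and sample numbers are hypothetical, and the centred moving average is a simplified stand-in for the viewer's smoothing setting, not a reproduction of Google's implementation.

```python
def capitalized_share(upper_counts, lower_counts):
    """Per-year share of the capitalized form; None where there are no hits."""
    shares = []
    for upper, lower in zip(upper_counts, lower_counts):
        total = upper + lower
        shares.append(upper / total if total else None)
    return shares


def smooth(values, window):
    """Centred moving average over `window` years on each side.

    Years with no data (None) are skipped, and the window is truncated
    at the start and end of the series.
    """
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window)
        end = min(len(values), i + window + 1)
        neighbours = [v for v in values[start:end] if v is not None]
        smoothed.append(sum(neighbours) / len(neighbours) if neighbours else None)
    return smoothed


# Hypothetical counts for five consecutive years (not real Ngram data).
upper = [3, 9, 2, 8, 4]
lower = [1, 1, 2, 2, 4]

raw = capitalized_share(upper, lower)
print([round(s, 2) for s in raw])
print([round(s, 2) for s in smooth(raw, window=1)])
```

The raw shares jump around from year to year while the smoothed series moves more gradually. Smoothing only averages whatever is in the sample, so it does nothing about the concerns raised elsewhere in this thread about which books are in the corpus.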
Note that the proportion of usage is the key issue
- The issue is that Google ngrams do not provide a reliable indication of the proportion of usage because of all the deficiencies identified in this thread and elsewhere. Thryduulf (talk) 09:39, 23 June 2025 (UTC)
- Yes, it has deficiencies. However, we have nothing better, and with some work we can work around many of those deficiencies. Everything has flaws. If anyone has a better idea of how to evaluate a large corpus of published data for capitalization, I'm open to it, but let's not remove the best we have available. SchreiberBike | ⌨ 11:13, 23 June 2025 (UTC)
- It might be the best available, but that doesn't mean it is good enough. Its deficiencies mean that discussions regarding capitalisation are the area where its results are the least reliable at representing what we are attempting to measure (the prevalence of different capitalisations in the running prose of reliable sources using the term in the same context(s) as the article). In every other aspect of determining Wikipedia content we restrict ourselves to using sources that are both reliable and relevant, even if there are sources that are significantly easier to access that don't meet those requirements. Titles of articles should be no different. Thryduulf (talk) 11:25, 23 June 2025 (UTC)
- We have the sources cited in the article, which are all (or should be) RS. Meanwhile there is no guarantee Google Ngram is searching through reputable sources only. Not to mention the results can be gamed: "A single, prolific author is thereby able to noticeably insert new phrases into the Google Books lexicon, whether the author is widely read or not."[3] Which is made worse by the fact that anyone can "publish" a book, buy an ISBN and have it added to Google Library/ngram results. TurboSuperA+(connect) 11:28, 23 June 2025 (UTC)
- "with some work we can work around many of those deficiencies" How? If it's going to be used, it would be helpful for you to list the problems for which you think there's a workaround and what that workaround is. FactOrOpinion (talk) 12:18, 23 June 2025 (UTC)
- Search on this page for "Tips for using n-grams". SchreiberBike | ⌨ 00:04, 24 June 2025 (UTC)
- OK, but none of those address many of the issues that have been raised here, the most significant of which is that the corpus is not limited to RSs, and there is no way to know whether the distribution in RSs would be different than in the corpus as a whole, nor how representative the corpus is of all books (and of course, our RSs are sometimes texts like newspapers that aren't part of the corpus at all). FactOrOpinion (talk) 01:00, 24 June 2025 (UTC)
- This is an RfC about no longer allowing the use of Google Ngrams in capitalization decisions. Google Ngrams have problems but so does every alternative. The vast majority of the time there is general agreement about capitalization, but sometimes we need to look at the evidence and discuss. We should look at all the alternatives and come to a consensus. No single source of information is decisive, but we can look at and weigh sources based on their value. I feel frustrated that people are searching for some perfect solution and disregarding a process that has worked. SchreiberBike | ⌨ 01:35, 24 June 2025 (UTC)
- I'm aware of what it's an RfC for, and I'm not looking for a perfect solution. I'm simply noting that MOS:CAPS clearly states "only words and phrases that are consistently capitalized in a substantial majority of independent, reliable sources are capitalized in Wikipedia." Google Ngrams does not distinguish RSs from non-RSs. There isn't even any way to assess what % of the corpus are RSs. FactOrOpinion (talk) 02:00, 24 June 2025 (UTC)
- We can indeed look at and weigh sources based on their value, however we cannot do that using google Ngrams because there is no way of knowing which sources are in its corpus and thus no way of knowing what their value is. Also, the perfect solution not existing is not a reason to use solutions that are bad. Thryduulf (talk) 02:07, 24 June 2025 (UTC)
- Exactly this. Intothatdarkness 12:53, 27 June 2025 (UTC)
- Add When doing a case-insensitive search on ngrams, the casing in the search term has no effect on the search result (compare here and here). The assertion that it does is incorrect. The capitalisation of moon is a conundrum because different people have different views on where or when it should be capitalised. Ngrams do have the capacity to be tailored to capture/represent different contexts. If moon (the earth's moon) is "widely known as a proper name through the English language", why do we see near equal capitalisation here, when, as a rule, proper names are always capped? We can compare the ngram with sources (eg here), which indicate a similar result for capitalising moon. Even looking at the sources used in the article on the Moon, the reference section shows that moon is not consistently capitalised. Perhaps the issue is summarised by this quote, "You really used an Ngram to prove that the "Moon" should be lower-cased because the majority of people are ignorant?"[4] - ie I know better and the sources are wrong. Opposing the use of ngrams because they don't give a result that one wants or expects, or because they give an answer one thinks is wrong, is not a good reason to oppose their use. Cinderella157 (talk) 11:57, 25 June 2025 (UTC)
Yes for objects. I think this attempt to lower-case the Moon can show the weakness of Ngram use.
That was a while ago but Dick Lyon just posted this about it (deliberately no diff): "...things like this data from sources (which is very unlike what we see in an astronomical context). I prefer to stick to what sources tell us..."
The first search, with Armstrong's proper name uppercase, will put the search in a specific context with one person doing one action. The second search, replacing the upper-case proper name with a lower-case "earth", puts the search into a much wider context with different results.
Since we are talking about the Moon, which is widely known as a proper name through the English language, why should only the narrow search be "the sources"? Because of these variations I suggest that Ngrams aren't reliable for proper names of objects. No position on actions and probably useful for Commonname.
Why are we talking about potential sources which can't possibly be checked, instead of sources actually used in the article, which can? Thank you. Sammy D III (talk) 18:44, 23 June 2025 (UTC)
- Armstrong was just an example. You can see lots more: here, and decide which ones you'd like to focus on, or reduce the context like here or here. I believe I had brought that up relative to an edit I had made about landing on the moon. All the n-grams show is that most authors don't capitalize moon in that context. Dicklyon (talk) 03:16, 24 June 2025 (UTC)
- I owe you a sincere apology. I got it wrong. I thought you had tried to move the Moon, not just Commonname it in the text. I have struck it out and I mean it. Not an olive branch, just the right thing to do. Sammy D III (talk) 12:50, 25 June 2025 (UTC)
- Yes, I have now succeeded in unilaterally moving it to moon. I hope that's OK. It's not as if we didn't discuss it.[Joke] Dicklyon (talk) 22:27, 25 June 2025 (UTC)
- Single-handedly moving the moon, now that's quite an accomplishment! Gawaon (talk) 15:22, 29 June 2025 (UTC)
- With even more searches and results I think you have reinforced my point: "Because of these variations I suggest that Ngrams aren't reliable for proper names of objects". Sammy D III (talk) 03:26, 24 June 2025 (UTC)
- I'm puzzled re what you're trying to say. Nobody is claiming that Ngram are "reliable for proper names of objects", whatever that means. They just show usage statistics. Dicklyon (talk) 04:01, 24 June 2025 (UTC)
- No. Google Ngrams are important data. It should be noted prominently that their proper use requires skill. SmokeyJoe (talk) 13:48, 17 July 2025 (UTC)
His father is Black and his mother is white.
At the end of Tyrese Haliburton#Early life and family, it says "His father is Black and his mother is white." Is this mixed capitalization of races appropriate? If not, what is the consensus on how to treat them? FWIW, the cited source uses that exact style, but evidently this appears contrary to MOS:RACECAPS. Left guide (talk) 06:07, 21 June 2025 (UTC)
- I think the accepted WP practice is to apply it consistently on a given page, whatever style is chosen. —Bagumba (talk) 06:23, 21 June 2025 (UTC)
- Yes, a note in MOS:RACECAPS says "The status quo practice had been that either style was permissible, and this proposal did not overturn that". I too would interpret that as meaning that both capitalized style and lower-case style is permissible, as long as it's used consistently on any given page. Mixed usage is not accepted – the proposal to capitalize only "Black" failed to reach consensus. Gawaon (talk) 07:14, 21 June 2025 (UTC)
- I don't understand that answer. Shouldn't it be No if either capitalized or lowercase is acceptable, and mixed like this is not? Dicklyon (talk) 22:29, 25 June 2025 (UTC)
- I think Gawaon's "yes" is agreeing with Bagumba, not responding to the question in the second sentence of the original post. --Trovatore (talk) 05:03, 26 June 2025 (UTC)
- Yes, I expressed my agreement with Bagumba. Gawaon (talk) 05:15, 26 June 2025 (UTC)
- Yes, it is fine, and not contrary to MOS:RACECAPS. The upshot of the RfCs was that both upper or lower case is acceptable. Hawkeye7 (discuss) 23:51, 25 June 2025 (UTC)
- Hmm, taken at face value that would also seem to allow "His father is black and his mother is White.", which I suspect would elicit objections. --Trovatore (talk) 00:53, 26 June 2025 (UTC)
- I remain confused. Does "both upper or lower case is acceptable" mean it's OK to do them differently, like Trovatore illustrates? Or not? Or is Black and white OK as in some publications' modernized styles, but black and White not? I'm not saying it's an easy question, just that I don't understand these answers. Dicklyon (talk) 05:01, 26 June 2025 (UTC)
- My interpretation of "either style [is] permissible" in MOS:RACECAPS is that consistency is still required – if something is not consistent, it's not a style, and hence not permissible. So it's not OK to lower-case "white" in one sentence and capitalize it in the next (when both apply to persons), since that's not consistent. Neither is it OK to capitalize "White" and lowercase "black" since that's not consistent. Nor the other way around. Consistency is an implicit requirement here (per our general rules), so both terms must be treated the same. Gawaon (talk) 05:18, 26 June 2025 (UTC)
- Consistency is not required. Although we did not adopt the American practice of capitalising Black only, editors are free to do so. Hawkeye7 (discuss) 05:27, 26 June 2025 (UTC)
- Are they free to capitalize White only? --Trovatore (talk) 05:31, 26 June 2025 (UTC)
- If they want to. I guess you are wondering what the content creators will do with so much editorial freedom. Hawkeye7 (discuss) 08:15, 26 June 2025 (UTC)
- Rereading the note in MOS:RACECAPS, I see that "mixed use" (i.e., capitalized "Black" and lower-case "white") is indeed allowed as well. I stand corrected! Gawaon (talk) 06:08, 26 June 2025 (UTC)
- Should we add a line to the main text clarifying that mixed use is permissible when editors determine this is the appropriate style for a particular article? The added detail in the note is useful but the top line guidance is easily missed. --MYCETEAE 🍄🟫—talk 20:22, 28 June 2025 (UTC)
- Why not? It sure would make things clearer. Gawaon (talk) 21:02, 28 June 2025 (UTC)
Done Special:Diff/1299365122. I tried to stick very closely to the wording in the note to reflect that this is a mere clarification and not a change but additional wordsmithing may be in order. --MYCETEAE 🍄🟫—talk 00:26, 8 July 2025 (UTC)
- That is not how I read the results of that discussion. I read it as either Black and White or black and white is acceptable, but that Black and white or black and White is not. --User:Khajidha (talk) (contributions) 15:23, 23 July 2025 (UTC)
- It's what the note in that section has been saying for a long time, however: "with no consensus to implement a rule requiring either or against mixed use where editors at a particular article believe it's appropriate" (emphasis added). Since there was no consensus against mixed use (Black, but white), it's allowed. Gawaon (talk) 17:11, 23 July 2025 (UTC)
Arbitration notice
There is an arbitration case involving this topic at Wikipedia:Arbitration/Requests/Case#Capitalization Disputes. Left guide (talk) 20:52, 26 June 2025 (UTC)
Clarifying MOS:DOCTCAPS
Doctrines, ideologies, philosophies, theologies, theories, movements, methods, processes, systems or schools of thought and practice, and fields of academic study or professional practice are not capitalized, unless the name derives from a proper name.
I think perhaps this section could do with some more clarity or more examples of what is considered a movement, method, process, etc.
I've been reconsidering capitalization edits I made on Teach the Controversy, but I'm not sure how to interpret this policy in relation to the article/phrase. I consulted previous discussions about this policy, but in most instances it was fairly straightforward and/or didn't apply to this case.
I was confused by some instances in article titles/article body, but perhaps they are cases consistently capitalized in a substantial majority of independent, reliable sources:
Third World socialism, Third-Worldism, third-world?, Non-Aligned Movement, Manifest destiny, Global War Party.
Considering other Discovery Institute campaigns, there's also "Critical Analysis of Evolution", "Free Speech on Evolution", "Academic freedom campaign" (but Academic Freedom bills). Then there's Intelligent Design and Wedge Strategy which are often capitalized but have lowercase article titles. I feel the capitalization is helpful in the case of Critical Analysis of Evolution because, if lowercase, the words would seem to mean something else. With Teach the Controversy, it's short enough that using "teach the controversy" strategy doesn't feel too repetitive, but always enclosing in quotation marks feels unnecessary. But without any distinction, it could get reinterpreted as teach the [controversy strategy].
In some of the articles I linked, the capitalization of the article title wouldn't match all instances in the body. Is this because MOS:DOCTCAPS is more important for article names and consistency isn't necessary unless it's a problem? Or are these various inconsistencies themselves violations of MOS:DOCTCAPS that just haven't been corrected? – Kilvin • 👾 03:42, 18 July 2025 (UTC)
- What you appear to be describing are terms of art or "a term that has a specialized meaning in a particular field or profession"[5]. These fall to MOS:SIGNIFCAPS (as well as DOCTCAPS) - "Introduction of a term of art may be wikilinked and, optionally, given in non-emphasis italics on first occurrence" - ie once it is identified as a term of art by the use of italics, italics need not be used thereafter. Cinderella157 (talk) 08:01, 18 July 2025 (UTC)
This was cited for moving "List of assets owned by The Coca-Cola Company" (regardless of the company's capitalization of "the"). The article "The Pokémon Company", however, consistently capitalizes "the". Should both use uncapitalized "the" per this guideline? J3133 (talk) 07:10, 21 July 2025 (UTC)
- Yes. Gawaon (talk) 07:34, 21 July 2025 (UTC)
antisemitism -v- anti-Semitism
The section "Peoples and their languages" states, "antisemitism, which is preferred in wikivoice per the consensus of scholars and historians of antisemitism" – but "consensus" is not true: there is a small majority, but not a consensus (i.e. "general agreement among a group of people"). The Internet Archive lists 2,740 texts with "antisemitism" in the title and 2,114 texts with "anti-Semitism" in the title. I note that The Oxford English Dictionary and Merriam-Webster give only one version of the term: "anti-Semitism". It seems rather an odd formation, given that German has had "Antisemitismus" since the 1870s and French has had "antisémitisme" almost as long, but I do not think Wikipedia should fly in the face of the two most authoritative English and American lexical sources. Was the current wording of the MoS discussed, and if so, where can one find the discussion? Tim riley talk 07:05, 1 August 2025 (UTC)
- The last discussion was in March, and Zanahary made the guideline change. Just skimming here, but it seems that the guideline would be stronger if it noted Wikipedia consensus to use "antisemitism" without making a claim about scholars and historians. Firefangledfeathers (talk / contribs) 13:31, 1 August 2025 (UTC)
- No objection ꧁Zanahary꧂ 16:23, 1 August 2025 (UTC)
- My thanks, Firefangledfeathers. I think the decision you refer to is ill-informed but I shan't raise the matter again. Tim riley talk 19:09, 1 August 2025 (UTC)
Capitalization "black", "white" and "colo[u]red"
There is no single universal rule for capitalizing "black" and "white" in relation to people, although capitalization is more common in some American style guides. The question can be nuanced: for example, according to The Guardian, Minna Salami, who is Finnish Nigerian, dislikes capitalizing "black" in reference to people because she opposes the imposition of any single rule regarding how black people should define themselves. In South Africa, the term "colored" should not be capitalized, according to the Editorial Style Guide published by the South African government (https://www.gcis.gov.za/sites/default/files/docs/resourcecentre/guidelines/Editorial_Style_Guide.pdf). The Oxford dictionary stated that the capitalization of these terms is a stylistic choice rather than a strict rule. The term "African American" should not be hyphenated. MarcoToa1 (talk) 09:43, 8 October 2025 (UTC)
- Capitalizing "white" is optional, since it hasn't developed a widespread, accepted cultural identity and community to the same extent. Some also capitalize both "white" and "black", as in APA style. MarcoToa1 (talk) 09:45, 8 October 2025 (UTC)
- It's best to ask the writer or author's preference about the capitalization. MarcoToa1 (talk) 09:47, 8 October 2025 (UTC)
- You can compare "Black people, black people, White people, white people, Coloured people, coloured people" in Ngram; the search must be case sensitive. MarcoToa1 (talk) 09:49, 8 October 2025 (UTC)
- You can also compare other style guides. MarcoToa1 (talk) 09:50, 8 October 2025 (UTC)
- It should be "compare with". MarcoToa1 (talk) 09:51, 8 October 2025 (UTC)
- This is covered in MOS:RACECAPS. Gawaon (talk) 10:08, 8 October 2025 (UTC)
- Thanks. MarcoToa1 (talk) 12:05, 8 October 2025 (UTC)
Capitalizing the word "MXDWN"
I've used sources from mxdwn.com, which is a reliable source. When I cite this source, I write it as "MXDWN" (all caps) in |website=, thinking it was an initialism. However, I could not find any evidence that "MXDWN" is actually an initialism. On the contrary, the site itself uses the lowercase form "mxdwn" on its About Us page. Should I therefore write it as "Mxdwn" instead? Camilasdandelions (talk!) 23:55, 26 October 2025 (UTC)
- Yes, that makes sense. Gawaon (talk) 08:34, 27 October 2025 (UTC)

