User talk:Phlsph7/AI spelling and grammar suggestions for vital articles

From Wikipedia, the free encyclopedia

Spelling variants

Hi, looking at User:Phlsph7/AI_spelling_and_grammar_suggestions_for_vital_articles#Morya_Gosavi, re "The word "travelled" is the British English spelling. Since the text uses American English elsewhere, it should be "traveled."" This is a problematic edit. Wikipedia supports multiple versions of English and aims to be consistent at the article level, not the project level. However, the test for moving an article to American English spelling is not the presence of some American English somewhere else in the article, not least because an article in, say, Indian English might include a quote in American English. Also, the code is already encouraging the replacement of realised with the American realized without even explaining that this is part of an americanisation of the site. My suggestion is that you concentrate on using the AI to identify errors that are errors regardless of which variant of English is being used. ϢereSpielChequers 12:49, 3 February 2025 (UTC)

Re-emerged v reemerged is another homogenisation; Merriam-Webster accepts both. My suggestion is that at this stage you use AI to find additional typos that could go into Wikipedia_talk:AutoWikiBrowser/Typos. That requires no or very, very few false positives and at least a dozen non-recent examples (if all your examples are from the last few months, the typo is likely already in AWB and it is just the normal delay of getting all articles patrolled by AWB). ϢereSpielChequers 13:08, 3 February 2025 (UTC)
Hello WereSpielChequers, thanks for taking a look and for the helpful ideas! I added new instructions to ignore English variants and alternative spelling errors. Unfortunately, the AI model does not always follow instructions, so it may occasionally still bring up these points, but hopefully less frequently now.
Using AI to find common typos for AWB is an interesting idea, but the approach would probably be quite different from what the script is currently doing. I'll experiment a little in this direction. Phlsph7 (talk) 15:06, 3 February 2025 (UTC)
I suspect that the AI has been trained more on American English than Indian English, and this would need careful consideration by anyone using the typo-fixing parts of this. BTW, grammar changes could be valid for all I know. My punctuation skills are a bit rubbish, and many of the changes it suggests remind me of corrections I have seen others do. But I'll leave it to others to check whether the AI is correct in that area. Re using this to find new typos and incorrect word combinations that AWB doesn't pick up, could we try running this on all articles not edited in the last four months? That should find a bunch of problems that we aren't currently correcting and that we could feed into AWB and other tools. ϢereSpielChequers 10:30, 4 February 2025 (UTC)
I agree, the fixation on American English is an issue for this application of the AI model.
Do you have a specific list of articles in mind? Otherwise, I could run it on the first 50 articles of https://en.wikipedia.org/w/index.php?title=Special:AncientPages and see what it turns up. In its current form, running the script on a huge number of articles is not feasible. For the vital articles in this list, it cost on average about 2-3 cents (US) per article for the AI model and took about 30 s. It's less for shorter articles, and there would be ways to bring these numbers down by optimizing the script, but this would require some work. Phlsph7 (talk) 16:11, 4 February 2025 (UTC)
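To make the feasibility point concrete, here is a rough back-of-the-envelope sketch using only the figures quoted above (2-3 US cents and about 30 s per article). The function name, the sequential-run assumption, and the 100,000-article batch are illustrative assumptions, not part of the actual script.

```python
def estimate_run(num_articles, cents_per_article=(2, 3), seconds_per_article=30):
    """Estimate cost range (USD) and duration (hours) for a sequential run.

    The defaults are the rough per-article averages quoted in the discussion;
    shorter articles would cost less, so treat the result as an upper-ish guess.
    """
    low_cents, high_cents = cents_per_article
    cost_usd = (num_articles * low_cents / 100, num_articles * high_cents / 100)
    hours = num_articles * seconds_per_article / 3600
    return cost_usd, hours


if __name__ == "__main__":
    # 50 is the batch size proposed above; 100_000 is a hypothetical large run.
    for n in (50, 100_000):
        (low, high), hours = estimate_run(n)
        print(f"{n} articles: ${low:.2f}-${high:.2f}, about {hours:.1f} h")
```

Under these assumptions a 50-article test is cheap, while a run over a six-figure number of articles would take weeks of sequential processing unless the script were optimized or parallelized.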

in not known

One of the examples from User:Phlsph7/AI_spelling_and_grammar_suggestions_for_vital_articles was replacing "in not known". I've searched and found 17 matches on Wikipedia and 1 on Wikibooks. Two I have left as quotations; 16 I have corrected. Now raised at Wikipedia_talk:AutoWikiBrowser/Typos#in_not_known. I do fix quotations when they are clearly translation errors or when I can check the source and the typo is just in Wikipedia. But otherwise it is a bit of a grey area and not appropriate for uncontentious minor edits. ϢereSpielChequers 14:21, 3 February 2025 (UTC)

seize - cease

Re "and he felt his heart stop beating and his breathing seize." and the AI comment "Explanation: The word "seize" is incorrect in this context. The correct word is "cease," which means to stop. "Seize" means to take hold of suddenly and forcibly, which does not fit the context of breathing stopping." OK, so the AI doesn't know that engines can seize up? ϢereSpielChequers 19:39, 3 February 2025 (UTC)

have born in

One of the examples from User:Phlsph7/AI_spelling_and_grammar_suggestions_for_vital_articles was adding "been" to "have born in". I've searched and found 22 matches on Wikipedia. One I have left as a quotation; the rest I corrected. Now raised at Wikipedia_talk:AutoWikiBrowser/Typos#have_born_in. I've looked at the broader "have born", but the matches are a mix of correct uses and errors for both "have borne" and "have been born". So not appropriate for AWB but I have put in [what is call ϢereSpielChequers 14:21, 3 February 2025 (UTC)

what is calls

There were only four of these, so too few for an AWB rule. I've fixed them and looked at "is calls" generally; it would need a safe phrase for IS calls if we were to regularly patrol it. ϢereSpielChequers 08:27, 4 February 2025 (UTC)
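The "safe phrase" idea can be sketched with a case-sensitive pattern. This is only an illustration in plain Python regex, not actual AWB rule syntax; the pattern and the function name are assumptions. It reports lowercase "is calls" for human review and leaves the upper-case "IS calls" alone.

```python
import re

# Python regexes are case-sensitive by default, so "[Ii]s" matches "is" and
# "Is" but never the all-caps abbreviation "IS".
IS_CALLS = re.compile(r"\b[Ii]s calls\b")


def find_is_calls(text):
    """Return the matched phrases so that an editor can review each hit.

    Nothing is replaced automatically: the right fix ("is called",
    "calls", ...) depends on the sentence.
    """
    return [m.group(0) for m in IS_CALLS.finditer(text)]


if __name__ == "__main__":
    sample = "This is what is calls a typo. The IS calls were logged."
    print(find_is_calls(sample))
```

Keeping the check case-sensitive is the whole point here: a case-insensitive search would sweep up the abbreviation and produce false positives.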

AI brainstormed mistakes for the AWB typo list

@WereSpielChequers: I experimented a little, asking AI models to come up with mistakes for the AWB typo list. Most items are probably not useful, but maybe some are.

Typo list
  • never the less -> nevertheless
  • had went -> had gone
  • a important -> an important
  • could of -> could have
  • should of -> should have
  • would of -> would have
  • must of -> must have
  • might of -> might have
  • its know -> it is known
  • their is -> there is
  • they’re own -> their own
  • your right -> you’re right
  • its’ value -> its value
  • the affect -> the effect
  • a affect -> an effect
  • to loose -> to lose
  • a loose -> a loss
  • less then -> less than
  • more then -> more than
  • different then -> different than
  • in vein -> in vain
  • a entire -> an entire
  • a historic -> an historic
  • a hour -> an hour
  • a unique -> an unique
  • a European -> an European
  • a one -> an one
  • a user -> an user
  • a universe -> an universe
  • a umbrella -> an umbrella
  • a honest -> an honest
  • a honor -> an honor
  • a heir -> an heir
  • a herb -> an herb
  • a hotel -> an hotel
  • a uniform -> an uniform
  • a university -> an university
  • a usual -> an usual
  • a useful -> an useful
  • a union -> an union
  • a unit -> an unit
  • a utensil -> an utensil
  • a UFO -> an UFO
  • a US -> an US
  • a UK -> an UK
  • a URL -> an URL
  • a HTML -> an HTML
  • a HTTP -> an HTTP
  • a FAQ -> an FAQ
  • a MBA -> an MBA
  • a NBA -> an NBA
  • a NFL -> an NFL
  • a NASA -> an NASA
  • a NATO -> an NATO
  • a UNESCO -> an UNESCO
  • a UNICEF -> an UNICEF
  • a iPhone -> an iPhone
  • a iPad -> an iPad
  • a iPod -> an iPod
  • a eBook -> an eBook
  • a email -> an email
  • a URL -> an URL
  • a USB -> an USB
  • a FAQ -> an FAQ
  • a LCD -> an LCD
  • a LED -> an LED
  • a MRI -> an MRI
  • a NATO -> an NATO
  • a UNESCO -> an UNESCO
  • a UNICEF -> an UNICEF
  • a iPhone -> an iPhone
  • a iPad -> an iPad
  • a iPod -> an iPod
  • a eBook -> an eBook
  • a email -> an email
  • was wrote -> was written
  • did went -> did go
  • is builded -> is built
  • are grew -> have grown
  • has ate -> has eaten
  • could of -> could have
  • should of -> should have
  • would of -> would have
  • its affect -> its effect
  • less people -> fewer people
  • most unique -> unique
  • an historic -> a historic
  • different then -> different than
  • amount of peoples -> number of people
  • is grew -> has grown
  • the data suggests -> the data suggest
  • more better -> better
  • real life events -> real-life events
  • could care less -> couldn’t care less
  • irregardless of -> regardless of
  • past history -> history
  • advance planning -> planning
  • is comprised of -> comprises
  • for all intensive purposes -> for all intents and purposes
  • alot of -> a lot of
  • nucular energy -> nuclear energy
  • centre of attention -> center of attention
  • prolly due to -> probably due to
  • supposably true -> supposedly true
  • definate amount -> definite amount
  • expecially common -> especially common
  • bigger then -> bigger than
  • further then -> further than
  • more unique -> unique
  • mis-understood concept -> misunderstood concept
  • non existent -> nonexistent
  • an one-off -> a one-off
  • everyday issues -> daily issues
  • try and -> try to
  • should of been -> should have been
  • would of been -> would have been
  • ain’t been -> hasn’t been
  • ain’t got -> haven’t got
  • all ready -> already
  • its self -> itself
  • every one -> everyone
  • him self -> himself
  • them self -> themselves
  • yourselfs -> yourselves
  • alot time -> a lot of time
  • much expenses -> many expenses
  • defiantly mistaken -> definitely mistaken
  • beleive in -> believe in
  • publically funded -> publicly funded
  • seperate issues -> separate issues
  • embarassingly poor -> embarrassingly poor
  • accomodate needs -> accommodate needs
  • occured suddenly -> occurred suddenly
  • recieve information -> receive information
  • succesful outcome -> successful outcome
  • untill late -> until late
  • writen record -> written record
  • analize data -> analyze data
  • consciencious worker -> conscientious worker
  • experiance gained -> experience gained
  • maintainance costs -> maintenance costs
  • independant study -> independent study
  • goverment agency -> government agency
  • suprise element -> surprise element
  • occassionally occurring -> occasionally occurring
  • enviromental impact -> environmental impact
  • calender year -> calendar year
  • equivilant results -> equivalent results
  • refered frequently -> referred frequently
  • indispensible tool -> indispensable tool
  • preceeding events -> preceding events
  • begining stages -> beginning stages
  • particualrly important -> particularly important
  • pronounciation issues -> pronunciation issues
  • adquate resources -> adequate resources
  • definately true -> definitely true
  • unforseen problems -> unforeseen problems
  • supprise theory -> surprise theory
  • arguement points -> argument points
  • dissapear quickly -> disappear quickly
  • humerous remark -> humorous remark
  • maintainence schedule -> maintenance schedule
  • occuring events -> occurring events
  • realy sure -> really sure
  • wich option -> which option
  • accross the board -> across the board
  • seperate lines -> separate lines
  • referrence material -> reference material
  • ocassionally seen -> occasionally seen
  • perminantly fixed -> permanently fixed
  • in not known -> is not known
  • flowing plants -> flowering plants
  • what is calls -> what is called
  • have born in -> have been born in
  • had went -> had gone
  • between you and I -> between you and me
  • less people -> fewer people
  • could of been -> could have been
  • its a fact -> it's a fact
  • affect the change -> effect the change
  • should of done -> should have done
  • each of them are -> each of them is
  • a historic event -> an historic event
  • in regards to -> in regard to
  • based off of -> based on
  • different than -> different from
  • comprised of -> composed of
  • as best as possible -> as well as possible
  • try and do -> try to do
  • more better -> much better
  • suppose to be -> supposed to be
  • could care less -> couldn't care less
  • less than ten items -> fewer than ten items
  • anyways -> anyway
  • alot of -> a lot of
  • each and everyone -> each and every one
  • for all intensive purposes -> for all intents and purposes
  • the criteria is -> the criteria are
  • the phenomena is -> the phenomenon is
  • should of went -> should have gone
  • a number of is -> a number of are
  • one in the same -> one and the same
  • for all practical purposes -> for all intents and purposes
  • hone in on -> home in on
  • in the meanwhile -> in the meantime
  • on accident -> by accident
  • I seen -> I saw
  • could of gone -> could have gone
  • less than a dozen -> fewer than a dozen
  • in the mist of -> in the midst of
  • the data is -> the data are
  • the media is -> the media are
  • none of them were -> none of them was
  • each of the team members are -> each of the team members is
  • the criteria is -> the criteria are
  • the bacteria is -> the bacteria are
  • the alumni is -> the alumni are
  • the phenomena are -> the phenomena is
  • the data was -> the data were
  • the media was -> the media were

Phlsph7 (talk) 13:27, 5 February 2025 (UTC)
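A list like the one above contains repeated entries and at least one suggestion that appears in both directions ("a historic" and "an historic"). As a minimal sketch of how such a list could be pre-screened before anyone reviews it by hand, the snippet below parses "wrong -> right" lines and reports exact duplicates and reversed pairs. The function names and the sample lines are illustrative assumptions; this is not part of the script discussed here.

```python
from collections import Counter


def parse_rules(lines):
    """Turn lines such as '  • could of -> could have' into (wrong, right) pairs."""
    rules = []
    for line in lines:
        if "->" not in line:
            continue  # skip headings, signatures and blank lines
        wrong, right = line.split("->", 1)
        rules.append((wrong.strip(" •\t"), right.strip()))
    return rules


def audit(rules):
    """Return (duplicates, reversed_pairs) found among the parsed rules."""
    counts = Counter(rules)
    duplicates = sorted(rule for rule, n in counts.items() if n > 1)
    unique = set(rules)
    # A pair is "reversed" when both x -> y and y -> x are suggested.
    reversed_pairs = sorted(
        (w, r) for (w, r) in unique if (r, w) in unique and w < r
    )
    return duplicates, reversed_pairs


if __name__ == "__main__":
    sample = [
        "  • a historic -> an historic",
        "  • an historic -> a historic",
        "  • could of -> could have",
        "  • could of -> could have",
    ]
    print(audit(parse_rules(sample)))
```

A screen like this only catches mechanical problems; whether a remaining suggestion is actually a safe typo rule still needs a human check.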

Thanks for working on that, but there are certainly flaws. ϢereSpielChequers 18:27, 6 February 2025 (UTC)
I could produce more such lists, but it would probably be a lot of fishing to find the few relevant cases hidden in the list of bad suggestions.
Have you checked the cases of "an" followed by a word starting with "u"? As far as I know, if the "u" is pronounced "you", the article should be "a" rather than "an". The AI list doesn't always get this right, but there seem to be various candidates with a few hits, like an URL and an UNESCO. Phlsph7 (talk) 09:34, 7 February 2025 (UTC)
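A minimal sketch of such a check, assuming a small hand-picked word list (pronunciation cannot be derived from spelling alone, so the list below is an illustrative assumption and far from complete). It flags "an" before words whose initial "u" is pronounced "you" and leaves cases like "an umbrella" alone.

```python
import re

# Hypothetical, hand-maintained list of words starting with a "you" sound.
YOU_SOUND_WORDS = [
    "URL", "UNESCO", "UNICEF", "UFO", "US", "UK", "USB",
    "user", "unique", "union", "unit", "university", "uniform",
    "useful", "usual", "utensil", "universe",
]

# Longest words first; the trailing \b prevents "unit" matching inside "united".
_ALTERNATIVES = "|".join(sorted(YOU_SOUND_WORDS, key=len, reverse=True))
AN_BEFORE_YOU_SOUND = re.compile(r"\b[Aa]n (?:" + _ALTERNATIVES + r")\b")


def find_an_before_you_sound(text):
    """Return phrases such as 'an URL' that probably should use 'a'."""
    return [m.group(0) for m in AN_BEFORE_YOU_SOUND.finditer(text)]


if __name__ == "__main__":
    sample = ("She opened an URL from an UNESCO page under an umbrella, "
              "then found a USB stick.")
    print(find_an_before_you_sound(sample))
```

The match is case-sensitive on purpose, so "an US" is flagged but "an us" is not. Hits inside quotations would still need to be skipped or checked manually.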

hand or hands

Re "Morya also took sanjeevan samadhi by burying himself alive in a tomb with a holy book in his hand." and the suggestion to change that to hands. I suspect the AI is looking at this in the context of typical use of these words, rather than the appropriate use of that word having read the sources for that article. In that context a lot of the AI suggestions will be wrong. ϢereSpielChequers 18:27, 6 February 2025 (UTC)

The script does not have access to the sources, so all its suggestions are based only on the text in the article. I removed the suggestion from the list; feel free to remove any inappropriate suggestions. Phlsph7 (talk) 09:23, 7 February 2025 (UTC)

"The year "2024" is likely a typographical error, as it is a future date."

See https://en.wikipedia.org/wiki/User:Phlsph7/AI_spelling_and_grammar_suggestions_for_vital_articles#Megacity. (2024 is not a future date) GenericUser24 (talk) 18:44, 10 February 2025 (UTC)

Thanks for taking a look, I removed the suggestion. Phlsph7 (talk) 09:42, 11 February 2025 (UTC)