Help talk:Translation

Creation of this Help page

This page was a redirect. I have created this page by splitting text out of the Wikipedia:Translation page. I did this because some users were confused by the "how to" advice in the "Wikipedia" namespace and thought it was a guideline. As the page had diverged from the guidance in WP:TFOLWP, this was confusing. Fixing the divergence and moving the "how to" into the "Help" namespace ought to end this confusion.

What I have done is split the Wikipedia:Translation page into two. I have left most of the text there, but have moved the "How to translate" section and the citations section into this page (Help:Translation) (Revision as of 20:44, 15 February 2021 of Wikipedia:Translation, Revision as of 20:42, 15 February 2021 of Help:Translation). -- PBS (talk) 21:46, 15 February 2021 (UTC)[reply]

Proposal to move the Expand language template to Talk pages

A discussion about moving the {{Expand language}} template (and its associated templates, {{Expand French}}, {{Expand Spanish}}, and so on) from article pages to Talk pages is taking place at Wikipedia:Templates for discussion/Log/2021 April 16#Template:Expand language. Your feedback would be appreciated. Mathglot (talk) 20:18, 17 April 2021 (UTC)[reply]

Style for translating isolated phrase

This article discusses translating an entire article, but it does not address the question of translating a single phrase within an English article. For example, a recent edit to Geometry changed Theorema Egregium (remarkable theorem) to Theorema Egregium ("remarkable theorem"). I have no idea which, if either, is correct, or whether there is a template that generates the approved rendering. Also, if there is an article that discusses this then there should be a hatnote template linking to it, e.g., {{about}}. Shmuel (Seymour J.) Metz Username:Chatul (talk) 12:58, 13 May 2021 (UTC)[reply]

Avoid machine translation

Considerable advances in machine translation have been made since the section "Avoid machine translation" was drafted. While careful checking and copy editing of the output is still essential, results often provide a good basis for a new article in English. Knowledge of the source language is of course a major advantage. (cc: TSventon, Rosiestep, SusunW, Dr. Blofeld) --Ipigott (talk) 10:18, 23 October 2023 (UTC)[reply]

Agreed, there should no longer be a major concern for some of the biggest Western languages. I don't have experience with Chinese and Japanese so can't say for those, but I would guess very good now too. I wish more people would make an effort to learn languages themselves, but not speaking the language absolutely shouldn't be a deterrent for translating. DeepL is of an extremely high standard in particular. ♦ Dr. Blofeld 12:25, 23 October 2023 (UTC)
Machine translation from Chinese is often still a complete joke full of embarrassing errors, or missing context and abbreviations (recently I read an article about Taipei where Taipei was abbreviated 北市. Google not-so-helpfully translated this as "Beijing"). "Don't use machine translations unless you know exactly what you are doing" is still a good message. —Kusma (talk) 12:34, 23 October 2023 (UTC)[reply]
One problem with machine translation is that it is too easy. If somebody uses DeepL or Google or whatever to translate a foreign language Wikipedia article without checking a single source, the output mimics a decent Wikipedia article but there has been zero source verification. I fear that we won't be able to control the unverified translators if we allow them to copypaste foreign Wikipedias at a large scale. Cleanup of such articles is often more work than writing them from scratch, and a lot less fun. Direct translate copies from foreign Wikipedias are also inferior to using browser translation on an interwiki link (where both the translation and the linked article are likely to improve over time; copying today's version is less likely to result in future improvement). —Kusma (talk) 12:39, 23 October 2023 (UTC)[reply]
Well, yes, you should never translate content from another Wikipedia without checking a source and avoid translating unsourced stuff. But we're talking about quality of translations. ♦ Dr. Blofeld 15:25, 23 October 2023 (UTC)[reply]
We can change "Machine translation almost always produces very low-quality results" to "Machine translation often produces low-quality results" if you like, but I oppose a removal of "an unedited machine translation, left as a Wikipedia article, is worse than nothing". We need to make it crystal clear that cut & paste machine translating from other Wikipedias is not wanted.
What is great is that machine translation allows people to access more sources, if done with skill (SusunW is a shining example here, especially because she is aware of her limitations and networks with native speakers when necessary). We should promote the use of MT for this, but I do not think we gain by allowing unlimited trans-copypasta. —Kusma (talk) 15:38, 23 October 2023 (UTC)[reply]
"Machine translation almost always produces very low-quality results" is highly inaccurate. "Machine translation may vary considerably in quality of translation from language to language, and caution must be exercised" would be more ideal I think.♦ Dr. Blofeld 15:45, 23 October 2023 (UTC)[reply]
Yes, something like that would work. The following sentences about what can go wrong still sometimes apply, though. —Kusma (talk) 16:21, 23 October 2023 (UTC)[reply]
I would support tweaks to the wording such as changing the heading to "Avoid unedited machine translations". I have put a notice on Wikipedia talk:Translation as that page has more watchers than this one. TSventon (talk) 13:03, 24 October 2023 (UTC)[reply]
My take is that machine translation is valuable but like anything should be used with caution and verified with multiple methods, just like one would do with any type of sourcing. If one doesn't know how to use it, it can be abused. Barring its use serves no purpose; however, if in doubt, ask an expert. SusunW (talk) 15:46, 23 October 2023 (UTC)[reply]
Agreed. The consensus in 2021 (that section was written before then) may have been that unedited/lightly-edited machine translations are worse than nothing, but that's changing for large languages, where translations need less and less work to clean up. There are still different style guidelines and sourcing standards, &c. But that's another issue. If we leave in claims about consensus we should include an "as of" date for the last time there was a consensus check. – SJ + 18:39, 15 December 2023 (UTC)[reply]
I agree, the quality is excellent between main European languages (but not others). I came to wonder whether, in some cases, the rule should be inverted: 'Avoid human translation'. Indeed, when I searched a topic and could not find it in my language (French), but in another one, I had the habit of creating a new page containing my translation, thinking it could help others not versed in the other language. I can see that these pages are often poorly maintained. For these languages, shouldn't we instead rely on machine translation, choose a reference page, and translate dynamically on request so that maintainers of all these languages see their efforts shared? But a first step is needed: currently, automatic translators don't know about varying Wikipedia conventions (personally, I think most of this variation could be standardized away). Pyschobbens (talk) 10:34, 12 May 2024 (UTC)
Pyschobbens, those are some interesting suggestions; however, I think that the problem of updating probably applies to all new articles rather than just translations. For controversial articles it is useful to have a choice of language versions which you can check for bias. TSventon (talk) 12:47, 12 May 2024 (UTC)

Improving guidance for newbies

I'd like to work on making this page more useful for new editors who want to start translating articles and aren't very familiar with Wikipedia editing yet. (WP:TRANSLATION can then hold most of the technical information on translation that is more useful to experienced editors.) I expect most of the people who watch this page are very familiar with Wikipedia editing, and aren't exactly that target audience, but just in case: does anyone remember what they found difficult when they started translating Wikipedia articles? What information would have helped you most? What do you wish someone had told you before you started?

I'll leave this thread open for a few days before doing anything in case anyone has concerns. After that, I plan to make posts inviting input on the various individual country wikiprojects in hope of turning up some more newbie translators. -- asilvering (talk) 17:15, 1 August 2024 (UTC)[reply]

Reorg without rewrite

I've gone through this page to figure out what we actually have here, and quickly discovered it had poor organization, epitomized by poor headings and section ordering. I've gone through it stepwise, changing section names, adding new subsection headings, dropping a couple that made no sense, moving like sections to make them adjacent, consolidating some of them, and adding {{Main}} and {{Further}} links throughout to get the reader quickly to more detailed information on vetted policy or guideline pages. With the exception of a major update to section § Attribution, which was both incorrect and incomplete, I've made almost no changes to running text on the page (diff).

I think this gives a clearer view of both the strengths as well as the weak points of this page, which should make it easier to analyze and figure out how to move forward with it (if at all). What it looks like to me, is a grab bag of selections from policy and guideline pages, not always those particularly relevant to translation, more than to any article. What really stands out to me, as someone who has done many translations, is what it does not say, which is a lot of things, including, for example, what happens with all the templates and wikilinks in the original, each of which could have a section on its own. Instead, there is a lot of pointless language covered better, and more accurately, in the actual P&G pages. It's as if the whole thing was written by someone who hasn't done translations, or does them so automatically, they forgot their own process.

Help:Your first article already provides an overview of all the most important policies and guidelines all on one page in an easily understandable format, does a much better job than this page at figuring out which ones to cover and how much, and we don't need another page like that that doesn't measure up. As far as this page goes, the whole thing could be thrown out and replaced with a link to H:YFA, with the sections on §§ Attribution, Citation Templates, and Tools tacked on at the end. Mathglot (talk) 23:58, 14 November 2024 (UTC)[reply]

Avoid machine translation, unless you know the source language

This section is a follow-up to the § Avoid machine translation section above that quiesced a year ago. I wanted to encourage the acquisition of data to back up some of our opinions on the subject, in order to help respond to what I believe are some misconceptions or misunderstandings raised there about the quality of machine translation, with a view to ultimately proposing an update to the guideline about MT once we have gathered enough information to back it up.

As someone who does translations from time to time at English Wikipedia and uses MT to save time typing, I am very well aware of the leaps and bounds in the quality of MT output compared to five years ago. It is even better than it was when the discussion above ended a year ago. However, it is very clear to me that MT continues to make occasional very serious errors, so serious that it flips the sense of a sentence on its head, turning truth into falsehood and vice versa. This is quite a low percentage of the time, but you never know when it is going to happen, and I am very strongly of the opinion that:

A Wikipedia editor should not use the output of machine translation (edited or not) unless they know the source language sufficiently to verify the translation.

Given my actual experience with MT in a few modern European languages, this seems so transparently obvious to me that it doesn't need to be stated. But I guess it does, given some of what I read in the discussion above. I am not talking about a translation that is awkward, inelegant, unprofessional, choppy, sounds like a translation, or that needs copyediting for better grammar, style or word choice; I am fine with a crappy English translation (human or bot) if it is comprehensible and faithful to the original on points of fact; monolingual copyeditors can fix up the dross. What I am worried about is MT that gets the facts wrong and turns verifiable statements into unverifiable, truth into fiction, a sentence in language S with a citation that proves it, into a sentence in English with the same citation that fails to verify it, or even proves the contrary. And the point is not how often this happens (infrequently), the point is that a monolingual editor is in no position to know when it does happen, or even to suspect when it does so they can ask for assistance.

And yes, catastrophically bad MT errors still happen in 2025, with Google, with DeepL, and with LLM chatbots, and unless an editor knows the source language at least well enough to know when it is wrong, adding MT output to a Wikipedia article with their username signed to the edit is folly, does a disservice to our readers, makes the encyclopedia worse, and should be prohibited. This is not a wall of shame, so I am not going to list the editors I have run into that have added articles to English Wikipedia translated from dozens of languages, but there are some who have hundreds of such translations. (If you need solid data about this, email me.) This is a serious problem that we should at least try to prevent from getting worse, as a tech-savvy newbie could generate these at bot speed.

My goal in this discussion is to raise this issue again with a view not only to welcome opinions but especially to encourage everyone to record examples of serious MT misfires. Opinions about MT quality are all very well, but the proof is in the data; imho, that is how we can militate for change. In the past, I have seen these topsy-turvy translations often enough that I would have a whole collection of them by now had I stopped to write them down, but I just clucked my tongue or shook my head, laughed, fixed the article, and moved on. But I realize now that that is not enough, and starting now, I am going to record these erroneous translations, and encourage others to do the same. When there appears to be a sufficient number of them to make the point, I will probably go to WP:VPR or some other centralized venue and try to have the bolded statement above or something like it turned into an editing guideline. As things stand right now, if you find a user creating twelve non-stub articles a day from seven languages, there is really nothing you can point to in current guideline or policy that says they shouldn't; as long as they "edited" the output a bit afterward, they are compliant. That is woefully inadequate.

Editors who use MT *must* be able to understand the source language sufficiently to be able to verify that the English translated content is still verifiable, and if they cannot, they should not use MT-generated English content in Wikipedia articles, period. That does not require professional competence in the source language; I rate myself only 1 or 2 in Catalan, but I know when MT screws it up and if there is some tricky question I am not sure of, I know who to ask.

I will start to add examples as I find them, perhaps to a subpage as a kind of worksheet; in the meantime, if anyone already has examples handy, just add them below; please provide MT id, date of translation, source language, source text and location (link when possible), translated text, and any needed additional commentary about what it got wrong. Thanks, Mathglot (talk) 21:10, 21 April 2025 (UTC)[reply]
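The fields requested above could be kept as a small structured record, which makes it easy to spot incomplete reports before they go onto a worksheet. What follows is only a minimal sketch in Python: the class name `MTErrorReport`, its field names, and the choice of which fields count as required are illustrative assumptions, not the format actually used on any worksheet page.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MTErrorReport:
    """One reported machine-translation misfire (hypothetical structure)."""

    mt_id: str             # which MT engine produced the output
    translated_on: date    # date of translation
    source_language: str   # language code or name of the source
    source_text: str       # original-language excerpt
    translated_text: str   # the MT output being reported
    source_location: str = ""  # link to the source, when one is available
    commentary: str = ""       # what the translation got wrong

    def missing_fields(self) -> list[str]:
        """Return the names of required text fields that are empty."""
        required = {
            "mt_id": self.mt_id,
            "source_language": self.source_language,
            "source_text": self.source_text,
            "translated_text": self.translated_text,
        }
        return [name for name, value in required.items() if not value.strip()]


if __name__ == "__main__":
    # Placeholder values only; not a real report.
    report = MTErrorReport(
        mt_id="Example MT engine",
        translated_on=date(2025, 4, 21),
        source_language="fr",
        source_text="",
        translated_text="placeholder output",
    )
    print(report.missing_fields())
```

The link and commentary are left optional here because the request above asks for a link "when possible" and for commentary only as needed.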

@Mathglot: are you planning to ping the participants when you have gathered some evidence? Incidentally I put hy:Օքսֆորդի համալսարան (University of Oxford) through Google translate yesterday and it translated Linacre as Leningrad, among other odd college names, I am not sure how to link that. TSventon (talk) 17:53, 22 April 2025 (UTC)[reply]
Eventually, yes, but I don't want to lean on anyone's doorbell just yet when there isn't too much to talk about except unsupported speculation on my part. That said, anyone is welcome to chime in at any time, of course. I've started a worksheet where we can record stuff at Help:Translation/Machine translation errors, but I think I might have to change the name, because I don't really care about most errors, only the ones that should never be allowed into Wikipedia because they change the sense sufficiently to introduce false or unverifiable content where the original language text was fine. I have a suggested format for reporting errors; improvement suggestions welcome. If you could add an excerpt from your Armenian example there, that would be great. The way you link it, is with a google translate url, like this; do you see how I got that? 00:22, 23 April 2025 (UTC)
@Mathglot: I agree that you should gather your evidence first. I looked at your example and have some queries, apologies if I am slow on the uptake.
What is src?
Why not link to the revision like this
I believe the time stamp is 25 mars 2025 à 12:59
When I translate the whole page I get MT: La Manif Pour Tous justifies the use of the term by the use made of it by Najat Vallaud-Belkacem , then general councillor of the Rhône , inAugust 2011 and this despite the fact that the latter returned, inJune 2013, on his statements to affirm that "gender theory does not exist" . [references removed] The phrase "returned ... on" is not idiomatic and might suggest caution. TSventon (talk) 01:31, 23 April 2025 (UTC)[reply]
Thanks for the feedback.
  • It's 10:59 (UTC), which I prefer. Guessing you are on CEST, which would give you 12:59 in summer. A nuance that won't affect reporting, I don't think, but if it becomes an issue, we can review.
  • Src is "source" (as in, source document: web page, book, whatever). The explanation in the model does say, a link to the original source text, but if that isn't sufficient, we could spell it out, or explain it better.
  • Yes, your rev link is better. I usually do it like this: [[:fr:Special:Permalink/224223413|rev. 224223413]] ⟶ rev. 224223413 because it doesn't leave an external link icon, but your way is probably easier for most.
  • Different output from Google: I think multiple things affect translation output, and I think I should update the reporting model to allow different users to add different results that they got for the same source (with possibly different context). I think it's possible that time of day and server load could affect results (also IP geolocation), so having multiple results from different users is helpful.
The translation you got is rubbish, of course (and did you notice they switched her sex?) but not quite sure what you meant by 'caution'. I know what the French says, so I don't need to exercise caution before throwing out the translation Google gave me, and the one it gave you; they are both wrong. In a very paradoxical way, yours is better, quote-unquote, because being very obvious rubbish, hopefully a monolingual user might pause and recognize that it's rubbish and therefore not attempt to add it. Is that what you meant, that a user might see the rubbish and exercise some caution?
That's a lot of weight to put on the word hopefully and I've seen enough of it in use here that it's perfectly obvious to me that many MT-[ab]users never bother to even read the English before publishing it in an article and therefore would never catch even the most glaring mistakes. The Google translation I got was perfectly logical, sensible, grammatical, well-written and styled. It was just false. MT output that is glaringly obviously crap is less likely (but not completely unlikely) to be added to an article, and therein lies the paradox: the well-written (but completely false) translation is far more likely to end up in an English Wikipedia article. Those are the really dangerous ones, and the ones a monolingual user will never catch. Those are the ones that make me lose sleep. Mathglot (talk) 02:26, 23 April 2025 (UTC)
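For anyone filling in several reports, the interwiki permalink style mentioned in the bullet list above can be generated rather than typed by hand. This is a minimal sketch in Python; the function name `permalink_wikilink` and its parameters are hypothetical, and it simply reproduces the wikitext pattern quoted above for a given language prefix and revision id.

```python
def permalink_wikilink(lang: str, rev_id: int) -> str:
    """Build wikitext for an interwiki permalink to a specific revision.

    Follows the pattern [[:xx:Special:Permalink/ID|rev. ID]] quoted in
    the discussion; `lang` is the interwiki language prefix.
    """
    if rev_id <= 0:
        raise ValueError("rev_id must be a positive revision number")
    return f"[[:{lang}:Special:Permalink/{rev_id}|rev. {rev_id}]]"


if __name__ == "__main__":
    # Revision id taken from the example in the discussion above.
    print(permalink_wikilink("fr", 224223413))
```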
I prefer UTC as it is the Wikipedia standard, for some reason my en edit history is in UTC, but I hadn't realised that my fr edit history is in CET summer.
Src seems to be an html abbreviation, but not one I was familiar with.
I generally agree that MT should be used with caution, so I will be interested in seeing what you come up with. Obviously MT users come with a range of skillsets, from a low level understanding of English and the source language combined with a low level of checking to a high level of understanding with a high level of checking. And, yes I did mean that a "user might see the rubbish and exercise some caution". I noticed that the gender seemed to have changed, but was not worried as I was taking a poor translation as it was.
(I should admit that I have been experimenting with doing very simple updates to university articles in a large number of languages. I think it is very likely that many British university articles in foreign languages were machine translations of variable quality, so I don't think I have made them worse.)
On the theme of pious hope, ideally an editor translating your example sentence would also check the reference and listen to the video. TSventon (talk) 03:56, 23 April 2025 (UTC)[reply]
Ha ha, I admit to not having been familiar with the expression pious hope, but now I am, and I can see so many applications for it, not least of all here at Wikipedia. It's tricky to set UTC on the fr-wiki: in Preferences, choose Apparence > Décalage horaire > Fuseau horaire; and then in the drop-down with all the cities choose Autre (décalage horaire avec UTC), and in the input field right below that, type +0:00 and click the blue Enregistrer button. Even when you've done all that, it won't emit the '(UTC)' label and there is nothing to remind you that it is displaying a UTC time as it looks just like any other local time zone time.
I'm not here to finger-wag and I don't care how many articles you might have translated from Inuit and Zulu to English, I'm trying to get data for a policy proposal, and the more you have used it the more likely you are to have some good examples to share. On a previous item: can you add your Armenian example to the worksheet? Thanks, Mathglot (talk) 06:37, 23 April 2025 (UTC)[reply]
I have added the Armenian example: it is funny rather than dangerous and the error could be in the Armenian text or the translation or both. TSventon (talk) 14:52, 23 April 2025 (UTC)[reply]