
Wikipedia:Authority control integration proposal/RFC

This is an old revision of this page, as edited by Maximiliankleinoclc (talk | contribs) at 19:06, 3 July 2012 (Comments). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This proposal covers a plan to add a large number of VIAF authority control identifiers to English Wikipedia biography articles, using the {{Authority control}} template. After an initial period of data-gathering and testing utilising multiple sources, the template and VIAF parameter will be added or augmented by bot. This plan is being coordinated by Max Klein, the Wikipedian in Residence at OCLC, and Andrew Gray, the Wikipedian in Residence at the British Library.

Video Summary of the proposal

On YouTube.

Summary of the proposal

The proposal was initially discussed on the Village Pump here and has been updated to include the feedback and commentary received during the discussions. While the Village Pump discussion was broadly favourable, it is being formally listed as an RFC in order to ensure clear support from the community before implementation later in 2012.

Authority control is the term used in librarianship, archival practice and related fields for unique identifiers to disambiguate objects (people, places, academic subjects, etc). On Wikipedia, this is handled with the {{authority control}} template, which places the identifiers at the end of the article and links out to library catalogues and central authority databases.

As well as the links for readers, this also embeds information which can be used to help build tools linking back into Wikipedia, or for maintaining its content.

The template is widely used on the German Wikipedia (220,000 articles) and on Commons, but only lightly used on the English Wikipedia (4,000 articles). We plan to add a large number of identifiers to the English Wikipedia using data drawn from VIAF and from the German Wikipedia; depending on the level of overlap, this will probably be between 250,000 and 300,000 records. The identifiers will predominantly be drawn from the Virtual International Authority File (VIAF), an international project to merge multiple national authority files. VIAF identifiers correspond to identifiers in other systems, and can be used to populate other identifiers in the future.

Using data already embedded within VIAF, as well as on the German Wikipedia, we will identify pairs of corresponding VIAF numbers and articles. After data validation, a bot will add the VIAF number to the article using a reworked version of the {{Authority control}} template.
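The "added or augmented" step can be pictured with a minimal Python sketch. This is not the bot's actual code: the helper name `add_viaf`, the choice to append a new template at the very end of the wikitext, and the decision to leave an existing VIAF value untouched are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for a simple, un-nested {{Authority control}} call.
AC_TEMPLATE = re.compile(r"\{\{\s*[Aa]uthority control\s*(\|[^{}]*)?\}\}")


def add_viaf(wikitext: str, viaf_id: str) -> str:
    """Return wikitext with a VIAF parameter added to {{Authority control}}."""
    match = AC_TEMPLATE.search(wikitext)
    if match is None:
        # No template yet: append one at the end (placement is an assumption).
        return wikitext.rstrip("\n") + "\n\n{{Authority control|VIAF=" + viaf_id + "}}\n"
    params = match.group(1) or ""
    if re.search(r"\|\s*VIAF\s*=", params):
        # A VIAF value is already present; leave it for human review.
        return wikitext
    # Template exists without a VIAF parameter: augment it in place.
    new_template = "{{Authority control" + params + "|VIAF=" + viaf_id + "}}"
    return wikitext[:match.start()] + new_template + wikitext[match.end():]
```

Called on an article that already carries other identifiers, the sketch keeps them and only adds the missing VIAF parameter; a real bot would also need to handle template redirects, nested templates and placement relative to categories.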

Frequently asked questions

  1. How do I add a subject's VIAF to the article about them (or mine to my user page)?
    Use {{Authority control}}.
  2. Why use VIAF and not another identifier?
    VIAF is a composite of several existing authority control databases, and so includes all the content from many of the other systems. Any entity with, for example, an LCCN should have a corresponding VIAF number as well, but not every entity with a VIAF number will have an LCCN. Adding VIAF does not preclude the inclusion of other identifiers (and may indeed make it easier); this isn't aiming to impose a sole standard.
  3. Why only people?
    The authority control system does cover other things, but for the moment (written 2013) we are only planning to cover people—this is to simplify the initial program, as well as target the articles where the template is most likely to be useful.
  4. What about errors in VIAF?
    You can report apparent errors in VIAF (or its constituent catalogues) at Wikipedia:VIAF/errors. These are then available to the relevant managing body, and for linkage repair on-Wiki. For the German equivalent noticeboard, see de:WP:PND/F.
  5. What about licensing?
    VIAF is licensed as ODC-BY, which is compatible with Wikipedia licensing; the use of a VIAF URI is sufficient attribution for the terms of the license.
  6. Will this give any control over Wikipedia content to third parties?
    No. While we will be including VIAF identifiers, the content of Wikipedia and VIAF will remain entirely separate. No metadata will be imported automatically from VIAF, nor will Wikipedia need to follow VIAF naming conventions.
  7. What if editors object to the template or the identifier?
    Editors of specific pages will in all cases be free to remove the metadata where it is inaccurate or felt to be editorially inappropriate. For the purposes of Wikipedia:Sanctions, the first revert of an automated or semi-automated addition of authority control information shall not count as a revert.
  8. What about pages covering two people?
    There are many cases where a single article deals with two individuals. If two VIAF identifiers refer to the same article, this will be logged but not added to the article; if it currently contains one but not the other, or a mixture of identifiers referring to both, this will also be flagged.
  9. What about Wikidata?
    Wikidata includes authority identifiers. However, adding the template now allows us to gain the benefit of having this information available before Wikipedia transcludes it from Wikidata; it will also simplify any future work to add these identifiers to Wikidata.
  10. What about cases where several people have the same name?
    The primary purpose of authority control records is to help distinguish between people with the same (or similar) names. As such, identifiers are usually not matched on the name alone; the software is able to take account of other information such as birth and death dates.
  11. I wrote a new biographical article; how do I find the VIAF identifier?
    Thank you for contributing to Wikipedia! You can look up a subject's VIAF at http://viaf.org/. Enter their name as the "Search Terms:", and leave the other parameters at their default values. If there are two or more entries with the same name, check the listed works for a match. If you're not sure which to use, you can ask for advice at Wikipedia talk:Authority control.
  12. I have another question
    Any comments, criticisms, etc. will be gratefully received, again at Wikipedia talk:Authority control.
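The rule in question 8 (log, but do not add, when two VIAF identifiers point at the same article) could be sketched roughly as follows. The function name `split_candidates` and the input shape, a list of (article title, VIAF identifier) pairs, are hypothetical; the proposal does not say how the matching data is actually stored.

```python
from collections import defaultdict


def split_candidates(pairs):
    """Separate unambiguous article -> VIAF matches from conflicting ones.

    `pairs` is an iterable of (article_title, viaf_id) tuples (assumed shape).
    Returns (to_add, flagged): articles with exactly one candidate identifier,
    and articles with several, which are only logged for human review.
    """
    by_article = defaultdict(set)
    for article, viaf_id in pairs:
        by_article[article].add(viaf_id)

    to_add, flagged = {}, {}
    for article, ids in by_article.items():
        if len(ids) == 1:
            to_add[article] = next(iter(ids))
        else:
            flagged[article] = sorted(ids)
    return to_add, flagged
```

Duplicate pairs collapse harmlessly because identifiers are collected in a set, so an article is flagged only when the candidates genuinely differ.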

Responses

Please leave feedback or comments below. More general queries can also be left at Wikipedia talk:Authority control integration proposal.

Support

  1. Tagishsimon (talk) 22:28, 28 June 2012 (UTC)
  2. DGG ( talk ) 00:45, 29 June 2012 (UTC)
  3. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:30, 29 June 2012 (UTC)
  4. Ironholds (talk) 10:46, 29 June 2012 (UTC)
  5. Nyttend (talk) 13:28, 29 June 2012 (UTC)
  6. --AndreasPraefcke (talk) 13:42, 29 June 2012 (UTC)
  7. Wer900 talk coordination consensus defined 16:41, 29 June 2012 (UTC)
  8. SarekOfVulcan (talk) 19:44, 29 June 2012 (UTC)
  9. --j⚛e decker talk 22:31, 29 June 2012 (UTC)
  10. Imzadi 1979  23:02, 29 June 2012 (UTC)
  11.  Sandstein  06:16, 30 June 2012 (UTC)
  12. --Jarekt (talk) 11:45, 2 July 2012 (UTC)
  13. Filminfo 15:50, 2 July 2012 (UTC)
  14. the wub "?!" 16:17, 2 July 2012 (UTC)
  15. I support this project for having a large benefit, a low risk of harm, for being able to be undone if it is unwanted, and for the attention its coordinators give to addressing the concerns people have for it. This is a great experiment both in terms of incorporating data into Wikipedia and in terms of transparency in doing something new. I appreciate the commitment which project coordinators and participants have shown to making forthright replies to community questions. I have seen no one make a comment or share an idea that makes me think anything other than that this project deserves to proceed. Blue Rasberry (talk) 20:51, 2 July 2012 (UTC)
  16. Bgwhite (talk) 06:57, 3 July 2012 (UTC)
  17. Mr impossible (talk) 12:07, 3 July 2012 (UTC) - this already seems to be appearing on Commons and the potential of this improved, linked data is very great.
  18. sunhai76 (talk) 14:20, 3 July 2012 (UTC)
  19. Yes please. Specs112 t c 12:36, 3 July 2012 (UTC)
  20. Some concerns, but outweighed by the benefit. Comments below. LeadSongDog come howl! 13:26, 3 July 2012 (UTC)
  21. Ruud 14:06, 3 July 2012 (UTC)
  22. kosboot (talk) 14:07, 3 July 2012 (UTC)
  23. Whouk (talk) 14:42, 3 July 2012 (UTC) Might (or might not) be issues down the line with generating Wikipedia content from the established links but we can discuss that as and when. Sounds like there's a lot of thought gone in and real potential for this to be useful.
  24. Night of the Big Wind talk 15:43, 3 July 2012 (UTC)

Oppose

Comments

  • I like this idea very much and think it would benefit both readers and researchers using Wikipedia. 64.40.54.97 (talk) 00:19, 29 June 2012 (UTC)
  • With regards to FAQ question number three, how receptive have the VIAF people been to corrections submitted by the German community? Lankiveil (speak to me) 10:13, 29 June 2012 (UTC).
    • Good question - I don't know, but I'll try to find out. That said, note that the German noticeboard is submitting corrections to PND/GND at the Deutsche Nationalbibliothek, rather than to VIAF, and so they'll be handled by different organisations. Andrew Gray (talk) 10:36, 29 June 2012 (UTC)
      • VIAF reviews all corrections submitted by an editor. If they are notified of an error which they agree with (which is mostly an objective process), then that correction will appear in VIAF the next time it is updated. Typically VIAF is updated every 6 months to a year. Maximiliankleinoclc (talk) 19:21, 29 June 2012 (UTC)
      • Have any actual corrections been incorporated though? It's one thing to say "oh, it might happen", but I am concerned that we'll wind up getting a head-pat and some soothing words when we submit corrections, which will leave our articles and VIAF out of sync. Lankiveil (speak to me) 04:05, 1 July 2012 (UTC).
        • Having spoken to the lead scientist of VIAF, Thom Hickey, I can report that changes have been made, and that there is a commitment from that team to make all submitted changes. The release cycle of VIAF is just that, a cycle, not an editable wiki, so it may take 6 months or more for the changes to eventually be reflected. Maximiliankleinoclc (talk)
  • As I noted in earlier discussion, we should look to moving AC links into infoboxes, where articles have them, during a subsequent phase of this initiative. That will allow them to be included in the emitted metadata. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:32, 29 June 2012 (UTC)
    • Should be pretty easy to move them into infoboxes later, right? Wouldn't it simply mean a bot cutting code from the bottom of the page and pasting it into the infobox? Nyttend (talk) 13:26, 29 June 2012 (UTC)
      • Yes, but maybe infoboxes like Template:Infobox writer (and possibly some others) could be adjusted beforehand, and the bot could write the info directly there. (At de.wikipedia, the majority has always disapproved of infoboxes for most kinds of people, so we didn't bother to do that. Possibly, in geographic articles and other fields where we do have infoboxes, the authority control data will one day be shown there, but maybe only after the WikiData revolution.) --AndreasPraefcke (talk) 13:40, 29 June 2012 (UTC)
  • It might be good to amend the FAQ with "What about cases where several people have the same name?" IOW, how are we going to be sure we put the right VIAF id on the right pages? ErikHaugen (talk | contribs) 22:15, 29 June 2012 (UTC)
  • I would also propose to build tools for updating {{Authority control}} templates on other projects (that use them) for articles linked by interwiki links. This might require closer integration of {{Authority control}} templates at different projects. Eventually I see this as an ideal thing to add to the future Wikidata infrastructure which is now being built, so different wikipedias linked by interwiki links can share a single copy of {{Authority control}} data. --Jarekt (talk) 11:56, 2 July 2012 (UTC)
    • I should have finished reading the FAQ before answering. I can see that the Wikidata idea is not unique. ;) --Jarekt (talk) 12:07, 2 July 2012 (UTC)
      • This would be ideal (certainly no two linked articles should ever have different identifiers). For the moment, we won't run this script on any other languages, but there's nothing stopping anyone from localising it once we know it works. Andrew Gray (talk) 16:57, 2 July 2012 (UTC)
        • Correct, certainly no two identical (and interwiki'd) articles should have different identifiers. However I have a suspicion that in some cases two articles may be interwiki'd but in fact the articles have different subjects. I could really see this happening for a case like where *in a totally fictional example* deWP for John Smith links to [[w:en:John Smith]] but should really link to [[w:en:John Smith (Plumber)]] either because it was never checked, or was accurate at some point but then moved and now points to a DAB. That's the difficulty that I've been attempting to explain with using deWP as a validation step. Maximiliankleinoclc (talk) 19:17, 2 July 2012 (UTC)
  • I'd like to see a clearer explanation of what linkages are being built. Looking at http://viaf.org/viaf/141474549 we see a linkage to the stub-class article http://wikipedia.org/wiki/Stephan_Kekul%C3%A9_von_Stradonitz, and there we find an interlanguage link to http://de.wikipedia.org/wiki/Stephan_Kekule, which is a much better article. Further, at that article, there is an instance of the template Normdaten (Person) which links to http://viaf.org/viaf/57425893/. That VIAF record's history shows it was added to DNB, then to PTBNB, then removed from the DNB, yet it's still listed in the German Wikipedia article. The PTBNB is still there, but doesn't link to any Wikipedia article. The Portuguese Wikipedia article at http://pt.wikipedia.org/wiki/Stephan_Kekul%C3%A9_von_Stradonitz doesn't link to the PTBNB, nor the VIAF. (Confused yet?) It is not at all clear that the English-language Wikipedia article should be linked in preference to other languages. We may need either multiple wp articles linked, or else a way to agree between wikis on which single article to link. LeadSongDog come howl! 13:57, 3 July 2012 (UTC)
    • This is a very interesting example. In fact it speaks to some of the problems with verifying links against deWP. Here's what happened:
      In 2009 VIAF thought that "Stradonitz, Stephan Kekule von" (Portuguese entry) and "Kekule von Stradonitz, Stephan" (German entry) were the same person. I think this is just a Portuguese error in not handling German last names properly, but VIAF recovered from it, matched them, and created cluster number 57425893. Later deWP linked by hand to that VIAF cluster 57425893. Then in 2012 the Norwegian database was added, which has this person cataloged correctly (or at least the same as the Germans) as "Kekule von Stradonitz, Stephan". At this point VIAF identified the exact match of the German and Norwegian names, and deemed the difference of the Portuguese one to mean that it was probably a different person, since at least two other countries corroborated the right name. So the German/Norwegian cluster became cluster number 141474549, while cluster 57425893 had its German part removed. This left deWP pointing to the cluster of the wrongly cataloged name. It's not their fault. But what it does mean is that if my bot went to add cluster 141474549 to the enWP article and checked against deWP, it would not match deWP and would classify the mismatch as a VIAF clustering error, when in fact it is a Wikipedia linking error. That is one reason not to check against deWP (or treat it as law). The bot that is being proposed here for enWP is going to have a maintenance schedule that will update enWP (and down the road Wikidata) based on diffs, so this sort of thing wouldn't happen. Maximiliankleinoclc (talk) 19:04, 3 July 2012 (UTC)
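A diff-based maintenance pass of the kind mentioned above might look something like the following Python sketch. It is only an illustration under stated assumptions: the function name `diff_snapshots` is hypothetical, each snapshot is assumed to be a plain dict from article title to VIAF cluster number, and nothing here reflects how the proposed bot actually stores or compares VIAF releases.

```python
def diff_snapshots(old, new):
    """Compare two article -> VIAF cluster mappings (assumed dict snapshots).

    Returns three dicts:
      added   - articles that only appear in the new snapshot
      changed - articles whose cluster number moved, as (old_id, new_id)
      removed - articles that no longer have a cluster in the new snapshot
    """
    added = {a: v for a, v in new.items() if a not in old}
    removed = {a: v for a, v in old.items() if a not in new}
    changed = {
        a: (old[a], new[a])
        for a in old.keys() & new.keys()
        if old[a] != new[a]
    }
    return added, changed, removed


# Illustrative data only, reusing the two cluster numbers from the example above.
old = {"Stephan Kekulé von Stradonitz": "57425893"}
new = {"Stephan Kekulé von Stradonitz": "141474549"}
added, changed, removed = diff_snapshots(old, new)
```

With this data, `changed` records the move from cluster 57425893 to 141474549, which is the kind of re-clustering a periodic update would need to propagate to the article.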