
Wikipedia talk:Authority control integration proposal


Discussion at the village pump

Lots of other people and I provided comments about this proposal at the Village Pump. Those comments are here - Wikipedia:Village_pump_(proposals)#Authority_Control_Integration. Blue Rasberry (talk) 15:14, 19 June 2012 (UTC)[reply]

Interested

I have only skim-read the proposal. However, I believe the benefits of closer integration with the library community are immense. At the moment I am just supportive. I have had discussions with CILIP and OCLC here and in the US. I would need time to have this explained to me, or to trust others who were working in an agreed way. I'm not sure if this is possible, but I think a demonstration of the functionality - even if it only involved .0001% of the potential - would be very useful. Is this partially possible? Victuallers (talk) 18:49, 18 June 2012 (UTC)[reply]

Hi Roger. Feel free to give me a call if you'd like me to run through it offline! Andrew Gray (talk) 12:28, 20 June 2012 (UTC)[reply]

{{Authority control}} on Commons

Maximiliankleinoclc, I was quite active last year adding {{Authority control}} templates to commons:category:Creator templates and subcategories of commons:category:People by name. Last year I copied over 30k {{Authority control}} templates from German Wikipedia (where they have ~180k of them). Lately we transplanted Commons:Help:Gadget-VIAFDataImporter from Wikisource and added it to the list of available gadgets, and now there is a lot of activity adding {{Authority control}} templates to a lot of pages. With the help of the gadget it can be done with a few clicks. It would be great if it were possible to add some more by bot. You could use such a run as a test before running it on Wikipedia. What info do you use to match records? Name and dates of death/birth, or other info? --Jarekt (talk) 19:12, 18 June 2012 (UTC)[reply]

There is Authority control on Commons (45k templates) as well as on English (4k) and German Wikipedia (180k). This just goes to show the extent of the spaghetti. This proposal has Wikidata - the untangler of the spaghetti - in mind from the start. I think that when we import to Wikidata, having the three separate data sources and checking them against each other will provide a good level of accuracy. That is, when we go to put authority control on Wikidata, if all three, or two out of three, of en, de, and Commons agree on the authority control data, then we can be confident. Maximiliankleinoclc (talk) 17:47, 26 June 2012 (UTC)[reply]
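A minimal sketch (in Python) of the kind of agreement check described above: take the identifier recorded for the same person on en, de and Commons, and accept it only when at least two of the three sources agree. The function name, input values and the idea of a simple majority vote are illustrative assumptions, not part of any existing bot.

```python
from collections import Counter

def agreed_viaf(en_id, de_id, commons_id):
    """Return a VIAF number if at least two of the three sources agree,
    otherwise None (meaning: flag the article for manual review)."""
    votes = Counter(x for x in (en_id, de_id, commons_id) if x)
    if not votes:
        return None
    candidate, count = votes.most_common(1)[0]
    return candidate if count >= 2 else None

# Illustrative usage with made-up identifiers:
print(agreed_viaf("104023256", "104023256", None))   # -> "104023256" (2 of 3 agree)
print(agreed_viaf("104023256", "999999", "12345"))   # -> None (no agreement; check by hand)
```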

Useful

I've always supported anything in the direction of linking to outside authoritative databases, and also supported content organization that would be useful in converting WP into a fully semantic wiki. My impression is that people at some of the other WPs are far ahead of the enWP in this respect. Certainly the VIAF system is very unfamiliar in the US--in fact, even the LC authority file, whether from LC or via OCLC, is still unfamiliar to every non-librarian here, though I have been trying to link to it at relevant AfDs and to use it for sourcing years of birth. Raising awareness is good, but the initial steps should come slowly and not as a surprise. The suggestion at the Village Pump that the bot is not yet ready & that this should be added manually is relevant. DGG ( talk ) 19:15, 18 June 2012 (UTC)[reply]

The way the bot would work is that it takes a list of preexisting articles that are positive matches to VIAF at the moment. That could help with rescuing articles at AfD, through automation, and also raise awareness. Maximiliankleinoclc (talk) 17:52, 26 June 2012 (UTC)[reply]

I strongly support the use of authority control in articles but it must be visible - hidden metadata falls out of step with visible article content, and errors go uncorrected. Furthermore, we should look to embedding these links, where possible, in infoboxes, so that they can be used as UIDs in the metadata emitted by them. I look forward to meeting you at Wikimania! Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:26, 18 June 2012 (UTC)[reply]

Using de.wikipedia's handpicked data

I think the very first step to integrate more authority data into en.wikipedia should be to take the data available via "de:" interwiki links wherever de:Template:Normdaten exists in the German article, and copy that to en.wikipedia. At de.wikipedia we not only have a lot of German GND data, but also 116,000 LCCN entries and 142,815 VIAF entries. Since these are (or at least should be) manually checked, they are probably much better than any of VIAF's algorithm-driven entries at matching persons. --AndreasPraefcke (talk) 10:21, 19 June 2012 (UTC)[reply]

I was having trouble understanding this and I talked to User:Maximiliankleinoclc and user:Andrew Gray about it. They explained to me what you are saying and the possibilities for what can be done with German data. For others who are also confused, what this means is that there are two databases, a German one and a United States one, each of which would be good enough to use. Each would, though, have a certain number of errors. If these databases were cross-checked for differences, then wherever they have different data for the same person that probably indicates a problem. If the two databases are cross-checked first, the problems identified and fixed, and the resulting fixed data list is the one used for this project, then this project will be using a data set of higher quality than any individual set. Right? Blue Rasberry (talk) 21:07, 2 July 2012 (UTC)[reply]
Well, since the results of the cross-check also might directly go back into the individual databases, the difference might not be as big, since the improvement would be everywhere. Actually, we have two mappings: one performed intellectually (or semi-automated) in de:WP, which can be extracted from the Normdaten templates (~140,000 VIAF numbers, as noted above, inserted over the last couple of years), and the more recent one from VIAF to en:WP, which was performed automatically by OCLC and can be extracted from their data dumps at [1], namely the viaf-yymmdddd-links.txt files (~304,000 as of 2012-05-24). These two can be set into correspondence either by exploiting interwiki links to en:WP contained in de:WP articles or vice versa (there will be differences, since the interwiki links are fairly good but not perfect). As a first application we identified several thousand articles in de:WP without a Normdaten template but with an interwiki link to an article in en:WP which is also a target of OCLC's mapping (actually restricted to those VIAF clusters which also bear a GND identifier, since these are our primary focus at de:WP).
As a second application I just processed the articles with a Normdaten template in de:WP (and an interwiki link to en:WP) and compared their VIAF number (if any) with the one provided by OCLC's mapping (if any): of 702,000 interwiki links from de:WP to en:WP, 84,000 were backed by OCLC's mapping, 51,100 had a VIAF number in de:WP's Normdaten template, and 9,400 (=18%) of these differed from the one provided by OCLC's mapping!
This could be explained as follows: some editors in de:WP note the VIAF number corresponding to the GND record. Due to some peculiarities of the GND it is quite typical to have several GND records: one suitable for Wikipedia usage (a personalized one) and one (an "undifferentiated" one) which unfortunately tends to enter the main VIAF cluster for that person. Other editors even deliberately prefer VIAF numbers beyond those already implicit in the GND or LCCN number already noted. And one may speculate that OCLC's mapping mostly succeeded for those clusters containing the LoC-NA record. Thus the huge discrepancy is not a sign of faults but can be interpreted as a successful "external" effort yielding > 9,000 intellectual identifications within VIAF in cases where the automatic match & merge had not succeeded! Of course there are also several minor sources of real errors: interwiki issues (obviously sometimes disambiguation pages are involved, which never should happen) and the fact that some VIAF numbers in de:WP are outdated (i.e. VIAF would redirect the number given to a current one). -- Gymel (talk) 00:02, 3 July 2012 (UTC)[reply]
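For readers who want to see the mechanics, here is a rough sketch of the comparison described above, assuming the two mappings and the interwiki links have already been extracted into Python dictionaries: de_viaf (de:WP article → VIAF number from the Normdaten template), oclc_viaf (en:WP article → VIAF number from OCLC's links dump) and interwiki (de:WP article → en:WP article). The variable names and input format are illustrative assumptions; the extraction from the dumps themselves is not shown.

```python
def compare_mappings(de_viaf, oclc_viaf, interwiki):
    """Count agreements and discrepancies between the VIAF numbers noted
    in de:WP Normdaten templates and those in OCLC's en:WP mapping,
    matched up via interwiki links."""
    same = differ = only_de = only_oclc = 0
    discrepancies = []
    for de_title, en_title in interwiki.items():
        v_de = de_viaf.get(de_title)
        v_oclc = oclc_viaf.get(en_title)
        if v_de and v_oclc:
            if v_de == v_oclc:
                same += 1
            else:
                differ += 1
                discrepancies.append((de_title, en_title, v_de, v_oclc))
        elif v_de:
            only_de += 1
        elif v_oclc:
            only_oclc += 1
    return same, differ, only_de, only_oclc, discrepancies
```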
Thanks for explaining how each database can identify information not present in the other databases, and thanks also for explaining how this project could result in improvement of the original databases on their next updates. All of this is difficult for me to understand because I have never thought about such things before and I do not yet know enough to be aware of what background I might be lacking.
May I ask you to identify yourself on your userpage? You know a lot about this for a new user and I am curious about how you are involved in this project or similar projects. I really appreciate your input on this; although I support the project in principle because I recognize the benefits, I feel less qualified to endorse technical aspects of it simply because I do not understand such things. I am glad to see you here joining the discussion of this. Blue Rasberry (talk) 13:01, 3 July 2012 (UTC)[reply]

VIAF crowdsourcing interface

What is really missing in VIAF, in my opinion, is an easy mechanism to let users merge and split data clusters. It could be restricted to experienced users on application (let's say people working on authority files professionally, or the handful of people doing a lot of authority data work in de.wikipedia and other projects), or, if it doesn't change data "live", at least it could be used as a tool to become aware of clusters that require a closer look. --AndreasPraefcke (talk) 10:21, 19 June 2012 (UTC)[reply]

You are right, it shouldn't be as liberal as anonymous editing. And it isn't: you can submit corrections to VIAF through the VIAF website's suggestions box. Although this topic is a bit outside the scope of the proposal. Maximiliankleinoclc (talk) 17:57, 26 June 2012 (UTC)[reply]
I know. I just jumped at the opportunity... (you don't get to talk to someone concerned with VIAF every day :-) --AndreasPraefcke (talk) 17:59, 28 June 2012 (UTC)[reply]

More ways that VIAF could profit from Wikipedia

Does IMDB licensing allow for this? Maximiliankleinoclc (talk) 17:58, 26 June 2012 (UTC)[reply]
Probably - we do it already! Note that most people/films with IMDB pages already link to them, so this would mainly be shuffling the location of the data around a bit. Andrew Gray (talk) 22:11, 27 June 2012 (UTC)[reply]

Well, maybe OCLC would have to ask the powers behind IMDb; I don't know what the law is in this respect. I cannot imagine IMDb would be averse to being publicly recognized as professional (which IMHO this database is: it is much better than any of the librarians' authority files as far as actors and directors go), or to getting even more traffic from professional environments like libraries and universities. --AndreasPraefcke (talk) 15:15, 28 June 2012 (UTC)[reply]

VIAF feature request: use Wikipedia authority control as stabilizer

  • When combining data, VIAF should prefer the identifier already in use at (de./en.) Wikipedia. Alternatively, VIAF could create some bot to automatically change VIAF entries in Wikipedia whenever a VIAF set is combined and redirected, or at least provide a list of such redirects for Wikipedians to do the bot work. --AndreasPraefcke (talk) 10:44, 19 June 2012 (UTC)[reply]
This is a really good point, and I will include it in the maintenance section of the proposal, to update in time with the VIAF updates. Maximiliankleinoclc (talk) 17:59, 26 June 2012 (UTC)[reply]
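A hedged sketch of how such a maintenance list could be produced, assuming (as appears to be the case) that viaf.org answers requests for a retired cluster at https://viaf.org/viaf/<id> with an HTTP redirect to the surviving cluster; if that assumption does not hold, the same idea could be driven from VIAF's dump files instead.

```python
import urllib.request

def current_viaf(viaf_id):
    """Follow any redirect for a VIAF cluster and return the id it now
    resolves to (equal to viaf_id if the cluster was not merged).
    Assumes viaf.org redirects retired cluster ids; treat as a sketch."""
    url = "https://viaf.org/viaf/%s" % viaf_id
    with urllib.request.urlopen(url) as resp:
        final = resp.geturl()          # URL after redirects, if any
    return final.rstrip("/").rsplit("/", 1)[-1]

def stale_ids(viaf_ids_in_articles):
    """Yield (old_id, new_id) pairs that a bot or a human could then fix."""
    for old in viaf_ids_in_articles:
        new = current_viaf(old)
        if new != old:
            yield old, new
```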

Substantive feedback

I'm replying here rather than at Wikipedia:Village_pump_(proposals)#Authority_Control_Integration, because this is getting long and reasonably technical. As a librarian, wikipedian and maintainer of an authority control system not involved in the current proposal, I strongly support this work, but not in the current technical implementation. In particular:

  1. The documentation on {{Authority control}} is entirely inadequate, both because there are many points it doesn't cover (most of which appear to have been raised in this discussion so far) and because it's librarian-centric rather than wikipedian-centric. We also need much more detail on things like romanisation / character mapping differences between the Wikipedia approach and that in use for each scheme.
  2. There are a large number of caveats in the documentation for {{Authority control}} which could be checked for automatically. They're not. I'm thinking of (deprecated, please use GND) and As of 2012-06-20, for LCCNs starting with "n" and followed by 8 digits this template appears to require the syntax n/99/999999, where 9 is any digit from 0 to 9, and so forth. Data validation is your friend; early data validation is your best friend. See {{Citation}} for good examples of this in practice. (A rough validation sketch follows this comment.)
  3. Currently there is a single {{Authority control}} template which does all the work. This makes it impossible for people to add non-trivial new schemes (i.e. schemes with data validation and proper documentation), because non-trivial new schemes take development work, and development can't be done on templates with 3 million uses except by gurus. At most, {{Authority control}} should be an interface to a collection of templates, each of which knows everything about a single scheme. That way new schemes could be tested, debugged and added without disturbing 3 million pages. See {{Citation}} for good examples of this in practice.
  4. The current approach is to use VIAF for people and only people. This is completely wrong. The approach should be to use VIAF initially for people and expand it as is feasible. The difference between the approaches in the near term is about planning and coding the templates / bots extensibly.

I hope this helps. Stuartyeates (talk) 09:40, 24 June 2012 (UTC)[reply]
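To make point 2 concrete, here is a small illustration of the kind of early data validation being asked for, written in Python rather than template syntax. The patterns are deliberate simplifications based on the identifier shapes mentioned in this discussion (VIAF: digits only; GND: digits with an optional hyphen or X check digit; LCCN: a letter prefix plus digits, possibly slash-separated); the authoritative rules live with each scheme and in the template documentation.

```python
import re

# Simplified, illustrative patterns -- not the official definitions.
SCHEME_PATTERNS = {
    "VIAF": re.compile(r"^\d+$"),
    "GND":  re.compile(r"^\d{7,10}-?[\dX]?$"),
    "LCCN": re.compile(r"^(n|no|nb|nr)/?\d{2}/?\d{6}$"),
}

def validate(scheme, value):
    """Return True if the value looks plausible for the given scheme."""
    pattern = SCHEME_PATTERNS.get(scheme)
    return bool(pattern and pattern.match(value))

assert validate("VIAF", "104023256")
assert validate("LCCN", "n/79/022889")
assert not validate("VIAF", "not-a-number")
```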

Thanks Stuart. I'm drawing up a general RFC on the proposal at the moment, and I think you're right that we need to step back and look at how the whole AC system on Wikipedia is organised as part of this. Bringing it all into one place and updating the documentation would be a great help!
Regarding your specific points, I agree with you entirely on #4 - I think that biographies are the most efficient use of the template, but there's no specific reason to restrict it to them.
#3 is interesting; are you thinking of, say, {{authority control}} as a wrapper which pulls in {{VIAF}} and {{LCCN}} and so on, similar to the way that JSTOR and DOI and so on are handled in the citation templates? Andrew Gray (talk) 22:22, 24 June 2012 (UTC)[reply]
Yep, that's pretty much what I was thinking. Stuartyeates (talk) 20:18, 25 June 2012 (UTC)[reply]
Great. I've explicitly revised the proposal to include reworking the template and the documentation as part of the plan - the template is just about within the bounds of what I'm comfortable doing, technically, so I might enlist some additional support for this part! Let me know what you think of the revised proposal. Andrew Gray (talk) 21:34, 25 June 2012 (UTC)[reply]
The good thing about breaking the templates down is that it makes it much easier to experiment with them and push our collective limits. Also, sane questions at Help_talk:Template seem to get answered. I'll make some (explanatory) tweaks to the proposal. Stuartyeates (talk) 23:18, 25 June 2012 (UTC)[reply]
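Purely as an illustration of the wrapper-plus-subtemplate structure being agreed here (and not as a proposal to move the template into Python), the separation of concerns might look roughly like this: a thin outer function that only dispatches, and one small handler per scheme that knows its own link format. The URLs are taken from the examples elsewhere in this discussion; treat the exact output format as an assumption.

```python
def render_viaf(value):
    # VIAF cluster pages live at viaf.org/viaf/<number>
    return "[https://viaf.org/viaf/%s VIAF: %s]" % (value, value)

def render_gnd(value):
    # GND records resolve at d-nb.info/gnd/<id> (see the DNB example below)
    return "[http://d-nb.info/gnd/%s GND: %s]" % (value, value)

# The "wrapper" knows nothing about individual schemes beyond this table,
# so a new scheme can be tested and added without touching the others.
SCHEMES = {"VIAF": render_viaf, "GND": render_gnd}

def authority_control(**identifiers):
    parts = []
    for scheme, value in identifiers.items():
        handler = SCHEMES.get(scheme)
        if handler and value:
            parts.append(handler(value))
    return "Authority control: " + " | ".join(parts)

print(authority_control(VIAF="104023256", GND="118527053"))
```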

Very nice idea about abstraction layers. Will implement. Maximiliankleinoclc (talk) 18:08, 26 June 2012 (UTC)[reply]


Authority data may not be trivial, but it's not rocket science. We have long experience with that template (de:Template:Normdaten) on de.wikipedia, and it's pretty easy to use and didn't cause many problems at any time. That said, I don't get your point #1: of course it's librarian-centric, because that's what authority files have been so far. We may imagine (or start to develop) our own, better authority files, but to link with the _existing_ ones (invented and maintained by librarians), we need to adopt their thinking. What do you mean by romanisation approaches? We don't include names or any difficult characters in the template, but only identifiers. Well, the LCCN is kind of complicated (that is because the ******s (fill in foul language of your liking) at LOC can't make up their minds which scheme of their LCCN to use themselves). But even the LCCN only has one or two English letters (n or no) and a couple of numbers. GND has numbers and an occasional X or "-". VIAF is even numbers only. Even the Japanese NDL is a simple number with some leading zeros. --AndreasPraefcke (talk) 18:34, 25 June 2012 (UTC)[reply]
it's librarian-centric, because that's what authority files have been so far — Yes, but the documentation here is for wikipedians not librarians and so long as it uses librarianship terms of art without explanation, it will remain obscure to them. Stuartyeates (talk) 20:18, 25 June 2012 (UTC)[reply]
What do you mean by romanisation approaches? We don't include names or any difficult characters in the template, but only identifiers. — Yes, we include only the authority control link, but wikipedians expect to be able to check that what's at the end of the link matches what they expect, and any discrepancies need to be explained. There are English-language Wikipedia processes such as Wikipedia:Good articles which involve every link being manually checked by an uninvolved editor, and the documentation needs to support them in that. Stuartyeates (talk) 20:18, 25 June 2012 (UTC)[reply]
The examples you give of different schemes and their range of identifiers reassure me that error-checking in the template is possible. Stuartyeates (talk) 20:18, 25 June 2012 (UTC)[reply]
Thank you for the clarification. Whether or not to actually show the authority data within Wikipedia articles will always be in debate, I guess. Since they are metadata, I'd prefer some visualisation like our interwiki links, at a place that makes it clear that this is "professional" stuff and not really necessary for enjoying a good article. --AndreasPraefcke (talk) 08:18, 26 June 2012 (UTC)[reply]
In my experience you are likely to encounter considerable editor resistance on the English-language Wikipedia to any effort framed in terms of machine-readable data. It needs to be framed either in terms of improving the English-language Wikipedia pages as experienced by new editors (i.e. those not logged in) or alternatively improving inter-wiki links to minority languages (i.e. outside the top 10 or 20 wikis). A number of efforts to improve machine-readability have floundered because editors of the English-language Wikipedia overwhelmingly favour improving human editability over machine-readability. Stuartyeates (talk) 08:27, 26 June 2012 (UTC)[reply]
That's valuable input. If this turns out to be a deal-breaker we can resort to hidden metadata, and then it can be viewed in a more visually pleasing way once the Wikidata integration has occurred. Maximiliankleinoclc (talk) 18:11, 26 June 2012 (UTC)[reply]
Alas, hidden metadata isn't a solution to this problem, since it's still there when a user hits the edit button, contributing to the complexity encountered by a newbie editor (assuming that you mean hidden metadata in the persondata sense). The solution is to base the RFC on arguments like disambiguation and assistance with interwiki links, while downplaying the linked data aspects. Stuartyeates (talk) 23:49, 26 June 2012 (UTC)[reply]
I understand your concern, but this battle is long lost, I think. The article on the U.S. begins with {{Hatnote|This article is about the United States of America. For other uses of terms redirecting here, see [[US (disambiguation)]], [[USA (disambiguation)]], and [[United States (disambiguation)]].}} <noinclude>{{pp-semi|small=yes}}{{pp-move-indef}}</noinclude> {{Infobox country |conventional_long_name = United States of America. Not really "wiki wiki" as it used to be in 2003. But even then, the source code was already a mess as far as usability goes ([2]). --AndreasPraefcke (talk) 11:12, 28 June 2012 (UTC)[reply]

Use of Albert Einstein as an example

In a number of places Albert Einstein is used as an authority control example. While he is an example of an author who is well known and everywhere, I suggest picking harder examples, to prove the versatility of authority control: Yasunari Kawabata, someone from Category:Burmese writers, Category:Thai writers or similar. These are the hard examples, involving radically different cultures and scripts, proving that our schemes have what it takes. Alternatively, if some of the schemes can't handle these people and their real names, we need to document that. Stuartyeates (talk) 20:25, 26 June 2012 (UTC)[reply]

Mohandas Karamchand Gandhi seems a good example. VIAF 71391324 has primary entries in three of the four scripts VIAF has records in (Latin, Arabic & Hebrew - the missing one is Cyrillic, contributed by Israel and Moscow), and the 400s alternate section includes a wide variety of romanisations, plus alternates using Japanese, Devanagari and Gujarati scripts.
It's a bit hit-and-miss what does and doesn't turn up in the 400s; some seem to have a lot of scripts, some very few. Fyodor Dostoyevsky is an interesting example - VIAF 104023256 - as it has all four languages in the 100s, and as alternate headings, Japanese, Greek and Chinese plus an enormous variety of different romanisations. Andrew Gray (talk) 22:49, 26 June 2012 (UTC)[reply]
I've changed to Dostoyevsky. Feel free to switch it for someone with even more variety... Andrew Gray (talk) 22:54, 26 June 2012 (UTC)[reply]
Yes, a much better example. Stuartyeates (talk) 23:05, 26 June 2012 (UTC)[reply]

Convincing use-cases for authority control

I've been looking at some of the previous attempts to do automatic interlinking of Wikipedia with external things and I've come to the conclusion that we're going to need some clear, detailed use cases of how authority control can benefit Wikipedia in and of itself. I suspect we can come up with a couple of "with this information we can write a toolserver tool to do X" scenarios. Stuartyeates (talk) 04:56, 27 June 2012 (UTC)[reply]

We do not need to demonstrate benefit to Wikipedia "in and of itself". Benefiting Wikipedia's readers and/or data re-users will be more than adequate justification. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:17, 27 June 2012 (UTC)[reply]
Various things come from the embedded identifiers:
  • Reliable linking from external services - we can build lookup services. de.wp has an elegant PND tool - http://toolserver.org/~apper/pd/person/pnd-redirect/de/118768581 - which takes you to the article represented by that PND (and, optionally, allows you to select a preferred language - http://toolserver.org/~apper/pd/person/pnd-redirect/en/118768581 does the same thing pointing to en.wp, using interwikis). Being able to have this work for a large number of records rather than a few thousand makes it a practical tool, and allows people to automatically generate these links to Wikipedia, use the API to pull out leads from articles for reuse in other sites, etc.
  • Extend the scope for checking metadata - we already have methods, such as the Death anomalies project, for comparing the metadata between Wikipedia language editions and spotting inconsistencies. Including identifiers which tie into external services, with reliable APIs, means that we have a lot of reliable data to compare articles to.
  • Return metadata to the outside world - working backwards from this, once we have embedded identifiers, the curators of this metadata will find it a lot easier to incorporate information from Wikipedia, taking advantage of our fairly fast update cycle for things like death dates.
  • Identifying alternate names - particularly for non-standard transliterations, the alternate headings in authority files give us an extensive and curated collection of variants of names. The linkage will help the creation of redirects.
  • Content creation support - the presence of the identifiers makes it possible for people to, eg, develop scripts to generate author's bibliographies for articles and so forth. I'm not recommending we do this now, but it's the sort of practical innovation we're allowing for.
...plus the various benefits from the links in the template itself, such as being able to jump directly to WorldCat and thus to relevant material. Andrew Gray (talk) 12:44, 27 June 2012 (UTC)[reply]
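To make the first bullet above concrete, here is a toy version of such a lookup service, assuming an identifier-to-article mapping has already been extracted (from the templates or a toolserver database). The mapping contents and function name are illustrative only; the de.wp tool linked above is a separate, independent implementation.

```python
import urllib.parse

# Assumed to be extracted beforehand from {{Authority control}} data,
# e.g. GND number -> English Wikipedia article title.
GND_TO_ARTICLE = {
    "118527053": "Fyodor Dostoyevsky",   # example identifier from this discussion
}

def redirect_target(gnd, lang="en"):
    """Return the Wikipedia URL a PND/GND-style redirector would send
    the user to, or None if we have no article for that identifier."""
    title = GND_TO_ARTICLE.get(gnd)
    if title is None:
        return None
    return "https://%s.wikipedia.org/wiki/%s" % (
        lang, urllib.parse.quote(title.replace(" ", "_")))

print(redirect_target("118527053"))
```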
Let me add one example for point 1: the German National Library uses our data (a list of GND numbers and Wikipedia articles, provided daily at [3]) to link back from its catalog to Wikipedia. See http://d-nb.info/gnd/118527053 (to the right: "Zugehöriger Artikel in Wikipedia", i.e. "related article in Wikipedia").
However, the best system for in- and outward links to/from Wikipedia using authority identifiers is our own BEACON format. See de:Wikipedia:BEACON or meta:BEACON. It allows any publisher of an online database to provide its own simple file with identifiers that will provide a search result in its database (library catalogue, biographical dictionary and so on). Even if such content providers don't use authority data, their websites can be tagged by third parties (as long as the records have some persistent URL that we can use). Look at the Dostoevsky results of such a search using all available BEACON files: http://beacon.findbuch.de/seealso/pnd-aks?format=sources&id=118527053 Our own "Wikipedia Personensuche" that is linked in all authority control templates of biographies in de.wikipedia also uses these BEACON files (not all of them yet, though); see http://toolserver.org/~apper/pd/person/Fjodor_Michailowitsch_Dostojewski –-AndreasPraefcke (talk) 11:04, 28 June 2012 (UTC)[reply]
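For readers unfamiliar with the format, here is a rough sketch of generating a minimal BEACON-style file in Python. The header fields and the one-identifier-per-line layout follow the description at de:Wikipedia:BEACON as I understand it; check that page or meta:BEACON for the authoritative field list before publishing a real file, and note that the feed URL and mapping below are made up for illustration.

```python
def write_beacon(path, gnd_to_url, feed_url="http://example.org/beacon.txt"):
    """Write a minimal BEACON-style mapping of GND number -> target URL.
    Header fields here are illustrative; consult the BEACON description."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("#FORMAT: BEACON\n")
        f.write("#PREFIX: http://d-nb.info/gnd/\n")   # identifiers are GND numbers
        f.write("#FEED: %s\n" % feed_url)
        for gnd, url in sorted(gnd_to_url.items()):
            f.write("%s|%s\n" % (gnd, url))

write_beacon("beacon.txt", {"118527053": "http://example.org/dostoyevsky"})
```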

Multiple authority control templates in a single article

I'm struggling to understand how this is going to work in terms of semantics. I think it's best to start with 1:1 relations, because anything else risks making the Authority control framework incompatible with some semantic frameworks / approaches. The restriction would only apply to article space, naturally, and not to pages such as Wikipedia:Project_Gutenberg_author_list/Page_01, which is a list of authors for whom pages are yet to be created (there are probably more such pages under Wikipedia:WikiProject Missing encyclopedic articles). Stuartyeates (talk) 20:06, 28 June 2012 (UTC)[reply]

I spent today mulling over this, and on the whole I think I agree. Multiple cases should be logged in a separate report (I expect it will mostly cover sibling and co-author pairs), pending a clear decision on how we handle these cases - it's something which a lot of our metadata has trouble with. If we have corporate identifiers for the multiple-individual entity, on the other hand, no problem including those directly. Andrew Gray (talk) 20:45, 28 June 2012 (UTC)[reply]
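A rough sketch of the kind of report generation this implies, assuming one iterates over article wikitext from a dump. Counting template occurrences with a regular expression is a simplification (nested or commented-out templates would need a real parser), so treat this as an outline rather than the actual bot.

```python
import re

AC_TEMPLATE = re.compile(r"\{\{\s*Authority control\b", re.IGNORECASE)

def multiple_ac_articles(articles):
    """Yield (title, count) for articles carrying more than one
    {{Authority control}} template, for a manually reviewed report.
    `articles` is assumed to be an iterable of (title, wikitext) pairs."""
    for title, wikitext in articles:
        count = len(AC_TEMPLATE.findall(wikitext))
        if count > 1:
            yield title, count
```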

I don't know en.wikipedia well enough, but I can say what we do at de.wikipedia. We have not really solved this problem, but we use a workaround. For co-author pairs that have a single article on de.wikipedia but are two people, we have redirects for each name; these get categories like a normal biography and carry the authority template (which apparently can then be machine-read from the dump like the others). The data is not really useful for Wikipedia readers there, but at least it makes automatic linking from other sites possible. If the co-author pair (like de:Clegg & Guttmann), a group pseudonym (like de:Nicci French), or a band or comedy act consisting of two people (whose biographies are covered in the band article only) has a separate authority record, this one is additionally put in the article itself. --AndreasPraefcke (talk) 21:13, 28 June 2012 (UTC)[reply]

Clever! How does your persondata work for this - do you add persondata to the redirects? I know we do something similar with categorised redirects sometimes, but it's not a systematic thing. (yet?) Andrew Gray (talk) 21:25, 28 June 2012 (UTC)[reply]
Yes, "Personendaten" is also added to the redirect. That said, with artists' duos, co-authors and the likes, I'd personally prefer not to use redirects at all, but have stubs that have all the necessary data and a short definition, plus some prominently placed link to where the real information is. Usually, people won't really search for those single people, but giving their whole biographies in the "band article" is often enough just a bit too much anyway, and these stubs could be linked from the "band article" anyway. There are other instances of double authority data for a single article, though, where redirects may indeed be the best choice. For example, churches are part of the GND as "geographic names" (being a building), but the corresponding parish is usually some sort of "organization" with a separate GND number. For complicated things like that (about which the wikipedia reader will never really care for), we'd either need to get away from the 1:1 relation of article and authority record, or we may just as well use such redirects (e. g. "Christ Church (Xtown)", redirected from "Parish of Christ Church (Xtown)"). --AndreasPraefcke (talk) 23:08, 28 June 2012 (UTC)[reply]
It's almost like our project doesn't have a very clear structured taxonomy, for some reason ;-). Andrew Gray (talk) 10:31, 29 June 2012 (UTC)[reply]
Thanks for explaining this, AndreasPraefcke. Blue Rasberry (talk) 21:01, 2 July 2012 (UTC)[reply]

French project

Just noticed this discussion on fr.wp: fr:Wikipédia:Le Bistro/3 juillet 2012#Sondage sur notices d'autorité bibliographiques. Per fr:Discussion modèle:Autorité they seem to be primarily pulling from de.wp data; there might be some opportunity for pooling effort here. Andrew Gray (talk) 12:46, 3 July 2012 (UTC)[reply]

How to use it

There are occasions when I have found two articles about the same person, when the persons are known by more than one name. This might be the case for persons from history, for persons from countries with non-Latin alphabets and varied transliterations into English, and for authors versus pen-names. One example is Prince Henry, Duke of Cornwall which was merged to Henry, Duke of Cornwall. How, exactly, would one look up the "Authority control" number for the subject of a new biography, to see if there is an existing bio for that person? Could a bot create a list of articles about the same entity under different names or spellings, so that a merger could be considered, just as the two aforementioned articles were merged? (Some conspiracy theorists will have a field day with the notion that Wikipedia is coming under "Deep Authority Control Integration.") Edison (talk) 15:41, 4 July 2012 (UTC)[reply]

Generating such lists by bot will be no problem at all, I think. A database of all "persondata" and "authority control" data can be maintained on the toolserver. We have been doing that for de.wikipedia data for years now (it is the database behind http://toolserver.org/~apper/pd/ ), and the whole thing is being re-programmed to serve even more purposes right now, so I think there might be a chance that the same thing can easily be adapted to en.wikipedia needs. Well, it's not really "we": actually en:User:Apper did the whole work. But I think it shouldn't be any problem to get lists of double entries of authority data from such a database on a regular basis. --AndreasPraefcke (talk) 21:18, 4 July 2012 (UTC)[reply]
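In the spirit of the above, a small sketch of how such a duplicates list could be produced once authority data has been extracted into (article, identifier) pairs. The input format is an assumption, the VIAF number below is made up, and a real run would want the same done per identifier scheme.

```python
from collections import defaultdict

def possible_duplicates(article_viaf_pairs):
    """Group articles by VIAF number and return identifiers claimed by
    more than one article -- candidates for a merge review."""
    by_viaf = defaultdict(set)
    for title, viaf in article_viaf_pairs:
        if viaf:
            by_viaf[viaf].add(title)
    return {v: sorted(titles) for v, titles in by_viaf.items() if len(titles) > 1}

# Hypothetical example: two articles sharing one (made-up) VIAF cluster.
print(possible_duplicates([
    ("Prince Henry, Duke of Cornwall", "12345678"),
    ("Henry, Duke of Cornwall", "12345678"),
]))
```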