Module talk:WikidataIB/Archive 2

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Lowercase sigmabot III (talk | contribs) at 05:56, 26 November 2017 (Archiving 1 discussion(s) from Module talk:WikidataIB) (bot). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Not sourced to Wikipedia or wmflabs, and oh crap

if not ref:find("Wikipedia") then refs = refs + 1 end

I believe you also have to check and exclude "wmflabs". It appears that a lot of Wikidata items are sourced to wmflabs URLs which appear to contain information from Wikipedia. Some of these URLs do contain the string "Wikipedia", but not all of them.
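A minimal sketch of the check suggested here, in standalone Lua. It assumes each reference has already been flattened to a plain string; the function names and the `EXCLUDED` list are hypothetical and not taken from the module, and the case-insensitive comparison is an added assumption (the one-line test quoted above is case-sensitive).

```lua
-- Hypothetical blocklist: only "Wikipedia" and "wmflabs" come from the discussion.
local EXCLUDED = { "wikipedia", "wmflabs" }

-- Returns true when the reference text mentions none of the excluded strings.
-- Assumption: matching is case-insensitive, so "en.wikipedia.org" is caught too.
function is_external(ref)
    local lowered = ref:lower()
    for _, needle in ipairs(EXCLUDED) do
        -- plain find (4th argument true): the needle is not treated as a pattern
        if lowered:find(needle, 1, true) then
            return false
        end
    end
    return true
end

-- Counts the references that survive the filter.
function count_external(ref_list)
    local refs = 0
    for _, ref in ipairs(ref_list) do
        if is_external(ref) then
            refs = refs + 1
        end
    end
    return refs
end
```

A string blocklist like this only catches references whose text happens to contain one of the listed names; it says nothing about references that point at another Wikidata item.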

Oh crap. I think I just found a bigger problem. Let's say someone goes to one Wikidata item (United States of America), and they find an unsourced claim of relationship to a second Wikidata item (capital: New York City, start time 1785, end time 1790). In this case the claim appears to be true, but let's set that aside. Let's say they then go to that second Wikidata item (New York City) and copy the claim (capital of: United States of America, start time 1785, end time 1790).

That's great - so far. However let's say they then add a reference. Stated in: United States of America.

That second claim is "sourced", but it's citing another wikidata item as a reference.

Would I be correct in guessing that attempting to filter out references from one wikidata item to another wikidata item is going to go "boom"?

And no, you can't fix it by deleting this one junk reference. I don't know how widespread the problem is, but I see a pile of references like this. Alsee (talk) 07:06, 9 October 2017 (UTC)

@Alsee: Can you point to an example of a link to wmflabs please? In the second case, that's an editor mistake that would need to be manually corrected, since you can do things like "Stated in: Keplerian elements for approximate positions of the major planets (Q21128615)" (this is then what {{Cite Q}} develops into a full reference), but that's the kind of thing that can be caught by property constraints (e.g., checking if the Wikidata item is a book/journal/etc. and not a country). Thanks. Mike Peel (talk) 11:22, 9 October 2017 (UTC)
Mike Peel given Wikidata's standards, I'm not sure why refs to other Wikidata items would be fundamentally any less legitimate than refs to Wikipedia. In any case there appear to be a lot of them. In just a matter of minutes randomly clicking through WorldHeritage Wikidata items I ran into four pages containing refs to other wikidata items. In any case, it appears I was right in my oh-crap interpretation. You're basically saying you're unwilling/unable to filter out circular Wikidata refs.
Regarding wmflabs refs: I spent a long time trying to figure out how to use the regular search box to search for wmflabs in wikidata refs, but I couldn't find any way to do so. Am I missing something simple, or is this content really not indexed?!? I finally resorted to teaching myself the wikidata database query language to search refs that way. That's crazy. Here's a query that pulls out some wmflabs refs. Anyway, the large majority of wmflabs hits go to tools.wmflabs.org/heritage. Those are all refs to content extracted from Wikipedia. I also found isolated instances of circular refs to wikidata itself via tools.wmflabs.org/reasonator and tools.wmflabs.org/scholia. There's also tools.wmflabs.org/whois and tools.wmflabs.org/geohack which are god-awful ways to effectively ref external sources. Alsee (talk) 20:50, 10 October 2017 (UTC)
@Alsee: On the first point: you misunderstand. There are Wikidata items holding information about topics, and Wikidata items holding information about references (which, in the case of notable books for example, are the same thing). So by saying "Stated in: Keplerian elements for approximate positions of the major planets (Q21128615)", you aren't referencing the info in that Wikidata item, you are referencing the item that the Wikidata item is *on* (which in this case is a journal article). Wikidata references to info *in* Wikidata entries should be removed - and property constraints is the way this can be done. On the wmflabs refs, I'll have to dig into that, but I suspect it's something to do with Wiki Loves Monuments data (and yes, the search box on Wikidata sadly sucks). Thanks. Mike Peel (talk) 23:05, 10 October 2017 (UTC)
Mike Peel, yep wmflabs.org/heritage is Wiki_Loves_Monuments data pulled from Wikipedia.
Regarding refs to Wikidata items, I fully understood. As a programmer it never crossed my mind anyone would create those kinds of refs, until I saw them. We're addressing practical reality, not conceptual theory. We're discussing machine-readable-data being crowdsourced by non-programmers. Wikidata's design invites the creation of a "Stated in: OtherWikidataItem" type ref. For an average non-programmer, a "Stated in: OtherWikidataItem" ref is about as reasonable as a "Stated in: Wikipedia" ref. For a number of reasons the Wikidata community doesn't seem to have been spotting these refs.
If we want to examine these refs from a pure theory standpoint: <satire>These refs should not be removed, they should just be fixed. The correct ref is "Stated in: Wikidata (Q2013)".</satire> Aside from the ironic Q number, that is a dead-standard Wikidata ref. It correctly identifies the source and it has the expected level of detail, reliability, and verifiability.
It looks like there are 22,308 instances of (?X capital ?Y) lacking a matching (?Y capital of ?X). 22,308 entries I can add, all appropriately referenced as "Stated in: Wikidata (Q2013)". Weee! This is exactly why Wikipedia and Wikidata can't be considered usable sources - you just end up with a citogenesis machine.
Saying these refs should be removed doesn't actually fix anything. I don't have deep knowledge of the property constraint system, but this seems like an intractable problem. Sure you can hunt down countries used in this kind of ref, but a general solution seems infeasible. Even if you try to define a constraint, software still can't detect the difference between information found on Grant's Tomb (Q1025105) and information found in the wikidata item for Grant's Tomb (Q1025105).
If I am overlooking a concrete plan to fix this, let me know. If not, can you admit the "only include sourced Wikidata items" can't/won't be expanded to "source other than Wikipedia or Wikidata"? Alsee (talk) 17:02, 11 October 2017 (UTC)
@Alsee: You're still missing the point, I'm afraid. It's like saying that Wikipedia articles only reference Wikipedia since all of the reference information is stored in the article itself. On Wikidata you can either provide a basic citation for e.g. a news article, in the same Wikidata entry as the information. Or if it's a notable source in itself (say, a book), then you can say "stated in X, page Y", where X is a link to the Wikidata entry with the information about that book - and then you can use it to reference different statements in different entries as appropriate. It's not a "stated in: wikidata" reference any more than a reference here is "stated in: cite web". And note that it is different from "Imported from: English Wikipedia", which I agree is circular and should not be viewed as a proper reference. Thanks. Mike Peel (talk) 20:19, 11 October 2017 (UTC)
@Alsee: Some useful background info (although not as much as would be ideal) is at d:Help:Sources. Thanks. Mike Peel (talk) 20:23, 11 October 2017 (UTC)
@Alsee: On the wmflabs references, I've forwarded your question to d:Wikidata:Project_chat#References_to_wmflabs.3F, as I'm stumped. Thanks. Mike Peel (talk) 20:31, 11 October 2017 (UTC)
Mike Peel I believe I understand your point just fine. I'm not a fan of Wikidata-on-Wikipedia, so I find the problem amusing. Are there any points below where we fundamentally disagree:
  1. These refs do exist.
  2. They were created with the contributor intention of citing information in the Wikidata item.
  3. As programmers, you and I agree that these refs represent a logic-error by the contributor.
  4. As programmers, you and I can understand why random non-programmers might easily make this logic-error.
  5. Most EnWiki editors would like these items filtered out, just like items sourced to Wikipedia are filtered out.
  6. You're unable to filter out these cases without heavily nuking Wikidata-on-Wikipedia.
  7. And just for amusement: Changing them all to "Stated in: Wikidata (Q2013)" would correct the logic-error by the contributor, and it would be roughly consistent with the level of detail and verifiability of countless other "Stated in: Q######" Wikidata refs. Alsee (talk) 23:15, 11 October 2017 (UTC)
The logic-error is essentially a user providing a pointer-to-object when a pointer-to-pointer-to-object is expected. You can hardly be surprised when non-programmers lose track of double indirection. Alsee (talk) 23:51, 11 October 2017 (UTC)
No, the logic error is not that the user has the wrong number of layers of indirection, it is that they are using the wrong property. They should be using inferred from (P3452). And one could easily (I'm not very familiar with the Wikibase lua API) write code that excluded references with that property. Pppery (talk) 00:01, 12 October 2017 (UTC)
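As a rough illustration of excluding references by property, here is a standalone Lua sketch. It assumes claims shaped like Wikibase's JSON, where each reference carries a `snaks` table keyed by property ID; P3452 (inferred from) is named above, while the inclusion of P143 (imported from Wikimedia project) and both function names are assumptions made for this example, not the module's actual code.

```lua
-- Properties treated as marking an internal (circular) reference.
-- P3452 = inferred from; P143 = imported from Wikimedia project (assumed here).
local CIRCULAR_PROPS = { P143 = true, P3452 = true }

-- True when the reference uses any of the circular properties.
function is_circular(reference)
    for prop in pairs(reference.snaks or {}) do
        if CIRCULAR_PROPS[prop] then
            return true
        end
    end
    return false
end

-- True when the claim has at least one reference that is not circular.
function has_usable_source(claim)
    for _, reference in ipairs(claim.references or {}) do
        if not is_circular(reference) then
            return true
        end
    end
    return false
end
```

This only helps when contributors actually used the marker property; a circular reference entered with "stated in" passes straight through.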
Pppery "inferred from" implies a different level of indirection from "stated in". By saying they used the wrong property, you did an even better job than I did of establishing that the logic error was losing track of levels of indirection. Alsee (talk) 14:54, 12 October 2017 (UTC)
P.S. It appears that "Inferred from" didn't even exist[1] until recently. So it was physically impossible for the edits I've seen to have used it. Alsee (talk) 15:59, 12 October 2017 (UTC)
@Alsee: No, your argument is wrong from your second point onwards - the contributor intention at that point is to reference the source that the Wikidata item describes, not to reference the information in the Wikidata entry. You are missing the point I'm trying to make, and you are fundamentally misunderstanding how referencing works on Wikidata. Please, forget your preconceptions, re-read what I've said above, and look at the help pages. Would it help if we talked through Skype or some other communication method at some point? Mike Peel (talk) 00:25, 12 October 2017 (UTC)
Per below, I am correct on the second point. Hopefully this fully clears up our misunderstanding, and there's no need to debate 3 through 7? Alsee (talk) 21:38, 12 October 2017 (UTC)
@Alsee: If you are solely meaning a Wikidata editor saying "this information is stated in this wikidata entry" ("stated in: USA wikidata entry"), then I agree. However, if you are referring to a Wikidata editor saying "this information is in the source described at this Wikidata entry" ("stated in: X journal article, page Y"), then I still disagree... Thanks. Mike Peel (talk) 01:59, 13 October 2017 (UTC)
Mike Peel I agree. To really nail the point, I am talking about a ref where a human or a bot copied information from Q1 to Q2. If the information in Q1 is unsourced or was imported from Wikipedia, then the information in Q2 is equally unsourced or equally imported from Wikipedia. Q2 is bypassing your Only sourced filter. My first thought was obviously to expand the filter to block Q2. My second thought was holy crap, that will hit every ref consisting of a bare "Stated in: Q#". (A case like "stated in: X, page Y" is not a problem because it has a qualifier attached.) I think you're stuck. Either you don't filter Q2 and you admit the Only sourced filter is a joke, or you do filter Q2 and drop a huge nuke on importing Wikidata. Alsee (talk) 11:22, 13 October 2017 (UTC)
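The trade-off described here can be made concrete with a small standalone Lua sketch. It assumes references shaped like Wikibase's JSON (a `snaks` table keyed by property ID) and treats a reference as "bare" when stated in (P248) is its only property; the function name is hypothetical.

```lua
-- True when the reference consists of "stated in" (P248) and nothing else,
-- for example no page number (P304) and no reference URL (P854).
function is_bare_stated_in(reference)
    local count, has_stated_in = 0, false
    for prop in pairs(reference.snaks or {}) do
        count = count + 1
        if prop == "P248" then
            has_stated_in = true
        end
    end
    return has_stated_in and count == 1
end
```

The check cannot tell a circular "Stated in: Q1" from a legitimate bare citation of a book or journal article, so filtering on it would also drop well-sourced statements. That is the dilemma under discussion.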
I think the solution to this is on Wikidata - where such references exist in your Q1/Q2 example, the reference should be removed on sight, and things like property constraints can be used to find them. I think you're arguing about <0.1% of cases of 'stated in', and trying to do so through this module would remove a huge amount of perfectly well-sourced material. Thanks. Mike Peel (talk) 12:20, 13 October 2017 (UTC)
"...a 'Stated in: OtherWikidataItem' ref is about as reasonable as a 'Stated in: Wikipedia' ref." False equivalence. It's more like including in a citation the link A History of the English-Speaking Peoples or The Guardian. That happens all the time on Wikipedia. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:31, 12 October 2017 (UTC)
Mike Peel and Andy Mabbett: I asked the editor who created one of the references.[2] They confirmed that I was right. The ref was created with the intention: sourced from OtherWikidataItem. Furthermore the proposal page for creating "Inferred from" states that bot-operators were using "Stated in" in references intended to mean: sourced from OtherWikidataItem. So apparently there are a massive number of bot-created references of this type. No wonder I found so many of them when I randomly flipped through Wikidata items. Alsee (talk) 16:10, 12 October 2017 (UTC)
I'm not clear how that relates to what I wrote, above. Do you dispute that we include links like A History of the English-Speaking Peoples or The Guardian in citations in Wikipedia? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:29, 12 October 2017 (UTC)
Andy Mabbett, we're not discussing unverifiably vague references to information found somewhere-but-I'm-not-telling-you-where in The Guardian.
We are discussing circular refs. We are discussing a Wikidata ref stated in (P248) The Guardian (Q11148) which is not citing The Guardian, but is instead citing information found in Wikidata item Q11148. The Wikipedia equivalent would be a <ref>[[The Guardian]]</ref> which was citing English Wikipedia's The Guardian article - NOT The Guardian itself.
If someone adds unsourced information to one Wikidata item, and that information is copied to a second Wikidata item with a reference "Stated in: first Wikidata item", no Wikipedia Editor is going to buy Wikidata's claim that the information is now sourced. Alsee (talk) 16:56, 12 October 2017 (UTC)
I think "Stated in: United States of America" is not how the Wikidata property is supposed to be used. Internal Wikidata references are supposed to use "inferred from". Then Wikipedia can decide not to import data with "inferred from" the same way it doesn't import data with "imported from". ChristianKl (talk) 23:46, 13 October 2017 (UTC)
ChristianKl I agree with the theory that "Stated in: other wikidata item" shouldn't be done. However the issue here is reality. Not only have humans been creating these refs, various bots have been creating them en masse. They appear to be unbelievably common. I ran into about a dozen of them, in just a few minutes of browsing an arbitrary list of wikidata items.
Either we accept that we're using Wikidata and drop the pretense of filtering, or we actually manage that filter properly. Managing the filter would mean expanding it to cover problem-cases as they turn up. That would mean expanding the filter to cover Inferred_from, expanding it to cover upwards-of-a-million Wikipedia-sourced wmflabs/heritage refs I found, expanding it to cover circular refs to wikidata items, and expanding it to filter other cases as they are identified. Alsee (talk) 04:40, 22 October 2017 (UTC)

Lists and pen icons

From the documentation:

  • |list=<hlist|ubl> allows multiple returned values to be displayed as a horizontal list (|list=hlist), or a vertical unbulleted list (|list=ubl). These override the separator and do not display the 'pen icon' linked to "Edit at Wikidata"

Why not display the pen icon in these cases? It looks odd without it, e.g. at Trudi Canavan you can't easily tell that 'works' is from Wikidata while 'Awards' is locally defined. Thanks. Mike Peel (talk) 11:26, 25 October 2017 (UTC)

Maximum number of values to retrieve?

Would it be possible to optionally specify a maximum number of values to retrieve? It would be particularly useful when fetching images and coordinates, where we want a maximum of 1 value to be returned as otherwise redlinks appear or values appear over the top of each other. Although we can use PreferredValue, there's no guarantee that this will only return one value. I don't think it matters *which* value is returned, so long as preferred values are chosen over normal rank ones. Thanks. Mike Peel (talk) 21:52, 29 September 2017 (UTC)
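One possible reading of this request, sketched in standalone Lua: take at most `max` claims, using preferred-rank claims first and falling back to normal-rank ones. The claim shape (a `rank` field holding "preferred", "normal" or "deprecated") follows Wikibase's data model; the `value` field and the function name are hypothetical, and filling leftover slots with normal-rank claims is an assumption rather than a description of how getPreferredValue behaves.

```lua
-- Returns up to `max` claims, preferred rank first, then normal rank.
-- Deprecated claims are never returned.
function pick_by_rank(claims, max)
    local preferred, normal = {}, {}
    for _, claim in ipairs(claims) do
        if claim.rank == "preferred" then
            preferred[#preferred + 1] = claim
        elseif claim.rank == "normal" then
            normal[#normal + 1] = claim
        end
    end
    local out = {}
    for _, group in ipairs({ preferred, normal }) do
        for _, claim in ipairs(group) do
            if #out >= max then
                return out
            end
            out[#out + 1] = claim
        end
    end
    return out
end
```

With `max` set to 1 this returns a single value even when several claims share the preferred rank, which is the case that matters for images and coordinates.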

That might need to be more tightly specified, Mike. However, I've set up a test in the sandbox just for values that are wikibase-items, using yet another new parameter |maxvals=, which should do nothing if omitted, or blank or less than 1. See how it works with the country (P17) for Geneva (Q71):
  • {{#invoke:WikidataIB/sandbox |getValue |P17 |fetchwikidata=ALL |qid=Q71 |onlysourced=no}} → Switzerland, Republic of Geneva, First French Empire, French First Republic, Republic of Geneva Edit this on Wikidata
  • {{#invoke:WikidataIB/sandbox |getValue |P17 |fetchwikidata=ALL |qid=Q71 |onlysourced=no |maxvals=}} → Switzerland, Republic of Geneva, First French Empire, French First Republic, Republic of Geneva Edit this on Wikidata
  • {{#invoke:WikidataIB/sandbox |getValue |P17 |fetchwikidata=ALL |qid=Q71 |onlysourced=no |maxvals=0}} → Switzerland, Republic of Geneva, First French Empire, French First Republic, Republic of Geneva Edit this on Wikidata
  • {{#invoke:WikidataIB/sandbox |getValue |P17 |fetchwikidata=ALL |qid=Q71 |onlysourced=no |maxvals=1}} → Switzerland Edit this on Wikidata
  • {{#invoke:WikidataIB/sandbox |getValue |P17 |fetchwikidata=ALL |qid=Q71 |onlysourced=no |maxvals=2}} → Switzerland, Republic of Geneva Edit this on Wikidata
  • {{#invoke:WikidataIB/sandbox |getValue |P17 |fetchwikidata=ALL |qid=Q71 |onlysourced=no |maxvals=3}} → Switzerland, Republic of Geneva, First French Empire Edit this on Wikidata
  • {{#invoke:WikidataIB/sandbox |getValue |P17 |fetchwikidata=ALL |qid=Q71 |onlysourced=no |maxvals=4}} → Switzerland, Republic of Geneva, First French Empire, French First Republic Edit this on Wikidata
  • {{#invoke:WikidataIB/sandbox |getValue |P17 |fetchwikidata=ALL |qid=Q71 |onlysourced=no |maxvals=5}} → Switzerland, Republic of Geneva, First French Empire, French First Republic, Republic of Geneva Edit this on Wikidata
And with getPreferredValue:
  • {{#invoke:WikidataIB/sandbox |getPreferredValue |P17 |fetchwikidata=ALL |qid=Q71 |onlysourced=no}} → Switzerland Edit this on Wikidata
  • {{#invoke:WikidataIB/sandbox |getPreferredValue |P17 |fetchwikidata=ALL |qid=Q71 |onlysourced=no |maxvals=}} → Switzerland Edit this on Wikidata
  • {{#invoke:WikidataIB/sandbox |getPreferredValue |P17 |fetchwikidata=ALL |qid=Q71 |onlysourced=no |maxvals=0}} → Switzerland Edit this on Wikidata
  • {{#invoke:WikidataIB/sandbox |getPreferredValue |P17 |fetchwikidata=ALL |qid=Q71 |onlysourced=no |maxvals=1}} → Switzerland Edit this on Wikidata
  • {{#invoke:WikidataIB/sandbox |getPreferredValue |P17 |fetchwikidata=ALL |qid=Q71 |onlysourced=no |maxvals=2}} → Switzerland Edit this on Wikidata
Can you think of some edge cases to test it with? I'd rather leave it in the sandbox while we check it works and decide whether to implement it for all data types (while we still can). Cheers --RexxS (talk) 23:03, 29 September 2017 (UTC)
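The behaviour shown in the test cases above (no effect when |maxvals= is omitted, blank, or less than 1; otherwise keep the first N values) can be sketched in standalone Lua as follows. The function names are hypothetical and this is not the sandbox code, only an illustration of the rule as described.

```lua
-- Normalises the raw parameter: nil, "", non-numeric or < 1 all mean "no limit",
-- which is represented here as 0.
function parse_maxvals(raw)
    local n = tonumber(raw or "")   -- tonumber("") returns nil
    if not n or n < 1 then
        return 0
    end
    return math.floor(n)
end

-- Returns the first `maxvals` entries of `values`, or all of them when unlimited.
function limit_values(values, raw_maxvals)
    local maxvals = parse_maxvals(raw_maxvals)
    if maxvals == 0 or #values <= maxvals then
        return values
    end
    local out = {}
    for i = 1, maxvals do
        out[i] = values[i]
    end
    return out
end
```

Treating every invalid input as "no limit" keeps existing calls that never set the parameter unchanged.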
Thanks @RexxS, that's exactly what I was hoping for. :-) {{#invoke:WikidataIB/sandbox |getValue |P18 |fetchwikidata=ALL |qid=Q618630 |onlysourced=no |maxvals=1}} doesn't seem to work though - it returns Goldstone Deep Space Communication Complex - GPN-2000-000506.jpg Edit this on Wikidata - perhaps because it's a different datatype? Thanks. Mike Peel (talk) 14:39, 30 September 2017 (UTC)
I've implemented the functionality for images as well in the sandbox now, Mike (so your comment above no longer fits what you saw earlier). Would you have time to test some cases, please? Do you have any opinion on whether we should enable this feature for: (1) all data types; (2) all data types that have handlers written; (3) just specified data types, e.g. images - if so which ones? --RexxS (talk) 15:47, 30 September 2017 (UTC)
Thanks RexxS. I'd suggest that it works for all datatypes if possible, that way it's simpler to document and use. I'll try it out in more cases this afternoon, but so far it seems to be working as expected. Thanks. Mike Peel (talk) 15:58, 30 September 2017 (UTC)
@Pppery: It looks like you've deleted this from the sandbox. :-( Can you add it back please, and/or merge it into the main version? Thanks. Mike Peel (talk) 10:24, 27 October 2017 (UTC)