Wikidata:Project chat: Difference between revisions
→Data Cleanup: Reply
[[User:Rodriguez.UW|Rodriguez.UW]] ([[User talk:Rodriguez.UW|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 21:50, 10 July 2024 (UTC)
:{{Strong support}} I fully support these changes and am committed to participating in their implementation. --[[User:Clements.UWLib|Crystal Yragui, University of Washington Libraries]] ([[User talk:Clements.UWLib|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 23:25, 10 July 2024 (UTC)
:{{Strong support}} This is a great idea, I fully support it. [[User:Brimwats|Brimwats]] ([[User talk:Brimwats|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 01:25, 11 July 2024 (UTC)
:This seems like a bad idea. Pronouns are lexicographical data and should exist as lexemes. If the lexeme data is inconsistent, it can and should be improved. Items are not inherently more consistent than lexemes, and creating new entities instead of fixing the existing ones won't make the data better - it will actually result in duplication of data, which then leads to problems with data getting out of sync.
:It's completely possible to have multiple lexemes with the same subject form and different object forms (e.g. {{L|L304659}} vs {{L|L304660}}). If your objection is that the links don't display the object form, that is a data display issue, not a problem with the data itself.
::Also pinging @[[User:BlaueBlüte|BlaueBlüte]] who already responded to your proposal on [[Wikidata talk:WikiProject Personal Pronouns]] back in May. - [[User:Nikki|Nikki]] ([[User talk:Nikki|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 04:38, 11 July 2024 (UTC)
::@[[User:Nikki|Nikki]] Lexemes could be linked from pronoun set items as parts, but as we stated in this proposal, individual lexemes are not enough to go on much of the time to identify pronoun sets. We gave examples and fully explained why lexemes are not appropriate or sufficient for personal pronoun sets. These examples aren't different senses of the same word, but often distinct words in different senses. Lexemes would still exist, but would not be used as values for this particular property. Rather, they would be parts of sets. Which is how people use them. --[[User:Clements.UWLib|Crystal Yragui, University of Washington Libraries]] ([[User talk:Clements.UWLib|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 19:52, 11 July 2024 (UTC)
:{{Strong support}} I support the goals of this project. Wikidata editors should not be supplying {{P|21}} based on a person's pronouns. And {{P|21}} should not be used to supply {{P|6553}} without references that document a person's choice of their pronouns. The more that can be done to prevent misgendering people in Wikidata, the better. [[User:AdamSeattle|AdamSeattle]] ([[User talk:AdamSeattle|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 06:11, 11 July 2024 (UTC)
:{{Strong oppose}} I don't understand why "Change data type from Lexeme to Wikidata Item", it would be a lot of work for no gain (I would even say it would be a loss, as the data would be poorer). Cheers, [[User:VIGNERON|VIGNERON]] ([[User talk:VIGNERON|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 11:59, 11 July 2024 (UTC)
::The gain would be solving the problem we point out here: "Individual pronouns as lexemes don’t account for cases where a principal pronoun isn’t enough to go on to identify a pronoun set (for instance, “ze/zir” vs. “ze/hir”). Therefore, lexemes cannot be consistently implemented with accuracy". It would not be very much work, and we laid out a very detailed implementation plan for doing the work. --[[User:Clements.UWLib|Crystal Yragui, University of Washington Libraries]] ([[User talk:Clements.UWLib|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 19:37, 11 July 2024 (UTC)
Revision as of 20:16, 11 July 2024
Wikidata project chat: A place to discuss any and all aspects of Wikidata: the project itself, policy and proposals, individual data items, technical issues, etc.
On this page, old discussions are archived after 7 days. An overview of all archives can be found at this page's archive index. The current archive is located at 2025/04. |
Why does the value-requires-statement constraint (Q21510864) pop up if I add subclass of (P279) with, for example, history of Berlin (Q679741) as the value? It pops up at history of trams in Berlin (Q1514212), for instance, but not at history of trams in Barcelona (Q11925955). Eurohunter (talk) 06:18, 2 July 2024 (UTC)
- @Eurohunter You have to make sure there is a complete hierarchy of classes. In the example you have given, Q1514212 has class Q679741, but Q679741 needs to have some class too... I suggest Q122131 be added there as P279. Vojtěch Dostál (talk) 13:26, 2 July 2024 (UTC)
- @Vojtěch Dostál: Thanks. Eurohunter (talk) 12:14, 6 July 2024 (UTC)
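For illustration, a toy version of the check, assuming (as the reply implies) that the constraint on subclass of (P279) is satisfied only when the value item itself carries a subclass of (P279) or instance of (P31) statement. The statement store and the function name are hypothetical; this is not the Wikibase constraint checker's code.

```python
# Toy statement store: item ID -> {property ID: [value item IDs]}.
# Hypothetical snapshot mirroring this thread, not live Wikidata data.
STATEMENTS = {
    "Q1514212": {"P279": ["Q679741"]},  # history of trams in Berlin
    "Q679741": {},                      # history of Berlin, before the fix
}

# Assumption: the value must have at least one of these properties.
REQUIRED_PROPERTIES = ("P279", "P31")


def value_requires_statement_violations(item, prop="P279", store=STATEMENTS):
    """Return the values of `prop` on `item` that lack every required statement."""
    violations = []
    for value in store.get(item, {}).get(prop, []):
        value_claims = store.get(value, {})
        if not any(value_claims.get(p) for p in REQUIRED_PROPERTIES):
            violations.append(value)
    return violations


print(value_requires_statement_violations("Q1514212"))  # ['Q679741']

# Apply the suggested fix: give Q679741 a superclass of its own.
STATEMENTS["Q679741"]["P279"] = ["Q122131"]
print(value_requires_statement_violations("Q1514212"))  # []
```

The point of the sketch is that the warning is about the value (Q679741), not the item being edited, which is why completing the class hierarchy one level up makes it disappear.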
iodine in medicine (Q28196266) used for many medicine-related topics on iodine, instead of iodine (Q1103) or perhaps another (new) item
I noticed that this item is linked not only as an antiseptic but also for many other medical topics. Its description only mentioned "antiseptic" and I've added the prevention and treatment of iodine deficiency, based on its page linked from WikiProjectMed. The mistake may arise from the fact that it's disambiguated in the English- (and several other) language Wikipedia(s) as "iodine (medical use)". I think all other medical uses (e.g. radioactive iodine therapy (Q13233408)) should link either to iodine as an element (iodine (Q1103)) or to a new item created for this purpose, but the antiseptic (and possibly the deficiency-preventing) use shouldn't be conflated with the radioactive or other medical uses of iodine. Adam78 (talk) 21:08, 2 July 2024 (UTC)
- @Adam78 Is iodine as antiseptic in any way chemically different from the iodine element? If not, all such links should point to iodine (Q1103) and Q28196266 should instead be facet of (P1269) of iodine (Q1103) or something of that sort. A similar example is calcium in biology (Q60097). Vojtěch Dostál (talk) 11:25, 4 July 2024 (UTC)
Railway junctions: Q24045957 vs Q336764
I'd be grateful if anyone could help me distinguish railway node (Q24045957) and railway junction (Q336764) -- both used specifically for railway junctions, and distinct from railroad switch (Q82818) and the more general junction (Q1777515).
There seem to be two different concepts here, at least in German, but I'm not entirely seeing how they should be named in English to express the difference, or whether articles in the various different language wikis are all connected to the correct item.
Which would be most appropriate for a location where one linear ELR railway line section (Q113990375) of track (perhaps 50 km long, double-track) meets another such section? Jheald (talk) 22:10, 2 July 2024 (UTC)
Notified participants of WikiProject Railways. (I did ask on the talk page there a couple of years ago, but it didn't get any responses.) Jheald (talk) 22:14, 2 July 2024 (UTC)
- I can explain these from the Czech point of view, but the explanation is similar for all countries in central Europe (Poland, Germany, Slovakia etc.). First, a railway node (Q24045957) is very big (hundreds of switches) and a railway junction (Q336764) is very small (sometimes only one switch, but usually no more than four switches). A railway node (Q24045957) expresses the connection of many railway lines, usually in one town/city. E.g. železniční uzel Praha (Prague junction) consists of all railway stations in Prague (Q1085), in which all railway lines leading to this big city are connected. A railway junction (Q336764) is usually a place where one railway line splits into two railway lines and which is not a railway station (Q55488), so if the railway lines are single-track, one switch can be enough. In Czechia and Poland it is also a place on a double-track line between two stations where there are 4 switches to go from the left to the right track and vice versa (the same place in Slovakia, and until 2000 also in Czechia, is classified as a passing loop (Q784159)). But I have no idea how to name these different places in English. When I translate, I usually use "junction" for both, although they have completely different meanings. --Cmelak770 (talk) 06:54, 3 July 2024 (UTC)
- In Germany we strongly distinguish between free track (Q1302250), which roughly means track that is not part of a railway station (Q55488), and tracks that are part of a railway station (Q55488). A railway junction (Q336764) is a junction that is not part of a railway station (Q55488). As far as I know, (most?) English-speaking countries don't have this concept of free track (Q1302250), so this may not be easy to translate. --Trockennasenaffe (talk) 06:09, 4 July 2024 (UTC)
- Asked at en:Wikipedia_talk:WikiProject_UK_Railways whether anyone there can suggest better English-language labels / descriptions Jheald (talk) 12:56, 7 July 2024 (UTC)
- @Cmelak770, Trockennasenaffe: After some input from en-wiki and ChatGPT, I have updated the label/description for Q24045957 to "railway node" = "significant location in the railway network which may encompass multiple connected lines, stations, and facilities, often a major transit hub".
- I also considered "rail hub" or "railway hub", which I think better captures the sense of the articles for Q24045957 in cs-wiki and ru-wiki, and of items like Prague rail hub (Q12046953) and Brno railway hub (Q20860267); however the concept described in the de-wiki, nl-wiki, and it-wiki articles seems not necessarily to be on such a large scale.
- railway junction (Q336764) would then be for specific locations of track divergence, usually not in stations.
- This could probably still be improved or refined, but it may be a start at least. Jheald (talk) 20:01, 9 July 2024 (UTC)
- Thanks, I think the English description is quite correct now. Cmelak770 (talk) 13:00, 10 July 2024 (UTC)
Ingest of SEC EDGAR data into Wikidata?
I have recently noticed that many company infoboxes on Wikipedia are frequently out of date, even though they draw from Wikidata for many values like yearly results. All of this data is available online through the SEC's EDGAR system, at least for publicly traded companies in the US, so I was wondering whether it would be worthwhile to write a bot that would read SEC data and update Wikidata with it?
Botlord (talk) 19:18, 3 July 2024 (UTC)
- @Botlord: that sounds like a great idea - if you are proposing to do it yourself, the general procedure is to write the code, test it on a small number of items, and then ask for bot status approval for a bot account to regularly run it on at Wikidata:Requests for permissions/Bot. If you are hoping somebody else will do it then Wikidata:Bot requests is the place to start. ArthurPSmith (talk) 20:25, 3 July 2024 (UTC)
- @Botlord Yes, that would be nice and useful for a lot of infoboxes on various wikis. I think it would definitely be possible to do the mapping using XBRL. Feel free to discuss any kind of details at Wikidata talk:WikiProject Companies. Jklamo (talk) 08:08, 9 July 2024 (UTC)
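A minimal sketch of the extraction half of such a bot, assuming the SEC's XBRL "company facts" JSON on data.sec.gov keeps its current layout (`facts` → taxonomy → concept → `units` → unit → list of reported values with `start`, `end`, `val`, `form` and `filed` fields). The `Revenues` concept, the 10-K filter and the 350 to 380 day window for a "full year" are illustrative choices, and `USER_AGENT` is a placeholder (the SEC asks automated clients to identify themselves). Mapping the values onto Wikidata properties, adding references and getting bot approval would still follow the procedure described above.

```python
import json
import urllib.request
from datetime import date

# Placeholder: replace with a real bot name and contact address.
USER_AGENT = "ExampleWikidataBot/0.1 (contact@example.org)"


def fetch_company_facts(cik: int) -> dict:
    """Download the company-facts JSON for a CIK (zero-padded to 10 digits)."""
    url = f"https://data.sec.gov/api/xbrl/companyfacts/CIK{cik:010d}.json"
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def annual_values(facts: dict, concept: str = "Revenues", unit: str = "USD") -> dict:
    """Map period end date -> value for roughly year-long periods from 10-K filings."""
    entries = (facts.get("facts", {}).get("us-gaap", {}).get(concept, {})
               .get("units", {}).get(unit, []))
    by_end = {}
    # Process oldest filings first, so a figure re-reported later
    # (e.g. as a prior-year comparative or restatement) wins.
    for entry in sorted(entries, key=lambda e: e.get("filed", "")):
        if entry.get("form") != "10-K" or "start" not in entry:
            continue
        days = (date.fromisoformat(entry["end"]) - date.fromisoformat(entry["start"])).days
        if 350 <= days <= 380:  # keep full-year periods, skip quarterly ones
            by_end[entry["end"]] = entry["val"]
    return by_end
```

Keying by period end date rather than by the filing's fiscal-year field avoids mixing up current-year figures with the prior-year comparatives that each 10-K repeats.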
Conventions for Knowledge Graph aligning
Dear Wikidata Community,
We're looking to build an Aerospace Engineering Knowledge Graph and to link (all) of its entries to Wikidata. For some, like Q3319996, that's easy; for others, like conceptual modelling, not so much. Others, like CPACS, are not even in Wikidata yet, or Wikipedia for that matter. Given that context, I have the following cases and questions:
- If a perfect match exists, no questions.
- If a match exists that does look correct, but seems to be lacking relations, should we populate this entry as we see fit? (assumed answer: yes, see en:WP:BOLD)
- If a match exists that does look somewhat correct, but does not have the right type, should we split it into two different entities?
- e.g. Q377960 not being a Q3249551, but an Q166142 - should we create a new process instance with the same label?
- what about instances such as Q2623243, which specifically lists conceptual model (an object) and conceptual modelling (a process)? Does the existence of this entry mean differentiation is not desired?
- If no match exists, I assume we should create one. I've taken a look at Wikidata:Notability:
- "It refers to an instance of a clearly identifiable conceptual or material entity that can be described using serious and publicly available references."
- All instances would fall under this category, since all are derived from a systematic literature review and we can link to the respective papers where they are discussed.
- All our instances would be instances of Q10843872, Q7397, Q235557 or similar. Examples: https://github.com/DLR-SC/tixi, https://dlr-sl.github.io/cpacs-website/
Furthermore, I have some SPARQL / Database questions, which I'll add to a separate topic to not overflow this one.
Thanks, TimBorgNetzWerk (talk) 11:00, 4 July 2024 (UTC)
- 2. --> yes, be wary of the distinction between instance of (P31) and subclass of (P279). Make sure that the relations you add respect the transitivity rule.
- 3. Probably hard to give a general answer; it's likely decided on a case-by-case basis. For the examples you gave:
- product data management (Q377960) is a bit of a weird case because, after a quick skim, it appears that the linked Wikipedia pages themselves don't have the same type: the English one seems to talk about a process (Q3249551) while the French one seems to be talking about software? If you decide to split the two, perhaps the wiki pages should be re-linked as well.
- On conceptual model (Q2623243), I don't know the original author's intent here, but I will point out that the corresponding edit seems to have been partially automated, so there is a chance that mistakes slipped through. The same author added "Conceptual models" as an alias, which is spurious on the sole basis that it shouldn't be capitalized. In any case, I think it makes sense to create an entry for "conceptual modeling", with appropriate cross-links such as facet of (P1269).
- Overall, it's possible that there were no curators with a deep expertise and a good overall view of this part of the graph, hence possible inconsistencies. Improvements welcome!
- 4. I would say yes, especially if you link the references.
- Lastly, I am not sure whether your Aerospace Engineering Knowledge Graph will be publicly available, but if it is, perhaps you could create an identifier to link to it? Although I must say I'm not sure what the process for creating a new identifier is, nor what the criteria are for proposing one. Alcinos (talk) 10:50, 9 July 2024 (UTC)
API / Python / SPARQL access questions
Hi everyone,
please see Wikidata:Project chat#Conventions for Knowledge Graph aligning for context.
TL;DR: we're looking to check whether a Wikidata item exists for each of ~500 entries we have in our database. We also don't want to overburden the Wikidata API, hence:
What can we do to most efficiently query the wikidata database?
What we currently do is:
query = f"""
SELECT ?item ?itemLabel (GROUP_CONCAT(DISTINCT ?altLabel; separator = ", ") AS ?altLabels)
(SAMPLE(?description) AS ?desc) WHERE {{
{selection[select]}
OPTIONAL {{?item skos:altLabel ?altLabel FILTER(LANG(?altLabel) = "en")}}
OPTIONAL {{?item schema:description ?description FILTER(LANG(?description) = "en")}}
SERVICE wikibase:label {{bd:serviceParam wikibase:language "en".}}
}}
GROUP BY ?item ?itemLabel
LIMIT {limit}
"""
, wherein we limit the results to at most 20, and select based on:
selection = {
'label' : f'?item rdfs:label "{label}"@en.',
'altLabel' : f'?item skos:altLabel "{label}"@en.'
}
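The two fragments above can be combined into one self-contained function. A minimal sketch that keeps the posted query shape: the names `build_query` and `sparql_literal` are hypothetical, the escaping of quotes and backslashes is an addition (a label containing `"` would otherwise break the query), and the sampled description is bound to `?desc` because SPARQL 1.1 does not allow an aggregate's `AS` variable to reuse a variable from the WHERE clause.

```python
def sparql_literal(text: str, lang: str = "en") -> str:
    """Quote `text` as a SPARQL string literal with a language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def build_query(label: str, select: str = "label", limit: int = 20) -> str:
    """Exact-match query on an English label ("label") or alias ("altLabel")."""
    selection = {
        "label": f"?item rdfs:label {sparql_literal(label)}.",
        "altLabel": f"?item skos:altLabel {sparql_literal(label)}.",
    }
    return f"""
SELECT ?item ?itemLabel (GROUP_CONCAT(DISTINCT ?altLabel; separator = ", ") AS ?altLabels)
       (SAMPLE(?description) AS ?desc) WHERE {{
  {selection[select]}
  OPTIONAL {{ ?item skos:altLabel ?altLabel FILTER(LANG(?altLabel) = "en") }}
  OPTIONAL {{ ?item schema:description ?description FILTER(LANG(?description) = "en") }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
GROUP BY ?item ?itemLabel
LIMIT {limit}
"""
```

For example, `build_query("SysML", "altLabel")` produces a query for items whose English alias is exactly "SysML".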
Then, per label, we check if:
- entries with that label are available (e.g. "STEP file" to Q3509055),
- if these entries do not sum up to our limit (20), then we also check if entries with that label as altLabel exist (e.g. ".stp" to Q3509055),
- if these entries do not sum up to our limit (20) then we try 1. and 2. again with (if != label):
- label.lower(), so "STEP" -> "step",
- label.capitalize(), so "STEP" -> "Step",
- label.upper(), so "STEP" -> "STEP" -> not done, since == label
Then we store all queries and results so we run no query twice, and can just check our local "copy" for the result.
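The lookup order described above (exact label, then alias, then lower-case, capitalised and upper-case variants that differ from the original, with every query cached) can be sketched as follows. `run_query` is a hypothetical stand-in for whatever sends one query and returns a list of item IDs; it is passed in so the logic can be tested offline, and the `(selection, variant)` pair serves as the cache key.

```python
from typing import Callable, Dict, List, Tuple

CacheKey = Tuple[str, str]  # (selection type, label variant)


def label_variants(label: str) -> List[str]:
    """The label itself, then lower/capitalised/upper forms that differ from it."""
    variants: List[str] = []
    for form in (label, label.lower(), label.capitalize(), label.upper()):
        if form not in variants:
            variants.append(form)
    return variants


def find_matches(label: str,
                 run_query: Callable[[str, str], List[str]],
                 cache: Dict[CacheKey, List[str]],
                 limit: int = 20) -> List[str]:
    """Collect up to `limit` distinct item IDs, trying labels before aliases."""
    found: List[str] = []
    for variant in label_variants(label):
        for select in ("label", "altLabel"):
            if len(found) >= limit:
                return found[:limit]
            key = (select, variant)
            if key not in cache:  # never send the same query twice
                cache[key] = run_query(select, variant)
            for qid in cache[key]:
                if qid not in found:
                    found.append(qid)
    return found[:limit]
```

Because the cache is shared across labels, entries whose variants coincide (e.g. "STEP" and "Step" from different source rows) also reuse earlier results.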
Given all this, our question:
- Is there a better way?
Better as in "easier on Wikidata / faster" as well as "better results", since we currently have about a 40% match rate. Likely, many of our instances do, in fact, have no match, but others (like Q2117885 "Systems Modeling Language" or "SysML") are currently just not caught. We have seen advice to preprocess the labels, i.e. to lowercase all Wikidata labels in a filter, but that seemed unfathomably taxing on all parties involved.
There is also the general advice to use a data dump. We have checked Wikidata:Database download and https://dumps.wikimedia.org/wikidatawiki/entities/, and have not found a dump that contains all labels AND is relatively small. The lexeme dump does not seem to contain all labels, presumably only Q111352 instances. None of the aforementioned entries, e.g. .p21 and .stp, are mentioned therein.
I really appreciate your help, and am open to suggestions, improvements, hints or anything, really :)
Best, TimBorgNetzWerk (talk) 11:30, 4 July 2024 (UTC)
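If a dump is still preferred, one option is to stream the full JSON entity dump instead of loading it, keeping only English labels and aliases so the resulting index is far smaller than the dump itself. A minimal sketch, assuming the documented layout of `latest-all.json.gz` (one JSON array with one entity per line, each line ending in a comma except the last); the function names and the lower-casing are illustrative choices.

```python
import gzip
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield entity dicts from a dump laid out as '[', one entity per line, ']'."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue
        yield json.loads(line)


def english_label_index(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map lower-cased English labels and aliases to the IDs that use them."""
    index: Dict[str, List[str]] = defaultdict(list)
    for entity in iter_entities(lines):
        names = []
        label = entity.get("labels", {}).get("en")
        if label:
            names.append(label["value"])
        names.extend(a["value"] for a in entity.get("aliases", {}).get("en", []))
        for name in {n.lower() for n in names}:
            index[name].append(entity["id"])
    return dict(index)


def index_from_dump(path_or_file) -> Dict[str, List[str]]:
    """Stream a gzipped JSON dump (a path or a binary file object)."""
    with gzip.open(path_or_file, "rt", encoding="utf-8") as f:
        return english_label_index(f)
```

Lower-casing at index time means a label such as "SysML" is found regardless of how the source database capitalises it, without any case-insensitive filtering on the query service.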
- Have you considered using a tool like OpenRefine to help reconcile your data with Wikidata's? M2Ys4U (talk) 16:26, 4 July 2024 (UTC)
- Haven't heard about it yet (I think), will be looking into it, thanks! TimBorgNetzWerk (talk) 09:50, 5 July 2024 (UTC)
- OpenRefine is nice if you intend to import data into Wikidata. The last time I checked, the reconciliation it uses yielded less than ideal results. Is this a publicly available graph? If your graph had its own identifier registered on Wikidata, you could use Mix'n'match to do a preliminary matching of the dataset and then verify each match manually. Asking for a new identifier can be done at WD:PP.
- In any case, free-text search may be what WDQS is worst at. Unsurprisingly, the built-in search does a much better job, see [1] for Wikidata-specific functionality. You won't tax the API as long as you make calls sequentially and support maxlag. There are libraries available that make this easier. Infrastruktur (talk) 16:36, 5 July 2024 (UTC)
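A minimal sketch of the approach suggested above: calling the `wbsearchentities` module of the Wikidata Action API one term at a time, with `maxlag` set, and waiting when the server reports lag. The User-Agent string, the 5-second `maxlag` and the retry count are assumptions to adjust; parsing assumes the module's usual `search` list whose entries carry `id`, `label` and an optional `description`.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Dict, List

API_URL = "https://www.wikidata.org/w/api.php"
# Hypothetical: identify your tool and give real contact details.
USER_AGENT = "AerospaceKGMatcher/0.1 (contact@example.org)"


def search_url(term: str, language: str = "en", limit: int = 20, maxlag: int = 5) -> str:
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "item",
        "limit": limit,
        "maxlag": maxlag,
        "format": "json",
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def parse_results(payload: dict) -> List[Dict[str, str]]:
    return [
        {"id": hit["id"], "label": hit.get("label", ""),
         "description": hit.get("description", "")}
        for hit in payload.get("search", [])
    ]


def search(term: str, retries: int = 3) -> List[Dict[str, str]]:
    """Query one term; sleep and retry if the API answers with a maxlag error."""
    request = urllib.request.Request(search_url(term), headers={"User-Agent": USER_AGENT})
    for _ in range(retries):
        with urllib.request.urlopen(request) as response:
            retry_after = int(response.headers.get("Retry-After", "5"))
            payload = json.load(response)
        if payload.get("error", {}).get("code") == "maxlag":
            time.sleep(retry_after)
            continue
        return parse_results(payload)
    raise RuntimeError(f"Gave up on {term!r} after {retries} maxlag responses")
```

Calling `search` in a plain loop over the ~500 entries keeps requests sequential; the hits are candidates, so they still need the same manual review as any other matching step.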
Very widely used property no longer works
See Property talk:P5380#No longer works BhamBoi (talk) 22:25, 4 July 2024 (UTC)
- This is National Academy of Sciences member ID (P5380). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:02, 11 July 2024 (UTC)
We need to put an end to this
For months, items like
and likely others have been the target of constant edit warring, with their English and Russian descriptions changed back and forth by various IP addresses and accounts with few edits. Could anyone have a look, say what is going on and suggest how administrators should deal with it? --Matěj Suchánek (talk) 08:57, 5 July 2024 (UTC)
- Chechen-Ingush wars. All items should be protected at a random version. Maybe we should block the warriors as well. Ymblanter (talk) 19:41, 5 July 2024 (UTC)
- Though maybe things like this would help before protection, but then I would need to go through the list manually. I can do it, but very slowly. Ymblanter (talk) 19:44, 5 July 2024 (UTC)
- I see no good reason to protect them to be only edited by admins. Semiprotections should be good enough. ChristianKl ❪✉❫ 21:09, 6 July 2024 (UTC)
"agency" property?
I'm getting "{{cite journal}}: |author= has generic name (help)" from:
- CNN Newsource (24 February 2021). "Urban League of Greater Kansas City unveils social justice bus". KMIZ. Wikidata Q126365824.
in Wikipedia:Gwendolyn Grant (activist)#References.
In a section on "work with template:Cite Q?" on the talk page associated with Wikipedia:Template:Sfn, Wikipedia:User:ActivelyDisinterested said, "CNN News Source is not a valid author name ... . The correct field in this case would be |agency= but [that is not] supported by Wikidata / Cite Q." I've experimented with assigning "CNN Newsource" to different properties, so far without finding one that makes this complaint disappear.
Can someone help me find a property to which to assign "CNN Newsource" (Q5013147) so this complaint in Wikipedia disappears? Thanks, DavidMCEddy (talk) 00:15, 6 July 2024 (UTC)
constraint on instance or subclass of
ISFDB award ID (P11395) has constraint
subject type constraint: class: type of award; relation: instance or subclass of
So why is Ditmar Award (Q906455), which is a subclass of (P279) science fiction award (Q107581015) and an instance of (P31) type of award (Q107467117), OK,
while William Atheling Jr. Award (Q8004646), which is an instance of (P31) literary award (Q378427), itself an instance of (P31) type of award (Q107467117), reports a violation? Vicarage (talk) 14:07, 7 July 2024 (UTC)
- Because William Atheling Jr. Award (Q8004646) is currently neither an instance nor a subclass of type of award (Q107467117) (instance of (P31) is not transitive and neither literary award (Q378427) nor Ditmar Award (Q906455) are subclasses of type of award (Q107467117)) while Ditmar Award (Q906455) is a direct instance of type of award (Q107467117). It seems to me that the subject type of ISFDB award ID (P11395) should be instance or subclass of science fiction award (Q107581015), not type of award (Q107467117). - Valentina.Anitnelav (talk) 09:00, 8 July 2024 (UTC)
- Ah, I was expecting instance of (P31) to be transitive in this check. Changing ISFDB award ID (P11395) as you suggest solves my problem. Thanks. Vicarage (talk) 09:12, 8 July 2024 (UTC)
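A toy model of the explanation above, assuming the constraint's "instance or subclass of" relation means: take one instance of (P31) or subclass of (P279) step from the item, then any number of subclass of (P279) steps, never a second P31. The graph is a hypothetical slice of the statements quoted in this thread, not a copy of live data.

```python
# Hypothetical slice of the statements quoted in this thread.
P31 = {
    "Q906455": {"Q107467117"},   # Ditmar Award -> type of award
    "Q8004646": {"Q378427"},     # William Atheling Jr. Award -> literary award
    "Q378427": {"Q107467117"},   # literary award -> type of award
}
P279 = {
    "Q906455": {"Q107581015"},   # Ditmar Award -> science fiction award
}


def superclasses(cls):
    """`cls` plus everything reachable from it via subclass of (P279) edges."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in P279.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instance_or_subclass_of(item, target):
    """One P31 or P279 step, then P279* only: P31 is not followed transitively."""
    first_steps = P31.get(item, set()) | P279.get(item, set())
    return any(target in superclasses(step) for step in first_steps)


print(instance_or_subclass_of("Q906455", "Q107467117"))   # True: direct P31
print(instance_or_subclass_of("Q8004646", "Q107467117"))  # False: P31 then P31
print(instance_or_subclass_of("Q906455", "Q107581015"))   # True: direct P279
```

The second call fails for exactly the reason given in the reply: reaching type of award (Q107467117) from William Atheling Jr. Award would need two P31 steps.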
Removing unreferenced religions and ethnicities
@Nikkimaria: Was a decision made to remove all unreferenced religions and ethnicities at some point? If so I missed that discussion. If the decision was made they should be deleted by a bot, not one-by-one by any individual. Doing it that way will lead to selection bias. I noticed some disappearing and traced the deletions to Special:Contributions/Nikkimaria RAN (talk) 16:21, 7 July 2024 (UTC)
- These are mostly reverts of a particular problematic IP editor who pops up periodically in Special:AbuseFilter/95. If there is a preference to revert such edits by bot I have no objection. Nikkimaria (talk) 16:37, 7 July 2024 (UTC)
- For ethnicities, removal was approved for unreferenced statements and deprecation for Wikipedia-referenced statements, see Wikidata:Bot_requests/Archive/2021/10#request_to_depreciated_ethnic_group_only_sourced_with_P143_(2021-10-23). At the time I think it was also notified or discussed on Project chat. Vojtěch Dostál (talk) 07:38, 8 July 2024 (UTC)
- I am all for deprecating the info until referenced, this is deleting the info. --RAN (talk) 12:11, 8 July 2024 (UTC)
Voting to ratify the Wikimedia Movement Charter is ending soon
- You can find this message translated into additional languages on Meta-wiki. Please help translate to your language
Hello everyone,
This is a kind reminder that the voting period to ratify the Wikimedia Movement Charter will be closed on July 9, 2024, at 23:59 UTC.
If you have not voted yet, please vote on SecurePoll.
On behalf of the Charter Electoral Commission,
RamzyM (WMF) 03:45, 8 July 2024 (UTC)
Is there something like a Wikidata WP:DOB?
Specifically Q5364577, which seems to be ultimately sourced to imdb. Gråbergs Gråa Sång (talk) 06:43, 8 July 2024 (UTC)
- Yes - see Wikidata:BLP. ArthurPSmith (talk) 17:35, 8 July 2024 (UTC)
- Thankfully, Wikidata can structure these claims better than a textually-based Wikipedia project can.
- For example, references can be attached directly to the DOB claim; these will unambiguously support that particular claim, and no others.
- Also, multiple claims can be attached to one person's bio. Therefore if there is dispute, ambiguity, or competing claims for a DOB, all can be included!
- Deprecation and preference values can be assigned. Therefore, if a DOB claim is found to be incorrect or invalid, it can be deprecated, colored red, and notations can be made about those reasons. Likewise, if a DOB is found to be valid above the others, it can be marked "preferred".
- In this way we can better document any controversy, weak/strong sourcing issues, or disputes, not only about a date of birth but about any germane fact for a Wikidata item. It's a shame that enwiki still doesn't want to play nice and draw from Wikidata's growing pool of structured data such as this, because it's way easier to document and track such disputes here in one centralised location, than in parallel, on dozens or hundreds of language-specific wikis, with mazes of twisty little policies and guidelines, all different. Elizium23 (talk) 18:36, 8 July 2024 (UTC)
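A minimal sketch of how ranks let several DOB claims coexist while data consumers still get one answer, assuming the usual Wikibase "best rank" convention: deprecated statements are never used, and if any statement is preferred only the preferred ones count, otherwise the normal ones do. The claim dicts are simplified and hypothetical, not the full Wikibase JSON format.

```python
from typing import Dict, List


def best_statements(statements: List[Dict]) -> List[Dict]:
    """Return the best-rank statements: preferred if any exist, else normal."""
    for rank in ("preferred", "normal"):
        chosen = [s for s in statements if s["rank"] == rank]
        if chosen:
            return chosen
    return []  # only deprecated statements (or none): nothing to show


dob_claims = [  # hypothetical example data
    {"value": "1970-01-01", "rank": "deprecated", "reference": "source A (shown to be wrong)"},
    {"value": "1971-05-02", "rank": "normal", "reference": "source B"},
    {"value": "1971-05-03", "rank": "normal", "reference": "source C"},
]
print([s["value"] for s in best_statements(dob_claims)])  # ['1971-05-02', '1971-05-03']

dob_claims[2]["rank"] = "preferred"
print([s["value"] for s in best_statements(dob_claims)])  # ['1971-05-03']
```

The deprecated claim stays on the item with its reference, documenting the dispute, but never reaches an infobox that reads only best-rank values.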
Wikidata weekly summary #635

This is the Wikidata summary of the week before 2024-07-08. Please help Translate.
- Upcoming:
- The next Wikidata+Wikibase office hours will take place at 16:00 UTC on Wednesday, 10 July 2024 (18:00 Berlin time) in the Wikidata Telegram group. The Wikidata and Wikibase office hours are online events where the development team presents what they have been working on over the past quarter, and the community is welcome to ask questions and discuss important issues related to the development of Wikidata and Wikibase.
- Registration for Wikimania 2024 is open! In-person participants: please register by 26 July, 11:59 p.m. UTC. Virtual participants can register anytime. If you received a scholarship from the Wikimedia Foundation, you will receive an email with a registration code and instructions.
Tool of the week
- User:Teester/EntityShape.js - a userscript that adds an input box to a Wikidata page wherein you can enter an EntitySchema (such as E10). When you click "Check", it checks whether each statement and property conforms to the schema. It then displays a summary at the top of the Item for each property indicating whether it conforms or not. It also adds a badge to each statement and each property on the page indicating whether it conforms or not.
Other Noteworthy Stuff
- Wikidata teams' development goals for the third quarter of 2024 have been updated: Wikidata:Development plan
Newest properties and property proposals to review
- Newest General datatypes:
- music mood (qualifier carrying an emotion (mood) relevant to a musical audio recording)
- coin edge (image or images that show the edge of a coin)
- ozone depletion potential (relative amount of degradation to the ozone layer relative to CFC-11)
- EntitySchema for class (schema that members of a class should conform to)
- Newest External identifiers: Naturalis Repository ID, English-Spanish Dictionary ID, Vikidia article ID, thisisbasketball.be player ID, poblesdecatalunya.cat ID, Oqaasersiorfik ID, MNAHA person ID, Greenlandic-English Dictionary ID, Te Aka Māori Dictionary ID, Tropicos person ID, He Pātaka Kupu ID, vehicle keeper marking (VKM), AllGame style ID, FC Metz player ID, MoFo ID, itch.io numeric ID, filmas.lv film ID, filmas.lv person ID, filmas.lv studio ID, Cockroach Species File taxon ID (new), Lygaeoidea Species File taxon ID (new), Phasmida Species File taxon ID (new), Psocodea Species File taxon ID (new), Spanish-English Dictionary ID, Norwegian National Museum producer ID, Burgenwelt ID, Irish-English Dictionary ID, Tesoro della Lingua Italiana delle Origini ID, Tommaseo-Bellini Online ID, danskfodbold.com player ID, DAKA Danish-Greenlandic Dictionary ID, DAKA Greenlandic-Danish Dictionary ID, Canadian Great War Project person ID, English-Irish Dictionary ID, PMC journal ID, Census ID, Douban personage ID, Avibase person ID, Brezhoneg21 ID, European Education Thesaurus ID, Cineuropa distributor ID, Cineuropa production company ID, OpenCitations Meta ID, IGN franchise ID, Federal Reserve Subject Taxonomy ID, Farhang-i forsī ba rusī ID, Devri ID, Cambridge University Press ID, Canadian Virtual War Memorial ID, Personnel Records of the First World War ID, Fowler’s Concise Dictionary ID, NooSFere publisher ID, Plex person key, BHMPI OBJ ID, Index Fungorum person ID, stiga.trefik.cz player ID, UNIBO professor ID, Cineuropa international sales agent ID, Mapes de Patrimoni Cultural ID, Il Nuovo DOP ID, RGALI person ID, RGALI organization ID, Archelec person ID, Lojas com História ID, milononline.net entry ID, Hebrew Academy term ID, LAGL author ID, KANAL ID, Google Play author ID, Overcast episode ID, ArchWiki article, Valencian Library ID, Star Wars.com ID
- New General datatypes property proposals to review:
- number of local branches (number of branches of this organization at the lowest (local) level)
- KANAL inventory ID (inventory number of a creative work assigned by KANAL)
- Tüik mahalle id (Identifier of neighborhoods (Q17051044) in Turkey in the TÜİK (Q1375058) database)
- myfixguide.com (Photos about how to disassemble hardware)
- date de vote (vote date, date on which people decided or cast their ballot)
- Sandbox-EntitySchema (Sandbox property for value of type "EntitySchema")
- Imperial University of Dorpat student ID (matriculation number of a student of the Imperial University of Dorpat)
- indexer (entity responsible for compiling an index of a book, database, website or other forms of media publications in the form of a methodical arrangement of records designed to enable users to locate information quickly. Example: Hazel K. Bell (Q70226489))
- WorldCyclingStats ID (identifier on the website WorldCyclingStats (www.worldcyclingstats.com))
- New External identifier property proposals to review: Pocket Oxford-Hachette French Dictionary: English-French ID, Biodiversity Information System for Europe ID, Elonet company ID, Numista issuer ID, Metamath statement label, Pocket Oxford German Dictionary: English-German ID, Pocket Oxford Italian Dictionary: English-Italian ID, FEI horse ID, Standard Ebooks ID, Alle Burgen, FC Krasnodar player id, Pocket Oxford Italian Dictionary: Italian-English ID, Pocket Oxford German Dictionary: German-English ID, Pocket Oxford-Hachette French Dictionary: French-English ID, Manhom Arabic Profile ID, GOArt databas, film ID in Filmová databáze (FDb), person ID in Filmová databáze (FDb), ScienceDirect journal ID, Iraqnla Book ID, islamway authority ID, Hermitage Museum artist ID, Coptic Dictionary Online ID, autoritateak.eus, Museum of the Russian Academy of Arts artist ID, Thinkwiki article
You can comment on all open property proposals!
Did you know?
- Query examples:
- Newest database reports: User:Pasleim/commonsmerge - Merge candidates based on same commons category
- Showcase Items: Ferdinand Magellan (Q1496) - Portuguese explorer in the service of Spain
- Showcase Lexemes: bergkant (L1083157) - Nynorsk noun, translates to "the top edge of a mountain"
Development
- EntitySchemas: The new datatype to link to EntitySchemas in statements has been released.
- mul language code: We are preparing the release.
You can see all open tickets related to Wikidata here. If you want to help, you can also have a look at the tasks needing a volunteer.
Weekly Tasks
- Add labels, in your own language(s), for the new properties listed above.
- Contribute to the showcase Item and Lexeme above.
- Participate in this week's Lexeme challenge:
- Govdirectory weekly focus country: Armenia
- Summarize your WikiProject's ongoing activities in one or two sentences.
- Help translate or proofread the interface and documentation pages, in your own language!
- Help merge identical items across Wikimedia projects.
- Help write the next summary!
Showcase items discussion
Editors are invited to join the discussion at Wikidata talk:Showcase items § Formalizing the process. Sdkb talk 19:31, 8 July 2024 (UTC)
Good practice on labels
(Pinging @Tm; continuing the discussion from User talk:Tm#Edits on stores for Lojas com História. No harm meant to them.)
To the Wikidata community: I want to ask about Help:Label and what good practice is in this situation. Some Wikidata items, in this case buildings, are labeled simply with their street address plus a prefix. Take, for example, Prédio na Rua Joaquim António de Aguiar, 45 (Q98962545) (literally, "building on Rua Joaquim António de Aguiar, 45") or Q90315021 (literally, "Loja Confeitaria Nacional, ground floor, including integrated movable heritage"). These labels are taken straight from the sourced external databases.
According to Help:Label: "Labels begin with a lowercase letter except for when uppercase is normally required or expected [...] proper nouns such as the names of specific people, specific places, specific buildings, specific books, etc., should be capitalized." My question is: would this count as a proper name? The building itself has no name, and these are simply descriptions of the given place. So how should it be labeled? JnpoJuwan (talk) 00:13, 9 July 2024 (UTC)
- These are proper names, as I told you. They were given their proper names by the Portuguese cultural heritage authority, the former DGPC. If you had taken a few minutes, you would have seen that no ad hoc translation with any sourced translated name is made: these are proper names used in legislation and/or in the Portuguese cultural heritage databases kept by the Portuguese state organizations responsible for that heritage. You said of Prédio na Rua Joaquim António de Aguiar, 45 that "the building itself has no name and these are simply descriptions of the given place", yet you have the main DGPC database and another DGPC database giving it the same name.
- And Loja Confeitaria Nacional, piso térreo, incluindo o património móvel integrado is the listed part of the shop Confeitaria Nacional, as the shop is not listed as cultural heritage in its entirety. Again, the name is the proper name, as stated in the main DGPC database and in the legislation listing it, from Anúncio n.º 174/2017, through Anúncio n.º 38/2020, to Portaria n.º 613/2020. Tm (talk) 00:28, 9 July 2024 (UTC)
- And these names are commonly used in Wikidata. Just as examples: Iron Foundry (building Number 1/140) and Iron Foundry (building Number 1/140) Including Railings And Bollards, or Vulcan Block (Building Number 21) And Attached Bollards, or Number 15 And Attached Agricultural Building, or Factory Building Outbuilding Attached To Number 55, or Castle Farm Cottages Number 5 And Farm Building Attached: British listed cultural heritage monuments, among hundreds of similar ones. Tm (talk) 00:48, 9 July 2024 (UTC)
- I am also asking the wider community to determine whether it is good practice to continue doing this. JnpoJuwan (talk) 00:50, 9 July 2024 (UTC)
- Any search for listed buildings will show you that this is a common and long-established practice. Tm (talk) 01:07, 9 July 2024 (UTC)
- For example, the US building New York Herald Building, or the building listed in the U.S. National Register of Historic Places, Building at 73 Mansion Street; this last building uses the same name in its articles on English Wikipedia and German Wikipedia. Tm (talk) 01:35, 9 July 2024 (UTC)
- The labels for British buildings come from their source, as they were imported from a heritage register. We ensure that the listing property has subject named as (P1810) to preserve this, but I fully expect that we might change the name to a more colloquial form on WD, as we should not be bound by a listing officer's conventions on defining a site's scope, and it makes little sense to create 2 items merely to have them with and without their bollards. There are many examples of things where their official name is different from the common name. We should record both in the body of the entry, but the label, for use by humans scanning the site and report summaries, should be the colloquial version. I'd also expect foreign language labels to adjust for their own formatting conventions (moving the street number to the start in English, for example), but without actually translating names literally.
- "The Vulcan Building" is a good example, and I'd expect other Portsmouth historic buildings to be changed too in time Vicarage (talk) 06:10, 9 July 2024 (UTC)
- I have seen the descriptions given in these items; those I do not doubt. What I am asking is what counts as a proper name here, since the same databases list many other designations for the same item besides that one, or, in the case of SIPA, do not list the original.
- Even if these are the legally recognised names, is it useful to list these names first, as opposed to descriptive and, importantly, translatable names? JnpoJuwan (talk) 00:48, 9 July 2024 (UTC)
- These monuments are named so in legislation and in the databases of the Portuguese Património Cultural, IP (former DGPC), so these are proper names. The practice in Wikidata is not to make ad hoc translations without proper sources stating that there is a commonly used translated name, and you have the example of the British listed cultural heritage monuments to see that these are proper names and are capitalized in Wikidata.
- These designations are other proper names for the same items, from the databases of the Portuguese Património Cultural, IP (former DGPC): the main DGPC database (and legislation) for cultural heritage monuments. As a complement, there are other databases from Património Cultural, IP (former DGPC), like the SIPA database and, for specific cases, databases of works on specific routes or by specific architects, such as the one linked as "other database of DGPC", and/or the archaeological sites database. Tm (talk) 01:04, 9 July 2024 (UTC)
- I think a good rule of thumb here is "how would this label work in the middle of a sentence?" Would one write "This is a photo of Prédio na Rua Joaquim António de Aguiar, 45" or "This is a photo of prédio na Rua Joaquim António de Aguiar, 45"? Maybe in English the answer would be one way and in another language it would be different; I know for example in French lower-case is very common for what in English would be upper-cased. In any case I think this general rule should work across most languages. ArthurPSmith (talk) 15:43, 9 July 2024 (UTC)
- In this case, it would certainly be lowercased, as "prédio" is not a proper noun; it is just the word for building, describing the given address. For that reason I suggest using an informal approach for these items. JnpoJuwan (talk) 21:50, 9 July 2024 (UTC)
- In this case it is certainly uppercase, as this is the proper name of this building, as stated in two different source databases of the Portuguese Património Cultural, IP (former DGPC) that specifically discuss and describe this building. One source clearly states "Designação:Prédio na Rua Joaquim António de Aguiar, 45", and the other "Name\designation:Prédio na Rua Joaquim António de Aguiar", the same as the first source. This is a name with not one but two sources. Tm (talk) 01:34, 10 July 2024 (UTC)
- Also, in Portuguese one capitalizes the proper names of buildings, as stated in Ciberdúvidas da Língua Portuguesa: "A maiúscula é obrigatória apenas para os nomes próprios (dos edifícios, das vias, dos bairros, das localidades...).", or in English: "Capital letters are mandatory only for proper nouns (of buildings, roads, neighborhoods, towns...)". Tm (talk) 01:44, 10 July 2024 (UTC)
- Another page on Ciberdúvidas da Língua Portuguesa states: "Como geralmente é expressão referente a um edifício histórico importante, escreve-se com maiúsculas iniciais", or in English: "As it is generally an expression referring to an important historical building, it is written with initial capital letters". This also applies to this building, as it is also an important historical building, described with this same capitalized name in three different databases of the Património Cultural, IP (former DGPC), the department of the Portuguese Ministry of Culture responsible for listing Portuguese immovable cultural heritage, as well as in another database of the Portuguese Ministry of Culture, which shows that this is clearly an important historical building. Tm (talk) 02:00, 10 July 2024 (UTC)
- And, if there is any doubt, the Portuguese Language Orthographic Agreement of 1990 states: "SÍNTESE DO USO DA MAIÚSCULA INICIAL E DA MINÚSCULA INICIAL. I. A letra inicial maiúscula é utilizada: (...) 11.º Na letra inicial de palavras usadas em categorizações de logradouros públicos, de templos e de edifícios: Bairro de Alvalade; Rossio; a Alta de Lisboa; (...) Rua Augusta; Rua da Palma; Pátio do Tijolo; Basílica da Estrela; Capelas Imperfeitas; Convento dos Capuchos; Igreja de Santa Maria Maior; Igreja do Bonfim; Templo do Apostolado Positivista; Mosteiro de Santa Maria (...); Edifício Azevedo Cunha.", or in English: "SUMMARY OF THE USE OF INITIAL CAPITAL AND INITIAL LOWERCASE. I. The initial capital letter is used: (...) 11th. In the initial letter of words used in categorizations of public places, temples and buildings". Tm (talk) 02:17, 10 July 2024 (UTC)
- All in all, I will respect Tm's decision and keep the labels as they are, as they have made good arguments in this regard. I am so sorry if this was a headache for you. --JnpoJuwan (talk) 10:47, 10 July 2024 (UTC)
- It's a complex question and it partially depends on the language. I'm not sure about Portuguese, but in French, proper names often start with a lowercase letter. For English, I would like a confirmation (ping ArthurPSmith), but cases like building at 73 Mansion Street (Q1003019) should probably begin with a lowercase letter too (like in the Wikipedia article). Cheers, VIGNERON (talk) 12:53, 10 July 2024 (UTC)
- Yes, given that the associated enwiki article starts "The building at 73 Mansion Street [...]" the wikidata entry should be labeled "building at 73 Mansion Street" in English. I've fixed it. ArthurPSmith (talk) 12:39, 11 July 2024 (UTC)
The Community Wishlist is reopening July 15, 2024
Here’s what to expect, and how to prepare.
Hello everyone, the new Community Wishlist (formerly Community Wishlist Survey) opens on 15 July for piloting. I will jump straight into an FAQ to help with some questions you may have:
Q: How long do I have to submit wishes?
A: As part of the changes, Wishlist will remain open. There is no deadline for wish submission.
Q: What is this ‘Focus Area’ thing?
A: The Foundation will identify patterns with wishes that share a collective problem and group them into areas known as ‘Focus Areas’. The grouping of wishes will begin in August 2024.
Q: At what point do we vote? Are we even still voting?
A: Contributors are encouraged to discuss and vote on Focus Areas to highlight the areas.
Q: How will this new system move wishes forward for addressing?
A: The Foundation, affiliates, and volunteer developers can adopt Focus Areas. The Wikimedia Foundation is committed to integrating Focus Areas into our Annual Planning for 2025-26.
Focus Areas align to hypotheses (specific projects, typically taking up to one quarter) and/or Key Results (broader projects taking up to one year).
Q: How do I submit a wish? Has anything changed about submissions?
A: Yes, there are some changes. Please have a look at the guide.
I hope the FAQ helped. You can read more about the launch.
You are encouraged to start drafting your wishes at your own pace. Please consult the guide as you do so. Also, if you have an earlier unfulfilled wish that you want to resubmit, we are happy to help you draft it.
You can start your draft (see an example), and don't hesitate to ask for support while drafting: please notify me via the Drafts List.
–– STei (WMF) (talk) 13:01, 9 July 2024 (UTC)
U4C Special Election - Call for Candidates
- You can find this message translated into additional languages on Meta-wiki. Please help translate to your language
Hello all,
A special election has been called to fill additional vacancies on the U4C. The call for candidates phase is open from now through July 19, 2024.
The Universal Code of Conduct Coordinating Committee (U4C) is a global group dedicated to providing an equitable and consistent implementation of the UCoC. Community members are invited to submit their applications in the special election for the U4C. For more information and the responsibilities of the U4C, please review the U4C Charter.
In this special election, according to chapter 2 of the U4C charter, there are 9 seats available on the U4C: four community-at-large seats and five regional seats to ensure the U4C represents the diversity of the movement. No more than two members of the U4C can be elected from the same home wiki. Therefore, candidates must not have English Wikipedia, German Wikipedia, or Italian Wikipedia as their home wiki.
Read more and submit your application on Meta-wiki.
In cooperation with the U4C,
-- Keegan (WMF) (talk) 00:02, 10 July 2024 (UTC)
I'm new to Wikidata-specific practices, sorry if I have screwed anything up or asked in the wrong place. I don't think HDS should require a GND ID, considering there are many topics with an HDS article that do not have a GND ID. For example, Journal du Jura (Q633032). Unless I am horrifically misunderstanding something. PARAKANYAA (talk) 05:40, 10 July 2024 (UTC)
- I agree, I have removed; too many topics don't have GND ID (P227), so having this constraint isn't worthwhile. Epìdosis 08:36, 10 July 2024 (UTC)
- @Epìdosis Thank you very much :) PARAKANYAA (talk) 08:53, 10 July 2024 (UTC)
Spelling convention for labels and descriptions in English: Request for comment
Hello everyone, just a heads up that a new Request for comment has been created to discuss a standard approach for labelling in Wikidata. If you're interested, please take a look and share your thoughts. Carbonaro. (talk) 08:48, 10 July 2024 (UTC)
Proposed Changes to personal pronoun (P6553)
Introduction
Wikidata:WikiProject Personal Pronouns has set out to clean up Wikidata’s modeling and implementation of personal pronoun data. This data is currently inconsistent, difficult to query, and frequently inaccurate. What follows is our detailed proposal to remedy this situation, which we hope to implement on July 24, 2024.
Background
- personal pronoun (P6553) is:
- instance of (P31):
- Wikidata item of this property (P1629): personal pronoun set (Q65067284)
- Data type: lexeme (Q111352)
- related property (P1659): sex or gender (P21)
- living people protection class (P8274)
- citation-needed constraint (Q54554025): not mandatory
- allowed-entity-types constraint (Q52004125): Wikibase item (Q29934200)
- The discussion at Property talk:P6553 has brought important points to light, and we believe it reflects consensus on the following proposed changes.
Problems with personal pronoun (P6553)
- Messy data modeling and false statements:
- Personal pronouns are modeled as individual lemmas, as pronoun set items, and as items representing individual pronouns
- Values exist which are not third-person pronouns at all, but things like honorifics
- Individual pronouns as lexemes don’t account for cases where a principal pronoun isn’t enough to go on to identify a pronoun set (for instance, “ze/zir” vs. “ze/hir”). Therefore, lexemes cannot be consistently implemented with accuracy
- Inferring gender identity based on personal pronouns, and vice versa, is inaccurate and causes disproportionate harm to marginalized groups
- Including “gender” in pronoun property leads to inferring gender based on pronouns
- Wikidata includes heuristics (Q201413) (“problem-solving method that is sufficient for quick, short-term solutions/approximations”) related to personal pronoun (P6553) that expressly encourage inferring gender based on personal pronouns or other unrelated data points: inferred from pronoun used (Q73168402) and inferred from given name, image and pronoun used (Q123757526)
- The idea that personal pronouns are associated with gender identity is not consistently true (see gender neutrality in genderless languages)
- There is not consensus that gender can be inferred from personal pronouns
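To make the “ze/zir” vs. “ze/hir” point concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory list of pronoun sets: the `PronounSet` class, the sample sets and both helper functions are illustrative only, not a Wikidata data model or API. It shows that a lookup by subject form alone can return more than one candidate set, while a full set identifies exactly one and still lets a program pick the form needed for a given grammatical role.

```python
from dataclasses import dataclass


# Hypothetical model: one pronoun set, described by its form in three roles.
@dataclass(frozen=True)
class PronounSet:
    subject_form: str     # "ze is here"
    object_form: str      # "I see zir" / "I see hir"
    possessive_form: str  # "zir book" / "hir book"

    @property
    def label(self) -> str:
        return f"{self.subject_form}/{self.object_form}"


# Illustrative sample data only.
PRONOUN_SETS = [
    PronounSet("he", "him", "his"),
    PronounSet("she", "her", "her"),
    PronounSet("they", "them", "their"),
    PronounSet("ze", "zir", "zir"),
    PronounSet("ze", "hir", "hir"),
]


def sets_with_subject(subject: str) -> list[PronounSet]:
    """Return every set whose subject form matches; more than one hit is ambiguous."""
    return [s for s in PRONOUN_SETS if s.subject_form == subject]


def form_for(pronoun_set: PronounSet, role: str) -> str:
    """Return the form for a role: "subject_form", "object_form" or "possessive_form"."""
    return getattr(pronoun_set, role)


if __name__ == "__main__":
    candidates = sets_with_subject("ze")
    print([s.label for s in candidates])                       # ['ze/zir', 'ze/hir']
    print(f"I see {form_for(candidates[1], 'object_form')}.")  # I see hir.
```

Whether such sets are better stored as items or assembled from lexemes is a separate modeling choice; the sketch only illustrates the ambiguity of a subject form on its own.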
Proposed Changes
- Change data type from Lexeme to Wikidata Item
- Change allowed value type to instance of (P31) personal pronoun set (Q65067284)
- Remove relationship with sex or gender (P21)
- Remove references to gender in aliases and descriptions of property, because the concepts are not related (gender and pronouns need not agree)
- Delete heuristics inferred from pronoun used (Q73168402) and inferred from given name, image and pronoun used (Q123757526)
Use Cases
The following use cases support the need for the proposed data structure. Many more can be provided.
- Dua Saleh (Q84766127) – they/xe (see October 2020 tweet)
- Mel Baggs (Q4080459) – sie/hir, ze/zer
- Conchita Wurst (Q113581) – she/her (in drag), he/him (out of drag)
Implementation Plan
Scope of Use
5,945 items in Wikidata use P6553 (as of 2024-05-29 using this query)
~60 items have more than one statement/value pair for P6553, so there are a total of 6,005 statements using P6553 (as of 2024-05-29 using this query)
5,920 of these pronoun statements have values that are actually pronouns rather than honorifics, etc. (as of 2024-05-29 using this query)
2,296 of these statements have a value of “he” (as of 2024-05-29 using this query), 2,595 have a value of “she” (as of 2024-05-29 using this query), and 701 have a value of “they” (as of 2024-05-29 using this query) leaving 328 statements with other values
All valid P6553 statements have values from a small group of 63 lemmas (as of 2024-05-29 using this query) from sixteen languages (as of 2024-05-29 using this query)
Language-Lemma Count breakdown:
Bokmål/Nynorsk = 4
Catalan = 3
Dutch = 5
English = 14
Esperanto = 8
French = 4
German = 7
Japanese = 4
Latin = 1
Portuguese = 3
Spanish = 5
Swedish = 3
Yiddish = 1
Yoruba = 1
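The Scope of Use figures are simple tallies over the P6553 statements. The following minimal Python sketch assumes the statements have already been exported as (item, value) pairs; the `tally` function and the sample rows are hypothetical and are not the queries linked in the proposal. As a check on the quoted numbers, 2,296 + 2,595 + 701 = 5,592 of the 5,920 valid statements are he/she/they, leaving 5,920 − 5,592 = 328 with other values, and the per-language lemma counts above sum to 63.

```python
from collections import Counter


def tally(statements, main_values=("he", "she", "they")):
    """Summarize exported (item_id, value) pairs the way the Scope of Use section does."""
    per_item = Counter(item for item, _ in statements)
    per_value = Counter(value for _, value in statements)
    main_count = sum(per_value[v] for v in main_values)
    return {
        "items": len(per_item),
        "statements": len(statements),
        "items_with_multiple_values": sum(1 for n in per_item.values() if n > 1),
        "per_value": dict(per_value),
        "other_values": len(statements) - main_count,
    }


# Hypothetical sample rows; real rows would come from an export of the statements.
sample = [
    ("item-A", "they"), ("item-A", "xe"),
    ("item-B", "sie"), ("item-B", "ze"),
    ("item-C", "she"),
    ("item-D", "he"),
]

if __name__ == "__main__":
    print(tally(sample))
```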
Partnerships
- Initial working hours with the Wikidata:WikiProject LD4 Wikidata Affinity Group are planned for July and September 2024
- Support is anticipated from the LGBT+ User Group
- This would be a project of Wikidata:WikiProject Personal Pronouns
Data Model
- Make changes to the Wikidata property personal pronoun (P6553) (proposed above)
- Delete heuristics inferred from pronoun used (Q73168402) and inferred from given name, image and pronoun used (Q123757526)
Building Pronoun Sets
- Create Wikidata items for personal pronoun sets based on well-modeled metadata in source datasets:
- Homosaurus
- GSSO
- Ensure that all pronouns which are used as values of personal pronoun (P6553) are covered
- Use resources to describe and accurately identify pronoun sets:
- Other Wikis, such as Pronoun Wiki, Nonbinary Wiki, Nonbinary Database, Gender Wiki
- Pronoun generators, which include examples of use, such as Pronoun Island, Pronouns.page, and Pronouny.
Data Cleanup
- Export all personal pronoun (P6553) statements from Wikidata into OpenRefine
- Divide into several spreadsheets based on the type of issues encountered
- Remove statements without references
- Remove invalid statements, replacing with other properties as feasible (for example, when a value is an honorific, replace with honorific prefix (P511))
- For referenced statements:
- Change value from lexeme or item for a single pronoun to a personal pronoun set based on references given
- Add on focus list of Wikimedia project (P5008): WikiProject Personal Pronouns (Q118382780) to track activity
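The triage step of the cleanup plan can also be sketched in Python. This is a minimal sketch under stated assumptions: each exported row is a hypothetical dict with `item`, `value` and `references` keys, the honorific list is an illustrative placeholder, and the bucket names are invented; the plan itself uses OpenRefine, not code like this. The buckets follow the order of the plan: unreferenced statements first, then referenced honorifics (to be moved to honorific prefix (P511)), then referenced pronoun values (to be mapped to a pronoun set).

```python
# Illustrative placeholder list; real honorific values would be identified by review.
HONORIFICS = {"Mr.", "Mrs.", "Ms.", "Mx.", "Dr."}


def triage(rows):
    """Split exported P6553 rows into buckets, following the order of the cleanup plan."""
    buckets = {"remove_unreferenced": [], "move_to_P511": [], "map_to_pronoun_set": []}
    for row in rows:
        if not row["references"]:
            # Plan step: statements without references are removed.
            buckets["remove_unreferenced"].append(row)
        elif row["value"] in HONORIFICS:
            # Invalid value: an honorific belongs in honorific prefix (P511).
            buckets["move_to_P511"].append(row)
        else:
            # Referenced pronoun value: to be replaced with a pronoun set item.
            buckets["map_to_pronoun_set"].append(row)
    return buckets


if __name__ == "__main__":
    rows = [
        {"item": "item-A", "value": "they", "references": ["ref-1"]},
        {"item": "item-B", "value": "Mx.", "references": ["ref-2"]},
        {"item": "item-C", "value": "she", "references": []},
    ]
    for name, bucket in triage(rows).items():
        print(name, [r["item"] for r in bucket])
```

Keeping each bucket as a separate list mirrors the plan's idea of one spreadsheet per issue type.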
Rodriguez.UW (talk) 21:50, 10 July 2024 (UTC)
Strong support I fully support these changes and am committed to participating in their implementation. --Crystal Yragui, University of Washington Libraries (talk) 23:25, 10 July 2024 (UTC)
Strong support This is a great idea, I fully support it. Brimwats (talk) 01:25, 11 July 2024 (UTC)
- This seems like a bad idea. Pronouns are lexicographical data and should exist as lexemes. If the lexeme data is inconsistent, it can and should be improved. Items are not inherently more consistent than lexemes, and creating new entities instead of fixing the existing ones won't make the data better - it will actually result in duplication of data, which then leads to problems with data getting out of sync.
- It's completely possible to have multiple lexemes with the same subject form and different object forms (e.g. sier (L304659) vs sier (L304660)). If your objection is that the links don't display the object form, that is a data display issue, not a problem with the data itself.
- Lexemes have forms with grammatical features, allowing machines to select the correct form in different sentences (e.g. "I see them" versus "They see me"). You haven't explained how that will work with your proposal.
- I don't think it makes sense to remove all mentions of gender. Whether you like it or not, people do associate pronouns with gender and there is a lot of correlation between someone's gender identity and the gender of the pronouns they use. Removing links to other properties and aliases for terms that people do use makes things harder to find, and makes it more likely that they will do things like add sex or gender (P21) based on pronouns, because they are more likely to be unaware that we even have a separate property for pronouns.
- The existence of languages which don't have gendered pronouns does not seem relevant. If a language doesn't have multiple pronouns for the same grammatical person/number, then I don't see how personal pronoun (P6553) would be useful. If you know of another distinction used for pronouns referring to other people, other than gender and formality, I would love to know about it (and it would be relevant to Wikidata:Lexicographical data in general).
- I don't think unsourced statements for a property should be mass removed before making an effort to add sources. Creating lists of statements with issues and encouraging people to help fix the issues would be a perfect task for a wikiproject.
- - Nikki (talk) 04:31, 11 July 2024 (UTC)
- Also pinging @BlaueBlüte who already responded to your proposal on Wikidata talk:WikiProject Personal Pronouns back in May. - Nikki (talk) 04:38, 11 July 2024 (UTC)
- @Nikki Lexemes could be linked from pronoun set items as parts, but as we stated in this proposal, individual lexemes are not enough to go on much of the time to identify pronoun sets. We gave examples and fully explained why lexemes are not appropriate or sufficient for personal pronoun sets. These examples aren't different senses of the same word, but often distinct words in different senses. Lexemes would still exist, but would not be used as values for this particular property. Rather, they would be parts of sets. Which is how people use them. --Crystal Yragui, University of Washington Libraries (talk) 19:52, 11 July 2024 (UTC)
Strong support: I support the goals of this project. Wikidata editors should not be supplying sex or gender (P21) based on a person's pronouns. And sex or gender (P21) should not be used to supply personal pronoun (P6553) without references that document a person's choice of their pronouns. The more that can be done to prevent misgendering people in Wikidata, the better. AdamSeattle (talk) 06:11, 11 July 2024 (UTC)
Strong oppose I don't understand why "Change data type from Lexeme to Wikidata Item", it would be a lot of work for no gain (I would even say it would be a loss, as the data would be poorer). Cheers, VIGNERON (talk) 11:59, 11 July 2024 (UTC)
- The gain would be solving the problem we point out here: "Individual pronouns as lexemes don’t account for cases where a principal pronoun isn’t enough to go on to identify a pronoun set (for instance, “ze/zir” vs. “ze/hir”). Therefore, lexemes cannot be consistently implemented with accuracy". It would not be very much work, and we laid out a very detailed implementation plan for doing the work. --Crystal Yragui, University of Washington Libraries (talk) 19:37, 11 July 2024 (UTC)
- (1) this should be an RFC, not an announcement on (English) Project Chat
- (2) In principle I think the item datatype may make sense here, but I don't understand how you would label these items or make them consistent across languages. That needs to be discussed: would a person with "ze/zer" in English have a consistent label in every other language, or might different people make different choices in other languages? If consistent translations are expected, then an item seems fine; otherwise I think this needs to stick with the lexeme datatype.
- (3) Technically I don't believe the datatype can be just "replaced" - a new property would need to be created for the new datatype, and data migrated etc. As User:VIGNERON notes this would be considerable work fixing the ~6000 statements.
- ArthurPSmith (talk) 12:48, 11 July 2024 (UTC)
- We just asked in the Administrators' noticeboard about the process for requesting changes to properties, and you were one of the people who gave us feedback saying this was a good way to go and gave us further suggestions for how to go about this properly. I am confused about why you are now saying it should be done differently @ArthurPSmith. Is it because you don't agree with the proposed changes? --Crystal Yragui, University of Washington Libraries (talk) 19:40, 11 July 2024 (UTC)
- @Clements.UWLib, only two people replied to you in that discussion. @Ymblanter and @ArthurPSmith. It is not clear whether they were aware of the massive scope and depth of your intended proposal (because you were asking about a generality and not linking to specifics.) You implied that you wanted to change a single property or something. Indeed, the scope of your proposal appeared quite trivial there, compared to the overhaul you're actually hatching in this proposal. Elizium23 (talk) 19:45, 11 July 2024 (UTC)
- We do want to change a single property. This doesn't have massive scope or depth beyond the single property we're talking about and the cleanup work we and our project partners would complete. The data is already a messy mix of items and lemmas; I don't understand why this is being perceived as some sort of overhaul. This would be a cleanup and the implementation of a coherent data model in place of no coherent data model. --Crystal Yragui, University of Washington Libraries (talk) 19:55, 11 July 2024 (UTC)
- +1 on VIGNERON's and Arthur's points. Most importantly, let's not discuss it here. A dedicated RfC or conversation at the property's discussion page will be better suited for a proposal like this. Vojtěch Dostál (talk) 12:58, 11 July 2024 (UTC)
- Conversation has been ongoing on the property's discussion page for quite some time, and we believe it reflects consensus on the proposed changes. We brought it here because we thought the broader community should have input before we moved ahead with the changes. --Crystal Yragui, University of Washington Libraries (talk) 20:10, 11 July 2024 (UTC)
Leaning oppose Inferring sex or gender from gender-specific pronouns or styles (i.e. the "Mr."/"Mstr." vs. "Mrs."/"Miss" or whatever) comes up all the time when dealing with obscure historical people, including e.g. people involved in research or contributors to creative works. The gold standard will always be self-identification of preferred gender of course, but realistically that's going to be exceedingly rare for pre-20th century humans, and still somewhat uncommon even afterwards. I'm all for explicitly disclaiming this practice wrt. Wikidata:Living people where concerns about both individual privacy and harmful misrepresentation of marginalized gender-non-conforming groups will be rather more relevant; but recording sex-or-gender inferences about people in history is widely seen as useful for, e.g. extracting gender representation statistics wrt. Wikidata itself or subsets thereof. --Hupaleju (talk) 18:14, 11 July 2024 (UTC)
Oppose Per the reasoning above, please convert this from an announcement into an RFC for debate and discussion. I am concerned with data loss or loss of precision in moving from lexemes to items. I concur with the fact that inferred sex happens routinely from historical documents. I am also concerned with the proliferation of en:Neopronouns and their associated burden of maintenance. The English Wikipedia tends not to indulge these neopronouns in article prose. Is it Wikidata's intent to catalog, document, and apply neopronouns in a completely credulous fashion? It is a fact that neopronouns can and will be used to troll and disrupt communications. It would be inadvisable for us to take them always at face value. Lastly, I am concerned about the size and scope of these changes. This is a large proposal, and difficult for us to digest as a monolith. Perhaps itemize it, prioritize elements of it, and propose options/choices within each major decision. A proper RFC should have a central and identifiable proposal for debate, and not a lot of moving parts! Elizium23 (talk) 19:40, 11 July 2024 (UTC)
- Yes, of course it is advisable to take people's pronouns at face value. This is not what we came here to debate, and it's not up for debate in this community as far as I know. This is not a moving part. We suggested five bullet-pointed changes to a single property. The rest is supporting information. Your comments about neopronouns make it difficult for me to think you are engaging with this proposal in good faith. --Crystal Yragui, University of Washington Libraries (talk) 20:05, 11 July 2024 (UTC)
Suggestion for Wikidata property to identify online accounts
i did already create this before posting (whoops) because i am still kind of new (i can't figure out how/if i could delete it, double whoops, sorry!). basically i've noted several authors who have "official" Archive of Our Own (AO3) accounts and I feel we should be able to note this, so now archive of our own username exists, and i would like to actively use it. Honeybeeandtea (talk) 01:52, 11 July 2024 (UTC)
- by "official" i mean linked to on their official websites, but they aren't official in the sense that their publishers are involved. Honeybeeandtea (talk) 01:53, 11 July 2024 (UTC)
- You have rather jumped the gun. You should use Archive of Our Own tag (P8419) with qualifiers, and ask for your item Q127358232 to be deleted. Vicarage (talk) 03:32, 11 July 2024 (UTC)
- i don't disagree with you on the fact that i jumped the gun, but in the case of AO3, tags have a specific purpose that is separate from that of a username. using the "ao3 tag" and then amending it to "but actually i mean the username" feels like an inelegant and roundabout way to express a relationship as linear as "this is that person's username". Honeybeeandtea (talk) 15:55, 11 July 2024 (UTC)
- There's currently no dedicated property for AO3 accounts. You can use website account on (P553) instead (see the property examples on that page for how to use it). Additionally, you could propose a new property specifically for AO3 accounts (see Wikidata:Property proposal), but those are usually only successful if there's a significant number of potential items for persons with such accounts. --2A02:810B:580:11D4:D5AC:53B7:B54E:983E 19:14, 11 July 2024 (UTC)
- I tend to use described by source (P1343) with URL and other qualifiers for non-property relations to external sites. Vicarage (talk) 19:48, 11 July 2024 (UTC)
not sure what transcluded means in new property proposal
I have tried to propose a new property, Wikidata:Property proposal/africanmusiclibrary.org artist id, but somehow messed it up. It says "You have not transcluded your proposal on Wikidata:Property proposal/Person yet. Please do it.", but I'm not sure what this means, and it's not obvious what to do when clicking the "Please do it" link. Please could someone help me? Thanks. QWER9875 (talk) 16:24, 11 July 2024 (UTC)