
Module talk:Infobox gene/Archive 1

This is an old revision of this page, as edited by Lowercase sigmabot III (talk | contribs) at 06:22, 2 February 2021 (Archiving 1 discussion(s) from Module talk:Infobox gene) (bot). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


Targeted by drug

Hi @Julialturner: While I am highly sympathetic to allowing links to "targeted by drug" (this is what I do in real life), I think the display of this section of the infobox should be suppressed unless there are relevant entries in Wikidata. It appears that this is what you had in mind since you added the comment "check if any drugs have references if not then don't render the headers" to the template code, but for some reason, the drug header is displayed whether or not there are any drugs targeting this protein. I would appreciate it if you would check this when you get a chance. Cheers. Boghog (talk) 19:39, 15 July 2016 (UTC)[reply]

@Boghog: Thank you. I just noticed this wasn't being suppressed yesterday on some pages and I am working on a solution. Best, Julialturner (talk) 21:20, 15 July 2016 (UTC)[reply]

Julia, thanks for fixing the suppression of the section. Much appreciated. Boghog (talk) 14:33, 2 August 2016 (UTC)[reply]
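A minimal sketch of the suppression behaviour discussed above, written in Python rather than the module's Lua. The data shape (a list of drug entries, each with a `name` and a list of `references`) and the function name are assumptions made for illustration; this is not the module's actual code.

```python
from typing import Dict, List


def render_drug_section(drugs: List[Dict]) -> str:
    """Return the 'Targeted by drug' section, or '' when nothing qualifies.

    The header is only emitted if at least one drug has a reference,
    so an empty or unreferenced list produces no output at all.
    """
    referenced = [d for d in drugs if d.get("references")]
    if not referenced:
        return ""
    rows = ["Targeted by drug"]
    for drug in referenced:
        rows.append(f"  {drug['name']} [{', '.join(drug['references'])}]")
    return "\n".join(rows)
```

The point of the sketch is the order of operations: filter first, then decide whether to emit the header, rather than emitting the header and filling it in afterwards.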

Wikidata citations

Hi @Julialturner:. Another request. I noticed that infobox gene is now starting to display citation data (see for example CNGA1). Unfortunately bare URLs are displayed which is far from ideal. They are not very readable and lead to link rot. They are also ugly. Furthermore I can't locate the citation in Wikidata to see if it is even possible to add a formatted citation. Do you know if formatted citations are supported in Wikidata? If not, I humbly suggest that it may be premature to display this type of data. Cheers. Boghog (talk) 14:47, 2 August 2016 (UTC)[reply]

Some kind of formatted citations you can see here. If that's not what you're thinking about, then you can simply ignore me :) --Edgars2007 (talk/contribs) 16:18, 2 August 2016 (UTC)[reply]
@Julialturner, Boghog, and Daniel Mietchen: Wikidata won't hold formatted references as a string if that is what you are thinking. There is ongoing work to hold all the components of a reference (authors, journal, title, etc.) as structured data there such that a script could automatically compose a formatted string for references. So.. what to do now? The URLs you are referring to appear as references within the corresponding wikidata item. For your example, you can see them at https://www.wikidata.org/wiki/Q17907825 under the corresponding property - e.g. genetic association. In many cases, it seems the link to the database where the information was collected from is the appropriate reference. When the appropriate reference is indeed an online resource - which admittedly is subject to linkrot - what is an appropriate referencing pattern? --Benjamin Good (talk) 21:59, 2 August 2016 (UTC)[reply]
@I9606: Thanks for the response. Actually some of the data database links look OK (e.g., human and mouse PubMed references). At a minimum, the database references should display like these PubMed references where the displayed link is replaced by readable text. The OMIM reference ideally should look like what {{OMIM}} produces. Also In the CNGA1 example, targeted by drug, Dequalinium cites https://www.ncbi.nlm.nih.gov/pubmed/12508052. This reference as well as the references for Genetically Related Diseases (Retinitis pigmentosa) are not contained in https://www.wikidata.org/wiki/Q17907825. Boghog (talk) 06:11, 3 August 2016 (UTC)[reply]
@Boghog: Thanks, clearly we need to work on the presentation of the references. I really hope we can get some help from the people focused on citation like @Daniel Mietchen:. To clarify where things are coming from, the drug interaction and its reference are located on the wikidata item associated with the protein product for that gene https://www.wikidata.org/wiki/Q21105636 under the 'physically interacts with' property. The Lua script that builds the template pulls such interactions by traversing from the gene item to the protein item in the same way that it gets the Gene Ontology annotations. The references for Genetically Related Diseases (Retinitis pigmentosa) are indeed coming directly from https://www.wikidata.org/wiki/Q17907825 . Look for the 'genetic association' property on that item. --Benjamin Good (talk) 18:32, 3 August 2016 (UTC)[reply]
@I9606: OK, thanks. I now see the reference links for Dequalinium and Retinitis pigmentosa (in Wikidata pages, the Safari search tool doesn't seem to search beyond what is visible in the currently displayed browser window). I hope someone is able to come up with a better long term solution to the display of the citations. A partial solution for the PubMed citation would be to parse the bare url https://www.ncbi.nlm.nih.gov/pubmed/12508052 for the pmid and replace the displayed url with "PMID 12508052" that renders as "PMID 12508052". Boghog (talk) 19:49, 3 August 2016 (UTC)[reply]
Hi, @Boghog: I have been adjusting the citation format for genetically related diseases. I created an example here that I think would be a better solution (https://en.wikipedia.org/wiki/User:Julialturner/RELN). What are your thoughts? Julialturner (talk) 04:42, 27 September 2016 (UTC)[reply]
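The partial solution suggested above for PubMed links (extract the PMID from the bare URL and display "PMID nnn" instead) could look roughly like the following Python sketch. The function names and the exact URL pattern accepted are assumptions; other reference URLs are passed through unchanged.

```python
import re
from typing import Optional

# Matches bare PubMed URLs of the form quoted in the discussion.
_PUBMED_URL = re.compile(
    r"^https?://(?:www\.)?ncbi\.nlm\.nih\.gov/pubmed/(\d+)/?$"
)


def pmid_label(url: str) -> Optional[str]:
    """Return 'PMID <id>' for a bare PubMed URL, else None."""
    match = _PUBMED_URL.match(url.strip())
    return f"PMID {match.group(1)}" if match else None


def display_reference(url: str) -> str:
    """Readable label for PubMed links; any other URL is left as-is."""
    return pmid_label(url) or url
```

For example, `display_reference("https://www.ncbi.nlm.nih.gov/pubmed/12508052")` gives `"PMID 12508052"`, while a Wikidata item URL is returned untouched.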

renderCaption

@Julialturner: The not yet implemented renderCaption function will indeed have a problem trying to get information dynamically from comments in Commons. Wikidata provides a property media legend (P2096) that is monolingual text designed to carry the image caption and it would be better to add that property manually (or perhaps by a bot fetching comments from Commons?) to each gene entry, as it would then make the image legend available programmatically to all. There's an example of using the property to fetch the image caption in {{Infobox telescope}}, which relies on the getImageLegend call in Module:Wikidata (currently lines 890–963). It implements arbitrary access, preferred ranks (to cope with multiple images) and uses the local wiki language unless a language ISO code is given in the call:

  • {{#invoke:Wikidata |getImageLegend |FETCH_WIKIDATA |id=Q1513315}} → The South Pole Telescope in November 2009
  • {{#invoke:Wikidata |getImageLegend |FETCH_WIKIDATA |lang=lt |id=Q1513315}} → Pietų ašigalio teleskopas 2009 m. lapkritį

It should be easy for you to adapt the Wikidata module code to fill in renderCaption so that it gets a media legend (P2096), leaving editors with the job of updating the relevant Wikidata entries in their own time. Hope that helps. --RexxS (talk) 23:10, 28 February 2017 (UTC)[reply]

@RexxS: I appreciate your suggestion very much. I will definitely try to adapt the wikidata module code here and see if I can get the captions into the infobox. Julialturner (talk) 05:20, 1 March 2017 (UTC)[reply]
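As a rough illustration of the lookup described above (prefer the preferred-rank image, then pick the media legend (P2096) qualifier in the requested language), here is a Python sketch over Wikidata-style claim dictionaries. It is not the code of Module:Wikidata; the function name and the fallback to normal-rank claims are assumptions.

```python
from typing import List, Optional


def get_image_legend(image_claims: List[dict], lang: str = "en") -> Optional[str]:
    """Pick a media legend (P2096) for the best-ranked image claim.

    image_claims is assumed to be the list of image statements of one
    item, in the shape used by Wikidata's JSON export.
    """
    preferred = [c for c in image_claims if c.get("rank") == "preferred"]
    # Assumption: fall back to normal-rank claims if none is preferred.
    candidates = preferred or [c for c in image_claims if c.get("rank") == "normal"]
    for claim in candidates:
        for qualifier in claim.get("qualifiers", {}).get("P2096", []):
            value = qualifier.get("datavalue", {}).get("value", {})
            if value.get("language") == lang:
                return value.get("text")
    return None  # no legend in that language; caller shows no caption
```

Returning `None` when no legend exists in the requested language leaves the decision about a fallback caption to the caller.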

Invalid HTML

@Julialturner: It seems this infobox outputs some incorrect HTML. The parser mostly cleans this up, but it fails sometimes and then the infobox breaks in VisualEditor for instance as it does for BCL2-related_protein_A1. If you go to this page, you can see that the td, and tr elements are sometimes not closed, or closed too often, forcing the parser into a bit of a guessing game. —TheDJ (talk • contribs) 14:21, 16 December 2016 (UTC)[reply]

@TheDJ: Thank you for notifying me of this issue. I will look into correcting the code. Julialturner
Hi, @TheDJ: I did some code cleanup and now all the tag elements should be closed. Julialturner (talk) 06:55, 6 January 2017 (UTC)[reply]
I ran into this via some testing as well. The problem is on line 1603 in the module. It emits a table directly inside a tr tag. You are missing a required td tag to wrap the table - only td, th, and caption can include content in a table. That is at least one thing that I ran into that needs fixing. SSastry (WMF) (talk) 18:17, 21 March 2017 (UTC)[reply]
@Julialturner: In case you didn't see my earlier message. Pardon me if you have seen that and just haven't gotten around to looking into it yet. Thanks. SSastry (WMF) (talk) 16:19, 13 April 2017 (UTC)[reply]
@SSastry (WMF): Thank you for reminding me. I haven't had a chance to fix this yet, but I will try to fix it in the next week or so. 76.167.64.98 (talk) 20:09, 13 April 2017 (UTC)[reply]
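One way to catch the specific nesting problem reported above (a table emitted directly inside a tr, without a wrapping td) is to scan the rendered HTML with a small checker. The following Python sketch uses the standard library's `html.parser`; the class and function names are illustrative, and it only looks for this one pattern rather than validating HTML in general.

```python
from html.parser import HTMLParser
from typing import List

TRACKED = {"table", "tr", "td", "th", "caption"}


class TableNestingChecker(HTMLParser):
    """Records every <table> whose nearest tracked ancestor is a <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.problems: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in TRACKED:
            return
        if tag == "table" and self.stack and self.stack[-1] == "tr":
            line, _ = self.getpos()
            self.problems.append(f"line {line}: <table> directly inside <tr>")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching opener; tolerate unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def find_table_in_tr(html: str) -> List[str]:
    checker = TableNestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

Running it over saved infobox output would list each offending position, which may be easier than reading the module line by line.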

Biomedical content - disease associations and interactions

Please see Wikipedia_talk:WikiProject_Medicine#More_Wikidata_funk_-_infobox_gene Jytdog (talk) 01:28, 5 April 2017 (UTC)[reply]

Not seeing the funk discussion there now, assuming it's been resolved. --Benjamin Good (talk) 16:54, 26 April 2017 (UTC)[reply]
Nope it was not. A means of addressing the problem is being discussed below. Jytdog (talk) 17:45, 26 April 2017 (UTC)[reply]

Seriously, how do we suppress individual fields

So at KCNB1 the infobox had several pieces of content that were WP:Biomedical information not sourced per MEDRS, and I couldn't remove them, so I went into Wikidata and removed them there. Some of it has now been restored in Wikidata. I have no desire to get into arguments at Wikidata about content that is appearing in Wikipedia.

So how do we suppress individual fields at a local article? One of the bad fields was "genetic association". The other bad field is more complex as I mentioned above...

I don't want to go nuclear and call for this infobox to be nuked but if we cannot selectively control what comes in, that will be the only option. Jytdog (talk) 06:29, 16 April 2017 (UTC)[reply]

Adding the ability to suppress individual fields based on passing in a parameter like 'nodisease' etc. seems doable. A more general solution would be to consistently identify which kinds of sources were acceptable for the MEDRS folks, tag them as such within Wikidata, and then add code that would act accordingly without the requirement to touch individual infoboxes. We will have a look at both options. By the way, it's grossly inappropriate to threaten 'nuclear options' in your discussions here. Please keep your tone under control and we will work together to continue to improve Wikipedia. --Benjamin Good (talk) 16:54, 26 April 2017 (UTC)[reply]
Thanks for replying! It would not be appropriate to try to enforce en-WP standards in Wikidata, and if someone did, there is no policy basis there to object if someone should revert. I also have no desire to try; I have no desire to edit Wikidata on any kind of regular basis. If folks want to bring in Wikidata for some things in en-WP via infoboxes that is fine of course, but there must be a way to exclude unreliable data from appearing in en-WP, from within en-WP, that is reasonably easy to implement at the template level or on a per article basis. There is nothing inappropriate (much less "grossly") about proposing to delete a template, which is what I will indeed do, if there is no way to control things at field level. I can wait a while but I hope that field-level control will be introduced soon. Thanks again for replying! Jytdog (talk) 17:42, 26 April 2017 (UTC)[reply]
I looked into it and it is indeed fairly straightforward to adjust this template to take in a parameter like |showdisease=false and hide a section. But looking at what you are attempting to conceal, I'm not really convinced it's a good idea. Looking at ATG16L1 for example, there is a line in the infobox saying that there is a genetic association between the gene and Crohn's disease. That relationship is supported by a link to the database entry where this information was gathered that in turn links to several journal articles and entries in other databases that support the claim. I think this is useful information to people interested in this gene. I don't see how it is medically dangerous. I'm concerned that adding a 'hide' function to this module will result in good information like that being hidden from view. I'd like to hear more from the biology community e.g. @Boghog: before making a change. --Benjamin Good (talk) 05:22, 2 May 2017 (UTC)[reply]
@I9606: Unfortunately in the example you give, the association between ATG16L1 and Crohn's disease is supported in the Gemma database by three primary studies and one secondary (a meta-analysis). It is not acceptable per WP:MEDRS to have a situation where biomedical claims are made though an indiscriminate process that does not guarantee that secondary sources are used. You will note that the article ATG16L1 does not present an association as fact, although the Crohn's disease article discusses associations in greater detail. A bald assertion that gene X is genetically related to disease Y, without any assurance that the conclusion is based on anything more than a primary study, breaches MEDRS and is completely unsuitable for inclusion in an infobox where information must be clear-cut and not subject to caveats. --RexxS (talk) 12:12, 2 May 2017 (UTC)[reply]
User:RexxS do you think we should just exclude the disease field from this template? Checking its output article by article is unreasonable, probably. Jytdog (talk) 13:22, 2 May 2017 (UTC)[reply]
The Gemma database does have a "Quality code" that might help us restrict the associations to only those supported by high quality sources. Unfortunately they don't seem to discriminate between primary studies and meta-analyses which all received one star. The OMIM tertiary source received three stars and I am not sure how critical OMIM is. Boghog (talk) 13:48, 2 May 2017 (UTC)[reply]
@Jytdog: I'd prefer to develop methods of marking the Wikidata with a field indicating quality of evidence, so we could filter the data returned, but we're a long way from that at present. I'd agree that in the meantime, we ought to seriously consider leaving out of infoboxes any claims that don't meet our own policies of V, RS and MEDRS. It's just making a rod for our own backs when we're trying to import reliable data. @Boghog: You only have to look at the 7 sources that OMIM cites: 1 mouse study, 2 comparative studies, 3 research studies and 1 meta-analysis. The meta-analysis concluded "the ATG16L1 T300A polymorphism was associated with CD risk in Caucasians ... no significant association was found in Asians." How are you going to fit that can of worms into an infobox? --RexxS (talk) 16:22, 2 May 2017 (UTC)[reply]
I wasn't suggesting that we cite OMIM, just suppress associations with low quality as a potential way of increasing the reliability. But it looks like Gemma database quality ratings are too crude for that purpose. Boghog (talk) 16:35, 2 May 2017 (UTC)[reply]
User:RexxS For what it's worth, it's far from an indiscriminate process and that information is vastly more likely to be of use to people looking for information about that gene than it is to cause some sort of medical harm - which as I understand it is the main motivation behind the MEDRS approach. But I'm not going to continue that particular argument here. I concur with your sentiment about developing "methods of marking the Wikidata with a field indicating quality of evidence" and I don't see why it's so far off. In fact we already have code in place that prevents unreferenced statements from appearing in the infobox. The claim in question here has references saying the claim is stated in Phenocarta and in a journal article, both of which have items in Wikidata. All we would need to do would be to add a property to those items indicating that they were 'MEDRS approved references' or something like that and then we could build a filter into the template that would automatically hide any claims that did not meet the criteria. Another approach that could be implemented right now without creating any new properties or tagging anything would be to use the determination method qualifier property on the claim. In this case, the claim is made based on evidence from a genome wide association study or 'GWAS'. If we can decide on what forms of evidence are suitable for inclusion of a given kind of claim - e.g. 'genetic association', we could use that as a filter immediately. --Benjamin Good (talk) 17:03, 2 May 2017 (UTC)[reply]
The Gemma database ratings are precisely what I was describing as indiscriminate - which for our purposes, they are if a 3 star rating means "1 mouse study, 2 comparative studies, 3 research studies and 1 meta-analysis with conflicting conclusions". Without taking the time to evaluate each entry in the Gemma database against MEDRS, how do you propose we eliminate the possibility that a Wikidata claim may be insufficient to support a biomedical claim by our standards? If people find the information in Phenocarta (Q22330995) interesting, then they can read it there. We have no need to include information indirectly via Wikidata that falls short of what we require from any source used directly in an article. I've invested a lot of time and effort over the last few years in creating and refining tools to facilitate the import of information from Wikidata, and the last thing I want to see is it being thrown away because of knee-jerk reactions by the community to accusations that "Wikidata is not a reliable source" and "Using Wikidata breaches our Verifiability policy". To get an idea of what we're going to be up against, take a look at Wikipedia:Wikidata/2017 State of affairs #Perceived disadvantages of using Wikidata on enwiki – or at the edit war to force inclusion of the phrase "The lack of reliable sourcing means that imported Wikidata text violates WP:V and WP:BLP". Are you sure it's a good idea to add "violates MEDRS" as ammunition for the nay-sayers? --RexxS (talk) 18:02, 2 May 2017 (UTC)[reply]
User:RexxS why is it better, in your opinion, to put out a blanket deletion of content than to formalize and clarify the reasons for not showing it? How else do you imagine bridging the gap between structured content and Wikipedians? Wikidata has, and will continue to have, a variety of different sources for the information it contains. We need to develop mechanisms for taking advantage of its infrastructure to provide the community of data consumers (Wikipedia here) with ways to use it as they see fit. I would much rather have the MEDRS people make a definitive statement like "we don't think GWAS studies provide enough evidence to justify showing the resultant content in Wikipedia" and encode that logic in a combination of wikidata statements and template code than I would like to see people simply deleting infoboxes or data in wikidata. Do you have an alternative plan for the future here? --Benjamin Good (talk) 19:20, 2 May 2017 (UTC)[reply]
It is better, in my opinion, not to display a field in an infobox than to display information that falls short of what is required by our guidelines on sourcing. I have difficulty understanding your reasons for taking the opposite view; perhaps you could explain them to me? If you would kindly elucidate what you mean by "the gap between structured content and Wikipedians", I might better be able to see why you are taking your position.
WP:MEDRS is not a person, but a Wikipedia guideline that enjoys project-wide community support. We are all "MEDRS people" just as we are all "NPOV people", etc. so there's no reason why you can't make the definitive statement yourself: it would carry the same weight (i.e. one person's opinion).
If you want to "encode that logic in a combination of wikidata statements and template code", then I'm all in favour: please go ahead and do it. Let me know as soon as you've finished, and you'll have my support for importing the filtered data into {{Infobox gene}}. In the meantime, how do you suggest we suppress individual fields in this infobox, as the topic of this thread asks? If we can't ensure that the infobox doesn't include dud information from Wikidata, nor remove the offending claim from Wikidata, what solution are you offering now? --RexxS (talk) 20:25, 2 May 2017 (UTC)[reply]
As someone working here in en-Wikipedia, User:I9606 you are obligated to follow en-WP policies and guidelines and as RexxS noted MEDRS has broad and deep consensus in en-WP. We all know that there is no similar thing in Wikidata and as I noted way above, I have zero desire to try to change Wikidata or enforce en-WP policies and guidelines there. But the differing policy/guideline environments are why care must be taken moving data between the two projects. Jytdog (talk) 20:31, 2 May 2017 (UTC)[reply]
  • so at this point I am asking that the "diseases" field and the field that produces "interacts with" be removed from this template. Both get into health content that we cannot rely on to be MEDRS-sourced in Wikidata. The purely biochem data, I don't mind. But these two should go. Would you please remove them from the output? (here is the corresponding thread at WT:MED that I had opened. There have been a bunch of discussions at WT:MED about these kinds of fields in Wikidata. See for example this one which was probably the most clear about the concerns). Thanks. Jytdog (talk) 17:50, 2 May 2017 (UTC)[reply]

I removed the offending information from the infobox. The change should percolate. More later on a strategy for moving forward and why I think this was not such a great move. --Benjamin Good (talk) 21:07, 2 May 2017 (UTC)[reply]

Thank you!!! That was a great move. Jytdog (talk) 22:23, 2 May 2017 (UTC)[reply]
That change just made more than 11,000 articles less informative, losing the community value produced by a lot of people's hard work here, in wikidata, and elsewhere. I personally don't think that was a positive change. I did it as a temporary patch while a pattern that is more satisfactory to the majority of WP editors can be constructed. Benjamin Good (talk) 03:36, 3 May 2017 (UTC)[reply]
That change just removed unreliable information from 11,000 articles. en-WP =/= Wikidata and I appreciate you respecting the policies and guidelines of en-WP in deed, if not in word. Thanks again. (I really mean that. I did not want this to become a dramafest. So thanks. ) Jytdog (talk) 05:12, 3 May 2017 (UTC)[reply]
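The two mechanisms raised in this thread, a per-article parameter such as |showdisease=false and a filter on the determination method qualifier, can be sketched as follows in Python. The helper names, the `show<section>` naming rule, and the flat `determination_method` key are assumptions for illustration; which methods would be acceptable was not settled in the discussion, so the allowed set is left to the caller.

```python
from typing import Dict, Iterable, List, Optional


def parse_flag(value: Optional[str], default: bool = True) -> bool:
    """Interpret a template parameter such as 'false' or 'no'."""
    if value is None:
        return default
    return value.strip().lower() not in {"false", "no", "0"}


def visible_sections(sections: Dict[str, str], params: Dict[str, str]) -> Dict[str, str]:
    """Drop any section whose 'show<name>' parameter is switched off."""
    return {
        name: body
        for name, body in sections.items()
        if parse_flag(params.get(f"show{name}"))
    }


def filter_by_method(claims: List[dict], allowed_methods: Iterable[str]) -> List[dict]:
    """Keep only claims whose determination method is on the allowed list."""
    allowed = set(allowed_methods)
    return [c for c in claims if c.get("determination_method") in allowed]
```

With this shape, `visible_sections(sections, {"showdisease": "false"})` hides only the disease section, and passing an empty allowed list to `filter_by_method` hides every claim of that kind, which corresponds to the interim state reached at the end of the thread.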

Linking to the UCSC genome browser for mouse chromosomes is broken

See e.g. the link at Peripheral myelin protein 22, which goes to human chromosome 11 instead of mouse chromosome 11. Substituting 'hg38' with 'mm10' in the browser URL field seems to do the trick (in the final URL); as does substituting 'mm0' with 'mm10' in the URL provided in the article. --Njardarlogar (talk) 17:11, 11 June 2017 (UTC)[reply]

 Done Fixed[1] --Was a bee (talk) 14:57, 28 July 2017 (UTC)[reply]
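The bug described above amounts to the assembly identifier not following the species. A minimal Python sketch of building the link with the assembly chosen per species is shown below; the hg38 and mm10 identifiers come from the report, while the function name, the mapping, and the exact query layout of the browser URL are assumptions.

```python
from urllib.parse import urlencode

# Assembly identifiers named in the report above.
ASSEMBLY = {"human": "hg38", "mouse": "mm10"}


def ucsc_url(species: str, chromosome: str, start: int, end: int) -> str:
    """Build a genome browser link using the assembly for the species."""
    db = ASSEMBLY[species]  # KeyError for unknown species is deliberate
    query = urlencode({"db": db, "position": f"chr{chromosome}:{start}-{end}"})
    return f"https://genome.ucsc.edu/cgi-bin/hgTracks?{query}"
```

Looking the assembly up from a single table keyed by species makes it harder for a mouse row to inherit the human identifier or a mistyped one.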

Multiple errors

I just undid the most recent changes as they seem to be breaking the infobox on a number of pages, in particular MT-ND4, MT-CYB, MT-ND3, MT-ND1, MT-ND5, MT-ND6, MT-ND4L. At time of writing they have the error Lua error in Module:Infobox_gene at line 927: attempt to perform arithmetic on local 'chrLength_mm' (a nil value). in place of the infobox. As the box contains no parameters I cannot fix it in the articles, and do not feel confident fixing it in the module as it is unclear what it is trying to do with that data.--JohnBlackburnewordsdeeds 18:28, 19 August 2017 (UTC)[reply]

@JohnBlackburne: Sorry for the trouble. These are mtDNA pages, which need somewhat special treatment among the roughly ten thousand gene pages. I have already written new code which doesn't generate the error. I'm fixing details and related settings now. Thank you. --Was a bee (talk) 19:17, 19 August 2017 (UTC)[reply]
 Done Fixed. (diff from former revision) --Was a bee (talk) 07:48, 20 August 2017 (UTC)[reply]
Thanks. I wish there was some better way to handle this. From your description this is some edge case you were not aware of, so did not test against. But that happens all the time with infoboxes and templates. Often the only way to test them fully is to publish your changes and see what breaks.
The problem is with the way this template is implemented. Other templates that go wrong are small, and so have little impact if they break. Or if they are large like {{infobox settlement}} they are mostly Wikitext, so if a script goes wrong it only affects one element of it. But this template/module is entirely implemented in code. If it goes wrong nothing is rendered except the error. And as many articles containing it contain little else it effectively wrecks them.
If a template breaks in a few articles it’s often easy to identify the problem with those articles; perhaps a particular parameter is missing or incorrect, and it can be fixed in the articles. But this template normally takes no parameters, so even when the error message suggests some data is missing (as this did) it’s of no help.
It would be better perhaps if this template worked more like other infobox templates, with the layout done in Wikitext, with individual fields implemented as calls to templates/modules. Then if one part of it fails the rest of it still works. It would also be much easier to identify what the problem is, if it were with just one part of it. Those individual parts would be much simpler, much easier to fix if they were broken. Long term we can’t assume someone familiar with the code such as yourself will always be available, so having it as maintainable as possible is also important.--JohnBlackburnewordsdeeds 12:19, 20 August 2017 (UTC)[reply]
Certainly. You have a point. This template is complex. But at the same time, it seems there is a reason for that: the Wikidata data structure is not simple. For example, genomic data of ABO (gene) for human is stored at d:Q14839826 (this page is connected to the Wikipedia page). But genomic data of ABO (gene) for mouse is stored at d:Q14839892. These two pages are connected to each other through the "ortholog" (P684) property. So this module finds the mouse data through the "ortholog" property on the human genomic data page and shows it. I don't know whether it is possible to simplify the code that handles this kind of task. But if possible, it would be nice :) --Was a bee (talk) 19:41, 23 August 2017 (UTC)[reply]
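Two ideas from this thread can be illustrated together: guarding arithmetic against missing values (the reported error was arithmetic on a nil chromosome length), and rendering each section independently so that one failure does not replace the whole infobox. The Python sketch below uses hypothetical function names and a made-up placeholder string; in the Lua module the analogous isolation tool would be `pcall`, though whether that suits this module is not something the discussion settles.

```python
from typing import Callable, List, Optional, Tuple


def position_fraction(start: Optional[int], chr_length: Optional[int]) -> Optional[float]:
    """Relative position on the chromosome, or None when data is missing."""
    if start is None or not chr_length:
        return None  # e.g. no chromosome length recorded for this genome
    return start / chr_length


def render_infobox(
    section_renderers: List[Tuple[str, Callable[[dict], str]]],
    data: dict,
) -> str:
    """Render every section; a failing section yields a placeholder only."""
    parts = []
    for name, render in section_renderers:
        try:
            parts.append(render(data))
        except Exception as exc:  # isolate the failure to this section
            parts.append(f"[{name}: no data available ({type(exc).__name__})]")
    return "\n".join(parts)
```

With this arrangement an article whose item lacks one value would still show every other section, and the placeholder names the section that failed.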

Breaks when wikipedia article renamed ?

I renamed old revision and it now gives an infobox error. Looked at the template talk page (and two wikidata items CMTM7 and cmtm7) but can't see what is wrong or how to fix it. - Do I have to change something in wikidata or should the template still work ? - Rod57 (talk) 15:41, 3 September 2017 (UTC)[reply]

Gene location column added

New column added

Even though there are gene location data at Wikidata, the data was not shown in Wikipedia. So I implemented the gene location column to the gene infobox (diff).

Addition in a nutshell:

  • Genetic location is shown graphically and numerically with the source information. Gene position data comes from Wikidata. The location data have been maintained by wikidata:User:ProteinBoxBot (operated by Su team). Thank you team!
  • Cytogenetic band data is not yet implemented (shown as No data available). Inclusion is now under discussion at wikidata:Wikidata:Property proposal/Cytogenetic location. More participants are needed.
  • Some technical notes on the newly added column are here (User:Was a bee/Gene).

I expect the newly added part to work properly on all gene article pages, but if you find any errors or bugs, or have any comments, feel free to post here! Thanks!--Was a bee (talk) 11:40, 18 August 2017 (UTC)[reply]

Some page links for your convenience when checking

@Was a bee: A problem is appearing in several articles, for example, CALM2 and HSPA1A. I had a very quick look at the module to see what was happening. If you wanted an alternative to unpack(args), see function wikidata_call in Module:Convert. That won't help with the error but ... is recommended. Johnuniq (talk) 07:52, 19 August 2017 (UTC)[reply]
I cleaned out a few unrelated articles and this API call lists articles with a script error. Johnuniq (talk) 08:04, 19 August 2017 (UTC)[reply]
@Johnuniq: Thank you. That API link is very helpful. I will investigate the cause of the errors. --Was a bee (talk) 09:29, 19 August 2017 (UTC)[reply]

@Was a bee: The cytogenetic locations for human genes have now been added. Gstupp (talk) 16:52, 18 September 2017 (UTC)[reply]

@Gstupp: Oh, amazing! When I saw this page (d:Wikidata:Database reports/Constraint violations/P4196), I was surprised by the number of pages. The cytogenetic location property is currently used on 55k pages in the gene category. That is an astonishing number. Thank you very much! --Was a bee (talk) 11:39, 19 September 2017 (UTC)[reply]

Add an optional autocollapse parameter to collapse "Gene location (Human)" and "RNA expression pattern"?

Wondering what others think about this. Autocollapsing those sections would improve page formatting in some gene articles where the infobox breaks into sections where images are located and pushes them downward as a result. Seppi333 (Insert 2¢) 20:14, 25 November 2017 (UTC)[reply]

Infobox gene mysteriously forgets to add commas between aliases

In Special:Diff/830125782, I tried to add a (rather incorrect) infobox for Dll on the DLX family page by manually specifying a root_qid, and was greeted by a very wide infobox. It appears that all the commas between aliases are lost for some reason. Manually fetching the entity and its aliases via REPL seems to give normal results, so something else must be wrong. Can it be the "get rid of gene name if in aliases list" part? --Artoria2e5 contrib 22:30, 12 March 2018 (UTC)[reply]

Bingo! It seems that gene_symbol is nil for Dll. Perhaps we should make this thing a bit less human-centric starting on April Fool's Day. --Artoria2e5 contrib 22:32, 12 March 2018 (UTC)[reply]
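A minimal sketch of the step suspected above ("get rid of gene name if in aliases list"), written so that a missing gene symbol cannot disturb the comma-separated output. The function name is hypothetical, and the sketch does not claim to reproduce the module's actual logic or the exact way the commas were lost.

```python
from typing import List, Optional


def format_aliases(aliases: List[str], gene_symbol: Optional[str] = None) -> str:
    """Join aliases with commas, dropping the gene symbol when it is known.

    When gene_symbol is None nothing is removed and the list is still
    joined normally.
    """
    shown = [alias for alias in aliases if alias != gene_symbol]
    return ", ".join(shown)
```

Filtering the list first and joining once at the end keeps the separator logic independent of whether a symbol exists, which matters for non-human genes such as Dll.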

Linking "Ortholog"

The rendered output uses (where appropriate) the term "orthologs", which is a term a general reader is unlikely to have heard of before. We have a redirect at ortholog that seems to remedy that; I believe we should link the term in the rendered output. I've tried this in the sandbox, and it seems to work, so I've implemented it here. Please back the change off if it breaks anything. -- The Anome (talk) 16:56, 5 June 2018 (UTC)[reply]

The links to IUPHAR have recently been split into targets and ligands. Since proteins can behave either as targets (Dopamine receptor D1) or ligands (insulin), Infobox gene should be updated to reflect the new IUPHAR database structure. A new wikidata property has been created for this purpose (Guide to Pharmacology Target ID).

I modified the sandbox (diff) to display this IUPHAR data and the test cases appear to work (see Template:Infobox gene/testcases, compare the External IDs section where IUPHAR IDs now appear in the sandbox version). Does this look OK?

The second question is how to upload the data into wikidata. {{IUPHAR}} has a mapping between HUGO gene symbol and the IUPHAR target IDs (although some of these need to be updated). I am not very familiar with wikidata, but I have added the data manually (e.g., diff) for the test cases. I can provide an Excel file with the required data, but is HUGO gene symbol sufficient to locate the required wikidata entry (e.g., Dopamine receptor D1 (Q21110867))? I assume that I should file a bot request to get this done, but I am unsure whether the HUGO gene symbol is sufficient information. Suggestions would be appreciated. Boghog (talk) 20:27, 27 July 2018 (UTC)[reply]

I noticed that wikidata is not populated with the enzyme commission numbers, but the infobox can display them if the data is loaded into wikidata (see for example diff and ADH1A). As with the IUPHAR request above, I can supply an Excel sheet with the required data, but I am unsure about which identifier I should use to locate the required human protein page. Is the Hugo Gene symbol sufficient? Boghog (talk) 20:34, 27 July 2018 (UTC)[reply]

Making Infobox gene more understandable/useful for the general reader

There has been some rather passionate discussion about making Infobox gene more accessible for the general reader (see here and here). One idea that had gained some support was promoting a subset of the most important GO data for display in an uncollapsed form near the top of the infobox. These key properties would provide answers to basic questions such as what is the function, mechanism, and subcellular location of the protein.

Using the Gene Ontology section in Beta-2 adrenergic receptor as an example, if we were to rank a subset of GO data as "preferred" for the wikidata adrenoceptor beta 2 (Q287961) data set (see diff), the following could be displayed in the infobox:

Key Properties
Property              Description
Biological function   regulation of smooth muscle contraction
Molecular mechanism   G-protein coupled receptor activity, epinephrine binding, norepinephrine binding
Subcellular location  membrane

For BRCA1 the promoted data could look like:

Key Properties
Property              Description
Biological function   DNA repair
Molecular mechanism   damaged DNA binding
Subcellular location  plasma membrane, cytoplasm, cell nucleus

First question: Is there support for such a proposal? Second question: if so, would someone be willing to make a mockup in a sandbox? I would do this myself, but I am not very familiar with lua. Boghog (talk) 20:59, 27 July 2018 (UTC)[reply]

Promoting the data would be a lot of work, but because proteins within the same family are similar, the process could be sped up considerably if the data were processed family-wise. The process could be partially automated:

  1. download the GO data from Wikidata into a spreadsheet (automated)
  2. organize by family (automated)
  3. select preferred data (manual)
  4. adjust the wikidata ranking of the selected data (automated)

Boghog (talk) 21:12, 27 July 2018 (UTC)[reply]
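
Steps 1 and 2 of the workflow above could be sketched roughly as follows. This is a minimal illustration in Python, not part of the module: the function name `terms_by_family`, the input shapes, and the placeholder gene and family names are all assumptions made for the example, and the annotations are invented rather than taken from Wikidata.

```python
from collections import defaultdict

def terms_by_family(annotations, family_of):
    """Group GO terms by protein family and count how many members carry each term.

    annotations: iterable of (gene, go_term) pairs (hypothetical export format).
    family_of:   dict mapping gene -> family name (hypothetical lookup table).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for gene, term in annotations:
        counts[family_of[gene]][term] += 1
    # Convert nested defaultdicts to plain dicts for easy inspection.
    return {family: dict(terms) for family, terms in counts.items()}

# Placeholder data for illustration only.
family_of = {"GENE_A": "family X", "GENE_B": "family X", "GENE_C": "family Y"}
annotations = [
    ("GENE_A", "term 1"),
    ("GENE_B", "term 1"),
    ("GENE_B", "term 2"),
    ("GENE_C", "term 3"),
]
grouped = terms_by_family(annotations, family_of)
```

A reviewer could then scan each family once, starting with the terms shared by most members, before doing the manual selection in step 3.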

@Boghog: That is an interesting feature. First, implementing "filtering by rank" in Lua is technically possible using claimRanks. So I think I can implement that.
One question I have is how to select the preferred data. I certainly agree that the "Key Properties" feature is good and very helpful. But at the same time, I feel that we have to be able to show the reason or algorithm for how we chose something as a key property, especially for the function field. For example, HBA1 (a building block of hemoglobin, which is an important component of red blood cells) has various GO functions. How do you choose one key property among those? Is there an algorithm or idea for this? --Was a bee (talk) 12:49, 28 July 2018 (UTC)[reply]
@Was a bee: Thanks for your response and for confirming that it is technically possible to implement this proposal. One thing that I wanted to make absolutely clear is that there is no reliable algorithm to select key properties. This subjective selection must rely on human editors. The problem with GO descriptions is that for many proteins, the list of terms is so overwhelming that it is useless to the average reader. We need to filter the list to highlight the most important GO terms. For well understood protein families, this is fairly straightforward. For less studied proteins, it may not be so clear. Hence promotions of GO terms should only be done by editors who have taken the time to read the relevant literature. Another important point is that this is an editor-controlled "opt-in" scheme. If an editor does not make a conscious decision to promote a GO term, it will not be displayed. If another editor objects to the decision, it can be reverted. In short, the selection is editor driven, not bot driven. Finally I wanted to emphasize that there is no rush to promote GO terms. We can start with well understood protein families where the assignment of primary function is uncontroversial. More difficult cases can wait. Boghog (talk) 16:01, 28 July 2018 (UTC)[reply]
@Boghog: Thank you for answering.
  • I think it is OK. I think it is something like a commonsense summary among researchers who know the topic and share context. When I am asked "Where is Los Angeles?", I'll answer "California" or "West coast" or "America". I don't answer "On the surface of the earth" or "On the east side of the Pacific Ocean" or "Milky Way galaxy". I can't say clearly why this is so, but shared context forces me to do so.
  • I did a simple test edit in the sandbox[2] that shows only "preferred" GO items. Here is a sample page that uses the sandbox module (User:Was a bee/gene sandbox). --Was a bee (talk) 22:09, 29 July 2018 (UTC)[reply]
(edit conflict) The analogy between city/geographical location and protein/subcellular location is not a good one since a city has only one instance and hence only one location whereas there can be many copies of the same protein that can be distributed in more than one location. In biology, there are at least three parts to the location question: (1) species (specified by External IDs/HomoloGene), (2) tissue (specified indirectly by "RNA expression pattern"), and (3) subcellular location (specified by the GO cellular component term). The best way to give context is through internal wikilinks that define what the subcellular location means. Perhaps we should add "species distribution" (e.g., primate, mammals, vertebrates, eukaryotes, etc.) and "tissue distribution" (captured from, for example, the Human Protein Atlas) as key facts, but that would be two new projects.
Thanks for your test edit that displays only preferred GO data. Just to clarify, what I had intended is that all the GO data be retained in a collapsed form as is now done, and that in addition a new "key facts" section be inserted in an uncollapsed state. Boghog (talk) 05:10, 30 July 2018 (UTC)[reply]
I think the key word here is "context". For Los Angeles, California is the proper context, for earth, the solar system is the proper context, for a protein, a species, tissue or part of a cell is the proper context. Boghog (talk) 05:57, 30 July 2018 (UTC)[reply]

Just sharing some quick thoughts here. Overall, I'm super supportive of this infobox finally getting a refresh. I've been meaning to chime in on the other longer discussion, but just haven't found the time. On this specific proposal, I have two questions/concerns. First, is the plan that when preferred rank statements exist, the normal rank statements are not shown? I'm not sure I support that. As an alternative, perhaps those statements get bolded and put at the top of each section? Second, I'm not sure I like the idea of using the Wikidata rank system. Since WD changes do not show up in WP edit history, changes in WD don't trigger WP watchlists. And I know this has been a source of friction from the WP community in the past. As an alternative, perhaps the preferred GO terms can be indicated in the #invoke statement? Anyway, again, just throwing out some ideas for discussion. Unfortunately I'm a bit tied up over the next 2 weeks so forgive me if I can't stay engaged here. But definitely don't let that stop you from being bold and moving forward! Best, Andrew Su (talk) 04:44, 30 July 2018 (UTC)[reply]

Thanks Andrew for your reply. Just to clarify, my proposal was to display all the GO data in a collapsed state as is done now, but only display "preferred" GO data in an uncollapsed state. Boghog (talk) 05:12, 30 July 2018 (UTC)[reply]
Also thanks for pointing out the issue of using the wikidata ranking system. Specifying preferred data using #invoke statements might be better from a documentation standpoint, although not as clean. I could support either solution. Boghog (talk) 06:08, 30 July 2018 (UTC)[reply]
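
The display rule discussed in this thread (all GO data kept in the collapsed list, only "preferred" statements promoted to an uncollapsed "Key Properties" section) can be sketched as below. This is a hedged Python illustration and not the module's Lua code: the flat statement dictionaries are a simplified stand-in for Wikidata statements, `key_properties` is a hypothetical helper name, and the normal-rank entry is a placeholder added for the example.

```python
from collections import defaultdict

def key_properties(statements):
    """Return {aspect: [labels]} using only statements ranked 'preferred'."""
    promoted = defaultdict(list)
    for statement in statements:
        if statement["rank"] == "preferred":
            promoted[statement["aspect"]].append(statement["label"])
    return dict(promoted)

# Example based on the beta-2 adrenergic receptor table proposed above.
adrb2 = [
    {"aspect": "Biological function",
     "label": "regulation of smooth muscle contraction", "rank": "preferred"},
    {"aspect": "Molecular mechanism",
     "label": "G-protein coupled receptor activity", "rank": "preferred"},
    {"aspect": "Molecular mechanism",
     "label": "epinephrine binding", "rank": "preferred"},
    {"aspect": "Molecular mechanism",
     "label": "norepinephrine binding", "rank": "preferred"},
    {"aspect": "Subcellular location", "label": "membrane", "rank": "preferred"},
    # Placeholder for any GO term left at normal rank: it stays in the full list only.
    {"aspect": "Biological function",
     "label": "some other GO term", "rank": "normal"},
]

promoted = key_properties(adrb2)   # goes in the uncollapsed "Key Properties" section
full_list = adrb2                  # still rendered in the collapsed GO section
```

Because the selection is opt-in, an item with no preferred statements yields an empty result and the extra section would simply not be rendered.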

Italics

It seems that gene symbols are italicized, and should be when they're the title of the infobox invoked here. See DGCR2. Can this be implemented? If this is the wrong place to ask, can someone point me in the right direction for the request? -- JHunterJ (talk) 19:21, 18 September 2018 (UTC)[reply]

Image caption

Wikidata image (P18) can be accompanied by a caption (P2096) to annotate the image, which is especially useful for protein/receptor-ligand complexes because it helps to know which is which. Maybe add it to getImage()? CC User:Was a bee. --Artoria2e5 🌉 20:46, 11 April 2019 (UTC)[reply]

Hi @Artoria2e5:. That sounds like a good idea. From your user page, I think you can implement the new code yourself. So if you make a test version, I'll test the new code on various pages :) --Was a bee (talk) 07:54, 12 April 2019 (UTC)[reply]

Hi @Julialturner: The old {{GNF Protein box}} produced PDB links (PDB Ortholog search: PDBe RCSB) based on {{Homologene2uniprot}} so that crystal structures for both human and orthologs in other species were returned. The new {{Infobox gene}} returns crystal structures only for mouse. At a bare minimum, the link should be changed to human from mouse since there are far more human crystal structures (currently 35929 human vs. 5474 mouse). Ideally of course, the crystal structures for all the orthologs should be returned. Thanks. Boghog (talk) 12:54, 9 August 2016 (UTC)[reply]

Thanks, @Boghog: I will look into the differences between old and new and see if I can make adjustments to the new infobox. Julialturner (talk) 19:41, 9 August 2016 (UTC)[reply]
@Boghog: the PDB link and RCSB should now have both the human and mouse structures returned. Currently, we are only displaying mouse data, but maybe future development could include other species. Julialturner (talk) 21:33, 9 August 2016 (UTC)[reply]
The current version once again only does it for the human entity_protein. I am thinking about iterating through all of entity:getAllStatements(ortholog_propertyID), grabbing protein_propertyID from genes, and storing the orthologs so getPDB can use them. I guess storing QIDs for mw.wikibase.getAllStatements(id, prop) would be good enough, since we don't need the whole entity (expensive). --Artoria2e5 🌉 20:51, 11 April 2019 (UTC)[reply]

Hi! Revision 892056776 in the sandbox is the new version for displaying human, mouse, and for trying to find more orthologs (diff). If you think it's OK, please merge it into the current Module. --Artoria2e5 🌉 22:14, 11 April 2019 (UTC)[reply]

 Not done: please establish a consensus for this alteration before using the {{edit template-protected}} template. Artoria2e5, the template asks for "a complete and specific description of the request, so that an editor unfamiliar with the subject matter could complete the requested edit immediately". You're certainly not going to get an opinion on humans, mice and orthologs from template editors. Please gather some consensus from folks who may have a clue like Boghog, Julialturner, or probably better still at WT:GEN. A more complete description of what your changes are intended to achieve would be a good starting point. Cabayi (talk) 12:21, 19 April 2019 (UTC)[reply]
I agree that it would have been best to provide a link to all the available orthologs (just to note, the current version of the infobox does provide links to both human and mouse orthologs, see for example Estrogen receptor beta, RCSB PDB link). Before implementing this, we need to test the sandbox version in Module:Infobox gene/testcases. I am not at all sure how to do this for a Lua (programming language) module. Perhaps @Julialturner: or one of the other maintainers of this module could implement test cases? Boghog (talk) 05:09, 20 April 2019 (UTC)[reply]
The sandbox version doesn't seem to return other orthologs (see for example User:Boghog/sandbox for estrogen receptor beta). The PDB contains structures for human and rat, while the links produced by the sandbox version search only for human and mouse, but not rat. Boghog (talk) 05:27, 20 April 2019 (UTC)[reply]
There is also a bug in the sandbox. For example, in this test case: If one uncollapses "List of PDB id codes [show]", the following error message is displayed: VALUE_ERROR (bad argument #1 to 'getAllStatements' (string expected, got boolean)). Compare with the infobox in Estrogen receptor beta where there is no error.
@Andrew Su: Would you be able to recommend next steps in testing this implementation?
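
The ortholog traversal proposed in this thread, together with a guard against the kind of non-string ID that produced the error quoted above, might look roughly like this. It is a sketch in Python rather than the module's Lua, written under several assumptions: `collect_protein_ids` is a hypothetical helper, `get_statements` is a stand-in callback for a lookup such as mw.wikibase.getAllStatements that here returns plain ID strings, and the property names and item IDs are placeholders rather than real Wikidata identifiers.

```python
def collect_protein_ids(gene_id, get_statements, ortholog_prop, protein_prop):
    """Collect protein IDs for a gene and all of its orthologs.

    get_statements(entity_id, prop) is assumed to return a list of target IDs.
    Non-string values are skipped so they are never passed on as an entity ID.
    """
    gene_ids = [gene_id] + list(get_statements(gene_id, ortholog_prop))
    proteins = []
    for gid in gene_ids:
        if not isinstance(gid, str):
            continue  # e.g. a boolean returned by a failed lookup
        for pid in get_statements(gid, protein_prop):
            if isinstance(pid, str) and pid not in proteins:
                proteins.append(pid)
    return proteins

# Placeholder data: one human gene with mouse and rat orthologs,
# plus a bad value (False) of the kind that triggered the reported error.
store = {
    ("human_gene", "ORTHOLOG"): ["mouse_gene", "rat_gene", False],
    ("human_gene", "ENCODES"): ["human_protein"],
    ("mouse_gene", "ENCODES"): ["mouse_protein"],
    ("rat_gene", "ENCODES"): ["rat_protein"],
}

def lookup(entity_id, prop):
    return store.get((entity_id, prop), [])

protein_ids = collect_protein_ids("human_gene", lookup, "ORTHOLOG", "ENCODES")
```

Only IDs are stored, in line with the suggestion above that loading whole entities is expensive and unnecessary for building the PDB search links.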

Template-protected edit request on 27 May 2019: duplicate ref tags

The module currently always emits the same-named ref tags over multiple invocations, causing problems when a page is used for different genes. There should be a suffix added to remedy this issue. Please do the following:

  • For every instance of "refGRCh38Ensembl", change to "refGRCh38Ensembl" .. ensembl. Likewise for "refGRCh37Ensembl".
  • For every instance of "refGRCm38Ensembl", change to "refGRCm38Ensembl" .. ensembl_mm.

Here I am using ensembl's IDs as suffixes, which nicely enough does encode the species and its ensembl origin in it. And honestly I don't get why it is putting "ref" as a prefix. You might want to change "refGRCh38Ensembl" to "GRCh38_" to make things shorter and vice versa.

Artoria2e5 🌉 01:53, 27 May 2019 (UTC)[reply]

Agreed! But where does "ensembl" come from? Will it be unique when invoked for different QIDs on the same page? -- Mikeblas (talk) 00:32, 28 May 2019 (UTC)[reply]
User:Mikeblas, the ensembl variables are the gene identifiers used by Ensembl to uniquely identify a gene. It is the long ENSG00000 something you see in these link names. Even if someone erroneously calls the template for the same gene twice, there will be no reference error reported since the content of the identically-named ref tags will be the same anyway. --Artoria2e5 🌉 00:56, 28 May 2019 (UTC)[reply]
 Not done: please make your requested changes to the module's sandbox first; see WP:TESTCASES. — Martin (MSGJ · talk) 08:05, 4 June 2019 (UTC)[reply]
Is it really not possible to fix this problem? It's been broken for more than a month. -- Mikeblas (talk) 23:56, 14 June 2019 (UTC)[reply]
 Not done @Mikeblas: it can certainly be discussed still, but as was already requested above, this needs to be carefully sandboxed first. — xaosflux Talk 16:21, 15 June 2019 (UTC)[reply]
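
The suffixing idea requested in this thread can be illustrated with a small Python sketch (the module itself is Lua, and none of this is its actual code). The helper names `ref_name` and `conflicting_names` are hypothetical, the second gene ID is a placeholder, and the conflict check only mimics the rule that a ref name may be reused if its content is identical.

```python
def ref_name(assembly, ensembl_id):
    """Build a per-gene ref name, e.g. 'refGRCh38Ensembl' plus the Ensembl ID."""
    return "ref" + assembly + "Ensembl" + ensembl_id

def conflicting_names(refs):
    """Return ref names that are defined more than once with different content."""
    first_seen = {}
    conflicts = set()
    for name, content in refs:
        # setdefault stores the first content seen and returns it on later calls.
        if first_seen.setdefault(name, content) != content:
            conflicts.add(name)
    return sorted(conflicts)

# Two different genes on one page; the second ID is a placeholder.
genes = ["ENSG00000008405", "ENSG_SECOND_GENE"]

# Current behaviour: the same name for every gene, so the contents clash.
old_refs = [("refGRCh38Ensembl", "Ensembl: " + g) for g in genes]

# Proposed behaviour: the Ensembl ID is appended, so names differ per gene.
new_refs = [(ref_name("GRCh38", g), "Ensembl: " + g) for g in genes]

# Invoking the module twice for the same gene repeats an identical ref.
repeated = new_refs + new_refs[:1]
```

Under these assumptions `conflicting_names(old_refs)` reports the shared name, while `new_refs` and `repeated` report no conflicts, which matches the reasoning given above about identical content.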

Fixing localization support

Instead of parsing for "MT" from the chromosome name, this should look up P1813, which has it correctly: that way the code would be easier to port without changing hard-coded names in strings. Lengths should be taken from property P2043 instead of having a hard-coded table of lengths in the code. Changing this second part would actually completely do away with the first part since the short name is not used anywhere else? Ipr1 (talk) 07:54, 8 August 2019 (UTC)[reply]

(un)required modules?

So why are those three other modules marked as required when they are not required and not used or referenced in any way in this module? Ipr1 (talk) 17:24, 8 August 2019 (UTC)[reply]

Causing reference errors on Cryptochrome?

I don't know if this is actually a problem with the module, or with the data on Wikidata... But using it on Cryptochrome gives cite errors....

 GRCh38: Ensembl release 89: ENSG00000008405 - Ensembl, May 2017 Cite error: Invalid <ref> tag; name "refGRCh38Ensembl" defined multiple times with different content
 GRCm38: Ensembl release 89: ENSMUSG00000020038 - Ensembl, May 2017 Cite error: Invalid <ref> tag; name "refGRCm38Ensembl" defined multiple times with different content

Reedy (talk) 01:23, 10 December 2019 (UTC)[reply]

I believe you are saying that if the contents of Cryptochrome are replaced with the following and previewed, then the error message is displayed.
{{#invoke:Infobox_gene|getTemplateData}}
{{#invoke:Infobox_gene|getTemplateData|QID=Q14866005}}
{{#invoke:Infobox_gene|getTemplateData|QID=Q17909946}}
The first line is inserted; the following two lines already exist in the article. Here are the corresponding Wikidata items:
Searching the previewed page for [1] finds the places where the invalid ref is used.
I don't know what is going on, but there appears to be a conflict between these items. Are they really wanted? If so, the next step might be to examine what generates the refs at the Wikidata links. Johnuniq (talk) 02:06, 10 December 2019 (UTC)[reply]
 Fixed {{Infobox gene}} is meant to be used once and only once per article. If more than one {{Infobox gene}} is used per article, a conflict between identical ref names with different content results. It appears that CRY1 and CRY2 were merged into one article which caused the conflict. I have replaced {{Infobox gene}} with the more compact {{Infobox protein}} to remove the conflict. Boghog (talk) 06:29, 10 December 2019 (UTC)[reply]