User:Cfeffer/sandbox
The PGLS protein in Homo sapiens
Overview
PGLS is a gene located on chromosome 19 in Homo sapiens and is also found in most animals. It is expressed in all human tissues; however, the function of its protein is still not known. The gene evolves very slowly, so its sequence is fairly similar across species. Much is known about this gene and its interactions, but much remains to be learned.
Gene
AF091091 is a human gene located on chromosome 19p13.11.[1] It is more commonly known as the PGLS gene, but is also referred to as 6PGL and HEL-S-304.[2] The gene has not yet been fully sequenced; AF091091 is actually an mRNA sequence 1008 nucleotides long. This sequence contains a total of 5 exons. These exons code for the PGLS protein, which is also known as 6PGL (6-phosphogluconolactonase).[2] The PGLS mRNA is found in more than 25 different tissue types throughout the human body, including spleen and blood tissue.
Protein
This protein is found in the cytosol of the cell and has many identifiers; its stable identifier is R-HSA-71294.[1] It is a 258-amino-acid protein.[3] Since it is a 6PGL protein, it is believed to be a part of the pentose phosphate pathway; specifically, the second step in the oxidative phase shown below. Although the mRNA is expressed in over 25 parts of the body, the protein is not expressed in all of them. There are three important motifs to recognize when looking at the PGLS protein. The first is glucosamine-6-phosphate isomerase; this motif is slightly shorter than the full protein sequence, but it matches 251 of the 258 amino acids that make up PGLS.[4] Glucosamine-6-phosphate isomerase is known to function in the nervous system, which gives some clue that PGLS may function in a similar capacity. The second is the PCSK cleavage site, which is highly conserved on this protein. The third is the IAP-binding motif, which is also highly conserved. This site suggests that PGLS is important in cells that do not undergo apoptosis.[5] The tertiary structure of the PGLS protein, shown in the picture at the top, is made up of 39% alpha helices and 21% beta sheets, with a few spots of disorder in the protein's final structure.[6]
Gene Level Regulation
The promoter sequence is a rather long portion of the PGLS gene; however, its three important features are all located in the latter half of the promoter. These are regulatory elements, CpG segments and a DNase cluster. There are five regulatory elements on the promoter, most of which overlap with each other as well as with the CpG segments and the DNase cluster. Furthermore, three prominent histone modification marks are found on this gene: H3K4Me1, H3K4Me3, and H3K27Ac. All three are found consistently in the following seven cell lines: GM12878, H1-hESC, HSMM, HUVEC, K562, NHEK and NHLF. While H3K4Me1 and H3K27Ac are often found near regulatory elements on the gene, H3K4Me3 is found more commonly near the promoter sequence.[7] Binding sites for NFKB, INSM1 (a zinc finger protein), RNA polymerase II, the E2F-myc activator/cell cycle regulator, and nuclear respiratory factor are all common in the promoter sequence and are repeated multiple times throughout.[8]
The PGLS gene is expressed in all tissue types in the body. In normal tissue, it displays the greatest expression in fat tissue and the spleen.[1] In fetal tissue, the mRNA is expressed at the highest level in adrenal tissue during weeks 16 through 20. Lastly, in the Illumina bodyMap 2 transcriptome, PGLS mRNA is expressed at a significantly higher level in white blood cells than in any of the other tissues, although it is expressed in all of them at some level. In specific cell lines, the mRNA is highly abundant in most tissues. The few cell lines in which it is not significantly expressed are from the ganglion during fetal development.[9]
Transcription Level Regulation
Translation starts at about nucleotide 25, which is the A of the ATG start codon that signals the ribosome to begin translating. It then continues until it reaches the TAG stop codon, creating a protein 258 amino acids long. The 5' UTR is 24 nucleotides long, while the 3' UTR is much longer at 207 nucleotides.[10]
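The lengths quoted in this article can be checked against each other with simple arithmetic. The following is a minimal sketch, assuming 1-based nucleotide numbering and a 3-nucleotide stop codon that is not counted as part of either UTR; the function and variable names are illustrative and do not come from any database record. Under these assumptions a 24-nucleotide 5' UTR places the first coding nucleotide at position 25, close to the start position quoted above, and the pieces sum to the 1008-nucleotide mRNA length given in the Gene section.

```python
# Lengths quoted in the article (nucleotides, or amino acids for the protein).
MRNA_LENGTH = 1008
FIVE_PRIME_UTR = 24
THREE_PRIME_UTR = 207
PROTEIN_LENGTH = 258
STOP_CODON = 3  # assumption: the stop codon is counted with the coding region


def transcript_layout(utr5, protein_len, utr3):
    """Return 1-based coding-region coordinates and the total mRNA length."""
    cds_start = utr5 + 1                         # first base after the 5' UTR
    cds_length = protein_len * 3 + STOP_CODON    # one codon per residue, plus stop
    cds_end = cds_start + cds_length - 1         # last base of the stop codon
    total = cds_end + utr3                       # 3' UTR follows the stop codon
    return {
        "cds_start": cds_start,
        "cds_end": cds_end,
        "cds_length": cds_length,
        "total": total,
    }


layout = transcript_layout(FIVE_PRIME_UTR, PROTEIN_LENGTH, THREE_PRIME_UTR)
print(layout)
print("consistent with quoted mRNA length:", layout["total"] == MRNA_LENGTH)
```

The check only confirms that the quoted lengths are mutually consistent; it says nothing about the actual sequence content.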
Protein Level Regulation
The PGLS protein is clearly rich in the amino acids alanine (A) and leucine (L) toward the N-terminus, specifically from amino acid 20 to 70. The protein is also very proline (P) rich throughout the entire sequence. The richness of these amino acids can also be found in orthologs of this protein.[11] PGLS also has many possible phosphorylation sites throughout the protein. Other important, less frequent sites on the protein are 3 O-glycosylation sites, 1 palmitoylation site, 2 C-mannosylation sites, 2 glycation sites for lysines and 1 O-beta-GlcNAc attachment site. There is also a potential sumoylation site, but it has a very low probability.[12]
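Statements about a protein being "rich" in particular residues come from counting residues, either over the whole sequence or within a window. The sketch below shows one simple way to do such a count. The sequence `toy_seq` is a hypothetical placeholder, not the PGLS sequence, and the helper names are illustrative; the real sequence would be substituted to reproduce the analysis in [11].

```python
from collections import Counter


def composition(seq):
    """Percent of each residue (one-letter codes) over the whole sequence."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def window_fraction(seq, residues, start, end):
    """Fraction of positions start..end (1-based, inclusive) found in `residues`."""
    window = seq[start - 1:end]
    return sum(1 for aa in window if aa in residues) / len(window)


# Hypothetical toy sequence for illustration only (NOT the PGLS protein).
toy_seq = "MAPAL" * 4

print(composition(toy_seq))
# Share of alanine/leucine in the first ten residues of the toy sequence.
print(window_fraction(toy_seq, "AL", 1, 10))
```

For the real protein, the same window call with positions 20 and 70 would give the alanine and leucine share of the N-terminal region described above.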
Homology
PGLS is a slowly evolving protein. This was determined by comparing its number of changes per amino acid over divergence time with that of cytochrome c, which is known to evolve slowly, and fibrinogen alpha chain, which is known to evolve quickly. Although PGLS seems to be present in all animal species, there is a large difference between the sequences of amphibians and Homo sapiens. On average, there is about 50-60% similarity between the human and reptile PGLS proteins. Humans and other mammals are much more closely related, with about 80-98% similarity.[13] These are shown in Figure 2 below.
Genus species | Common Name | Taxonomic Group | Date of Divergence | Percent Similarity |
Sapajus apella | tufted capuchin | mammal | 42.9 MYA | 98.10% |
Podarcis muralis | common wall lizard | reptile | 318 MYA | 59.10% |
Figure 2. The above table shows the date of divergence from humans for a mammal with the PGLS gene and a reptile with the PGLS gene. As shown in the table, the reptile diverged from humans much earlier than the mammal did; therefore, the similarity between its sequence and the human sequence is smaller.
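The relationship in Figure 2 can be restated numerically. The sketch below is a minimal illustration that uses only the two rows of the table and takes the uncorrected percent difference to be 100 minus the percent similarity. That definition is an assumption made here for illustration and is not necessarily the measure used in [13]; the variable names are also illustrative.

```python
# Rows from Figure 2: (species, divergence from human in MYA, percent similarity).
orthologs = [
    ("Sapajus apella", 42.9, 98.10),
    ("Podarcis muralis", 318.0, 59.10),
]


def percent_difference(similarity):
    """Uncorrected percent difference (assumption: 100 minus percent similarity)."""
    return 100.0 - similarity


# Sort by divergence date so the trend can be read from top to bottom.
for species, mya, similarity in sorted(orthologs, key=lambda row: row[1]):
    diff = percent_difference(similarity)
    print(f"{species}: diverged {mya} MYA, difference {diff:.2f}%")

# The species that diverged earlier should show the larger difference.
differences = [percent_difference(s) for _, _, s in sorted(orthologs, key=lambda r: r[1])]
print("difference grows with divergence time:", differences == sorted(differences))
```

With only two species this shows the direction of the trend, not a rate of evolution; a rate estimate would need more orthologs and a correction for multiple substitutions at the same site.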
Function
There is not yet a known function for the PGLS protein; however, it is believed to be a factor in the second step of the pentose phosphate pathway shown in Figure 3.[2]

Interacting Proteins
As the PGLS protein is found in so many different cells and tissues, it physically interacts with many other proteins. The first protein it is known to interact with is phospholipid scramblase 1, whose function, much like that of PGLS, is unknown. MyoD family inhibitor, protection of telomeres 1 homolog, telomeric repeat binding factors 1 and 2, and Ewing sarcoma breakpoint region 1 are other proteins that physically interact with PGLS, but whose role in the interaction is unspecified.[14] All of these physical interactions were determined in vitro through the two-hybrid detection method. PGLS has been found to interact with a long list of proteins; however, the ones listed above are the most prevalent and recurring interactions.
Clinical Significance
In muscle tissue, PGLS expression is much greater in people with Duchenne muscular dystrophy than in those without the disease. PGLS is not even detected in normal muscle tissue, whereas the tissue of those with muscular dystrophy expresses it at a fairly high level in some muscle cells.[9]