User:Narges.sharif/Graphical Models for Protein Structure

This is not a Wikipedia article: It is an individual user's work-in-progress page, and may be incomplete and/or unreliable.

This page was last edited by Narges.sharif 15 years ago.


Graphical models have become powerful frameworks for modeling protein structure. Representing a protein structure as a graphical model makes it possible to address many problems, including structure and secondary structure prediction, protein-protein and protein-drug interactions, and free energy calculations.

There are two main approaches to using graphical models in protein structure modeling. The first approach uses discrete variables to represent the coordinates or dihedral angles of the protein structure; because these quantities are originally continuous, a discretization step is typically applied to map them to discrete values. The second approach uses continuous variables for the coordinates or dihedral angles.

Discrete Graphical Models for Protein Structure

A Markov random field (MRF), also known as an undirected graphical model, is a common representation for this problem. Given an undirected graph G = (V, E), a set of random variables X = (X_v)_{v ∈ V} indexed by V forms a Markov random field with respect to G if it satisfies the pairwise Markov property:

Any two non-adjacent variables are conditionally independent given all other variables:

$X_u \perp\!\!\!\perp X_v \mid X_{V \setminus \{u,v\}} \quad \text{if } \{u,v\} \notin E$
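The pairwise Markov property can be checked numerically on a toy model. The following Python sketch is illustrative only: the three-node chain graph and the potential function `phi` are hypothetical, chosen so that nodes 0 and 2 are not adjacent. It enumerates the joint distribution exactly and verifies that $X_0$ and $X_2$ are conditionally independent given $X_1$ (here, all other variables).

```python
import itertools
import math

# Hypothetical chain graph 0 - 1 - 2: nodes 0 and 2 are NOT adjacent.
STATES = (0, 1)
EDGES = [(0, 1), (1, 2)]

def phi(a, b):
    """Arbitrary positive pairwise potential favoring equal states (illustrative only)."""
    return 2.0 if a == b else 0.5

def unnormalized(x):
    """Product of edge potentials for a full assignment x = (x0, x1, x2)."""
    return math.prod(phi(x[u], x[v]) for u, v in EDGES)

def joint():
    """Exact joint distribution obtained by brute-force enumeration."""
    weights = {x: unnormalized(x) for x in itertools.product(STATES, repeat=3)}
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}

def conditionally_independent(p, tol=1e-12):
    """Check p(x0, x2 | x1) == p(x0 | x1) * p(x2 | x1) for every value of x1."""
    for x1 in STATES:
        p_x1 = sum(v for x, v in p.items() if x[1] == x1)
        for x0, x2 in itertools.product(STATES, repeat=2):
            p_both = p[(x0, x1, x2)] / p_x1
            p_x0 = sum(v for x, v in p.items() if x[0] == x0 and x[1] == x1) / p_x1
            p_x2 = sum(v for x, v in p.items() if x[2] == x2 and x[1] == x1) / p_x1
            if abs(p_both - p_x0 * p_x2) > tol:
                return False
    return True

print(conditionally_independent(joint()))  # True
```

Because the joint factorizes as $\phi(x_0,x_1)\phi(x_1,x_2)$, conditioning on $x_1$ separates the two remaining factors, which is exactly what the check confirms.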

In the discrete model, the continuous variables are discretized into a set of favorable discrete values. If the variables of choice are dihedral angles, the discretization is typically done by mapping each value to the corresponding rotamer conformation.
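A minimal Python sketch of the discretization step, assuming three canonical staggered positions for a side-chain dihedral angle (about 60°, 180°, and -60°). The labels `g+`, `t`, and `g-` are used here for illustration only, since naming conventions differ between rotamer libraries, and real libraries define residue-specific (often backbone-dependent) states, so `ROTAMER_CENTERS` is a placeholder assumption. The key detail is that angles are periodic, so distances are measured around the circle.

```python
# Assumed canonical staggered chi positions in degrees (illustrative; real rotamer
# libraries define many more, residue-specific states).
ROTAMER_CENTERS = {"g+": 60.0, "t": 180.0, "g-": -60.0}

def angular_distance(a, b):
    """Smallest absolute difference between two angles in degrees, in [0, 180]."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def discretize_chi(chi):
    """Map a continuous dihedral angle (degrees) to the label of the nearest center."""
    return min(ROTAMER_CENTERS, key=lambda name: angular_distance(chi, ROTAMER_CENTERS[name]))

print([discretize_chi(a) for a in (55.0, -170.0, 290.0)])  # ['g+', 't', 'g-']
```

Note that 290° and -60° describe the same conformation; the modular distance handles this without any special casing.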

Model

Let X = {X_b, X_s} be the random variables representing the entire protein structure, where X_b describes the backbone and X_s the side chains. X_b can be represented by a set of 3-D coordinates of the backbone atoms or, equivalently, by a sequence of bond lengths and dihedral angles. The probability of a particular conformation x can then be written as:

$p(X=x \mid \Theta) = p(X_b = x_b)\, p(X_s = x_s \mid X_b = x_b, \Theta)$

where $\Theta$ denotes the parameters of the model, including sequence information, temperature, etc. The backbone is frequently assumed to be rigid with a known conformation, which reduces the problem to side-chain placement. The structure of the graph is also encoded in $\Theta$; it specifies which pairs of variables are conditionally independent. For example, the side-chain angles of two residues that are far apart can be treated as independent given all other angles in the protein. To extract this structure, a distance threshold is used: only pairs of residues within that threshold are considered connected (i.e., joined by an edge).
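The distance-threshold rule can be sketched as follows. This is a minimal Python sketch that assumes one representative point per residue (for example, the Cα atom) and a hypothetical 8 Å cutoff; the description does not specify which atoms or which threshold are used, and the toy coordinates are made up.

```python
import itertools
import math

def contact_graph(coords, threshold):
    """Return edges (i, j), i < j, for residues whose representative points lie
    within `threshold` of each other. `coords` holds one (x, y, z) tuple per residue."""
    edges = []
    for i, j in itertools.combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= threshold:
            edges.append((i, j))
    return edges

# Toy coordinates in angstroms (hypothetical), one point per residue.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(contact_graph(coords, threshold=8.0))  # [(0, 1), (0, 2), (1, 2)]
```

The all-pairs loop is quadratic in the number of residues; for large proteins a spatial grid or k-d tree is the usual way to avoid checking every pair.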

Given this representation, the probability of a particular side chain conformation x_s given the backbone conformation x_b can be expressed as

$p(X_{s}=x_{s}|X_{b}=x_{b})={\frac {1}{Z}}\prod _{c\in C(G)}\Phi _{c}(x_{s}^{c},x_{b}^{c})$

where C(G) is the set of all cliques in G, $\Phi_c$ is the potential function defined over the variables in clique c, and Z is the so-called partition function, which normalizes the product so that the probabilities sum to one.
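For a very small model, Z and the probability of an assignment can be computed by brute-force enumeration. The Python sketch below assumes the backbone is fixed, so any dependence on $x_b$ is folded into the potentials; the function names and toy potentials are hypothetical. Enumeration grows exponentially with the number of variables, which is why approximate inference is needed for real proteins.

```python
import itertools
import math

def partition_function(n_states, potentials):
    """Brute-force Z: sum over all assignments of the product of clique potentials.

    n_states: number of discrete states of each variable.
    potentials: dict mapping a clique (tuple of variable indices) to a function
        taking the states of those variables and returning a positive number.
    """
    z = 0.0
    for x in itertools.product(*(range(k) for k in n_states)):
        z += math.prod(f(*(x[v] for v in clique)) for clique, f in potentials.items())
    return z

def probability(x, potentials, z):
    """p(x) = (1/Z) * product of clique potentials evaluated at assignment x."""
    return math.prod(f(*(x[v] for v in clique)) for clique, f in potentials.items()) / z

# Hypothetical model: two binary variables, one singleton clique and one pairwise clique.
potentials = {
    (0,): lambda a: 1.0 + a,
    (0, 1): lambda a, b: 2.0 if a == b else 1.0,
}
z = partition_function([2, 2], potentials)
print(z)  # 9.0
```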

To characterize the MRF completely, the potential function $\Phi$ must be defined. For simplicity, the cliques are usually restricted to those of size 2, so that the potential function is defined only over pairs of variables. In the Goblin system, these pairwise functions are defined as

$\Phi(x_s^{i_p}, x_s^{j_q}) = \exp\left(-E(x_s^{i_p}, x_s^{j_q}) / k_B T\right)$

where $E(x_s^{i_p}, x_s^{j_q})$ is the energy of interaction between rotamer state p of residue $X_s^{i}$ and rotamer state q of residue $X_s^{j}$, $k_B$ is the Boltzmann constant, and T is the temperature.
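The conversion from interaction energies to pairwise potentials can be sketched in Python as below. The energy values are hypothetical, and the sketch assumes energies in kcal/mol, so $k_B$ is taken per mole (the molar gas constant, about 0.0019872 kcal/(mol·K)); any consistent unit system works. Lower energies yield larger potentials, so favorable rotamer pairs receive more probability mass.

```python
import math

K_B = 0.0019872  # Boltzmann constant per mole in kcal/(mol*K), i.e. the gas constant R

def pairwise_potential(energy_table, temperature):
    """Convert a rotamer-rotamer energy table E[p][q] (kcal/mol) into the
    potential table Phi[p][q] = exp(-E[p][q] / (k_B * T))."""
    kt = K_B * temperature
    return [[math.exp(-e / kt) for e in row] for row in energy_table]

# Hypothetical energies between 2 rotamers of residue i (rows) and 3 of residue j (columns).
energies = [[0.0, 1.0, -0.5],
            [2.0, 0.0, 0.5]]
phi = pairwise_potential(energies, temperature=300.0)
print(phi[0][0])  # 1.0
```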

Given a PDB file, this model can be built over the protein structure, and the free energy can then be calculated from it.
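As a first step toward building the model from a PDB file, residue positions can be read from the ATOM records, which use fixed column positions (atom name in columns 13-16, chain identifier in column 22, residue number in columns 23-26, and x, y, z in columns 31-54). This Python sketch is a simplification under stated assumptions: it keeps only Cα atoms as one representative point per residue and ignores HETATM records, alternate locations, and insertion codes. The function name is hypothetical. A full side-chain model would also need the side-chain atoms in order to compute dihedral angles.

```python
def read_ca_coordinates(pdb_lines, chain=None):
    """Return (residue number, (x, y, z)) for each C-alpha ATOM record,
    optionally restricted to one chain. Relies on the fixed-column PDB layout."""
    coords = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":        # atom name, columns 13-16
            continue
        if chain is not None and line[21] != chain:  # chain ID, column 22
            continue
        res_seq = int(line[22:26])             # residue number, columns 23-26
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        coords.append((res_seq, xyz))
    return coords

sample = [
    "ATOM      1  N   MET A   1      37.925  20.955  28.404  1.00 50.00           N",
    "ATOM      2  CA  MET A   1      38.198  19.582  28.998  1.00 50.00           C",
]
print(read_ca_coordinates(sample))  # [(1, (38.198, 19.582, 28.998))]
```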

Free Energy Calculation: Belief Propagation

The free energy of a system is given by

$G=E-TS$

where E is the enthalpy of the system, T the temperature, and S the entropy. If a probability p(x) is associated with each state (conformation) x of the system, E becomes the expected energy $\sum_x p(x)E(x)$ and S becomes the Gibbs entropy $S=-k_B\sum_x p(x)\ln p(x)$, so G can be rewritten as

$G=\sum_{x}p(x)E(x)+k_{B}T\sum_{x}p(x)\ln p(x)$
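The free-energy expression can be evaluated directly for a small, explicitly enumerated set of states. This Python sketch is illustrative: it assumes energies in kcal/mol, uses $G = \sum_x p(x)E(x) - TS$ with the Gibbs entropy $S = -k_B \sum_x p(x)\ln p(x)$, and the state energies are hypothetical. When p(x) is the Boltzmann distribution, the result should equal $-k_B T \ln Z$, which gives a convenient sanity check.

```python
import math

K_B = 0.0019872  # kcal/(mol*K), assumed unit system

def free_energy(probs, energies, temperature):
    """G = <E> - T*S with the Gibbs entropy S = -k_B * sum_x p(x) ln p(x)."""
    mean_energy = sum(p * e for p, e in zip(probs, energies))
    entropy = -K_B * sum(p * math.log(p) for p in probs if p > 0.0)  # 0 ln 0 taken as 0
    return mean_energy - temperature * entropy

def boltzmann(energies, temperature):
    """Boltzmann distribution p(x) = exp(-E(x) / k_B T) / Z over a finite set of states."""
    kt = K_B * temperature
    weights = [math.exp(-e / kt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

energies = [0.0, 0.5, 1.0]  # hypothetical state energies, kcal/mol
T = 300.0
p = boltzmann(energies, T)
g = free_energy(p, energies, T)
z = sum(math.exp(-e / (K_B * T)) for e in energies)
print(abs(g + K_B * T * math.log(z)) < 1e-9)  # True
```

In a real protein the states cannot be enumerated, so p(x) and the entropy must be approximated, which is where belief propagation comes in.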

On discrete graphs, p(x) is computed with the generalized belief propagation (GBP) algorithm. GBP computes an approximation to these probabilities and is not guaranteed to converge to a fixed set of values; in practice, however, it has been shown to converge in many cases.
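Generalized belief propagation passes messages between regions (clusters) of variables. As a simpler illustration of the same message-passing idea, the Python sketch below implements ordinary loopy belief propagation (the sum-product algorithm) on a pairwise MRF, the special case in which regions are single nodes and edges; it is not the GBP algorithm itself. The data layout, function name, and toy potentials are hypothetical. On a tree the resulting marginals are exact; on graphs with cycles they are approximations, and convergence is not guaranteed, which the returned flag reports.

```python
import math

def loopy_bp(n_states, edges, unary, pairwise, max_iters=100, tol=1e-10):
    """Sum-product belief propagation on a pairwise MRF (synchronous updates).

    n_states[i]: number of states of variable i.
    edges: list of (i, j) pairs.
    unary[i][a]: node potential; pairwise[(i, j)][a][b]: edge potential for state a
        of i and state b of j, in the direction the edge is listed.
    Returns (approximate marginals, converged flag).
    """
    neighbors = {i: [] for i in range(len(n_states))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def psi(i, j, a, b):
        # Look up the edge potential regardless of the direction the edge was listed in.
        return pairwise[(i, j)][a][b] if (i, j) in pairwise else pairwise[(j, i)][b][a]

    # messages[(i, j)][b]: message from i to j about state b of j, initialized uniform.
    messages = {(i, j): [1.0] * n_states[j] for i in neighbors for j in neighbors[i]}
    converged = False
    for _ in range(max_iters):
        new = {}
        for (i, j) in messages:
            msg = []
            for b in range(n_states[j]):
                total = 0.0
                for a in range(n_states[i]):
                    incoming = math.prod(messages[(k, i)][a] for k in neighbors[i] if k != j)
                    total += unary[i][a] * psi(i, j, a, b) * incoming
                msg.append(total)
            s = sum(msg)
            new[(i, j)] = [m / s for m in msg]  # normalize for numerical stability
        delta = max((abs(x - y) for key in messages
                     for x, y in zip(messages[key], new[key])), default=0.0)
        messages = new
        if delta < tol:
            converged = True
            break

    beliefs = []
    for i in range(len(n_states)):
        b = [unary[i][a] * math.prod(messages[(k, i)][a] for k in neighbors[i])
             for a in range(n_states[i])]
        s = sum(b)
        beliefs.append([x / s for x in b])
    return beliefs, converged

# Hypothetical 3-node chain (a tree), so the marginals should be exact.
unary = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
pair = {(0, 1): [[2.0, 0.5], [0.5, 2.0]], (1, 2): [[2.0, 0.5], [0.5, 2.0]]}
beliefs, converged = loopy_bp([2, 2, 2], [(0, 1), (1, 2)], unary, pair)
print(converged)  # True
```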

Continuous Graphical Models for Protein Structures

Gaussian Graphical Models

Free Energy Calculation
