Talk:Conjugate gradient method/Archive 1

This is an archive of past discussions about Conjugate gradient method. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Archive 1

Needs example

This article needs an example usage case, preferably with picture. —Preceding unsigned comment added by 18.243.0.128 (talk) 17:09, 30 March 2008 (UTC)

Stretching, preconditioning

From reading Shewchuk's PDF (linked from this article), I'm beginning to understand this. I'm still a bit unclear about what the conjugacy condition does compared with what preconditioning does. As I understand it, the problem with steepest descent is that long valleys lead to zigzagging. It seems like picking conjugate orthogonal directions essentially stretches the energy space so that all valleys are circular rather than long and thin (see Shewchuk's figure 22 on p. 23). Is that accurate? (If so, it seems like a much simpler way of explaining the solution.) The thing is, this sounds a lot like what preconditioning is supposed to accomplish. That is, we want to use M so that $M^{-1}A$ has similar eigenvalues. Could someone clarify? Thanks. —Ben FrantzDale 19:11, 17 January 2007 (UTC)

Removal of "Nonlinear conjugate gradient" section

I would like to explain this edit I made. While it is true that Wikipedia material evolves in time, and starting with at least something is better than nothing, and people will eventually fix and expand on things, nevertheless, I find it a bad idea to insert content which you deliberatly know to need a lot more work.

I would still understand creating a poor quality new article. However, I would be opposed to taking a reasonably well written article as this one, and adding to it material you do not consider of the same quality. Such material will (in my experience) linger for ages without anybody touching it, resulting in a decreased overall quality of the article.

I suggest to BenFrantzDale to either fork it off to a new article, or actually make the effort of making sure the material is already good enough. Comments? Oleg Alexandrov (talk) 03:27, 18 January 2007 (UTC)

Fair enough. I started nonlinear conjugate gradient method. If it gets good enough it might be worth merging with this one or it might not. —Ben FrantzDale 03:36, 18 January 2007 (UTC)

OK. Actually I think Nonlinear conjugate gradient method has enough info to stand on its own. Although it would be nice to be expanded I think. Oleg Alexandrov (talk) 04:17, 18 January 2007 (UTC)

Another version of algorithm

I'm using such algorithm. IMHO it's clearer, and easier to understand.

r_{1}:=b-Ax_{1}\,

p_{1}:=r_{1}\,

k:=1\,

repeat until

r_{k}\,

is "sufficiently small":

k:=k+1\,

\alpha _{k}:={\frac {r_{k}^{\top }r_{k}}{p_{k}^{\top }Ap_{k}}}\,

r_{k+1}:=r_{k}-\alpha _{k}Ap_{k}\,

\beta _{k}:={\frac {r_{k+1}^{\top }r_{k+1}}{r_{k}^{\top }r_{k}}}\,

p_{k+1}:=r_{k+1}+\beta _{k}p_{k}\,

x_{k+1}:=x_{k}-\alpha _{k}p_{k}\,

end repeat

The result is

x_{k}\,

—Preceding unsigned comment added by 149.156.83.157 (talk • contribs) 10:11, 30 March 2007

I agree, that's a bit clearer so I changed it in the article. I changed it to check for the remainder immediately after you computed it, which I think is more logical. Thanks for your comment! -- Jitse Niesen (talk) 13:50, 30 March 2007 (UTC)

I don't know for me it is more logical in while() expresion. When implementing CG, You can save additional vector (4, instead of 5) when written in reversed order, but it's only technical problem, not so important. y

Yes, the algorithm is not optimized and it doesn't need to be. It seems we disagree about the logical place for the test. Do change it if you feel strongly about it, because I don't really care. Incidentally, profuse thanks (if it was you) for fixing my sign error. -- Jitse Niesen (talk) 04:07, 31 March 2007 (UTC)

Thanks

I had a conversation two days ago with the CTO of a small high-tech company in Silicon Valley (the specific details are not relevant here). At some point we needed to see how exactly the conjugate gradient works. He said (literally) "Wikipedia must have an article about that." And sure enough, the article was there. It felt gratifying to have a personal experience of how much Wikipedia is used. In particular, Jitse, thanks for this very nice article. The time you spent on it was well-worth it. :) Oleg Alexandrov (talk) 19:44, 31 March 2007 (UTC)

That's very good to hear. Fortunately it wasn't an expert; that would be embarrassing as I think that the presentation can be much improved. I suppose the check with my fee is in the mail? ;) -- Jitse Niesen (talk) 04:51, 1 April 2007 (UTC)

Good information, but could be clearer

I came across the conjugate gradient method when programming it for a class in numerics for solving elliptic PDEs. I used the information here in conjunction with our own notes, and found it useful for coding the algorithm. I think the information could be made a bit clearer and and the formatting and presentation of this article improved a little. While everyone will probably already know this, maybe A^T could be made clearer as A Transpose (just to make it quite clear) at the beginning. The diagram's information could be linked to the text and vice-versa. Currently, the diagram seems to stand on its own. (Starscreaming Crackerjack 09:57, 18 August 2007 (UTC))

Bold for vectors

I found this page useful, but it would help immensely if the vectors and matrices were in bold font. I sat staring for a while at a value that was treated like a vector but looked like a scalar summation. —Preceding unsigned comment added by 138.64.2.77 (talk) 20:16, 11 September 2007 (UTC)

A good point. This article uses the convention used in numerical mathematics: A (upper case letter) is a matrix, x (lower case letter) is a vector and α (lower case Greek letter) is a real number. I will add a note to the beginning of this article. TomyDuby (talk) 02:57, 18 June 2008 (UTC)

Philefou's Octave Implementation

In the past I was using Octave and needed the lsqr function, which is currently unavailable in Octave (http://www.gnu.org/software/octave/projects.html). I was planning to implement it with the conjugate gradient method, but I see Philefou added Octave code on January 14th, 2008. Since he/she has already written the code, I highly recommend he/she submit it to the Octave team. It should probably conform to this specification: http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/ref/lsqr.html If Philefou reads this and decides not to contribute the code, please leave an appropriate comment and I (or maybe someone else) will write an implementation of lsqr. 168.7.230.201 (talk) 18:36, 21 January 2008 (UTC)

(Philefou) Hi everyone, I don't have much time to look at the specification and submit it. I don't mind if anyone want to use this code and submit it to Octave, if it can help someone else I'm alright with the idea!

description of iterative method is confusing

The description of the iterative method is confusing, particularly the formula defining p_{k+1}. I changed it to what it sounds like it needs to be in order to make the previous paragraphs have a lick of sense, but I'm still not sure it's right. Can someone fix this? (I guess I will eventually if no one else does, but I don't have my books on hand...)--141.211.61.254 (talk) 23:54, 4 May 2008 (UTC)

Re-render equations

Some equation images are not rendered correctly. You can see this on horizontal bars in some equations, it is blurry whereas it should be sharp. Or was that a joke, because it's about gradients? (see here, the second equation has the top symbols rendered blurry, whereas the first eq has a distinct line)

\alpha _{k}:={\frac {r_{k}^{\top }r_{k}}{p_{k}^{\top }Ap_{k}}}\,

\beta _{k}:={\frac {r_{k+1}^{\top }r_{k+1}}{r_{k}^{\top }r_{k}}}\,

Visitor.iQ (talk) 15:16, 21 May 2008 (UTC)

Such is the TeX->png conversion. No font hinting is used, so you get that when it tries to draw a one-pixel-thick line aligned to the half pixel. —Ben FrantzDale (talk) 23:28, 21 May 2008 (UTC)

Typo in optimal step equation

Very good article, but it seems the equation should be :

\alpha _{k}:={\frac {r_{k}^{\top }p_{k}}{p_{k}^{\top }Ap_{k}}}\,

And actually, this is correctly implemented in the octave procedure. —Wikisgoodforyou (talk) 10:07, 12 August 2008 (UTC)

Agreed. I have that form of the algorithm from class notes from a numerical methods class. I originally assumed that this had something to do with the mentioned simplifications, but I think the form that exists on the page is actually from the steepest decent method, as CG converges to SD if p_k = r_k. 152.3.159.173 (talk)

Possible Example

An anonymous user suggested an example be included on the Conjugate Gradient Method page. I'm considering adding this but will make a good attempt to keep the procedure and format as close as possible to the generalized algorithms already on the page. Please comment if you agree or disagree with this addition.

BigDix56 (talk) 03:49, 28 April 2009 (UTC)

First few lines are messy

I think that it is more correct to say: Conjugate gradient is a line search method for optimizing a function of several variables. It is often applied to solve linear systems of equations. (save the details for later) —Preceding unsigned comment added by 203.200.55.101 (talk • contribs) 05:55, 2 September 2006 (UTC)

This article is too condensed. Could you please give some more explications? Where do the alphas come from? What does mean: "This suggests taking the first basis vector p1 to be the gradient of f at x = x0, which equals -b" ? Sorry, I dont understand what you are talking about.

Ulrich —Preceding unsigned comment added by 85.90.4.11 (talk • contribs) 07:46, 6 November 2006 (UTC)

Notation and clarification.

Could someone explain the beta? The article explains that $\alpha _{k}$ is the kth component of the solution in the p basis (i.e., it is how far to go at the kth step). But what about beta? We have

\beta _{k}:={\frac {r_{k+1}^{\top }r_{k+1}}{r_{k}^{\top }r_{k}}}={\frac {\|r_{k+1}\|^{2}}{\|r_{k}\|^{2}}}

and it gets used to find the next p according to

p_{k+1}:=r_{k+1}+\beta _{k}p_{k}\,

The preceding section says that we'll update p by

p_{k+1}=r_{k}-{\frac {p_{k}^{\top }Ar_{k}}{p_{k}^{\top }Ap_{k}}}p_{k}

and

r_{k+1}=r_{k}-\alpha Ap_{k}

so

r_{k}=r_{k+1}+\alpha Ap_{k}

so

p_{k+1}=r_{k+1}+\alpha Ap_{k}-{\frac {p_{k}^{\top }Ar_{k}}{p_{k}^{\top }Ap_{k}}}p_{k}=r_{k+1}+\alpha Ap_{k}-{\frac {\langle p_{k},r_{k}\rangle _{A}}{\langle p_{k},p_{k}\rangle _{A}}}p_{k}

How does this algebra work out and what does the beta mean? Maybe it's the end of the day and my algebra is just failing me. Also, the Octave code should have similar variable names to the mathematics. —Ben FrantzDale (talk) 22:56, 15 May 2008 (UTC)

The explanation is presented in the context. Please derive it once again yourself giving the hint provided. —Preceding unsigned comment added by 173.26.252.42 (talk) 21:46, 16 November 2009 (UTC)

While this is an old thread, it shows that this article has been very unclear since quite some time ago about how the the formulae for

\alpha _{k}

and

\beta _{k}

as they are presented in the “The resulting algorithm” section are derived from those in previous sections. The root cause of such unclarity is that it fails to point out the orthogonality of residuals, i.e.,

\langle r_{i},r_{j}\rangle =0

for any

i\neq j

, which is an important property of the conjugate gradient method and crucial to the simplication of the formulae of

\alpha _{k}

and

\beta _{k}

.

With the orthogonality of residuals, one can take advantage of the fact that

p_{k}

is the sum of

r_{k}

and some linear combination of

r_{0},r_{1},\ldots ,r_{k-1}

to show that

p_{k}^{\top }r_{k}=r_{k}^{\top }r_{k}

and thus

\alpha _{k}={\frac {p_{k}^{\top }r_{k}}{p_{k}^{\top }Ap_{k}}}={\frac {r_{k}^{\top }r_{k}}{p_{k}^{\top }Ap_{k}}}{\text{.}}

To simplify

p_{k+1}=r_{k+1}-\sum _{i\leq k}{\frac {p_{i}^{\top }Ar_{k+1}}{p_{i}^{\top }Ap_{i}}}p_{i}{\text{,}}

one can take advantage of the fact that for

i\leq k

,

p_{i}^{\top }Ar_{k+1}={\frac {1}{\alpha _{i}}}(r_{i+1}-r_{i})^{\top }r_{k+1}={\frac {1}{\alpha _{i}}}r_{i+1}^{\top }r_{k+1}\neq 0

only if

i=k

. [EDIT Apr 11, 2010: Corrected formula.]

Hence,

p_{k+1}=r_{k+1}-{\frac {1}{\alpha _{k}}}{\frac {r_{k+1}^{\top }r_{k+1}}{p_{k}^{\top }Ap_{k}}}p_{k}=r_{k+1}-{\frac {p_{k}^{\top }Ap_{k}}{r_{k}^{\top }r_{k}}}{\frac {r_{k+1}^{\top }r_{k+1}}{p_{k}^{\top }Ap_{k}}}p_{k}=r_{k+1}-{\frac {r_{k+1}^{\top }r_{k+1}}{r_{k}^{\top }r_{k}}}p_{k}{\text{,}}

which gives

\beta _{k}={\frac {r_{k+1}^{\top }r_{k+1}}{r_{k}^{\top }r_{k}}}{\text{.}}

All these derivations as well as the orthogonality property should really have been readily available in the text to help people understand the subject more easily rather than the overly simple and context-incoherent opening paragraph of “The resulting algorithm” section. Not every reader can be assumed to have sufficient capability to derive these by themselves.Kxx (talk) 04:17, 5 February 2010 (UTC)

Mistake ?

In sections 'The conjugate gradient method as an iterative method' and 'The resulting algorithm', the definition of alpha changes from p.r to r.r. Is this intentional ? The p.r version is correct, right ? - max. —Preceding unsigned comment added by 78.226.234.207 (talk) 08:50, 14 October 2010 (UTC)

Both expressions are correct. They are equal. Kxx (talk | contribs) 19:59, 14 October 2010 (UTC)

Minor point about quadratic forms

Is x'A'x + b'x really a quadratic form? It's not homogeneous, and my understanding is that quadratic forms must be homogeneous. Note that this is just a usage quibble; I have no complaint about the math. Birge (talk) 03:26, 16 May 2009 (UTC)

It is not according to "Quadratic form". I have made the modifications.Kxx (talk) 05:48, 16 May 2009 (UTC)