
Talk:Tikhonov regularization/Archive 1


Least Squares Solution

"For α = 0 this reduces to the least squares solution of an overdetermined problem (m > n)."

This isn't correct. <math>(A^T A)^{-1}</math> won't exist unless A has full column rank, and m > n does not imply that A has full column rank. The sentence should read "For α = 0 this reduces to the least squares solution provided that <math>(A^T A)^{-1}</math> exists."

Isn't this equivalent to saying just that <math>A^{-1}</math> exists? —Simetrical (talk • contribs) 21:37, 14 August 2007 (UTC)
No: A doesn't have to be square. If A has full column rank then <math>A^T A</math> is invertible. --Zvika 06:28, 15 August 2007 (UTC)
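A minimal numerical sketch of the point above (illustrative only; the matrix, data, and values of α below are made up): when A has full column rank with m > n, the Tikhonov estimate approaches the ordinary least-squares solution as α → 0.

<syntaxhighlight lang="python">
import numpy as np

# Made-up full-column-rank problem with m = 20 rows and n = 5 columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]              # ordinary least squares
for alpha in (1.0, 1e-3, 1e-6):
    x_tik = np.linalg.solve(A.T @ A + alpha * np.eye(5), A.T @ b)
    print(alpha, np.linalg.norm(x_tik - x_ls))           # gap shrinks as alpha -> 0
</syntaxhighlight>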

Textbooks?

I take it that Tikhonov regularization is not discussed in text books yet? If it is, I would love a reference. 17:50, 16 October 2006 (UTC)

Yes, of course. See some of the references in Inverse problem. Billlion 08:22, 15 August 2007 (UTC)

explain what the variables represent

like the "m" and "q" in the bottom section. They just appear out of nowhere. Not very helpful. It's kinda like making up words adn randomly using them without explaining what they mean. Not very keaulnor. 65.26.249.208 (talk) 22:58, 9 December 2007 (UTC)

Also "and q is the rank of A" pops out of nowhere in Wiener filter formula (there is no q in the formula!) Merilius (talk) 08:26, 17 June 2008 (UTC)

There is indeed some voodoo to this technique. It requires some time spent on it, and it is usually reached for when it is the last straw for a fitting problem. Suppose you have to fit experimental data covering a large dynamic range and you're doing the approximation on a log scale. A normal least-squares fit prefers the bigger values and thus dumps the finer details. Tikhonov regularization with a parameter lets you choose how the weight is distributed between smaller and bigger features. —Preceding unsigned comment added by 84.227.21.231 (talk) 16:40, 18 January 2009 (UTC)

Notation suggestions

It would be helpful to match the notation here with that of the general linear regression page. Further, the reader is left to infer which symbols are the predictors, the parameters, and the response variable. This is further complicated by the traditional Ax = b notation used for systems of linear equations. — Preceding unsigned comment added by 24.151.151.137 (talk) 21:18, 21 December 2011 (UTC)

Generalized Tikhonov

The formula

<math>x_0 + (A^T P A + Q)^{-1} A^T P (b - A x_0)</math>

is from Tarantola, e.g. equation (1.93), page 70. There are several other equivalent formulas.
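A minimal sketch of how that formula could be evaluated numerically (illustrative only; the matrices P, Q, the data, and the prior x0 below are made up, with P and Q standing in for the inverse data and prior covariances of Tarantola's notation):

<syntaxhighlight lang="python">
import numpy as np

def generalized_tikhonov(A, b, P, Q, x0):
    """Evaluate x0 + (A^T P A + Q)^{-1} A^T P (b - A x0)."""
    return x0 + np.linalg.solve(A.T @ P @ A + Q, A.T @ P @ (b - A @ x0))

# Made-up toy problem.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 4))
b = rng.standard_normal(30)
P = np.eye(30)          # data weighting (e.g. inverse noise covariance)
Q = 0.1 * np.eye(4)     # solution weighting (e.g. inverse prior covariance)
x0 = np.zeros(4)        # prior / reference model
x = generalized_tikhonov(A, b, P, Q, x0)
</syntaxhighlight>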

Although the present article only treats linear inverse problems, Tikhonov regularization is widely used in nonlinear inverse problems. Under some conditions it can be shown that the regularized solution approximates the theoretical solution. See H.W. Engl, M Hanke, A Neubauer, Regularization of Inverse Problems, Springer 1996. — Preceding unsigned comment added by 187.40.219.120 (talk) 13:35, 26 May 2012 (UTC)

Description of conditioning needs more detail

The article says, "This demonstrates the effect of the Tikhonov parameter on the condition number of the regularized problem." I can't see how. I can see how it demonstrates the effect on the condition number of the matrix A; the condition number of the least squares problem is more subtle than that, as demonstrated briefly in Lecture 18 of Trefethen and Bau, and more fully in many other texts. If anyone knows the details, or at least where to look them up, could you please add them? It would be nice to state precisely what the effect is, for a start.

Actually, this seems to be an open research problem. See for instance Chu et al. 136.186.19.229 (talk) 01:05, 4 December 2012 (UTC)
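For what it's worth, the narrower claim about the matrix itself is easy to check numerically. A minimal sketch (illustrative only; the matrix A below is made up and nearly rank-deficient): the eigenvalues of <math>A^T A + \alpha I</math> are <math>\sigma_i^2 + \alpha</math> for the singular values of A, so its condition number falls as α grows.

<syntaxhighlight lang="python">
import numpy as np

# Made-up matrix with one nearly dependent column, so A^T A is ill-conditioned.
rng = np.random.default_rng(2)
A = rng.standard_normal((50, 10))
A[:, -1] = A[:, 0] + 1e-6 * rng.standard_normal(50)

for alpha in (0.0, 1e-6, 1e-3, 1.0):
    cond = np.linalg.cond(A.T @ A + alpha * np.eye(10))
    print(f"alpha = {alpha:g}: cond(A^T A + alpha I) = {cond:.3e}")
</syntaxhighlight>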

Some examples?

I came across Tikhonov years ago in reading, and recently started using it. Would an examples section be appropriate? When I get time, I can type up some examples from image reconstruction, using weighted Fourier operators as the T matrix. From there, it would be simple to show how any particular type or location of details within a reconstructed image can be minimized or maximized via Tikhonov.

90.194.203.28 (talk) 13:17, 18 January 2013 (UTC)

Merge suggestion

I have added a {{mergefrom}} tag, suggesting that Bayesian linear regression be merged into here, since the content of that article, viz. introducing a quadratic penalty on the size of the regression coefficients, is exactly the same as that considered in this article. The only twist is that the quadratic penalty is interpreted as a multivariate Gaussian prior probability on the regression coefficients, and that, having put the model into a Bayesian framework, Bayesian model selection can be used to fix the size of the trade-off parameter α.

(More on this can be found eg in David J. C. MacKay's 2003 book Information Theory, Inference, and Learning Algorithms, the relevant chapters being developed from his 1992 CalTech PhD thesis; the Bayesian linear regression article also cites a number of book sources.) Jheald (talk) 17:52, 11 November 2012 (UTC)
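A minimal numerical sketch of the equivalence underlying the proposal (illustrative only; the design matrix, data, and variances below are made up): the ridge/Tikhonov estimate with α = σ²/τ² coincides with the posterior mean of a Bayesian linear regression with Gaussian noise of variance σ² and a zero-mean Gaussian prior of variance τ² on the coefficients.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 6))
b = rng.standard_normal(40)
sigma2, tau2 = 0.5, 2.0                 # made-up noise and prior variances
alpha = sigma2 / tau2

# Ridge / Tikhonov estimate.
x_ridge = np.linalg.solve(A.T @ A + alpha * np.eye(6), A.T @ b)
# Posterior mean under the Gaussian prior (Bayesian linear regression).
x_bayes = np.linalg.solve(A.T @ A / sigma2 + np.eye(6) / tau2, A.T @ b / sigma2)

print(np.allclose(x_ridge, x_bayes))    # True: the two estimates agree
</syntaxhighlight>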

Minor comment -- why merge Bayesian linear regression in? Tikhonov regularization is important enough in its own right and I would not confuse things. 165.124.167.130 (talk) 22:58, 8 January 2013 (UTC)

I also disagree; Tikhonov regularization is not anything Bayesian, and merging would be confusing. The fact that Bayesians figure out ways to do the same calculation does not mean we have to rewrite everything in a Bayesian light.

In fact, I find it surprising that the first section is Bayesian Interpretation, pushing Generalized Tikhonov regularization to second place. How is it possible that the Bayesian translation of the method is more important than the very subject of the article, which is Tikhonov regularization? Viraltux (talk) 16:22, 18 March 2013 (UTC)

Relation to probabilistic formulation -- Dangling reference

The reference to appears to be dangling. Csgehman (talk) 19:46, 6 January 2014 (UTC)

What is matrix A?

In the article, it is not clear what A, x, and b represent. In the case of linear regression, which of these represent the predictors, the outcomes, and the parameters? — Preceding unsigned comment added by 161.130.188.133 (talk) 17:15, 1 April 2014 (UTC)

Removal of SVD & Wiener filter information

Why were these sections removed? There does not appear to have been any discussion here, and they clearly contained useful information. If that information was redundant or should be somewhere else (though I don't see where that would be exactly) then there should be a link here, but there isn't. Here is the diff for the two edits that removed all this information. Unless someone has a good reason for removing these sections, I'd like to revert these egregious edits.

Caliprincess (talk) 20:30, 3 August 2013 (UTC)

Thank you for pointing this out, I completely agree. Reverted the edit now. Jotaf (talk) 19:24, 3 April 2014 (UTC)

The difference operator is high-pass, not low-pass (quick check: diff_t(exp(i*omega*t)) = i*omega*exp(i*omega*t); the higher omega, the larger the derivative). It is the Tikhonov regularization process as a whole that is low-pass (i.e., enforcing smoothness) when the finite difference operator is used. More generally, in penalized optimization, the constraints express what the result should not look like, so if one uses a high-pass operator as the penalty, then the Tikhonov system is low-pass, and conversely. — Preceding unsigned comment added by 12.54.94.28 (talk) 17:24, 24 April 2014 (UTC)
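A minimal sketch of both halves of that point (illustrative only; the signal and the value of λ below are made up): the forward-difference operator has frequency response |exp(i*omega) - 1| = 2|sin(omega/2)|, which grows with omega (high-pass), while solving the regularized system (I + λ D^T D) x = b smooths noisy data (low-pass).

<syntaxhighlight lang="python">
import numpy as np

# High-pass part: the gain of the forward difference grows with frequency.
omegas = np.linspace(0.1, np.pi, 5)
print(np.abs(np.exp(1j * omegas) - 1.0))          # increases monotonically with omega

# Low-pass part: penalizing ||D x|| smooths a made-up noisy signal.
rng = np.random.default_rng(4)
n = 200
t = np.arange(n)
b = np.sin(2 * np.pi * t / 50) + 0.5 * rng.standard_normal(n)   # smooth signal + noise
D = np.diff(np.eye(n), axis=0)                    # (n-1) x n first-difference operator
lam = 10.0
x = np.linalg.solve(np.eye(n) + lam * D.T @ D, b)
print(np.std(np.diff(b)), np.std(np.diff(x)))     # roughness drops after regularization
</syntaxhighlight>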

"Known as ridge regression"

Isn't ridge regression the special case of L₂ regularization applied to regression analysis, rather than a synonym of general Tikhonov regularization? QVVERTYVS (hm?) 12:26, 27 January 2014 (UTC)

This is also my understanding. A support vector machine is also Tikhonov regularization, just with hinge loss and the L1 norm. Robbieboy74 (talk) 02:01, 4 December 2014 (UTC)

"Lowpass operators"

Isn't a difference operator a highpass operator? MaigoAkisame (talk) 04:17, 16 December 2014 (UTC)