
Talk:Durand–Kerner method


See Talk:Root-finding algorithm for some discussion relating to this method. -- Jitse Niesen (talk) 12:58, 10 January 2006 (UTC)[reply]

Jitse, thanks! Moving the method here was the right thing to do! Oleg Alexandrov (talk) 16:30, 10 January 2006 (UTC)[reply]
PS The title is not perfectly correct. It should not have been Jacoby's method, but rather Bo Jacoby's method. :) Oleg Alexandrov (talk) 16:30, 10 January 2006 (UTC)[reply]

This seems to be the same method: The method of Weierstrass (a.k.a the Durand-Kerner method). Fredrik Johansson - talk - contribs 18:21, 25 January 2006 (UTC)[reply]

That's great Fredrik! Go ahead and include the new reference and edit the title, preserving the links. Bo Jacoby 10:07, 26 January 2006 (UTC)[reply]
I moved this to "Durand-Kerner method" because that name turns up the most Google hits by far. So does anyone know what each of Weierstrass, Durand and Kerner has to do with it? I've been unsuccessful in finding any historical information. Fredrik Johansson - talk - contribs 11:58, 26 January 2006 (UTC)[reply]
There is something at http://netlib3.cs.utk.edu/cgi-bin/mfs/02/89/v89n10.html . I've been planning to go to the library to chase references but I haven't found the time yet. -- Jitse Niesen (talk) 14:00, 26 January 2006 (UTC)[reply]

Your good work, Fredrik, and the authority of the name Weierstrass has finally saved this article from the risk of being deleted by Oleg. Thank you ! Bo Jacoby 13:11, 26 January 2006 (UTC)[reply]

The references to Durand and Kerner ought to be sufficient, but I'm curious about the Weierstrass link. My calculus professor said he recognized "Durand-Kerner" when I asked today, but that seemed to be about it. There are some experts in numerical analysis on campus who probably know more; I'll try to find out. Mailing the author of the page I linked to above might also be a good idea. Fredrik Johansson - talk - contribs 16:21, 26 January 2006 (UTC)[reply]
In unrelated work, I came across this article in ACM ToMS, which seems to be freely available (at least, I could download it). It discusses the method and mentions the link to Weierstrass. -- Jitse Niesen (talk) 13:25, 31 January 2006 (UTC)[reply]

Thank you Jitse. The new reference talks about how well the method finds multiple roots. Well, I don't think that multiple roots should be found numerically (and approximately), because they can be isolated from the polynomial algebraically (and exactly). The point is this. If f has a multiple root p, then p is also a root of the derivative f', which can be computed algebraically from f. Use Euclid's algorithm to compute f2 := the greatest common divisor of f and f'. The roots of f2 are exactly the multiple roots of f. Any triple root of f will be a double root of f2, so f3 is calculated as the greatest common divisor of f2 and f2', and so on until the degree is 1. Let fn have only simple roots. Substitute fm-k := fm-k / fm^(k+1), for k = 1..m-1, for m = n..2. All these polynomial divisions give zero remainder. Now all the fi have only simple roots, and f = f1 × f2^2 × ... × fn^n. The polynomials fi are now subject to numerical root-finding. Bo Jacoby 15:15, 31 January 2006 (UTC)[reply]
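
Here is a minimal sketch of the splitting described above (my own illustration, not from any reference; it uses sympy for exact polynomial arithmetic, and the name squarefree_split is made up):

    # Sketch of the gcd-based splitting described above, in exact arithmetic (sympy).
    import sympy as sp

    def squarefree_split(f, x):
        # chain[0] = f, chain[i] = gcd(chain[i-1], chain[i-1]')
        chain = [sp.expand(f)]
        while sp.degree(chain[-1], x) > 0:
            chain.append(sp.gcd(chain[-1], sp.diff(chain[-1], x)))
        chain.pop()                        # drop the constant at the end
        n = len(chain)
        g = list(chain)
        for m in range(n, 1, -1):          # m = n..2
            for k in range(1, m):          # k = 1..m-1
                q, r = sp.div(g[m-k-1], g[m-1]**(k+1), x)
                assert r == 0              # all divisions are exact
                g[m-k-1] = q
        return g                           # f = g[0] * g[1]**2 * ... * g[n-1]**n

    x = sp.symbols('x')
    f = (x - 1)*(x - 2)**2*(x - 3)**3
    print(squarefree_split(f, x))          # [x - 1, x - 2, x - 3]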

This may be okay if you know in advance that the polynomial has multiple roots. If you do not, then you can theoretically use Euclid's algorithm to find the gcd, but I'm not sure this works in the presence of round-off errors (for essentially the same reason that you can't compare to zero in a numerical algorithm). Furthermore, understanding how the algorithm performs with multiple roots helps you understand how it performs when roots are clustered. -- Jitse Niesen (talk) 18:52, 1 February 2006 (UTC)[reply]
The polynomial gcd is done in integer arithmetic without round-off errors. If the polynomial has no double roots, then f2 has degree zero. A cluster of roots slows down convergence until it is infiltrated by p, q, r, s. The real problem is the precise evaluation of the polynomial inside a cluster. Polynomials with reasonably small integer coefficients do not have clusters of roots, as small fractions have big denominators. Bo Jacoby 08:56, 2 February 2006 (UTC)[reply]
True if your polynomial has small integer coefficients; these are the easiest to handle. -- Jitse Niesen (talk) 13:52, 2 February 2006 (UTC)[reply]

Initialization

The recent references have expensive initializations.

  1. Random numbers should be used with care, noting that one cannot reproduce a calculation that depends on random numbers.
  2. Computing points evenly distributed on the unit circle is as costly as computing the roots.

So stick to the simple suggestion of the article. But the references support the method. Bo Jacoby 13:17, 30 January 2006 (UTC)[reply]
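
For concreteness, a tiny sketch of the simple initialization (assuming the article's suggestion of taking powers of a fixed complex number such as 0.4 + 0.9i, which is neither real nor a root of unity; the exact constant is incidental):

    # Simple initial guesses for a degree-n polynomial: powers of a fixed complex
    # number that is neither real nor a root of unity (0.4 + 0.9i is one choice).
    # No random numbers, and only n-1 multiplications.
    def initial_guesses(n, c=0.4 + 0.9j):
        p = [1.0 + 0.0j]
        for _ in range(n - 1):
            p.append(p[-1] * c)
        return p

    print(initial_guesses(4))    # powers (0.4+0.9j)**k for k = 0..3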

The simple way seems practical enough, but the advantages and disadvantages of more complex initialization schemes should probably be discussed in the article. Fredrik Johansson - talk - contribs 13:31, 30 January 2006 (UTC)[reply]

Connection to Newton's method

I've now been informed that this is a quasi-Newton method, the product in the denominator being an approximation of the derivative. I've also been shown that vanilla Newton can be used for simultaneous root-finding in an identical manner, and have been asked in which way Durand-Kerner is better in this setting. Fredrik Johansson - talk - contribs 16:50, 1 February 2006 (UTC)[reply]

Great. Let's have an article on vanilla Newton. (d/dx)((x-p)(x-q)(x-r)(x-s)) = (x-q)(x-r)(x-s)+(x-p)(x-r)(x-s)+(x-p)(x-q)(x-s)+(x-p)(x-q)(x-r), and surely if x=p this reduces to (x-q)(x-r)(x-s). Bo Jacoby 08:56, 2 February 2006 (UTC)[reply]
To answer the question posed to Fredrik, quasi-Newton is generally used because it's faster. As the expression given by Bo shows, the exact derivative is given by a rather long formula, and the approximation (x-q)(x-r)(x-s) is easier to evaluate. On the downside, it might slow down convergence.
The ToMS article says that Durand-Kerner can also be seen as a Newton method (and not just quasi-Newton), but in a rather complicated way and I haven't studied the details. -- Jitse Niesen (talk) 13:52, 2 February 2006 (UTC)[reply]
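
A quick numerical illustration of the trade-off (not taken from the references; the polynomial and the guesses are made up): when the other approximations sit near the other roots, the product in the denominator is close to the exact derivative but needs only n-1 multiplications.

    # Compare the exact derivative f'(p) with the Weierstrass denominator
    # prod(p - other guesses) when those guesses sit near the other roots.
    import numpy as np

    roots = np.array([1.0, 2.0, 3.0, 4.0])
    f = np.poly(roots)                       # coefficients of (x-1)(x-2)(x-3)(x-4)
    df = np.polyder(f)

    p = 1.01                                 # current guess for the root 1
    others = np.array([2.01, 2.99, 4.02])    # current guesses for the roots 2, 3, 4

    print(np.polyval(df, p))                 # about -5.78 (exact derivative at p)
    print(np.prod(p - others))               # about -5.96, from only n-1 multiplications
    # (at the exact root, f'(1) = (1-2)(1-3)(1-4) = -6)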

substitutions parallel or sequential

The article is not clear whether the new roots should be updated sequentially or in parallel. I've tried both and haven't found any significant advantage either way, but I'm wondering whether someone can give reasonable justification for either. --njh 00:12, 14 June 2006 (UTC)[reply]

My observations confirm yours: no significant advantage either way. In the article I inserted semicolons after the substitutions, just to make clear that the algorithm as stated was meant to be sequential, but it doesn't matter. If you use the APL programming language or the J programming language or another array programming language you would prefer to do it in parallel. If you use the C programming language or Fortran or Pascal or another scalar programming language you would prefer to do it sequentially. The article root-finding algorithm says that global convergence properties are still under investigation, so please share your experience with the curious world. Bo Jacoby 19:14, 14 June 2006 (UTC)[reply]
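
To make the two variants concrete, a minimal sketch (made-up example polynomial, plain floating point, not code from the article):

    # Durand-Kerner with either sequential (Gauss-Seidel-style) or parallel
    # (Jacobi-style) updates of the root approximations.
    import numpy as np

    def durand_kerner(coeffs, steps=50, sequential=True):
        n = len(coeffs) - 1
        z = np.array([(0.4 + 0.9j) ** k for k in range(n)])    # simple initialization
        for _ in range(steps):
            if sequential:
                for i in range(n):                             # uses already-updated z[j]
                    denom = np.prod([z[i] - z[j] for j in range(n) if j != i])
                    z[i] -= np.polyval(coeffs, z[i]) / denom
            else:
                w = z.copy()                                   # all updates use the old values
                for i in range(n):
                    denom = np.prod([w[i] - w[j] for j in range(n) if j != i])
                    z[i] = w[i] - np.polyval(coeffs, w[i]) / denom
        return z

    # (x-1)(x-2)(x-3)(x-4) = x^4 - 10x^3 + 35x^2 - 50x + 24
    print(sorted(durand_kerner([1, -10, 35, -50, 24], sequential=True), key=lambda v: v.real))
    print(sorted(durand_kerner([1, -10, 35, -50, 24], sequential=False), key=lambda v: v.real))
    # both approach 1, 2, 3, 4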

computational cost

I was going to also ask, but felt it perhaps off topic: is there any theoretical knowledge about the computational cost of complex polynomial factorization? For example, we can compute the roots of a linear, quadratic, cubic, quartic and quintic polynomial in a fixed sequence of root operations and basic arithmetic, so if we can perform all of these in roughly linear time we can find the roots of an arbitrary such polynomial in linear time. Also note that the 'big O' factor of these increases in a non-linear way with increasing degree (the quartic is only a little more work than the cubic). So is there any known bound on the complexity of single variable polynomial root finding (or alternatively, eigenvalue computation)? There were a few results on undecidability of simultaneous polynomials, and on Diophantine quadratics, but I don't know of any results for complex domain, single variable polynomials. I do know that all existing algorithms give no guarantees at all for convergence on higher degree polys.

My personal experiences with global convergence have been on relatively low order polynomials (up to around degree 10) used in graphics applications. My program generates thousands of polynomials whilst interacting, and occasionally one of these stops and thinks for a bit. So I'm looking for algorithms that have predictable worst case performance, perhaps being able to estimate the number of steps for a required root tolerance. --njh 08:46, 16 June 2006 (UTC)[reply]

The important question on worst case behaviour of root-finding algorithms has a pessimistic answer: for any root-finding algorithm there exist polynomial equations that cannot be solved. Why is it so? Well, consider a simple case, the square root of e^(2·π·i·x) where x is a real number, 0 ≤ x ≤ 1. The equation is z^2 − e^(2·π·i·x) = 0. The roots are z1 = +e^(π·i·x) and z2 = −e^(π·i·x). A 'good' root-finding algorithm is a continuous function, f (because it is a finite number of iterations of the initial guesses, and each iteration is a continuous function), with a value f(x) approximating (+e^(π·i·x), −e^(π·i·x)). When x moves from x = 0 to x = 1, the continuous approximation of (+e^(π·i·x), −e^(π·i·x)) moves from about (+1,−1) to about (−1,+1). But on the other hand, f(0) = f(1), because solving the same equation, z^2 − 1 = 0, must give exactly the same approximate solution. So, no good root-finding algorithm exists! There will always exist some number x in the interval 0 ≤ x ≤ 1 for which f(x) approximates neither (+e^(π·i·x), −e^(π·i·x)) nor (−e^(π·i·x), +e^(π·i·x)). This is the sad truth. It does not mean, however, that no useful root-finding algorithm exists. The unsolvable equations are exceptional. The Durand-Kerner method solves almost all nth degree equations (in the sense of Lebesgue), but it makes no sense to ask for worst case behaviour.
Note that the root operations required in the classical formulas for equations of low degree, (2,3,4, not 5), are not easier than the general case. The above example shows that there is not even a good algorithm for finding the square root. Bo Jacoby 18:13, 31 August 2006 (UTC)[reply]
Ok, that makes sense, and confirms my vague feelings. Does this mean that even for IEEE 754 floating point there are some inputs to sqrt that are slow or inexact? --njh 18:40, 31 August 2006 (UTC)[reply]
Hi njh! You need to take the square root of a complex number. An algorithm that gives sqrt(1) = +1 is insufficient for complex numbers. You must also find the other square root: sqrt(1) = −1. So a good algorithm must return a vector containing both roots: sqrt(1) = (+1,−1). Another vector containing both roots is sqrt(1) = (−1,+1). The equation x^2 − 1 = 0 does not care, so the algorithm must take the decision. The two vectors represent the same multiset, (namely {+1,−1} with curly brackets), but they are not the same complex vector and they don't even approximate one another. A root-finding algorithm will compute approximations to one of the two vectors, but not to the other vector. Now a continuous change of variable a, starting in 1, following the unit circle around 0, and ending back in 1, will have the square root sqrt(a) moving continuously from the vector (+1,−1) to (−1,+1). Do you see the disaster? If the root-finding algorithm returns sqrt(1) = (+1,−1), then it cannot also return sqrt(1) = (−1,+1). So a unique square root is not continuous. It must discontinuously switch the components of the vector. Such a discontinuity cannot be constructed by means of a finite number of continuous iterations. The composition of continuous functions is continuous, you know.
Another problem is that the precision of floating point arithmetic may be insufficient for the precise evaluation of polynomials of moderately high degree. See Wilkinson's polynomial. Bo Jacoby 19:36, 31 August 2006 (UTC)[reply]
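
A small numerical illustration of that jump (nothing from the references): any single-valued square root must be discontinuous somewhere on the unit circle. Python's cmath.sqrt, which uses the principal branch, puts the jump at the negative real axis, while the continuously tracked root described above would end at −1 after a full turn.

    # The principal square root of exp(2*pi*i*x) as x goes once around the circle.
    import cmath

    for x in (0.0, 0.25, 0.499, 0.501, 0.75, 1.0):
        a = cmath.exp(2j * cmath.pi * x)
        print(f"x = {x:5.3f}   sqrt = {cmath.sqrt(a):.3f}")
    # between x = 0.499 and x = 0.501 the value flips from about +i to about -i,
    # although the argument a itself barely moves: the single-valued root has a jump.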

I realised you meant complex number after I had posted and gone to bed :). My reading of your followup is that you are mainly concerned that the csqrt function has discontinuities? Any program that relies on csqrt giving predictable results is thus in for a surprise.

Ok, so continuity of the solution function is impossible. A program can handle discontinuity quite happily. So we accept that and simply require finding all the roots in some order (say lexicographical order). Does that problem then go away? Accuracy is certainly a problem still, but we could give error estimates at the same time as evaluating them. I guess I'm now looking for a practical solution - I'd be happy with an algorithm that could find me the most likely solution with a pessimistic estimate of the condition number, error radius, or similar.--njh 00:08, 1 September 2006 (UTC)[reply]

The problem does not go away. A finite number of rational iterations produces a continuous, single-valued function, while the square root is either discontinuous or multivalued. A continuous function cannot approximate a discontinuous function close to the discontinuity. The program cannot detect the discontinuity. It either stops without having found the roots, or goes on for ever. (Try for example Newton's method for the equation x^2 + 1 = 0 by starting with x = 1. You will never find the roots +i and −i). For any algorithm and for any initial guess there exist unsolvable equations. There is no bound on the complexity of single variable polynomial root finding. However, unsolvable equations are extremely rare. You may construct them from the algorithm and initial guesses, but you do not find them by chance. Bo Jacoby 12:07, 4 September 2006 (UTC)[reply]
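
The parenthetical example is easy to try (a tiny sketch, nothing more): with a real starting value, every Newton iterate for x^2 + 1 = 0 stays real, so +i and −i can never be approached.

    # Newton's method for x**2 + 1 = 0 with a real starting value: every iterate
    # stays real, so the complex roots +i and -i are never approached.
    # (Starting exactly at x = 1 is even worse: the first step lands on x = 0,
    # where the derivative 2x vanishes and the method breaks down.)
    x = 0.8
    for k in range(12):
        x = x - (x*x + 1.0) / (2.0*x)
        print(k, round(x, 4))
    # the iterates wander chaotically along the real line
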
Hi, you should rather change the image in your explanation. Consider the set of possible initial tuples instead of parametrized polynomials. Each iteration step is a continuous map of one tuple to the next iterate. The combination of a sufficiently high number of iterations is still a continuous map. But by the hypothesis of convergence it has only values close to the n! permutations of the solution tuple. And initial tuples close to a permutation surely converge to this permutation (conditions for convergence were given at least by Weierstrass, Presic and Carstensen). So all permutations are approximated, the iterated map is nearly everywhere constant, thus the conclusion of "bad" initial tuples. The image for this situation should be a multidimensional analogue to the Newton fractal.--LutzL 15:02, 4 September 2006 (UTC)[reply]

As to the initial question: factorizing a polynomial of degree n in precision s into linear factors for simple isolated zeros and higher degree factors for multiple zeros or clusters (with respect to the desired precision) costs O(n log(n+s) loglog(n)) bit operations, as one can learn in papers by Victor Pan. This algorithm is, as far as I know, unimplemented. There is an implementation of Schönhage's original splitting circle method in pari-gp; this method has O(n² log(n+s) loglog(n)).--LutzL 15:02, 4 September 2006 (UTC)[reply]

Relation to Newton's method

I do not understand the new subsections relating the Durand-Kerner method to other methods. Is the D-K method a special case of Newton's method, or an approximation to N's method? LutzL's work is hard to understand. Please explain. Bo Jacoby 18:42, 31 August 2006 (UTC)[reply]

It's a special case of the n-dimensional Newton's method, where n is the degree. Not, as I interpret your question, of the one-dimensional Newton iteration that is used to find one root at a time. I'll try to expand it. As I understand the history, the derivation as Newton's method was also the way Durand and Kerner took. I'm not sure about Weierstrass, in his article he just gives the iteration formula without derivation.--LutzL 07:30, 1 September 2006 (UTC)[reply]

I fail to understand. The problem solved by the Durand-Kerner method is finding n solutions to one polynomial equation having one variable and degree n, while the problem solved by the n-dimensional Newton's method is finding one solution to n polynomial equations having n variables and any degrees. Bo Jacoby 11:16, 4 September 2006 (UTC)[reply]

Hi, did you read the modified text in the article? There it is now explained that one solves a system of n identities between coefficients of monic or normalized degree n polynomials. One is the given polynomial, the other the product of n linear factors. After several transformations the D-K method results.--LutzL 14:48, 4 September 2006 (UTC)[reply]
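
A small numerical check of that statement (my own sketch, not LutzL's derivation): take the map from a vector of root guesses to the coefficients of the corresponding monic polynomial, subtract the coefficients of the given polynomial, and apply one step of the n-dimensional Newton method; the result coincides with one parallel Durand-Kerner step.

    # One n-dimensional Newton step on the coefficient identities
    # coeffs(prod (x - z_i)) = coeffs(f)  equals one (parallel) Durand-Kerner step.
    import numpy as np

    f = np.array([1.0, -10.0, 35.0, -50.0, 24.0])     # (x-1)(x-2)(x-3)(x-4), monic
    z = np.array([0.5 + 0.1j, 1.7 - 0.2j, 3.3 + 0.3j, 4.4 - 0.1j])
    n = len(z)

    # residual: coefficients of prod(x - z_i) minus those of f (leading 1 dropped)
    F = np.poly(z)[1:] - f[1:]

    # Jacobian column i: d coeffs / d z_i = -coeffs(prod_{j != i} (x - z_j))
    J = np.array([-np.poly(np.delete(z, i)) for i in range(n)]).T

    newton_step = z - np.linalg.solve(J, F)

    dk_step = np.array([z[i] - np.polyval(f, z[i]) /
                        np.prod([z[i] - z[j] for j in range(n) if j != i])
                        for i in range(n)])

    print(np.allclose(newton_step, dk_step))          # True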

Thanks. Now I read the new text and I understand. I was not aware of this interpretation of the D-K-method, and it is interesting.

The subscript is used in two ways in the formula

.

Perhaps it is nicer to write

restricting subscripts to mean indexing by integers.

The alphas and c's in the following formula are neither defined nor used.

.

The message seems simply to be this.

.

Bo Jacoby 19:32, 7 September 2006 (UTC)[reply]