Talk:Floating-point arithmetic
WikiProject Computing (B-class, Top-importance) · WikiProject Computer science (B-class, Top-importance)
This page has archives. Sections older than 90 days may be automatically archived by Lowercase sigmabot III when more than 4 sections are present.
Spelling inconsistency: floating point or floating-point
The title and first section say "floating point", but elsewhere in the article "floating-point" is used. The article should be consistent in spelling. IEEE 754 uses "floating-point", with a hyphen; I think that should be the correct spelling. JHBonarius (talk) 14:18, 18 January 2017 (UTC)
- This is not an inconsistency (at least, not always), but usual English rules: when followed by a noun, one adds a hyphen to avoid ambiguity, e.g. "floating-point arithmetic". Vincent Lefèvre (talk) 14:26, 18 January 2017 (UTC)
hidden bit
The article Hidden bit redirects to this article, but there is no definition of this term here (there are two usages, but they are unclear in context unless you already know what the term is referring to). Either there should be a definition here, or the redirection should be removed and a stub created. JulesH (talk) 05:43, 1 June 2017 (UTC)
- It is defined in the Internal representation section. Vincent Lefèvre (talk) 17:56, 1 June 2017 (UTC)
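A minimal C sketch of what the hidden bit means (the bit layout is the standard IEEE 754 binary64 encoding; the decoding code itself is just an illustration): for normal numbers, the leading 1 of the significand is implied rather than stored.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    double x = 6.5;                /* 6.5 = 1.625 * 2^2 in binary */
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);

    uint64_t frac = bits & 0xFFFFFFFFFFFFFULL;     /* 52 stored fraction bits */
    int exp = (int)((bits >> 52) & 0x7FF) - 1023;  /* unbiased exponent */

    /* For normal numbers the leading 1 of the significand is not stored:
       the true significand is 1.frac, i.e. the "hidden bit" is implied. */
    double significand = 1.0 + (double)frac / 4503599627370496.0;  /* 2^52 */
    printf("significand = %.17g, exponent = %d\n", significand, exp);
    printf("reconstructed: %.17g\n", ldexp(significand, exp));     /* 6.5 */
    return 0;
}
```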
Seeking consensus on the deletion of the "Causes of Floating Point Error" section.
Suggestion: that "cancellation amplifies any accumulated rounding error exponentially" should be mentioned as the main source of floating-point errors; it outperforms any other by at least a factor of 10. [bs 2021-03-11]
There is a discussion with Vincent Lefèvre, seeking consensus on whether the deletion of the "Causes of Floating Point Error" section from this article should be reverted.
Softtest123 (talk) 20:16, 19 April 2018 (UTC)
- It started with "The primary sources of floating point errors are alignment and normalization." Both are completely wrong. First, alignment (of the significands) is just for addition and subtraction, and it is just an implementation method of a behavior that has (most of the time) already been specified: correct rounding. Thus alignment has nothing to do with floating-point errors. Ditto for normalization. Moreover, in the context of IEEE 754-2008, a result can be normalized or not (for the decimal formats and non-interchange binary formats), but this is a Level 4 consideration, i.e. it does not affect the rounded value, thus does not affect the rounding error. In the past (before IEEE 754), important errors could come from the lack of normalization before doing an addition or subtraction, but this is the opposite of what you said: the errors were due to the lack of normalization in the implementation of the operation, not due to normalization. Anyway, that's the past. Then this section went on about alignment and normalization...
- The primary source of floating-point errors is actually the fact that most real numbers cannot be represented exactly and must be rounded. But this point has already been covered in the article. Then, the errors also depend on the algorithms: those used to implement the basic operations (but in practice, this is fixed by the correct rounding requirement such as for the arithmetic operations +, −, ×, /, √), and those that use these operations. Note also that there is already a section Accuracy problems about these issues.
- Vincent Lefèvre (talk) 22:14, 19 April 2018 (UTC)
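A minimal C illustration of this primary source of error: the decimal literals are rounded before any arithmetic takes place, and the correctly rounded addition then preserves that representation error.

```c
#include <stdio.h>

int main(void) {
    /* 0.1, 0.2 and 0.3 have no finite binary expansion, so each literal
       is rounded to the nearest binary64 value before any arithmetic. */
    double a = 0.1, b = 0.2, c = 0.3;
    printf("0.1 -> %.20f\n", a);  /* 0.10000000000000000555... */
    printf("0.1 + 0.2 == 0.3 ? %s\n", (a + b == c) ? "yes" : "no");  /* no */
    /* The addition itself is correctly rounded; the discrepancy comes
       entirely from the representation error in the inputs. */
    return 0;
}
```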
- Perhaps it would be better stated that the root cause of floating-point error is alignment and normalization. Note that either alignment or normalization must delete possibly significant digits; the value must then be rounded or truncated, both of which introduce error.
- Of course the reason there is floating-point error is because real numbers, in general, cannot be represented without error. But this does not address the cause: what actual operations inside the processor (or software algorithm) cause a floating-point representation of a real number to be incorrect?
- Since you have not addressed my original arguments as posted on your talk page, I am reposting them here:
- In your reason for this massive deletion, you explained "wrong in various ways." Specifically, how is it wrong? This is not a valid criterion for deletion. See WP:DEL-REASON.
- When you find errors in Wikipedia, the alternative is to correct the errors with citations. This edit was a good-faith edit (WP:GF).
- Even if it is "badly presented", that is not a reason for deletion. Again, see WP:DEL-REASON.
- And finally, "applied only to addition and subtraction (thus cannot be general)." Addition and subtraction are the major causes of floating point error. If you can make cases for adding other functions, such as multiplication, division, etc., then find a resource that backs your positions and add to the article.
- I will give you some time to respond, but without substantive justification for your position, I am going to revert your deletion based on the Wikipedia policies cited. The first alternative is to reach a consensus. I am willing to discuss your point of view.
- Because you have not responded specifically to these Wikipedia policies (WP:DEL-REASON and WP:GF), I am reverting the section. Please feel free to edit it to correct any errors you might see. I would refer you to the experts on floating point, such as Professor Kahan and David Goldberg.
- Softtest123 (talk) 23:03, 24 April 2018 (UTC)
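For what the alignment step actually does during addition, a minimal C illustration of absorption (the operation is still correctly rounded; the shifted-out bits of the smaller operand are what get rounded away):

```c
#include <stdio.h>

int main(void) {
    /* Aligning the significands before adding shifts the small operand's
       bits below the 53-bit precision of binary64, so they are rounded
       away ("absorption"). The addition is still correctly rounded. */
    double big = 1e16;       /* larger than 2^53, so its ulp is 2 */
    double sum = big + 1.0;  /* the 1.0 is lost in the alignment shift */
    printf("1e16 + 1.0 == 1e16 ? %s\n", (sum == big) ? "yes" : "no");  /* yes */
    return 0;
}
```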
- You might not know, but Vincent is one of those experts on floating point. ;-)
- Nevertheless, it is always better to correct or rephrase sub-standard content instead of deleting it.
- --Matthiaspaul (talk) 11:43, 16 August 2019 (UTC)
- @Softtest123 and Matthiaspaul: I think that this is more complex than you may think. The obvious cause of floating-point errors is that real numbers are not, in general, represented exactly in floating-point arithmetic. But if one wants to extend that, e.g. by mentioning solutions as what was expected with this section, this will necessarily go too far for this article. IMHO, a separate article would be needed, just like the recent Floating point error mitigation, which should be improved and probably be renamed to "Numerical error mitigation". Vincent Lefèvre (talk) 14:46, 16 August 2019 (UTC)
- I agree that "...real numbers are not, in general, represented exactly in floating-point arithmetic", so the question is: how does that manifest itself in the algorithms, and consequently in the hardware design? What is it in the features of these implementations that produces the errors? As I have pointed out, rounding error occurs when the result of an arithmetic operation has more bits than can be represented in the mantissa of a floating-point value. There are methods of minimizing the probability of the accumulation of rounding error; however, there is also cancellation error. Cancellation error occurs during normalization of subtraction when the operands are similar, and cancellation amplifies any accumulated rounding error exponentially [Higham, 1996, "Accuracy and Stability...", p. 11]. This is the material that I presented that was deleted.
- Softtest123 (talk) 18:14, 16 August 2019 (UTC)
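A minimal C illustration of the cancellation effect cited above, using the classic (1 − cos x)/x² example: the subtraction promotes the rounding error in cos(x) into the leading digits, while an algebraically equivalent rewrite stays accurate.

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    /* (1 - cos x) / x^2 -> 1/2 as x -> 0, but 1 - cos(x) cancels almost
       all leading bits, exposing the rounding error in cos(x). */
    double x = 1.2e-8;
    double naive = (1.0 - cos(x)) / (x * x);  /* badly wrong */
    double s = sin(x / 2.0);                  /* 1 - cos x = 2 sin^2(x/2) */
    double stable = 2.0 * s * s / (x * x);    /* close to 0.5, accurate */
    printf("naive  = %.17g\n", naive);
    printf("stable = %.17g\n", stable);
    return 0;
}
```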
- Interestingly, it just so happens that this week I have been doing some engineering using my trusty SwissMicros DM42 calculator[1], which uses IEEE 754 quadruple-precision decimal floating point (~34 decimal digits, exponents from −6143 to +6144), and at the same time am writing code for a low-end microcontroller used in a toy using bfloat16 (better for this application than IEEE 754 binary16, which I also use on some projects). You really have to watch for error accumulation at half precision. --Guy Macon (talk) 19:28, 16 August 2019 (UTC)
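A minimal C sketch of that half-precision accumulation problem, simulating bfloat16 by keeping only the top 16 bits of a binary32 value (the standard bfloat16/binary32 relationship); the rounding helper `to_bf16` is my own illustration, not any particular library's API:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simulate bfloat16 by keeping the top 16 bits of a binary32 value,
   with round-to-nearest-even applied to the discarded low half. */
static float to_bf16(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    u += 0x7FFF + ((u >> 16) & 1);  /* round-to-nearest-even on low 16 bits */
    u &= 0xFFFF0000u;
    memcpy(&f, &u, sizeof f);
    return f;
}

int main(void) {
    float acc = 0.0f;
    for (int i = 0; i < 1000; i++)
        acc = to_bf16(acc + to_bf16(0.1f));  /* round after every step */
    /* With only 8 bits of significand, the sum stalls far short of 100. */
    printf("bfloat16 sum of 1000 * 0.1 = %g (exact: 100)\n", acc);
    return 0;
}
```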
- The effect on the algorithms is various. Some algorithms (such as Malcolm's algorithm) are actually based on the rounding errors in order to work correctly. There is no short answer. Correct rounding is nowadays required in implementations of the FP basic operations; as long as this requirement is followed, the implementer has the choice of the hardware design. Cancellation is just the effect of subtracting two numbers that are close to each other; in this case, the subtraction operation itself is exact (assuming the same precision for all variables), and the normalization does not introduce any error. Vincent Lefèvre (talk) 20:13, 16 August 2019 (UTC)
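A minimal C illustration of that last point: by Sterbenz's lemma the subtraction of two close numbers is exact, so the tiny result is exactly the pre-existing rounding error of the operands, not a new error introduced by the subtraction.

```c
#include <stdio.h>

int main(void) {
    /* Sterbenz's lemma: if y/2 <= x <= 2y, then x - y is exact in binary
       floating point. The "cancellation error" people observe is the
       earlier rounding error of the operands, now exposed. */
    double x = 0.1 + 0.2;  /* already carries representation/rounding error */
    double y = 0.3;
    double d = x - y;      /* this subtraction is exact */
    printf("(0.1 + 0.2) - 0.3 = %.17g\n", d);  /* ~5.55e-17, the old error */
    return 0;
}
```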
Fastfloat16?
[ https://www.analog.com/media/en/technical-documentation/application-notes/EE.185.Rev.4.08.07.pdf ]
Is this a separate floating point format or another name for an existing format? --Guy Macon (talk) 11:32, 20 September 2020 (UTC)
- Same question for [ http://people.ece.cornell.edu/land/courses/ece4760/Math/Floating_point/ ] Somebody just added both to our Minifloat article. --Guy Macon (talk) 11:37, 20 September 2020 (UTC)
- As the title of the first document says: Fast Floating-Point Arithmetic Emulation on Blackfin® Processors. So these are formats convenient for a software implementation of floating point ("software implementation" rather than "emulation", as they don't try to emulate anything: they have their own arithmetic, without correct rounding). The shorter of the two formats has a 16-bit exponent and a 16-bit significand (including the sign), so that's a 32-bit format. Definitely not a minifloat. And the goal (according to the provided algorithms) is not to emulate minifloat formats either (contrary to what I have done with Sipe, where I use a large format for a software emulation of minifloat formats). In the second document, this is a 24-bit format with a 16-bit significand, so I would not say that this is a minifloat either. — Vincent Lefèvre (talk) 16:23, 20 September 2020 (UTC)
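A hypothetical C sketch of such a split format (a 16-bit signed significand plus a separate 16-bit exponent, multiplied via a full-width product and then renormalized by truncation, i.e. without correct rounding); the type and function names are made up and this is not Analog Devices' actual code:

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical split software float: value = man * 2^exp. */
typedef struct { int16_t man; int16_t exp; } sfloat;

/* Multiply: 16x16 -> 32-bit product, renormalize back to 16 bits.
   The shifts truncate, so the result is not correctly rounded. */
static sfloat sf_mul(sfloat a, sfloat b) {
    int32_t p = (int32_t)a.man * b.man;  /* exact 32-bit product */
    int e = a.exp + b.exp;
    while (p > INT16_MAX || p < INT16_MIN) {  /* arithmetic shift assumed */
        p >>= 1;
        e++;
    }
    sfloat r = { (int16_t)p, (int16_t)e };
    return r;
}

int main(void) {
    /* 3.0 = 24576 * 2^-13 and 1.5 = 24576 * 2^-14 */
    sfloat a = { 24576, -13 }, b = { 24576, -14 };
    sfloat c = sf_mul(a, b);
    printf("man=%d exp=%d -> %g\n", c.man, c.exp,
           ldexp((double)c.man, c.exp));  /* 4.5 */
    return 0;
}
```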
- Thanks! That was my conclusion as well, but I wanted someone else to look at it in case I was missing something. As an embedded systems engineer working in the toy industry I occasionally use things like minifloat and brainfloat, but I am certainly not an expert. I fixed the minifloat article. --Guy Macon (talk) 17:50, 20 September 2020 (UTC)