User:Webmasterpdx/FFPR Algorithm
FFPR Algorithm is an acronym for the Fast Floating Point Replacement Algorithm, a new algorithm that simplifies and speeds up floating point calculations on low cost integer microprocessors. By doing so, it reduces the cost and increases the capability of embedded implementations of algorithms that traditionally require many floating point calculations.
This is an initial draft of this article (not yet meant for publication), pending a way to obtain the references it needs; as is typical for an article about a new algorithm, published references are scarce.
As a draft, I'll start by presenting this article as an outline to be filled in. We can later decide how to format the document around that.
Name: Fast Floating Point Replacement (FFPR) Algorithm.
Purpose: This algorithm is a technique that, when applied to embedded floating point operations, can yield new implementations of certain embedded computations (such as FIR filters). The idea is to replace expensive floating point operations with integer operations that execute much faster, at some cost in resolution, accuracy or some other property. This trade is often worthwhile, because eliminating the expensive FP operations can reduce price, size, complexity, power consumption and computation time. The algorithm was originally invented to implement signal processing algorithms on an 8-bit integer PIC processor that lacks even a hardware multiplier.
Details: A typical floating point operation (or series of operations) may be written as the following equation:
y = A·x0 + B·x1
This represents two floating point coefficients, A and B, that are part of the algorithm being implemented, while x0 and x1 are samples, perhaps from an A/D converter. This type of calculation is the most common in signal processing; e.g. A·x0 + B·x1 + C·x2 is a simple FIR filter. So, in this simplest representation of the algorithm, we have two floating point multiplies. Typically the samples xi are integers, although on a PC they would be held as floating point values. A minimal C sketch of the reference floating point form follows (the function name and coefficient values are illustrative assumptions, not from the algorithm itself):
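  /* Reference floating point version of y = A*x0 + B*x1.
     A and B are fixed algorithm coefficients (illustrative values);
     x0 and x1 are successive integer samples, e.g. from an A/D. */
  float fir2_float(int x0, int x1)
  {
      const float A = 0.3778f;
      const float B = 0.2250f;
      return A * (float)x0 + B * (float)x1;
  }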
So, FFPR, or Fast Floating Point Replacement, theory simply dictates that the FP numbers be replaced with combinations of powers of 2. For example, the number 0.3778 could be replaced by 0.375, which is 3/8. Multiplying by 3/8 is then expressed by the formula (3/8)·x = ((x << 1) + x) >> 3, which is 3 fast integer operations: the left shift and add compute 3x, and the final right shift divides by 8. So, instead of a floating point multiply costing roughly 30 instructions, we have reduced it to 3 instructions. This is a speedup of at least a factor of 10, since 30 instructions is a conservatively fast estimate for a floating point multiply on an integer processor with no hardware multiplier.
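As a minimal sketch of that replacement in C (the function name is an illustrative assumption):

  #include <stdint.h>

  /* Multiply x by 3/8 with shifts and adds: 3x = (x << 1) + x,
     then the final right shift by 3 divides by 8. Assumes an
     arithmetic right shift for negative x, as on most compilers. */
  int16_t mul_3_8(int16_t x)
  {
      return (int16_t)(((x << 1) + x) >> 3);
  }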
This algorithm is presented here, as in the simple example above, for use with sampled data. It assumes that you start with an n-bit integer to be multiplied by a floating point value. The FP value must first be converted to a fraction whose denominator is a power of 2. The denominator becomes a final shift right (this operation is faster if the microprocessor has a single-cycle barrel shifter). The numerator determines which left-shifted copies of the sample value are summed before that final shift.
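The conversion of the coefficient to an integer fraction is done offline on a host machine, not on the target. A hypothetical helper for this step (the function name and interface are assumptions) simply rounds the coefficient to the nearest multiple of 1/2^k:

  #include <math.h>

  /* Approximate a coefficient c as n / 2^k by rounding c * 2^k.
     k is the chosen final shift amount. For example,
     approx_numerator(0.3778, 3) returns 3, giving 3/8 = 0.375. */
  long approx_numerator(double c, int k)
  {
      return lround(c * (double)(1L << k));
  }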
The order of the operations is important: as much of the calculation as possible should be done to grow the magnitude of the integer before the final shift right. This minimizes the number of bits lost to integer truncation. It also means the intermediate value needs enough headroom to hold the extra bits of the calculation without overflow; e.g. if doing 8-bit calculations, you may want to use a 16-bit integer to catch these extra bits.
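A sketch of this widening step in C (names are illustrative) for an 8-bit sample:

  #include <stdint.h>

  /* 8-bit sample multiplied by 3/8, widened to 16 bits first so the
     intermediate value 3*x cannot overflow before the final shift. */
  int16_t mul_3_8_wide(int8_t sample)
  {
      int16_t x = (int16_t)sample;           /* widen before shifting left */
      int16_t num = (int16_t)((x << 1) + x); /* 3*x uses the extra headroom */
      return (int16_t)(num >> 3);            /* final divide by 8 */
  }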
Now, when applying this algorithm to an embedded operation, the conversion is typically done ahead of time with specific values, as in the case of a FIR filter. This is because the conversion of the floating point value to an integer fraction introduces error. Before the final implementation of the calculations can be used, one must compare the results of the floating point implementation and the integer version and make sure that the results fall within acceptable tolerances. This is usually done by implementing the algorithm in C twice, once with floating point and once with the integer calculations, running both on a known input, and measuring the outputs for accuracy. gnuplot can then be used to plot the two outputs on the same graph in different colors. The integer version can ONLY be used if its results are accurate enough for the embedded device to meet its specifications.
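A minimal verification harness along those lines (a sketch; the coefficient, input range, and file name are assumptions) might print both outputs in gnuplot-friendly columns:

  #include <stdio.h>
  #include <stdint.h>

  /* Compare the floating point coefficient 0.3778 against its FFPR
     replacement 3/8 over a ramp input. Each line is "x float_y int_y";
     plot with: plot 'out.dat' using 1:2, '' using 1:3 */
  int main(void)
  {
      for (int x = -128; x < 128; x++) {
          double yf = 0.3778 * (double)x;                        /* reference */
          int16_t yi = (int16_t)(((int16_t)(x << 1) + (int16_t)x) >> 3);
          printf("%d %f %d\n", x, yf, yi);
      }
      return 0;
  }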
Apart from the power of two substitution, another feature of such algorithms can be exploited: interim results that are reused. Such a partial calculation is done once, stored, and then used in multiple places. Likewise, noticing patterns such as symmetry is important. For example, in low pass FIR filter calculations, the companion high pass filter uses the same coefficients but a different sign on the middle term of a 3-term FIR filter. So to calculate both the LPF and HPF values, one only needs to compute each coefficient multiply once, without summing them immediately. One calculation then adds all the terms, while the other subtracts the middle term instead of adding it, as in the sketch below. Making use of any such symmetry or patterns in calculations can speed up calculations considerably.
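A sketch of that sharing for a 3-tap pair (the power-of-two coefficients 1/4, 1/2, 1/4 and the function name are illustrative assumptions; for brevity each term is shifted immediately, whereas a production version would keep the extra bits as described above):

  #include <stdint.h>

  /* 3-tap LPF and its companion HPF sharing the same coefficient
     products: each product is computed once and combined twice,
     with only the sign of the middle term differing. */
  void fir3_pair(int16_t x0, int16_t x1, int16_t x2,
                 int16_t *lpf, int16_t *hpf)
  {
      int16_t a = x0 >> 2;            /* (1/4)*x0 */
      int16_t b = x1 >> 1;            /* (1/2)*x1 */
      int16_t c = x2 >> 2;            /* (1/4)*x2 */
      *lpf = (int16_t)(a + b + c);    /* low pass                       */
      *hpf = (int16_t)(a - b + c);    /* high pass: middle term negated */
  }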
In cases where the floating point coefficient is ill conditioned (where a small change in the integer fraction causes too large a change in the output for the result to be useful), one may have to use a conventional fast fixed point technique instead. One option is to convert the FP number to a fixed point representation in which the mantissa is a 16-bit integer, the exponent is a byte, and the sign is carried in the 16-bit mantissa. If the sample is 8 bits, one then simply multiplies the sample by the mantissa and adjusts the exponent as appropriate.
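A sketch of that fallback representation in C (the struct layout and names are assumptions based on the description above):

  #include <stdint.h>

  /* Fixed point fallback: value = mant * 2^exp, with the sign
     carried in the signed 16-bit mantissa. */
  typedef struct {
      int16_t mant;   /* signed 16-bit mantissa */
      int8_t  exp;    /* power-of-two exponent  */
  } fxp_t;

  /* Multiply an 8-bit sample by the coefficient: the sample scales
     the mantissa (widened to 32 bits to avoid overflow) and the
     exponent is applied as a final shift. */
  int32_t fxp_mul_sample(fxp_t c, int8_t sample)
  {
      int32_t prod = (int32_t)c.mant * (int32_t)sample;
      return (c.exp >= 0) ? (prod << c.exp) : (prod >> -c.exp);
  }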