To compute the sqrt, we can use the Newton-Raphson method to converge to a sqrt.
Newton serie is:
x_(i+1) = x_i - f(x_i)/f'(x_i)
for the sqrt we have:
sqrt(n) = x
or
n = x^2
our newton function f(x) becomes:
f(x) = x^2 - n.
and first derivative:
f'(x) = 2x;
with our particular f(x), we have:
x_i+1 = x_i - (x_i^2 - n)/(2x_i)
One problem with that method, we need to divide by 2*x_i at each iteration, and as we know, divisions are really slow.
One way to avoid it, is to instead compute 1/sqrt(n) = x
this equation is equivalent to:
sqrt(n)/(sqrt(n)*sqrt(n)) = x
which becomes:
sqrt(n)/n = x
so once we find x, we just need to multiply by n to get the result:
sqrt(n) = x*n
so lets see the Newton method for 1/sqrt(n):
f(x) = x^-2 - n.
first derivative:
f'(x) = 2x^-3;
lets write f(x) and f'(x) in Newton:
x_i+1 = x_i - (x_i^2 - n)/(-2x_i^-3)
we can develop
x_i+1 = x_i + (x_i - n*x_i^3)/2
finally:
x_i+1 = x_i*(3/2 - 1/2 nx_i^2)
Great, as promised, we don’t have any divisions during the iterations.
To limit the number of iterations, we can use the IEEE to find a good “first” guess number. IEEE approximate the log base-2 of a number. The idea is to change the float value by changing the mantissa and exp to get a good qrt value.
The very good explanation by Chris Lomont of the technique can be found here:
http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
float xhalf = 0.5f*x; int i = *(int*)&x; // get bits for floating value i = 0x5f375a86- (i>>1); // gives initial guess y0 x = *(float*)&i; // convert bits back to float
Good news, We can totally do that in Flash with alchemy opcode !
our rsqrt function:
static public function rsqrt(val:Number):Number { /* if (val <= 0) { return val==0 ? 0 : Number.NaN; }; */ fastmem.fastSetFloat(val, 0); fastmem.fastSetI32(1597463174 - (fastmem.fastGetI32(0) >> 1), 0); var val2:Number = fastmem.fastGetFloat(0); return ((val2 * (1.5 - (val * 0.5 * val2 * val2)))); }
and to get the sqrt, we just need to multiply by val:
static public function sqrt(val:Number):Number { var val2:Number; /* if (val <= 0) { return (val==0)? 0 : Number.NaN; }; */ fastmem.fastSetFloat(val,0); fastmem.fastSetI32((1597463174 - (fastmem.fastGetI32(0) >> 1)),0); val *= (val2= fastmem.fastGetFloat(0)); return val * (1.5 - ((val * 0.5) * val2)); }
for rsqrt, the generate code is:
+1|-0 GetLocal(2) +1|-0 PushByte(0) +0|-2 SetFloat() +1|-0 PushInt(1597463174) +1|-0 PushByte(0) +1|-1 GetInt() +1|-0 PushByte(1) +1|-2 ShiftRight() +1|-2 Subtract() +1|-0 PushByte(0) +0|-2 SetInt() +1|-0 PushByte(0) +1|-1 GetFloat() +1|-1 ConvertDouble() +0|-1 SetLocal(3) +1|-0 GetLocal(3) +1|-0 PushDouble(1.5) +1|-0 GetLocal(2) +1|-0 PushDouble(0.5) +1|-2 Multiply() +1|-0 GetLocal(3) +1|-2 Multiply() +1|-0 GetLocal(3) +1|-2 Multiply() +1|-2 Subtract() +1|-2 Multiply() +0|-1 ReturnValue()
which we can re-arranged to avoid the need of another local variable, take advantage of the integer operations to use SubtractInt instead of Subtract.
here my proposal:
static public function rsqrt_asm(val:Number):Number { /* if (val <= 0) { return val==0 ? 0 : Number.NaN; }; */ //fastmem.fastSetFloat(val, 0); //fastmem.fastSetI32(1597463174 - (fastmem.fastGetI32(0) >> 1), 0); //var val2:Number = fastmem.fastGetFloat(0); //return ((val2 * (1.5 - (val * 0.5 * val2 * val2)))); __asm( GetLocal(val), PushByte(0), SetFloat, PushInt(1597463174), PushByte(0), GetInt, PushByte(1), ShiftRight, SubtractInt, PushByte(0), SetInt, PushByte(0), GetFloat, //[val2] Dup, //[val2 val2] PushDouble(1.5), //[val2 val2 1.5] Swap, //[val2 1.5 val2] Dup, //[val2 1.5 val2 val2] Multiply, //[val2 1.5 (val2*val2)] PushDouble(0.5), //[val2 1.5 (val2*val2) 0.5] Multiply, //[val2 1.5 (val2*val2*0.5)] GetLocal(val), //[val2 1.5 (val2*val2*0.5) val] Multiply, //[val2 1.5 (val2*val2*0.5*val)] Subtract, //[val2 (1.5-val2*val2*0.5*val)] Multiply, //[val2*(1.5-val2*val2*0.5*val)] ReturnValue ); return 0; }
and for the sqrt, similar changes can be done:
static public function sqrt_asm(val:Number):Number { __asm( GetLocal(val), //[val] Dup, //[val val] PushByte(0), //[val val 0] SetFloat, //[val] PushInt(1597463174), //[val 1597463007] PushByte(0), //[val 1597463007 0] GetInt, //[val 1597463007 valint] PushByte(1), //[val 1597463007 valint 1] ShiftRight, //[val 1597463007 valint 1 >>] => [val 1597463007 (valint>>1)] SubtractInt, //[val 1597463007 (valint>>1) -] => [val (1597463007-(valint>>1))] PushByte(0), //[val (1597463007-(valint>>1)) 0] SetInt, //[val] PushByte(0), //[val 0] GetFloat, //[val valfloat] ConvertDouble, Dup, //[val valfloat valfloat] SetLocal(val), //[val valfloat] valfoat->val Multiply, //[val valfloat *] => [(val*valfloat)] //ConvertDouble, //-#ben is that needed ? Dup, //[(val*valfloat) (val*valfloat)] PushDouble(1.5), //[(val*valfloat) (val*valfloat) 1.5] Swap, //[(val*valfloat) 1.5 (val*valfloat)] PushDouble(0.5), //[(val*valfloat) 1.5 (val*valfloat) 0.5] Multiply, //[(val*valfloat) 1.5 (val*valfloat) 0.5 *] => [(val*valfloat) 1.5 ((val*valfloat)*0.5)] GetLocal(val), //[(val*valfloat) 1.5 ((val*valfloat)*0.5) val2] Multiply, //[(val*valfloat) 1.5 ((val*valfloat)*0.5) val2 *] => [(val*valfloat) 1.5 (val*valfloat*0.5*val2)] Subtract, //[(val*valfloat) 1.5 (val*valfloat*0.5*val2) -] => [(val*valfloat) (1.5-val*valfloat*0.5*val2)] Multiply, //[(val*valfloat) (1.5-val*valfloat*0.5*val2) *] => [(val*valfloat)*(1.5-val*valfloat*0.5*val2)] ReturnValue ); return 0; }
here some profiling result:
sqrt
rsqrt
okay, the optimized assembly code didn’t really help, but our new as3 version works really well !
Leave a Reply