Today, I am going to try to optimize Math.atan2(y,x), atan2 is very useful when working with polar coordinates.
Here the definition of atan2:
Based on the definition, we will have to deal with multiple quadrants, but here some useful maths we can use:
atan(-x) = -atan(x)
atan(1/x)
= PI/2 - atan(x) for x>0
=-PI/2 - atan(x) for x<0
so going back to the definition of atan2, when we need to compute atan(y/x) we can also instead compute +-PI/2 – atan(x/y)
if we choose to compute atan(k) with k =x/y when y>x and k=y/x when x>y, we ensure the range is always in the [0;1]
We can pre compute the atan and store the results in a lookup table, the range is small and convenient [0;1]
so the first attempt:
static public function arcTan2_v1(y:Number,x:Number):Number { var octant:int = 0; var r:Number; if (x < 0) { octant = 1; x = -x; } if (y < 0) { octant |= 2; y = -y; } if (y > x) { octant |= 4; r = x / y; } else r = y / x; r = fastmem.fastGetFloat( (r * 32768) & 0x0ffffffc); //LUT switch (octant) { case 0: break; case 1: r = 3.1415926535897932384626433832795 - r; break; //pi-r case 2: r = -r; break; case 3: r = -3.1415926535897932384626433832795 + r; break; //-pi+r case 4: r = 1.5707963267948966192313216916398 - r; break; //pi/2-r case 5: r = 1.5707963267948966192313216916398 + r; break; case 6: r = -1.5707963267948966192313216916398 + r; break; case 7: r = -1.5707963267948966192313216916398 - r; break; } return r; } static public function initArcTanTable():void { var tableSize:int = 8192; var ooTableSize:Number = 1.0 / Number(tableSize); for(var i:int = 0;i<=tableSize;i++) { fastmem.fastSetFloat( Math.atan(Number(i)*ooTableSize),i<<2); } }
We can rewrite the function above to limit the number of if/branches:
static public function arcTan2_v2(y:Number,x:Number):Number { if (x < 0) { if (y < 0) { if (y < x) { return -1.5707963267948966192313216916398 - fastmem.fastGetFloat( (x/y * 32768) & 0x0ffffffc); } else { return -3.1415926535897932384626433832795 + fastmem.fastGetFloat( (y/x * 32768) & 0x0ffffffc); } } else { x = -x; if (y > x) { return 1.5707963267948966192313216916398 + fastmem.fastGetFloat( (x/y * 32768) & 0x0ffffffc); } else { return 3.1415926535897932384626433832795 - fastmem.fastGetFloat( (y/x * 32768) & 0x0ffffffc); } } } else { if (y < 0) { if ((y=-y) > x) { return -1.5707963267948966192313216916398 + fastmem.fastGetFloat( (x/y * 32768) & 0x0ffffffc); } else { return -fastmem.fastGetFloat( (y/x * 32768) & 0x0ffffffc); } } else { if (y > x) { return 1.5707963267948966192313216916398 - fastmem.fastGetFloat( (x / y * 32768) & 0x0ffffffc); } } } return fastmem.fastGetFloat( (y/x * 32768) & 0x0ffffffc); }
now, we can try something different by using the double angles formulations:
x = tan(a+b) = (tan(a)+tan(b)) / (1-tan(a)tan(b))
z = tan(a) => atan(z) = a
k = tan(b) => atan(k) = b
x = (z+k) / (1-zk)
=>
z = (x-k) / (1+kx)
atan(x) = b + atan(x-k / 1+kx)
we can use it :
-remaps inputs x to a smaller z range
-we can subdivide 0..1 into multiple segments with their b offset and k parameter.
so using those ideas, here a new implementation :
static public function arcTan2_v3(y:Number,x:Number):Number { if(y>0) { if (x >= 0) return 0.78539816339744830961566084581988 - 0.78539816339744830961566084581988 * (x - y) / (x + y); else return 2.3561944901923449288469825374596 - 0.78539816339744830961566084581988 * (x + y) / (y - x); } else { if (x >= 0) return -0.78539816339744830961566084581988 + 0.78539816339744830961566084581988 * (x + y) / (x - y); } return -2.3561944901923449288469825374596 - 0.78539816339744830961566084581988 * (x - y) / (y + x); }
here a "branch-less" version of the above:
static public function arcTan2_v4(y:Number,x:Number):Number { var sign:Number = 1.0 - (int(y < 0.0) << 1); //[-1,1] var absYandR:Number = y * sign + 2.220446049250313e-16; var partSignX:Number = (int(x < 0.0) << 1); // [0,2] var signX:Number = 1.0 - partSignX; // [1.0/-1.0] absYandR = (x - signX * absYandR) / (signX * x + absYandR); return ((partSignX + 1.0) * 0.7853981634 + (0.1821 * absYandR * absYandR - 0.9675) * absYandR) * sign; }
Last, we can rewrite v3 version by saving in a look-up table the k and b values based on the signs of x and y.
We can extract the sign of x and y using alchemy tricks, and write the absolute value at the same time:
//lut k1 & k2 , to be pre computed: var k1:Vector.<Number> = new Vector.<Number>(); k1.push(0.78539816339744830961566084581988); k1.push(2.3561944901923449288469825374596); k1.push(-0.78539816339744830961566084581988); k1.push(-2.3561944901923449288469825374596); var k2:Vector.<Number> = new Vector.<Number>(); k2.push(-0.78539816339744830961566084581988); k2.push( 0.78539816339744830961566084581988); k2.push( 0.78539816339744830961566084581988); k2.push(-0.78539816339744830961566084581988);
and the new atan2:
static public function arcTan2_v5(y:Number,x:Number,k1:Vector.<Number>,k2:Vector.<Number>):Number { var xIsNegative:int; var yIsNegative:int; fastmem.fastSetFloat(x,0); fastmem.fastSetByte((xIsNegative=fastmem.fastGetByte(3))&0x7f,3); x = fastmem.fastGetFloat(0); fastmem.fastSetFloat(y,0); fastmem.fastSetByte((yIsNegative=fastmem.fastGetByte(3))&0x7f,3); y = fastmem.fastGetFloat(0); return (x - y) / (x + y) * k2[(xIsNegative = (((xIsNegative & 0x80) >> 7)|((yIsNegative & 0x80)>>6)))] + k1[xIsNegative]; }
How accurate are our versions ?
How fast is it ?
**UPDATED** so here what was my initial profiling results:
But I had a bug in the profiling code, which I now fixed, so here the updated results:
okay ... nothing beats Math.atan2, any ideas to improve the code ?
** UPDATED** , My winner is V3 !!!
What do you get ?
What do you get?
Here the full code tested:
static public function initArcTanTable():void { var tableSize:int = 8192; var ooTableSize:Number = 1.0 / Number(tableSize); for(var i:int = 0;i<=8192;i++) { fastmem.fastSetFloat( Math.atan(Number(i)*ooTableSize),i<<2); } } static public function arcTan2_v1(y:Number,x:Number):Number { var octant:int = 0; var r:Number; if (x < 0) { octant = 1; x = -x; } if (y < 0) { octant |= 2; y = -y; } if (y > x) { octant |= 4; r = x / y; } else r = y / x; r = fastmem.fastGetFloat( (r * 32768) & 0x0ffffffc); switch (octant) { case 0: break; case 1: r = 3.1415926535897932384626433832795 - r; break; //pi-r case 2: r = -r; break; case 3: r = -3.1415926535897932384626433832795 + r; break; //-pi+r case 4: r = 1.5707963267948966192313216916398 - r; break; //pi/2-r case 5: r = 1.5707963267948966192313216916398 + r; break; case 6: r = -1.5707963267948966192313216916398 + r; break; case 7: r = -1.5707963267948966192313216916398 - r; break; } return r; } static public function arcTan2_v2(y:Number,x:Number):Number { if (x < 0) { if (y < 0) { if (y < x) { return -1.5707963267948966192313216916398 - fastmem.fastGetFloat( (x/y * 32768) & 0x0ffffffc); } else { return -3.1415926535897932384626433832795 + fastmem.fastGetFloat( (y/x * 32768) & 0x0ffffffc); } } else { x = -x; if (y > x) { return 1.5707963267948966192313216916398 + fastmem.fastGetFloat( (x/y * 32768) & 0x0ffffffc); } else { return 3.1415926535897932384626433832795 - fastmem.fastGetFloat( (y/x * 32768) & 0x0ffffffc); } } } else { if (y < 0) { if ((y=-y) > x) { return -1.5707963267948966192313216916398 + fastmem.fastGetFloat( (x/y * 32768) & 0x0ffffffc); } else { return -fastmem.fastGetFloat( (y/x * 32768) & 0x0ffffffc); } } else { if (y > x) { return 1.5707963267948966192313216916398 - fastmem.fastGetFloat( (x / y * 32768) & 0x0ffffffc); } } } return fastmem.fastGetFloat( (y/x * 32768) & 0x0ffffffc); } static public function arcTan2_v3(y:Number,x:Number):Number { if(y>0) { if (x >= 0) return 0.78539816339744830961566084581988 - 0.78539816339744830961566084581988 * (x - y) / (x + y); else return 2.3561944901923449288469825374596 - 0.78539816339744830961566084581988 * (x + y) / (y - x); } else { if (x >= 0) return -0.78539816339744830961566084581988 + 0.78539816339744830961566084581988 * (x + y) / (x - y); } return -2.3561944901923449288469825374596 - 0.78539816339744830961566084581988 * (x - y) / (y + x); } static public function arcTan2_v4(y:Number,x:Number):Number { var sign:Number = 1.0 - (int(y < 0.0) << 1); //[-1,1] var absYandR:Number = y * sign + 2.220446049250313e-16; var partSignX:Number = (int(x < 0.0) << 1); // [0,2] var signX:Number = 1.0 - partSignX; // [1.0/-1.0] absYandR = (x - signX * absYandR) / (signX * x + absYandR); return ((partSignX + 1.0) * 0.7853981634 + (0.1821 * absYandR * absYandR - 0.9675) * absYandR) * sign; } static public function init_arcTan2_v5(var k1:Vector.<Number>,var k2:Vector.<Number>):void { //k1 & k2 , to be pre initialized k1.push(0.78539816339744830961566084581988); k1.push(2.3561944901923449288469825374596); k1.push(-0.78539816339744830961566084581988); k1.push(-2.3561944901923449288469825374596); k2.push(-0.78539816339744830961566084581988); k2.push( 0.78539816339744830961566084581988); k2.push( 0.78539816339744830961566084581988); k2.push(-0.78539816339744830961566084581988); } static public function arcTan2_v5(y:Number,x:Number ,var k1:Vector.<Number>,var k2:Vector.<Number>):Number { var xIsNegative:int; var yIsNegative:int; fastmem.fastSetFloat(x,0); fastmem.fastSetByte((xIsNegative=fastmem.fastGetByte(3))&0x7f,3); x = fastmem.fastGetFloat(0); fastmem.fastSetFloat(y,0); fastmem.fastSetByte((yIsNegative=fastmem.fastGetByte(3))&0x7f,3); y = fastmem.fastGetFloat(0); return (x - y) / (x + y) * k2[(xIsNegative = (((xIsNegative & 0x80) >> 7)|((yIsNegative & 0x80)>>6)))] + k1[xIsNegative]; }
and more here:
protected function mathAtan2Test():void { var i:Number,j:Number; var val:Number; const REPS:Number = 14; const INC:Number = 0.01; var t:int; TestPerf.initArcTanTable(); var calibrateTime:Number = AccurateTimer.calibrateTimer(); t = AccurateTimer.startTimer(); for(i=-REPS;i<REPS;i+=INC) { for(j=-REPS;j<REPS;j+=INC) { val += Math.atan2(i,j); } } var time0:Number = AccurateTimer.endTimer(t,calibrateTime); t = AccurateTimer.startTimer(); for(i=-REPS;i<REPS;i+=INC) { for(j=-REPS;j<REPS;j+=INC) { val += TestPerf.arcTan2_v1(i,j); } } var time1:Number = AccurateTimer.endTimer(t,calibrateTime); t = AccurateTimer.startTimer(); for(i=-REPS;i<REPS;i+=INC) { for(j=-REPS;j<REPS;j+=INC) { val += TestPerf.arcTan2_v2(i,j); } } var time2:Number = AccurateTimer.endTimer(t,calibrateTime); t = AccurateTimer.startTimer(); for(i=-REPS;i<REPS;i+=INC) { for(j=-REPS;j<REPS;j+=INC) { val += TestPerf.arcTan2_v3(i,j); } } var time3:Number = AccurateTimer.endTimer(t,calibrateTime); t = AccurateTimer.startTimer(); for(i=-REPS;i<REPS;i+=INC) { for(j=-REPS;j<REPS;j+=INC) { val += TestPerf.arcTan2_v4(i,j); } } var time4:Number = AccurateTimer.endTimer(t,calibrateTime); var k1:Vector.<Number> = new Vector.<Number>(); k1.push( 0.78539816339744830961566084581988); k1.push( 2.3561944901923449288469825374596); k1.push(-0.78539816339744830961566084581988); k1.push(-2.3561944901923449288469825374596); var k2:Vector.<Number> = new Vector.<Number>(); k2.push(-0.78539816339744830961566084581988); k2.push( 0.78539816339744830961566084581988); k2.push( 0.78539816339744830961566084581988); k2.push(-0.78539816339744830961566084581988); var xIsNegative:int; var yIsNegative:int; var x:Number; var y:Number; t = AccurateTimer.startTimer(); for(i=-REPS;i<REPS;i+=INC) { for(j=-REPS;j<REPS;j+=INC) { fastmem.fastSetFloat(j,0); fastmem.fastSetByte((xIsNegative=fastmem.fastGetByte(3))&0x7f,3); x = fastmem.fastGetFloat(0); fastmem.fastSetFloat(i,0); fastmem.fastSetByte((yIsNegative=fastmem.fastGetByte(3))&0x7f,3); y = fastmem.fastGetFloat(0); val += (x - y) / (x + y) * k2[(xIsNegative = (((xIsNegative & 0x80) >> 7)|((yIsNegative & 0x80)>>6)))] + k1[xIsNegative]; } } var time5:Number = AccurateTimer.endTimer(t,calibrateTime); startTest(); setTest( "Math.atan2" , "atan2", time0); setTest( "atan2 v1" , "atan2", time1); setTest( "atan2 v2" , "atan2", time2); setTest( "atan2 v3" , "atan2", time3); setTest( "atan2 v4" , "atan2", time4); setTest( "atan2 v5" , "atan2", time5); endTest(); TestPerf.initArcTanTable(); visibleitems.addElement(curve); resetCurve(); drawCurve("Math.atan2(1,x)",atan2_v0,-10,10,0.5); drawCurve("atan2_v1(1,x)",atan2_v1,-10,10,0.5); drawCurve("atan2_v2(1,x)",atan2_v2,-10,10,0.5); drawCurve("atan2_v3(1,x)",atan2_v3,-10,10,0.5); drawCurve("atan2_v4(1,x)",atan2_v4,-10,10,0.5); drawCurve("atan2_v5(1,x)",atan2_v5,-10,10,0.5); } public function atan2_v0(v:Number):Number { return Math.atan2(-1,v); } public function atan2_v1(v:Number):Number { return TestPerf.arcTan2_v1(-1,v); } public function atan2_v2(v:Number):Number { return TestPerf.arcTan2_v2(-1,v); } public function atan2_v3(v:Number):Number { return TestPerf.arcTan2_v3(-1,v); } public function atan2_v4(v:Number):Number { return TestPerf.arcTan2_v4(-1,v); } public function atan2_v5(v:Number):Number { return TestPerf.arcTan2_v5(-1,v); }
I haven’t, but you might want to look at a libc implementation to see how it’s implemented there. Perhaps it applies to AS3.
I updated the post, I found a bug in what I was profiling … now all versions presented are faster !!!
v2 is my winner.