Math.sin is slow, we have many options to optimize it, a good post from Jackson dunstan “Even Faster Trig Through Inlining” show lots of methods (http://jacksondunstan.com/articles/1213)
But there is a major problem with all the proposed methods : it only works for a certain range of angle, and if you want to use those functions with any values,
you need to first compute the modulo angle , (angle % (2*PI)) which is pretty slow, so today I am going to explore better and faster solutions !
Taylor serie
so lets start simple, Taylor series:
sin x = x - x^3/3! + x^5/5! - x^7/7!...
we can refactor that code:
sin x= x ( 1- x^2/3! + x^4/5! - x^6/7!...)
we can further do the re-factoring by computing x^2:
x2 = x*x
sin x= x ( 1 + x2* (-1/3! + x2 * (1/5! - x2* (-1/7! ...)
what about the modulo?
the period of the sin function is 2PI.
for our functions, after the modulo, we want values from [-PI;PI], which makes the modulo computation to take even more time:
//modulo 2PI angle %= 6.283185307179586476925286766559; //-PI PI range if(angle>PI) angle-=2*PI else if(angle<-PI) angle +=2*PI;
this can be optimized by first offsetting the values by PI, so we only need to test for values < -PI:
//always wrap input angle to -PI..PI angle += 3.14159265; angle %= 6.283185307179586476925286766559; angle -= 3.14159265; if (angle < -3.14159265) angle += 6.28318531;
and the full code:
static public function sinTaylor7(rad:Number): Number { rad += 3.14159265; rad %= 6.283185307179586476925286766559; rad -= 3.14159265; //always wrap input angle to -PI..PI if (rad < -3.14159265) rad += 6.28318531; const p2:Number = rad * rad; return rad + p2 * rad * ( -0.16666666666 + p2 * ( 0.008333333333333333 - p2 * 0.0001984126984126984)); }
hummm, can we do better ?
If we divide the angle by 2PI, the period becomes 1:
angle = angle / (2PI)
or:
angle01 = angle * 0.15915494309
and to compute the modulo, we just need to get the fractional part:
angle01 = angl01 - int(angle01);
so we have:
angle*= 0.15915494309189533576888376337251; angle-= (angle>>0); if(angle>0.5) angle-=1.0; else if(angle<-0.5) angle+=1.0; //then we might want to go back to normal range: //angle *= 2*PI;
now, back to the taylor serie, we need to re-multiply by 2PI, but we can re-integrate directly the 2PI value in the taylor serie.
and the result is:
x2 = x*x*((2PI)^2)
sin x= x*2PI ( 1 + x2* (-1/3! + x2 * (1/5! - x2* (-1/7! ...)
or better, further develop and refactor, pre compute 1/3! , 1/5! ...
static public function sinTaylor7_2(rad:Number): Number { rad*= 0.15915494309189533576888376337251; rad-= int(rad); if(rad>0.5) rad-=1.0; else if(rad<-0.5) rad+=1.0; const p2:Number = rad * rad; return rad* (6.2831853071795864769 + p2 * ( -41.341702240399760 + p2 * ( 81.60524927 - p2 * 76.7058597530))); }
I also wrote a post before on how to optimize the int conversion by using alchemy opcode tricks:
static public function sinTaylor7_Normalized(rad:Number): Number { rad*= 0.15915494309189533576888376337251; fastmem.fastSetDouble(rad+6755399441055744.0,0) rad-= fastmem.fastGetI32(0); if(rad>0.5) rad-=1.0; else if(rad<-0.5) rad+=1.0; const p2:Number = rad * rad; return rad* (6.2831853071795864769 + p2 * ( -41.341702240399760 + p2 * ( 81.60524927 - p2 * 76.7058597530))); }
If we want to further develop taylor series to degree 9 for bigger precision, we get:
static public function sinTaylor9(rad:Number): Number { rad*= 0.15915494309189533576888376337251; fastmem.fastSetDouble(rad+6755399441055744.0,0) rad-= fastmem.fastGetI32(0); if(rad>0.5) rad-=1.0; else if(rad<-0.5) rad+=1.0; const p2:Number = rad * rad; return rad* (6.2831853071795864769 + p2 * ( -41.341702240399760 + p2 * ( 81.60524927 + p2 * ( -76.7058597530 + p2* 42.058693944897653144986811111526)))); }
Sin LUT
What about Lookup Table solutions ?
The quickest way to compute a modulo is for power of 2 values:
modval = val & (pow2mod-1);
So if we build a table of lets say 2048 values, the index in our lut is:
angleLUT = int(angle * (2048/2PI))
and with the modulo
angleLUT = angleLUT & 2047;
so lets first initialize our LUT table:
for(var luti:int=0;luti<2048;luti++) { lut[luti&2047] = Math.sin(luti*0.00306796157577128245943617517898); //0.003067 = 2PI/2048 }
and to get the sin, we just need to do:
sinangle = lut[(a*325.94932345220164765467394738691)&2047]; //325.949 is 2048/2PI
We can also store our lookup table in alchemy byteArray as float or as double, for our test, lets do both:
for(var luti:int=0;luti<2048;luti++) { lut[luti&2047] = Math.sin(luti*0.00306796157577128245943617517898); fastmem.fastSetFloat(lut[luti&2047],((luti&2047)<<2)+1024); fastmem.fastSetDouble(lut[luti&2047],((luti&2047)<<3)+10000); //trace("sin:",luti,luti*360/6.2831853071795864769,lut[luti&2047]); }
and we can once again use our int conversion trick:
for double:
fastmem.fastSetDouble((a*325.94932345220164765467394738691)+6755399441055744.0,0) //325.949 = 2048/2PI sinangle = fastmem.fastGetDouble(((fastmem.fastGetI32(0)&2047)<<3)+10000); //doubles are 8bytes , so we shift by <<3, and re-add the hardcoded offset of 10000 of the LUT
for float:
fastmem.fastSetDouble((a*325.94932345220164765467394738691)+6755399441055744.0,0) //325.949 = 2048/2PI sinangle = fastmem.fastGetFloat(((fastmem.fastGetI32(0)&2047)<<2)+1024); //floats are 4 bytes, so we shift by <<2, and re-add the hardcoded offset of 1024 of the LUT
other solutions?
this was the recommended solution in Jackson post, here with the modulo
static public function sin2(theta:Number):Number { theta += 3.14159265; theta %= 6.283185307179586476925286766559; theta -= 3.14159265; //always wrap input angle to -PI..PI if (theta < -3.14159265) theta += 6.28318531; if (theta < 0) theta *= 1.27323954 + .405284735 * theta; else theta *= 1.27323954 - 0.405284735 * theta; if (theta < 0) theta *= -0.225 * theta + 0.775; else theta *= 0.225 * theta + 0.775; return theta; }
and here an alternate solution using the [0-1] range to work with any values:
static public function sin3(angle:Number):Number { var ax:Number = 0.15915494309189533576888376337251*angle; //angle*1/(2PI) var x:Number = ax - (ax>>0); var s:Number; if(ax<0) x += 1.0; if(x >= .5) { x = 2.0*x - 1.5; s = -3.6419789056581784278305460054775;//4.0/(1.048*1.048); } else { x = 2.0*x - 0.5; s = 3.6419789056581784278305460054775;//4.0/(1.048*1.048); } x*=x; return s*(x - .25)*(x - 1.098304);//1.048*1.048 }
Profiling
moment of truth!!!
and the winner is ... LUT !
the full code:
private function sinTest():void { //show curve: visibleitems.addElement(curve); initSinLut(); resetCurve(); drawCurve("Math.sin",Math.sin,-18,18,0.15); drawCurve("sinTaylor7N",sinTaylor7_Normalized,-18,18,0.15); //drawCurve("sinTaylor7",sinTaylor7,-18,18,0.15); drawCurve("sinTaylor9",sinTaylor9,-18,18,0.15); drawCurve("sin2",sin2,-18,18,0.15); drawCurve("sin3",sin3,-18,18,0.15); drawCurve("sinLutAlchemyDouble",sinLutAlchemyDouble,-18,18,0.15); drawCurve("sinLUT",sinLut,-18,18,0.15); var i:int; var angle:Number; var angleMin:Number = -140; var angleMax:Number = 140; var angleInc:Number = 0.1; var a:Number = 0; var rad:Number; var p2:Number; const REPS:int = 3000; var time1:uint = getTimer(); for(i=0;i<REPS;i++) { for(angle=angleMin;angle<angleMax;angle+=angleInc) { a += Math.sin(angle); } } var time2:uint = getTimer(); for(i=0;i<REPS;i++) { for(angle=angleMin;angle<angleMax;angle+=angleInc) { //a = TestPerf.sinTaylor7_Normalized(angle); rad= angle * 0.15915494309189533576888376337251; fastmem.fastSetDouble(rad+6755399441055744.0,0) rad-= fastmem.fastGetI32(0); if(rad>0.5) rad-=1.0; else if(rad<-0.5) rad+=1.0; p2 = rad * rad; a += rad* (6.2831853071795864769 + p2 * ( -41.341702240399760 + p2 * ( 81.60524927 - p2 * 76.7058597530))); } } var time3:uint = getTimer(); for(i=0;i<REPS;i++) { for(angle=angleMin;angle<angleMax;angle+=angleInc) { //a = TestPerf.sinTaylor9(angle); rad= angle * 0.15915494309189533576888376337251; fastmem.fastSetDouble(rad+6755399441055744.0,0) rad-= fastmem.fastGetI32(0); if(rad>0.5) rad-=1.0; else if(rad<-0.5) rad+=1.0; p2 = rad * rad; a += rad* (6.2831853071795864769 + p2 * ( -41.341702240399760 + p2 * ( 81.60524927 + p2 * ( -76.7058597530 + p2* 42.058693944897653144986811111526)))); } } var time4:uint = getTimer(); for(i=0;i<REPS;i++) { for(angle=angleMin;angle<angleMax;angle+=angleInc) { a += TestPerf.sin2(angle); } } var time5:uint = getTimer(); for(i=0;i<REPS;i++) { for(angle=angleMin;angle<angleMax;angle+=angleInc) { a += TestPerf.sin3(angle); } } var time6:uint = getTimer(); for(i=0;i<REPS;i++) { for(angle=angleMin;angle<angleMax;angle+=angleInc) { //a = TestPerf.sinTaylor7(angle); rad = angle + 3.14159265; rad %= 6.283185307179586476925286766559; rad -= 3.14159265; //always wrap input angle to -PI..PI if (rad < -3.14159265) rad += 6.28318531; p2 = rad * rad; a += rad + p2 * rad * ( -0.16666666666 + p2 * ( 0.008333333333333333 - p2 * 0.0001984126984126984)); } } var time7:uint = getTimer(); for(i=0;i<REPS;i++) { for(angle=angleMin;angle<angleMax;angle+=angleInc) { //a = TestPerf.sinTaylor7_2(angle); rad= angle * 0.15915494309189533576888376337251; rad-= (rad>>0); if(rad>0.5) rad-=1.0; else if(rad<-0.5) rad+=1.0; p2 = rad * rad; a += rad* (6.2831853071795864769 + p2 * ( -41.341702240399760 + p2 * ( 81.60524927 - p2 * 76.7058597530))); } } var time8:uint = getTimer(); for(i=0;i<REPS;i++) { for(angle=angleMin;angle<angleMax;angle+=angleInc) { a += lut[(angle*325.94932345220164765467394738691)&2047] } } var time9:uint = getTimer(); for(i=0;i<REPS;i++) { for(angle=angleMin;angle<angleMax;angle+=angleInc) { a += fastmem.fastGetFloat((((angle*325.94932345220164765467394738691)&2047)<<2)+1024); } } var time10:uint = getTimer(); for(i=0;i<REPS;i++) { for(angle=angleMin;angle<angleMax;angle+=angleInc) { a += fastmem.fastGetDouble((((angle*325.94932345220164765467394738691)&2047)<<3)+10000); } } var time11:uint = getTimer(); for(i=0;i<REPS;i++) { for(angle=angleMin;angle<angleMax;angle+=angleInc) { fastmem.fastSetDouble((angle*325.94932345220164765467394738691)+6755399441055744.0,0) a += fastmem.fastGetDouble(((fastmem.fastGetI32(0)&2047)<<3)+10000); } } var time12:uint = getTimer(); for(i=0;i<REPS;i++) { for(angle=angleMin;angle<angleMax;angle+=angleInc) { fastmem.fastSetDouble((angle*325.94932345220164765467394738691)+6755399441055744.0,0) a += fastmem.fastGetFloat(((fastmem.fastGetI32(0)&2047)<<2)+1024); } } var time13:uint = getTimer(); startTest(); setTest( "Math.sin" , "sin", time2-time1 ); setTest( "sinTaylor7_Normalized" , "sin", time3-time2 ); setTest( "sinTaylor7" , "sin", time7-time6 ); setTest( "sinTaylor7_2" , "sin", time8-time7 ); setTest( "sinTaylor9" , "sin", time4-time3 ); setTest( "sin2" , "sin", time5-time4 ); setTest( "sin3" , "sin", time6-time5 ); setTest( "sinLUTVector" , "sin", time9-time8 ); setTest( "sinLUTAlchemyFloat" , "sin", time10-time9 ); setTest( "sinLUTAlchemyFloat2" , "sin", time13-time12 ); setTest( "sinLUTAlchemyDouble" , "sin", time11-time10 ); setTest( "sinLUTAlchemyDouble2" , "sin", time12-time11 ); endTest(); }
static public function sin2(theta:Number):Number { theta += 3.14159265; theta %= 6.283185307179586476925286766559; theta -= 3.14159265; //always wrap input angle to -PI..PI if (theta < -3.14159265) theta += 6.28318531; if (theta < 0) theta *= 1.27323954 + .405284735 * theta; else theta *= 1.27323954 - 0.405284735 * theta; if (theta < 0) theta *= -0.225 * theta + 0.775; else theta *= 0.225 * theta + 0.775; return theta; } static public function sinTaylor7(rad:Number): Number { rad += 3.14159265; rad %= 6.283185307179586476925286766559; rad -= 3.14159265; //always wrap input angle to -PI..PI if (rad < -3.14159265) rad += 6.28318531; const p2:Number = rad * rad; return rad + p2 * rad * ( -0.16666666666 + p2 * ( 0.008333333333333333 - p2 * 0.0001984126984126984)); } static public function sinTaylor7_2(rad:Number): Number { rad*= 0.15915494309189533576888376337251; rad-= (rad>>0); if(rad>0.5) rad-=1.0; else if(rad<-0.5) rad+=1.0; const p2:Number = rad * rad; return rad* (6.2831853071795864769 + p2 * ( -41.341702240399760 + p2 * ( 81.60524927 - p2 * 76.7058597530))); } //fastmem.fastSetDouble(val+6755399441055744.0,0); //fastmem.fastGetI32(0); static public function sinTaylor7_Normalized(rad:Number): Number { rad*= 0.15915494309189533576888376337251; fastmem.fastSetDouble(rad+6755399441055744.0,0) rad-= fastmem.fastGetI32(0); if(rad>0.5) rad-=1.0; else if(rad<-0.5) rad+=1.0; const p2:Number = rad * rad; return rad* (6.2831853071795864769 + p2 * ( -41.341702240399760 + p2 * ( 81.60524927 - p2 * 76.7058597530))); } static public function sinTaylor9(rad:Number): Number { rad*= 0.15915494309189533576888376337251; fastmem.fastSetDouble(rad+6755399441055744.0,0) rad-= fastmem.fastGetI32(0); if(rad>0.5) rad-=1.0; else if(rad<-0.5) rad+=1.0; const p2:Number = rad * rad; return rad* (6.2831853071795864769 + p2 * ( -41.341702240399760 + p2 * ( 81.60524927 + p2 * ( -76.7058597530 + p2* 42.058693944897653144986811111526)))); } /** optimized sin cos function */ static public function sin3(angle:Number):Number { var ax:Number = 0.15915494309189533576888376337251*angle; //angle*1/(2PI) var x:Number = ax - (ax>>0); var s:Number; if(ax<0) x += 1.0; if(x >= .5) { x = 2.0*x - 1.5; s = -3.6419789056581784278305460054775;//4.0/(1.048*1.048); } else { x = 2.0*x - 0.5; s = 3.6419789056581784278305460054775;//4.0/(1.048*1.048); } x*=x; return s*(x - .25)*(x - 1.098304);//1.048*1.048 /* ax += 0.25; x = ax - int(ax); if(ax < 0.0) x += 1.0; if(x >= .5) { x = 2.0*x - 1.5; c = -B_FACT; } else { x = 2.0*x - 0.5; c = B_FACT; } x*=x; c *= (x - .25)*(x - A_FACT); */ }
LUT Vector is fastest for me by a little on Windows 7 64-bit, 2.67 GHz Intel Xeon X550, Flash Player 11.8. It’s nice to not have to use any assembly or “alchemy” opcodes. 🙂
looks like the Vector got surprisingly faster with recent (last?) flash update !