In a previous post, (here), I presented a fast way to compute 2^x, what about a^b , or Math.pow(a,b), can we create an alternate faster method ?
Math.pow(a,b) = a^b = e^(log(a^b)) = e^(log(a)*b)
we have a fast version of Math.log and a fast Math.exp (here)
We can combine both functions to get our Math.pow(a,b):
1) Math.log
// Math.log fastmem.fastSetDouble(v,0); var log:Number = (fastmem.fastGetI32(4)-1072632447.000000000000000001)*6.61036836277701574920262431624e-7;
2) Math.exp
//Math exp fastmem.fastSetDouble(((p*1.4426950408889634073599246810019 + 126.94269504)*8388608/*(1 << 23)*/)+6755399441055744.0,0); var exp:Number = fastmem.fastGetFloat(0);
now lets compute exp(log(a)*b)
static public function powab( a:Number , b:Number):Number { fastmem.fastSetDouble(a,0); var lna:Number = (fastmem.fastGetI32(4)-1072632447.000000000000000001)*6.61036836277701574920262431624e-7; fastmem.fastSetDouble(((lna*b*1.4426950408889634073599246810019 + 126.94269504)*8388608/*(1 << 23)*/)+6755399441055744.0,0); return fastmem.fastGetFloat(0); }
we can inline lna:Number directly and avoid the local variable:
static public function powab_2( a:Number , b:Number):Number { fastmem.fastSetDouble(a,0); fastmem.fastSetDouble((((fastmem.fastGetI32(4)-1072632447.000000000000000001)*9.5367456554276968310550126820042e-7*b + 126.94269504)*8388608/*(1 << 23)*/)+6755399441055744.0,0); return fastmem.fastGetFloat(0); }
finally, we can develop and re-factor some of the multiplications and additions with the constant numbers:
static public function pow_ab_v0( a:Number , b:Number):Number { fastmem.fastSetDouble(a,0); fastmem.fastSetDouble( ((fastmem.fastGetI32(4)-1072632447.000000000000000001)*8.0000020899086021058562727824362*b + 1064872507.15410432)+6755399441055744.0,0); return fastmem.fastGetFloat(0); }
Here another version from martin ankerl which I adapted to as3
static public function pow_ab_initialize():void { fastmem.fastSetI32(0,8); } static public function pow_ab_v1( a:Number , b:Number ):Number { //ttp://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp/ /* union { double d; int x[2]; } u = { a }; u.x[1] = (int)(b * (u.x[1] - 1072632447) + 1072632447); u.x[0] = 0; return u.d; */ fastmem.fastSetDouble(a,0); fastmem.fastSetI32(b * (fastmem.fastGetI32(4) - 1072632447.000000000000000000001) + 1072632447.0000000000000000001,12); return fastmem.fastGetDouble(8); }
And as we have a number to int conversion, here with an additional optimization using Sree FPU fast Number to int conversion:
static public function pow_ab_v2( a:Number , b:Number ):Number { fastmem.fastSetDouble(a,0); fastmem.fastSetDouble((b * (fastmem.fastGetI32(4) - 1072632447.0000000000000001) + 1072632447.0000000000000000001)+6755399441055744.0,12); return fastmem.fastGetDouble(8); }
Results
how fast are our 3 versions compared to Math.pow(a,b)?
okay, slower …
I had a bug in the profiled code, so here updated :
MUCH FASTER !!!!
What if
now, what if you need for example to compute lots of pow(10,b) , so the “a” parameter (a=10 here) is fixed ?
that means we can pre compute ln(a), and we are more or less back to the exp, this gives:
static public function lna_init(a:Number):Number { fastmem.fastSetDouble(a,0); return (fastmem.fastGetI32(4) - 1072632447); } static public function pow_lna_b_v1( lna:Number , b:Number ):Number { fastmem.fastSetI32(b * lna + 1072632447,12); return fastmem.fastGetDouble(8); } static public function pow_lna_b_v2( lna:Number , b:Number ):Number { fastmem.fastSetDouble((b * lna + 1072632447)+6755399441055744.0,12); return fastmem.fastGetDouble(8); } static public function lna_v0_init(a:Number):Number { fastmem.fastSetDouble(a,0); return (fastmem.fastGetI32(4)- 1072632447.000000000000000000000000001)*8.0000020899086021058562727824361; } static public function pow_lna_b_v0( lna_v0:Number , b:Number ):Number { fastmem.fastSetDouble((b*lna_v0+ 1064872507.15410432)+6755399441055744.0,0); return fastmem.fastGetFloat(0); } static public function lna_v3_init(a:Number):Number { return Math.log(a)/Math.LN2; } static public function pow_lna_b_v3( lnaOOLn2:Number , p:Number ):Number { //2^x = 2^(i+f) = 2^i * 2^f , i is integer part , f is fractional part, with x=i+f //2^i is (1<<i), //2^f use (1+x*ln(2)/n)^n = 2^x with n=8 bellow: p *= lnaOOLn2; var i:int = p; //i is the integer part of p p = (p-i) * 0.00270760617406228636491106297445+1; //LN2/256 = 0.00270760617406228636491106297445 , (p-i) is the fractional part p *=p; p *=p; p *=p; p *=p; p *=p; p *=p; p *=p; return p*p*(1<<i); }
What are the results ?
faster!
What about accuracy ?
what do you get ?
full code:
static public function pow_ab_initialize():void { fastmem.fastSetI32(0,8); } static public function pow_ab_v1( a:Number , b:Number ):Number { //ttp://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp/ /* union { double d; int x[2]; } u = { a }; u.x[1] = (int)(b * (u.x[1] - 1072632447) + 1072632447); u.x[0] = 0; return u.d; */ fastmem.fastSetDouble(a,0); fastmem.fastSetI32(b * (fastmem.fastGetI32(4) - 1072632447.000000000000000000001) + 1072632447.0000000000000000001,12); return fastmem.fastGetDouble(8); } static public function powab_1( a:Number , b:Number):Number { fastmem.fastSetDouble(a,0); var lna:Number = (fastmem.fastGetI32(4)-1072632447.000000000000000001)*6.61036836277701574920262431624e-7; fastmem.fastSetDouble(((lna*b*1.4426950408889634073599246810019 + 126.94269504)*8388608/*(1 << 23)*/)+6755399441055744.0,0); return fastmem.fastGetFloat(0); } static public function powab_2( a:Number , b:Number):Number { fastmem.fastSetDouble(a,0); fastmem.fastSetDouble((((fastmem.fastGetI32(4)-1072632447.000000000000000001)*9.5367456554276968310550126820042e-7*b + 126.94269504)*8388608/*(1 << 23)*/)+6755399441055744.0,0); return fastmem.fastGetFloat(0); } static public function pow_ab_v0( a:Number , b:Number):Number { fastmem.fastSetDouble(a,0); fastmem.fastSetDouble( ((fastmem.fastGetI32(4)-1072632447.000000000000000001)*8.0000020899086021058562727824362*b + 1064872507.15410432)+6755399441055744.0,0); return fastmem.fastGetFloat(0); } static public function pow_ab_v2( a:Number , b:Number ):Number { fastmem.fastSetDouble(a,0); fastmem.fastSetDouble((b * (fastmem.fastGetI32(4) - 1072632447.0000000000000001) + 1072632447.0000000000000000001)+6755399441055744.0,12); return fastmem.fastGetDouble(8); } static public function lna_init(a:Number):Number { fastmem.fastSetDouble(a,0); return (fastmem.fastGetI32(4) - 1072632447); } static public function pow_lna_b_v1( lna:Number , b:Number ):Number { fastmem.fastSetI32(b * lna + 1072632447,12); return fastmem.fastGetDouble(8); } static public function pow_lna_b_v2( lna:Number , b:Number ):Number { fastmem.fastSetDouble((b * lna + 1072632447)+6755399441055744.0,12); return fastmem.fastGetDouble(8); } static public function lna_v0_init(a:Number):Number { fastmem.fastSetDouble(a,0); return (fastmem.fastGetI32(4)- 1072632447.000000000000000000000000001)*8.0000020899086021058562727824361; } static public function pow_lna_b_v0( lna_v0:Number , b:Number ):Number { fastmem.fastSetDouble((b*lna_v0+ 1064872507.15410432)+6755399441055744.0,0); return fastmem.fastGetFloat(0); } static public function lna_v3_init(a:Number):Number { return Math.log(a)/Math.LN2; } static public function pow_lna_b_v3( lnaOOLn2:Number , p:Number ):Number { //2^x = 2^(i+f) = 2^i * 2^f , i is integer part , f is fractional part, with x=i+f //2^i is (1<<i), //2^f use (1+x*ln(2)/n)^n = 2^x with n=8 bellow: p *= lnaOOLn2; var i:int = p; //i is the integer part of p p = (p-i) * 0.00270760617406228636491106297445+1; //LN2/256 = 0.00270760617406228636491106297445 , (p-i) is the fractional part p *=p; p *=p; p *=p; p *=p; p *=p; p *=p; p *=p; return p*p*(1<<i); } static public function log2_v1( v:Number ):Number { fastmem.fastSetFloat(v,0); return (fastmem.fastGetI32(0)*1.1920928955078125e-7)-126.94269504; // = (i>>23) = i/8388608 = i*1.1920928955078125e-7 } static public function log2_v2( v:Number ):Number { fastmem.fastSetDouble(v,0); return (fastmem.fastGetI32(4)- 1072632447.000000000000000000000000001)* 9.5367456554276968310550126820041e-7; //1/1512775 = } static public function log_v1( v:Number ):Number { fastmem.fastSetFloat(v*0.69314718055994530941723212145818,0); //v*ln(2) return (fastmem.fastGetI32(0)*1.1920928955078125e-7)-126.94269504; // = (i>>23) = i/8388608 = i*1.1920928955078125e-7 } static public function log_v2( v:Number ):Number { fastmem.fastSetFloat(v,0); //8.2629582881927490e-8 = 1.1920928955078125e-7 * ln2 //87.9899711596 = 126.94269504*ln2 return (fastmem.fastGetI32(0)*8.2629582881927490e-8)-87.989971159656953852430137315158; } static public function log_v3( v:Number ):Number { fastmem.fastSetDouble(v,0); return (fastmem.fastGetI32(4)- 1072632447.000000000000000001)* 6.61036836277701574920262431624e-7; //1/1512775 = } private function powabTest():void { TestPerf.pow_ab_initialize(); var INC:Number = 0.01; var REPS:Number = 20; var i:Number=0,j:Number = 0; var powab:Number; var calibrateTime:Number = AccurateTimer.calibrateTimer(); var t:int = AccurateTimer.startTimer(); for(i=0;i<REPS;i+=INC) for(j=0;j<REPS;j+=INC) { powab += Math.pow(i,j); } var timeRef:Number = AccurateTimer.endTimer(t,calibrateTime); t = AccurateTimer.startTimer(); for(i=0;i<REPS;i+=INC) for(j=0;j<REPS;j+=INC) { powab += TestPerf.pow_ab_v0(i,j); } var time0:Number = AccurateTimer.endTimer(t,calibrateTime); t = AccurateTimer.startTimer(); for(i=0;i<REPS;i+=INC) for(j=0;j<REPS;j+=INC) { powab += TestPerf.pow_ab_v1(i,j); } var time1:Number = AccurateTimer.endTimer(t,calibrateTime); t = AccurateTimer.startTimer(); for(i=0;i<REPS;i+=INC) for(j=0;j<REPS;j+=INC) { powab += TestPerf.pow_ab_v2(i,j); } var time2:Number = AccurateTimer.endTimer(t,calibrateTime); t= AccurateTimer.startTimer(); for(i=0;i<REPS;i+=INC) for(j=0;j<REPS;j+=INC) { powab += Math.pow(10,j); } var time3:Number = AccurateTimer.endTimer(t,calibrateTime); var lna10:Number = TestPerf.lna_init(10); t = AccurateTimer.startTimer(); for(i=0;i<REPS;i+=INC) for(j=0;j<REPS;j+=INC) { powab += TestPerf.pow_lna_b_v1(lna10,i); } var time4:Number = AccurateTimer.endTimer(t,calibrateTime); t = AccurateTimer.startTimer(); for(i=0;i<REPS;i+=INC) for(j=0;j<REPS;j+=INC) { powab += TestPerf.pow_lna_b_v2(lna10,i); } var time5:Number = AccurateTimer.endTimer(t,calibrateTime); var lna10ooLn2:Number = TestPerf.lna_v3_init(10); t = AccurateTimer.startTimer(); for(i=0;i<REPS;i+=INC) for(j=0;j<REPS;j+=INC) { powab += TestPerf.pow_lna_b_v3(lna10ooLn2,i); } var time6:Number = AccurateTimer.endTimer(t,calibrateTime); var lna10_v0:Number = TestPerf.lna_v0_init(10); t = AccurateTimer.startTimer(); for(i=0;i<REPS;i+=INC) for(j=0;j<REPS;j+=INC) { powab += TestPerf.pow_lna_b_v0(lna10_v0,i); } var time7:Number = AccurateTimer.endTimer(t,calibrateTime); startTest(); setTest( "Math.pow" , "pow", timeRef ); setTest( "pow v0" , "pow", time0 ); setTest( "pow v1" , "pow", time1 ); setTest( "pow v2" , "pow", time2 ); setTest( "Math.pow10" , "pow10", time3 ); setTest( "pow10 v0" , "pow10", time7 ); setTest( "pow10 v1" , "pow10", time4 ); setTest( "pow10 v2" , "pow10", time5 ); setTest( "pow10 v3" , "pow10", time6 ); endTest(); //show curve: visibleitems.addElement(curve); resetCurve(); drawCurve("Math.pow(10,i)",math_pow10,0,9,0.002); drawCurve("Math.pow v0",powabv0_10,0,9,0.002); drawCurve("Math.pow v1",powabv1_10,0,9,0.002); drawCurve("Math.pow v2",powabv2_10,0,9,0.002); drawCurve("Math.pow v0_lna",powlnabv0_10,0,9,0.002); drawCurve("Math.pow v1_lna",powlnabv1_10,0,9,0.002); drawCurve("Math.pow v2_lna",powlnabv2_10,0,9,0.002); drawCurve("Math.pow v3_lna",powlnabv3_10,0,9,0.002); } private function math_pow10( v:Number):Number { return Math.pow(10,v); } private function powabv0_10( v:Number):Number { return TestPerf.pow_ab_v0(10,v); } private function powabv1_10( v:Number):Number { return TestPerf.pow_ab_v1(10,v); } private function powabv2_10( v:Number):Number { return TestPerf.pow_ab_v2(10,v); } private function powlnabv0_10( v:Number):Number { var lna10:Number = TestPerf.lna_v0_init(10); return TestPerf.pow_lna_b_v0(lna10,v); } private function powlnabv1_10( v:Number):Number { var lna10:Number = TestPerf.lna_init(10); return TestPerf.pow_lna_b_v1(lna10,v); } private function powlnabv2_10( v:Number):Number { var lna10:Number = TestPerf.lna_init(10); return TestPerf.pow_lna_b_v2(lna10,v); } private function powlnabv3_10( v:Number):Number { var lna10:Number = TestPerf.lna_v3_init(10); return TestPerf.pow_lna_b_v3(lna10,v); }
On Windows 7 with an Intel Xeon W550 I’m seeing Math.pow equal the “v2” version and and narrowly beat out “v3” in the 10 version. I’m still interested in seeing how a lookup table would fare. 🙂
Jackson, I updated the test, so please try again,
Here what I changed: doing a loop:
for(var i:int=0;i<1000;i++) val = Math.pow(i,i);
is not a good test, I think the flash player has some code that detect that the local variable is assigned for nothing, and I suspect the Math code to by bypassed.
instead, I changed the test to do that:
for(var i:int=0;i<1000;i++) val += Math.pow(i,i);
notice the += … and I updated the results and the swf in the post !