The rounded power of 2 value of an integer is:
var powerof2:int = 1<<Log2(n)
In the previous post, we have seen few methods to fast compute the Log2.
So we can modify our fastest way to compute the log2 to get the rounded power of 2:
public static function roundPowerOf2_v0( v:int ):int { fastmem.fastSetFloat(v-1,0); return ((v-3)>>>31) ? v : 1<<((fastmem.fastGetI32(0) >>> 23)-126); }
But lets see if we can do it faster !
So the basic idea here is to fill all bits on the right up to the last bit set. we do that by 5 masking/set bit twiddling.
Then at the end we increment by one to get the next power of 2.
And because we are looking for the rounded power of 2 number, by doing that, if the entry value is a power of 2, we get the next power of 2 value.
To fix that, we simply need to decrement the value v–; at the beginning.
public static function roundPowerOf2_v1( v:int ):int { v--; v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16; return v+1; }
We can rewrite that code in AS3 Byte code, by trying to keep v in the stack instead of saving and loading it multiple times during intermediate steps.
public static function roundPowerOf2_v2( v:int ):int { __asm( GetLocal(v), //+1|-0 DecrementInt, Dup, //+1|-0 PushUInt(1), //+1|-0 ShiftRight, //+1|-2 BitOr, Dup, PushUInt(2), ShiftRight, //+1|-2 BitOr, Dup, PushUInt(4), ShiftRight, //+1|-2 BitOr, Dup, PushUInt(8), ShiftRight, //+1|-2 BitOr, Dup, PushUInt(16), ShiftRight, //+1|-2 BitOr, IncrementInt, SetLocal(v) ); return v; }
which version is the fastest?
What do you get?
Full code
package { import apparat.asm.*; import apparat.inline.Inlined; import com.buraks.utils.fastmem; public class TestPerf extends Inlined { public static function roundPowerOf2_v0( v:int ):int { fastmem.fastSetFloat(v-1,0); return ((v-3)>>>31) ? v : 1<<((fastmem.fastGetI32(0) >>> 23)-126); } public static function roundPowerOf2_v1( v:int ):int { v--; v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16; return v+1; } public static function roundPowerOf2_v2( v:int ):int { __asm( GetLocal(v), //+1|-0 DecrementInt, Dup, //+1|-0 PushUInt(1), //+1|-0 ShiftRight, //+1|-2 BitOr, Dup, PushUInt(2), ShiftRight, //+1|-2 BitOr, Dup, PushUInt(4), ShiftRight, //+1|-2 BitOr, Dup, PushUInt(8), ShiftRight, //+1|-2 BitOr, Dup, PushUInt(16), ShiftRight, //+1|-2 BitOr, IncrementInt, SetLocal(v) ); return v; } } private function roundPowerOf2():void { var i:int = 0; var j:int = 0; var a:int; var b:int; var c:int; var m:int; const REPS:int = 20000000; for(i=1;i<REPS;i++) { var l1:int = TestPerf.roundPowerOf2_v0(i); var l2:int = TestPerf.roundPowerOf2_v1(i); var l3:int = TestPerf.roundPowerOf2_v2(i); if(l1!=l2) throw("error"+i+" v0:"+l1+" v1:"+l2); if(l1!=l3) throw("error"+i+" v0:"+l1+" v2:"+l3); } var time1:uint = getTimer(); for(i=1;i<REPS;i++) { a = TestPerf.roundPowerOf2_v0(i); } var time2:uint = getTimer(); for(i=1;i<REPS;i++) { a = TestPerf.roundPowerOf2_v1(i); } var time3:uint = getTimer(); for(i=1;i<REPS;i++) { a = TestPerf.roundPowerOf2_v2(i); } var time4:uint = getTimer(); startTest(); setTest( "v0 alchemy" , "roundPowerOf2", time2-time1 ); setTest( "v1 bittwiddling" , "roundPowerOf2", time3-time2 ); setTest( "v2 bittwiddling asm", "roundPowerOf2", time4-time3 ); endTest(); }
I get similar results to yours, except that “bittwiddling” and “bittwiddling asm” are reversed. Environment:
Mac OS X 10.8
2.3 GHz Intel Core i7
Flash Player 11.8 (release, Chrome Pepper plugin)
Interesting, I just tried on my Mac, and I also get the asm version to be slower.