How to fast perform the euclidean division of two integers and compute the rest the of the division at the same time ?
The naive approach is to simply get the result using the / and the % operator:
q = i/j; r = i%j;
As we know, modulo is very slow, so instead, we can re-multiply q with j, and compute the difference:
q = i/j; r = i-q*j;
Another way is to implement a division algorithm, such as the Egyptian Division, which use a serie of multiplications by 2 and divisions by 2 (which are bit shifts)
if (i < j) { q = 0; r = i; } else { n = 0; aux = j<<1; while (aux <= i) { aux <<= 1; ++n; } p = (aux >>= 1); q = (1 << n); while (--n >= 0) { if ((aux + (p >>= 1)) <= i) { q += (1 << n); aux += p; } } r = i-aux; }
We can re-work the egyptian division, which is in fact similar to a binary dichotomy, and change some of the computations with the Ruffini–Horner’s method:
r = i; q = 0; if (i < j) { } else { aux = j<<1; n = 1; while (aux <= i) { aux <<= 1; ++n; } do { aux >>= 1; q <<= 1; if (r >= aux) { r -= aux; ++q; } } while (--n) }
Which method is the fastest ?
What do you get ?
Profiling code
protected function euclide():void { var i:int = 0; var j:int = 0; var q:int; var r:int; var c:int; var m:int; var t:int; var p:int; var aux:int; var n:int; const REPS:int = 3000; var left:int = 0, right:int; var calibrateTime:Number = AccurateTimer.calibrateTimer(); t = AccurateTimer.startTimer(); for(j= 1;j<REPS;j++) { for(i=1;i<REPS;i++) { q = i/j; r = i%j; } } var time1:Number = AccurateTimer.endTimer(t,calibrateTime); t = AccurateTimer.startTimer(); for(j= 1;j<REPS;j++) { for(i=1;i<REPS;i++) { q = i/j; r = i-q*j; //verify results //if(q != int(i/j)) throw("error"); //if(r != i%j) throw("error"); } } var time2:Number = AccurateTimer.endTimer(t,calibrateTime); t = AccurateTimer.startTimer(); for(j= 1;j<REPS;j++) { for(i=1;i<REPS;i++) { if (i < j) { q = 0; r = i; } else { n = 0; aux = j<<1; while (aux <= i) { aux <<= 1; ++n; } p = (aux >>= 1); q = (1 << n); while (--n >= 0) { if ((aux + (p >>= 1)) <= i) { q += (1 << n); aux += p; } } r = i-aux; } //verify results //if(q != int(i/j)) throw("error"); //if(r != i%j) throw("error"); } } var time3:Number = AccurateTimer.endTimer(t,calibrateTime); t = AccurateTimer.startTimer(); for(j= 1;j<REPS;j++) { for(i=1;i<REPS;i++) { r = i; q = 0; if (i < j) { } else { aux = j<<1; n = 1; while (aux <= i) { aux <<= 1; ++n; } do { aux >>= 1; q <<= 1; if (r >= aux) { r -= aux; ++q; } } while (--n) } //q = i/j //r contain the rest of i/j //verify results //if(q != int(i/j)) throw("error"); //if(r != i%j) throw("error"); } } var time4:Number = AccurateTimer.endTimer(t,calibrateTime); t = AccurateTimer.startTimer(); for(j= 1;j<REPS;j++) { for(i=j;i<REPS;i++) { q = i/j; r = i%j; } } var time5:Number = AccurateTimer.endTimer(t,calibrateTime); t = AccurateTimer.startTimer(); for(j= 1;j<REPS;j++) { for(i=j;i<REPS;i++) { q = i/j; r = i-q*j; //verify results //if(q != int(i/j)) throw("error"); //if(r != i%j) throw("error"); } } var time6:Number = AccurateTimer.endTimer(t,calibrateTime); t = AccurateTimer.startTimer(); for(j= 1;j<REPS;j++) { for(i=j;i<REPS;i++) { n = 0; aux = j<<1; while (aux <= i) { aux <<= 1; ++n; } p = (aux >>= 1); q = (1 << n); while (--n >= 0) { if ((aux + (p >>= 1)) <= i) { q += (1 << n); aux += p; } } r = i-aux; //verify results //if(q != int(i/j)) throw("error"); //if(r != i%j) throw("error"); } } var time7:Number = AccurateTimer.endTimer(t,calibrateTime); t = AccurateTimer.startTimer(); for(j= 1;j<REPS;j++) { for(i=j;i<REPS;i++) { r = i; q = 0; aux = j<<1; n = 1; while (aux <= i) { aux <<= 1; ++n; } do { aux >>= 1; q <<= 1; if (r >= aux) { r -= aux; ++q; } } while (--n) //q = i/j //r contain the rest of i/j //verify results //if(q != int(i/j)) throw("error"); //if(r != i%j) throw("error"); } } var time8:Number = AccurateTimer.endTimer(t,calibrateTime); startTest(); setTest( "v1" , "euclide", time1 ); setTest( "v2" , "euclide", time2 ); setTest( "v3" , "euclide", time3 ); setTest( "v4" , "euclide", time4 ); //test when the divider is allways smaller: // setTest( "v5" , "euclideSmallerDivider", time5 ); // setTest( "v6" , "euclideSmallerDivider", time6 ); // setTest( "v7" , "euclideSmallerDivider", time7 ); // setTest( "v8" , "euclideSmallerDivider", time8 ); endTest(); }
2.3 GHz Intel Core i7 (quad core) on Mac OS X 10.8 running Flash Player 11.8:
v1: 94.9
v2: 45.4
v3: 51.29
v4: 43.75
It’s amazing that v3 and v4 can be so fast with so much AS3, but I’d still take v2 on this machine for simplicity’s sake or v4 for maximum performance across a wider range of devices (at least yours and mine).
The naïve performance for v1 is horrifying…
Intel Core i5 3470 (3.2 Ghz) on Win 7, 64 bit with Flash Player 11.8:
v1: 97.48 ms
v2: 41.99 ms
v3: 44.93 ms
v4: 42.57 ms
The v2’s performance is fair enough here, but v4 is maybe the winner. The difference is small, and every test it is almost the fastest.
3.3 GHz Intel Xeon (8 cores) on Win 8 running Flash Player 11.8:
v1: 97
v2: 58
v3: 53
v4: 46
Thank you so much for this. I was looping through thousands of points in a 2D grid per-frame and obtaining the x/y indices using the naive method. I now have 8-10% of my frame time back 🙂