Thursday, February 24, 2011

(x % 2), What's all this and-ing and or-ing?

While trying to understand some x86 disassembly, I came across one particularly confusing bit of code (modified for this example):

mov   eax, edx
  L0:
     and   eax, -2147483647        ; 80000001H
  L1:
     jns   SHORT L4
     dec   eax
  L2:
     or    eax, -2                 ; fffffffeH
  L3:
     inc   eax
  L4:
     mov   DWORD PTR _r$[ebp], eax

I couldn't intuitively look at this one and understand what it meant. I could tell that it was primarily looking at only the highest (sign) and the lowest bits. I ended up putting together a table, showing eax at each line (ie. after the result of the previous line)

L0      L1       L2        L3       L4       result
----------------------------------------------------
-ODD    10..01   10..00    11..10   11..11    -1
-EVEN   10..00   01..11    11..11   00..00     0
+ODD    00..01                      00..01     1
+EVEN   00..00                      00..00     0

Oh, now it's painfully obvious; it's just a modulo-2 (%2) operation. But why was that so complicated? After some searching, it appears that one could just use IDIV (Signed Divide) instruction, which places the quotient in (E)AX and the remainder in (E)DX. One instruction, and there's your result, what was so hard about that?

Turns out that IDIV is really slow. IDIV on a Pentium processor takes a whopping 46 clock cycles! Let's compare that to the worst case of this other funky algorithm:
and (r,i)   = 1
   jns (short) = 1  (4 if mispredicted)
   dec (r)     = 1
   or  (r,i)   = 1
   inc (r)     = 1
Wow, only 5 clock cycles, (8 if the jns was mispredicted). No wonder they go through all that hassle!


So next time you see this bizarre x86 assembly, hopefully you can identify it as just a %2 !

References: