mov eax, edx L0: and eax, -2147483647 ; 80000001H L1: jns SHORT L4 dec eax L2: or eax, -2 ; fffffffeH L3: inc eax L4: mov DWORD PTR _r$[ebp], eax

I couldn't intuitively look at this one and understand what it meant. I could tell that it was primarily looking at only the highest (sign) and the lowest bits. I ended up putting together a table, showing eax at each line (ie. after the result of the previous line)

L0 L1 L2 L3 L4 result ---------------------------------------------------- -ODD 10..01 10..00 11..10 11..11 -1 -EVEN 10..00 01..11 11..11 00..00 0 +ODD 00..01 00..01 1 +EVEN 00..00 00..00 0

Oh, now it's painfully obvious; it's just a modulo-2 (%2) operation. But why was that so complicated? After some searching, it appears that one could just use IDIV (Signed Divide) instruction, which places the quotient in (E)AX and the remainder in (E)DX. One instruction, and there's your result, what was so hard about that?

Turns out that IDIV is

*really*slow. IDIV on a Pentium processor takes a whopping 46 clock cycles! Let's compare that to the worst case of this other funky algorithm:

and (r,i) = 1 jns (short) = 1 (4 if mispredicted) dec (r) = 1 or (r,i) = 1 inc (r) = 1Wow, only 5 clock cycles, (8 if the jns was mispredicted). No wonder they go through all that hassle!

So next time you see this bizarre x86 assembly, hopefully you can identify it as just a %2 !

References: