Bit mathematics cookbook

By Joel Yliluoma, January 2014

How to implement various arithmetic and logical operations on platforms without native support for those operations.

Two's complement is assumed.

Table of contents [expand all] [collapse all]

ADD and SUB

Addition without carry

Using subtraction

Addition can be synthesized from subtraction by negating the source operand.

Requirements:

  • Subtraction
  • Negation

	uint_type a      = original_1;
	uint_type b      = original_2;
	b = -b;
	uint_type result = a - b;

Using bitwise operations, with XOR

Addition can be synthesized bit by bit using the logical XOR operation.

Requirements:

  • Logical bit-wise XOR
  • Binary bit-shift to left by 1
  • Logical bit-test

	uint16 a      = original_1;
	uint16 b      = original_2;
	uint16 result = 0;
	uint16 mask   = 1;
	while(mask != 0)
	{
	    uint16 nextmask = mask << 1;
	    if(a & mask) // If A has that bit set?
	    {
	        // If result also has that bit set, set carry (next bit in result)
	        if(result & mask) result ^= nextmask; // This could also be |= instead of ^= .
	        // Flip this bit in result
	        result ^= mask;
	    }
	    if(b & mask) // If B has that bit set?
	    {
	        // If result also has that bit set, set carry (next bit in result)
	        if(result & mask) result ^= nextmask; // This could also be |= instead of ^= .
	        // Flip this bit in result
	        result ^= mask;
	    }
	    mask = nextmask;
	}
The following version does potentially fewer iterations by applying carry calculations to all bits simultaneously.

Requirements:

  • Logical bit-wise XOR
  • Logical bit-wise AND
  • Binary bit-shift to left by 1

	uint_type a      = original_1;
	uint_type b      = original_2;
        uint_type carry  = a & b;
        uint_type result = a ^ b;
        while(carry != 0)
        {
            // If you need the mathematical carry from addition,
            // check the overflow from this shift.
            uint_type shiftedcarry = carry << 1;
            carry  = result & shiftedcarry;
            result = result ^ shiftedcarry;
        }

Other

If you don't have the logical bit-wise XOR operation, see the examples with ADC, below.

ADC: Addition with carry

Using addition & comparison

For platforms without add-with-carry, carry can be synthesized by comparing the result to the original.

Requirements:

  • Addition without carry
  • Unsigned less-than comparison

	uint_type a      = original_1;
	uint_type b      = original_2;
	uint_type result = a + b;
	bool      carry  = result < a; // result < b works too.

Using addition with larger operands

Alternatively, you can get carry by calculating with integer sizes larger than the source operands are.

Requirements:

  • Addition without carry
  • Logical bit shift right & bit-and, or addressing individual byte components

	uint16 a      = original_1;
	uint16 b      = original_2;
	uint32 temp   = a + b;

	// Alternative 1:
	uint16 tempc[] = temp;       // Treat temp as two 16-bit components
	uint16 result  = tempc[0];   // Take low word
	bool   carry   = tempc[1];   // Take high word

	// Alternative 2: Bit-wise operations
	bool   carry  = temp >> 16;    // Bit-wise logical shift right
	uint16 result = temp & 0xFFFF; // Bit-wise AND operation 

Using bitwise operations, without XOR

Alternatively, if you don't have ADD or bit-wise XOR, you can use bit-wise OR and some comparison logic instead. This produces a carry flag, too.

Requirements:

  • Logical bit-wise OR
  • Binary bit-shift to left by 1
  • Logical bit-test
  • Logical bit-wise OR and AND (either as a test or calculation)

	uint16 a      = original_1;
	uint16 b      = original_2;
	uint16 result = 0;
	uint16 mask   = 1;
	bool carry = false;
	while(mask != 0)
	{
            bool bit0 = a & mask;
            bool bit1 = b & mask;

            if(carry)
                { if(bit0 == bit1) result |= mask; carry = bit0 | bit1; }
            else
                { if(bit0 != bit1) result |= mask; carry = bit0 & bit1; }
        
            mask = mask << 1;
        }

Using bitwise shift-with-carry or rotate-with-carry operations

If you don't have any OR or AND operations, but you do have a shift-to-right-with-carry or a rotate-to-right-with-carry, you can implement addition using those.

Requirements:

  • Rotation-to-right with carry
  • Either a logical right-shift that produces a carry, or a rotation-to-right that produces a carry

	uint16 a      = original_1;
	uint16 b      = original_2;
	uint16 result = 0;
        bool math_carry = false;

        for(int bit=0; bit<16; ++bit)
        {
            bool bit0 = a&1; a >>= 1; // Shift-to-right, get carry
            bool bit1 = b&1; b >>= 1; // Shift-to-right, get carry

            if(math_carry)
            {
                bool shift_carry = bit0 == bit1;
                // Shift-to-right with carry: sets msb=carry
                result >>= 1; if(shift_carry) result |= 0x80;

                if(bit0) math_carry = true;
                else if(bit1) math_carry = true;
                else math_carry = false;
                // ^ Same as: math_carry = (bit0 || bit1);
            }
            else
            {
                bool shift_carry = bit0 != bit1;
                // Shift-to-right with carry: sets msb=carry
                result >>= 1; if(shift_carry) result |= 0x80;

                math_carry = false;
                if(bit0)
                    if(bit1)
                        math_carry = true;
                // ^ Same as: math_carry = (bit0 && bit1);
            }
        }
Alternatively, as a state machine:

	uint16 a      = original_1;
	uint16 b      = original_2;
	uint16 result = 0;
        bool math_carry = false;
        for(int bit=0; bit<16; ++bit)
            switch(math_carry)
            {
                case false:
                    { bool bit0 = a&1; a>>=1; // Rotate-right A, get carry
                      if(bit0) goto sum1_x; }
                sum0_x:
                    { bool bit1 = b&1; b>>=1; // Rotate-right B, get carry
                      if(bit1) goto sum1; }
                sum0:
                    result >>= 1; // Clear math_carry, rotate-right result
                    math_carry=false; continue;
                sum1_x:
                    { bool bit1 = b&1; b>>=1; // Rotate-right B, get carry
                      if(bit1) goto sum2; }
                sum1:
                    result >>= 1; result |= 0x8000; // Set carry, rotate-right result
                    math_carry=false; continue;
                sum2:
                    result >>= 1; // Clear math_carry, rotate-right result
                    math_carry=true; continue;
                case true:
                    { bool bit0 = a&1; a>>=1; // Rotate-right A, get carry
                      if(bit0) goto sum2_x; }
                    goto sum1_x;
                sum2_x:
                    { bool bit1 = b&1; b>>=1; // Rotate-right B, get carry
                      if(bit1) goto sum3; }
                    goto sum2;
                sum3:
                    result >>= 1; result |= 0x8000; // Set carry, rotate-right result
                    math_carry=true; continue;
            }
Which would be equivalent to this 6502 code (8-bit operands), assuming the 6502 didn't already have an addition operation:

	; Inputs: a_operand = operand 1
	;         b_operand = operand 2
	;	  C         = carry
	; Outputs:
	;         result    = operand 1 + operand 2 + carry
	;         C         = carry from addition
	;         A         = 8
	;         a_operand = 0
	;         b_operand = 0
	;         bits_remaining = 0
	lda	#8
	sta	bits_remaining
	bcs	next_bit_carryset
	bcc	next_bit_carryclear
sum0_x:
	lsr	b_operand	; Sum=0. Get next LSB from B_operand
	; Arrived here when math_carry=0, A=0, B=x. (sum = 0 or 1)
	; Put B-bit (whatever it is) in result, and keep math_carry=false.
	ror	result	; Rotate-right result
	;jmp	next_bit_carryclear	; Unconditional jump.
next_bit_carryclear:
	dec	bits_remaining	; while(--bits_remaining > 0)
	beq	done_add_clc	; {
	;bne	carry_false
;carry_false:
	lsr	a_operand	; Sum=0. Get next LSB from A_operand
	bcc	sum0_x	        ; Goto sum1_x if set, sum0_x if clear
sum1_x:
	lsr	b_operand	; Sum=1. Get next LSB from B_operand
	bcs	sum2a	        ; Goto sum2a if set, sum1 if clear
	;bcc	sum1
;sum1:
	; Arrived here when math_carry=0, A=1, B=0 (sum=1)
	; Arrived here when math_carry=1, A=0, B=0 (sum=1)
	sec		; Set carry
	ror	result	; Rotate-right result
	bne	next_bit_carryclear	; Unconditional jump.
sum2a:
	; Arrived here when math_carry=0, A=1, B=1 (sum=2)
	; Arrived here when math_carry=1, A=0, B=1 (sum=2)
	clc		; Clear carry
	ror	result	; Rotate-right result
	bpl	next_bit_carryset	; Unconditional jump.

carry_true:
	lsr	a_operand	; Sum=1. Get next LSB from A_operand
	bcc	sum1_x	        ; Goto sum2_x if set, sum1_x if clear
	;bcs	sum2_x
;sum2_x:
	lsr	b_operand	; Sum=2. Get next LSB from B_operand
	; Arrived here when math_carry=1, A=1, B=x. (sum= 2 or 3)
	; Put B-bit (whatever it is) in result, and keep math_carry=true.
	ror	result	; Rotate-right result
	;jmp	next_bit_carryset	; Unconditional jump.
next_bit_carryset:
	dec	bits_remaining	; if(!--bits_remaining)
	bne	carry_true
	;beq	done_add_sec    ;
;done_add_sec:
	sec
	bcs	done_add
done_add_clc:
	clc
	;bcc	done_add
done_add:			; }

Addition of larger integers than native register size

When you need to add together integers that are larger than your native register size, you need to utilize the carry.

Requirements:

  • Addition with carry

	uint_type a[]     = original_1[];
	uint_type b[]     = original_2[];
	uint_type result[];
	
	bool carry = false;    // Clear carry
	for(n = 0; n < 4; ++n) // example if you're doing 32-bit addition using 8-bit integers
	{
	    result[n]     = a[n] + b[n] + carry;
	    carry         = /* save carry from this calculation*/;
	}

Subtraction without carry

Using addition

Subtraction can be synthesized from addition by negating the source operand.

Requirements:

  • Addition
  • Arithmetic negation

	uint_type a      = original_1;
	uint_type b      = original_2;
	b = -b;
	uint_type result = a + b;

SBC: Subtraction with carry

For platforms without sub-with-carry, carry can be synthesized by comparing the result to the original.

Requirements:

  • Subtraction without carry
  • Unsigned less-than comparison

	uint_type a      = original_1;
	uint_type b      = original_2;
	uint_type result = a - b;
	bool      carry  = a < result;

SGN / Testing if value is negative

Testing if value is negative can be done with a comparison:

	int_type a    = original;
	bool negative = a < 0;
or by testing the sign bit:

	uint16 a = original;
	bool negative = a & 0x8000; // The highest-order bit of 16-bit integer
Testing the sign bit can also be done on some platforms with a left-shift operation, if the operation produces a carry.

	uint16 a = original;
	a = a << 1; // Binary left-shift by 1
	bool carry    = /* save carry from this calculation*/;
	bool negative = carry;
And some platforms, such as the 6502, produce a is-negative flag simply by loading the value.

	lda value ; Load the highest-order byte of the value.
	; The N flag now contains the sign bit.
	bmi got_negative ;branch if N set
	bpl got_positive ;branch if N clear
Similarly on i386:

	mov eax, value   ; Load the highest-order word of the value.
	or eax, eax      ; Test its bits
	; test eax, eax  ; Alternative way of testing the bits
	js  got_negative ; branch if SF set
	jns got_positive ; branch if SF clear

NEG

Arithmetic negation

Two's complement negation can be synthesized as follows:

Requirements:

  • Logical negation (NOT operation / inversion)
  • Unsigned addition of 1

	uint_type a      = original;
	uint_type result = ~a;       // NOT operation
	result = result + 1;
Alternatively, you can implement it by subtracting from zero:

Requirements:

  • Subtraction

	uint_type a      = original;
	uint_type result = 0 - a;

NOT

Logical negation (NOT / invert operation)

Bit-wise NOT, i.e. logical negation, can be synthesized in various ways.

Using XOR

XOR the operand with an integer where every bit is 1.

	uint16 a      = original;
	uint16 result = a ^ 0xFFFF; // 0xFFFF has 16 bits

Using subtraction

Subtract the operand from an integer where every bit is 1.

	uint16 a      = original;
	uint16 result = 0xFFFF - a; // 0xFFFF has 16 bits

Using arithmetic negation

Negate the operand and subtract one. (Depends on two's complement arithmetic.)

	uint_type a      = original;
	uint_type result = -a;
	result = result - 1;

AND and OR

Logical bit-wise AND

Bit-wise AND, i.e. logical conjunction, can be synthesized in various ways.

Using NOT and OR

a & b = ~(~a | ~b)

Requirements:

  • Logical negation (NOT operation / inversion)
  • Logical disjunction (OR operation)

	uint_type a      = original_1;
	uint_type b      = original_2;
	a = ~a; // Logical NOT
	b = ~b; // Logical NOT
	uint_type result = a | b; // Logical OR
	result = ~result; // Invert result

Doing it bit-by-bit

If you have an operation that tests whether a particular bit is set, you can implement AND using a loop.

Requirements:

  • Logical bit-test
  • Logical bit-set

	uint16 a = original_1;
	uint16 b = original_2;
	uint16 result = 0;
	for(n = 0; n < 16; ++n)
	{
	    // Test whether the bit is set in both operands
	    if(a & (1 << n)) // Test whether bit n is set in a
	    if(b & (1 << n)) // Test whether bit n is set in a
	        result |= (1 << n); // Set bit N in target value
	}
Alternative:

  • Logical bit-test
  • Binary bit-shift to left by 1
  • Addition

	uint16 a = original_1;
	uint16 b = original_2;
	uint16 result = 0;
	for(uint16 mask = 1; mask != 0; mask <<= 1)
	{
	    // Test whether the bit is set in both operands
	    if(a & mask) // Test A and mask have common bits
	    if(b & mask) // Test B and mask have common bits
	        result = result + mask;
	}

Bit-by-bit using arithmetic operations only

  • Unsigned integer division (by 2), round towards zero
  • Unsigned integer modulo (by 2) (division remainder)
  • Integer addition
  • Integer equality comparison against zero

	uint16 a = original_1;
	uint16 b = original_2;
	uint16 result = 0;
	uint16 mask   = 1;
	for(int bit=0; bit<16; ++bit)
	{
	    if((a % 2) != 0 && (b % 2) != 0) result = result + mask;
	    a = a / 2; // Integer division, round towards zero
	    b = b / 2; // Integer division, round towards zero
	    mask = mask + mask; // Multiply mask by 2
	}

Logical bit-wise OR

Bit-wise OR, i.e. logical conjunction, can be synthesized in various ways.

Using NOT and AND

a | b = ~(~a & ~b)

Requirements:

  • Logical negation (NOT operation / inversion)
  • Logical conjunction (AND operation)

	uint_type a      = original_1;
	uint_type b      = original_2;
	a = ~a; // Logical NOT
	b = ~b; // Logical NOT
	uint_type result = a & b; // Logical AND
	result = ~result; // Invert result

Doing it bit-by-bit

If you have an operation that tests whether a particular bit is set, you can implement OR using a loop.

Requirements:

  • Logical bit-test
  • Logical bit-set

	uint16 a = original_1;
	uint16 b = original_2;
	uint16 result = 0;
	for(n = 0; n < 16; ++n)
	{
	    // Test whether the bit is set in either operand
	    if(a & (1 << n)) // Test whether bit n is set in a
	        result |= (1 << n); // Set bit N in target value
	    if(b & (1 << n)) // Test whether bit n is set in a
	        result |= (1 << n); // Set bit N in target value
	}
Alternative:

  • Logical bit-test
  • Binary bit-shift to left by 1
  • Subtraction

	uint16 a = original_1;
	uint16 b = original_2;
	uint16 result = 0xFFFF;
	for(uint16 mask = 1; mask != 0; mask <<= 1)
	{
	    // Deduct bit, if it's CLEAR in both operands
	    if(!(a & mask)) // Test A and mask don't have common bits
	    if(!(b & mask)) // Test B and mask don't have common bits
	        result = result - mask;
	}

Bit-by-bit using arithmetic operations only

  • Unsigned integer division (by 2), round towards zero
  • Unsigned integer modulo (by 2) (division remainder)
  • Integer addition
  • Integer equality comparison against zero

	uint16 a = original_1;
	uint16 b = original_2;
	uint16 result = 0;
	uint16 mask   = 1;
	for(int bit=0; bit<16; ++bit)
	{
	    if((a % 2) != 0 || (b % 2) != 0) result = result + mask;
	    a = a / 2; // Integer division, round towards zero
	    b = b / 2; // Integer division, round towards zero
	    mask = mask + mask; // Multiply mask by 2
	}

XOR

Logical bit-wise XOR

Bit-wise XOR, i.e. logical bit-wise exclusive-OR, can be synthesized in various ways.

Using NOT, OR and AND

a ^ b = (a & ~b) | (b & ~a)
Two unary operations, three binary operations

	uint_type a = original_1;
	uint_type b = original_2;
	uint_type nota = ~a; // Logical NOT
	uint_type notb = ~b; // Logical NOT
	uint_type result = (a & notb) | (b & nota);
a ^ b = (a | b) &~ (a & b)
One unary operation, three binary operations

	uint_type a = original_1;
	uint_type b = original_2;
	uint_type result = (a | b) & ~(a & b);

Doing it bit-by-bit

You can test bit by bit whether the bit is different in both operands.

Requirements:

  • Logical bit-test
  • Logical bit-set

	uint16 a = original_1;
	uint16 b = original_2;
	uint16 result = 0;
	for(n = 0; n < 16; ++n)
	{
	    // Test whether the bit is set in a but not b
	    if(a & (1 << n))    // Test whether bit n is set in a
	    if(!(b & (1 << n))) // Test whether bit n is not set in b
	        result |= (1 << n); // Set bit N in target value

	    // Test whether the bit is set in b but not a
	    if(b & (1 << n))    // Test whether bit n is set in b
	    if(!(a & (1 << n))) // Test whether bit n is not set in a
	        result |= (1 << n); // Set bit N in target value
	}
Alternative:

  • Logical bit-test
  • Binary bit-shift to left by 1
  • Subtraction

	uint16 a = original_1;
	uint16 b = original_2;
	uint16 result = 0xFFFF;
	for(uint16 mask = 1; mask != 0; mask <<= 1)
	{
	    // Deduct bit, if it's CLEAR in both operands
	    if(!(a & mask)) // Test A and mask don't have common bits
	    if(!(b & mask)) // Test B and mask don't have common bits
	        result = result - mask;

	    // Deduct bit, if it's SET in both operands
	    if(a & mask) // Test A and mask have common bits
	    if(b & mask) // Test B and mask have common bits
	        result = result - mask;
	}

Bit-by-bit using arithmetic operations only

  • Unsigned integer division (by 2), round towards zero
  • Unsigned integer modulo (by 2) (division remainder)
  • Integer addition
  • Integer equality comparison

	uint16 a = original_1;
	uint16 b = original_2;
	uint16 result = 0;
	uint16 mask   = 1;
	for(int bit=0; bit<16; ++bit)
	{
	    if(((a % 2) + (b % 2)) == 1) result = result + mask;
	    a = a / 2; // Integer division, round towards zero
	    b = b / 2; // Integer division, round towards zero
	    mask = mask + mask; // Multiply mask by 2
	}

SHL/LSL/ASL/SL: Logical/binary/arithmetic bit-shift to left by 1

Logical bit-shifting to the left can be accomplished by multiplying the value by 2, i.e. adding the value to itself.

	uint_type a      = original;
	uint_type result = a + a;
	// This result may also produce a carry. It's often needed.

Larger-size bit-shifting to left

	uint8 a[] = original[];
	
	bool carry = false;    // Clear carry
	for(n = 0; n < 4; ++n) // example if you're doing 32-bit shift using 8-bit integers
	{
	    bool new_carry = (a[n] & 0x80) ? 1 : 0; // Check the highest-order bit
	    result[n] = (a[n] << 1) | carry;
	    carry     = new_carry; /* save carry from this calculation*/
	}

SHR/LSR/SR: Logical/binary bit-shift to right by 1

Logical bit-shifting to the right can be accomplished by dividing the value by 2, assuming division always rounds towards zero.

	uint_type a      = original;
	uint_type result = a / 2;
Of course, often that operation is not available, so...

Without divisions

Logical bit-shifting to the right can also be accomplished through logical bit-shifting to the LEFT by inverse the amount. This requires the use of a register that is double the original's size.

Requirements:

  • Logical bit shift LEFT
  • Addressing individual byte components

	uint16 a        = original;
	uint_type amount  = shift_amount;

	amount = 16-amount; // Invert the shift amount

	uint32 temp = original << amount; // Shift LEFT

	uint16 tempc[] = temp;       // Treat temp as two 16-bit components
	uint16 result  = tempc[1];   // Take high word

Larger-size bit-shifting to right

	uint8 a[] = original[];
	
	bool carry = false;    // Clear carry
	for(n = 0; n < 4; ++n) // example if you're doing 32-bit shift using 8-bit integers
	{
	    bool new_carry = a[n] & 1; // Check the lowest-order bit
	    result[n] = (a[n] >> 1) | (carry ? 0x80 : 0x00);
	    carry     = new_carry; /* save carry from this calculation*/
	}

When you have arithmetic bit-shift to the right

If you have an arithmetic bit-shifting operator but not a logical one, you can synthesize the logical one by clearing the top-order bits.

Requirements:

  • Arithmetic bit-shift to right
  • Logical AND operation

	uint16 a      = original;
	uint16 result = a >> 1;
	result = result & 0x7FFF;  // Keep all bits except the topmost one.
Generalized to shifting by any amount:

Requirements:

  • Logical bit-shift to right
  • Logical bit-shift to left
  • Logical AND operation
  • Subtract by 1

	uint16 a       = original;
	uint_type amount = shift_amount;
	uint16 result  = a >> amount;
	uint16 mask    = (1 << (16-amount)) - 1; // Change 16 to the bit-width of result
	result = result & mask;

ASR/SRA: Arithmetic bit-shift to the right

Whereas logical bit-shiting to the right clears the most significant bits, i.e. it shifts in zeroes, arithmetic bit-shifting to the right will make copies of the most significant bit.

Examples:

When shifting by 1 bit

When shifting uses carry

When your platform's right-shift operation shifts in the carry flag instead of a zero-bit, you can use this to your advantage.

Example on 6502

	cmp #0	; Compare value to zero. Sets carry if it's negative.
	ror a   ; Shift to the right.
	        ; Carry becomes the new highest bit,
	        ; lowest bit becomes the new carry.

Using bit-test

Requirements:

  • Binary bit-shift to right by 1
  • Logical bit-test
  • Logical bit-set

	uint16 a      = original;
	uint16 result = a >> 1; // Logical bit-shift to the right
	// Copy the sign bit from the original.
	result = result | (a & 0x8000);
	// ^ This can also be done with a bit-test & bit-set operation
	//   if it is cheaper, or if OR and AND operations are not available.

When shifting by N bits

	uint16 a         = original;
	uint_type amount = shift_amount;

	uint16 result = a >> amount;
	if(a & 0x8000) // If the number was signed
	{
	    // This shift depends on leftshiting discarding
	    // the excess bits:
	    uint16 signbits = 0xFFFF << (16-amount);

	    // Alternative way:
	    //uint16 signbits = ~((1 << (16-amount)) - 1);

	    // Patch in the sign bits:
	    result = result | signbits;
	}

MUL: Unsigned multiplication

Multiplication without multiplication operations

Unsigned multiplication can be performed with additions and logical bit-shifts.

Requirements:

  • Binary bit-shift to left by 1
  • Binary bit-shift to right by 1 (Note: This must be LOGICAL, not arithmetic shift.)
  • Logical bit-test
  • Addition

	uint_type a      = original_1;
	uint_type b      = original_2;
	uint_type result = 0;
	while(a != 0)
	{
	    if(a & 1) // If the lowest order bit is set in A?
	    {
	        result = result + b;
	    }
	    a = a >> 1; // Note: This must set highest order bit ZERO. It must not copy the sign bit.
	    b = b << 1; 
	}
If you need to do multiplication with larger integer sizes, just implement this algorithm using additions and shifts for larger integer sizes.

Multiplication of larger integers

You can construct larger-size multiplications by hand-crafting the multiplication using bitshifts and additions for larger integer sizes. Alternatively, you can use actual multiplications in the same way as you would do it on paper:

      15              wx        EA=x*z   GF=w*z
     *74             *yz        JH=x*y   LK=w*y
    ----            ----
   =  20           =  EA        B = E+F+H
   +  4            + GF         C = G+J   + carry from B
   + 35            + JH         D = L     + carry from C
   + 7             +LK
    ----            ----
    1110            DCBA
Example in x86_64 assembler, producing 128-bit * 128-bit = 256-bit multiplication with 64-bit registers:

	mov rsi, 0  ; use a zero as a constant sometimes.
	; DCBA now contains ????.
	; store 00EA:
	mov rax, x  ; low 64 bits of operand 1
	mul      z  ; low 64 bits of operand 2
	mov A, rax  ; store low 64 bits of result (A)
	mov rbx, rdx  ; store high 64 bits of result (E)
	mov rcx, rsi  ; clear C and D
	mov rdi, rsi
	; add 0GF0:
	mov rax, x  ; low 64 bits of operand 1
	mul      y  ; high 64 bits of operand 2
	add rbx, rax  ; store low 64 bits of result (F)
	adc rcx, rdx  ; store high 64 bits of result (G)
	;adc rdi, rsi ; not needed here, EA+GF0 never overflows to fourth digit
	; add 0JH0:
	mov rax, w  ; high 64 bits of operand 1
	mul      z  ; low 64 bits of operand 2
	add rbx, rax  ; store low 64 bits of result (H)
	adc rcx, rdx  ; store high 64 bits of result (J)
	adc rdi, rsi
	; add LK00:
	mov rax, w  ; high 64 bits of operand 1
	mul      y  ; high 64 bits of operand 2
	add rcx, rax  ; store low 64 bits of result (K)
	adc rdi, rdx  ; store high 64 bits of result (L)
	; store to actual DCB
	mov B, rbx
	mov C, rcx
	mov D, rdi

MLI/IMUL/MULI: Signed multiplication

Signed multiplication can be accomplished by doing an unsigned multiplication and taking manually care of the negative-signs.

Requirements:

	int_type    a = original_1;
	int_type    b = original_2;
	bool sign = false;
	if(a < 0)
	{
	    sign = !sign; // Toggle the signed flag
	    a = -a; // Negate a
	}
	if(b < 0)
	{
	    sign = !sign; // Toggle the signed flag
	    b = -b; // Negate b
	}
	// a and b are both positive now
	int_type result = a * b; // Do unsigned multiplication
	if(sign)
	{
	    // Negate the result
	    result = -result;
	}

DIV and MOD: Division and modulo

Unsigned integer division and modulo can be performed with additions, subtractions and logical bit-shifts.

Requirements:

	uint_type remain  = dividend; // The left-hand side of division, i.e. what is being divided
	uint_type part1   = divisor;  // The right-hand side of division
	uint_type result  = 0;
	uint_type mask    = 1;
	
	if(part1 == 0)
	{
	    // TODO: Do whatever should happen when dividing by zero.
	    //       Don't run the rest of this algorithm if part1 = 0.
	    return;
	}

	while(part1 < remain)
	// Alternative: while(!(part1 & 0x8000)) // For 16-bit, test highest order bit.
	// Alternative: while(not_signed(part1)) // Same as above: As long as sign bit is not set in part1.
	{
	    part1 = part1 << 1; // Logical bit-shift left.
	    mask  = mask  << 1; // Logical bit-shift left.
	}
	do {
	    if(remain >= part1)
	    {
	        remain = remain - part1;
	        result = result + mask;
	    }
	    part1 = part1 >> 1;  // Bit-wise logical shift right
	    mask  = mask  >> 1;
	} while(mask != 0);
	
	// Now: result = division result (quotient)
	//      remain = division remainder (modulo)
If you need to do division or modulo with larger integer sizes, just implement this algorithm using additions and shifts for larger integer sizes.

DVI/IDIV/DIVI: Signed division

Signed division can be accomplished by doing an unsigned division and taking manually care of the negative-signs.

Requirements:

  • Unsigned division
  • Testing if value is negative
  • Arithmetic negation
  • Logical inversion

	int_type    a = original_1;
	int_type    b = original_2;
	bool sign = false;
	if(a < 0)
	{
	    sign = !sign; // Toggle the signed flag
	    a = -a; // Negate a
	}
	if(b < 0)
	{
	    sign = !sign; // Toggle the signed flag
	    b = -b; // Negate b
	}
	// a and b are both positive now
	int_type result = a / b; // Do unsigned division
	if(sign)
	{
	    // Negate the result
	    result = -result;
	}

MDI/IMOD/MODI: Signed modulo (remainder)

Signed modulo can be accomplished by calculating an unsigned modulo and taking manually care of the negative-signs.

Requirements:

  • Unsigned modulo
  • Testing if value is negative
  • Arithmetic negation
  • Logical inversion

	int_type    a = original_1;
	int_type    b = original_2;
	bool sign = false;
	if(a < 0)
	{
	    sign = !sign; // Toggle the signed flag
	    a = -a; // Negate a
	}
	if(b < 0)
	{
	    // Exception: Do NOT toggle the signed flag.
	    b = -b; // Negate b
	}
	// a and b are both positive now
	int_type result = a % b; // Do unsigned modulo
	if(sign)
	{
	    // Negate the result
	    result = -result;
	}

Comparisons

Equality comparison

Equality can be tested in various ways:

Non-equality and inversion:

   result = !(a != b);
Less-than operation:
   result = !(a < b) && !(b < a);
Subtraction and compare to zero:
   result = a-b; // Zero if equal
XOR and compare to zero:
   result = a^b; // Zero if equal

Non-equality comparison

Non-equality can be tested in various ways:

Equality and inversion:

   result = !(a == b);
Less-than operation:
   result = a < b || b < a;
Subtraction and compare to zero:
   result = a-b; // Non-zero if equal
XOR and compare to zero:
   result = a^b; // Non-zero if equal

Greater-than comparison

Less-or-equal-than and inversion:
   result = !(a <= b);
Less-than operation:
   result = b < a;

Greater-or-equal-than comparison

Less-than operation:
   result = !(a < b);

Less-than comparison

Greater-or-equal-than and inversion:
   result = !(a >= b);

Less-or-equal-than comparison

Greater-than and inversion:
   result = !(a > b);
Less-than operation:
   result = !(b < a);

Appendix: Floating-point mathematics cookbook

Often you also need floating point mathematics, and if the hardware does not have an integer multiplication unit, it most certainly does not have floating point hardware either.

And if it does, it probably cannot calculate trigonometry at hardware level.

The list of standard trigonometric functions provided by a floating point math library may look intimidating, but a careful study reveals fascinating equivalencies, that enable reducing nearly everything into a set of just few functions: sqrt, log, exp and sin.

Reducing complicated functions into simpler components using complex arithmetic

When you operate using complex-number mathematics, everything can be reduced into sqrt, log and exp.

Operation Reduction Other equivalent functions
asinh(x) log(x + sqrt(x² + 1)) = −asinh(−x)
= −1 i* asin(−1i*x)
= 1 i*asin(1i*x)
acosh(x) log(x + sqrt(x² − 1)) = 1 i*acos(x) when tan(arg(x)) > 0
= −1 i*acos(x) when tan(arg(x)) < 0
asin(x) 1i *log(1i*(x + sqrt(x²−1) = −asin(−x)
= −1 i*asinh(−1i*x)
= 1 i*asinh(1i*x)
acos(x) −1 i *log(x + sqrt(x²−1)) or −1i*log(x − sqrt(²−1)) = 1 i*acosh(x) when tan(arg(x)) < 0
= −1 i*acosh(x) when tan(arg(x)) > 0
atanh(x) log( (1+x) / (1−x)) / 2 = i * atan(−1i * x)
= sometimes also −1i*atan(1i*x)
atan(x) −0.5 i*log((1+1i*x) / (1−1i*x)) = −1 i*atanh(1i*x)
= sometimes also 1 i*atanh(−1i*x)
sinh(x) 0.5 * (exp(x) − exp(−x)) = −sinh(−x)
= 1i*sin(−1i*x)
= −1i*sin(1i*x)
cosh(x) 0.5 *(exp(x) + exp(−x)) = cosh(−x)
= cos(1i*x)
= cos(−1i*x)
sin(x) 0.5 i*(exp(x*−1i) − exp(x*1i)) = −sin(−x)
= 1i*sinh(−1i *x)
= −1i*sinh(1i*x)
cos(x) 0.5 *(exp(x*−1i) − exp(x*1i)) = cos(−x)
= cosh(1i*x)
= cosh(−1i*x)
tanh(x) sinh(x) / cosh(x)
= −1 * (1−exp( 2*x)) / (1+exp( 2*x))
= −tanh(−x)
= tan(1i*x)
= −tan(−1i*x)
tan(x) sin(x) / cos(x)
= 1i * (1−exp(2i*x)) / (1+exp(2i*x))
= −tan(−x)
= tanh(1i*x)
= −tanh(−1i*x)
sqrt(x) exp(0.5 * log(x)) = pow(x,0.5)
exp(x) = pow(e, x) (where e is euler's constant)
pow(x,y) exp(y * log(x)) Note: If x is negative, log(x) will produce complex numbers.
log10(x) log(x) / log(10)
x/y = pow(x,-1)
= x * exp(-log(y))

Complex number functions and arithmetic

Implementing complex functions through real functions:

Operation Equivalent function using real functions only What happens in special cases
sin(x) sin(Xr)*cosh(Xi) + i*cos(Xr)*sinh(Xi) sin(Xr) if Xi=0 (duh)
i*sinh(Xi) if Xr=0
cos(x) cos(Xr)*cosh(Xi) − i*sin(Xr)*sinh(Xi) cos(Xr) if Xi=0
cos(Xi) if Xr=0
sinh(x) sinh(Xr)*cos(Xi) + i*cosh(Xr)*sin(Xi) sinh(Xr) if Xi=0
i*sin(Xi) if Xr=0
cosh(x) cosh(Xr)*cos(Xi) + i*sinh(Xr)*sin(Xi) cosh(Xr) if Xi=0
cos(Xi) if Xr=0
tan(x) (sin(2*Xr) + i*sinh(2*Xi)) / (cos(2*Xr) + cosh(2*Xi)) tan(Xr) if Xi=0
i*tanh(Xi) if Xr=0
tanh(x) (sinh(2*Xr) + i*sin(2*Xi)) / (cos(2*Xr) + cosh(2*Xi)) tanh(Xr) if Xi=0
i*tan(Xi) if Xr=0
exp(x) exp(Xr)*cos(Xi) + i*exp(Xr)*sin(Xi) exp(Xr) if Xi=0
i*sin(Xi) if Xr=0
log(x) log(abs(X)) + i * arg(X) log(Xr) if Xi=0 and Xr>=0
log(-Xr) − πi if Xi=0 and Xr<0 (doesn't reduce into real)
-iπ/2 if Xr=0 and Xi<0
iπ/2 if Xr=0 and Xi>0.
abs(x) sqrt(Xr² + Xi²) Xr if Xi=0
Xi if Xr=0
arg(x) atan2(Xi,Xr) 0 if Xi=0 and Xr>0
π if Xi=0 and Xr<0
-iπ/2 if Xr=0 and Xi<0
iπ/2 if Xr=0 and Xi>0.
atan(x) 0.25i*log((Xr²+Xi²+2*Xi+1) / (Xr²+Xi² −2*Xi+1)) + 0.5 *(atan2(Xr,Xi+1)+atan2(Xr,1−Xi)) atan(Xr) if Xi=0
atanh(x) 0.25 *log((Xr²+Xi²+2*Xr+1) / (Xr²+Xi² −2*Xr+1)) + 0.5i *(atan2(Xi,Xr+1)+atan2(Xi,1−Xr)) atanh(Xr) if Xi=0
x+y Xr + i*Xi + Yr + i*Yi
x−y Xr + i*Xi − Yr − i*Yi
x*y (Xr*Yr − Xi*Yi) + i*(Xi*Yr + Xr*Yi) i*i = −1
x/y (Xr*Yr + Xi*Yi) / (Xr² + Yi²) + i*((Xi*Yr − Xr*Yi) / (Yr² + Yi²))
sqrt(x) sqrt((abs(x) + Xr) / 2) + i*sgn(Xi)*sqrt((abs(x) − Xr) / 2) where sgn(n)
= -1, 0 or 1 depending on n's sign.

It is especially interesting to see that complex-number sqrt, log and exp can all be implemented in terms of real-number sqrt, log, exp, sin/cos and atan. This defines the minimum set of floating point functions that must be implemented.

Real-number functions using other functions

Operation Alternatives Operation Alternatives
sin(x) −sin(−x)
cos(π/2 − x)
tan(x) * cos(x)
sinh(x) −sinh(−x)
exp(x) − cosh(x)
cosh(x) − exp(−x)
2*sinh(x/2)*cosh(x/2)
1/2 * (exp(x) − exp(−x))
(exp(2*x)−1) / (2*exp(x))
tanh(x) * cosh(x)
cos(x) cos(−x)
sin(π/2 − x)
sin(x) / tan(x)
cosh(x) cosh(−x)
exp(x) − sinh(x)
exp(−x) + sinh(x)
1/2 * (exp(x) + exp(−x))
(exp(2*x)+1) / (2*exp(x))
sinh(x) / tanh(x)
tan(x) −tan(−x)
sign(x) * sqrt(1 / cos(x)²-1)
sin(x) / cos(x)
tanh(x) −tanh(−x)
(exp(2*x)−1) / (exp(2*x)+1)
sinh(x) / cosh(x)
asin(x) −asin(−x)
π/2 − acos(x)
atan(x / sqrt(1 − x²))
asinh(x) log(x + sqrt(x² + 1))
1/2 * acosh(2*x² + 1)
acos(x) π − acos(−x)
π/2 − asin(x)
π/2 − atan(x / sqrt(1 − x²))
asin(sqrt(1−x²)) if 0<=x<=1
acosh(x) log(x + sqrt(x² − 1))
1/2 * acosh(2*x² − 1)
atan(x) −atan(−x)
asin(x² / sqrt(x²+1))
2 * atan((sqrt(1+x²) − 1) / x)
2 * atan(x / (1 + sqrt(1 + x²)))
atan(y) + atan((x−y)/(1+x*y)) when sign(y)=sign(x)
atanh(x) 1/2 * log(x+1) − 1/2 * log(1−x)
log( (1+x) / (1−x)) / 2
exp(x) pow(e, x)
cosh(x) + sinh(x)
cosh(−x) − sinh(−x)
pow(1 + x/y, y) when y approaches infinity
exp(x−y) * exp(y) for any y
log(x) log10(x) / log10(e)
log2(x) / log2(e)
asinh((x²+1) / (2*x))
acosh((x²−1) / (2*x))
2*atanh((x−1) / (x+1))
log(x/y) + log(y) for any y>0
log2(x) log10(x) / log10(2)
log(x) / log(2)
log10(x) log2(x) / log2(10)
log(x) / log(10)
pow(x,y) 1 / pow(x,−y)
exp(y*log(x))
if y is integer, a special algorithm may be applicable
always pay attention to special cases
sqrt(x) pow(x,0.5)
exp(0.5 * log(x))

Floating point addition

TODO

Floating point subtraction

TODO

Floating point multiplication

	int_type a_exponent = original_1_exponent;
	int_type a_mantissa = original_1_mantissa;
	bool     a_sign     = original_1_sign;
	int_type b_exponent = original_2_exponent;
	int_type b_mantissa = original_2_mantissa;
	bool     b_sign     = original_2_sign;

	int_type result_exponent = original_1_exponent + original_2_exponent;
	int_type result_mantissa = original_1_mantissa * original_2_mantissa;
	bool     result_sign     = a_sign ^ b_sign; // XOR
	
	// After this, comes renormalization, which is the adjusting of
	// the exponent & mantissa until the mantissa is in proper range.

Floating point division

	int_type a_exponent = original_1_exponent;
	int_type a_mantissa = original_1_mantissa;
	bool     a_sign     = original_1_sign;
	int_type b_exponent = original_2_exponent;
	int_type b_mantissa = original_2_mantissa;
	bool     b_sign     = original_2_sign;

	int_type result_exponent = original_1_exponent - original_2_exponent;
	int_type result_mantissa = original_1_mantissa / original_2_mantissa;
	bool     result_sign     = a_sign ^ b_sign; // XOR
	
	// After this, comes renormalization, which is the adjusting of
	// the exponent & mantissa until the mantissa is in proper range.

Floating point sqrt(x) : square root

Floating point log(x) : natural logarithm

Floating point exp(x) : exponential function

Floating point sin(x) and cos(x)

You only need to implement one of these, and derive the other.

http://upload.wikimedia.org/math/5/4/6/546ecab719ce73dfb34a7496c942972b.png

Reading:

Floating point atan(x)

Other floating point functions

Even though the other trigonometric functions can be implemented in terms of the basic functions, it may still be beneficial to implement them separately using Taylor series for better accuracy.

http://upload.wikimedia.org/math/1/1/a/11a93a7c7a5d23f74cdbab5a1e513123.png

http://upload.wikimedia.org/math/4/7/3/47383305b98c57bef4dad00d1490dd84.png

http://upload.wikimedia.org/math/4/3/0/43047f57ab3e2183d7875764566580f4.png

http://upload.wikimedia.org/math/8/4/a/84aa1c9392c1d950fc6ab5a51043fc02.png

http://upload.wikimedia.org/math/a/1/6/a1635f7fd0e9e031ad7f92b27271226a.png


Last edited: 2014-06-03 22:39:06