Assembly 8086 - Implementation of any multiplication and division without MUL and DIV commands

Question

Assembly 8086 - Implementation of any multiplication and division without MUL and DIV commands

I would like to know if there is a way to perform any multiplication or division without using MUL or DIV instructions as they require a lot of CPU cycles. Can I use SHL or SHR instructions for this purpose? How can I implement the assembly code?

+3

assembly division multiplication cpu-usage

Anth 13 jan. 15 at 12:48

source to share

4 answers

Johan · Answer 1 · 2016-08-25T18:03:08+0000

Like everything else in an assembly, there are many ways to do propagation and division.

Division by multiplication by the response value .
Use shifts and add / subs instead of multiplying.
Use the address calculation options lea

(only for multiplication).

The busting myth

as they require a lot of cpu cycles

MUL

and are IMUL

incredibly fast on modern processors, see http://www.agner.org/optimize/instruction_tables.pdf
DIV

and IDIV

are and have always been extremely slow.

Example for Intel Skylake (p. 217):

MUL, IMUL r64: Delay 3 cycles, reverse bandwidth 1 cycle.

Note that this is the maximum latency to double 64! bit. The CPU can do one of these multiplications every CPU cycle if it's all done by multiplication.
If you think the above example using shifts and additions to multiply by 7 has a 4 cycle delay (3 using lea). There is no real way to beat conventional multiplication with a modern processor.

Multiplication by response

According to Agner Fog asm lib instruction page 12 :

On most microprocessors, separation is slow. In floating point calculations, we can make multiple divisions with the same divisor faster by multiplying by the inverse, for example:
float a, b, d;  
a /= d; b /= d;   

      

        
        
        
      

    
can be changed to:
float a, b, d, r;   
r = 1.0f / d;   
a *= r; b *= r;   

      

        
        
        
      

    
If we want to do something like this with integers, then we must scale the reciprocal divisor by 2n and then shift n places to the right after multiplying.

Multiplication by inverse works well when you need to divide by a constant or if you are dividing the same variable many times in a row.
You can find some really cool assembly code demonstrating the concept in Agner Fog assembly. ...

Shifts and add / subs
Right shift is a division by two shr

- ( R educe).
Left shift is multiplied by two shl

- ( L arger).
You can add and subtract to fix non-two ways errors.

//Multiply by 7
mov ecx,eax
shl eax,3    //*8
sub eax,ecx  //*7

Separations other than degrees of 2 using this method quickly become difficult.
You may be wondering why I am doing the operations in a strange order, but I am trying to keep the dependency chain as short as possible to maximize the number of commands that can be executed in parallel.

Usinglea

Lea is an instruction for calculating address offsets.
You can calculate the folds 2,3,4,5,8 and 9.
For example:

lea eax,[eax+eax]     //*2     1 cycle
lea eax,[eax*2+eax]   //*3     2 cycles
lea eax,[eax*4]       //*4     2 cycles
lea eax,[eax*4+eax]   //*5     2 cycles 
lea eax,[eax*8]       //*8     2 cycles
lea eax,[eax*8+eax]   //*9     2 cycles

Note, however, that lea

the multiplier is considered a "complex" instruction for many processors and takes 2 CPU cycles, whereas a simple shift takes only one cycle. * 4 and * 8 can be performed faster using shl

.
On the plus side, lea

does not change flags and allows you to freely move to another destination register. Since it lea

can shift left by 0, 1, 2, or 3 bits, these are the only gaps you get.

SomeNYCGuy · Answer 2 · 2015-01-13T12:58:09+0000

Things like SHL / SHR, SAL / SAR, ADD / SUB are faster than MUL and DIV, but MUL and DIV work better for dynamic numbers. For example, if you know you just need to divide by two, then it's a one-bit right shift. However, if you do not know the quantity in advance, you may be tempted to re-enter the values. For example, to determine AX divided by BX, you can simply subtract BX from AX until BX is> AX, keeping track of the score. But if you divided by 200, by 1, that would mean 200 loops and SUB operations.

MUL and DIV will perform better in most cases where the digits involved are not hardcoded and known in advance. The only exceptions I can think of is if you know something like multiple / division by 2, 4, 8, etc., where the Shift operators will work fine.

user719596 · Answer 3 · 2016-11-22T17:40:46+0000

Here's an example:

mov bx, 1000b
shl bx, 5
mov cx, bx
shr cx, 2
add bx, cx
add bx, 1000b

RicoRicochet · Answer 4 · 2015-01-13T12:54:31+0000

The implementation of the multiplication is simpler, if you recall, the shl operation performs the same operation as multiplying the specified operand by two. Left shifting two bit positions multiplies the operand by four. Left shifting three bit positions multiplies the operand by eight. In general, shifting an operand to the left n bits multiplies it by 2n. Any value can be multiplied by some constant using a series of shifts and additions, shifts and subtractions. For example, to multiply the register of an ax by ten, you only need to multiply it by eight and then add twice the original value. That is, 10 * ax = 8 * ax + 2 * ax. The code for this is -

            shl     ax, 1           ;Multiply AX by two
            mov     bx, ax          ;Save 2*AX for later
            shl     ax, 1           ;Multiply AX by four
            shl     ax, 1           ;Multiply AX by eight
            add     ax, bx          ;Add in 2*AX to get 10*AX

The ax register (or just any register for that matter) can be multiplied by most constant values much faster using shl than using the mul command. This may sound difficult to believe, since it only takes two instructions to calculate this product:

            mov     bx, 10
            mul     bx

However, if you look at the timings, the jump and add example above requires fewer clocks on most 80x86 processors than the mul command. Of course, the code is slightly larger (by a few bytes), but usually it's worth it. Of course, on later 80x86 processors, the mul command is quite a bit faster than previous processors, but the swap and add scheme is usually faster on these processors.

You can also use shift subtraction to perform the multiplication operation. Consider the following multiplication by seven:

            mov     bx, ax          ;Save AX*1
            shl     ax, 1           ;AX := AX*2
            shl     ax, 1           ;AX := AX*4
            shl     ax, 1           ;AX := AX*8
            sub     ax, bx          ;AX := AX*7

This follows directly from the fact that ax * 7 = (ax * 8) -ax.

A common mistake made by early learners of assembly language is subtracting or adding one or two, not ax * 1 or ax * 2. Below ax * 7 is not computed:

            shl     ax, 1
            shl     ax, 1
            shl     ax, 1
            sub     ax, 1

It computes (8 * ax) -1, something completely different (assuming ax = 1, of course). Beware of this error when using shifts, additions, and subtractions to perform multiplication operations.

The division is a little more complicated, you have to think ...

Assembly 8086 - Implementation of any multiplication and division without MUL and DIV commands

More articles: