Assembly 8086 - Implementation of any multiplication and division without MUL and DIV commands
Like everything else in an assembly, there are many ways to do propagation and division.
- Division by multiplication by the response value .
- Use shifts and add / subs instead of multiplying.
- Use the address calculation options
lea
(only for multiplication).
The busting myth
as they require a lot of cpu cycles
MUL
and are IMUL
incredibly fast on modern processors, see http://www.agner.org/optimize/instruction_tables.pdf DIV
and IDIV
are and have always been extremely slow.
Example for Intel Skylake (p. 217):
MUL, IMUL r64: Delay 3 cycles, reverse bandwidth 1 cycle.
Note that this is the maximum latency to double 64! bit. The CPU can do one of these multiplications every CPU cycle if it's all done by multiplication.
If you think the above example using shifts and additions to multiply by 7 has a 4 cycle delay (3 using lea). There is no real way to beat conventional multiplication with a modern processor.
Multiplication by response
According to Agner Fog asm lib instruction page 12 :
On most microprocessors, separation is slow. In floating point calculations, we can make multiple divisions with the same divisor faster by multiplying by the inverse, for example:
float a, b, d; a /= d; b /= d;
can be changed to:
float a, b, d, r; r = 1.0f / d; a *= r; b *= r;
If we want to do something like this with integers, then we must scale the reciprocal divisor by 2n and then shift n places to the right after multiplying.
Multiplication by inverse works well when you need to divide by a constant or if you are dividing the same variable many times in a row.
You can find some really cool assembly code demonstrating the concept in Agner Fog assembly. ...
Shifts and add / subs
Right shift is a division by two shr
- ( R educe).
Left shift is multiplied by two shl
- ( L arger).
You can add and subtract to fix non-two ways errors.
//Multiply by 7
mov ecx,eax
shl eax,3 //*8
sub eax,ecx //*7
Separations other than degrees of 2 using this method quickly become difficult.
You may be wondering why I am doing the operations in a strange order, but I am trying to keep the dependency chain as short as possible to maximize the number of commands that can be executed in parallel.
Usinglea
Lea is an instruction for calculating address offsets.
You can calculate the folds 2,3,4,5,8 and 9.
For example:
lea eax,[eax+eax] //*2 1 cycle
lea eax,[eax*2+eax] //*3 2 cycles
lea eax,[eax*4] //*4 2 cycles
lea eax,[eax*4+eax] //*5 2 cycles
lea eax,[eax*8] //*8 2 cycles
lea eax,[eax*8+eax] //*9 2 cycles
Note, however, that lea
the multiplier is considered a "complex" instruction for many processors and takes 2 CPU cycles, whereas a simple shift takes only one cycle. * 4 and * 8 can be performed faster using shl
.
On the plus side, lea
does not change flags and allows you to freely move to another destination register. Since it lea
can shift left by 0, 1, 2, or 3 bits, these are the only gaps you get.
source to share
Things like SHL / SHR, SAL / SAR, ADD / SUB are faster than MUL and DIV, but MUL and DIV work better for dynamic numbers. For example, if you know you just need to divide by two, then it's a one-bit right shift. However, if you do not know the quantity in advance, you may be tempted to re-enter the values. For example, to determine AX divided by BX, you can simply subtract BX from AX until BX is> AX, keeping track of the score. But if you divided by 200, by 1, that would mean 200 loops and SUB operations.
MUL and DIV will perform better in most cases where the digits involved are not hardcoded and known in advance. The only exceptions I can think of is if you know something like multiple / division by 2, 4, 8, etc., where the Shift operators will work fine.
source to share
The implementation of the multiplication is simpler, if you recall, the shl operation performs the same operation as multiplying the specified operand by two. Left shifting two bit positions multiplies the operand by four. Left shifting three bit positions multiplies the operand by eight. In general, shifting an operand to the left n bits multiplies it by 2n. Any value can be multiplied by some constant using a series of shifts and additions, shifts and subtractions. For example, to multiply the register of an ax by ten, you only need to multiply it by eight and then add twice the original value. That is, 10 * ax = 8 * ax + 2 * ax. The code for this is -
shl ax, 1 ;Multiply AX by two
mov bx, ax ;Save 2*AX for later
shl ax, 1 ;Multiply AX by four
shl ax, 1 ;Multiply AX by eight
add ax, bx ;Add in 2*AX to get 10*AX
The ax register (or just any register for that matter) can be multiplied by most constant values ββmuch faster using shl than using the mul command. This may sound difficult to believe, since it only takes two instructions to calculate this product:
mov bx, 10
mul bx
However, if you look at the timings, the jump and add example above requires fewer clocks on most 80x86 processors than the mul command. Of course, the code is slightly larger (by a few bytes), but usually it's worth it. Of course, on later 80x86 processors, the mul command is quite a bit faster than previous processors, but the swap and add scheme is usually faster on these processors.
You can also use shift subtraction to perform the multiplication operation. Consider the following multiplication by seven:
mov bx, ax ;Save AX*1
shl ax, 1 ;AX := AX*2
shl ax, 1 ;AX := AX*4
shl ax, 1 ;AX := AX*8
sub ax, bx ;AX := AX*7
This follows directly from the fact that ax * 7 = (ax * 8) -ax.
A common mistake made by early learners of assembly language is subtracting or adding one or two, not ax * 1 or ax * 2. Below ax * 7 is not computed:
shl ax, 1
shl ax, 1
shl ax, 1
sub ax, 1
It computes (8 * ax) -1, something completely different (assuming ax = 1, of course). Beware of this error when using shifts, additions, and subtractions to perform multiplication operations.
The division is a little more complicated, you have to think ...
source to share