Does a one-cycle instruction execute in one cycle even if the RAM is slow?

I am using an embedded RISC processor. There is one basic thing I am having trouble understanding.

The processor manual clearly states that the instruction ld r1, [p1] (in C: r1 = *p1) takes one cycle. The register r1 is 32 bits wide. However, the memory bus is only 16 bits wide. So how can it fetch all the data in one cycle?



2 answers


The timing assumes zero-wait-state, full-width memory. The time the core takes to execute this instruction is one clock cycle.
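To make the gap between the manual's figure and a real memory system concrete, here is a minimal sketch in C. It is only an illustrative model, not something taken from any processor manual: the function names are hypothetical, and it assumes each bus transfer costs one clock plus a fixed number of wait states.

```c
/* Hypothetical model: number of bus transfers needed to move reg_bits
   of data over a bus that is bus_bits wide (rounded up). */
unsigned bus_transfers(unsigned reg_bits, unsigned bus_bits)
{
    return (reg_bits + bus_bits - 1) / bus_bits;
}

/* Clocks spent on the memory side of one load, ASSUMING every transfer
   takes one clock plus wait_states extra clocks. Real bus controllers
   may behave differently (bursts, buffering, and so on). */
unsigned load_memory_clocks(unsigned reg_bits, unsigned bus_bits,
                            unsigned wait_states)
{
    return bus_transfers(reg_bits, bus_bits) * (1u + wait_states);
}
```

Under these assumptions, load_memory_clocks(32, 32, 0) is 1, which is the case the manual describes, while the 16-bit bus from the question gives load_memory_clocks(32, 16, 0) = 2, and adding one wait state gives 4.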

There was a time when each instruction took a different number of clock cycles. Memory was also relatively fast then, usually zero wait state. That was before pipelines: you spent a clock cycle on the fetch, then a clock cycle on the decode, then a clock cycle on the execute, plus extra clock cycles for variable-length instructions and extra clock cycles if the instruction had a memory operation.



Today clock speeds are high and chip real estate is relatively cheap, so one-clock-cycle adds and multiplies are the norm, as are pipelines and caches. Processor clock speed is no longer the determining factor in performance. Memory is relatively expensive and slow. So caches (configuration, number and size), bus width, memory speed and peripheral speed determine overall system performance. Typically, increasing the processor clock speed without also speeding up the memory or peripherals yields little if any performance gain, and in some cases it can make things slower.
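The point about clock speed can be illustrated with a rough model in C. This is a sketch under simplifying assumptions, not a measurement: the function name and parameters are hypothetical, core work is assumed to scale with the processor clock, and every memory access is assumed to cost a fixed latency that does not change with the clock.

```c
/* Hypothetical model of run time in nanoseconds.
   core_cycles    : cycles spent inside the core (scales with the clock)
   cpu_mhz        : processor clock frequency in MHz
   mem_accesses   : number of accesses that go out to slow memory
   mem_latency_ns : ASSUMED fixed cost per access, independent of the clock */
double run_time_ns(double core_cycles, double cpu_mhz,
                   double mem_accesses, double mem_latency_ns)
{
    double core_ns = core_cycles * 1000.0 / cpu_mhz; /* 1000/MHz = ns per cycle */
    double mem_ns  = mem_accesses * mem_latency_ns;
    return core_ns + mem_ns;
}
```

With 1000 core cycles and 100 memory accesses of 100 ns each, this model gives 20000 ns at 100 MHz and 15000 ns at 200 MHz: doubling the clock buys only about a 1.33x speedup, because the memory term does not shrink.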

Memory width and wait states are not part of the clock counts given in the reference manual; those only tell you what the core itself costs you, in clock cycles, for each instruction. If this is a Harvard architecture, where the instruction bus and the data bus are separate, then a load can take one clock cycle including the memory cycle. The instruction is fetched at least one clock cycle earlier, if not before that, so at the beginning of the clock cycle the instruction is ready and decoded, the execute (the read cycle) happens during that one clock cycle, and at the end of the clock cycle the result of the read is latched into the register. If the instruction bus and the data bus are not separate, then you can argue that the load still finishes in one clock cycle, but the next instruction cannot be fetched during it, so there is a bit of a stall there; they might cheat and still call that one clock cycle.



My understanding: when they say that an instruction takes one cycle, it does not mean that the instruction completes within one cycle. You have to take pipelining into account. Suppose your processor has a 5-stage pipeline: this instruction would take 5 cycles if it were executed sequentially, without overlapping other instructions.
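The difference between latency and throughput described above can be sketched in C. This assumes an idealized pipeline with no stalls or hazards, and the function names are hypothetical, chosen only for illustration.

```c
/* Cycles to run n instructions on an IDEAL pipeline with `stages` stages:
   the first instruction needs `stages` cycles to pass through, and each
   following instruction completes one cycle later. No stalls are modeled. */
unsigned long pipelined_cycles(unsigned long n, unsigned long stages)
{
    return n == 0 ? 0 : stages + n - 1;
}

/* Cycles if every instruction had to finish all stages before the
   next one could start (no overlap at all). */
unsigned long sequential_cycles(unsigned long n, unsigned long stages)
{
    return n * stages;
}
```

For a 5-stage pipeline, a single instruction still takes 5 cycles from start to finish, but 100 instructions take 104 cycles rather than 500, which is roughly one cycle per instruction. That per-instruction rate is what a "one cycle" figure in a manual usually refers to.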









