I have a RISC-V multi-cycle core, PicoRV32. Each instruction passes through three main stages: fetch, load registers, and execute (other operations are also performed along the way). I want to pipeline it so that the stages of one instruction overlap with the stages of the next. Can anyone guide me on how to do it?
I have successfully run the core with GCC and Verilator and viewed its waveforms in GTKWave. (This image is not mine; I am just using it as a reference.) At first I was not sure how or where to start pipelining it, even with just one pipeline register, because the whole core is written as a single module with everything mixed together. I was lost, and whatever I tried kept failing. Any ideas?
In a multicycle design like you're describing, much of the hardware is there for 3 pipeline stages (fetch, decode, execute), though the controller selectively de-activates units as an instruction moves from stage to stage.
A hardware design is a union of all the necessary sub-designs to accomplish any and all instructions in the instruction set. Hardware is difficult to turn off (modulo advanced low power designs), so the common approach when some aspect of the hardware design doesn't apply is to simply ignore its results.
For example, an ALU can add, subtract, compare, AND, OR, etc. But we only need it to do one of those at a time, so what happens to the other circuits? If we want an add, what about the AND and OR circuitry? The answer is that they are not turned off: they do their respective operations, and their results are simply ignored afterwards. So, inside the ALU, all those operations are performed in parallel, even though only one result is accepted as the ALU output (the others are discarded, i.e. simply lost).
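To make the "compute everything, keep one result" idea concrete, here is a minimal behavioral sketch in Python rather than HDL. The function name `alu`, the operation names, and the 32-bit width are assumptions chosen for illustration; they are not taken from PicoRV32.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit datapath


def alu(op, a, b):
    """Behavioral model of a combinational ALU (hypothetical, for illustration)."""
    # Every candidate result is computed unconditionally, the way parallel
    # sub-circuits would all be active in hardware.
    results = {
        "add": (a + b) & MASK,
        "sub": (a - b) & MASK,
        "and": a & b,
        "or": a | b,
        "xor": a ^ b,
        "sltu": int(a < b),  # unsigned compare
    }
    # The output multiplexer: 'op' selects one result, the rest are discarded.
    return results[op]
```

Calling `alu("add", 1, 2)` still evaluates the subtract, AND, OR, XOR and compare entries; only the selected one leaves the function, which mirrors the mux at the ALU output.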
So, deactivation is generally a matter of ignoring results and not changing state, even as the internal circuitry actually does things. For the fetch unit, deactivation might mean not issuing an instruction fetch to the instruction cache, and also not accepting PC+4 as the next PC value. For the decode unit, deactivation could mean ignoring register and control values; most likely these are ignored in the register values that are fed to execute. For the execution unit, deactivation means not reading or writing the data cache, and also not doing the register write.
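Deactivation by "not changing state" usually comes down to an enable signal on a register. The following is a rough Python sketch of that idea for the fetch unit; `EnabledRegister`, `fetch_step`, and `fetch_enable` are hypothetical names, not signals from the actual core.

```python
class EnabledRegister:
    """A clocked register that only captures its input when enabled."""

    def __init__(self, value=0):
        self.value = value

    def clock(self, next_value, enable):
        # On a clock edge the input is always present, but it is only
        # stored when the controller asserts 'enable'.
        if enable:
            self.value = next_value
        return self.value


def fetch_step(pc, fetch_enable):
    """One clock cycle of a (hypothetical) fetch unit."""
    # The PC+4 adder computes on every cycle, active or not.
    pc_plus_4 = (pc.value + 4) & 0xFFFFFFFF
    # Deactivated fetch: the adder result is simply not accepted.
    pc.clock(pc_plus_4, fetch_enable)
    return pc_plus_4
```

With `pc = EnabledRegister(0)`, a call to `fetch_step(pc, False)` leaves `pc.value` at 0 even though PC+4 was computed, while `fetch_step(pc, True)` advances it to 4.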
(If your multicycle design has a shared instruction & data cache, that might make it hard to overlap instruction fetch with data memory read/write, so you might overlap fetch with decode instead, or maybe decode with execute.)
So, the idea is that the multicycle controller activates one unit after another while deactivating the others. The hardware for each stage is still there and could be used in every cycle, as long as the same circuitry isn't shared with another stage. In particular, look for internal registers, or a single adder, being used in two different stages; the solution is to duplicate these so that each stage has its own copy.
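One way to find what needs duplicating is to list which resources each stage uses and check which ones collide once the stages overlap. Below is a small Python sketch of that bookkeeping. The resource assignment in `USES` is an assumption made up for the example (a shared memory port and a shared adder); you would fill it in from reading the real core.

```python
STAGES = ["fetch", "decode", "execute"]

# Hypothetical resource usage per stage; replace with what the core really shares.
USES = {
    "fetch": {"memory", "adder"},    # instruction read, PC+4
    "decode": {"regfile"},           # register read
    "execute": {"memory", "adder"},  # load/store, ALU add
}


def schedule(n_instr, pipelined):
    """Return {cycle: [(instr_index, stage), ...]} for n_instr instructions."""
    timeline = {}
    for i in range(n_instr):
        # Multicycle: an instruction starts after the previous one finishes.
        # Pipelined: a new instruction starts every cycle.
        start = i if pipelined else i * len(STAGES)
        for offset, stage in enumerate(STAGES):
            timeline.setdefault(start + offset, []).append((i, stage))
    return timeline


def conflicts(timeline, uses):
    """Resources needed by more than one active stage in the same cycle."""
    clashing = set()
    for active in timeline.values():
        seen = set()
        for _, stage in active:
            clashing |= seen & uses[stage]
            seen |= uses[stage]
    return clashing
```

Under these assumed usages, `conflicts(schedule(3, False), USES)` is empty because only one stage is active per cycle, while `conflicts(schedule(3, True), USES)` reports the memory and the adder, which are the units that would need a second copy (or a second port) before fetch and execute can run in the same cycle.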