I am not sure if this is the correct place to ask this but here it goes:
I was wondering if it is possible to implement pipeline stages in a von Neumann architecture that uses an accumulator to hold a value, along with a PC, memory buffer register, instruction register, and memory address register? There would also be an output register to hold the output and an input register to hold input.
I was thinking that a 3-stage pipeline (Fetch, Decode, Execute) would be more feasible than a 5-stage pipeline, since a 5-stage pipeline would need extra registers.
Are there any examples of this and is it possible to implement theoretically?
Sure, of course, at least pipelining fetch/decode, and probably data-load would be extra helpful since every instruction will have a data memory address embedded in it.
(Some instructions might allow a memory-indirect addressing mode, i.e. load a pointer from memory and then dereference it, allowing indirect addressing without self-modifying code. An in-order pipeline would probably have to stall while doing the 2nd load. Self-modifying code sucks a lot for pipelining and especially OoO exec, if you want to keep the pipeline coherent. You could keep it simple and only guarantee coherence after a jump or something, discarding fetch/decode results on jumps to make sure you're observing newly-stored instructions, if you want to support self-modifying code the way some toy accumulator ISAs (e.g. Little Man Computer) need it to loop over an array.)
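As a rough illustration of the "only guarantee coherence after a jump" idea, here is a toy Python model. It is not any real ISA or CPU: the opcodes, the tuple encoding, and the names `run`, `without_jump` and `with_jump` are all made up for this sketch. It models a 3-stage in-order pipeline over a unified memory, where a store that overwrites an already-fetched instruction goes unnoticed unless a jump flushes the front end.

```python
def run(memory):
    """Toy 3-stage in-order pipeline (fetch, decode, execute) over one
    unified memory holding both instructions and data (hypothetical ISA)."""
    acc, pc = 0, 0
    fetch_latch = None    # instruction fetched last cycle, waiting for decode
    decode_latch = None   # instruction decoded last cycle, waiting for execute
    while True:
        # All stages work in the same cycle, so fetch reads memory
        # before any store executed in this cycle becomes visible.
        fetched = memory[pc]
        jump_target = None
        if decode_latch is not None:
            op, addr = decode_latch
            if op == "LOAD":
                acc = memory[addr]
            elif op == "STORE":
                memory[addr] = acc
            elif op == "JMP":
                jump_target = addr
            elif op == "HALT":
                return acc
        if jump_target is not None:
            # Discard everything younger than the jump and refetch,
            # which is what picks up newly stored instructions.
            pc = jump_target
            fetch_latch = decode_latch = None
        else:
            decode_latch, fetch_latch = fetch_latch, fetched
            pc += 1


def without_jump():
    return [
        ("LOAD", 6),    # 0: acc = the replacement instruction (as data)
        ("STORE", 2),   # 1: overwrite the next instruction
        ("LOAD", 7),    # 2: already fetched when the store executes
        ("HALT", 0),    # 3
        0, 0,           # 4-5: padding, fetched but never executed
        ("LOAD", 8),    # 6: the replacement instruction
        1,              # 7: value the stale instruction loads
        100,            # 8: value the replacement would load
    ]


def with_jump():
    return [
        ("LOAD", 7),    # 0: acc = the replacement instruction (as data)
        ("STORE", 3),   # 1: overwrite the instruction at address 3
        ("JMP", 3),     # 2: flushes the already-fetched copy of address 3
        ("LOAD", 8),    # 3: replaced by ("LOAD", 9) before it executes
        ("HALT", 0),    # 4
        0, 0,           # 5-6: padding, fetched but never executed
        ("LOAD", 9),    # 7: the replacement instruction
        1,              # 8: value the old instruction would load
        100,            # 9: value the replacement loads
    ]


print(run(without_jump()))  # 1: the stale copy of the instruction ran
print(run(with_jump()))     # 100: the jump forced a refetch
```

A non-pipelined machine would produce 100 in both cases, which is the coherence a real design either has to preserve with extra hardware or explicitly give up on between jumps.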
Register-renaming with out-of-order execution would probably be valuable, given that software has only 1 architectural register to play with. A store buffer with store forwarding would provide the equivalent for memory / cache, given effective memory disambiguation (store-forwarding detection). See this Q&A for more about what store buffers do, and links from there including https://blog.stuffedcow.net/2014/01/x86-memory-disambiguation/.
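To make the store-buffer idea a bit more concrete, here is a minimal Python sketch of store-to-load forwarding. The class name `StoreBuffer` and its methods are hypothetical, and it ignores everything hard about the real thing (partial overlaps, speculation, memory ordering): pending stores sit in a queue, and a load checks that queue youngest-first before falling back to memory.

```python
from collections import deque


class StoreBuffer:
    """Toy store buffer: stores wait here until committed to memory."""

    def __init__(self, memory):
        self.memory = memory      # dict mapping address -> value
        self.pending = deque()    # (address, value) pairs, oldest first

    def store(self, addr, value):
        # Executing a store only queues it; memory is not touched yet.
        self.pending.append((addr, value))

    def load(self, addr):
        # Search youngest-first so the load sees the most recent store.
        for pending_addr, value in reversed(self.pending):
            if pending_addr == addr:
                return value      # forwarded from the store buffer
        return self.memory.get(addr, 0)

    def commit_one(self):
        # Retire the oldest pending store to memory, in program order.
        if self.pending:
            addr, value = self.pending.popleft()
            self.memory[addr] = value


mem = {0x10: 1}
sb = StoreBuffer(mem)
sb.store(0x10, 5)
sb.store(0x10, 7)
print(sb.load(0x10), mem[0x10])  # 7 1: load is forwarded, memory is unchanged
```

With only one architectural register, nearly every temporary goes through memory, so this kind of fast store/reload path would do a lot of the work that extra registers do in a register ISA.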
Note that modern x86 CPUs are able to pipeline instructions like add eax, [mem] (a memory-source add into the accumulator) and mov [mem], eax (a store of the accumulator). (x86 has other registers, but you could in theory use it with just the one.) Modern x86 CPUs decode memory-source ALU instructions to 2 uops, the load and the add, which execute separately in the out-of-order back-end. See Modern Microprocessors A 90-Minute Guide! for a gentle intro that builds up to more complex CPUs, including modern x86 decoding to "RISC-like" micro-ops.
You could build a pipeline pretty much exactly like Intel Sandybridge-family (see David Kanter's deep dive with block diagrams of different parts of the front and back ends in Haswell - https://www.realworldtech.com/haswell-cpu/) or AMD Zen, that runs an accumulator ISA instead of x86.
Or to keep it a bit simpler, like P5 Pentium (dual issue in-order without decoding to RISC-like uops, so it can't pipeline memory-source ALU instructions as well), or 486 (in-order pipelined single issue).
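For the "decode to 2 uops" approach mentioned above, a decoder for an accumulator ISA could look something like this Python sketch. The opcode names, the uop tuples, and the internal temporary `tmp` are assumptions for illustration, not how any real decoder is specified, and real x86 decoding is more involved than this.

```python
def decode(instr):
    """Split one (opcode, address) accumulator-ISA instruction into uops."""
    op, addr = instr
    if op == "ADD":
        # Memory-source ALU op: a load uop into a hidden temporary,
        # then a register-only add uop that can be scheduled separately.
        return [("load", "tmp", addr), ("add", "acc", "tmp")]
    if op == "LOAD":
        return [("load", "acc", addr)]
    if op == "STORE":
        return [("store", addr, "acc")]
    raise ValueError(f"unknown opcode: {op}")


for uop in decode(("ADD", 0x40)):
    print(uop)
# ('load', 'tmp', 64)
# ('add', 'acc', 'tmp')
```

Splitting the load out lets a back-end start it as soon as the address is known, even while an earlier add into the accumulator is still waiting on its own load.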
I doubt there are any commercial examples; AFAIK no pure accumulator ISAs are relevant enough that anyone would want to buy a high-performance implementation of an inherently inefficient ISA, instead of buying a CPU that could run a register ISA faster for the same cost in dollars, power, silicon, and design effort.
(i.e. you could do this, or you could spend similar effort to pipeline a good ISA like ARM, MIPS, or RISC-V. Or even something more CISCish like m68k.)
No reason to assume a toy microarchitecture with a single MAR/MBR/IR, though. Each instruction needs its own IR as it moves through the pipeline, assuming it's an ISA with fixed-width instructions where it even makes sense to have an IR, rather than just control signals based on decode results.
The Q&A "x86 registers: MBR/MDR and instruction registers" explains why real-world x86 CPUs don't have just one of those, and don't have an "IR" at all. It also explains why a MAR/MBR is too simple a model for pipelined cache access, especially on a CPU with virtual memory.
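To illustrate the point about every in-flight instruction needing its own IR, here is a small Python sketch of the 3-stage pipeline from the question with one latch per stage boundary. The names `run_pipelined`, `fetch_ir` and `decode_ir` and the tuple-based ISA are made up for this example, and the program is kept in a separate list from data memory purely for brevity (which sidesteps self-modifying code entirely).

```python
def run_pipelined(program, memory):
    """Fetch/decode/execute pipeline with a separate 'IR' latch per stage.
    Returns the final accumulator and a per-cycle trace of
    (fetching, decoding, executing)."""
    acc, pc = 0, 0
    fetch_ir = None     # latch between fetch and decode
    decode_ir = None    # latch between decode and execute
    trace = []
    halted = False
    while not halted:
        fetching = program[pc] if pc < len(program) else None
        decoding = fetch_ir
        executing = decode_ir
        trace.append((fetching, decoding, executing))
        if executing is not None:
            op, addr = executing
            if op == "LOAD":
                acc = memory[addr]
            elif op == "ADD":
                acc += memory[addr]
            elif op == "STORE":
                memory[addr] = acc
            elif op == "HALT":
                halted = True
        # Clock edge: every instruction moves one stage forward.
        decode_ir = decoding
        fetch_ir = fetching
        if fetching is not None:
            pc += 1
    return acc, trace


program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", 0)]
memory = [2, 3, 0]
acc, trace = run_pipelined(program, memory)
print(acc, memory[2], len(trace))  # 5 5 6
print(trace[2])  # (('STORE', 2), ('ADD', 1), ('LOAD', 0))
```

In the third cycle three different instructions are in flight at once, so a single shared IR could not hold them all. The 4 instructions finish in 6 cycles rather than the 12 it would take running fetch, decode and execute back-to-back for each one.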