I am trying to understand this basic notion in DSP architecture and instruction execution:
"Based on Harvard architecture, the CPU can concurrently fetch the data and instruction words...- Instruction fetches can take place during previous instruction execution and not wait for either a finish of instruction execution or have to stop the processor's operation while the next instruction is being fetched."
However, due to my limited knowledge of computer architecture, a question arises: "If the data (operands) to be manipulated are designated by the instruction word, how is this possible? Imagine iterating from the very first cycle: the instruction is loaded from program memory, then the two operands should be loaded in the next cycle. Here is the ambiguity: it is now the execution cycle's turn, so if the next instruction were being loaded at the same time as the data, the previously loaded instruction would be lost, and then what would happen to its execution? Or am I wrong, and is execution done immediately as the data is loaded from memory into a data register?"
**Code example:** `MPYF3 *(AR0)++, *(AR1)++, R0`
*Addendum: I think that since there is no register file, no data is loaded into any register; operands are accessed directly through memory. So in my opinion, after the very first instruction has been fetched, in the next cycle the required data (operands) designated by that instruction are manipulated (instruction execution) through memory by the functional unit, while the next instruction word is fetched and the operand addresses are updated (through the address-register ALU). All of this is possible because each of these operations (data access, arithmetic operation, address update, instruction fetch) is handled by its own distinct physical hardware.
Can anyone confirm this interpretation? An explanation of a typical instruction, iterated over cycles, making use of concurrent data and instruction access in a DSP Harvard architecture would be greatly appreciated.
Thanks in advance
The Harvard architecture is one that has two distinct memories, caches, buses, etc.: one for instructions and the other for data. This is in stark contrast to the Von Neumann architecture, which has a single unified memory for both instructions and data.
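To make the "two distinct memories" point concrete, here is a minimal sketch (the addresses and contents are made up for illustration): the same numeric address names two entirely different locations, one in each memory, and the separate buses are why both can be read in the same cycle.

```python
# Two independent address spaces, as in a Harvard machine.
# Address 0 in instruction memory and address 0 in data memory
# are entirely different physical locations.
instr_mem = {0: "MPYF3", 1: "ADDF3"}   # instruction space
data_mem = {0: 3.14, 1: 2.71}          # data space (same addresses, different memory)

# Because each memory has its own bus, both reads can happen concurrently.
fetched_instr = instr_mem[0]
fetched_datum = data_mem[0]
```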
An aside: A common problem with C/C++ software is the buffer overflow, where you (the bad guy) maliciously write lots and lots of instructions as the "input data" to a program in the hopes that someone didn't check input length and will accidentally allow your "data" (which is actually a program, disguised as data!) to overwrite the "instructions" portion of memory. Then when the program runs into those new instructions: BAM! Your "data" (which is actually a program) now has control over the original program. Harvard architectures don't suffer from this problem by virtue of their two separate memory spaces.
So how does this DSP CPU accomplish multiple things at a time? Is it magic? Not really. It simply means the CPU is pipelined: it can be executing the start of one instruction, the middle of another, and the end of a third, all at the same time. How? By keeping a set of intermediate-result registers which serve as the "output" of one stage of the pipeline and the "input" to the next.
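The stage-register idea can be sketched as a toy three-stage (fetch/decode/execute) pipeline. This is an illustrative model, not any real DSP's pipeline: each cycle, every stage works on a *different* instruction, and at the end of the cycle each stage register hands its contents downstream. Note that nothing is lost, because the fetched instruction word lands in its own register, not on top of the one currently executing.

```python
# Toy 3-stage pipeline: fetch / decode / execute.
# Stage registers let one instruction's fetch overlap another's execution.
program = ["I1", "I2", "I3"]     # instruction memory (separate from data memory)
decode_reg = None                # holds the instruction currently being decoded
execute_reg = None               # holds the instruction currently executing
log = []

for cycle in range(len(program) + 2):
    fetched = program[cycle] if cycle < len(program) else None
    # In one cycle, three different instructions occupy three different stages:
    log.append((cycle, fetched, decode_reg, execute_reg))
    # End of cycle: each stage register passes its contents downstream.
    execute_reg, decode_reg = decode_reg, fetched

for cycle, f, d, e in log:
    print(f"cycle {cycle}: fetch={f} decode={d} execute={e}")
```

By cycle 2 the pipeline is full: `I3` is being fetched while `I2` decodes and `I1` executes, which is exactly the overlap the quoted text describes.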
It's worth noting that pipelining has nothing to do with Harvard/Von Neumann architecture. Both can be pipelined.
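To address the question's own example, `MPYF3 *(AR0)++, *(AR1)++, R0`, here is a cycle-level sketch of that instruction repeated in a loop. While one copy of the instruction executes (two data reads over the data buses, the multiply, and the address-register post-increments), the *next* instruction word is fetched from the separate program memory in the same cycle. This is a hypothetical model to show the overlap, not a cycle-accurate description of any particular TMS320 device.

```python
# Cycle-level sketch of MPYF3 *(AR0)++, *(AR1)++, R0 in a loop.
prog_mem = ["MPYF3"] * 4                 # program memory (instruction words)
data_mem_a = [1.0, 2.0, 3.0, 4.0]        # data memory reached via AR0
data_mem_b = [10.0, 20.0, 30.0, 40.0]    # data memory reached via AR1
AR0 = AR1 = 0                            # address registers
R0 = None                                # result register
ir = None                                # instruction register (last fetched word)
trace = []

for cycle in range(len(prog_mem) + 1):
    # Execute the previously fetched instruction (if any)...
    if ir == "MPYF3":
        a, b = data_mem_a[AR0], data_mem_b[AR1]  # concurrent data reads
        R0 = a * b                               # multiply
        AR0, AR1 = AR0 + 1, AR1 + 1              # post-increment (address ALU)
    # ...while the next instruction word is fetched from program memory.
    ir = prog_mem[cycle] if cycle < len(prog_mem) else None
    trace.append((cycle, ir, R0))
```

The fetch on each line does not clobber the instruction being executed, because the executing instruction has already moved out of the instruction register and into the later pipeline stage, which is the resolution of the "wouldn't the previous instruction be lost?" worry.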