10-14-2014, 12:01 AM
I can only speak for myself when I say that I see and understand the drawbacks your idea implies; when I ask about them, I'm really asking how you intend to minimize those effects.
Fwd in our terminology seems different from yours, as does write back, so between what I had tried to point out and what you have pointed out, we have gotten lost in translation.
So by write back I mean writing back to the GPRs, and by Fwd I mean feeding the ALU output back to the ALU input. These aren't particularly advanced meanings, if anything overly simplistic, and you may find them a little awkward at first, but truly no two institutes have the exact same technical terminology.
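To make those two definitions concrete, here is a rough toy model in Python. Everything in it (the `TinyDatapath` class, the `fwd_a` flag, the two-operation ALU) is made up for illustration and is not meant to describe your actual hardware; it only shows the difference between writing a result to a GPR and reusing the ALU output directly as the next ALU input.

```python
class TinyDatapath:
    """Toy model (hypothetical): GPR write back vs. ALU-output forwarding."""

    def __init__(self, num_regs=4):
        self.gprs = [0] * num_regs  # general purpose registers
        self.alu_out = 0            # latch holding the most recent ALU result

    def execute(self, op, a, b, dest=None, fwd_a=False):
        # Fwd: operand A comes straight from the last ALU output
        # instead of being read out of the GPRs.
        lhs = self.alu_out if fwd_a else self.gprs[a]
        rhs = self.gprs[b]
        self.alu_out = lhs + rhs if op == "add" else lhs - rhs
        # Write back: the result is stored into a GPR only when asked.
        if dest is not None:
            self.gprs[dest] = self.alu_out
        return self.alu_out


dp = TinyDatapath()
dp.gprs[0], dp.gprs[1] = 2, 3
first = dp.execute("add", 0, 1)                         # no write back
second = dp.execute("add", 0, 1, dest=2, fwd_a=True)    # forwarded operand A
```

In this sketch the second instruction never reads the first result from the GPRs; it picks it up from the ALU output latch, which is all I mean by Fwd.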
As for the "compiler" (I'm a little unsure whether the device you describe technically counts as a full compiler, though I'm aware it's not far off), it could do this in a single pass, given an unlimited number of tags, if you will.
My only suggestion here is to have it continuously cycle over the code until it comes up with no more logical improvements, in other words, until it stops optimizing.
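Roughly what I have in mind, as a minimal Python sketch. The instruction format `("mov", dest, src)` and the two passes are assumptions I invented for the example, not anything from your design; the point is only the outer loop that repeats the passes until a full round changes nothing.

```python
def drop_self_moves(program):
    """Remove moves whose source and destination are the same register."""
    return [ins for ins in program if not (ins[0] == "mov" and ins[1] == ins[2])]


def propagate_copies(program):
    """Rewrite 'mov a, b ; mov c, a' so the second move reads b directly."""
    out = []
    for ins in program:
        if out and ins[0] == "mov" and out[-1][0] == "mov" and ins[2] == out[-1][1]:
            ins = ("mov", ins[1], out[-1][2])
        out.append(ins)
    return out


def optimize_until_stable(program, passes, max_rounds=100):
    """Run every pass in order, repeating until a whole round changes nothing."""
    rounds = 0
    while rounds < max_rounds:
        before = program
        for optimization_pass in passes:
            program = optimization_pass(program)
        rounds += 1
        if program == before:
            break  # no pass found an improvement: stop optimizing
    return program, rounds


# r1 = r0, then r0 = r1: the second move turns out to be redundant.
program = [("mov", "r1", "r0"), ("mov", "r0", "r1")]
result, rounds = optimize_until_stable(program, [drop_self_moves, propagate_copies])
```

Here one pass exposes work for the other, so a single sweep is not enough. The `max_rounds` cap is just a safety net in case two passes keep undoing each other.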
I'll hand it to you that you handle the bulk stuff quite well, as in allowing for maximum throughput; however, I must add that under the setups you have described there do seem to be a few cases where it will struggle.
Also, I don't think you fully followed what I said about instructions. I was assuming you could pulse instructions as you would other data and get a throughput of 3 ticks/instruction. I was then expanding on that by asking how you could make effective use of a 2nd PE, given how much time you have between instruction fetches for the first PE, and how you would deal with keeping the data streams between PE1 and PE2 coherent.
Or correct me if I'm wrong, but you do wish to only pulse data through the device and not use buffers, controlling the logic via timing, so that regardless of the stage size of something you could always maintain an instruction every 3 ticks.
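If I have understood that right, the arithmetic would look something like the sketch below. The function name and the latency figures are placeholders I picked for the example, not numbers from your design; the only thing taken from the discussion is the 3 tick issue interval.

```python
def completion_ticks(num_instructions, pipeline_latency, issue_interval=3):
    """Tick at which each instruction finishes when one is pulsed in
    every `issue_interval` ticks and takes `pipeline_latency` ticks to pass through."""
    return [i * issue_interval + pipeline_latency for i in range(num_instructions)]


def gaps(ticks):
    """Spacing between consecutive completions."""
    return [later - earlier for earlier, later in zip(ticks, ticks[1:])]


short_path = completion_ticks(4, pipeline_latency=10)
long_path = completion_ticks(4, pipeline_latency=25)
```

A longer path only delays when the first result shows up; the spacing between results stays at 3 ticks either way, which is the throughput I was assuming.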