magazorb Wrote:Assumed you wouldn't do it then because that's too easy XD. It would add an overhead, though, that may be annoying.
I really don't care about how long compile time takes. Also, I don't think compile time is considered overhead.
magazorb Wrote:You don't need to be philosophical about this. If you wanted a 3-tick pipeline, for instance, 1 tick would be on the repeater lock as a stage buffer and the other 2 would be logic between stages; for an 8-tick one, it would be 1 tick on locks and 7 on logic. It's just quicker than using SR latches. If you wanted to make an extreme data loop with simple pipelining and one Fwd, you could theoretically achieve an effective data loop of 5 ticks: 1 tick for the buffer and 4 for the ULG or adder/subtractor, with a CU that looks ahead to see if a Fwd is required and redirects the data Fwd into the buffer before the stage. A little messy and large, but it still holds in theory. The PE would also require two computes on the final half of the logic for the ALU stage: one half acting as the Fwd, whose output is naturally disabled by the CU unless a Fwd is required, while the other is unobstructed and continues to the GPRs. Good luck to anyone trying to implement an 8-bit version of that, though.
But it's a good idea to have stage buffers, as sometimes interrupts can disrupt the flow of data and corrupt things without it being noticeable by the CU itself, in turn producing invalid results and unknown calculations from there on.
Probably not so much of a worry for a simpler system without them.
If you really want, you can go without stage buffers in MC if everything is in sync enough and you know where everything will be and when, but it can make the design trickier to implement and get working.
I argue that we did need to ask the question. In fact, I argue we need to go on because you didn't consider where that could have gone.
We'll start small with your counter claim and ask ourselves where buffers and locking repeaters are needed. (I think you'll be mildly surprised as to where this goes.)
You already answered this: they allow the pipeline to forward properly. Out-of-order execution and register renaming are ways around a forward.
So, where do pipelines forward? In the ALU? The CPU? And what about out-of-order execution and register renaming?
Well, a forward occurs if an incoming instruction has a RAW conflict with one of the instructions inside the pipeline, or if an incoming instruction has a WAR conflict with a later instruction (possible in a for loop). Out-of-order execution triggers for the same reason, and register renaming triggers if a RAW hazard exists between the incoming instruction and the first instruction in the pipeline. If we add that the CPU proper always performs the forward, then we can also say the ALU will never forward. Going back, this means no buffers or repeater locks are needed anywhere in the ALU, because the ALU never forwards. We see that only the CPU forwards on a data conflict, and that a forward can be avoided by the other two methods.
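To make those trigger conditions concrete, here is a minimal hazard-check sketch in Python. The instruction format (a destination register plus a list of source registers) is entirely hypothetical; it just encodes the RAW and WAR conditions described above.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    dest: str        # register this instruction writes
    srcs: list[str]  # registers this instruction reads

def raw_conflict(incoming: Instr, in_pipe: list[Instr]) -> bool:
    # RAW: the incoming instruction reads a register that an instruction
    # still in the pipeline has not yet written back -> forward (or stall).
    return any(older.dest in incoming.srcs for older in in_pipe)

def war_conflict(incoming: Instr, later: list[Instr]) -> bool:
    # WAR: the incoming instruction writes a register a later instruction
    # still needs to read (possible when a for loop re-enters the pipe).
    return any(incoming.dest in nxt.srcs for nxt in later)

# ADD r3, r1, r2 is in flight; SUB r4, r3, r1 arrives -> RAW hazard on r3.
in_pipe = [Instr(dest="r3", srcs=["r1", "r2"])]
print(raw_conflict(Instr(dest="r4", srcs=["r3", "r1"]), in_pipe))  # True
```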
With that out of the way, let us then consider a pipelined, out-of-order execution CPU with register renaming where we are guaranteed there will never be a data conflict in the instructions. We see that the renaming and order-changing algorithms never trigger, the pipeline never forwards, and no pipeline bubbles appear at any point. What does this mean? All the additions to the CPU are unnecessary; they can be removed. All the buffers, all the locking repeaters, all the hardware for the register renaming and the out-of-order execution disappear. There is no wasted time in the CPU, and nothing inside is idle or unused. There is 100% hardware efficiency in the pipeline, with none of the clock cycle lost to buffers and all of the clock cycle spent in the logic. The only problem with this consideration is that guarantee: there is no way to guarantee all programs will be written without data conflicts unless we use magic. Obviously that hypothetical was entirely useless.
Wait a minute, though. Was that a caveat I used? Indeed it was. Let us explore that caveat: "unless we use magic." Does that imply this is possible with magic? I would argue it does. With the use of magic, it is possible to guarantee a program could always be written without conflicts. What about secret sauce? I think someone here already equated the two, so that makes this a legal move, as long as we adjust the scope of the claim slightly. We can assume that with secret sauce or magic, it should be equally possible to guarantee a processor will only receive programs with no data conflicts. Note what that scope change was: we moved from guaranteeing a program can be written without conflicts to guaranteeing a program can be received without conflicts. This is because secret sauce has no scope over the programmer, only the computer. In fact, the magic can even be removed from the statement, because it has been stated before that secret sauce is not magic.
Now (home stretch), let's look at what out-of-order execution and register renaming do. I will use identical terminology for a third time: these operations are intended to rearrange and alter the lines of code in order to remove data hazards at runtime. So if data hazards are avoided by rearranging or altering data, and secret sauce removes data hazards, then by the transitive law of hypothetical models (note this law is still a hypothesis), we could say the secret sauce must rearrange or alter data... but when?
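(As an aside, here is what the "alter" half looks like in miniature: a toy register-renaming pass in Python. The three-address format and physical-register names are my own invention, not anyone's design. Every write gets a fresh physical register, which dissolves WAR and WAW hazards on reused architectural names.)

```python
def rename(instrs):
    # instrs: list of (dest, srcs) pairs over architectural registers.
    mapping = {}                                   # arch reg -> physical reg
    fresh = (f"p{i}" for i in range(10**6))        # endless fresh names
    renamed = []
    for dest, srcs in instrs:
        srcs = [mapping.get(s, s) for s in srcs]   # reads use current names
        mapping[dest] = next(fresh)                # each write gets a new one
        renamed.append((mapping[dest], srcs))
    return renamed

# r1 is written twice; after renaming, the two writes target different
# physical registers, so the second no longer waits on the first's readers.
prog = [("r1", ["r2", "r3"]), ("r4", ["r1"]), ("r1", ["r5", "r6"])]
print(rename(prog))
# [('p0', ['r2', 'r3']), ('p1', ['p0']), ('p2', ['r5', 'r6'])]
```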
Just to hammer in the fact that I have been reusing terminology this whole time, I'm going to quote it.
TSO Wrote:Let's have an extremely short thought experiment. If you don't rearrange or alter data in runtime, and you don't rearrange or alter data at programming time, where would it happen? What other time could possibly exist in the program development flow chart where the data could be rearranged or altered?
How on earth will we solve this conundrum? Is it even possible? I guess we will never know...
But wait (there's more): we have already answered this question in a completely different experiment! We have already discovered the compiler and the magical compile time! (And there was much rejoicing.)
With this magical compiler, we could entirely remove all data hazards. What does that mean? All we need is a compiler that can rearrange and alter instructions whenever a conflict would occur. And what does that mean? It means we can guarantee a program with no data hazards, which means... *uses copy paste*
TSO Wrote:All the additions to the CPU are unnecessary; they can be removed. All the buffers, all the locking repeaters, all the hardware for the register renaming and the out-of-order execution disappear. There is no wasted time in the CPU, and nothing inside is idle or unused. There is 100% hardware efficiency in the pipeline, with none of the clock cycle lost to buffers and all of the clock cycle spent in the logic.
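What such a compile-time pass could look like, as a toy sketch in Python (the (op, dest, srcs) format, the one-slot hazard window, and the greedy pick are all my simplifying assumptions; a real compiler pass tracks real latencies): whenever the next instruction would RAW-conflict with one still in flight, it pulls forward an independent instruction instead, and only emits a NOP bubble when none exists.

```python
NOP = ("nop", None, [])

def raw(instr, in_flight):
    _, _, srcs = instr
    return any(dest in srcs for _, dest, _ in in_flight if dest)

def schedule(instrs, window=1):
    pending, out, in_flight = list(instrs), [], []
    while pending:
        pick = next((i for i in pending if not raw(i, in_flight)), None)
        if pick:
            pending.remove(pick)
        out.append(pick or NOP)                        # bubble as a last resort
        in_flight = (in_flight + [out[-1]])[-window:]  # writes land after `window` slots
    return out

prog = [("add", "r3", ["r1", "r2"]),
        ("sub", "r4", ["r3", "r1"]),   # RAW on r3 with the add
        ("mul", "r5", ["r6", "r7"])]   # independent -> fills the slot
print(schedule(prog))
# add, mul, sub -- the independent mul absorbs the hazard, no NOP needed
```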
Unlike last time, though, we don't need magic to make that guarantee. At the same time, two different points in the discussion have suddenly fused, because they were always the same point. One of the secret sauces has been universal to the design from the beginning, always hiding in plain sight for people to infer. You just never took the time to see it.
And finally, with that in mind, an answer arises to the whole of the argument.
TSO Wrote:Is it the way to go? Do you honestly need that repeater to even lock? If we multiply by accumulation, and place sixteen 3-tick adders with output connected to input incorrectly so the shift happens on its own, do you really need to hold the other operand's data in a locking repeater, or do we just need it to pause for three ticks at the front of each add? If they are 1-tick adders sixteen blocks long, does the data for the second operand need any pause at all, or does the repeater every fifteen blocks suffice?
Indeed, all that was needed was a repeater of the same delay as the adder, not a buffer, and nothing was ever needed between the adders. You only need to ensure that signals which originated at the same time propagate together; you do not need to lock them in place.
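The bookkeeping behind that claim, as a quick sketch (the tick counts come from the quote; everything else is hypothetical): both operands accumulate the same per-stage delay, so alignment is preserved with plain repeaters and no locks.

```python
STAGES, ADDER_TICKS = 16, 3           # sixteen 3-tick adders, per the quote

def arrival(per_stage_delay, stages=STAGES):
    """Tick at which a signal born at tick 0 exits the end of the chain."""
    return per_stage_delay * stages

running_sum   = arrival(ADDER_TICKS)  # operand A rides through the adders
other_operand = arrival(3)            # operand B: a plain 3-tick repeater per stage

# Same per-stage delay => signals born together arrive together; nothing
# between the adders ever needs to lock, it only needs matching delay.
assert running_sum == other_operand == 48
```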
TSO Wrote:If they are 1-tick adders sixteen blocks long, are they really 1 tick... or are they zero tick because redstone needs a repeater every fifteen blocks?
Again, yes, the adder is effectively zero tick, because it is no slower than a bus of equivalent length, meaning the adder can be used as the return bus for the operation.
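A back-of-the-envelope check of that claim (the only hard number is the 15-block repeater rule from the quote; the rest is my arithmetic):

```python
import math

BLOCKS_PER_REPEATER = 15       # redstone needs a 1-tick repeater every 15 blocks
ADDERS, ADDER_BLOCKS = 16, 16  # sixteen 1-tick adders, 16 blocks each

length = ADDERS * ADDER_BLOCKS                       # 256 blocks end to end
bus_ticks = math.ceil(length / BLOCKS_PER_REPEATER)  # a bare bus: 18 repeater ticks
adder_ticks = ADDERS * 1                             # the adder chain: 16 ticks

# The adder path is no slower than a bare bus of the same length, so it is
# "zero tick" relative to that bus -- the bus itself can do the adding.
print(bus_ticks, adder_ticks)  # 18 16
```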
If you direct this multiplier back toward the registers, did you remove the output bussing, or is the bus performing the operations?
Could that be applied to other elements of the processor?
Now, with much preamble (and a shoehorn), we arrive where I wanted you to go. If the bus operates on the numbers, and opcodes are really just pointers to hardware, could we decode en route? What about the decoders for the register addresses? Could the entire CPU be zero tick relative to a bus of equal length? I think you know the answer already. If you had read the PM I forwarded to you, you would have seen all of this post in a much shorter form.
You should all know the answer to that final question, though, because I always give you a "yes" for my thought experiments.