10-14-2014, 02:47 PM
"Resembles pipelining"
It's either pipelined or not pipelined; there's no grey area.
And your system IS timing based. You wanted to keep the control wires and data in sync so you don't need buffers. That's pretty much the literal definition of pipelining via a timing-based system, since you have to spend A LOT of time getting the timing PERFECT.
A compiler won't be able to rename registers effectively, as renaming usually uses a pool of "tags" that is a lot larger than your actual register count.
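To show what I mean by a tag pool, here's a rough Python sketch of hardware-style renaming. The instruction format, the function name `rename` and the sizes (8 registers, 16 tags) are all made up for illustration, and the sketch never frees tags the way real hardware would.

```python
def rename(program, num_arch=8, num_tags=16):
    """Rename architectural registers to tags from a larger pool.

    program: list of (dest, src1, src2) architectural register numbers.
    Returns a list of (dest_tag, src1_tag, src2_tag).
    Hypothetical sketch: tags are never returned to the pool.
    """
    mapping = {r: r for r in range(num_arch)}   # current tag of each register
    free = list(range(num_arch, num_tags))      # spare tags beyond the registers
    renamed = []
    for dest, src1, src2 in program:
        # Sources read whatever tag currently holds the register's value.
        s1, s2 = mapping[src1], mapping[src2]
        if not free:
            raise RuntimeError("out of tags; real hardware would stall here")
        # Every write gets a fresh tag, so a later write to the same
        # register no longer collides with an earlier one.
        tag = free.pop(0)
        mapping[dest] = tag
        renamed.append((tag, s1, s2))
    return renamed


program = [(1, 2, 3), (4, 1, 2), (1, 5, 6), (7, 1, 4)]
print(rename(program))
```

Both writes to register 1 end up in different tags, so the third inst no longer has to wait on the first two. That only works because there are more tags than registers, which is exactly what a compiler doesn't have to play with.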
And it will only be able to rearrange things so much. You have to forward (FWD) or do some tricky CU magic to get it fully optimized. A good computer system is a mix of A BUNCH of individually complex systems all working together, not just a simple data loop and a smart compiler.
The compiler also can't help you get your branch conditions sooner, as your branching effectiveness is determined by your architecture and your CU/branch hardware.
Memory allocation only helps with preventing storage corruption and with inter-program security. It doesn't speed up a system unless you plan to access a lot of data quickly and only use each piece once or twice. If you have a way for it to speed up processing, do share, with better-refined explanations please.
Your whole "make logic same as bus time" idea isn't that effective. At all. Bus time in a well-thought-out system can be near zero, and your parts are usually smaller than about 90 blocks (a 6-tick ALU with each tick of logic 15 blocks apart).
You would end up with the same speed but a MUCH, MUCH bigger system, and the bussing delay would actually be greater.
Also, what are you referring to as chunks? The game's chunks of loaded blocks (16x16)?
A FWD router doesn't necessarily take any ticks at all. Use a 1-tick shifter at the output of the ALU that inverts the signal, then have it invert again before the FWD; you can disable those torches by giving them a second control power source. So you can have, in essence, a 0-tick output demux (it's a demux, not a router).
All Boolean ops should be in sync and take the same time. If you have anything except a shit ALU, then add, sub and all Boolean ops will ALWAYS take the same time frame.
And FWD is only for the ALU, not mult/div; you're right on that. And for a sequential mult/div system to work, you HAVE TO have a buffer, or you will get MC glitches fucking up your data at a 3-tick throughput. I have tried 3 ticks. It DOESN'T WORK. AT ALL. Minecraft has a bug with torches: your data tick width will not always be 3. You will find that the bug randomly gives you data tick widths between 1 and 4, which isn't consistent enough to be feasible. A 4-tick clock MAY work, but 5 is the only guarantee it would function at all.
But good luck with FWD on a 5-tick clock: you would need at least a dual FWD system (so you can give data not only to the immediately following inst, but to the one following that as well). This isn't TOO hard, but if you don't know pipelining and hazard detection well (no offense, it doesn't seem like you are grasping those concepts that well) it can be a BITCH AND A HALF to implement. I suggest lengthening your clock.
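For anyone unsure what the hazard detection for dual FWD has to decide, here's a minimal Python sketch. The `Inst` format and the function name `forward_sources` are hypothetical, just for illustration; the real thing is CU hardware comparing register addresses, not software.

```python
from typing import NamedTuple, Optional, Tuple


class Inst(NamedTuple):
    dest: Optional[int]     # register written, or None if nothing is written
    srcs: Tuple[int, ...]   # registers read


def forward_sources(program, depth=2):
    """For each inst, map each source register to how far back its producer is.

    depth=1 models a single FWD path (only the immediately preceding inst),
    depth=2 models dual FWD (the preceding inst and the one before it).
    Sources with no producer within `depth` are read from the register file.
    """
    result = []
    for i, inst in enumerate(program):
        fwd = {}
        for src in inst.srcs:
            # Check the nearest producer first, since it holds the newest value.
            for dist in range(1, depth + 1):
                j = i - dist
                if j >= 0 and program[j].dest == src:
                    fwd[src] = dist
                    break
        result.append(fwd)
    return result


program = [Inst(1, (2, 3)), Inst(4, (1, 5)), Inst(6, (1, 4))]
print(forward_sources(program, depth=2))
```

In this example the third inst needs register 4 from one inst back AND register 1 from two insts back. A single FWD path only covers the first of those, which is why you need the dual system (or a stall).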
A slower clock can increase speed and performance due to lower stage counts. If you don't believe me, I'll post a picture of the math I did that tests 3 clock speeds and 2 different stage counts. The faster clock speed only helps if you hardly ever branch. It averages out that an 8-tick clock with 8 stages is 0.56% slower than a 10-tick clock with 5 stages, averaged over the course of the tests I ran.
The test uses a static branch prediction of true, which is actually a worst-case scenario. Static false is better, but fuck it.
I used these scenarios:
(The format is %false branches-%true branches-1 branch per x inst)
0-100-1/2
30-70-1/2
50-50-1/2
70-30-1/2
0-100-1/4
30-70-1/4
50-50-1/4
70-30-1/4
0-100-1/8
30-70-1/8
50-50-1/8
70-30-1/8
0-100-1/16
30-70-1/16
50-50-1/16
70-30-1/16
Note that IRL programs average around 30-70-1/8 for calculation-heavy programs
and around 50-50-1/16 for memory-intensive programs. With the normal programs I see people running in MC, it's more like 30-70-1/4, sometimes 30-70-1/2.
My 10-tick, 5-stage design handles more frequent branches A LOT better over the course of 100 inst.
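If you want to play with the numbers yourself, here's a rough Python model of that kind of comparison. It is NOT my actual math and it won't reproduce the 0.56% figure. It assumes static predict-true, that every false branch is a misprediction, and that each misprediction flushes stages - 1 cycles. Those assumptions and the name `run_ticks` are just for illustration.

```python
def run_ticks(n_inst, clock_ticks, stages, pct_false, branch_every):
    """Estimated redstone ticks to run n_inst instructions.

    Assumptions (illustrative only): static predict-true, so each false
    branch is a misprediction, and each misprediction costs stages - 1
    wasted cycles.
    """
    branches = n_inst // branch_every
    mispredicts = branches * pct_false / 100
    fill = stages - 1                     # cycles to fill the pipeline once
    flush = mispredicts * (stages - 1)    # cycles lost to flushes
    return (n_inst + fill + flush) * clock_ticks


# Same scenario grid as the list above: %false, then 1 branch per x inst.
SCENARIOS = [(f, every) for every in (2, 4, 8, 16) for f in (0, 30, 50, 70)]

for pct_false, every in SCENARIOS:
    fast = run_ticks(100, 8, 8, pct_false, every)
    slow = run_ticks(100, 10, 5, pct_false, every)
    print(f"{pct_false}-{100 - pct_false}-1/{every}: "
          f"8tick/8stage={fast:.0f}  10tick/5stage={slow:.0f}")
```

Under these assumptions the 8-tick, 8-stage design wins when nothing mispredicts, and the 10-tick, 5-stage design pulls ahead once branches are frequent and often false, because each flush throws away fewer cycles.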