10-10-2014, 08:26 PM
(10-07-2014, 02:42 AM)TSO Wrote: Or you can rename registers as you go... but that's for noobs
cough cough, i can do that, and it only takes 3 ticks.
(10-07-2014, 11:52 PM)Magazorb Wrote:(10-06-2014, 05:40 AM)TSO Wrote: Full Quote lol
EE knowledge can be applied very well in MC, just not the fundamentals.
Look on my plot and you'll see nothing interesting, my brain is my library of knowledge, not my plot. i probably can't compare to Decapo but i do definitely simulate just about everything in my head, so we are much alike if that's how you truly are (Yay for dyslexia power)
Thanks Maga xD lol i have a problem with learning,, i must always be learning more, and CPU arch. stuff has been my thing for like a year so ya xD
and i am severely ADHD, and i do the same thing, everything i do, i have already run like 100 times in my head to make sure it works.
(10-08-2014, 02:04 PM)greatgamer34 Wrote: umm, you'd be surprised how much better Solid state is. Just ask decapo, his stack is super small and super fast!
just saying,,, that stack isn't mine, i fell asleep one night talking about wanting to make one,, i woke up and dylan was like I HAS U A STACKS!,,,, was like fuck ya,, then the next night him and freeman compacted the fuck out of it even more xD so lol its in the CPU but its Dylan's design... The CPU is also pretty much a collab: Dylan's implementation and compacting as well as general part design, and i did the architecture, assembly, IS, and technical maths stuff, as well as build a good amount of the pieces.
(10-09-2014, 10:18 PM)greatgamer34 Wrote: Most people(including me) in minecraft have a Compare instruction then followed by a Flag register read.
The Flag register read can be BEQ(branch if equal to), BGT(branch if greater than), BLT(branch if less than), BNEQ(branch if not equal to)....etc, along these lines of any amount of possible conditions.
This then reads from the proper flags register and performs a jump to a specified address(usually found in the IMM or a pointer).
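A minimal sketch of that compare-then-branch scheme, in Python rather than redstone. The flag names (`EQ`, `GT`, `LT`) and the helper functions are made up for illustration; real builds lay out their flags register however they like.

```python
# Hypothetical flag names; actual MC CPUs differ in how flags are stored.
def compare(a, b):
    """Compare instruction: fill the flags register from two operands."""
    return {"EQ": a == b, "GT": a > b, "LT": a < b}

# Each branch mnemonic is just a predicate over the flags register.
BRANCH_CONDITIONS = {
    "BEQ": lambda f: f["EQ"],
    "BNEQ": lambda f: not f["EQ"],
    "BGT": lambda f: f["GT"],
    "BLT": lambda f: f["LT"],
}

def branch(mnemonic, flags, pc, target):
    """Flag register read: jump to target if the condition holds, else fall through."""
    return target if BRANCH_CONDITIONS[mnemonic](flags) else pc + 1

flags = compare(5, 9)                             # 5 < 9, so LT is set
next_pc = branch("BLT", flags, pc=10, target=40)  # jumps to the target address
```

Here `target` stands in for the address that would come from the IMM or a pointer.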
Ok so my branching works a bit different, because of my crazy ass cRISC or CISC or what ever u want to consider it architecture.
I have 2 modes in my IS, Memory/System is mode0 and ALU functions are Mode1
Branching is a Mode0, and it is also a multi-line instruction
I have a specially made 3 deep Queue with multiread so when a branch is detected, it reads Locations (Inst)0, (Inst)1, and (Inst)2 from the Queue, and routes their data to specific parts
Inst0 is the Main Inst, it tells what conditions to look for, whether it is a Call or a Return, whether it is conditional or not, if its direct, relativePos. or relativeNeg.
Inst1 is the Destination address (so i can have 65535 or what ever lines of code on just a PROM, more if i access external memory, which is easy to do) that is only loaded into the PC if the condition is true
Inst2 is the function that defines where the flag can arise from, and this inst MUST be a Mode1, so u can add, sub, or, xor, compare, etc. to generate any flag you want.
All of that gets decoded and sorted out in about 7 ticks, then the branch is determined on the next cycle whether the conditions are met. it has static prediction of False, so u only get hit with a 1 cycle penalty after a True flag comes through, leaving the penalty of branching not that devastating.
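Rough Python sketch of the three-line branch described above, for anyone who reads code easier than redstone. Everything named here (`BranchControl`, `alu`, `resolve_branch`, the `zero`/`negative` flags, the 16-bit width, falling through to `pc + 3`, and charging unconditional jumps the same 1 cycle) is an assumption for illustration, not the actual encoding.

```python
from dataclasses import dataclass

@dataclass
class BranchControl:      # Inst0: the main inst (field names are hypothetical)
    conditional: bool     # unconditional branches ignore the flag
    flag: str             # which flag to test, e.g. "zero"
    mode: str             # "direct", "rel_pos" or "rel_neg"

def alu(op, a, b):
    """Inst2: a Mode1 op whose only job here is to generate flags."""
    raw = {"add": a + b, "sub": a - b, "or": a | b, "xor": a ^ b}[op]
    result = raw & 0xFFFF                      # assumed 16-bit datapath
    return result, {"zero": result == 0, "negative": bool(result & 0x8000)}

def resolve_branch(pc, ctrl, dest, alu_op, a, b):
    """Return (next_pc, penalty_cycles) for a branch whose Inst0 sits at pc."""
    _, flags = alu(alu_op, a, b)
    taken = (not ctrl.conditional) or flags[ctrl.flag]
    if not taken:
        return pc + 3, 0                       # predicted False, was False: no penalty
    if ctrl.mode == "direct":
        target = dest                          # Inst1 is the destination address
    elif ctrl.mode == "rel_pos":
        target = pc + dest
    else:
        target = pc - dest
    return target & 0xFFFF, 1                  # predicted False, was True: 1 cycle lost

ctrl = BranchControl(conditional=True, flag="zero", mode="direct")
taken_case = resolve_branch(100, ctrl, 500, "sub", 7, 7)   # 7 - 7 sets zero
not_taken_case = resolve_branch(100, ctrl, 500, "sub", 7, 6)
```

The point of the sketch is only the flow: Inst2 makes the flag, Inst0 says what to do with it, and Inst1 only reaches the PC when the condition comes back True.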
I will be making a forum post this weekend with pics and such of my MC CPU, since u cant join server, and will explain the IS in detail in it for those who are interested.
(10-10-2014, 01:48 AM)TSO Wrote: So you would be saving a clock cycle per instruction.
I spoke with him, and yes you do exactly what I described when programming assembly for the 386, with one slight exception. The instruction set does not carry the conditional with it, there is a branching operation and the CPU uses some kind of hardware look ahead in order to set the flags in one clock cycle so that the next cycle will pipeline correctly.
Also, when optimizing for speed on an ALU where not every operation takes the same amount of time but multiple simultaneous operations are possible, is it better to put the fast operations close to the CPU and have them be a hell of a lot faster than the slow ones, or put the fast farthest away and have it all kinda balance out? For example, my current instruction set, which I have discussed with LD would allow for a bit shift to occur three ticks after being instructed, and repeatable every six ticks, with the ability to run all the other operations at such speeds as well (the CPU can have a three tick clock). The Boolean operators are four ticks out, but also repeatable every six ticks. At the other end, the MOD function is 106 ticks out, so that's like 34 near operations for every far operation.
No please no, do not do a 3 tick clock. its "theoretically" the fastest u can get with torches,,, but NO just NO! MC bugs are so disgraceful that ur clock cycles will become uneven and will corrupt data in ways u never knew were possible... trust me, i had a huge project, and backed off the complexity and simplified the logic i was gonna use in my CU to get it to be a little longer clock,, well more than a little,, 3 ticks to 10 ticks, but the throughput and penalty %ages are ridiculously less now as well. so it gives you better performance under normal operating conditions. Clock speed DOESN'T mean more Power,, u have to take into consideration the IS, and the possible penalties the CPU could suffer from such small pipeline stages,,, and a 3 tick clock leaves 2 ticks for logic, 1 tick to store, so its really dumb xD i learned this the hard way... PC was the thing that we found killed it the fastest.
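Quick back-of-the-envelope version of the clock argument, as a Python sketch. The 1 tick store cost for the 10 tick clock and the idea of an average "penalty cycles per instruction" number are assumptions here; plug in whatever your own design actually measures.

```python
def logic_fraction(clock_ticks, store_ticks=1):
    """Share of each clock period left for logic after the register store."""
    return (clock_ticks - store_ticks) / clock_ticks

def ticks_per_instruction(clock_ticks, penalty_cycles_per_instruction):
    """Average redstone ticks per instruction once stall cycles are counted."""
    return clock_ticks * (1 + penalty_cycles_per_instruction)

fast = logic_fraction(3)    # 2 of 3 ticks do logic
slow = logic_fraction(10)   # 9 of 10 ticks do logic, assuming the same 1 tick store
```

`ticks_per_instruction` is there so two designs can be compared on ticks per finished instruction instead of raw clock period, which is the number that actually matters once penalties are in play.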