So you would be saving a clock cycle per instruction.
I spoke with him, and yes you do exactly what I described when programming assembly for the 386, with one slight exception. The instruction set does not carry the conditional with it, there is a branching operation and the CPU uses some kind of hardware look ahead in order to set the flags in one clock cycle so that the next cycle will pipeline correctly.
Also, when optimizing for speed on an ALU where not every operation takes the same amount of time but multiple simultaneous operations are possible, is it better to put the fast operations close to the CPU and have them be a hell of a lot faster than the slow ones, or put the fast farthest away and have it all kinda balance out? For example, my current instruction set, which I have discussed with LD would allow for a bit shift to occur three ticks after being instructed, and repeatable every six ticks, with the ability to run all the other operations at such speeds as well (the CPU can have a three tick clock). The Boolean operators are four ticks out, but also repeatable every six ticks. At the other end, the MOD function is 106 ticks out, so that's like 34 near operations for every far operation.
I spoke with him, and yes you do exactly what I described when programming assembly for the 386, with one slight exception. The instruction set does not carry the conditional with it, there is a branching operation and the CPU uses some kind of hardware look ahead in order to set the flags in one clock cycle so that the next cycle will pipeline correctly.
Also, when optimizing for speed on an ALU where not every operation takes the same amount of time but multiple simultaneous operations are possible, is it better to put the fast operations close to the CPU and have them be a hell of a lot faster than the slow ones, or put the fast farthest away and have it all kinda balance out? For example, my current instruction set, which I have discussed with LD would allow for a bit shift to occur three ticks after being instructed, and repeatable every six ticks, with the ability to run all the other operations at such speeds as well (the CPU can have a three tick clock). The Boolean operators are four ticks out, but also repeatable every six ticks. At the other end, the MOD function is 106 ticks out, so that's like 34 near operations for every far operation.