I think I've finally got it! The full analogy you can all understand as to how it works. Okay, it is a single CPU that only takes two-bit opcodes. These then get sent out to multiple ALU-type components that are individually pipelined but independent of each other. One is a shift processor, one is a Boolean processor, one is an add/subtract/xor processor, one is a multiplication/square processor, one is a whole-number division/remainder processor, and one is a square root processor. (I think division approximation is going to have to go.) There is one other ALU processor: a conditional statement prediction processor. It determines, using inequalities, the conditional results for the multiplication, division, and remainder operations before they are actually calculated, so that the system doesn't have to wait on their longer process times.
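To give a feel for what "using inequalities" could mean for the predictor, here is a rough Python sketch. It is only my guess at one possible approach, not the actual design: it bounds the result by operand bit lengths and only answers when the bounds settle the comparison. The function names, the restriction to non-negative integers, and the `None` ("can't tell yet, wait for the real result") return value are all assumptions made for illustration.

```python
from typing import Optional


def predict_product_greater(a: int, b: int, c: int) -> Optional[bool]:
    """Try to decide a * b > c for non-negative a, b without multiplying."""
    if a == 0 or b == 0:
        return 0 > c  # the product is exactly 0
    bits = a.bit_length() + b.bit_length()
    lower = 1 << (bits - 2)   # a*b >= 2^(la-1) * 2^(lb-1)
    upper = (1 << bits) - 1   # a*b <  2^la * 2^lb
    if lower > c:
        return True
    if upper <= c:
        return False
    return None  # bounds are too loose; the real product is needed


def predict_quotient_greater(a: int, b: int, c: int) -> Optional[bool]:
    """Try to decide a // b > c for a >= 0, b > 0 without dividing."""
    if a < b:
        return 0 > c  # the quotient is exactly 0
    shift = a.bit_length() - b.bit_length()  # >= 0 because a >= b
    # a >= b guarantees a quotient of at least 1.
    lower = (1 << (shift - 1)) if shift >= 1 else 1
    upper = (1 << (shift + 1)) - 1
    if lower > c:
        return True
    if upper <= c:
        return False
    return None


def predict_remainder_less(a: int, b: int, c: int) -> Optional[bool]:
    """Try to decide a % b < c for a >= 0, b > 0 without dividing."""
    if c >= b:
        return True   # a remainder is always at most b - 1
    if c <= 0:
        return False  # a remainder is never negative
    if a < b:
        return a < c  # the remainder is a itself
    return None
```

When one of these returns `True` or `False`, the branch could be taken immediately; when it returns `None`, the branch would have to wait for the slow unit to finish.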
These processors all share register memory, but that is all they have in common. Their opcodes and syntax are different in each unit, and the time they take to complete is different, with the shift being the fastest (1 tick). Their mutual independence is critical to understanding how this system can have a three-tick throughput without issue: the units also have priorities for register memory based on when their operations were ordered. This prevents any write conflict in the registers. The actual priority and delay are predetermined at compile time, so that the registers in use when a unit writes do not conflict with the registers being read or set aside by other units. The units therefore do not need their priority coded with them; it just needs to have been accounted for already. The conditional predictor is the only unit that is actually integrated into the CPU, and it controls the instruction pointer for the next line of code. Together, the CPU and the predictor can predict a branch in two cycles. The other external device is a virtual memory unit that is only addressable by the CPU. Any external device you want can be added on the serial out, provided the proper driver is installed in main memory and linked into the program being run so that the compiler can integrate the driver into the program (yeah, I just went there). This does limit the number of devices you can connect considerably.
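Here is a small Python sketch of how a compiler might work out those delays ahead of time. Treat it as an illustration under my own assumptions rather than the real scheduler: only the 1-tick shift latency comes from the description above, the other latencies in `LATENCY` are made-up placeholders, and I assume one instruction is issued per tick, sources are read at issue, and a value written at tick k can be read at tick k.

```python
# Ticks each unit needs. Only "shift": 1 is from the design; the rest are
# hypothetical placeholder values.
LATENCY = {"shift": 1, "bool": 1, "add": 2, "mul": 4, "div": 8, "sqrt": 8}


def schedule(program):
    """Assign an issue tick to each (unit, dest, srcs) instruction, in order.

    Returns a list of (unit, dest, issue_tick, write_tick) tuples such that
    no instruction reads a register before its pending write has landed, and
    writes to the same register land in program order.
    """
    write_tick = {}  # register -> tick at which its latest write lands
    next_issue = 0   # earliest tick the next instruction may issue
    plan = []
    for unit, dest, srcs in program:
        latency = LATENCY[unit]
        t = next_issue
        # Wait until every source register has been written.
        for reg in srcs:
            t = max(t, write_tick.get(reg, 0))
        # Keep writes to the same register in program order.
        if dest in write_tick:
            t = max(t, write_tick[dest] - latency + 1)
        done = t + latency
        write_tick[dest] = done
        plan.append((unit, dest, t, done))
        next_issue = t + 1
    return plan


example = [
    ("mul", "r2", ("r0", "r1")),
    ("shift", "r3", ("r0",)),
    ("add", "r4", ("r2", "r3")),
    ("div", "r5", ("r0", "r1")),
    ("shift", "r5", ("r0",)),
]
for row in schedule(example):
    print(row)
```

Because earlier instructions read their sources at issue time and issue strictly in order, a later write can never land before an earlier read in this model, so only the two checks shown are needed. The resulting ticks are baked into the program, which is why the units themselves never need to carry a priority.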
The main memory can also control itself, being more accurately described as a mainframe server, and is able to send and receive instructions from other computers attached to its various output ports.