09-10-2016, 11:32 PM
/warp nubcpu1
Records broken:
1. Deepest pipeline (4 stages)
2. Fastest clock (8 ticks)
After 10 unfinished CPUs, cray finally did it...
A couple weeks ago (didn't notice until capo pointed it out), I designed and implemented a very simple 4-bit CPU architecture, which I call NubCPU I.
Pls ignore the great wall of sans.
It features 4 instructions (add, nand, load immediate, branch if zero flag). Forwarding and all the other fancy stuff usually associated with pipelining was left out to keep the implementation simple, and so was memory access. There are 4 registers (including an implicit zero register), with reg1's contents mapped to a slightly hidden set of lamps. The CPU itself is 4-bit.
Because there's no forwarding, you'll have to be smart with how you program it. A delay slot (or another independent instruction) must be put in between dependent operations. You can see this in the program I put on its ROM, which counts down from 5 to 0 and then stops.
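To make the "no forwarding" rule concrete, here is a rough Python sketch of a checker that flags dependent instructions placed back to back. The tuple format `(mnemonic, dest, sources)`, the `nop` mnemonic and the function name are all made up for illustration and are not NubCPU I's real encoding. The gap of one instruction is taken from the rule above.

```python
# Hypothetical hazard checker. The instruction tuples below are illustrative
# only and do not reflect NubCPU I's actual instruction encoding.
REQUIRED_GAP = 1  # from the post: one instruction between dependent operations


def find_hazards(program, gap=REQUIRED_GAP):
    """Return (producer_index, consumer_index, register) for every read that
    follows a write to the same register with fewer than `gap` instructions
    in between."""
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None or dest == 0:  # r0 is the implicit zero register
            continue
        # Only the next `gap` instructions are too close to the producer.
        for j in range(i + 1, min(i + 1 + gap, len(program))):
            if dest in program[j][2]:
                hazards.append((i, j, dest))
    return hazards


# (mnemonic, destination register, source registers)
bad = [
    ("li", 1, ()),         # write r1
    ("add", 1, (1, 2)),    # reads r1 right away: stale value
]
good = [
    ("li", 1, ()),         # write r1
    ("li", 2, ()),         # write r2 (independent of r1)
    ("nop", None, ()),     # filler slot so r2 is written before it is read
    ("add", 1, (1, 2)),    # both operands are safe to read now
]

print(find_hazards(bad))
print(find_hazards(good))
```

The first program gets flagged because the add reads r1 directly after it is loaded. The second one passes because every write has at least one unrelated instruction before the next read of that register.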
Stages:
1. Fetch
- Increment/branch PC
- Load next instruction from ROM
2. Decode
- Parse opcode to generate control signals
- Read necessary data
- Check flags if applicable
- Branch at the end of this cycle
3. Execute
- ALU does stuff or just does nothing
4. Writeback
- Write results (if applicable) to the specified register
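The stage list above can be sketched as a simple occupancy table showing which instruction sits in which stage on each clock cycle. This is only an illustration in Python, assuming a straight-line program with no stalls and no taken branches. The function name and the `i0`, `i1`, ... labels are made up here.

```python
STAGES = ["Fetch", "Decode", "Execute", "Writeback"]


def pipeline_schedule(num_instructions):
    """Return one dict per clock cycle mapping stage name -> instruction index,
    or None when the stage is empty. Assumes no stalls and no taken branches."""
    rows = []
    total_cycles = num_instructions + len(STAGES) - 1
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            # An instruction fetched on cycle n reaches stage `depth` on cycle n + depth.
            instr = cycle - depth
            row[stage] = instr if 0 <= instr < num_instructions else None
        rows.append(row)
    return rows


for cycle, row in enumerate(pipeline_schedule(4)):
    cells = ["i%d" % row[s] if row[s] is not None else "--" for s in STAGES]
    print(cycle, *cells)
```

On cycle 3 the first instruction is in Writeback while the third is in Decode. If a register written in Writeback can already be read by Decode in that same cycle (an assumption about the timing, not something stated in the post), that matches needing one instruction between dependent operations.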