Forums - Open Redstone Engineers
My asynchronous pipelined CPU, aka AsPipe - Printable Version




My asynchronous pipelined CPU, aka AsPipe - Magic :^) - 10-07-2014

Hello, MG here. I'm working on what functions to include in the ALU stage right now. I want to reduce the number of ops, but honestly I'm not sure what to get rid of (e.g. the nimplies ops).

BTW the following is a rough draft. In no way is this the finished list XP

Code:
Key: [word], {opcodes/arguments}, (generated words)


        [{if}{3-bit flag address}]      [{jump address}]
        [{jump}]                        [{jump address}]
        [{call}]                        [{function address}]
        [{prog}]                        [{program address}]
        [{return}]
        [{halt}]
        [{read}{3-bit peripheral addr}] [{8-bit address}]
        [{ptr}{3-bit ptr address}]
        [{register}{reg address}]
        [{const}]                       [{8-bit immediate}]
        [{pop}]
        [{add}{to reg address}]
        [{a-b}{to reg address}]
        [{b-a}{to reg address}]
        [{xor}{to reg address}]
        [{xnor}{to reg address}]
        [{or}{to reg address}]
        [{nor}{to reg address}]
        [{and}{to reg address}]
        [{nand}{to reg address}]
        [{a nimplies b}{to reg address}]
        [{b nimplies a}{to reg address}]
        [{!a}{to reg address}]
        [{SHR}{to reg address}]
        [{ptrW}{ptr address}]
        [{write}{address}]

------------------------------------------------


general pipeline:


1. Branch processing block:

    [{if}{3-bit flag address}] [{jump address}]
    [{jump}] [{jump address}]
    [{call}] [{function address}]
    [{return}]
    [{halt}]

2. Read processing block:
    [{read}{3-bit peripheral address}] [{8-bit address}]
    [{ptr}{3-bit ptr address}]

3. ALU processing block:

    Loading to ALU:
        [{register}{reg address}]
        [{const}] [{8-bit immediate}]
        ({const}) ({8-bit immediate}) // converted from [{read}{}] instruction
        [{pop}]

    ALU ops:
        [{add}{to reg address}]
        [{a-b}{to reg address}]
        [{b-a}{to reg address}]
        [{xor}{to reg address}]
        [{xnor}{to reg address}]
        [{or}{to reg address}]
        [{nor}{to reg address}]
        [{and}{to reg address}]
        [{nand}{to reg address}]
        [{a nimplies b}{to reg address}]
        [{b nimplies a}{to reg address}]
        [{!a}{to reg address}]
        [{SHR}{to reg address}]

    Misc:
        ({if}{3-bit flag address})

    Write ops:
    
        Push to write block:
            [{ptrW}{ptr address}] (a reg as physical address) (b reg as peripheral address)

        if a & b registers loaded, and no other ops have been executed:
            Push to write block:
                [{write}{peripheral address}] (a reg as value) (b reg as address)

        else:
            [{write}{to reg address}]
        
4. Write processing block:

    ({write}{peripheral address}) (value) (address)
    ({ptrW}{ptr address}) (physical address) (peripheral address)



RE: My asynchronous pipelined CPU, aka ASSPIPE - greatgamer34 - 10-07-2014

Pick one subtraction op, preferably a-b, and get rid of the implies ops and the invert (!a) op.

If you want to invert a number going through the CPU, XNOR it with 0; that will just invert it.

other than that it looks good!
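
A quick check of the XNOR trick, as a minimal sketch. It assumes 8-bit words and an all-zero second operand; the helper name `xnor` is made up for illustration.

```python
MASK = 0xFF  # assumption: 8-bit data path


def xnor(a, b):
    """Bitwise XNOR on 8-bit values."""
    return ~(a ^ b) & MASK


def invert_via_xnor(a):
    """XNOR against zero flips every bit, so it can stand in for a NOT op."""
    return xnor(a, 0)


# Every 8-bit value comes back as its bitwise complement.
assert all(invert_via_xnor(a) == (~a & MASK) for a in range(256))
```

Note that XNOR of a value with itself gives all ones rather than the inverse, so the second operand has to be zero for this to work.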


RE: My asynchronous pipelined CPU, aka ASSPIPE - jxu - 10-07-2014

Asynchronous? But why??


RE: My asynchronous pipelined CPU, aka ASSPIPE - Magic :^) - 10-07-2014

(10-07-2014, 08:47 PM)jxu Wrote: Asynchronous? But why??

for the lolz ;P

Also, I like the challenge. Even if it ends up being ridiculously slow (hopefully not), I'll have learned something from it.


RE: My asynchronous pipelined CPU, aka ASSPIPE - TSO - 10-07-2014

Theoretically (again, one of my thought experiments), one should be able to make a pipelined CPU where each clock cycle is actually triggered by the incoming program data, meaning it can run as fast as the data comes in. Thus, a clock is not needed, and it is therefore asynchronous.

Honestly, this sounds easier to build.


RE: My asynchronous pipelined CPU, aka ASSPIPE - Magic :^) - 10-08-2014

(10-07-2014, 11:10 PM)TSO Wrote: Theoretically (again, one of my thought experiments), one should be able to make a pipelined CPU where each clock cycle is actually triggered by the incoming program data, meaning it can run as fast as the data comes in. Thus, a clock is not needed, and it is therefore asynchronous.

Honestly, this sounds easier to build.

That's exactly how I'm implementing it! Program data moves through the stages until the correct one is reached, and when the instruction resolves, the data is deleted and allows more instructions to fill the space left behind. In practice, that's a handshake protocol that requests and passes data along the pipeline as needed.
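
The behaviour described in this post can be sketched roughly as below. This is a toy software model, not the redstone design: the stage count, the `(name, target_stage, work_steps)` tuple layout and the function name `run_pipeline` are all assumptions made for illustration. A word only moves forward when the slot ahead is empty (standing in for the request/acknowledge handshake), and it is deleted once it resolves in its target stage.

```python
from collections import deque


def run_pipeline(program, n_stages=4):
    """Simulate a buffered pipeline; returns (step, name) for each resolved op."""
    slots = [None] * n_stages          # one buffer per stage, None = empty
    queue = deque(program)             # instructions waiting to enter
    resolved = []
    step = 0
    while queue or any(slots):
        step += 1
        # Walk from the last stage backwards so a slot freed in this step
        # can accept the word waiting behind it.
        for i in range(n_stages - 1, -1, -1):
            word = slots[i]
            if word is None:
                continue
            name, target, remaining = word
            if target == i:
                remaining -= 1
                if remaining == 0:
                    resolved.append((step, name))
                    slots[i] = None    # data deleted, space opens up
                else:
                    slots[i] = (name, target, remaining)
            elif slots[i + 1] is None:
                slots[i + 1] = word    # next stage acknowledged, pass it on
                slots[i] = None
        if queue and slots[0] is None:
            slots[0] = queue.popleft()
    return resolved


events = run_pipeline([("jump", 0, 1), ("add", 2, 2), ("write", 3, 1)])
```

In this model the `write` word stalls behind `add` until the ALU slot clears, with no global clock deciding when it may move; only the state of the neighbouring slot matters.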


RE: My asynchronous pipelined CPU, aka ASSPIPE - TSO - 10-08-2014

Hmm... Still sounds a bit complicated. If data just kinda falls through the system, the data can force the system to process it at the exact rate it comes in. No saving is necessary if the instruction moves down the line with its data. All you need to do is use an edge detector and some delay equal to the delay of the operation being performed so that the detector just forwards the information to the next step once it has been calculated. When the instruction is completed, output data goes to memory, and instruction data just gets forwarded to nothing.


RE: My asynchronous pipelined CPU, aka ASSPIPE - Magic :^) - 10-08-2014

Here, I'll give you some of the relevant stuff I've looked up that will give you an idea of how the system will work:

I'm using a buffered asynchronous pipeline: http://en.wikipedia.org/wiki/Pipeline_(computing)#Implementations

A document that discusses the handshake protocols (Mine are MC optimized, not exactly the same):
http://www.cs.columbia.edu/~nowick/nowick-singh-ieee-dt-11-published.pdf

Here's another Wikipedia heading specifically on asynchronous CPUs:
http://en.wikipedia.org/wiki/Asynchronous_circuit#Asynchronous_CPU

My processor will be along the lines of what is described there.
By the way, I looked these up after I came up with the idea. I basically learned the terminology from them more than anything else, and they helped me make some small improvements to my original designs.

EDIT: Yeah, I used to use terms like 'self-governing queue' and other random terms. No-one understood me DX


RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 10-09-2014

I updated the IS a bit:
Code:
Key: [8-bit word], {op field}
        [{ifT}{3-bit flag adr}] [{jmp adr}] 01001xxx aaaaaaaa
        [{ifF}{3-bit flag adr}] [{jmp adr}] 01000xxx aaaaaaaa
        [{jump}] [{jump address}]           00001--- aaaaaaaa
        [{call}] [{function address}]       00010--- aaaaaaaa
        [{prog}] [{program address}]        00011--- aaaaaaaa
        [{return}]                          00100---
        [{halt}]                            00000---
        [{read}{periph adr}] [{adr}]        00110xxx aaaaaaaa
        [{ptrR}{ptr address}]               00111xxx
        [{regread}{reg address}]            10000xxx
        [{const}] [{imm}]                   10001--- aaaaaaaa
        [{pop}]                             10010---
        [{add}{to reg address}]             10100xxx
        [{a-b}{to reg address}]             10101xxx
        [{b-a}{to reg address}]             10110xxx
        [{xor}{to reg address}]             10111xxx
        [{xnor}{to reg address}]            11000xxx
        [{or}{to reg address}]              11001xxx
        [{nor}{to reg address}]             11010xxx
        [{and}{to reg address}]             11011xxx
        [{nand}{to reg address}]            11100xxx
        [{!a}{to reg address}]              11101xxx
        [{SHR}{to reg address}]             11110xxx
        [{regwrite}{to reg adr}]            11111xxx
        [{ptrE}{ptr address}]               10011xxx
        [{ptrW}{ptr address}]               01100xxx
        [{write}{periph adr}] [{adr}]       01101xxx aaaaaaaa



RE: My asynchronous pipelined CPU, aka As-Pipe - greatgamer34 - 10-09-2014

Why have b-a along with !a as OPs?


RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 10-10-2014

It just seems easier to program with b-a and !a available. They aren't necessary, but are useful for reducing the number of lines of code. The ALU is hooked up to be order sensitive, as otherwise I'd have to double the number of load-type instructions.
So a is loaded first, then b is set up to be loaded next.
If you write to reg 000, you actually write the result back to the a register, and then b is set up to be updated next if required.
Because reshuffling the a and b registers for a sub takes time, and because clearing the b register and using NOR also takes time, using b-a and !a seems like a faster alternative to me.


RE: My asynchronous pipelined CPU, aka As-Pipe - TSO - 10-10-2014

What is !a? Is that like a'


RE: My asynchronous pipelined CPU, aka As-Pipe - greatgamer34 - 10-10-2014

! == not

i.e. !a == not a == invert a


RE: My asynchronous pipelined CPU, aka As-Pipe - TSO - 10-10-2014

So it is exactly like a'

Many systems either NOR against the null flag, or have a hardware register inverter. They can also have distinct opcodes that are the same operations just with inversions of certain inputs if a really large instruction set is fine. The x86 set has a mix of all of the above.


RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 10-10-2014

TSO, that (second half) is how I implement it. The ALU can do all logic ops using 3 possible outputs (xor, or, nor) and inverters in front of the a and b registers. !a is just inverting the a register and routing it into the main bus.

Anyways, I'm using microcode to translate 4-bit alu opcodes to the 5-bit codes it can interpret (I use the msb in my IS to indicate an alu op while in the main pipeline, so essentially 4-bit alu opcodes), so I can always replace one op with another with the slap of a torch
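
As a rough illustration of how three core outputs plus input inverters cover the six two-input logic ops, here is a minimal sketch. The table `RECIPES` and the function `alu_logic` are hypothetical names, and the particular choice of which input to invert for xnor is an assumption; the real ALU may wire it differently.

```python
MASK = 0xFF  # assumption: 8-bit data path

# The three core outputs mentioned in the post.
CORE = {
    "or":  lambda a, b: (a | b) & MASK,
    "nor": lambda a, b: ~(a | b) & MASK,
    "xor": lambda a, b: (a ^ b) & MASK,
}

# op name -> (core output used, invert a first?, invert b first?)
RECIPES = {
    "or":   ("or",  False, False),
    "nor":  ("nor", False, False),
    "and":  ("nor", True,  True),    # a & b    == ~(~a | ~b)
    "nand": ("or",  True,  True),    # ~(a & b) == ~a | ~b
    "xor":  ("xor", False, False),
    "xnor": ("xor", True,  False),   # ~(a ^ b) == ~a ^ b
}


def alu_logic(op, a, b):
    """Evaluate a logic op using only the core outputs and input inverters."""
    core, invert_a, invert_b = RECIPES[op]
    if invert_a:
        a = ~a & MASK
    if invert_b:
        b = ~b & MASK
    return CORE[core](a, b)
```

For example, `alu_logic("and", 0b1100, 0b1010)` gives `0b1000` without the model ever containing a dedicated AND gate.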


RE: My asynchronous pipelined CPU, aka As-Pipe - gera279 - 10-12-2014

NOR and NAND are the same thing, so one of those can be gotten rid of… Also, I would suggest choosing either XOR or XNOR, as they are normally used for testing and answers can be deduced from results of either.


RE: My asynchronous pipelined CPU, aka As-Pipe - TSO - 10-12-2014

NOR and NAND are far from the same thing.
AND is ab, but it is also (a' + b')'
NAND is (ab)', but it is also a' + b'
OR is a + b, but is also (a'b')'
NOR is (a + b)', but is also a'b'

Now your statement is that a'b' = (ab)'. Unfortunately, inversion does not distribute over intersection (or union) of sets: by De Morgan's laws, (ab)' = a' + b', not a'b'. This is easily shown with a Venn diagram.
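
The four identities above, and the fact that NOR and NAND differ, can be checked exhaustively over single bits. A small sketch (the function names are chosen here for illustration):

```python
from itertools import product


def inv(x):
    """Invert a single bit."""
    return 1 - x


def nor(a, b):
    return inv(a | b)


def nand(a, b):
    return inv(a & b)


BITS = list(product((0, 1), repeat=2))

# De Morgan: each gate has an equivalent form with inverted inputs.
assert all((a & b) == inv(inv(a) | inv(b)) for a, b in BITS)   # AND
assert all(nand(a, b) == (inv(a) | inv(b)) for a, b in BITS)   # NAND
assert all((a | b) == inv(inv(a) & inv(b)) for a, b in BITS)   # OR
assert all(nor(a, b) == (inv(a) & inv(b)) for a, b in BITS)    # NOR

# Inputs where NOR and NAND disagree: exactly one input is 1.
differing_inputs = [(a, b) for a, b in BITS if nor(a, b) != nand(a, b)]
```

`differing_inputs` comes out as `[(0, 1), (1, 0)]`, so the two gates only agree when both inputs are equal.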


I also remembered, you can get rid of a' by coding a NOR b and just making a and b read the same register. This operation would be no faster than a'. You can also put a' back into the instruction set, situated adjacent to the NOR function, and have a multiplexer at the NOR operator which either connects b or a to the second input. This only allows for a' to be found, though.


RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 10-12-2014

Due to how the ALU loads values, your a NOR b method would take an extra cycle. It loads one value at a time. I know that method isn't ideal, but if I go back and change it, it will take too much time, and I can't be changing every little thing.


RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 10-15-2014

I have the dataloop stage running now, and I've tested most of the ops. I don't have any PROM to plug in yet though, so the most I can do is loop one instruction. Namely: a=a+1

I wrote out the Fibonacci sequence in the meantime though:
Code:
#initial reg loads, calculate first 2 terms
0000|   const
0001|   00000000
0010|   const
0011|   00000001
0100|   add 001
0101|   regread 001
0110|   add 010

#main loop
0111|   regread 001
1000|   regread 010
1001|   add 001
1010|   regread 010
1011|   regread 001
1100|   add 010

1101|   jump
1110|   00000111
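
To see what this listing computes, here is a hedged Python model of the data loop. It assumes that loads alternate between the a and b registers, that the first load after an ALU op goes to a, and that values are 8 bits wide. The special case of writing to reg 000 is not modelled, and `run_fib` is a made-up name.

```python
def run_fib(loop_iterations):
    """Model the listing above; returns every value written by an add."""
    regs = [0] * 8
    state = {"a": 0, "b": 0, "load_a_next": True}
    written = []

    def load(value):
        # Assumption: loads fill a first, then b, alternating.
        target = "a" if state["load_a_next"] else "b"
        state[target] = value
        state["load_a_next"] = not state["load_a_next"]

    def add(dest):
        regs[dest] = (state["a"] + state["b"]) & 0xFF
        state["load_a_next"] = True    # assumption: next load targets a again
        written.append(regs[dest])

    # Initial reg loads (addresses 0000 to 0110).
    load(0)
    load(1)
    add(1)
    load(regs[1])
    add(2)

    # Main loop (addresses 0111 to 1110).
    for _ in range(loop_iterations):
        load(regs[1])
        load(regs[2])
        add(1)
        load(regs[2])
        load(regs[1])
        add(2)
    return written
```

Under those assumptions, `run_fib(2)` yields `[1, 2, 3, 5, 8, 13]`, with registers 001 and 010 taking turns holding the newest term.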



RE: My asynchronous pipelined CPU, aka As-Pipe - TSO - 10-15-2014

WTF? How do you already have this thing built? It's only been like... one week


RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 10-15-2014

I only have the data loop made, so alu ops, regs, and stack.
no ram, no branching, no prom


RE: My asynchronous pipelined CPU, aka As-Pipe - TSO - 10-15-2014

Yeah, like memory is the part that takes forever. You'll be done with that crap in like two hours. I time lined my computer at six months to a year.


RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 10-15-2014

Yeah, I still have to finalise the protocols I'm going to use for my memory system. I'm thinking of dealing with hazards on a per-peripheral basis. So if a request is sent to a peripheral, it has its own hazard checking logic and will hold back the result until it is safe to do so. I figure it will be easier than having one BIG hazard detection unit. It also makes addressing via pointers easier, as the system I'm imagining only cares about physical addresses.

*shrug*

I may change that plan. It'll work itself out once I start building stuff.
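
A minimal sketch of what per-peripheral hazard checking could look like in software terms. Everything here is hypothetical (the class `Peripheral`, its method names, and the rule that a read is held back while a write to the same address is pending); the thread does not specify the actual protocol.

```python
class Peripheral:
    """Toy peripheral that does its own read-after-write hazard check."""

    def __init__(self):
        self.cells = {}      # committed contents, keyed by physical address
        self.pending = {}    # writes requested but not yet committed

    def request_write(self, address, value):
        self.pending[address] = value

    def commit(self, address):
        """Finish a pending write so that later reads can see it."""
        self.cells[address] = self.pending.pop(address)

    def request_read(self, address):
        # Hold back the result (None) while a write to this address is in flight.
        if address in self.pending:
            return None
        return self.cells.get(address, 0)


ram = Peripheral()
ram.request_write(5, 42)
stalled = ram.request_read(5)     # held back: write not committed yet
ram.commit(5)
value = ram.request_read(5)       # safe now
```

Because each peripheral only tracks its own addresses, no central unit needs to know about every in-flight request.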


RE: My asynchronous pipelined CPU, aka As-Pipe - LordDecapo - 10-15-2014

(10-10-2014, 01:10 PM)The Magical Gentleman Wrote: Anyways, I'm using microcode to translate 4-bit alu opcodes to the 5-bit codes it can interpret (I use the msb in my IS to indicate an alu op while in the main pipeline, so essentially 4-bit alu opcodes), so I can always replace one op with another with the slap of a torch

Hey I do that :D I love my 4bit Op and 1 bit OpMode :>


Also magical, Dylan Freemanz, and I looked at ur progress so far, and we circle jerked on how much we liked it :D keep up the good work! Can't wait to see this done!


RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 10-15-2014

I am grinning like an idiot

Oh yeah, and the instructions are now being converted to 11 bit opcodes for the ALU, which include mode toggling and timing bits. The conversion still takes the same amount of time though (3 tick sync). I factor the decode/encode delay into the timing section, so the overall speed should still be pretty good.

IS update:
Code:
[{ifT}{3-bit flag adr}] [{jmp adr}] 01001xxx aaaaaaaa
[{ifF}{3-bit flag adr}] [{jmp adr}] 01000xxx aaaaaaaa
[{jump}] [{jump address}]           01010--- aaaaaaaa
[{call}] [{function address}]       01011--- aaaaaaaa
[{prog}{prog number}]               00010xxx
[{return}]                          00011---
[{halt}]                            00000---

[{read}{periph adr}] [{adr}]        01100xxx aaaaaaaa
[{ptrR}{ptr address}]               00111xxx

[{regread}{reg address}]            10000xxx
[{const}] [{imm}]                   01111--- aaaaaaaa
[{pop}]                             10010111
[{add}{to reg address}]             10100xxx
[{a-b}{to reg address}]             10101xxx
[{b-a}{to reg address}]             10110xxx
[{SHR}{to reg address}]             10111xxx
[{xor}{to reg address}]             11000xxx
[{xnor}{to reg address}]            11001xxx
[{or}{to reg address}]              11010xxx
[{nor}{to reg address}]             11011xxx
[{and}{to reg address}]             11100xxx
[{nand}{to reg address}]            11101xxx
[{!a}{to reg address}]              11110xxx
[{a}{to reg adr}]                   11111xxx

[{ptrE}{ptr address}]               10011xxx
[{ptrW}{ptr address}]               00100xxx
[{write}{periph adr}] [{adr}]       01101xxx aaaaaaaa

ALU microcode:

12 bit opcode: [if mode][3-bit control code][!a][!b][3-bit alu opcode][3-bit reg address]

the control codes with 0 as the msb are timing codes.
the other control codes are specific to the ops involved.

a/!a:       0 001 x0 001 aaa
nor/and:    0 001 xx 010 aaa
or/nand:    0 001 xx 011 aaa
xor/xnor:   0 010 x0 100 aaa
add/sub:    0 011 xx 101 aaa
SHR:        0 010 00 110 aaa

regread:    0 100 00 000 aaa
pop:        0 101 00 000 111
const:      0 110 00 000 ---
ptrE:       0 111 00 001 aaa

IFT/IFF:    1 000 00 000 aaa



RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 10-16-2014

I uploaded screens to imgur. I am tired. Here:
http://imgur.com/a/PsWOb


RE: My asynchronous pipelined CPU, aka As-Pipe - greatgamer34 - 10-16-2014

why no OCD pack?


RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 10-16-2014

I'm getting on to that ;) Don't you worry


RE: My asynchronous pipelined CPU, aka As-Pipe - greatgamer34 - 10-16-2014

I'm working on a better resource pack based off of OCD, so it will be posted soon!


RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 10-19-2014

Heyo, I got distracted on Friday and wrote a compiler for my (not yet complete) processor. There are still a few bugs (like needing a \n at the end of the file, and only outputting to the console) but it is now compiling everything properly. I will post it once I give it some polish ;)

For now, here's the Fibonacci sequence in assembly to show the capabilities of the compiler:
(I know having an if statement in fibonacci is stupid, but it's there for illustrative purposes XP)
Code:
# Fibonacci sequence in assembly code.

###############
#    MACROS   #
###############

# Use the ! symbol to name a constant.

!overflow = $111 # labeling the address for the overflow flag bit
                 # for if statements.

# The $ symbol marks a 3-bit argument.

!x = $\1 # register one being named as x
!y = $\2 # register two being named as y

# Using a backslash will tell the compiler
# to convert a number to binary. It will know
# how many bits to convert to, and will tell
# you at compile time if the number is above
# the bit limit.

###############
#   ASSEMBLY  #
###############

# The @ symbol marks an 8-bit immediate.

con @\0 # load constant 0 to reg a
con @\1 # load constant 1 to reg b

# The ? marker refers to its assigned constant.

add ?x  # reg x = a + b
regR ?x # a = reg x
add ?y  # reg y = a + b

# The > marker is an address pointer. It assigns
# the address of the following instruction to the
# constant name next to it. The constant can be
# referenced by any instruction that takes an
# 8-bit immediate. It is most useful for branching
# statements, obviously.

>loopStart

regR ?x # a = reg x
regR ?y # b = reg y
add ?x  # reg x = a + b
ifT ?overflow @00000000 # restarts program if overflow detected.
regR ?y # a = reg y
regR ?x # b = reg x
add ?y  # reg y = a + b

ifT ?overflow @00000000 # restarts program if overflow detected.
jump ?loopStart         # otherwise loop. Essentially an if/else statement.

Oh if you're interested in how I made the compiler, it's written in Python, and reads the source file one character at a time in two passes: once to log the macro/meta stuff, and then again for a proper compile and error checking.
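
For readers curious about the two-pass idea, below is a minimal sketch and not AsPasm itself. It works line by line rather than one character at a time, handles only the five mnemonics used in the Fibonacci listing, takes its opcode prefixes from the IS update earlier in the thread, and assumes the don't-care bits (`---`) are emitted as zeros. All function and variable names are invented for the example.

```python
# mnemonic -> (5-bit prefix, takes 3-bit field?, takes 8-bit operand word?)
OPCODES = {
    "con":  ("01111", False, True),
    "regR": ("10000", True,  False),
    "add":  ("10100", True,  False),
    "ifT":  ("01001", True,  True),
    "jump": ("01010", False, True),
}


def to_bits(text, width):
    """Turn a backslash-decimal or a binary literal into a bit string."""
    if text.startswith("\\"):
        value = int(text[1:])
        if value >= 1 << width:
            raise ValueError(f"{value} does not fit in {width} bits")
        return format(value, f"0{width}b")
    if len(text) != width or set(text) - {"0", "1"}:
        raise ValueError(f"expected {width} binary digits, got {text!r}")
    return text


def assemble(source):
    # Strip comments and blank lines once so both passes see the same lines.
    lines = [raw.split("#", 1)[0].strip() for raw in source.splitlines()]
    lines = [line for line in lines if line]

    # Pass 1: log named constants (!) and label addresses (>).
    symbols, address = {}, 0
    for line in lines:
        if line.startswith("!"):
            name, value = (part.strip() for part in line[1:].split("=", 1))
            symbols[name] = to_bits(value[1:], 3 if value[0] == "$" else 8)
        elif line.startswith(">"):
            symbols[line[1:].strip()] = format(address, "08b")
        else:
            address += 2 if OPCODES[line.split()[0]][2] else 1

    def resolve(arg, width):
        bits = symbols[arg[1:]] if arg[0] == "?" else to_bits(arg[1:], width)
        if len(bits) != width:
            raise ValueError(f"{arg} is not a {width}-bit value")
        return bits

    # Pass 2: emit 8-bit words, resolving ?references along the way.
    words = []
    for line in lines:
        if line[0] in "!>":
            continue
        mnemonic, *args = line.split()
        prefix, has_field, has_operand = OPCODES[mnemonic]
        field = resolve(args.pop(0), 3) if has_field else "000"
        words.append(prefix + field)
        if has_operand:
            words.append(resolve(args.pop(0), 8))
    return words


demo = r"""
!x = $\1
con @\0      # load constant 0
>top
add ?x
jump ?top
"""
machine_code = assemble(demo)
```

The first pass exists so that a label such as `top` can be referenced before or after its definition; by the time the second pass runs, every name already maps to a fixed bit string.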


RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 10-20-2014

The program's sufficiently polished for you to have a look at it. Yes, it's not object oriented, don't judge me!
Usage:
AsPasm.py <sourcefile> <targetfile>
or
AsPasm.py <sourcefile>
(that one just writes to an output.txt file.)


RE: My asynchronous pipelined CPU, aka As-Pipe - TSO - 10-21-2014

... You know, an assembly level compiler just maps strings to numbers using arrays and pointers, so the code for the compiler should be no longer than the instruction set.


RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 10-21-2014

Ye, but this is a general compiler that I use. I customized it for my assembly, and I added syntax checking. I also added macro support. Most of the functions in the compiler are being re-used for a higher-level language.

EDIT: Also, this is the biggest thing I have attempted besides your basic 'hello world' level stuff. I'm learning as I go xP


RE: My asynchronous pipelined CPU, aka As-Pipe - LordDecapo - 10-27-2014

Love the progress :D I'll look at the compiler when I get off work :D


RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 10-27-2014

I also made a bot to load the compiled output into minecraft. It uses a config file so I can keep it compatible with any changes to my memory designs, which is nice.
http://forum.openredstone.org/showthread.php?tid=4943


RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 10-31-2014

I've finally gotten the branching stage of my cpu underway. Jump has been fully debugged now, so the rest of the ops should be fine.

It is looping nicely, and the instructions are being tagged correctly.

I also tested the virtual program space. You can select one of four 256-byte sectors to run your program from. These sectors can be selected by an extended jump instruction. It's so protected that when you press reset, it doesn't reset the sector address XD

The extended jump instruction should really only be used in a bootloader. That's what it was intended for anyways.


RE: My asynchronous pipelined CPU, aka As-Pipe - LordDecapo - 10-31-2014

Nice!! :D I like the idea of a bootloader too.
Hoping to do a similar thing for mine... Also your sector selection gave me a great idea... imma add a new parameter to my branching so u can jump between different sectors.. which would increase my theoretical max inst memory to 65536 (2^16, I believe that is, lol) sectors of 65536 words each. :D


RE: My asynchronous pipelined CPU, aka As-Pipe - Magic :^) - 11-01-2014

Ye basically how I'm implementing my sector thing is to have a normal 8-bit pc, but on a 10 bit rom. I use the top two bits to select sectors. Those two bits are just edited register-style with repeater locks via the extended jump instruction.
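
In software terms the sector scheme amounts to gluing two sector bits on top of the 8-bit program counter. A small sketch, with function names invented for illustration:

```python
def rom_address(sector, pc):
    """Combine a 2-bit sector and an 8-bit PC into a 10-bit ROM address."""
    if not 0 <= sector < 4:
        raise ValueError("sector must fit in 2 bits")
    return (sector << 8) | (pc & 0xFF)


def next_pc(pc):
    """The PC stays 8 bits wide, so it wraps inside the current sector."""
    return (pc + 1) & 0xFF


# Sector 2, PC 0x07 maps to ROM address 0x207.
example = rom_address(2, 0x07)
```

Since incrementing the PC never touches the top two bits, a program cannot run off the end of its sector; only the extended jump changes them.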


RE: My asynchronous pipelined CPU, aka As-Pipe - LordDecapo - 11-01-2014

Schweets.
I have been driving like 12 hours yesterday and today, and have been thinking a small bit about how the sector would work for mine. Imma have to wait till my logisim version to add it the exact way I would want to, as it would make the branch inst. 64 bits long during a sector section branch... Ya it would be an insanely long branch. But it's the only way I see to have the response time stay consistent