Computer Architecture Lab/Summer2006/PitterDeinhart/TCMP2.0
The Concept
The design is based on a 16-bit load/store architecture. Every instruction occupies two bytes. A rather unusual idea (though one that already exists in some chips) is that every opcode can be flagged as conditional.
Memory
Instruction and data memory are separated. Instructions are read from ROM; data can be read from and written to RAM.
Since every instruction consists of 2 bytes and the instruction pointer counts words, the ROM can address 128 kilobytes. Constants can be loaded from ROM using the LDI* opcodes (see below).
Since RAM is accessed byte by byte, it can store only 64 kilobytes of data.
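The two sizes follow directly from the addressing granularity. The arithmetic can be sketched as below, assuming that the instruction pointer is 16 bits wide like the general purpose registers (the constant and function names are made up for this illustration):

```python
WORD_BYTES = 2      # every instruction occupies two bytes
ADDRESS_BITS = 16   # assumed width of the instruction pointer and of RAM addresses


def rom_bytes() -> int:
    """ROM is word addressed: 2**16 instruction words of 2 bytes each."""
    return (1 << ADDRESS_BITS) * WORD_BYTES


def ram_bytes() -> int:
    """RAM is byte addressed: 2**16 individual bytes."""
    return 1 << ADDRESS_BITS


print(rom_bytes() // 1024, "KiB ROM")  # 128 KiB ROM
print(ram_bytes() // 1024, "KiB RAM")  # 64 KiB RAM
```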
Registers
There are 16 general purpose registers. Each is 16 bits wide, and they are called ax, bx, cx, .. px.
In addition, there is a 1-bit conditional flag (CF).
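The register names map onto 4-bit register numbers in alphabetical order. This is consistent with the simulator output further down, where ldil cx,1 is encoded with register number 2. A small sketch of the mapping (the helper names are invented for this example):

```python
def register_name(index: int) -> str:
    """Map a 4-bit register number to its name: 0 -> ax, 1 -> bx, ..., 15 -> px."""
    if not 0 <= index <= 15:
        raise ValueError("register number must be in 0..15")
    return chr(ord("a") + index) + "x"


def register_index(name: str) -> int:
    """Inverse mapping: ax -> 0, ..., px -> 15."""
    if len(name) != 2 or name[1] != "x" or not "a" <= name[0] <= "p":
        raise ValueError("not a TCMP register name: " + name)
    return ord(name[0]) - ord("a")


print(register_name(2), register_index("fx"))  # cx 5
```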
Instruction Set
Instruction Set Description
Conditionals
A conditional bit is added to each instruction. If this bit is 1, the instruction is considered conditional and is executed only if the conditional flag is set. The conditional flag can be modified by the CMP* instructions or cleared by the clear conditional flag instruction (CCF). These can themselves be conditional instructions.
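The rule can be written as a one-line predicate. The sketch below assumes that the conditional bit is the most significant bit of the 16-bit instruction word, as in the encoding table further down; the function name is hypothetical.

```python
def executes(word: int, cf: bool) -> bool:
    """Return True if the instruction word takes effect for the given CF value."""
    conditional = bool(word & 0x8000)  # bit 15 is the conditional bit (assumed)
    return cf if conditional else True


# 0xa205 is a conditional jmp, 0x2205 the unconditional variant
print(executes(0xA205, False))  # False
print(executes(0xA205, True))   # True
print(executes(0x2205, False))  # True
```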
Load & Store
MS: You will need an indirect load and store (LD (ax), ST (ax)). Without those instructions you will be VERY limited. You will then have to think again about a register size that is smaller than the address size, one of the big issues in the 8086..80286 (the segmentation was a real pain) ;-)
The heart of our load/store architecture is the pair of commands LD and ST. Actually one could argue that they are 32 commands, because they have the register number hardcoded into the opcode.
LD          loads a word from RAM into a register
ST          stores a word from a register into RAM
LDIL, LDIH  load an immediate byte from ROM into the low or high byte of a register
Comparison
All comparison instructions set the conditional flag to their result.
CMPEQ  tests if two registers are equal
CMPNE  tests if two registers are not equal
CMPGT  tests if register 1 > register 2
CMPLT  tests if register 1 < register 2
CMPEZ  tests if register is equal to zero
CMPNZ  tests if register is not equal to zero
Is there any way to set the conditional flag when register 1 <= register 2? Or do we need a CMPLE?
Is there any difference at all between CMPGT R1, R2 and CMPLT R2, R1?
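The two questions above can be explored with a small model. The sketch below assumes unsigned 16-bit comparison (the text does not say whether the comparison is signed) and uses invented helper names. It only checks two logical identities: swapping the operands turns CMPGT into CMPLT, and "less or equal" is the negation of CMPGT. Since the instruction set listed here has no instruction that inverts CF, the second identity cannot be used directly and would need a workaround or a dedicated CMPLE.

```python
MASK = 0xFFFF  # registers are 16 bits wide


def cmpgt(r1: int, r2: int) -> bool:
    """Model of CMPGT: CF := register 1 > register 2 (unsigned, assumed)."""
    return (r1 & MASK) > (r2 & MASK)


def cmplt(r1: int, r2: int) -> bool:
    """Model of CMPLT: CF := register 1 < register 2 (unsigned, assumed)."""
    return (r1 & MASK) < (r2 & MASK)


samples = [0, 1, 2, 0x7FFF, 0x8000, 0xFFFF]
pairs = [(a, b) for a in samples for b in samples]

# CMPGT R1,R2 and CMPLT R2,R1 always agree
same = all(cmpgt(a, b) == cmplt(b, a) for a, b in pairs)
# R1 <= R2 is exactly the negation of CMPGT R1,R2
le_is_not_gt = all((a <= b) == (not cmpgt(a, b)) for a, b in pairs)
print(same, le_is_not_gt)  # True True
```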
Branching
JMP  loads a register's value into the instruction pointer
ALU operations
ADD   adds register 1 to register 2
ADDI  adds an immediate value to a register
SUB   subtracts register 1 from register 2
SUBI  subtracts an immediate value from a register
AND   bitwise logically ANDs register 1 with register 2
OR    bitwise logically ORs register 1 with register 2
XOR   bitwise logically XORs register 1 with register 2
SHL   shifts the register left by the specified number of bits
SHR   shifts the register right by the specified number of bits
NOT   bitwise logically NOTs the register
Others
CCF  clears the conditional flag
NOP  null operation, just waits a cycle
Instruction Set Encoding
a .. ALU operation number
c .. conditional bit
i .. immediate value
s .. source register number
d .. destination register number
m .. memory pointer register number
j .. jump pointer register number
_ .. don't care

ALUOPS  c1aaaaaa ssssdddd

LDIL    c000iiii iiiidddd
LDIH    c001iiii iiiidddd

LD      c0100000 mmmmdddd
ST      c0100001 mmmmdddd

JMP     c0100010 ____jjjj

CCF     c0100110 ________
NOP     c0100111 ________

optional and reserved for future use:

LOOP    c0101iii iiiiiiii
LDX     c011iiii mmmmdddd
ALU ops
operations that do not modify CF: 0xxxxx

000000 ADD
000001 SUB
000010 AND
000011 OR
000100 XOR
000101 NOT
000110 SHL
000111 SHR
001000 ASR

operations that do modify CF: 1xxxxx

100000 CMPEQ
100001 CMPNE
100010 CMPGT
100011 CMPLT
100100 CMPEZ (d = don't care)
100101 CMPNZ (d = don't care)
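The encoding tables can be turned into a small disassembler. The following is only a sketch and not the code of the tcmp package: the function names are invented, the operand order in the printed text is an assumption taken from the example programs, and the reserved LOOP and LDX encodings are reported as "reserved".

```python
ALU_OPS = {
    0b000000: "add", 0b000001: "sub", 0b000010: "and", 0b000011: "or",
    0b000100: "xor", 0b000101: "not", 0b000110: "shl", 0b000111: "shr",
    0b001000: "asr",
    0b100000: "cmpeq", 0b100001: "cmpne", 0b100010: "cmpgt",
    0b100011: "cmplt", 0b100100: "cmpez", 0b100101: "cmpnz",
}


def reg(n: int) -> str:
    """Register number 0..15 -> ax..px."""
    return "abcdefghijklmnop"[n] + "x"


def decode(word: int) -> str:
    """Decode one 16-bit instruction word into assembler-like text."""
    cond = "?" if word & 0x8000 else ""   # c: conditional bit
    op = (word >> 8) & 0x7F               # upper byte without the c bit
    hi = (word >> 4) & 0xF                # ssss / mmmm / low immediate nibble
    lo = word & 0xF                       # dddd / jjjj
    if op & 0x40:                         # c1aaaaaa ssssdddd
        name = ALU_OPS.get(op & 0x3F, "unknown")
        if name in ("cmpez", "cmpnz"):    # d is don't care
            return f"{cond}{name} {reg(hi)}"
        return f"{cond}{name} {reg(hi)},{reg(lo)}"
    if op >> 4 in (0b000, 0b001):         # c000iiii / c001iiii iiiidddd
        imm = ((op & 0xF) << 4) | hi
        name = "ldil" if op >> 4 == 0b000 else "ldih"
        return f"{cond}{name} {reg(lo)},{imm}"
    if op == 0b0100000:
        return f"{cond}ld {reg(lo)},({reg(hi)})"
    if op == 0b0100001:
        return f"{cond}st {reg(lo)},({reg(hi)})"
    if op == 0b0100010:
        return f"{cond}jmp ({reg(lo)})"
    if op == 0b0100110:
        return cond + "ccf"
    if op == 0b0100111:
        return cond + "nop"
    return cond + "reserved"


for w in (0x0012, 0x2700, 0x4120, 0x6500, 0xA205):
    print(f"{w:04x}", decode(w))
```

The sample words are taken from the ROM dump in the simulator section below; 0x0012, for instance, decodes to ldil cx,1, which is what the simulator reports for that word.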
The Assembler and Simulator
The assembler/simulator package can be downloaded from http://www.nix.at/sw/tcmp/
Note that versions >= 0.1 support only TCMP2.0, while versions 0.0.* support only TCMP1. The version that represents the state at the end of the computer architecture course is 0.2.4.
The package consists of the assembler and 3 different simulator programs, which are all described below. Read the INSTALL file in the package for installation hints/requirements.
The package is written in Objective Caml, which is a really great computer language. The build generates bytecode executables and, if supported on your architecture, native code executables. So if you find an executable called e.g. asm.opt, you may call it instead of just asm; it does the same, but a lot faster.
The assembler: tcmp/asm
The assembler can output text (to verify the parser and some calculations), binary output (mainly used for simulation) and VHDL code (can be used as or in a ROM implementation).
When you call it with no arguments or with -h, it prints instructions on how to use it:
asm: usage: ./asm (-b|-r|-t) <inputfile>
  -b binary output (better not to stdout..)
  -r vhdl rom code output
  -t asm text output (to verify parser)
Line Layout
The line layout is similar to that of nasm; each line consists of up to three parts:
label: instruction operands ; comment
All three components are optional. Operands are separated by a ','.
The main difference is that an optional '?' can be written directly before the instruction. This marks the instruction as conditional.
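The layout rules above can be illustrated with a small line splitter. This is a sketch for illustration only, not the parser of the actual assembler (which is written in Objective Caml); the Line type and the parse_line function are invented names.

```python
from typing import List, NamedTuple, Optional


class Line(NamedTuple):
    label: Optional[str]
    conditional: bool
    mnemonic: Optional[str]
    operands: List[str]
    comment: Optional[str]


def parse_line(text: str) -> Line:
    """Split 'label: ?instruction operands ; comment' into its parts."""
    code, sep, comment_text = text.partition(";")
    comment = comment_text.strip() if sep else None
    code = code.strip()
    label = None
    if ":" in code:
        label_text, _, code = code.partition(":")
        label = label_text.strip()
        code = code.strip()
    conditional = code.startswith("?")   # '?' marks a conditional instruction
    if conditional:
        code = code[1:]
    parts = code.split(None, 1)          # mnemonic, then the operand list
    mnemonic = parts[0] if parts else None
    rest = parts[1] if len(parts) > 1 else ""
    operands = [o.strip() for o in rest.split(",") if o.strip()]
    return Line(label, conditional, mnemonic, operands, comment)


print(parse_line("smallloop: sub cx,ax ; count down"))
print(parse_line("?jump (fx),smallloop"))
```

All parts are optional, so an empty line or a line holding only a label is accepted as well.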
Mnemonics
The mnemonics are lower-case variants of the instructions.
There are some macro-like mnemonics that expand to several instructions when used:
jump reg,label ; this will jmp to label using reg for address generation
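Judging from the ROM dump in the simulator section, the jump macro appears to expand to ldil, ldih and jmp on the given register, each carrying the conditional bit of the macro. The sketch below encodes that expansion under this assumption; expand_jump is a hypothetical helper, the target is a word address, and any nops the assembler inserts between the three instructions are left out.

```python
from typing import List


def expand_jump(reg_index: int, target: int, conditional: bool = False) -> List[int]:
    """Encode 'jump reg,label' as ldil / ldih / jmp instruction words."""
    c = 0x8000 if conditional else 0
    low = target & 0xFF           # low byte of the target word address
    high = (target >> 8) & 0xFF   # high byte of the target word address
    ldil = c | (0b000 << 12) | (low << 4) | reg_index    # c000iiii iiiidddd
    ldih = c | (0b001 << 12) | (high << 4) | reg_index   # c001iiii iiiidddd
    jmp = c | (0b0100010 << 8) | reg_index               # c0100010 ____jjjj
    return [ldil, ldih, jmp]


# ?jump (fx),smallloop with smallloop at word address 0x000a
print([f"{w:04x}" for w in expand_jump(5, 0x000A, conditional=True)])
# ['80a5', '9005', 'a205']
```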
Example Program 1: Blinking around
This little example program blinks the board's LED.
;; blink led
;
begin:
        ;; counter delta
        ldil cx,1
        ldih cx,0
        ;; high word
        ldil bx,70      ; change this to adjust frequency
        ldih bx,0
bigloop:
        ;; low word
        ldil ax,0x20
        ldih ax,0xa1
smallloop:
        sub cx,ax
        cmpnz ax
        ?jump (fx),smallloop
        ;
        sub cx,bx
        cmpnz bx
        ?jump (fx),bigloop
doblink:
        not ox,ox
        jump (fx),begin
Example Program 2: Blinking around with changing frequency
This is a modification of the previous example, with variable frequency.
;; blink led
;
        ;; counter delta
        ldil cx,1
        ldih cx,0
begin:
        ;; meta freq
        ldil ex,70
        ldih ex,0
metaloop:
        ;; high word
        ldil bx,0       ; change this to adjust frequency
        ldih bx,0
bigloop:
        ;; low word
        ldil ax,0x20
        ldih ax,0xa1
smallloop:
        sub cx,ax
        cmpnz ax
        ?jump (fx),smallloop
        ;
        add cx,bx
        cmpne bx,ex
        ?jump (fx),bigloop
doblink:
        not ox,ox
        sub cx,ex
        cmpnz ex
        ?jump (fx),metaloop
        jump (fx),begin
Example Program 3: Instruction tester
Not really useful, but it covers all instructions.
; comment
        ldil ax,10      ; low byte of ax := 10
        nop
        ldih ax,0       ; high byte of ax := 0
        st ax,(bx)      ; Memory[bx] := ax
        ld ax,(bx)      ; ax := Memory[bx]
        ccof
        jump (fx),l1    ; fx := AddressOf(l1); ip := fx
        jmp (ax)        ; ip := ax
        ccof            ; c := 0
        nop             ; no operation
        add ax,bx       ; bx := bx + ax
        sub ax,bx       ; bx := bx - ax
        and ax,bx       ; bx := ax and bx
        not ax,bx       ; bx := not ax
        or ax,bx        ; bx := ax or bx
        xor ax,bx       ; bx := ax xor bx
        shl ax          ; ax := ax shl 1
        shr bx          ; bx := bx shr 1
        asr cx          ; cx := cx asr 1
        cmpeq ax,bx     ; c := ax == bx
        cmpne ax,bx     ; c := ax != bx
        cmpgt ax,bx     ; c := ax > bx
        cmplt ax,bx     ; c := ax < bx
        cmpez ax        ; c := ax == 0
        cmpnz ax        ; c := ax != 0
l1:
        nop
        ?not bx
        cmpez ax
l2:
        ccof
l3:
l3b:
        not ax
        sub cx,ax
        jump (fx),l2
l4:
        jump (fx),l4
The simulator: tcmp/sim
This simulator takes an assembled program and simulates it instruction by instruction. You can optionally specify how many steps it will simulate. It has its own little help screen, too:
./sim: usage: ./sim <inputfile> [<max-steps>]
If you simulate the first instruction of blink.asm you will get this output:
./sim blink.bin 1 | awk '{ printf " ";print}'
 registers:
 ax=00 cx=00 ex=00 gx=00 ix=00 kx=00 mx=00 ox=00 ip=0
 bx=00 dx=00 fx=00 hx=00 jx=00 lx=00 nx=00 px=00 cof=false
 RAM:
 0000: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
 001a: 0000 0000 0000
 ROM:
 0000: 0012 2700 1002 0011 2700 1001 0010 2700 1000 2700 4120 2700 6500
 000d: 2700 80a5 a700 9005 a700 a205 2700 4121 2700 6510 2700 8065 a700
 001a: 9005 a700 a205 2700 45ee 0005 2700 1005 2700 2205
 executing instruction:
 -> ldil cx,1 (0000000000010010=0x0012)
 registers:
 ax=00 cx=01 ex=00 gx=00 ix=00 kx=00 mx=00 ox=00 ip=1
 bx=00 dx=00 fx=00 hx=00 jx=00 lx=00 nx=00 px=00 cof=false
 RAM:
 0000: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
 001a: 0000 0000 0000
 ROM:
 0000: 0012 2700 1002 0011 2700 1001 0010 2700 1000 2700 4120 2700 6500
 000d: 2700 80a5 a700 9005 a700 a205 2700 4121 2700 6510 2700 8065 a700
 001a: 9005 a700 a205 2700 45ee 0005 2700 1005 2700 2205
Please note that you won't see any hazards here, as this simulator knows nothing about pipelining at all. This can be useful to verify the correctness of the pipeline engine (or the built-in hazard prevention feature of the assembler).
The pipelined simulator: tcmp/psim
This is another simulator, which is completely different from the previous one. While it should give the same results, it simulates the complete TCMP pipeline. So it will corrupt registers or RAM if there are hazards. In short: it will (try to) act like the processor in hardware.
While its internals are completely different, its usage is the same as that of the non-pipelined simulator:
./psim: usage: ./psim <inputfile> [<max-steps>]
If we simulate 5 clock cycles of blink.asm we get:
RST
IF: ip=0001
ID: cof=false rom[0000]=0012 that is ldil cx,1
EX: s23op=NOP s23rda=0 rd1=0000 rd2=0000 s23aluops=0 s23ldiv=00 s23ldiop=0
reg.file: ax=0000 cx=0000 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
          bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
WB: s34op=NOP s34rda=0 ram[0000]=0000 s34aluout=0000 s34ldiout=0000
CLK
IF: ip=0002
ID: cof=false rom[0001]=2700 that is nop
EX: s23op=LDI s23rda=2 rd1=0000 rd2=0000 s23aluops=0 s23ldiv=01 s23ldiop=0
reg.file: ax=0000 cx=0000 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
          bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
WB: s34op=NOP s34rda=0 ram[0000]=0000 s34aluout=0000 s34ldiout=0000
CLK
IF: ip=0003
ID: cof=false rom[0002]=1002 that is ldih cx,0
EX: s23op=NOP s23rda=0 rd1=0000 rd2=0000 s23aluops=27 s23ldiv=70 s23ldiop=0
reg.file: ax=0000 cx=0000 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
          bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
WB: s34op=LDI s34rda=2 ram[0000]=0000 s34aluout=0000 s34ldiout=0001
CLK
IF: ip=0004
ID: cof=false rom[0003]=0011 that is ldil bx,1
EX: s23op=LDI s23rda=2 rd1=0000 rd2=0001 s23aluops=10 s23ldiv=00 s23ldiop=1
reg.file: ax=0000 cx=0001 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
          bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
WB: s34op=NOP s34rda=0 ram[0000]=0000 s34aluout=3039 s34ldiout=0070
CLK
IF: ip=0005
ID: cof=false rom[0004]=2700 that is nop
EX: s23op=LDI s23rda=1 rd1=0000 rd2=0000 s23aluops=0 s23ldiv=01 s23ldiop=0
reg.file: ax=0000 cx=0001 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
          bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
WB: s34op=LDI s34rda=2 ram[0000]=0000 s34aluout=3039 s34ldiout=0001
CLK
IF: ip=0006
ID: cof=false rom[0005]=1001 that is ldih bx,0
EX: s23op=NOP s23rda=0 rd1=0000 rd2=0000 s23aluops=27 s23ldiv=70 s23ldiop=0
reg.file: ax=0000 cx=0001 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
          bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
WB: s34op=LDI s34rda=1 ram[0000]=0000 s34aluout=0000 s34ldiout=0001
RST and CLK are the reset and the clock signals.
The graphical pipelined simulator: tcmp/gpsim
This is a graphical frontend to the pipelined simulator (psim). It accepts no options; instead, it shows a file dialog for choosing the binary code to load.
The interface is quite simple and self-explanatory. Just take a look at this screenshot:
Pipeline Architecture
The TCMP2.0 pipeline consists of 4 stages:
- Instruction fetch
- Instruction decode
- Execute
- Write back
If you have not done so yet, please take a look at the picture in the gpsim section to get an overview. The line colors show which stage each signal comes from (cyan, blue, red, green).
At this time the processor has no built-in hazard prevention; instead, the assembler generates nops where they are needed. Thanks to our not too unclever design, there is never a need to delay the processor by more than one nop at a time. Of course one could implement e.g. bypassing, but luckily for us that was not in the mandatory scope of the computer architecture course.
But maybe by the time you read this, it is already implemented in the simulator or even at VHDL level. So be sure to check the download packages!
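Nop generation at assembler level can be pictured as a single pass over the program. The sketch below is a guess at the rule, not the algorithm of the actual assembler: it assumes a hazard exists only between directly adjacent instructions when the second one reads or writes something the first one writes, and it treats the conditional flag as a pseudo register named "cf". The Instr type and all helper names are invented for this example.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Instr(NamedTuple):
    text: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]


NOP = Instr("nop", frozenset(), frozenset())


def ins(text: str, reads: Iterable[str] = (), writes: Iterable[str] = ()) -> Instr:
    """Convenience constructor for the example program."""
    return Instr(text, frozenset(reads), frozenset(writes))


def insert_nops(program: List[Instr]) -> List[Instr]:
    """Insert one nop between adjacent instructions that conflict (assumed rule)."""
    out: List[Instr] = []
    for instr in program:
        prev = out[-1] if out else None
        if prev is not None and prev.writes & (instr.reads | instr.writes):
            out.append(NOP)
        out.append(instr)
    return out


program = [
    ins("ldil cx,1", writes={"cx"}),
    ins("ldih cx,0", writes={"cx"}),
    ins("ldil bx,70", writes={"bx"}),
    ins("sub cx,ax", reads={"cx", "ax"}, writes={"ax"}),
    ins("cmpnz ax", reads={"ax"}, writes={"cf"}),
]
result = [i.text for i in insert_nops(program)]
print(result)
```

For this input a nop lands between the two loads of cx and between sub and cmpnz, which matches the pattern visible at the start of the ROM dump in the simulator section. A single nop is enough under the assumption stated in the text that one delay slot always suffices.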
VHDL Implementation
The VHDL implementation was created by first building the black boxes such as the ALU and the register file. Then we basically translated the code of the pipelined simulator (found in pipeline.ml) into VHDL. Once that seemed to be complete, we simulated the whole thing in ModelSim.
After a few fixes, mainly dangling signals, we downloaded the processor to the FPGA. To our surprise it worked nearly instantly. Maybe it was a good idea to simulate a lot with tcmp/*sim and ModelSim. On the other hand, maybe we just had lots of luck :)
The full VHDL files can be downloaded here: vhdl.zip (http://www.nix.at/tcmp2/vhdl.zip).
Additionally, a couple of testbenches were created to simulate some building blocks of the processor in ModelSim. The following screenshot shows a simulation of the processor running the blink2.asm program.