Computer Architecture Lab/Winter2006/JeitMossFrühRamb/ISA

Introduction

MatPro is a pipelined coprocessor that handles 4x4 matrix operations such as addition and multiplication. It is equipped with a SimpCon interface so that it can easily be connected to existing SoC solutions.

Key data

16-bit coprocessor for matrix arithmetic
Matrix dimension is 4x4
Basic data type is a 16-bit signed integer
3 matrix registers
16 general-purpose 16-bit registers
SIMD (Single Instruction, Multiple Data) architecture

Instruction formats

To keep things simple, every instruction is encoded in 16 bits, and every opcode is 4 bits wide.

We distinguish between the following instruction formats:

3 Operand - Instructions


Bits	15-12	11-8	7-4	3-0
Content	OPCODE	DESTREG	SRCREG1	SRCREG2

2 Operand - Instructions


Bits	15-12	11-8	7-4	3-0
Content	OPCODE	DESTREG	SRCREG	0000

1 Operand - Instructions


Bits	15-12	11-0
Content	OPCODE	Address

Load/Store Word - Instructions


Bits	15-12	11-8	7-0
Content	OPCODE	DSTREG	Address

Load/Store Matrix - Instructions


Bits	15-12	11-10	9-0
Content	OPCODE	DSTREG	Address

Conditional - Instructions


Bits	15-12	11-8	7-0
Content	OPCODE	SRCREG	Address
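
The format tables above can be read as bit-field layouts. The following is a minimal Python sketch of packing and unpacking those fields; the format names, field names, and the `encode`/`decode` helpers are hypothetical and chosen only for illustration, while the bit positions are taken from the tables.

```python
# Field layouts from the format tables: (field name, high bit, low bit).
# Format and field names are illustrative assumptions, not part of the ISA.
FORMATS = {
    "three_operand":     [("opcode", 15, 12), ("dest", 11, 8), ("src1", 7, 4), ("src2", 3, 0)],
    "two_operand":       [("opcode", 15, 12), ("dest", 11, 8), ("src", 7, 4), ("zero", 3, 0)],
    "one_operand":       [("opcode", 15, 12), ("addr", 11, 0)],
    "load_store_word":   [("opcode", 15, 12), ("reg", 11, 8), ("addr", 7, 0)],
    "load_store_matrix": [("opcode", 15, 12), ("reg", 11, 10), ("addr", 9, 0)],
    "conditional":       [("opcode", 15, 12), ("reg", 11, 8), ("addr", 7, 0)],
}


def encode(fmt, **fields):
    """Pack the named fields into one 16-bit instruction word."""
    word = 0
    for name, hi, lo in FORMATS[fmt]:
        width = hi - lo + 1
        value = fields.get(name, 0)  # unspecified fields default to 0
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lo
    return word


def decode(fmt, word):
    """Split a 16-bit instruction word into its named fields."""
    return {
        name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
        for name, hi, lo in FORMATS[fmt]
    }
```

For example, `hex(encode("three_operand", opcode=0b1000, dest=2, src1=0, src2=1))` gives `'0x8201'`, and `decode("three_operand", 0x8201)` returns the same four fields again.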

Instructions

At the moment, 12 instructions are defined; four opcodes are still free:


Instruction	OPCode	Description
nop	0000	does nothing
jmp	0001	sets the PC to the given address
brz	0010	branch if zero: if the source register is zero, execution continues with the next instruction; otherwise it jumps to the address
sub	0011	subtracts one 16-bit integer from another and stores the result in the destination register
loadm	0100	loads a matrix from memory, starting at the given address, and stores it in the destination register
loadw	0101	loads a 16-bit integer from memory at the given address and stores it in the destination register
storem	0110	stores a matrix from the source register to memory at the given address
storew	0111	stores a 16-bit integer from the source register to memory at the given address
mulm	1000	multiplies two matrices and stores the result in the destination register
addm	1001	adds two matrices and stores the result in the destination register
subm	1010	subtracts one matrix from another and stores the result in the destination register
	1011	still free
	1100	still free
	1101	still free
mulw	1110	multiplies a matrix by a scalar and stores the result in the destination register
	1111	still free
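
To make the matrix instructions in the table concrete, here is a small Python sketch of what `addm`, `subm`, `mulm` and `mulw` compute on 4x4 matrices of 16-bit signed integers. It is a reference model only, not the pipelined implementation. Overflow behaviour is not specified on this page, so the sketch assumes results wrap around in two's complement; the helper `wrap16` is hypothetical.

```python
N = 4  # matrix dimension used by MatPro


def wrap16(x):
    """Reduce x to a signed 16-bit value (ASSUMPTION: overflow wraps around)."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def addm(a, b):
    """Element-wise sum of two 4x4 matrices."""
    return [[wrap16(a[r][c] + b[r][c]) for c in range(N)] for r in range(N)]


def subm(a, b):
    """Element-wise difference of two 4x4 matrices."""
    return [[wrap16(a[r][c] - b[r][c]) for c in range(N)] for r in range(N)]


def mulm(a, b):
    """Matrix product of two 4x4 matrices."""
    return [
        [wrap16(sum(a[r][k] * b[k][c] for k in range(N))) for c in range(N)]
        for r in range(N)
    ]


def mulw(a, s):
    """Product of a 4x4 matrix and a 16-bit scalar."""
    return [[wrap16(a[r][c] * s) for c in range(N)] for r in range(N)]
```

Each function produces 16 results from one call, which mirrors the SIMD idea: a single instruction operates on all elements of the matrix operands.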

Special purpose of OPCode

Since a register field is 4 bits wide, only 16 registers can be addressed with it. The design, however, uses 32 registers (16 matrix registers and 16 "normal" registers). So how do we know which register is meant?

This is where the opcode comes into play.

If you look at the opcode, you will notice that its first 2 bits decide how the register fields are interpreted:

00xx: operations that do not involve a matrix
01xx: load or store instructions
10xx: both source registers are matrix registers
11xx: the first source register is a matrix register and the second source register holds a 16-bit value
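
As an illustration of this rule, the following Python sketch classifies a 16-bit instruction word by the top two bits of its opcode. The function name `opcode_class` and the class labels are hypothetical; only the bit patterns come from the list above.

```python
# Top two opcode bits -> instruction class (labels are illustrative).
OPCODE_CLASSES = {
    0b00: "no matrix operands",
    0b01: "load/store",
    0b10: "matrix, matrix",
    0b11: "matrix, scalar",
}


def opcode_class(word):
    """Return the class of a 16-bit instruction word."""
    opcode = (word >> 12) & 0xF  # bits 15-12 hold the opcode
    return OPCODE_CLASSES[opcode >> 2]  # the two most significant opcode bits
```

A decoder could use this class to select the register bank that each 4-bit register field refers to, for example both source fields from the matrix bank for `10xx`.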

Assembler


Assembler	Operation
nop	nothing
jmp addr12	PC <- addr12
brz i0,imm8	i0 = 0: PC <- PC+1; otherwise: PC <- imm8
sub i2,i0,i1	i2 <- i0-i1
loadm m0,addr10	m0 <- (addr10)
loadw i0,addr8	i0 <- (addr8)
storem m0,addr10	(addr10) <- m0
storew i0,addr8	(addr8) <- i0
mulm m2,m0,m1	m2 <- m0*m1
addm m2,m0,m1	m2 <- m0+m1
subm m2,m0,m1	m2 <- m0-m1
mulw m1,m0,i0	m1 <- m0*i0

Legend:

m0,m1,...,mF ... Matrix-Registers

i0,i1,...,iF ... 16 bit Integer - Registers

addr8 ... 8 bit Address

addr10 ... 10 bit Address

addr12 ... 12 bit Address

imm8 ... 8 bit signed immediate
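
The syntax and legend above are enough to sketch a toy assembler in Python. This is not the tool shipped in Assembler.zip; the function names are hypothetical, and two simplifications are assumptions of the sketch: addresses and immediates are given as non-negative numbers, and matrix registers in `loadm`/`storem` must fit the 2-bit register field of that format.

```python
OPCODES = {
    "nop": 0x0, "jmp": 0x1, "brz": 0x2, "sub": 0x3,
    "loadm": 0x4, "loadw": 0x5, "storem": 0x6, "storew": 0x7,
    "mulm": 0x8, "addm": 0x9, "subm": 0xA, "mulw": 0xE,
}

# Register bank of (dest, src1, src2) for the 3-operand instructions.
BANKS = {"sub": "iii", "mulm": "mmm", "addm": "mmm", "subm": "mmm", "mulw": "mmi"}


def reg(token, bank):
    """Parse a register such as 'm2' or 'iF'; the index is one hex digit."""
    if token[0] != bank:
        raise ValueError(f"expected {bank}-register, got {token!r}")
    return int(token[1:], 16)


def field(value, width):
    """Check that value fits into an unsigned field of the given width."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return value


def assemble(line):
    """Translate one line of assembler into a 16-bit instruction word."""
    op, *args = line.replace(",", " ").split()
    code = OPCODES[op] << 12
    if op == "nop":
        return code
    if op == "jmp":
        return code | field(int(args[0], 0), 12)
    if op in ("brz", "loadw", "storew"):
        return code | (field(reg(args[0], "i"), 4) << 8) | field(int(args[1], 0), 8)
    if op in ("loadm", "storem"):
        return code | (field(reg(args[0], "m"), 2) << 10) | field(int(args[1], 0), 10)
    dest, src1, src2 = (reg(t, b) for t, b in zip(args, BANKS[op]))
    return code | (field(dest, 4) << 8) | (field(src1, 4) << 4) | field(src2, 4)
```

With this sketch, `hex(assemble("mulm m2,m0,m1"))` yields `'0x8201'` and `hex(assemble("sub i2,i0,i1"))` yields `'0x3201'`.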

Assembler:

Assembler.zip

Block Diagram

The matrix processor (MatPro) is designed as a coprocessor and communicates with the main processor (JOP in this case) through the SimpCon interface. The memory is separated into a data cache and an instruction cache to keep memory access simple. The data cache is double buffered to allow more efficient data transfer between the processors. The main processor writes all data and instructions into the caches and sets RUN; MatPro then starts its program and sets READY when program execution is done.
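
The RUN/READY sequence described above can be sketched as a toy Python model. It only illustrates the order of events; the class, its method names, and the `execute` callback are hypothetical, the real SimpCon signalling and the double buffering of the data cache are not modelled, and the program is executed linearly (jumps are ignored).

```python
class MatProModel:
    """Toy model of the RUN/READY handshake (not the real hardware interface)."""

    def __init__(self):
        self.instruction_cache = []
        self.data_cache = {}
        self.run = False
        self.ready = False

    def load(self, program, data):
        """Main processor side: fill both caches before starting MatPro."""
        self.instruction_cache = list(program)
        self.data_cache = dict(data)
        self.ready = False

    def set_run(self):
        """Main processor side: signal that the caches are filled."""
        self.run = True

    def execute_program(self, execute):
        """Coprocessor side: run the program, then raise READY.

        `execute` is a caller-supplied function (word, data_cache) that
        stands in for the pipeline; ASSUMPTION: linear execution only.
        """
        if not self.run:
            return
        for word in self.instruction_cache:
            execute(word, self.data_cache)
        self.run = False
        self.ready = True
```

A host would call `load`, then `set_run`, and afterwards wait until `ready` is true before reading results back from the data cache.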

MatPro Schematic