Created: Sep 11, 2006
Updated: Sep 23, 2013
SVN Updated: Sep 26, 2013
Other project properties
Development status: Stable
Additional info: Design done , FPGA proven , Specification done
WishBone Compliant: No
8-bit RISC processor core based on the Vautomation uRISC
This is a "clean" reimplementation of the Vautomation uRISC processor core (aka the "V8", also named the Arclite core) based on ISA documentation only.
It implements the full v8 architecture with a few additions, most of which are optional:
* Thirty-six basic instructions (and four new instructions)
* 8-bit PSR(Program Status Register) with Zero, Carry, Negative, and Interrupt status bits, and 4 general purpose status bits.
* Eight 8-bit registers, R0 though R7.
* Accumulator register (R0)
* A 16-bit program counter
* Any two adjacent registers may be paired to create a 16-bit index register.
* Three basic addressing modes; addressed, indexed, and indexed with offset
The design adds a few new features, which can be enabled through generics:
* An optional auto-increment for indexed addressing modes ("LDX R4++" is equivalent to "LDX R4 ; UPP R4" )
* A new branching instruction, DBNZ (Decrement, and Branch if Not Zero)
* A new math instruction, MUL, uses on-board multipliers.
* The interrupt mask can now be set with the new instructions SMSK and GMSK
The Open8 is being designed to work optimally in newer FPGA architectures. It assumes 2 clocks for memory and register file latency.
This design has now fielded as a test stimulus controller hosted in an Altera 3C40, not once - but twice. It's primarily serving as a data acquisition controller / packet generator in those designs, and has performed trouble-free for well over a year. Additionally, as part of the test stimulus system, the Open8 is responsible for synchronizing output frequencies with the device under test. Due to the nature of these calculations, a 16-bit ALU/co-processor was written to "hardware accelerate" common math functions, rather than have to write emulations in assembly. This ALU has been included in the SVN repository. The Open8, and its ALU coprocessor, use about 2400 LE's in the FPGA.
It has also been fielded in several non-shipping test instruments and small emulators hosted in Altera 3C16's, where it performs a variety of tasks. I am presently looking to use it as a packet processor to bridge between a PC and a custom digital waveform generator design.
- Model is written in VHDL ('93)
- Simple RISC architecture and instruction set. All instructions fit in a single byte, with either 1 or 2 operands.
- 16-bit PC / address allows for 64kB of directly accessible memory (can be expanded with paging)
- Moderate number of general purpose registers
+ Eight byte-wide registers.
+ Any two registers may be paired as (Rn+1:Rn) to create an index register
+ R0 acts an the accumulator
- 8 interrupts, 1 NMI, 7 maskable. Interrupt controller is built into the core.
+ Interrupt controller keeps track of interrupt order and priority
+ Interrupt mask is controllable through two new instructions, SMSK and GMSK.
- Reasonably small gate-count, with strong fMax in "low-end" devices.
- Complete! The CPU has been synthesized and tested on an Altera DE2 board (Cyclone II 2C35).
- [UPDATE: the Hi-Tech compiler is no longer available.] Hi-Tech has now made their C compiler for the v8/Arclite architecture available as a demo. Note, the Open8 implements instructions that aren't in the stock v8/Arc core, so some of the generated code could probably be accelerated with a bit of hand optimization. (the DBNZ Rn instruction won't be used in loops for example)
- Source VHDL for the Open8 can be retrieved from either the "download" link, or from the SVN repository, above.
- An assembly language reference manual has been added to the source repository (March 20, 2011)
- A port of GNU binutils is in the SVN repository. This is a beta release, and has not yet been incorporated into the official binutils source base. Please report any bugs here, not at the binutils bugzilla.
- The Open8 is getting its first real use in a test set. It is implemented alongside a number of hardware accelerators, relegating it to primarily moving things around in memory, but so far it has performed well. There are some minor alterations, including an option to replace BRK with WAI - or WAit_for_Interrupt. When selected, there is no longer a true NOP available, but the ability to halt the processor waiting for an interrupt is a useful capability.
- BRK_Implements_WAI is tested, and shown to work correctly. An updated processor model has been checked in to SVN.
- The Open 8 has now successfully been fielded! The core in question used the new features recently checked in, and has worked remarkably well as a supervisory processor in a larger FPGA design. The whole system features a lot of hardware accelerators, including a 16-bit, bus-addressed ALU to handle some of the math, but using the Open8 has allowed the design to be a lot more flexible.
- A port of the GNU C/C++ compiler is underway, with no release date yet targeted. The calling conventions are still under design, and there will likely be changes to the instruction set to make it easier for the compiler to generate efficient code.
- A few bugs were found while regression testing an updated version of the Open8 processor core. Apparently the vectored interrupt controller didn't always obey priority. Also, it appears that auto-incrementing indexed loads and stores didn't complete execution of the UPP command. These have been both corrected.
- The ALU control signals were pipelined to improve fMax on smaller parts. This allowed a design targeting an Altera Cyclone 3C16 to go from ~60MHz to ~132MHz (without trying, the target frequency was 100MHz). Unfortunately, this also means that all math instructions (Opcodes 0 though 15 and GMSK) now take take 3 clock cycles to execute instead of one, like the MUL and UPP instructions. The only other instruction to suffer increased latency was the DBNZ instruction, which requires the status register to update before continuing. All other instructions retain their existing latencies. Unfortunately, this does imply that code should be regression tested on the model, as the total execution time in clock cycles will increase.
- As part of the update, a lot of superfluous code was stripped out. The model should be a lot easier to understand.