Superscalar Out-of-Order Processor with Speculative Dynamic Scheduling
Custom RISC ISA · IEEE 754 Floating Point · Fine-Grained Multithreading · Tomasulo's Algorithm
Overview
sharpEdge is a hardware superscalar processor implemented in Verilog, featuring speculative dynamic scheduling via a modified Tomasulo's algorithm, fine-grained multithreading (2 threads per core), and a custom 16-bit RISC ISA with floating-point support and 32-bit address space.
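Each cycle issues two instructions in sequence, one from each thread, using DDR (double data rate) access to the instruction memory. Because the two instructions issued in a cycle always belong to different threads, each with its own register file, this interleaved scheme yields zero inter-issue dependencies by design.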
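The interleaving can be illustrated with a small behavioral model. This is only a sketch in Python, not the Verilog implementation: the function name `interleaved_issue` and the instruction strings are hypothetical, and the model ignores stalls, branches, and the DDR timing itself.

```python
from typing import List, Tuple

def interleaved_issue(streams: List[List[str]], cycles: int) -> List[Tuple[int, int, str]]:
    """Issue one instruction per thread per cycle, thread 0 first.

    Returns (cycle, thread_id, instruction) tuples in issue order.
    """
    pcs = [0] * len(streams)  # one private program counter per thread
    issued = []
    for cycle in range(cycles):
        for tid, stream in enumerate(streams):
            if pcs[tid] < len(stream):  # this thread still has instructions
                issued.append((cycle, tid, stream[pcs[tid]]))
                pcs[tid] += 1
    return issued

# Two hypothetical two-instruction threads, run for two cycles.
trace = interleaved_issue([["add", "lw"], ["sub", "sw"]], cycles=2)
```

In this model the two instructions issued in the same cycle always carry different thread IDs, which is the property the interleaved scheme relies on.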
| Instruction | Operation |
| --- | --- |
| bzl | if rf[rega] == 0: rf[15] ← pc; pc ← pc + signed_imm |
| j | pc ← pc + signed_imm |
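The two branch operations above (the conditional branch-and-link, which the register table calls bzl, and j) can be sketched as a Python reference model. Several details here are assumptions rather than facts from the ISA: the 8-bit immediate width (`IMM_BITS`), the `pc + 1` fall-through when the branch is not taken, and the helper names themselves.

```python
IMM_BITS = 8            # assumption: the immediate width is not stated in the text
ADDR_MASK = 0xFFFFFFFF  # 32-bit address space, per the overview

def sign_extend(value: int, bits: int = IMM_BITS) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def exec_bzl(rf: list, pc: int, rega: int, imm: int) -> int:
    """Branch-and-link if rf[rega] is zero; returns the next pc."""
    if rf[rega] == 0:
        rf[15] = pc  # link register receives the current pc
        return (pc + sign_extend(imm)) & ADDR_MASK
    return (pc + 1) & ADDR_MASK  # assumption: fall through to the next instruction

def exec_j(pc: int, imm: int) -> int:
    """Unconditional pc-relative jump; returns the next pc."""
    return (pc + sign_extend(imm)) & ADDR_MASK
```

Under this reading, using r0 as `rega` would always take the branch, since r0 is hard-wired to zero. That would make bzl with r0 behave like an unconditional call.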
Reserved Registers
| Register | Purpose |
| --- | --- |
| r0 (0000) | Hard-wired zero |
| r13 (1101) | Standard stack pointer |
| r14 (1110) | Destination for li (load immediate); comparison register for llw/swc |
| r15 (1111) | Receives PC on bzl (branch-and-link) |
Specifications
Core Resources
| Resource | Count |
| --- | --- |
| ROB slots | 32 |
| Reservation station slots (each) | 8 |
| Reservation stations | 4 (MA, FALU, AU, LU) |
| Integer arithmetic units | 4 |
| Integer logical units | 4 |
| FP arithmetic-logical units | 2 |
| GPRs per thread | 16 × 32-bit |
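To show what the 32 ROB slots are for, here is a minimal Python sketch of a reorder buffer: results may complete out of order, but they are written to the register file strictly in program order. It is a generic textbook model, not a description of the sharpEdge RTL. The class and field names are hypothetical, and it omits speculation recovery and per-thread tracking.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional

ROB_SLOTS = 32  # from the Core Resources table

@dataclass
class RobEntry:
    tag: int                      # identifies the in-flight instruction
    dest: int                     # destination register index
    value: Optional[int] = None
    done: bool = False

class ReorderBuffer:
    def __init__(self, slots: int = ROB_SLOTS):
        self.slots = slots
        self.entries = deque()    # head (left) is the oldest instruction
        self.next_tag = 0

    def allocate(self, dest: int) -> Optional[int]:
        """Reserve a slot at issue; returns a tag, or None if the ROB is full."""
        if len(self.entries) == self.slots:
            return None           # structural stall: no free slot
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append(RobEntry(tag, dest))
        return tag

    def complete(self, tag: int, value: int) -> None:
        """Record a result; this may happen in any order."""
        for entry in self.entries:
            if entry.tag == tag:
                entry.value, entry.done = value, True
                return
        raise KeyError(tag)

    def commit(self, regfile: List[int]) -> List[int]:
        """Retire finished instructions from the head, in program order."""
        committed = []
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            if entry.dest != 0:   # r0 is hard-wired to zero
                regfile[entry.dest] = entry.value
            committed.append(entry.tag)
        return committed
```

A younger instruction that finishes first stays in the buffer until every older instruction has committed. This ordering is what allows speculative results to be discarded before they reach architectural state.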
Memory
| Memory | Type |
| --- | --- |
| Instruction Memory | Dual-port ROM, shared by both threads |
| Data Memory | Synchronous dual-write, asynchronous dual-read |
Atomic Operations & Cache Coherence
The ISA includes llw (Load Linked Word) and swc (Store Word Conditional) to support lock-free synchronization primitives. These instructions expose the underlying reservation mechanism needed to implement spin-locks in concurrent, multi-threaded programs — a prerequisite for safe access to shared memory regions between threads.
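The following Python sketch models generic load-linked / store-conditional semantics and a spin-lock acquire built on them. It is an illustration of the general technique only: the class `LinkedMemory`, the one-reservation-per-thread policy, and the lock encoding (0 = free, 1 = held) are assumptions, and the model does not capture the role r14 plays for llw/swc in this ISA.

```python
class LinkedMemory:
    """Word-addressed memory with one load-linked reservation per thread."""

    def __init__(self, size: int):
        self.mem = [0] * size
        self.reservation = {}  # thread id -> reserved address

    def llw(self, tid: int, addr: int) -> int:
        """Load a word and remember that this thread is watching addr."""
        self.reservation[tid] = addr
        return self.mem[addr]

    def store(self, addr: int, value: int) -> None:
        """Plain store; any write to addr cancels all reservations on it."""
        self.mem[addr] = value
        for tid in [t for t, a in self.reservation.items() if a == addr]:
            del self.reservation[tid]

    def swc(self, tid: int, addr: int, value: int) -> bool:
        """Store only if this thread's reservation on addr is still intact."""
        if self.reservation.get(tid) != addr:
            return False
        self.store(addr, value)
        return True

def try_lock(m: LinkedMemory, tid: int, lock_addr: int) -> bool:
    """One acquire attempt; a spin-lock repeats this until it returns True."""
    if m.llw(tid, lock_addr) != 0:  # lock already held
        return False
    return m.swc(tid, lock_addr, 1)

def unlock(m: LinkedMemory, lock_addr: int) -> None:
    m.store(lock_addr, 0)
```

If two threads both load-link the same free lock, only the first store-conditional succeeds. The second thread's reservation has been cancelled by the first write, so it fails and must retry.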
Version History
This is the third iteration of the sharpEdge CPU series, evolving from a single-cycle Logisim design (v1) through a 5-stage pipeline (v2) to a fully out-of-order, speculative, multithreaded core.
| Version | Description |
| --- | --- |
| v1 | Simple single-cycle CPU without memory, limited ISA. Built in Logisim. |
| v2 | 5-stage pipelined CPU with memory and a custom RISC ISA. Built in Verilog. |
| (planned) | Dual-core with associative L1 cache (snooping), shared direct-mapped L2, instruction cache, directory coherence protocol, and an interconnection network. |
Future Additions
Instruction buffer & cache — burst fetch support and reduced instruction memory latency.
[30%] Tournament branch predictor: local and global history tables arbitrated by a saturating counter, intended to produce fewer mispredictions than the current 2-bit local predictor.
Second core — enabling full use of the cache coherence protocols developed in companion repositories.
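For the tournament predictor item above, a minimal Python sketch of the general idea follows. It is not the planned design: the table sizes, the indexing (per-PC 2-bit counters for the local side, a global-history-indexed table for the global side), and all names are assumptions chosen for illustration.

```python
class TournamentPredictor:
    """Local and global 2-bit predictors with a 2-bit saturating chooser."""

    def __init__(self, index_bits: int = 4):
        self.mask = (1 << index_bits) - 1
        size = 1 << index_bits
        self.local = [1] * size   # indexed by pc; 0-1 not taken, 2-3 taken
        self.glob = [1] * size    # indexed by recent global branch history
        self.choice = [1] * size  # 0-1 trust local, 2-3 trust global
        self.history = 0          # most recent outcomes, newest in bit 0

    @staticmethod
    def _bump(counter: int, up: bool) -> int:
        """Saturating increment or decrement in the range 0..3."""
        return min(3, counter + 1) if up else max(0, counter - 1)

    def predict(self, pc: int) -> bool:
        li, gi = pc & self.mask, self.history & self.mask
        if self.choice[li] >= 2:
            return self.glob[gi] >= 2
        return self.local[li] >= 2

    def update(self, pc: int, taken: bool) -> None:
        li, gi = pc & self.mask, self.history & self.mask
        local_ok = (self.local[li] >= 2) == taken
        global_ok = (self.glob[gi] >= 2) == taken
        if local_ok != global_ok:  # train the chooser only when they disagree
            self.choice[li] = self._bump(self.choice[li], global_ok)
        self.local[li] = self._bump(self.local[li], taken)
        self.glob[gi] = self._bump(self.glob[gi], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask

def run_alternating(n: int) -> list:
    """Feed a taken/not-taken alternating branch; return per-step correctness."""
    p, hits = TournamentPredictor(), []
    for i in range(n):
        taken = (i % 2 == 0)
        hits.append(p.predict(0x40) == taken)
        p.update(0x40, taken)
    return hits
```

A strictly alternating branch is a case where a lone 2-bit counter keeps flipping and mispredicting. In this sketch the chooser shifts to the history-indexed table after a short warm-up, which illustrates the motivation for combining the two predictors.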