Fixes history

Running log of meaningful RTL, simulation, and tooling changes: what we changed, why it was needed, and where to look in the tree. Entries are newest first.

Use this alongside one-off write-ups (e.g. FIXES_APPLIED for the test-system batch) and reference docs such as PERF_PIPELINE_LOG.


How to add an entry

Copy the block below, paste it under the horizontal rule at the top (below “## YYYY-MM-DD”), and fill in:

### YYYY-MM-DD — Short title

| | |
|--|--|
| **Area** | hazard / memory / perf / sim / docs / build … |
| **Problem** | What was wrong or limiting? |
| **Change** | What we did (behaviour, not every line of diff). |
| **Rationale** | Why this approach; links to papers/issues if any. |
| **Files** | `path/one`, `path/two` |
| **Verify** | e.g. `make run_verilator TEST_CONFIG=isa TEST_NAME=rv32ui-p-lw` |

Keep each entry self-contained so a future reader does not need the chat log.


2026-03 — L2 miss-handling cycle counter (perf + pin)

| | |
|--|--|
| **Area** | L2, observability |
| **Problem** | `DMISS_STALL` / `IMISS_STALL` mix L1 back-pressure, store buffer, uncached, and L2 fill; there was no isolated metric for L2 miss-service cycles. |
| **Change** | `nbmbmp_l2_cache` drives `l2_miss_busy_o` in any cycle in which either the I- or D-pipe is in miss service (MSHR alloc, victim WB, fill wait, or fill write to arrays). `perf_stall_counters` adds `cnt_l2_miss_cycles` (independent of `stall_cause`). With `NO_L2_CACHE`, `cpu` ties `l2_miss_busy` low. |
| **Rationale** | Counts wall-clock cycles in which L2 is doing real miss work; it does not replace stall accounting and is not additive with the `stall_cause` buckets. |
| **Files** | `rtl/core/mmu/nbmbmp_l2_cache.sv`, `rtl/core/cpu.sv`, `rtl/tracer/perf_stall_counters.sv` |
| **Verify** | Run a sim with `LOG_PERF_STALL=1` and look for the L2 miss cycles line in the log; hierarchy e.g. `…i_soc.i_l2_cache.l2_miss_busy_o` (wrapper names may differ). |
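
The counting rule in this entry can be illustrated with a small behavioural model. This is a minimal Python sketch, not the RTL: the `PipeMissState` class and its field names are hypothetical stand-ins for the four miss-service phases listed above, and only `l2_miss_busy` and `cnt_l2_miss_cycles` correspond to names from the entry. It shows why the counter measures wall-clock cycles: a cycle in which both pipes are in miss service is counted once.

```python
from dataclasses import dataclass


@dataclass
class PipeMissState:
    """Miss-service flags for one pipe in one cycle (hypothetical field names)."""
    mshr_alloc: bool = False
    victim_wb: bool = False
    fill_wait: bool = False
    fill_write: bool = False

    def in_miss_service(self) -> bool:
        # A pipe is in miss service if it is in any of the four phases.
        return self.mshr_alloc or self.victim_wb or self.fill_wait or self.fill_write


def l2_miss_busy(i_pipe: PipeMissState, d_pipe: PipeMissState,
                 no_l2_cache: bool = False) -> bool:
    """Model of l2_miss_busy_o: high if either pipe is in miss service."""
    if no_l2_cache:
        # Mirrors the NO_L2_CACHE case, where the signal is tied low.
        return False
    return i_pipe.in_miss_service() or d_pipe.in_miss_service()


def cnt_l2_miss_cycles(trace, no_l2_cache: bool = False) -> int:
    """Count cycles with l2_miss_busy high; overlapping I/D misses count once."""
    return sum(1 for i_pipe, d_pipe in trace if l2_miss_busy(i_pipe, d_pipe, no_l2_cache))
```

Because the count is taken per cycle rather than per pipe or per stall cause, summing it with the `stall_cause` buckets would double-count cycles, which is the reason the entry calls it non-additive.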

2026-03-30 — SSR-style load-use (hazard unit)

| | |
|--|--|
| **Area** | hazard, pipeline correctness / performance |
| **Problem** | The classic `lw_stall` stalled FE/DE and asserted `flush_ex_o` together with load-use detection. Flushing EX on that path removes the load from `pipe2` without replay, and the stalls blocked the pipeline pattern where the consumer can enter EX in the same cycle the producer load enters MEM. |
| **Change** | Removed the load-use FE/DE stall and the load-use `flush_ex`. `flush_ex_o` is driven only by branch mispredict (`pc_sel_ex_i`). MEM→EX forwarding (`fwd_* = 2'b10`) plus `ex_mem_bypass_data` in `cpu.sv` supplies load/ALU/PC+4 data from `pipe3` for operands. Dropped unused `hazard_unit` inputs (`rd_addr_ex_i`, `rslt_sel_ex_0`); updated `wave.do` and the perf summary label (ex-only flush bucket). |
| **Rationale** | Aligns with the “forward from EX/MEM register” idea in SSR (arXiv:1912.10663): with the dependent in EX and the producer load in MEM (`pipe3`), bypass from the MEM-stage register instead of taking an extra decode stall + EX bubble. |
| **Files** | `rtl/core/hazard_unit.sv`, `rtl/core/cpu.sv`, `wave.do`, `rtl/tracer/perf_stall_counters.sv` |
| **Verify** | ISA tests with load/use and branches, e.g. `rv32ui-p-lw`, `rv32ui-p-jalr`, `rv32ui-p-beq`; CoreMark with `LOG_PERF_STALL`: `LOAD_RAW_STALL` / load-use flushes should fall to ~0 for that class of hazard. |
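
To make the new hazard behaviour concrete, here is a minimal Python sketch of a forwarding select and the simplified flush condition. It is an illustration under assumptions, not the code in `hazard_unit.sv`: only the `2'b10` MEM→EX encoding and the "flush on `pc_sel_ex_i` only" rule come from the entry. The `FWD_NONE` / `FWD_WB` encodings, the argument names, and the MEM-over-WB priority are assumptions borrowed from a textbook five-stage forwarding unit.

```python
FWD_NONE = 0b00  # assumed encoding: operand comes from the register file
FWD_WB = 0b01    # assumed encoding: forward from the write-back stage
FWD_MEM = 0b10   # from the entry: MEM->EX forwarding (fwd_* = 2'b10)


def forward_select(rs_addr_ex: int,
                   rd_addr_mem: int, reg_write_mem: bool,
                   rd_addr_wb: int, reg_write_wb: bool) -> int:
    """Pick the forwarding source for one EX operand (hypothetical port names)."""
    if rs_addr_ex == 0:
        # x0 is hard-wired to zero in RISC-V and is never forwarded.
        return FWD_NONE
    if reg_write_mem and rd_addr_mem == rs_addr_ex:
        # Youngest producer wins. Loads are included here: there is no
        # load-use stall, the load value is bypassed from the MEM stage.
        return FWD_MEM
    if reg_write_wb and rd_addr_wb == rs_addr_ex:
        return FWD_WB
    return FWD_NONE


def flush_ex(pc_sel_ex: bool) -> bool:
    """flush_ex_o after the change: asserted on branch mispredict only."""
    return bool(pc_sel_ex)
```

For example, with a load to `x5` in MEM and a dependent instruction reading `x5` in EX, `forward_select(5, 5, True, 0, False)` returns `FWD_MEM`, and no stall or EX flush is modelled for that case.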

2026-03 — MEM-stage bypass data for execution (ex_mem_bypass_data)

| | |
|--|--|
| **Area** | forwarding, execute stage |
| **Problem** | The MEM→EX bypass mux used `pipe3.alu_result` for all `pipe3` forwards. For loads, the value to write back / forward is `read_data` (and similarly `pc_incr` for JAL), not the address in `alu_result`. |
| **Change** | Combinational `ex_mem_bypass_data` in `cpu.sv`: selects by `pipe3.result_src` among `read_data`, `pc_incr`, and `alu_result`, and drives `execution.alu_result_i`. |
| **Rationale** | Correct forwarding when the consumer in EX needs the load result (or JAL link value) from the instruction currently in MEM. Complements the SSR timing change; does not by itself remove `LOAD_RAW` counts if the decode stall is still present. |
| **Files** | `rtl/core/cpu.sv` |
| **Verify** | Load-use and JAL-heavy tests; compare against Spike where applicable. |
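
The selection described in this entry amounts to a three-way mux. The following Python sketch models it under assumptions: the `ResultSrc` enum and its member names are hypothetical (the real `result_src` encoding is not given here), while `read_data`, `pc_incr`, `alu_result`, and `ex_mem_bypass_data` are the names used in the entry.

```python
from enum import Enum


class ResultSrc(Enum):
    """Hypothetical stand-in for the pipe3.result_src encoding."""
    ALU = 0      # ordinary ALU instruction
    MEM = 1      # load: the result is the data read from memory
    PC_INCR = 2  # JAL-style link: the result is PC + 4


def ex_mem_bypass_data(result_src: ResultSrc, alu_result: int,
                       read_data: int, pc_incr: int) -> int:
    """Value forwarded from the MEM stage to an EX operand."""
    if result_src is ResultSrc.MEM:
        # For a load, alu_result holds the address, so forward read_data.
        return read_data
    if result_src is ResultSrc.PC_INCR:
        return pc_incr
    return alu_result
```

The pre-fix behaviour corresponds to always returning `alu_result`, which for a load would forward the address instead of the loaded value.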

2026-03 — Pipeline performance logging (LOG_PERF_STALL)

| | |
|--|--|
| **Area** | observability, simulation |
| **Problem** | No structured visibility into stall causes, flush overlap, and active-cycle denominators for benchmarks. |
| **Change** | `perf_stall_counters.sv`: per-`stall_cause` buckets, flush-event buckets (trap / `fence.i` / BP / ex spill), overlap counters, and a `$display` summary at sim end. Makefile / Verilator / test runners pass `+define+LOG_PERF_STALL` or `LOG_PERF_STALL=1`; `verilator_runner.py` copies a snippet to `perf_pipeline.log`. |
| **Rationale** | Separates UART output from perf output; gives a reproducible text artifact next to `verilator_run.log`. |
| **Files** | `rtl/tracer/perf_stall_counters.sv`, `rtl/flist.f`, `rtl/core/cpu.sv`, `rtl/include/level_defines.svh`, `makefile`, `script/python/makefile/verilator_runner.py`, `script/python/makefile/test_runner.py`, `script/config/tests/coremark.conf`, `script/python/debug_logger.py`, `docs/PERF_PIPELINE_LOG.md` |
| **Verify** | `make run_coremark` (or any run with `LOG_PERF_STALL=1`); inspect `results/logs/.../perf_pipeline.log` and the final banner of the sim. |
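
The "copy a snippet to `perf_pipeline.log`" step could look roughly like the sketch below. This is not the code in `verilator_runner.py`: the function names and the `PERF_STALL_BEGIN` / `PERF_STALL_END` marker strings are hypothetical, chosen only to show the idea of cutting the perf summary out of the full run log so it lives in its own text artifact.

```python
from pathlib import Path


def extract_perf_snippet(log_text: str,
                         begin_marker: str = "PERF_STALL_BEGIN",
                         end_marker: str = "PERF_STALL_END") -> list:
    """Return the log lines from the begin marker through the end marker.

    Both marker strings are assumptions for this sketch; the real summary
    banner may be delimited differently.
    """
    snippet = []
    inside = False
    for line in log_text.splitlines():
        if begin_marker in line:
            inside = True
        if inside:
            snippet.append(line)
            if end_marker in line:
                break
    return snippet


def write_perf_log(run_log: Path, perf_log: Path) -> bool:
    """Copy the perf summary from run_log into perf_log; False if none found."""
    snippet = extract_perf_snippet(run_log.read_text(errors="replace"))
    if not snippet:
        return False
    perf_log.write_text("\n".join(snippet) + "\n")
    return True
```

A runner might call `write_perf_log(log_dir / "verilator_run.log", log_dir / "perf_pipeline.log")` after the simulation exits, where `log_dir` is whatever results directory the run used.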

(Older entries can be appended here when you land new fixes.)