## 形式化方法在硬件验证软件中的应用和展望 李旻 东南大学 集成电路学院 2025.08.15 ## 个人介绍 李旻 ### ■ 基本情况 • 职务 东南大学集成电路学院 上岗研究员 • 研究方向 硬件形式化验证、智能EDA #### ■ 工作 & 教育经历 • 2023.05-2025.06 华为诺亚方舟实验室 • 2018.08-2023.03 香港中文大学 (导师: 徐强教授) #### ■ 业绩情况 · 业界:参与海思国产自研形式化工具开发,帮助华为多个处理器芯片 验证收敛 • 论文: 近五年发表EDA领域高水平会议20+ • 科研项目:承担国家优青 (海外)、海思EDA、港府多项科研项目 - 1. 工具 Hardware Formal Verification Tools - 2. 建模 Verification Env. Modelling - 3. 综合 RTL/Synthesis for Verification - 4. 求解器 Hardware Formal Verification Solver - 5. 调度 Job scheduling - 6. 展望 Prospect ### 为什么需要硬件形式化验证? 浮点除法查找表5/1066 Cells出错, 九十亿分之一出错概率=>4.75亿美元损失 Back at the time, using 3DS Max for rendering broadcasting animations, I would run into the problem on a weekly basis. Animation rendering batches would fail at a specific frame which would cause the Pentium to freeze. The frame could not be rendered at all, so I would instead render the particular frame on a 486 while the Pentium skipped it and carried on with subsequent frames. Raytracing was particularly problematic. It wasn't a rare error at all iny experience. 用户反馈当时使用该处理器进行动画渲染,某一帧一定会出错! 算力芯片单Die规格提升 极致PPA提升 极致芯片设计效率提升 => Bug可能性越高 ### 数字电路前端流程 This section first describes the definition of the novel 8-bit floating-point data format HiF8, including the support for specia values. Then, some consideration and design issues for HiF8 will be explained. Natural Language Description We propose a new general-surprose floating-point encouding and decoding method for data expression, for which the field with, dynamic range, and significant precision can be scale based on scenario requirements. This paper focuses on the floating-point instance for deep learning usage. On the basis of the IEEE 754 [73], HiF8 defines an additional dat field. Therefore, HiF8 consists of the four fields as listed in Table [7] a sign field, a do field, an exponent field, and a manifess a field. Specification (Machine Learning Model) (C Reference Model) uint\_fastsz\_t uiA; bool signA; int\_fastlE\_t expA; uint\_fast3z\_t sigA, uiZ; struct exp16\_sig32 normExpSig; int\_fast18\_t expZ; uint\_fast3z\_t sigZ, shiftedSigZ; **ESL** (Chisel) val io = IO(new Bundle { val inReady\_div = Output(Bool()) RTL Code Netlist src: 2021 CCF-中国软件大会 海思/阿卡斯 数字后端 ### 验证范式: Simulation or Formal? Simulation speed: 4.5 GHz - $16bit * 16bit : 2^32 patterns => 0.95 second$ - 32bit \* 32bit: 2^64 patterns =>129.98 years - ... 对于复杂Datapath电路,基于穷举的Simulation/emulation 是不现实的 Intel 08年开始对CPU微架构主要使用Formal; ### Simulaiton/Emulation ## 商用硬件形式化验证工具一览 (基于求解器技术) **Model Checking** SAT/SMT Assertion-Based VIP AMBA 3/4/5 checkers illustrative, optimized Popular standard Certification of protocols Configurable. for formal ### Formal验证输入输出件格式由三大家主导 Model checking工具: Cadence JasperGold - 芯华章GalaxFV - 华大九天FormalMC - 国微芯EsseFPV Datapath验证工具: Synopsys Hector - 芯华章HEC - 华大九天AveCEC - 国微芯 EsseFECT ### 国内EDA厂商仍为跟随发展为主 ### 学术界相关工具:综合/验证 ### What is Yosys @ 0. × Modern Yosys has full support for the synthesizable subset of Verilog-2005 and has been described as "the GCC of hardware synthesis." Freely available and open source, Yosys finds use across hobbyist and commercial applications as well as academic. #### Note Yosys is released under the ISC License: A permissive license lets people do anything with your code with proper attribution and without warranty. The ISC license is functionally equivalent to the BSD 2-Clause and MIT licenses, removing some language that is no longer necessary. Together with the place and route tool nextpnr, Yosys can be used to program some FPGAs with a fully end-to-end open source flow (Lattice iCE40 and ECP5). It also does the synthesis portion for the OpenLane flow, targeting the SkyWater 130nm open source PDK for fully open source ASIC design. Yosys can also do formal verification with backends for solver formats like SMT2. Yosys, and the accompanying Open Source EDA ecosystem, is currently maintained by Yosys Headquarters, with many of the core developers employed by YosysHQ GmbH. A commercial extension, Tabby CAD Suite, includes the Verific frontend for industry-grade SystemVerilog and VHDL support, formal verification with SVA, and formal apps. ### **ABC** ### A System for Sequential Synthesis and Verification #### Berkeley Logic Synthesis and Verification Group ### /\*\*CFile\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* [pdrInv.c] FileName SystemName [ABC: Logic synthesis and verification system.] PackageName [Property driven reachability.] [Invariant computation, printing, verification.] [Alan Mishchenko] ``` [liveness.c] SystemName [ABC: Logic synthesis and verification system.] PackageName [Liveness property checking.] Synopsis [Main implementation module.] [Sayak Ray] Affiliation [UC Berkeley] [Ver. 1.0. Started - January 1, 2009.] ``` ### 软件所开发相关工具 #### CHA: Supporting SVA-Like Assertions in Formal Verification of Chisel Programs (Tool Paper) Shizhen Yu<sup>1,2</sup>, Yifan Dong<sup>1,2</sup>, Jiuyang Liu<sup>3</sup>, Yong Li<sup>1</sup>, Zhilin Wu<sup>1,2</sup>, David N. Jansen<sup>1</sup>, and Lijun Zhang<sup>1,2</sup> State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China {yusz,liyong,wuzl,dnjansen,zhanglj}@ios.ac.cn, dong-yf18@tsinghua.org.cn <sup>2</sup> University of Chinese Academy of Sciences, Beijing, China 3 Huazhong University of Science and Technology, Wuhan, China jiuyang@hust.edu.cn #### Formal Verification of RISC-V Processor Chisel Designs Shidong Shen<sup>1,2</sup>, Yicheng Liu<sup>1,2</sup>, Lijun Zhang<sup>1,2</sup>, Fu Song<sup>1,2</sup>, and Zhilin Wu<sup>1,2</sup> <sup>1</sup> Key Laboratory of System Software (Chinese Academy of Sciences) and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China {shensd,liuyc,zhanglj,songfu,wuzl}@ios.ac.cn <sup>2</sup> University of Chinese Academy of Sciences, Beijing, China Formal Verification of Chisel ## Integrating Exact Simulation into Sweeping for Datapath Combinational Equivalence Checking Zhihan Chen<sup>1,2</sup>, Xindi Zhang<sup>1,2</sup>, Yuhang Qian<sup>1,2</sup>, Qiang Xu<sup>3</sup>, Shaowei Cai<sup>1,2,\*</sup> 1. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China 2. State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China 3. Department of Computer Science & Engineering, The Chinese University of Hong Kong {chenzh, zhangxd, qianyh}@ios.ac.cn; qxu@cse.culhk.edu.hk; caisw@ios.ac.cn SAT in EDA #### rIC3 Hardware Model Checker #### HWMCC rlC3 achieved first place in both the bit-level track and the word-level bit-vector track at the 2024 Hardware Mode Checking Competition (HWMCC'24). To view the submission for HWMCC'24, please checkout the HMMCC24 branch or download the binary release at <a href="https://github.com/gipsyh/riC3-HWMCC24">https://github.com/gipsyh/riC3-HWMCC24</a>. #### **Publications** - [CAV2025] The rIC3 Hardware Model Checker - [CAV2025] Deeply Optimizing the SAT Solver for the IC3 Algorithm - [DAC2024] Predicting Lemmas in Generalization of IC3 - [arXiv] Extended CTG Generalization and Dynamic Adjustment of Generalization Strategies in I #### rIC3 Tool Flow MC in EDA ### **FPV** Modeling主要对编译后的RTL和property建立formal模型,并转化为标准格式 (AIG文件) 送给Model Checking求解。 Formal模型表达式——!P & F (P为assert property, F为约束集合的累计) 由于MC原理是在合法区间内寻找反例,因此每向前推导一步都要是合法区间,即通过下图方式对约束做累计,来寻找反例!P。 src: 2021 CCF-中国软件大会 海思 ``` property write_skip; @(posedge clk) disable iff (rst) !wen |=> $changed(waddr); endproperty w_underfill: cover property (write_skip); ``` ### SVA建模 **Table 1** Semantics of CTL\*. Here, K is a Kripke structure, $\pi$ is a path, s is a state, p is an atomic proposition, f and g are state formulas, and $\varphi$ and $\psi$ are CTL\* formulas | $K, s \models p$ | iff | $p \in L(s)$ | |------------------------------------------|-----|--------------------------------------------------------------------------------------------------------------------------| | $K, s \models \neg f$ | iff | $K, s \not\models f$ | | $K, s \models f \lor g$ | iff | $K, s \models f \text{ or } K, s \models g$ | | $K, s \models f \land g$ | iff | $K, s \models f \text{ and } K, s \models g$ | | $K, s \models \mathbf{E}\varphi$ | iff | there is an infinite path $\pi$ starting from $s$ such that $K, \pi \models \varphi$ | | $K, s \models \mathbf{A} \varphi$ | iff | for every infinite path $\pi$ starting from $s$ we have $K$ , $\pi \models \varphi$ | | $K, \pi \models f$ | iff | $K, s \models f$ for the first state $s$ of $\pi$ | | $K, \pi \models \neg \varphi$ | iff | $K, \pi \not\models \varphi$ | | $K, \pi \models \varphi \lor \psi$ | iff | $K, \pi \models \varphi \text{ or } K, \pi \models \psi$ | | $K, \pi \models \varphi \wedge \psi$ | iff | $K, \pi \models \varphi \text{ and } K, \pi \models \psi$ | | $K, \pi \models \mathbf{X}\varphi$ | iff | $K, \pi^1 \models \varphi$ | | $K, \pi \models \mathbf{F}\varphi$ | iff | there exists an $i \ge 0$ such that $K, \pi^i \models \varphi$ | | $K, \pi \models \mathbf{G}\psi$ | iff | for all $j \ge 0$ we have $K, \pi^j \models \psi$ | | $K, \pi \models \psi \mathbf{U} \varphi$ | iff | there exists an $i \ge 0$ such that $K, \pi^i \models \varphi$ and for all $0 \le j < i$ we have $K, \pi^j \models \psi$ | Clarke, Edmund M., et al., eds. *Handbook of model checking*. Vol. 10. Cham: Springer, 2018. ### LEC/CEC,C2RTL, etc. Yes Equivalent? No Design Bug Setup Issue Debug SPEC/Setur /MAP Setup SPECIFICATION (e.g. RTL1 + Power Intent1) SEC Sequential Equivalence Checking (SEC) App Mapping (optional) IMPLEMENTATION (e.g.RTL2+Power **Fix IMP** src: Cadence JasperGold Figure 2: HW-CBMC Tool Flow Mukherjee, Rajdeep, et al. "Formal techniques for effective co-verification of hardware/software codesigns." Proceedings of the 54th Annual Design Automation Conference 2017. 2017. src: 2021 CCF-中国软件大会 海思 ### 时钟/复位 #### async2sync - convert async FF inputs to sync circuits yosys> help async2sync #### async2sync [options] [selection] This command replaces async FF inputs with sync circuits emulating the same behavior for when the async signals are actually synchronized to the clock. This pass assumes negative hold time for the async FF inputs. For example when a reset deasserts with the clock edge, then the FF output will still drive the reset value in the next cycle regardless of the data-in value at the time of the clock edge. -nolower Do not automatically run 'chformal -lower' to lower \$check cells. Note Help text automatically generated from passes/sat/async2sync.cc:30 #### clk2fflogic - convert clocked FFs to generic \$ff cells #### yosys> help clk2fflogic #### clk2fflogic [options] [selection] This command replaces clocked flip-flops with generic \$ff cells that use the implicit global clock. This is useful for formal verification of designs with multiple clocks. This pass assumes negative hold time for the async FF inputs. For example when a reset deasserts with the clock edge, then the FF output will still drive the reset value in the next cycle regardless of the data-in value at the time of the clock edge. -nolower Do not automatically run 'chformal -lower' to lower \$check cells. #### -nopeepopt Do not automatically run 'peepopt -formalclk' to rewrite clock patterns to more formal friendly forms. Help text automatically generated from passes/sat/clk2fflogic.cc:36 #### chformal - change formal constraints of the design yosys> help chformal chformal [types] [mode] [options] [selection] Make changes to the formal constraints of the design. The [types] options the type of constraint to operate on. If none of the following options are given, the command will operate on all constraint types: -assert \$assert cells, representing assert (...) constraints -assume \$assume cells, representing assume(...) constraints -live \$live cells, representing assert(s\_eventually ...) \$fair cells, representing assume(s\_eventually ...) -cover \$cover cells, representing cover() statements Additionally chformal will operate on \$check cells corresponding to the selected constraint types. https://yosyshq.readthedocs.io/projects/yosys/en/latest/cmd ref.html ### RTL层面综合 #### @ <del>.</del>\\ ### The Verilog and AST frontends This chapter provides an overview of the implementation of the Yosys Verilog and AST frontends. The Verilog frontend reads Verilog-2005 code and creates an abstract syntax tree (AST) representation of the input. This AST representation is then passed to the AST frontend that converts it to RTLIL data, as illustrated in Fig. 56. Fig. 56 Simplified Verilog to RTLIL data flow 1 #### o Optimization passes - clean remove unused cells and wires - muxpack \$mux/\$pmux cascades to \$pmux - onehot optimize \$eq cells for onehot signals - opt perform simple optimizations - opt\_clean remove unused cells and wires - opt\_demorgan Optimize reductions with DeMorgan equivalents - opt\_dff perform DFF optimizations - opt\_expr perform const folding and simple expression rewriting - opt\_ffinv push inverters through FFs - opt\_hier perform cross-boundary optimization - opt\_lut optimize LUT cells - opt\_lut\_ins discard unused LUT inputs - opt\_mem optimize memories - opt\_mem\_feedback convert memory read-to-write port feedback paths to write enables - opt\_mem\_priority remove priority relations between write ports that can never collide - opt\_mem\_widen optimize memories where all ports are wide - opt\_merge consolidate identical cells - opt\_muxtree eliminate dead trees in multiplexer trees - opt\_reduce simplify large MUXes and AND/OR gates - opt\_share merge mutually exclusive cells of the same type that share an input signal - peepopt collection of peephole optimizers - pmux2shiftx transform \$pmux cells to \$shiftx cells - recover\_names Execute a lossy mapping command and recover original netnames - share perform sat-based resource sharing - wreduce reduce the word size of operations if possible ### opt\_expr - perform const folding and simple expression rewriting yosys> help opt\_expr opt\_expr [options] [selection] This pass performs const folding on internal cell types with constant inputs. It also performs some simple expression rewriting. https://yosyshq.readthedocs.io/projects/yosys/en/latest/cmd ref.html ## 综合 ### Logic层面综合 Lee, Siang-Yun, et al. "A simulation-guided paradigm for logic synthesis and verification." *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems* 41.8 (2021): 2573-2586. Fig. 2. Framework for a Typical Sweeping Based CEC Algorithm. Chen, Zhihan, et al. "Integrating exact simulation into sweeping for datapath combinational equivalence checking." 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE, 2023. ### Logic层面综合 Fig. 4. Effectiveness comes from many techniques Koelbl, Alfred, et al. "Solver technology for system-level to RTL equivalence checking." 2009 Design, Automation & Test in Europe Conference & Exhibition. IEEE, 2009. ## 求解 $(p)_{1}$ $(\overline{p}\,m_{1}m_{2})_{2}\,(\overline{p}\,\overline{m}_{1}\overline{m}_{2})_{3}\,(p\,\overline{m}_{1}m_{2})_{4}\,(p\,m_{1}\overline{m}_{2})_{5}$ $(\overline{m}_{1}\overline{c}\,a_{1})_{6}\,(\overline{m}_{1}c\,x_{1})_{7}\,(m_{1}\overline{c}\,\overline{a}_{1})_{8}\,(m_{1}c\,\overline{x}_{1})_{9}$ $(\overline{m}_{2}\overline{c}\,a_{2})_{10}\,(\overline{m}_{2}c\,x_{2})_{11}\,(m_{2}\overline{c}\,\overline{a}_{2})_{12}\,(m_{2}c\,\overline{x}_{2})_{13}$ $(\overline{x}_{1}\overline{u}\,\overline{v})_{14}\,(\overline{x}_{1}u\,v)_{15}\,(x_{1}\overline{u}\,v)_{16}\,(x_{1}u\,\overline{v})_{17}$ $(\overline{x}_{2}\overline{u}\,\overline{v})_{18}\,(\overline{x}_{2}u\,v)_{19}\,(x_{2}\overline{u}\,v)_{20}\,(x_{2}u\,\overline{v})_{21}$ $(\overline{a}_{1}r)_{22}\,(\overline{a}_{1}s)_{23}\,(a_{1}\overline{r}\,\overline{s})_{24}$ $(\overline{a}_{2}r)_{25}\,(\overline{a}_{2}s)_{26}\,(a_{2}\overline{r}\,\overline{s})_{27}$ (a) gates $G_1, \ldots, G_8$ (b) miter circuit (c) CNF with clauses $C_1, \ldots, C_{27}$ ■ Figure 1 Example of an equivalence checking problem for two identical (isomorphic) circuits consisting each of one AND, XOR, and ITE (multiplexer/if-then-else) gate. The miter circuit in the middle (b) compares the output of the two circuits and assumes they are different by feeding them into another XOR gate which in turn is assumed to produce the output value 1. The equational semantics (a) is shown on the left which after Tseitin encoding [67] gives the CNF (c), e.g., the last AND gate $G_8$ in the second circuit is encoded by the last three clauses $C_{25}$ , $C_{26}$ and $C_{27}$ . ### 1) 原始电路信息 #### Version 4.0.0 - · source code matches competition version 'sc2024' - fast variable elimination during preprocessing (in fastel.c) - · lucky phases as in CaDiCaL but before and after preprocessing and with unit extraction and SLURM semantics - reason jumping only for formulas with large binary clauses fraction - · U-shaped delta scaling of probing and elimination interval - option -o <output> to write simplified formula to a file - dynamically increased reduced-clauses fraction (60% 90%) - bounded variable addition (in factor.c - clausal congruence closure algorithm (in congruence.c) - generic preprocessing phase (using a subset-set of simplifier) - more vivification (tier0=irredundant,tier1,tier2,tier3) - added --no-conflicts as synonym to --conflicts=0 - · optimized and simplified vivification https://github.com/arminbiere/kissat/blob/master/NEWS.md 2) Datapath Lie: Symbolic Computer Algebra (RevSCA/AMulet); Theorem Prover and Rewrite(Acl2 + Vescmul) Biere, Armin, et al. "Clausal congruence closure." 27th International Conference on Theory and Applications of Satisfiability Testing (SAT 2024). Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2024. - Fact1: many solvers, with many parameters to set in each solver - Fact2: many properties to prove - Fact3: limited resources, cannot run all tasks at the same time - Fact4: properties with affinity Fig. 1. Cone-of-influence of high- and low- affinity properties. Dureja, Rohit, et al. "Boosting verification scalability via structural grouping and semantic partitioning of properties." 2019 Formal Methods in Computer Aided Design (FMCAD). IEEE, 2019. ### 关键问题: **Grouping/Ordering/Assume Guarantee** =>相同时间,相同资源,解出最多的任务 ## AI时代下硬件形式化工具展望 更多可能 LLM时代下新的设计范式定义了 新的验证问题! ## LLM时代下硬件形式化验证展望 ### Time Cost for C2RTL verification (setup to fully proven) - · TCL refinement loop to run correctly - · fix constraints to run correctly - fix C&RTL input/output mapping - · TCL refinement loop to converagence - try case spliting, rerun, might still inconclusive. however we find the bottleneck ... - · to prove the bottleneck, try A, inconclusive - · to prove the bottleneck, try B, inconclusive - ... - proved a tiny step - .. - · proved all properties - Rerun the final worked TCL 现有商用Formal工具很难解决Scalability,只能尽可能提供多的手段,供验证人员循环迭代验证环境,构造诸多tcl脚本 Industrial Examples: 浮点除法用例:2000+ tcl验证 环境,花一年才达到完备收敛 src: https://www.bilibili.com/video/BV1fJ23YQE9Y/?spm\_id\_from=333.1387.f avlist.content.click&vd\_source=637e596cabafb504f6a386a96f7ec263 ### 1. Decompose a theorem into sub-goals **2. Solving** Sub-goals recursively with LLM | 版本 | 关键技术 | 核心能力提升 | 数学层级覆盖 | D 0 1 | |------|--------------------------|---------------------|--------|-----------| | V1 | 大规模合成数据 | 解决数据稀缺,实现 FIMO 全验证 | 高中竞赛 | DeepSeek- | | V1.5 | 截断-恢复机制 + RMaxTS + RLPAF | 动态纠错、树搜索多样化探索 | 高中→本科 | Prover | | V2 | 子目标分解 + 递归证明 | 非形式/形式推理统一、复杂问题分层破解 | 本科→研究生 | | ### LLM Prover实现复杂Datapath电路自动证明? 实现LLM对复杂验证目标的自动抽象和拆解,降低Formal工具使用门槛,进一步提高收敛效率 School of Integrated Circuits, SEU # 谢谢聆听 - 感谢2021 CCF-中国软件大会/全国形式化方法与应用会议相关议题提供思路; - 感谢华为海思相关项目组提供宝贵落地经验; - 感谢胡旭东同学对材料的贡献; ### 拥抱开源,合作共赢! 课题组主页: https://formind.netlify.app/