除法器
与乘法相比,除法的实现较为复杂,运算过程如下:
过程:
- 被除数和余数:将余数和被除数视为一个,共享一个寄存器,初始值为被除数
- 除数:可视为不断右移,并和被除数相减
- 商:每个bit依次生成,可视为不断左移
除法器的工作流程
要注意的是,与手算相比,电路实现总是将余数减除数,所以如果出现差小于0,要执行回退操作。
怎么会退呢?其实没有真正回退地方法,由于前面执行的是减法,回退只需加回来就可以了。
除法器的电路实现
工作过程:
- 初始化:将8-bit被除数放到“余数寄存器”,4-bit除数放到“除数寄存器”的高4位,将4-bit商寄存器置零
- 执行减法运算:余数-除数,将结果放到“余数寄存器”
- 检查余数寄存器的最高位(判断正负)。如果值小于0,回退,商左移,新的最右位设为0;如果值大于或等于0,商左移,新的最右位设为1.
- 除数寄存器右移一位
- 检查是否为最后一轮(本例为第5轮)
32-bit除法器同理可得
除法器的面积优化
对上面32-bit除法器从面积分析:
- “除数寄存器”实际只使用了一半
- “商寄存器”初始是空的,从左到右依次填满
- “余数寄存器”初始是满的,有实际意义的位每过一个周期从左到右依次减少
对应的,我们可以得到针对上述问题的方案:
- “除数寄存器”缩减为32-bit的,无需支持移位
- 取消“商寄存器”,商从右端逐位移入“余数寄存器”
- 64-bit ALU缩小为32-bit
- “余数寄存器”只有高32位参与加减法运算
- “余数寄存器”需要支持左移和右移
- 运算结束时,商占据“余数寄存器”的低32位
其中有两点需要注意:
- 对优化的除法器右移,优化后的除法器的运算步骤是先相减再左移,比如0000,0111 / 0010, 每次做左移、减法,倒数第二步得到00110001,最后一次循环,减法再左移得到:00100011,因为真正的商是:0011, 而余数是0001。最终结果应该是先取低四位商,再右移取高四位余数。
- 和乘法器一样,余数寄存器实际应该是65位而保证加法器的进位不会丢失。
除法运算过程如下:
初始化时:
被除数和余数:将余数和被除数视为一个,共享一个寄存器div_rem_d,初始值为被除数扩大2倍后的值,位宽为2倍的被除数位宽2*width。所以在最后一次结果中商直接取余数寄存器的低半部分位宽的数值即可,余数即为余数寄存器的[2*width_A-1:width_A+1]位的值。
除数:将除数左移被除数位宽大小,作为除数寄存器的值,除数寄存器位宽为被除数位宽加上除数位宽; 商寄存器初始值为0,位宽为width
运算过程中,每次不断比较余数寄存器div_rem_d的值是否大于除数寄存器的值,如果大于或者等于,则余数寄存器div_rem_d更新方式为余数寄存器div_rem_d的值减去除数寄存器的值得到的结果左移一位再加1,如果小于,则余数寄存器div_rem_d更新方式为余数寄存器div_rem_d的值直接左移一位,上述处理过程共需要被除数位宽次比较过程,对于有符号数除法,可以先求绝对值,看作无符号除法,再根据有符号除法和无符号除法商与余数的关系,计算得到正确的余数和商,保证有符号除法的结果中余数一定是正数。
商:每个bit依次生成,可视为不断左移,电路实现总是将余数减除数,所以如果出现差小于0,要执行回退操作。回退只需加回来就可以了。
matlab代码
% % divide clear; close all; clc; WIDTH=16; div_A=[20855,21855,15,1166,11665,1115,1180,32777];%WIDTH=16 div_B=[2,2,5,15,123,112,134,129]; for i=1:length(div_A) for j=1:length(div_B) div_rem_d=abs(div_A(i))*2;%高位补div_A位宽-1=(WIDTH-1)个0,低位补1个0;余数初值 div_quo_d=0;%商位宽为WIDTH, div_C(i,j)=0; div_B_d=div_B(j)*2^(WIDTH); for k=1:WIDTH if div_rem_d>=div_B_d div_rem_d=(div_rem_d-div_B_d)*2+1; div_C(i,j)=div_C(i,j)*2+1; else div_rem_d=div_rem_d*2; div_C(i,j)=div_C(i,j)*2; end end div_rem_tmp2=floor(div_rem_d/2^(WIDTH+1)); div_rem(i,j)= div_rem_tmp2; end end
RTL代码
`timescale 1ns / 1ps module divide#(parameter width_A=16, parameter width_B=8 )( input clk, input rst_n, input[width_A-1:0] div_A, //被除数 input[width_B-1:0] div_B, //除数 input valid_in, output reg valid_out, output reg [width_A-1:0] div_C, //商 output reg[width_A-1:0] div_rem //余数,恒为正 ); parameter DIV_IDLE = 2'd0; parameter DIV_BUSY = 2'd1; parameter DIV_DONE = 2'd2; parameter DIV_ACK = 2'd3; reg [1:0] next_state,state; // reg [7:0] result; reg[width_A+width_B-1:0] div_B_d; reg[width_A+width_B-1:0] div_B_d_tmp; reg[2*width_A-1:0] div_rem_d; reg [2*width_A-1:0] div_rem_d_tmp; reg[width_A-1:0] div_quo; reg [width_A-1:0] div_quo_d; reg[3:0] cnt; wire [width_A-1:0] abs_A; wire [width_B-1:0] abs_B; wire [2*width_A-1:0] div_rem_tmp; assign abs_A = div_A[width_A-1] ? ~div_A+1 : div_A; assign abs_B=div_B[width_B-1] ? ~div_B+1 : div_B; assign is_neg = div_A[width_A-1] ^ div_B[width_B-1]; assign all_neg = div_A[width_A-1] & div_B[width_B-1]; always@(*) begin case(state) DIV_IDLE:begin if(valid_in) next_state=DIV_BUSY; else next_state=DIV_IDLE; end DIV_BUSY:begin if(cnt==width_A-1) next_state=DIV_DONE; else next_state=DIV_BUSY; end DIV_DONE:begin next_state=DIV_ACK; end DIV_ACK :begin next_state=DIV_IDLE; end default: next_state=DIV_IDLE; endcase end always@(posedge clk) if(~rst_n) state<=DIV_IDLE; else state<=next_state; always@(posedge clk) if(~rst_n)begin div_B_d<=0;//除数寄存器 div_rem_d<=0;//余数寄存器 // div_quo_d<=0;//商寄存器 div_rem<=0; div_C<=0; cnt<=0; valid_out<=1'b0; end else begin case(state) DIV_IDLE :begin valid_out<=1'b0; if(valid_in)begin div_B_d<={abs_B,{(width_A)1'b0}}; div_rem_d<={{(width_A-1)1'b0},abs_A,1'b0}; // div_quo_d<=0; cnt<=0; end end DIV_BUSY:begin div_rem_d<=div_rem_d_tmp; // div_quo_d<=div_quo; // div_B_d<=div_B_d; cnt<=cnt+1; end DIV_DONE:begin valid_out<=1'b1; if(all_neg) begin //A<0,B<0; div_C<=div_rem_d[width_A-1:0]+1; div_rem<=div_B_d[width_A+width_B-1:width_A]-{0,div_rem_d[2*width_A-1:width_A+1]}; end else if(is_neg) begin //A<0,B>0 // div_C<=~div_quo_d+1; /* div_C<=~div_rem_d[15:0]+1; */ div_C<=~div_rem_d[width_A-1:0]; div_rem<=div_B_d[width_A+width_B-1:width_A]-{0,div_rem_d[2*width_A-1:width_A+1]}; //31:17 end else begin //A>0,B>0 // div_C<=div_quo_d; div_C<=div_rem_d[width_A-1:0]; div_rem<={0,div_rem_d[2*width_A-1:width_A+1]}; end end DIV_ACK: valid_out<=1'b0; endcase end reg data_bit; assign div_rem_tmp=div_rem_d-div_B_d; always@(*)begin if(div_rem_d<div_B_d) begin div_rem_d_tmp={div_rem_d[2*width_A-2:0],1'b0}; data_bit=0; // div_quo={div_quo_d,1'b0}; end else begin div_rem_d_tmp={div_rem_tmp[2*width_A-2:0],1'b1}; // div_quo={div_quo_d,1'b1}; data_bit=1; end // div_B_d_tmp={1'b0,div_B_d[23:1]}; end endmodule
testbench
## testbench ```bash `timescale 1ns / 1ps ////////////////////////////////////////////////////////////////////////////////// module tb_divide; reg clk; reg rst_n; reg start; reg [15:0]div_A; reg [7:0]div_B; wire done; wire [15:0]div_C; always #10 clk = ~clk; initial begin rst_n = 0; clk = 1; #10; rst_n = 1; end divide u_divide( .clk(clk), .rst_n(rst_n), .div_A(div_A), .div_B(div_B), .valid_in(start), .valid_out(done), .div_C(div_C) ); reg [3:0]i; always @ ( posedge clk or negedge rst_n ) if( !rst_n ) begin i <= 4'd0; start <= 1'b0; div_A<= 16'd1; div_B<= 8'd1; end else begin case( i ) 0: if( done ) begin start <= 1'b0; i <= i + 1'b1; end else begin div_A<= 16'b1101_0101_0101_0101; div_B<= 8'd5; start <= 1'b1; end 1: // if( done ) begin start <= 1'b0; i <= i + 1'b1; end else begin div_A<= 16'd15; div_B<= 8'd5; start <= 1'b1; end 2: // if( done ) begin start <= 1'b0; i <= i + 1'b1; end else begin div_A<= 16'd1166; div_B<= 8'b11111011; //为了实验随手改成了负数 start <= 1'b1; end 3: // if( done ) begin start <= 1'b0; i <= i + 1'b1; end else begin div_A<= 16'd11665; div_B<= 8'd123; start <= 1'b1; end 4: // if( done ) begin start <= 1'b0; i <= i + 1'b1; end else begin div_A<= 16'd11665; div_B<= 8'd1; start <= 1'b1; end 5: // if( done ) begin start <= 1'b0; i <= i + 1'b1; end else begin div_A<= 16'd15; div_B<= 8'd3; start <= 1'b1; end 6: // if( done ) begin start <= 1'b0; i <= i + 1'b1; end else begin div_A<= 16'b1111_1111_1111_0001; //为了实验随手改成了二进制负数 div_B<= 8'b1111_1110; start <= 1'b1; end 7: begin i <= 4'd7; end endcase end endmodule