I am trying to migrate GF(2m) binary polynomial multiplication code from python code to Verilog
ef bit_to_list(t, n):
#from int to binary list
S = [0 for i in range(n)]
i = -1
while t != 0:
S[i] = t % 2
t = t >> 1
i -= 1
return S
def list_to_bit(S):
"""
Converts the binary number in the list to an integer
Args:
S (list): A list of 0s and 1s, representing a binary number
Returns:
int: indicates the corresponding integer value
"""
n = len(S)
result = 0
for i in range(n):
result = (result << 1) + S[i]
return result
A=0x17a5b3f9ec1
B=0x71a3b6723027362b89d4ad344f678e139a023b693
C = [0 for _ in range(203)]
X=bit_to_list(A,41)
Y=bit_to_list(B,163)
for i in range(41):
for j in range(163):
C[i+j]=C[i+j]^(X[i]&Y[j])
print(hex(list_to_bit(C)))
However, after I implemented the verilog code in the following way, the result of modelsim simulation output was wrong, and the number of logic units consumed was obviously too small. I could not solve this problem. I hope you could give me some help, I will be very grateful.
module multiplier_41x163_direct(
input clk,
input rst_n,
input [40:0] A,
input [162:0] B,
output [202:0] C);
reg [202:0] out_reg;
integer i,j;
assign C=out_reg;
always @(posedge clk or negedge rst_n) begin
if(~rst_n) begin out_reg='b0; end
else begin
for(i=0;i<=40;i=i+1) begin
for(j=0;j<=162;j=j+1) begin
out_reg[i+j]=C[i+j]^(A[i]&B[j]);
end
end
end
end
endmodule
this is the result of verilog simulation

this is the result of python simulation
This is the algorithm I use to multiply polynomials pic1

I'm not super familiar with Verilog but I think I can at least point out what the problem is here.
In the python things work differently in terms of loops compared to Verilog.
Here is a quote from NANDLAND:
"For loops are an area that new hardware developers struggle with. You have likely seen for loops dozens of times in C, so you think that they are the same in Verilog and VHDL. Let me be clear here: For loops do not behave the same way in hardware as in software.
For loops in synthesizable code are used to expand replicated logic. They are simply a way of shrinking the amount of code that is written by the hardware designer."
This may be an oversimplification but your loop is happening all at the same time. Each branch of the loop is run at the same time, its not sequential.
So this part in python:
When i=0 and j=1 we are writing to c[1]. We then reuse that result when we get to i=1 and j=0 as we use the result c[1]. In this way the results of previous iterations are reused.
In your verilog you are just wiring inputs and outputs together in a weird way. When you write
out_reg[i+j]=C[i+j]^(A[i]&B[j]);you are not reusing the result C[i+j] from previous iterations you are using it as it was to being with.