How do I write Verilog that forces yosys/nextpnr to output a manually designed logic tile?


I want to create a very compact parallel to serial shift register.

I have manually designed a logic tile.

I want yosys/nextpnr to do nothing but the routing between this tile and the I/O pins.

I have written the code using yosys primitives (SB_LUT4, SB_CARRY, SB_DFF), but nextpnr fails to merge each LUT with its carry into the same logic cell.

Here is the code:


module top (
    output PIN_21, PIN_22, PIN_23, PIN_24, USBPU,
    input CLK, PIN_1, PIN_2, PIN_3, PIN_4, PIN_5, PIN_6, PIN_7, PIN_8, PIN_9, PIN_10, PIN_11, PIN_12, PIN_13
);

    wire [12:0] loop;
    wire [12:0] carry;

    // MyCell port order: O, CO, I0, I1, I2, I3, CI, CLK
    MyCell #(.LUT_INIT(16'h0F0F)) sRegBorder(loop[0], carry[0], 1'b0, loop[0], PIN_13, 1'b0, 1'b0, CLK);
    MyCell #(.LUT_INIT(16'hFAFA)) sRegA(loop[1], carry[1], loop[0], loop[1], PIN_13, PIN_1, carry[0], CLK);
    MyCell #(.LUT_INIT(16'hFAFA)) sRegB(loop[2], carry[2], loop[1], loop[2], PIN_13, PIN_2, carry[1], CLK);
    MyCell #(.LUT_INIT(16'hFAFA)) sRegC(loop[3], carry[3], loop[2], loop[3], PIN_13, PIN_3, carry[2], CLK);
    MyCell #(.LUT_INIT(16'hFAFA)) sRegD(loop[4], carry[4], loop[3], loop[4], PIN_13, PIN_4, carry[3], CLK);
    MyCell #(.LUT_INIT(16'hFAFA)) sRegE(PIN_24, carry[5], loop[4], PIN_24, PIN_13, PIN_5, carry[4], CLK);
    // Forward carry[5] (on I3) to PIN_22: 'hFF00 gives O = I3 ('hFFFF would be a constant 1).
    SB_LUT4 #(.LUT_INIT(16'hFF00)) sRegFin (.O(PIN_22), .I0(1'b0), .I1(1'b0), .I2(1'b0), .I3(carry[5]));

endmodule

// One logic cell: LUT + carry + DFF. The carry operands are the same nets
// as LUT inputs I1 and I2, which is what lets LUT and carry share a cell.
module MyCell(output O, CO, input I0, I1, I2, I3, CI, CLK);
    parameter [15:0] LUT_INIT = 0;
    wire lo;
    SB_LUT4 #(.LUT_INIT(LUT_INIT)) lut (.O(lo), .I0(I0), .I1(I1), .I2(I2), .I3(I3));
    SB_CARRY cr (.CO(CO), .I0(I1), .I1(I2), .CI(CI));
    SB_DFF dff (.Q(O), .C(CLK), .D(lo));
endmodule
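Each LUT_INIT value above is a 16-entry truth table. The Python helper below is a sketch. The name `lut_init` is hypothetical and is not part of yosys or nextpnr. It builds a LUT_INIT from a Boolean function, assuming that bit k of LUT_INIT is the output when {I3, I2, I1, I0} = k, which is how the SB_LUT4 simulation model shipped with yosys indexes it. The output shows that 'hFAFA computes I0 | I2 and 'h0F0F computes !I2. It also shows that a LUT forwarding its I3 input unchanged needs 'hFF00, whereas 'hFFFF is a constant 1.

```python
from typing import Callable

# Hypothetical helper: assumes LUT_INIT bit k is the output for
# I0 = k & 1, I1 = (k >> 1) & 1, I2 = (k >> 2) & 1, I3 = (k >> 3) & 1.
def lut_init(f: Callable[[int, int, int, int], int]) -> int:
    """Return the 16-bit SB_LUT4 LUT_INIT for O = f(I0, I1, I2, I3)."""
    init = 0
    for k in range(16):
        i0, i1, i2, i3 = k & 1, (k >> 1) & 1, (k >> 2) & 1, (k >> 3) & 1
        if f(i0, i1, i2, i3) & 1:
            init |= 1 << k
    return init

if __name__ == "__main__":
    examples = {
        "!I2 (sRegBorder)": lambda i0, i1, i2, i3: 1 - i2,
        "I0 | I2 (sRegA..E)": lambda i0, i1, i2, i3: i0 | i2,
        "pass I3 through": lambda i0, i1, i2, i3: i3,
        "constant 1": lambda i0, i1, i2, i3: 1,
    }
    for name, fn in examples.items():
        # Print in Verilog hex-literal style, e.g. 'h0F0F
        print(f"{name:20s} 'h{lut_init(fn):04X}")
```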

The expected result is to have just one tile with a stack of 7 LUTs.

* PIN_13 should be connected to I2 of the first 6 LUTs.

* PIN_[1-6] should be connected to I3 of the first 6 LUTs, respectively.

* every output of the first 6 LUTs should be buffered (DFF), and the buffered output should loop back to I1 of the same LUT.

* every output of the first 5 LUTs should also be routed to I0 of the next LUT in sequence.

* the carry logic should be enabled and flow through the first 6 LUTs, and at LUT 7 it should be captured as an output.
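In an iCE40 logic cell, the carry logic takes its two operands from the LUT's I1 and I2 inputs, plus the carry-in. An SB_CARRY can therefore only share a cell with an SB_LUT4 whose I1/I2 nets match the carry's I0/I1.

The sketch below is a simplified model of that check. It is not nextpnr's actual packing code, and the names `Lut`, `Carry`, `can_share_lc` and `question_cells` are hypothetical. The model is applied to the six MyCell instances described above. Under this model every LUT/carry pair passes, so the wiring itself does not rule out the intended one-cell-per-stage packing.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Lut:      # net names on the SB_LUT4 inputs
    i0: str
    i1: str
    i2: str
    i3: str

@dataclass
class Carry:    # net names on the SB_CARRY inputs
    i0: str
    i1: str
    ci: str

def can_share_lc(lut: Lut, carry: Carry) -> bool:
    """True if the carry's operands are exactly the LUT's I1/I2 nets.

    SB_CARRY is symmetric in I0/I1, so either order is accepted.
    Simplified model: constants and other packer details are ignored.
    """
    return sorted((carry.i0, carry.i1)) == sorted((lut.i1, lut.i2))

def question_cells() -> List[Tuple[Lut, Carry]]:
    """The question's MyCell instances as (LUT, carry) pairs.

    MyCell wires the LUT to (I0, I1, I2, I3) and the carry to (I1, I2, CI).
    """
    cells = [(Lut("1'b0", "loop[0]", "PIN_13", "1'b0"),
              Carry("loop[0]", "PIN_13", "1'b0"))]          # sRegBorder
    for i in range(1, 6):                                   # sRegA .. sRegE
        q = f"loop[{i}]" if i < 5 else "PIN_24"             # sRegE drives PIN_24
        cells.append((Lut(f"loop[{i - 1}]", q, "PIN_13", f"PIN_{i}"),
                      Carry(q, "PIN_13", f"carry[{i - 1}]")))
    return cells

if __name__ == "__main__":
    for lut, carry in question_cells():
        print(lut.i1, can_share_lc(lut, carry))
```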

The result I got from yosys looks OK, but nextpnr scatters the LUTs all over the place and allocates separate LUTs for the carries, doubling the number of LUTs used.

So basically, if I know the output I want, at least down to a specific tile configuration, what should I write as input?

I am compiling the code for a TinyFPGA BX.


I believe the answer is that what you are trying to do currently cannot be done. I am sorry for this negative answer; I need tight packing too, so I hope someone can prove me wrong.

I did some investigating: I downloaded the newest yosys and nextpnr (and arachne-pnr) and more or less copied your design. While ICECube2 gave the result I expected, neither yosys/arachne-pnr nor yosys/nextpnr-ice40 packed it that tightly.

/* Just a toy implementation of a shift register, not usable in itself. Written
 * to explore how arachne-pnr and nextpnr-ice40 handle SB_CARRY. The design
 * has not been simulated, so it is probably wrong.
 * 
 * Main results: Neither arachne-pnr nor nextpnr-ice40 are particularly intelligent
 * when it comes to SB_CARRY packing.
 * 
 * Details:
 * 
 * When SHOULD_WORK is defined
 * ===========================
 * (a) yosys/arachne-pnr uses too many LUTs: 12
 * (b) yosys/nextpnr uses too many LUTs:     10
 * (c) Lattice ICECube2:                      8 (as was expected)
 * 
 * Not satisfied, I tried to use an internal cell definition, ICESTORM_LC.
 * Worse and worse...
 * 
 * When SHOULD_WORK is not defined
 * ===============================
 * (d) yosys/arachne-pnr uses too many LUTs: 12
 * (e) yosys/nextpnr uses too many LUTs:     14
 *   
 * Commands used to produce (a) and (d):
 * yosys -p "synth_ice40 -blif hardware.blif" -q other.v
 * arachne-pnr -d 1k -P vq100 -p other.pcf -o hardware.asc  hardware.blif
 *
 * Commands used to produce (b) and (e):
 * yosys -p 'synth_ice40 -top top -json other.json' other.v  
 * nextpnr-ice40 -v --hx1k --json other.json --pcf other.pcf --asc other.asc
 *
 * Versions of programs used:
 * nextpnr-ice40 -- Next Generation Place and Route (git sha1 b863690).  (I pulled the 
 *     code today, 2019.12.05)
 * arachne-pnr 0.1+328+0 (git sha1 c40fb22, g++ 5.4.0-6ubuntu1~16.04.12 -O2)
 * Yosys 0.9+932 (git sha1 fcce940, clang 3.8.0-2ubuntu4 -fPIC -Os)
 * 
 */
module top 
  (
   output      PIN_22, // Combinatorial, high if shiftreg busy
   output      PIN_24, // Shift register output. 
   input       CLK,
   input       PIN_13, // Load
   input [5:1] PIN
   );
   wire [5:0] loop, cy;

   assign cy[0] = 1'b0;

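// `ifdef only checks whether the macro is defined, not its value:
// comment out this `define to build the ICESTORM_LC variant below.
`define SHOULD_WORK 1
`ifdef SHOULD_WORK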
   wire [5:0] cmb_loop;
   SB_LUT4 #(.LUT_INIT(16'haaaa)) l_border   ( .O(cmb_loop[0]),   .I3(1'b0),  .I2(1'b0), .I1(1'b0     ), .I0(PIN_13) );   
   SB_LUT4 #(.LUT_INIT(16'hcaca)) l_sh [4:0] ( .O(cmb_loop[5:1]), .I3(PIN_13),  .I2(1'b1), .I1(loop[4:0]), .I0(PIN) );
   SB_CARRY                       l_cy [4:0] ( .CO(cy[5:1]),    .CI(cy[4:0]), .I1(1'b1), .I0(loop[4:0])           );
   SB_DFF                         r_sh [5:0] ( .Q(loop), .C(CLK), .D(cmb_loop) );
`else
   wire       cmbloop0;
   SB_LUT4 #(.LUT_INIT(16'haaaa)) l_border ( .O(cmbloop0), .I3(1'b0), .I2(1'b0), .I1(1'b0 ), .I0(PIN_13) );
   SB_DFF r_border( .Q(loop[0]), .C(CLK), .D(cmbloop0) );
   ICESTORM_LC #(.LUT_INIT(16'hcaca),
                 .CARRY_ENABLE(1),
                 .DFF_ENABLE(1))
   l_shcyreg [4:0] ( .I0(PIN),  
                     .I1(loop[4:0]),
                     .I2(1'b1),
                     .I3(PIN_13),
                     .CIN(cy[4:0]),
                     .CLK(CLK),
                     .O(loop[5:1]),
                     .COUT(cy[5:1]));   
`endif

   SB_LUT4 #(.LUT_INIT(16'hff00)) l_empty    ( .O(PIN_22), .I3(cy[5]), .I2(1'b0), .I1(1'b0), .I0(1'b0));

   assign PIN_24 = loop[5];
endmodule

/* What I hope the code above describes: a total of 7 SB_LUTs,
 * plus 1 SB_LUT to generate 1'b1.
 *                    ___
 *                   |I0 |
 *                   |I1 |---------------------- PIN_22
 *                   |I2 |
 *                +--|I3_|
 *                |  FF00
 *                | cy[5]         
 *               /y\              
 *               |||  ___         
 *        PIN_5 -(((-|I0 |     _  
 *      +--------+((-|I1 |----| |--- loop[5] --- PIN_24  
 *      |     1 --(+-|I2 |    >_|  
 * +----(---------(--|I3_|         
 * |    |         |  AACC          
 * |    +---------(---------------+
 * |PIN_13        | cy[4]         |  loop[4]
 * 
 *            ::::::::::::::
 * 
 * |    |         |
 * |    +---------(---------------+
 * |              | cy[1]         |
 * |             /y\              |
 * |             |||  ___         |
 * |      PIN_1 -(((-|I0 |     _  |
 * |    +--------+((-|I1 |----| |-+  loop[1]
 * |    |     1 --(+-|I2 |    >_|  
 * +----(---------(--|I3_|         
 * |    |         |  AACC
 * |    |         0 cy[0]
 * |    |
 * |    +-------------------------+
 * |                  ___         |
 * +-----------------|I0 |     _  |
 * |                 |I1 |----| |-+  loop[0]
 * |                 |I2 |    >_|  
 * |                 |I3_|         
 * |                 AAAA   
 * | PIN_13         
 */
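Every SB_CARRY in the SHOULD_WORK branch has I1 tied to 1'b1, so the chain reduces to a wide OR. SB_CARRY computes CO = (I0 & I1) | ((I0 | I1) & CI), as in yosys' ice40 simulation model. With I1 = 1, each stage gives CO = I0 | CI. Starting from cy[0] = 0, cy[5] (and hence PIN_22) is high whenever any of loop[0] to loop[4] is high, which matches the "high if shiftreg busy" comment.

Below is a small Python check of that reduction. It assumes 0/1 integers for signal values, and `sb_carry` and `busy` are hypothetical helper names.

```python
from itertools import product
from typing import Sequence

def sb_carry(i0: int, i1: int, ci: int) -> int:
    """Carry-out of one SB_CARRY, following yosys' ice40 simulation model."""
    return (i0 & i1) | ((i0 | i1) & ci)

def busy(loop: Sequence[int]) -> int:
    """cy[5] of the chain l_cy[4:0]: I0 = loop[i], I1 = 1'b1, cy[0] = 1'b0."""
    cy = 0
    for bit in loop[:5]:  # only loop[4:0] feed the carry chain
        cy = sb_carry(bit, 1, cy)
    return cy

if __name__ == "__main__":
    # Exhaustively confirm that the chain equals loop[0] | ... | loop[4].
    for bits in product((0, 1), repeat=5):
        assert busy(bits) == int(any(bits))
    print("cy[5] is the OR of loop[0..4] for all 32 inputs")
```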

This is a very vexing problem, which seems to reside in nextpnr/ice40/pack.cc around line 192. I am sorry I can't help, but perhaps this input can be used to improve packing in nextpnr-ice40.