Each array is listed horizontally, has n columns and a variable number of rows. Every array has the same n number of columns. There is a blank column between each array.

enter image description here

They can be combined using vstack(array1,array2,etc...) but this requires knowing how many arrays there are. Is there a way to vstack all of the arrays where the number of arrays is determined by whether there is a column heading populated in the first column of each array?

So if cell A1 (column heading) is not blank, there is an array in A:C. If E1 is not blank, then there is an array in E:G, etc... Since M1 is empty in the above example data, there are no more arrays. The number of rows varies with each array but is not more than 1000. The data within the array could be anything including some empty cells. However each row of data has at least one populated cell.

Replacing the n/a and errors with blanks is OK too.

Using only an excel formula, no VBA please. Possible solutions could be reduce() or a recursive lambda function, which is probably the answer, splitting the large range of data into the smaller arrays one at a time and stacking them until there are no smaller arrays left to stack. Working to find it as a solution.

If there are 3 arrays, then all three should be vstacked. If there are 10, then all ten, etc. The final array should not change the order of the rows or be data dependent. The column headings will be there and will be the same.

2

There are 2 best solutions below

4
Jos Woolley On

Try:

=LET(
    ζ, A1:AQ1000,
    ξ, LAMBDA(φ, TAKE(WRAPROWS(TOCOL(IF(ζ <> "", φ, NA()), 2), 10), , 1)),
    SORTBY(WRAPROWS(TOCOL(ζ, 1), 10), ξ(COLUMN(ζ)), 1, ξ(ROW(ζ)), 1)
)

Amend the last column referenced (AQ here) as required.

Note that this set-up may fail if any of the 'blanks' are in fact null strings (""), e.g., as a result of formulas in those cells.

5
David Leal On

Here a possible solution, that works for more than one empty columns, and the group of columns doesn't have to be three. It doesn't transform the input data to remove cell with errors (#N/A, #DIV/0!, etc.) and it considers each group can have a different number of rows.

=LET(in,A1:K11, h,TAKE(in,1), data,DROP(in,1), m,COLUMNS(in),
  idx, FILTER(SEQUENCE(,m), LEN(h)), hUx,UNIQUE(FILTER(h,LEN(h)),1),
  cData,CHOOSECOLS(data,idx), cols,COLUMNS(idx),size,COLUMNS(hUx), gr,cols/size,
  LastNonEmptyRow, LAMBDA(x,XMATCH(2,1/(x<>""),-1,-1)),
  skip, MAX(TOROW(BYCOL(cData, LAMBDA(x, LastNonEmptyRow(x))),2)),
  wrap, WRAPROWS(TOCOL(cData),size),
  sort, SORTBY(wrap,TOCOL(SEQUENCE(ROWS(wrap)/gr) + SEQUENCE(,gr,0,skip))),
  filter, BYROW(sort, LAMBDA(x, AND(IFERROR(x="",FALSE)))),
  VSTACK(hUx,FILTER(IF(sort=0,"",sort), NOT(filter))))

Here is the output: output of the first scenario

The input range (in) can have empty rows, it will be cleaned up at the end. It only requires to include in the input range in all non empty rows for each groups. The formula only depends on a single range (in, easier to maintain it), the rest of the information is taken from in via TAKE and DROP functions.

The name idx, identifies non empty columns, looking for empty cells in the header (h). Since we are using LEN it covers empty cells as a result of a formula which returns an empty string (for example =""). The name hUx, has the columns of our interest (in our sample: {"a","b","c"}). The name cData (clean data) has the input data with the empty columns (group delimiter) removed. The name cols, the total number of columns (after removing empty columns). The name size the number of columns of each group header (i.e. 3). . The name gr, represents the number of column group we have (i.e. 3). The name wrap, wraps the input only considering non empty columns by size columns.

The data of wrap, doesn't have the information sorted in the way we want it, because we want to stack all the rows of the same group of columns one after another and WRAPROWS does it by row. The second input argument of SORTBY ensures the rows are sorted properly:

TOCOL(SEQUENCE(ROWS(wrap)/gr) + SEQUENCE(,gr,0,skip))

Note: Maybe safer to avoid repeated values in the sequence to be generated replace skip by gr*skip, this will ensure the sequence doesn't overlap, so far it works in both of the scenario tested, but just in case. It may also work just to use ROWS(in) as skip value which doesn't require additional calculation, but I came up to this later.

To obtain the skip amount, we calculate the maximum number of non empty rows in cData, via user LAMBDA function LastNonEmptyRow iterating over all input columns via BYCOL. To ensure MAX doesn't return an error we need to remove error values from the output of BYCOL, we use for that TOROW with the input argument ignore=2.

To find the row of the last non empty value in a column we take the idea from here: Get value of last non-empty cell, see section Position of the last value. We just adapted it to use XMATCH which allows reverse search (more efficient in case there are more valid values than empty rows for a large dataset), instead of LOOKUP (we don't want the cell value, instead the position) and ROW (which is row specific). This approach covers most of the cases, since we have cells with error values and considering also the case of empty strings as a result of a formula:

 LastNonEmptyRow, LAMBDA(x,XMATCH(2,1/(x<>""),-1,-1))

The sort name has our desired order. Now we need to remove empty rows. We use for that FILTER. The condition: AND(IFERROR(x="",FALSE) uses IFERROR to consider cell with error values. We can also uses: AND(IFERROR(LEN(x),1)=0). Therefore filter name is TRUE for empty rows.

Finally, we use VSTACK to return the expected result concatenating the header with the calculated data. The condition: IF(sort=0,"",sort) ensures to transform zeros into empty cells. This step is not required, if you configure Excel properly on how to treat empty cells. Check my answer to the following question: How can I return blank cells as it is instead of printing as Zeros while using filter function.

Here is the output for the second test scenario provided by the OP returning the correct answer now (my first approach didn't consider the skip amount, and failed for this scenario): output of the second scenario

I haven't tested all the scenario, please test it and let me know. Thanks

Update

The following is a simplified version, based on the Note above:

=LET(in,A1:K11, h,TAKE(in,1), data,DROP(in,1),lh,LEN(h),
  idx, FILTER(SEQUENCE(,COLUMNS(in)),lh), hUx,UNIQUE(FILTER(h,lh),1),
  cData,CHOOSECOLS(data,idx), size,COLUMNS(hUx), gr,COLUMNS(idx)/size,
  wrap, WRAPROWS(TOCOL(cData),size),
  sort, SORTBY(wrap,TOCOL(SEQUENCE(ROWS(wrap)/gr) + SEQUENCE(,gr,0,ROWS(in)))),
  filter, BYROW(sort, LAMBDA(x,AND(IFERROR(x="",FALSE)))),
  VSTACK(hUx,FILTER(IF(sort=0,"",sort), 1-filter)))