Searching, Sorting and Saving Large Spreadsheet

42 Views Asked by At

I'm working with a rather large excel spreadsheet and am trying to find the quickest way to manipulate the data. The spreadsheet has 31 columns and 15,000 rows. I need to extract around 15 columns of information. Is there a 'best practice when working with this much data? I'm not able to use databases (such as SQL) of any kind.

I thought about creating a List or Dictionary and putting the cell value into their own List/Dictionary based on the column and then iterating through those but I'm not sure how fast it would be or if it's worth trying.

List<string> columnOne = new List<string>();

foreach(DatagridViewColumn Column in DatagridView2.Column)
{
   if (Column[0].Value.toString().Contains("lastName")
    {
        foreach(DatagridViewRow row in DatagridView2.Rows)
        {
            columnOne.add(row.Value.toString());
        }
    }
}

Any help or guidance you can provide would be greatly appreciated.

1

There are 1 best solutions below

0
On BEST ANSWER

I would define a class to represent one row of your data, and then a list of that class instance, first.

Let's say, this is your entire set of columns:

FirstName | LastName | Phone | DOB | Address | Nationality | Age |

And that you're only interested in FirstName, LastName, and DOB.

So I would define a class like so:

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime DOB { get; set; }
}

Then create a list of that class instances.

private List<Person> people = new List<Person>();

Then, instead of iterating through columns first, and inside that loop iterating through rows, I'd go about it the other way round.

foreach(DatagridViewRow row in DatagridView2.Rows)
{
    Person p = new Person();
    p.FirstName = row[0];
    p.LastName = row[1];
    p.DOB = row[3];
    people.Add(p);
}

Obviously the above is sort of pseudo-code, and won't compile. It's just to show you the idea. This has the big advantage of having data related to one row as one item in your list, as opposed to creating separate lists for separate columns, in which case you'll have to go to extra lengths to find your related data later on.