Insert into database from an Excel file only rows that don't exist, using C#, EF, and SQL Server

1.9k Views Asked by At

I read an Excel file and insert that data into a database table, but every time I do this, it adds the existing rows plus the new data, I only want to insert the rows that aren't already in the table, my unique ID is the current time stamp.

For example, this is what happens currently when I do the first insert:

ExcelFile                           Database Table

a | b | date                        a | b | date
-----------                        ---------------
1 | 1 | 2018/02/12                  1 | 1 | 2018/02/12  
2 | 2 | 2018 /03/12                 2 | 2 | 2018 /03/12 

This happens when I do the second insert:

ExcelFile                           Database Table

a | b | date                        a | b | date
-----------                        ---------------
1 | 1 | 2018/02/12                  1 | 1 | 2018/02/12  
2 | 2 | 2018 /03/12                 2 | 2 | 2018 /03/12 
3 | 3 | 2018 /04/12                 1 | 1 | 2018/02/12
                                    2 | 2 | 2018 /03/12 
                                    3 | 3 | 2018 /04/12

I use Entity Framework to perform this and the ExcelDataReader package:

var result = reader.AsDataSet();

DataTable dt = new DataTable();
dt = result.Tables[0];      // here I store the data from the Excel file

foreach (DataRow row in dt.Rows)
{
    using (AppContext context = new AppContext())
    {
        Data data = new Data();
        string date = row.ItemArray[4].ToString();
        DateTime parseDate = DateTime.Parse(date);
        Data datos = new Data
                            {
                                a = row.ItemArray[0].ToString(),
                                b = row.ItemArray[1].ToString(),
                                c = row.ItemArray[2].ToString(),
                                d = row.ItemArray[3].ToString(),
                                e = parseDate
                            };
        context.Data.Add(datos);
        context.SaveChanges();
    }
}

Is there a way to filter the excel file or compare them?

I'm all ears.

3

There are 3 best solutions below

1
On BEST ANSWER

Check for an existing row before adding it. The below should be inserted below where you calculate parseDate.

var existingRow = context.Data.FirstOrDefault(d=>d.e == parseDate); //Note that the ".e" should refer to your "date" field
if (existingRow != null)
{
  //This row already exists
}
else
{
  //It doesn't exist, go ahead and add it
}
0
On

If "a" is the PK on the table and unique across rows then I would check the existence of existing rows by ID before inserting. Similar to Mike's response, though one consideration is if the table has a number of columns I would avoid returning the entity, but rather just an exists check using .Any()

if (!context.Data.Any(x => x.a == row.a)
  // insert the row as a new entity

The caveat here is if the excel file contains edits, existing rows where the data changes, this will not accommodate that.

For bulk import processes, I would typically approach these by first staging the excel data into a staging table first. (purging the staging table prior to each import) From there I would have entities mapped to staging tables, vs entities mapped to the "real" tables. If there is a "modified date" that can be extracted from the file for each record then I would also store the import date/time as a part of the application, so that when selecting the rows to import from the staging table, only get records where that modified date/time > the last import date/time. From there you can query data from the staging table in batches, and look for new records vs. existing modifications. I find querying entities on both side of the migration is more flexible than dealing with an in-memory block for the import. With small imports it may not be worthwhile, but for larger files where you will want to work with smaller sub-sets and filtering, it can make things easier.

0
On

I could perform exactly what I needed with the help of @MikeH With this only the rows with different DateTime were added (the DateTime its always an ascending value in my case.)

foreach (DataRow row in dt.Rows) // dt = my dataTable loaded with ExcelDataReader
                    {
                        using (AppContext context = new AppContext())
                        {
                            string date = row.ItemArray[4].ToString(); 
                            DateTime parseDate = DateTime.Parse(date); // I did a parse because the column "e" only accepted DateTime and not String types.

                            var existingRow = context.Data.FirstOrDefault(d => d.e == parseDate);
                            if (existingRow != null)
                            {
                                Console.WriteLine("Do Nothing");
                            }
                            else
                            {
                                Data datos = new Data
                                {
                                    a = row.ItemArray[0].ToString(),
                                    b = row.ItemArray[1].ToString(),
                                    c = row.ItemArray[2].ToString(),
                                    d = row.ItemArray[3].ToString(),
                                    e = parseDate
                                };
                                context.Data.Add(datos);
                                context.SaveChanges();
                            }
                        }
                    }