Import unstructured Excel file to database

2.3k Views Asked by At

I have to import data from Excel files into the database. The structure of these files does not match with the structure of the model, so I guess I need to do some kind of data-manipulation to arrange the tuples accordingly.

The files I have to import look like this: excel file

The django-model has the following attributes: Country, Commodity, Year, Value

So what would be the best way to read the data, arrange it in the correct structure and import it into the database (preferably with automatic updates of existing tuples).

I spent a lot of time researching the existing python- and django-libraries for this requirements (like PyExcel, Pandas, Django-Excel, Django-Import-Export), but I couldn't really find out which is the best and if it supports the rearrangement of the data before importing.

I hope you can give me some recommendations and solutions for this task.

2

There are 2 best solutions below

2
On

If your data is an Excel file, it's structured. The nature of a table is to structure data. Unstructured data would be something like a text file or a directory of text files.

You can use Python's pandas library to import Excel files, manipulate them, and upload them to SQL databases.

import pandas as pd

df = pd.read_excel('tmp.xlsx')
df2 = pd.melt(df, id_vars = ['Country', 'Commodity'], var_name = 'Year', value_name = 'Value')
df2

input:

    Country Commodity  2009  2010  2011  2012  2013
0  Austria    Com. 1     1     1     1     1     1
1  Austria    Com. 2     2     2     2     2     2
2  Belgium    Com. 1     3     3     3     3     3
3   France    Com. 1     4     4     4     4     4

output:

   Country Commodity  Year  Value
0  Austria    Com. 1  2009      1
1  Austria    Com. 2  2009      2
2  Belgium    Com. 1  2009      3
3   France    Com. 1  2009      4
4  Austria    Com. 1  2010      1

If you have many Excel files, you can use glob or os.walk to iterate through/over a directory, and import some or all of the Excel files.

You can use pandas DataFrame.to_sql to upload your data to an SQL database, which requires defining the database connection (server address, login/pw, etc.).

0
On

This is a example for insert with one foreign key.

    models.py

        from django.db import models

        class Table1(models.Model):
            system = models.CharField(max_length=383)

        class Table2(models.Model):
            name = models.CharField(max_length=383)
            system = models.ForeignKey(Table1, blank=True, null=True, on_delete=models.CASCADE)

You need create a modelResource and override before_import_row for insert data in table1. And you can change name for fields and create tables and customize. 'column_name' is the name of field in excel data, and attribute is name of field in database. And in method nefore_import you can insert data in tables and asign foreignkeys.

And set ModelResource to admin class.

    admin.py
        from .models import Table1,Table2
        from import_export import resources,widgets
        from import_export.fields import Field
        from django.contrib import admin


        class Table2Resource(resources.ModelResource):
            name = Field(column_name='name',attribute='name')
            system_id = Field(column_name='system', attribute='system_id', widget=widgets.ForeignKeyWidget(Table1))

            class Meta:
                model = Table2
                skip_unchanged = True
                fields = ('name','system_id')

            def before_import_row(self,row, **kwargs):
                value = row['system']
                obj = Table1.objects.create(system = value) #create object place
                row['system'] = obj.id # update value to id ob new object


        @admin.register(Table2)
        class Table2Admin(ImportExportModelAdmin,admin.ModelAdmin):
            resource_class = Table2Resource

This is all.