django-import-export empty rows before csv header trigger exception while importing


While importing data from a CSV file, I found that the following error is raised whenever the first row is not the header, for example when empty rows come before it:

list indices must be integers or slices, not str


first_name,last_name,email,password,role
Noak,Larrett,[email protected],8sh15apPjI,Student
Duffie,Milesap,[email protected],bKNIlIWVfNw,Student

The import only works if the first row is the header:

first_name,last_name,email,password,role
Noak,Larrett,[email protected],8sh15apPjI,Student
Duffie,Milesap,[email protected],bKNIlIWVfNw,Student

...

I tried overriding before_import to remove any blank rows:

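def before_import(self, dataset, using_transactions, dry_run, **kwargs):
    # Collect the indexes of rows whose cells are all empty or whitespace.
    indexes = []
    for i in range(len(dataset)):
        # Convert each cell to text so None or numeric values do not break join().
        row = ''.join('' if cell is None else str(cell) for cell in dataset[i])
        if row.strip() == '':
            indexes.append(i)
    # Delete from the end so the remaining indexes stay valid.
    for index in sorted(indexes, reverse=True):
        del dataset[index]
    return dataset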

This works for every row except the first one. The first row must always contain the header, and if it does not, the error is thrown.
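A likely explanation, assuming the CSV backend takes the first parsed row as the header: Python's csv module returns an empty list for a blank line, so the header ends up empty and the real header row is treated as data. The sketch below uses only the standard library to illustrate that parsing behaviour. It does not call django-import-export, and the sample data is made up.

```python
import csv
import io

# Sample input with a blank line before the header, as in the question.
raw = "\nfirst_name,last_name,role\nNoak,Larrett,Student\n"

rows = list(csv.reader(io.StringIO(raw)))

# The blank line is parsed as an empty row, so a reader that takes
# rows[0] as the header gets no column names at all.
header = rows[0]
print(header)   # []
print(rows[1])  # ['first_name', 'last_name', 'role']
```

By the time before_import runs, the dataset has already been built this way, which would explain why removing rows there cannot repair the header.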

There is 1 best solution below

On BEST ANSWER

After hours of debugging, I found the ImportMixin class in import_export/admin.py.

The class contains a method called import_action that looks like this:

def import_action(self, request, *args, **kwargs):
    ...
    import_file = form.cleaned_data['import_file']
    ...
    data = tmp_storage.read(input_format.get_read_mode())
    ...
    dataset = input_format.create_dataset(data)
    ...

As you can see, this is the method that reads the uploaded file into a string before passing it to input_format.create_dataset(). So all I had to do was add a custom method that removes the blank lines and call it just before create_dataset():

data = self.remove_blanks(data)
dataset = input_format.create_dataset(data)

import_export/admin.py/ImportMixin

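# Requires "import os" at the top of admin.py.
def remove_blanks(self, data):
    # Keep only the lines that contain something other than whitespace.
    return os.linesep.join([s for s in data.splitlines() if s.strip()])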

This way the CSV data no longer contains any blank lines, so the first line is always the header, which solves the problem. I hope this is useful to anyone facing the same issue.
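The filtering step can be tried on its own, outside the admin class. The following is a minimal standalone sketch: remove_blank_lines is a hypothetical name, the sample data is made up, and it joins with "\n" instead of os.linesep purely to keep the example platform independent.

```python
def remove_blank_lines(data: str) -> str:
    """Drop lines that are empty or contain only whitespace."""
    return "\n".join(line for line in data.splitlines() if line.strip())


# Blank and whitespace-only lines before the header and between records.
raw = "\n  \nfirst_name,last_name\nNoak,Larrett\n\nDuffie,Milesap\n"

cleaned = remove_blank_lines(raw)
print(cleaned.splitlines()[0])  # first_name,last_name
```

Note that this works line by line, so a quoted CSV field that itself contains an empty line would also lose that line.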

UPDATE: An easier way to do the same is to override create_dataset in import_export/formats/base_formats.py

import_export/formats/base_formats.py/TablibFormat

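# Requires "import os" at the top of base_formats.py.
def create_dataset(self, in_stream, **kwargs):
    # Drop blank lines so the header is always the first line.
    in_stream = os.linesep.join([s for s in in_stream.splitlines() if s.strip()])
    try:
        return tablib.import_set(in_stream, format=self.get_title())
    except Exception:
        # Fall back to an empty dataset if the cleaned data cannot be parsed.
        return tablib.import_set('', format=self.get_title())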
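Changes made inside the installed package are lost on every upgrade. A possible alternative is to put the same filtering in a mixin and combine it with the stock format class in your own project. This is a sketch under two assumptions: that create_dataset receives the file contents as a str, and that your version of django-import-export lets the admin class choose which format classes it uses. SkipBlankLinesMixin and CleanFormat are hypothetical names, and FakeFormat is a stand-in so the sketch runs without Django. In a real project it would be replaced by the library's CSV format class.

```python
class SkipBlankLinesMixin:
    """Strip blank lines before delegating to the parent format class."""

    def create_dataset(self, in_stream, **kwargs):
        # Assumes in_stream is a str, as in the answer above.
        cleaned = "\n".join(
            line for line in in_stream.splitlines() if line.strip()
        )
        return super().create_dataset(cleaned, **kwargs)


class FakeFormat:
    """Hypothetical stand-in for the library's CSV format class."""

    def create_dataset(self, in_stream, **kwargs):
        # The real class builds a tablib Dataset; this one just splits lines.
        return in_stream.splitlines()


class CleanFormat(SkipBlankLinesMixin, FakeFormat):
    pass


result = CleanFormat().create_dataset("\n\nfirst_name,role\n\nNoak,Student\n")
print(result[0])  # first_name,role
```

Because the mixin only calls super(), the parent class keeps doing the actual parsing and the library source stays untouched.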