Database insertion fails without error with scrapy

488 Views Asked by At

enter image description here

I'm working with scrapy and dataset (https://dataset.readthedocs.io/en/latest/quickstart.html#storing-data) which is a layer on top of sqlalchemy , trying to load data into a sqllite table as a follow up to Sqlalchemy : Dynamically create table from Scrapy item.

using the dataset package I have:

class DynamicSQLlitePipeline(object):

    def __init__(self,table_name):

        db_path = "sqlite:///"+settings.SETTINGS_PATH+"\\data.db"
        db = dataset.connect(db_path)
        self.table = db[table_name].table


    def process_item(self, item, spider):

        try:
            print('TEST DATASET..')
            self.table.insert(dict(name='John Doe', age=46, country='China'))
            print('INSERTED')
        except IntegrityError:
                print('THIS IS A DUP')
        return item

after running my spider I see the print statements printed out in the try except block, with no errors, but after completion , I look in the table and see the screenshot. No data is in the table. What am I doing wrong?

2

There are 2 best solutions below

3
On BEST ANSWER

The code you posted is not working as is for me:

TypeError: __init__() takes exactly 2 arguments (1 given)

That's because the __init__ method expects a table_name argument which is not being passed. You need to implement the from_crawler class method in the pipeline object, something like:

@classmethod
def from_crawler(cls, crawler):
    return cls(table_name=crawler.spider.name)

That would create a pipeline object using the spider name as table name, you can of course use any name you want.

Also, the line self.table = db[table_name].table should be replaced by self.table = db[table_name] (https://dataset.readthedocs.io/en/latest/quickstart.html#storing-data)

After that, the data is stored: enter image description here

1
On

Maybe some problems with the Db connection. Put your this snippet into a try except to check for the problem.

try:
   db_path = "sqlite:///"+settings.SETTINGS_PATH+"\\data.db"
   db = dataset.connect(db_path)
   self.table = db[table_name].table
except Exception:
   traceback.exec_print()