Understanding how to bulk insert a many-to-many using PeeWee ORM

I'm a C# developer trying to solve a problem with Python, so please bear with my non-Pythonic practices.

I'm trying to get my head around how to implement a bulk insert in a many-to-many relationship scenario, in order to create a shrunk-down analytical database (using SQLite3 for now) while normalizing repeating data. Here are the relevant model classes:

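from peewee import (AutoField, BigIntegerField, DateTimeField, ForeignKeyField,
                    IntegerField, ManyToManyField, Model, TextField)

# ServiceLocator, Level, File and Host are defined elsewhere (not shown here).

class BaseModel(Model):
    class Meta:
        database = ServiceLocator().getDatabase()
        legacy_table_names = False

# Declared before LogEntry because LogEntry.Contents refers to this class by name.
class ContentDetail(BaseModel):
    Id = AutoField()
    PropertyName = TextField(index=True)
    PropertyValue = TextField(null=True, default=None)
    ValueSizeInBytes = BigIntegerField()

class LogEntry(BaseModel):
    Id = AutoField()
    Level = ForeignKeyField(Level)
    File = ForeignKeyField(File, on_delete='CASCADE')
    Host = ForeignKeyField(Host)
    Date = DateTimeField()
    Contents = ManyToManyField(ContentDetail, backref="Entries")
    LineNumber = IntegerField()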

And here is a bit of the configuration that maps the relationship table, in startup.py:

def main() -> None:
    locator = ServiceLocator()
    logger = locator.getLogger()

    try:
        with locator.getDatabase() as db:
            ContentDetailLogEntry = LogEntry.Contents.get_through_model() #<-- please notice this
            db.create_tables([File, Level, Host, ContentDetail, LogEntry, ContentDetailLogEntry])
        LogService().FeedDatabase()
    except Exception:   
        logger.critical(traceback.format_exc())

if __name__ == "__main__":
    main()

To process the big log files more quickly, I devised a dictionary with the following "interface": a tuple of normalizable values as the key and the list of related log entry models as the value.

(propName: str, propVal: str, length: int) : list[LogEntry]

The dictionary is persisted automatically, but one row at a time, which makes the whole process run for several minutes on each file (about 14 minutes), and I'm facing around 200 files per batch.
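A minimal sketch of how a cache with that shape might be built, using only the standard library. `ParsedEntry` and `build_content_cache` are hypothetical names used for illustration (the real code keeps `LogEntry` models in the lists), and measuring the size as the UTF-8 byte length of the value is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ParsedEntry:
    """Hypothetical stand-in for a saved LogEntry row."""
    line_number: int
    properties: dict


def build_content_cache(entries):
    """Group entries by (property name, property value, value size in bytes)."""
    cache = defaultdict(list)
    for entry in entries:
        for name, value in entry.properties.items():
            # Assumption: the size is the UTF-8 byte length of the value.
            size = len(value.encode("utf-8")) if value is not None else 0
            cache[(name, value, size)].append(entry)
    return cache


entries = [
    ParsedEntry(1, {"user": "alice", "status": "200"}),
    ParsedEntry(2, {"user": "alice", "status": "500"}),
]
cache = build_content_cache(entries)
```

Each distinct tuple appears once as a key, so the number of keys is the number of `ContentDetail` rows a normalized database would need, while the list lengths add up to the number of link rows.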

(...)
    with self.__resolver.getDatabase().atomic():
        with open(fileEntity.FileOrigin, mode="r", encoding="utf-8") as logFile:
(...)
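    # Unpack the key rather than naming the loop variable "tuple", which shadows the builtin.
    for (name, value, size), entries in self.__contentCache.items():
        for entry in entries:
            # Each iteration issues separate INSERTs for the ContentDetail row and for the link row.
            detail = ContentDetail.create(PropertyName=name, PropertyValue=value, ValueSizeInBytes=size)
            entry.Contents.add(detail)
            self.__logger.debug(f"Created content for {entry.LineNumber}")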
(...)

I'm trying to figure out how to make the ORM understand the LogEntry and ContentDetail relationship according to the docs at http://docs.peewee-orm.com/en/latest/peewee/querying.html#inserting-rows-in-batches, but I honestly can't see how PeeWee would work out the log entry and content relationship without the one-by-one Contents.add. Any ideas?

Maybe my approach is all wrong. Any suggestions are welcome.
