I currently have a mySQL database that I will be migrating to MongoDB. I have started writing a migration tool and I am curious if there is a more efficient way for me to reach my end goal.
The actual structure of the data is fairly complicated but I will attempt to use a simplified version to ask my questions. Say I have the following MySQL Tables:
Surveys
Survey_id
Title
CreateUser (fk)
Users
User_id
Fname
Lname
My Plan is to migrate these to MongoDB collections that basically match this structure. The foreign key would become a reference to the User
collection. I am currently planning on taking the following steps to do this:
SELECT Survey_id, Title FROM Surveys
- Insert this data into the Surveys collection in MongoDb
Select User_id, fname, lname FROM Users
- Insert into the users collection in MongoDB
SELECT CreateUser, SurveyId FROM Surveys
- Find every corresponding record in the
Users
collection based on the CreateUser id and insert the reference to theUsers
collection into the existingSurveys
document. - Remove the
Survey_id
field from everySurveys
document - Remove the
User_id
field form everyUsers
document
Is this a reasonable approach to take, or am I making things more complicated than they need to be?
If the data isn't huge, it makes sense to change the order of operations and do the mapping in RAM, i.e.
SELECT * FROM Users
(SQL id, MongoDB id)
to a hash tableSELECT * FROM Surveys
CreateUser
withHashtable[CreateUser]
Typically, this will be quite a bit faster because you don't need to update objects in mongodb and you won't have to query your data twice.
You should try to use batch inserts for MongoDB instead of inserting documents one-by-one. Instead of getting the newly created documents' ids from the database, you can assign the MongoDB primary key yourself. Otherwise the driver will do it anyway, not the database itself, so there's no real advantage in not doing so.
If the amount of data is huge (such that you can't keep the lookup tables in RAM), I'd try to stick to the lookups and process them subset-by-subset. That will be tricky if you have many foreign keys though.