What is the best solution for handling a LARGE dataset?

I have text data broken down into multiple .txt files, which add up to about 100 GB in total. The files contain nothing more than ID pairs:

    uniqID1 uniqID2
    ...

If I want to calculate things like 1) the number of unique IDs, and 2) the list of other IDs that uniqID1 is linked to, what is the best solution? How do I load these files into a database?

Thank you!
So if you had a table with the following columns:

    id1
    id2

with about five billion rows in the table, and you wanted quick answers to questions such as:

    how many unique IDs are there?
    which IDs is uniqID1 linked to?
then a relational database (one that supports SQL) and a table with an index on id1 and another index on id2 would do what you need. SQLite would do the job.
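For instance, here is a minimal sketch using Python's built-in sqlite3 module; the database file pairs.db and the table name pairs are placeholders, not anything from the question:

    import sqlite3

    conn = sqlite3.connect("pairs.db")
    cur = conn.cursor()

    # One row per ID pair, with an index on each column.
    cur.execute("CREATE TABLE IF NOT EXISTS pairs (id1 TEXT, id2 TEXT)")
    cur.execute("CREATE INDEX IF NOT EXISTS idx_id1 ON pairs (id1)")
    cur.execute("CREATE INDEX IF NOT EXISTS idx_id2 ON pairs (id2)")

    # 1) How many unique IDs are there?
    #    UNION deduplicates across both columns.
    cur.execute("""
        SELECT COUNT(*) FROM (
            SELECT id1 AS id FROM pairs
            UNION
            SELECT id2 FROM pairs
        )
    """)
    print("unique IDs:", cur.fetchone()[0])

    # 2) Which IDs is uniqID1 linked to?
    #    Look in both directions, since uniqID1 may appear in either column.
    cur.execute("""
        SELECT id2 FROM pairs WHERE id1 = ?
        UNION
        SELECT id1 FROM pairs WHERE id2 = ?
    """, ("uniqID1", "uniqID1"))
    print("linked to uniqID1:", [row[0] for row in cur.fetchall()])

    conn.close()

The second query is fast thanks to the two indexes; the unique-count query still has to scan both columns, so expect it to take a while at five billion rows.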
EDIT: to import them, it would be best to separate the two values with some character that never occurs in the values, like a comma or a pipe character or a tab, one pair per line:

    uniqID1,uniqID2
    uniqID3,uniqID4
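A rough import sketch, again with Python's sqlite3; the "data/*.txt" glob pattern is hypothetical, and a tab delimiter is assumed:

    import csv
    import glob
    import sqlite3

    conn = sqlite3.connect("pairs.db")
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS pairs (id1 TEXT, id2 TEXT)")

    # "data/*.txt" is a placeholder for wherever your files live.
    for path in glob.glob("data/*.txt"):
        with open(path, newline="") as f:
            # One tab-separated pair per line; csv streams rows lazily,
            # so the 100 GB never has to fit in memory.
            reader = csv.reader(f, delimiter="\t")
            cur.executemany("INSERT INTO pairs (id1, id2) VALUES (?, ?)", reader)
        conn.commit()  # one commit per file beats one per row

    conn.close()

With this much data it is usually faster to create the two indexes after the bulk load rather than before, so each insert doesn't have to update them.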
EDIT2: You don't strictly need a relational database for this, but it doesn't hurt anything, and your data structure is more extensible if the DB is relational.