https://drive.google.com/open?id=1aQkJYojDNMjNjJYlggxbkTq-KmzALDDb
I have this file (citations.dmp) and im tryin' to insert the data separed by | into a mysql database using the following code:
import MySQLdb
file = open('citations.dmp', 'r').readlines()
list = []
for x in file:
a = str(x.replace('\t', ''))
a = str(a).split('|')
a.pop(len(a) - 1)
list.append(a)
db = MySQLdb.connect(
host='127.0.0.1',
user='root',
passwd='',
db='tururu'
)
c = db.cursor()
print('Inserting...')
query = """ INSERT INTO `citations` (`cit_id`,`cit_key`,`pubmed_id`,`medline_id`,`url`,`text`,`taxid_list`)
VALUES (%s,%s,%s,%s,%s,%s,%s)
"""
c.executemany(query, list)
db.commit()
db.close()
the table has the following format:
CREATE TABLE `citations` (
`cit_id` VARCHAR(200) NULL,
`cit_key` VARCHAR(200) NULL,
`pubmed_id` VARCHAR(200) NULL,
`medline_id` VARCHAR(200) NULL,
`url` LONGTEXT NULL,
`text` LONGTEXT NULL,
`taxid_list` LONGTEXT NULL);
For some reason i recieve the following error everytime:
Traceback (most recent call last):
File "C:/Users/lucas/PycharmProjects/bruno/tst.py", line 27, in <module>
c.executemany(query, list)
File "C:\ProgramData\Anaconda3\lib\site-packages\MySQLdb\cursors.py", line 281, in executemany
self._get_db().encoding)
File "C:\ProgramData\Anaconda3\lib\site-packages\MySQLdb\cursors.py", line 306, in _do_execute_many
v = values % escape(arg, conn)
TypeError: not all arguments converted during string formatting
Can you help me? i'm in the last 3 days trying to fix it.
You have a problem on this line in your import file:
it splits into 9 fields. Your format string for the insert query expects only 7. You need to add some validation in the file parsing to make sure the sub-lists that you push into the big import list (
list
variable in your code) always have exactly 7 elements, and deal with the exceptions accordingly - if the line has too few columns, either ignore it or populate with defaults, if it has too many, decide which ones are correct.I believe in this particular case the problem is caused by the incorrect assumption that the delimiter
|
can never appear in the string, as it does infootnote ||
. You can solve it by adding some regex logic to filter out such abnormalities.