sed fails with many -e statements


I have huge files (12G, 5.9G, 1.1G, 57M) that I need to massage into submission so they will import successfully in MySQL Shell. I don't have a choice in how the files were created; they were zipped up and landed on my desk. I was getting good results with one or two one-off sed statements, but when I try with many, as you see below, it slows to a crawl and sometimes fails. Sadly, every one of these sed statements is needed. I am using powerful 40-core machines with 252G of RAM for this, so I don't believe the machine is the problem. Can someone recommend the best way to do this many search-and-replace operations? Maybe sed is the wrong tool for the job? Any suggestions would be greatly appreciated.

 ls -Sr ${_WORKING_DIR} | grep -P "^_insert_.*_.*\.sql$" \
 | grep -v "_indexlog\|_sessions" \
 | xargs grep -l "),(" \
 | while read -r F; do \
 sed -E -i \
 -e 's/\),\(/\n/g' \
 -e "s/','/', '/g" \
 -e "s/^INSERT\sINTO\s\`.*\`\sVALUES\s*\((.*)\);$/\1/g" \
 -e "s/, null/, NULL/g" \
 -e 's/,NULL,/, NULL,/g' \
 -e "s/NULL,'/NULL, '/g" \
 -e "s/NULL,NULL/NULL, NULL/g" \
 -e "s/,NULL$/, NULL/g" \
 -e "s/',NULL/', NULL/g" \
 -e "s/',1,'/', 1, '/g" \
 -e "s/',0,'/', 0, '/g" \
 -e "s/('.*',)([0-9])/\1 \2/g" \
 -e "s/^([0-9]{0,9},)('.*)$/\1 \2/g" \
 -e "s/^(.*',)[^\s]([0-9]{0,9}.*)$/\1 \2/g" \
 -e "s/,1,/, 1, /g" \
 -e "s/(.*[0-9]{1}),'/\1, '/g" \
 -e "s/([0-9]{1}),([0-9]{1})/\1, \2/g" \
 -e "s/NULL,([[:digit:]])/NULL, \1/g" \
 "$F"; \
 echo "File has been worked with sed: $F" >> ${_LOG_FILE}; \
 done

Example file content:

 INSERT INTO `sessions` VALUES ('6799bfac-e716-4a23-b5f3-ac4aac4811d1', '9Du3Bn3cNPmKVZqqDgRz', 'null', 'null', '1', '2023-04-25',null, NULL),('f9f30fe2-d88c-4420-afc0-769fc1f745fb', '9Du3Bn3cNPmKVZqqDgRz', 'null', 'null','1', '2023-04-25', null, '{HUGE_JSON_ENTRY}');
 INSERT INTO `sessions` VALUES ('1b9b0452-1c53-4466-92b2-c3373ce4ab67', '9Du3Bn3cNPmKVZqqDgRz', null, 'null', '1', '2023-04-25',null, 'null'),('1ef7d5d7-b795-4e4c-a29d-e16ba880a522', '9Du3Bn3cNPmKVZqqDgRz', 'null','null', '1', '2023-04-25', null, '{HUGE_JSON_ENTRY}');
 INSERT INTO `sessions` VALUES ('d6529933-0e18-426c-8793-9d1711d0c0aa', '4EBdY2RT5xC9fc9rfqnz', 'null', null,'1', '2023-04-25', null, 'null');
 ....

1 Answer

Answered by dam

Why not try pipelines? On a multi-core machine, each sed in a pipeline is a separate process, so performance may be better with the stages running in parallel.

Example:

....
| while read -r F; do
sed -E 's/\),\(/\n/g' "$F" |
sed -E "s/','/', '/g" |
sed -E "s/^INSERT\sINTO\s\`.*\`\sVALUES\s*\((.*)\);$/\1/g" |
....
sed -E "s/(.*[0-9]{1}),'/\1, '/g" |
sed -E "s/([0-9]{1}),([0-9]{1})/\1, \2/g" |
sed -E "s/NULL,([[:digit:]])/NULL, \1/g" \
> workFile;
mv workFile "$F";
....
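
To make the idea concrete, here is a minimal self-contained sketch of that pipeline approach with the quoting and the final redirection tightened up. It assumes GNU sed and bash, takes the files to fix as arguments, and chains only a few of the question's expressions; the remaining ones would be added as further stages in the same way:

 #!/usr/bin/env bash
 # Sketch only: each sed below is its own process, so the kernel can
 # schedule the stages on different cores while the data streams
 # through the pipes instead of the file being rewritten in place.
 for F in "$@"; do
     sed -E 's/\),\(/\n/g' "$F" |
     sed -E "s/','/', '/g" |
     # ...the remaining sed expressions from the question go here,
     # one stage per sed...
     sed -E "s/NULL,([[:digit:]])/NULL, \1/g" > "$F.tmp"
     # Replace the original only after the whole pipeline has finished.
     mv "$F.tmp" "$F"
 done

Since the files are independent, several of these pipelines can also run at once, for example by feeding the file list to xargs -P; whether per-stage or per-file parallelism helps more will depend on how I/O-bound the job is.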