Replace all occurences of regex string with info from input+string

79 Views Asked by At

I have a sql nodata dump, and I need to go through it and replace the engine part of each create table query. The point where I am stuck is that I need to mention the table name in each string i am replacing with for the respective table

Assuming a file as below

CREATE TABLE `tablename1` (
  -- #columns and keys
) ENGINE=InnoDB AUTO_INCREMENT=5075 DEFAULT CHARSET=utf8;
CREATE TABLE `tablename2` (
  -- #columns and keys
) ENGINE=something AUTO_INCREMENT=55 DEFAULT CHARSET=latin1;

The desired result is:

CREATE TABLE `tablename1` (
  -- #columns and keys
) ENGINE=-myreplacedstring/tablename1; -- #table name 1 is appended to this line
CREATE TABLE `tablename2` (
  -- #columns and keys
) ENGINE=myreplacedstring/tablename2; -- #table name 2 is appended to this line

I tried

fin = open('dump.sql','r')
filedata = fin.read()
fin.close()

newdata = re.sub('(?<=ENGINE).*;', '-myreplacedstring-', filedata)

fout = open('fed_dump.sql','w')
fout.write(newdata)
fout.close()

but this only replaces the string with a fixed string, regardless of which table it is.

I also tried to approach row by row to be able to grab the table name every time I pass it, but I am not sure how to proceed with this.

 with open('dump.sql') as infile, open('dump_fed.sql', 'w') as outfile:
     for line in infile:
         #tablename= if line contains create table, update tablename, else do nothing
         line = re.sub('(?<=ENGINE).*;', '-myreplacedstring-'+tablename, line)
         outfile.write(line)

I am stuck on how to get the table name for each table into my replaced string. Any help is appreciated.

2

There are 2 best solutions below

3
On BEST ANSWER

Your solution will certainly work, but you can do it quicker with backreferences. I tested this and it works (you could make it a one-liner but it´s more readable this way) :

pattern = r"CREATE TABLE `(.*?)`(.*?)ENGINE=.*?;"
replace_pattern = r"CREATE TABLE `\1`\2ENGINE=-myreplacedstring-\1;"
newdata = re.sub(pattern, replace_pattern, filedata, flags=re.DOTALL)

With :

  • ´.*?´ which is the non greedy version of ´.*´
  • ´\1´ is a backreference for the first capturing group for each occurence, which will be for example "tablename1", "tablename2" and so on
  • the patterns as raw string with the prefix ´r´ (which is a good habit to take)
  • the re.DOTALL flag which allows "." to match newlines

I proposed before the following solution, which is wrong because look-behinds, signaled by "(?<=...)", must contain fixed length patterns :

pattern = r"(?<=CREATE TABLE `(.*?)`.*?ENGINE=).*;"
newdata = re.sub(pattern, '-myreplacedstring-\1', filedata)
0
On

Figured it out. Not sure how elegant or unbreakable this is, but it seems to work fine for my case(taking a mysql nodata dump, replacing engine with federated connection string, to easily create a full federated DB from one db to another).

import re

def gettablename(string, defaultstring):
    if 'CREATE TABLE' in string: 
        return re.search('`(.*)`', string).group(1)
    else: 
        return defaultstring

with open('dump.sql') as infile:
    tablename='def' # do i need this default?
    for line in infile:
        tablename=gettablename(line,tablename)
        line = re.sub('(?<=ENGINE).*;', '-myreplacedstring-'+tablename, line)
        print line