I am working on a data transformation in Python 3. The formatted data will then be saved and consumed downstream as a CSV.
My real data is very messy (500k rows), so I will simplify the format for this question. Let's say we have this data:
data=Global{abc{198},cdf{121},nvm,121}
As we can see, Global is a master group that contains everything inside its curly braces. Inside Global there are other groups, such as abc and cdf, and a few individual records, nvm and 121.
Now I have to extract the individual groups and records, wrap each one in the master group, and join the results with a pipe delimiter.
The code I wrote is as follows:
import re

data = "Global{abc{198},cdf{121},nvm,121}"
regex_pattern = "\\b([\\w-]+)\\{([^{}]+)\\}"

def extract_text(input_text):
    result = []
    while '{' in input_text:
        match = re.search(regex_pattern, input_text)
        if match:
            prefix = match.group(1)
            result.append(prefix + '{' + match.group(2) + '}')
            input_text = input_text[:match.start()] + input_text[match.end():]
        else:
            break
    return ' | '.join(result)

result = extract_text(data)
print(result)
This gives me the following result:
abc{198} | cdf{121} | Global{,,nvm,121}
The logic I used grabs everything inside an innermost pair of curly braces and prefixes it with the word before the opening brace.
But my expected output is:
Global{abc{198}} | Global{cdf{121}} | Global{nvm,121}
I am trying to work out the logic for this. Any suggestions would be appreciated.
The actual data and the expected output are below.
Raw data:
GLOBAL-VPN-121{GLOBAL-VPN-ALL{AUS-VPN-128{npm_192.168.101.1/24:192.167.101.1/24,npm_121.147.101.1:121.147.101.1},npm_192.168.101.1:192.168.101.1,GLOBAL-VPN-SUB{HK-VPN-128{npm_192.168.101.1/24:192.167.101.1/24,npm_121.147.101.1:121.147.101.1}},npm_192.168.101.1:192.168.101.1}}
Expected data:
GLOBAL-VPN-121{GLOBAL-VPN-ALL{AUS-VPN-128{npm_192.168.101.1/24:192.168.101.1/24,npm_121.147.101.1:121.147.101.1}}} | GLOBAL-VPN-121{GLOBAL-VPN-ALL{npm_192.168.101.1:192.168.101.1}} | GLOBAL-VPN-121{GLOBAL-VPN-ALL{GLOBAL-VPN-SUB{HK-VPN-128{npm_192.168.101.1/24:192.168.101.1/24,npm_121.147.101.1:121.147.101.1}}{}}} | GLOBAL-VPN-121{GLOBAL-VPN-ALL{npm_192.168.101.1:192.168.101.1}}
The task you're attempting involves parsing and restructuring nested data, which can be quite complex. To achieve your desired output, you need a recursive approach that can handle multiple levels of nesting. The challenge is to extract nested groups and individual records, then reconstruct them with the master group appended to each.
Let's break down the steps:
1. Parse the nested structure: We need to recursively parse the nested structure. When we encounter a group (e.g., abc{...}), we extract the group and its contents, then continue parsing the contents.
2. Reconstruct the data: After extracting a group or an individual record, we wrap it in the master group (and any intermediate parent groups) and format it as required.
3. Handle edge cases: The data may contain varying levels of nesting, so our solution must be robust enough to handle these cases.
Here is a Python function that tries to accomplish this:
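The following is a minimal sketch rather than a definitive implementation. It assumes that braces are balanced, that commas only ever separate siblings, and that consecutive individual records (such as nvm,121) should stay together in one output entry, which is what the expected output in the question suggests. The helper names split_top_level, expand and extract_paths are illustrative, not taken from the question.

```python
def split_top_level(text):
    """Split text on commas that are not enclosed in any braces."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
        elif ch == ',' and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def expand(text):
    """Return one string per group or run of records found in text.

    Each nested group is expanded recursively and re-wrapped in its own
    name, so every returned string carries its full chain of parents.
    """
    results = []
    run = []  # consecutive individual records (assumption: kept together)
    for part in split_top_level(text):
        if '{' in part and part.endswith('}'):
            # A group interrupts the current run of individual records.
            if run:
                results.append(','.join(run))
                run = []
            name, _, rest = part.partition('{')
            body = rest[:-1]  # drop the closing brace that matches name{
            for inner in expand(body):
                results.append(name + '{' + inner + '}')
        else:
            run.append(part)
    if run:
        results.append(','.join(run))
    return results


def extract_paths(text):
    """Join the expanded entries with the pipe delimiter used in the question."""
    return ' | '.join(expand(text))


data = "Global{abc{198},cdf{121},nvm,121}"
print(extract_paths(data))
# Global{abc{198}} | Global{cdf{121}} | Global{nvm,121}
```

The sketch tracks brace depth with a counter instead of a regular expression, because a single regex cannot match arbitrarily nested braces. Values are copied through unchanged, so any difference between the raw and expected addresses in the question will remain in the output.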
This code uses recursion to navigate through the nested structure and reconstructs the data in the desired format. It should work for the given example and similar structures. However, please test it thoroughly with your actual dataset to ensure it handles all cases correctly.