Parse file organised in a certain pattern

f is a file with the contents shown below:

+++++192.168.1.1+++++
Port Number: 80
......
product: Apache httpd
IP Address: 192.168.1.1

+++++192.168.1.2+++++
Port Number: 80
......
product: Apache http
IP Address: 192.168.1.2

+++++192.168.1.3+++++
Port Number: 80
......
product: Apache httpd
IP Address: 192.168.1.3

+++++192.168.1.4+++++
Port Number: 3306
......
product: MySQL
IP Address: 192.168.1.4

+++++192.168.1.5+++++
Port Number: 22
......
product: Open SSH
IP Address: 192.168.1.5

+++++192.168.1.6+++++
Port Number: 80
......
product: Apache httpd
IP Address: 192.168.1.6

The expected output is:

These hosts have Apache services:

192.168.1.1
192.168.1.2
192.168.1.3
192.168.1.6

The code I tried:

for service in f:
    if "product: Apache httpd" in service:
        for host in f:
            if "IP Address: " in host:
                print(host[5:], service)

It printed every IP address instead of only the hosts with Apache installed.

How can I produce the expected output?

There are 4 best solutions below

0
On BEST ANSWER

You could also try this:

apaches = []
with open('ips.txt') as f:
    # strip() drops a trailing newline, so the last section also has 5 lines
    sections = f.read().strip().split('\n\n')

for section in sections:
    # Each section: header, port, '......', product, IP address
    _, _, _, product, ip = section.split('\n')
    _, product_type = product.split(':')
    _, address = ip.split(':')

    if product_type.strip().startswith('Apache'):
        apaches.append(address.strip())

print('These hosts have Apache services:\n%s' % '\n'.join(apaches))

Which outputs:

These hosts have Apache services:
192.168.1.1
192.168.1.2
192.168.1.3
192.168.1.6
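
The five-way unpacking above assumes every section has exactly the five lines shown. If the ...... line stands for a varying number of omitted lines, a lookup by key may be more forgiving. The following is only a sketch under that assumption: the function name apache_hosts is made up for illustration, and the sample text is inlined rather than read from ips.txt.

```python
def apache_hosts(text):
    """Return the IP addresses of sections whose product starts with 'Apache'."""
    hosts = []
    # Sections are assumed to be separated by one blank line
    for section in text.strip().split('\n\n'):
        product = address = None
        for line in section.splitlines():
            key, sep, value = line.partition(':')
            if not sep:
                continue  # header and filler lines contain no colon
            if key.strip() == 'product':
                product = value.strip()
            elif key.strip() == 'IP Address':
                address = value.strip()
        if product and address and product.startswith('Apache'):
            hosts.append(address)
    return hosts


# Hypothetical sample with sections of different lengths
sample = """+++++192.168.1.1+++++
Port Number: 80
......
product: Apache httpd
IP Address: 192.168.1.1

+++++192.168.1.4+++++
Port Number: 3306
an extra line that fixed-size unpacking could not handle
product: MySQL
IP Address: 192.168.1.4

+++++192.168.1.2+++++
Port Number: 80
product: Apache http
IP Address: 192.168.1.2
"""

print('These hosts have Apache services:\n%s' % '\n'.join(apache_hosts(sample)))
```

With this sample it lists 192.168.1.1 and 192.168.1.2, and skips the MySQL host even though its section has an extra line.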
0
On

Something like this maybe. I've inlined the data for illustration purposes, but it can just as well come from a file.

In addition, we're gathering all of the per-host data up first, in case you need some of the other information as well, then printing out what is needed. This means info_by_ip looks roughly like

{'192.168.1.1': {'Port Number': '80', 'product': 'Apache httpd'},
 '192.168.1.2': {'Port Number': '80', 'product': 'Apache http'},
 '192.168.1.3': {'Port Number': '80', 'product': 'Apache httpd'},
 '192.168.1.4': {'Port Number': '3306', 'product': 'MySQL'},
 '192.168.1.5': {'Port Number': '22', 'product': 'Open SSH'},
 '192.168.1.6': {'Port Number': '80', 'product': 'Apache httpd'}}

Code:

import collections

data = """
+++++192.168.1.1+++++
Port Number: 80
......
product: Apache httpd

+++++192.168.1.2+++++
Port Number: 80
......
product: Apache http

+++++192.168.1.3+++++
Port Number: 80
......
product: Apache httpd

+++++192.168.1.4+++++
Port Number: 3306
......
product: MySQL

+++++192.168.1.5+++++
Port Number: 22
......
product: Open SSH

+++++192.168.1.6+++++
Port Number: 80
......
product: Apache httpd
"""

ip = None  # Current IP address

# A defaultdict lets us conveniently add per-IP data without having to
# create the inner dicts explicitly:
info_by_ip = collections.defaultdict(dict)

for line in data.splitlines():  # replace with `for line in file:` for file purposes
    if line.startswith('+++++'):  # Seems like an IP address separator
        ip = line.strip('+')  # Remove + signs from both ends
        continue  # Skip to next line
    if ':' in line:  # If the line contains a colon,
        key, value = line.split(':', 1)  # ... split by it, 
        info_by_ip[ip][key.strip()] = value.strip()  # ... and add to this IP's dict.


for ip, info in info_by_ip.items():
    # Match any Apache product, so 'Apache http' (192.168.1.2) is included too
    if info.get('product', '').startswith('Apache'):
        print(ip)
0
On

You can use +++++ as the separator and get the desired IP with the following code.

with open('ip.txt', 'r') as fileReadObj:
    rows = fileReadObj.read()

# Splitting on the separator alternates between IP addresses and section bodies
text_lines = rows.split('+++++')
for i, row in enumerate(text_lines):
    if 'Apache' in row:
        # The piece just before a section body is that section's IP
        print(text_lines[i - 1])
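
To see why the piece before a matching row is its IP, it may help to look at what the split produces. This is a small sketch on a shortened, inlined sample (the names sample, ips and bodies are only for illustration): the first piece is the empty string before the first separator, and after that IPs and section bodies alternate.

```python
# Shortened, hypothetical sample in the same layout as the file
sample = (
    "+++++192.168.1.1+++++\n"
    "Port Number: 80\n"
    "product: Apache httpd\n"
    "\n"
    "+++++192.168.1.4+++++\n"
    "Port Number: 3306\n"
    "product: MySQL\n"
)

parts = sample.split('+++++')
# parts[0] is the empty string before the first separator; after that,
# odd indexes hold IP addresses and even indexes hold section bodies.
ips = parts[1::2]
bodies = parts[2::2]

# Pair each IP with its own body and keep the ones that mention Apache
apache_ips = [ip for ip, body in zip(ips, bodies) if 'Apache' in body]
print(apache_ips)
```

Pairing the two slices with zip avoids the index arithmetic, at the cost of assuming every header line has the separator on both sides.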
0
On

For explanation:

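with open(filename, 'r') as fobj:  # Open the file read-only (filename is your path)
    search_string = fobj.read()  # Read the whole file into a string
    print('These hosts have Apache services:\n')
    # Split by the search term: the text before each 'Apache' ends with its section.
    # The last piece comes after the final match and has no header, so skip it.
    for string_piece in search_string.split('Apache')[:-1]:
        # Split by the separator; the IP is the second-to-last item
        ip_addr = string_piece.split('+++++')[-2]
        print(ip_addr)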

Compressed:

with open(filename, 'r') as fobj:
    print('These hosts have Apache services:\n')
    for string_piece in fobj.read().split('Apache')[:-1]:
        print(string_piece.split('+++++')[-2])