Extracting data from log file into Perl hash

I have four log files. Each log file is in the same format, so let's focus on one.

In each log file I have to extract information such as the computer name. I won't post the log file here because it has over 46,000 lines.

My end goal is to store the extracted information in a hash. These hashes will later be used to build an insert statement for a database table.

What I have done so far is this:

use strict;
use warnings;

my $filename = 'IGXLEventLog.3.17.2015.20.25.12.625.log';
open(my $fn, '<', $filename) or die "Could not open file '$filename': $!";

our %details;

while ( my $row = <$fn> ) {
    chomp $row;

    if ( $row =~ /Computer Name:\s*(\S+)/i ) {
        print $1, "\n";
    }

    if ( $row =~/Operating System:\s*(.*)/i ) {
        print $1, "\n";
    }

    if ( $row =~/IG-XL Version:\s(.*?)\;/ ) {
        print $1, "\n";  
    }
}

Problem 1

I have no problem extracting the computer name and operating system from the log file. However, IG-XL Version in the log file occurs twice. So what I'm getting from printing $1 is:

8.00.01_uflx (P7)
8.00.01_uflx (P7)

The original records are

Current IG-XL Version: 8.00.01_uflx (P7); Build: 11.10.12.01.31
Current IG-XL Version: 8.00.01_uflx (P7); Build: 11.10.12.01.31

So as you can see, I've managed to isolate the data that I want, but I get two results. My main objective now is to get only the first match.

Any help with this? Is there anything I am doing wrong? Please let me know.

Problem 2

This problem is not my focus yet, but will be after I solve the first one. In the log file, there's a section where data is structured like this:

2.0  HSD-U       664-999-01 c301036 1251-A 5445
      ChanBoard0    233-455-00 c303bb6 1521-A 5445
      ChanBoard1    321-493-00 c303496 1321-A 5445
6.0   DC-07       888-375-02 0C31F8F1 1330-A 5445
    aka: 604-375-00
      DC-07       123-456-01 0C6203EF 1150-A 5445
    aka: 939-420-00
7.0   DC-07       613-493-00 c303496 1321-A 5445
    aka: 466-456-65
      DC-07       613-493-00 c303496 1321-A 5545

Notice there are digits (not to be treated as decimals) on the left-hand side such as 2.0, 6.0, 7.0 and so on. (Some other digits for example would be 1.4, 5.60, 57.58.)

In this section, I want only the lines that begin with one of these numbers. So for 2.0 I would pick up only the line 2.0 HSD-U 664-999-01 c301036 1251-A 5445 and ignore the lines without numbers.

In this line, I want to get the fields 2.0 and HSD-U separately, and assign those two to a separate hash each.

So I need to extract a total of four types of different data, with the last type being the one that has the most pieces.

EDIT: What I did according to Borodin's answer

while(<$fn>)
{
    if ( /Computer Name:\s*(\S+)/i ) {
        $details{comp_name} //= $1;
        print $details{comp_name}, "\n";
    }
    elsif ( /Operating System:\s*(.*)/i ) {
        $details{op_sys} //= $1;
        print $details{op_sys}, "\n";
    }
    elsif ( /IG-XL Version:\s*([^;]*)/i ) {
        $details{igxl_vn} //= $1;
        print $details{igxl_vn}, "\n";
    }
    elsif ( /^([\d.]+)\s+(\S+)/ ) {
        $details{slot} //= $1;
        $details{name} //=$2;
        print $details{slot}, "\n", $details{name};
    }
}

My output:

UFLEX-06
Windows XP Service Pack 3
8.00.01_uflx (P7)
8.00.01_uflx (P7)    <-Duplicated
HSD-U2.0             <-Duplicated
HSD-U2.0             <-Duplicated
HSD-U2.0             <-Duplicated
HSD-U2.0             <-Duplicated
HSD-U2.0             <-Duplicated
HSD-U2.0             <-Duplicated
HSD-U8.00.01_uflx (P7) <-Weird line

This output matches what I want only up to the third line.

The expected output would be:

UFLEX-06
Windows XP Service Pack 3
8.00.01_uflx (P7)
2.0
HSD-U
5.0
Gigabit
6.0
MattersNot
7.9
MatterMatter
15.20
Knnccb

EDIT 2: @Borodin Both as strings. In fact, all as strings. I will be inserting all these values into a table in my database by creating an SQL file containing just the text: INSERT INTO TABLE1(cp_name, os, version, slot, slot_name) values('UFLEX-06', 'Windows XP dot dot', '8.0.0_uflx', '2.0', 'HSD-U'). This is just to give a better understanding.

There would be a lot of slots in this cp_name. Think of it as a computer with 4 RAM slots, where each RAM card has a different name. The way to identify which card is in which slot would be, say, that the RAM card named EatYou is at slot 3. So for every card I have to insert the same details into the database, with only the RAM name and RAM slot number differing.

Back to the main point: that's why I'm trying to find a simple way to do this by assigning each value to a hash, so that when I create the SQL file it is easy to fill in the insert values.
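
As a rough illustration of the plan described above, here is a minimal sketch of turning the collected values into INSERT statement text, one statement per slot. The table and column names are the ones from the example statement; the sample values, the `@slots` layout, and the helper names `sql_quote` and `insert_statements` are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample values; in practice these would come from the log parser.
my %details = (
    comp_name => 'UFLEX-06',
    op_sys    => 'Windows XP Service Pack 3',
    igxl_vn   => '8.00.01_uflx (P7)',
);
my @slots = ( [ '2.0', 'HSD-U' ], [ '6.0', 'DC-07' ] );

# Quote a value as an SQL string literal, doubling embedded single quotes.
sub sql_quote {
    my ($value) = @_;
    $value =~ s/'/''/g;
    return "'$value'";
}

# Build one INSERT statement per slot, repeating the machine-level fields.
sub insert_statements {
    my ( $details, $slots ) = @_;
    my @statements;
    for my $slot (@$slots) {
        my @values = ( @{$details}{qw(comp_name op_sys igxl_vn)}, @$slot );
        push @statements,
            'INSERT INTO TABLE1(cp_name, os, version, slot, slot_name) VALUES('
            . join( ', ', map { sql_quote($_) } @values ) . ');';
    }
    return @statements;
}

print "$_\n" for insert_statements( \%details, \@slots );
```

If the rows were inserted directly from Perl rather than through a generated SQL file, DBI placeholders (`?` in the statement, values passed to `execute`) would usually be a safer choice than hand-quoting.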

There are 2 best solutions below

Best answer

I would write it like this. Using an explicit variable for the file records just makes more noise, so I've used Perl's default $_.

The expression $details{comp_name} //= $1 etc. assigns the hash element only if it doesn't already have a defined value.

You didn't make it clear how you wanted the dotted decimals stored in your hash, so I've used the first field as a key and the second as its value.

use strict;
use warnings;

my $filename = 'IGXLEventLog.3.17.2015.20.25.12.625.log';
open my $fh, '<', $filename or die "Could not open file '$filename': $!";

my %details;

while ( <$fh> ) {

    if ( /Computer Name:\s*(.*\S)/i ) {
        $details{comp_name} //= $1;
    }
    elsif (/Operating System:\s*(.*\S)/i ) {
        $details{op_sys} //= $1;
    }
    elsif (/IG-XL Version:\s*([^;]*)/ ) {
        $details{igxl_vn} //= $1;
    }
    elsif ( /^([\d.]+)\s+(\S+)/ ) {
        $details{$1} //= $2;
    }
}

use Data::Dump;
dd \%details;

Output:

{
  "2.0"       => "HSD-U",
  "6.0"       => "DC-07",
  "7.0"       => "DC-07",
  "comp_name" => "UFLEX-06",
  "igxl_vn"   => "8.00.01_uflx (P7)",
  "op_sys"    => "Windows XP Service Pack 3",
}
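
The duplicated lines in the question's edit come from the print statements sitting inside the loop: `//=` skips the assignment on later matches, but the print still runs for every matching line, and the fixed keys `slot` and `name` can only ever hold the first slot. A minimal sketch of one way to get the expected output is to collect the slots in an array, which keeps them in file order, and to print once after the loop. The in-memory sample, the `parse_log` name, and the stricter `\d+\.\d+` pattern are assumptions for illustration; the real log may need a different pattern.

```perl
use strict;
use warnings;

# Sample lines modelled on the excerpts in the question; the exact layout of
# the real log file is an assumption.
my $sample = <<'END_LOG';
Computer Name: UFLEX-06
Operating System: Windows XP Service Pack 3
Current IG-XL Version: 8.00.01_uflx (P7); Build: 11.10.12.01.31
Current IG-XL Version: 8.00.01_uflx (P7); Build: 11.10.12.01.31
2.0  HSD-U       664-999-01 c301036 1251-A 5445
      ChanBoard0    233-455-00 c303bb6 1521-A 5445
6.0   DC-07       888-375-02 0C31F8F1 1330-A 5445
    aka: 604-375-00
      DC-07       123-456-01 0C6203EF 1150-A 5445
7.0   DC-07       613-493-00 c303496 1321-A 5445
END_LOG

# Read a log from a filehandle; return the one-off details and the slot list.
sub parse_log {
    my ($fh) = @_;
    my ( %details, @slots );
    while ( my $line = <$fh> ) {
        if ( $line =~ /Computer Name:\s*(.*\S)/i ) {
            $details{comp_name} //= $1;    # keep the first match only
        }
        elsif ( $line =~ /Operating System:\s*(.*\S)/i ) {
            $details{op_sys} //= $1;
        }
        elsif ( $line =~ /IG-XL Version:\s*([^;]*)/ ) {
            $details{igxl_vn} //= $1;
        }
        elsif ( $line =~ /^(\d+\.\d+)\s+(\S+)/ ) {
            push @slots, [ $1, $2 ];       # slot number and name, as strings
        }
    }
    return ( \%details, \@slots );
}

# Open the sample string as an in-memory file in place of the real log.
open my $fh, '<', \$sample or die "Could not open in-memory sample: $!";
my ( $details, $slots ) = parse_log($fh);
close $fh;

# Print once, after the loop, so nothing is repeated.
print "$details->{$_}\n" for qw(comp_name op_sys igxl_vn);
print "$_->[0]\n$_->[1]\n" for @$slots;
```

Keeping the slots as an array of pairs rather than as hash keys preserves their order and leaves the slot numbers as strings, so 5.60 is not turned into 5.6.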
Another answer

The first question has already been answered here: "Stopping regex at the first match, it shows two times".

As for the second, you need to use the start anchor (^) immediately followed by some digits and dots, so that indented lines are skipped:

if($row =~/^[\d\.]+\s+(\S+)/)
{
    print $1, "\n";  
}

Here's how this regex works: https://regex101.com/r/vY7aL6/1
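
To see the effect of the anchor, here is a small sketch that runs a version of that pattern over lines copied from the Problem 2 excerpt. It adds a second capture group for the leading number, since the question asks for both fields; the helper name `slot_and_name` is made up for this example.

```perl
use strict;
use warnings;

# Lines copied from the Problem 2 excerpt in the question.
my @lines = (
    '2.0  HSD-U       664-999-01 c301036 1251-A 5445',
    '      ChanBoard0    233-455-00 c303bb6 1521-A 5445',
    '6.0   DC-07       888-375-02 0C31F8F1 1330-A 5445',
    '    aka: 604-375-00',
    '      DC-07       123-456-01 0C6203EF 1150-A 5445',
);

# Return [slot, name] for a line that starts with a dotted number,
# or undef for indented continuation lines and "aka:" lines.
sub slot_and_name {
    my ($line) = @_;
    return unless $line =~ /^([\d.]+)\s+(\S+)/;
    return [ $1, $2 ];
}

for my $line (@lines) {
    my $hit = slot_and_name($line);
    print $hit ? "match: $hit->[0] $hit->[1]\n" : "skip:  $line\n";
}
```

Only the two lines that begin at column one with a number are reported as matches; the indented lines fail at the ^ anchor because their first character is a space.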