Split an input file into multiple files using Perl


I have an input file in the format below:

Line 1 ......
Line 2 ......
Line 3 ...... 
Line 4 ......
run_diagnosis ./FAILCYCLE/pat.UMK004_W13_X3Y12.dat.trans -cycle_offset 1 -verbose
Line 48 ....
Line 49 ....
Line 50 .....
run_diagnosis ./FAILCYCLE/pat.UMK004_W13_X13Y10.dat.trans -cycle_offset 1 -verbose 
Line 52 ..... 
Line 53 ..... 
Line 53 ..... 
run_diagnosis ./FAILCYCLE/pat.UMK004_W13_X15Y4.dat.trans -cycle_offset 1 -verbose
Line 55 .....
Line 56 ..... 
Line 57 .....

The keyword for my search is "run_diagnosis".
I want to split the content into multiple files, one for each occurrence of the keyword "run_diagnosis" in the input file.

The data above the first occurrence of "run_diagnosis" is useless and should be discarded. I want the output to look like this:

File 1 :

run_diagnosis ./FAILCYCLE/pat.UMK004_W13_X3Y12.dat.trans -cycle_offset 1 -verbose
Line 48 ....
Line 49 .... 
Line 50 ..... 

File 2 :

run_diagnosis ./FAILCYCLE/pat.UMK004_W13_X13Y10.dat.trans -cycle_offset 1 -verbose 
Line 52 ..... 
Line 53 ..... 
Line 53 .....

And so on, until the last occurrence of the keyword "run_diagnosis".
I have tried something using an array, but it only prints the first and third occurrences of the keyword and skips the second and fourth.

Also, the name of each file to be created comes from its "run_diagnosis" line.
In my case the name of File 1 will be UMK004_13_3_12 followed by the extension of the input file passed.

my $file_in = 'Diagnosis_add_seal_ring.ppd';
my $ext = (fileparse($file_in,'\..*'))[2];
my $start_of = 'Unwanted_Content.txt';
my $line;
my @grabbed;

open my $IN, "<", $file_in or die "unable to open $file_in $!"; 
open my $OUT, ">", $start_of or die "unable to open $start_of file $!"; 

  while ($line = <$IN>) { 
      if ($line =~ /^run_diagnosis/) { 
         my $file_name = (split /\./, $line)[2] . $ext;
         push @grabbed, $line;
             while (<$IN>) {
                 last if /^run_diagnosis/;
                 push @grabbed, $_;
             }
         open $OUT, ">", $file_name or die "... $!"; 
         print $OUT @grabbed; 
         undef(@grabbed)
  }     
  close $OUT;
}

Can you please guide me with this?

There are 3 best solutions below

Best answer :

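You can open output files on the fly while reading the input. Whenever you encounter a line that starts with run_diagnosis (the pattern ^run_diagnosis), open a new output file and keep writing through the same filehandle variable. Reopening a filehandle implicitly closes the file it previously pointed to: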

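#!/usr/bin/perl

use strict;
use warnings;

my $file_in = 'Diagnosis_add_seal_ring.ppd';
# Extension of the input file, without the leading dot (here "ppd")
my ($ext) = $file_in =~ /([^.]+)$/;

open my $IN, "<", $file_in or die "unable to open $file_in $!";
my $OUT;
my $file_num = 0;   # number of output files opened so far

while (<$IN>) {
    # The third dot-separated field of the line is the name, e.g. UMK004_W13_X3Y12
    if (/^run_diagnosis[^.]+\.[^.]+\.([^.]+)/) {
        my $file_out = "$1.$ext";
        # Reopening $OUT implicitly closes the previous output file
        open $OUT, ">", $file_out or die "unable to open $file_out file $!";
        $file_num++;
    }
    # Skip everything before the first run_diagnosis line
    print $OUT $_ if ($file_num);
}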
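
As for why the code in the question writes only the first and third blocks: by the time the inner while (<$IN>) reaches "last", it has already read the next run_diagnosis line, and the outer loop then continues with the line after it, so that header is never seen. The sketch below shows the single-loop idea on in-memory lines so it can be tried without creating files. The helper name split_blocks and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines into blocks, each block starting
# at a line that begins with "run_diagnosis".
sub split_blocks {
    my @lines = @_;
    my @blocks;
    for my $line (@lines) {
        # A keyword line starts a new block; it is kept, not consumed.
        push @blocks, [] if $line =~ /^run_diagnosis/;
        # Lines before the first keyword line are dropped.
        push @{ $blocks[-1] }, $line if @blocks;
    }
    return @blocks;
}

# Made-up sample input, shortened from the question's format.
my @blocks = split_blocks(
    "Line 1\n",
    "run_diagnosis a\n", "Line 48\n",
    "run_diagnosis b\n", "Line 52\n",
);
printf "%d blocks\n", scalar @blocks;
```

Because every line is examined exactly once by the same loop, no run_diagnosis line can be swallowed by a nested read.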

Answer 2 :

This program will do as you ask. It simply opens a new output file whenever a run_diagnosis line is found, and prints each output file name to STDERR as it goes:

use strict;
use warnings;

my $file_in = 'Diagnosis_add_seal_ring.ppd';
open my $fh, '<', $file_in or die qq{Unable to open "$file_in" for input: $!};
my ($file_ext) = $file_in =~ /(\.[^.]*)\z/;

my $fh_out;

while ( <$fh> ) {
  if ( /^run_diagnosis/ ) {
    my $file_out = (split /\./)[2] . $file_ext;
    warn $file_out, "\n";
    open $fh_out, '>', $file_out or die qq{Unable to open "$file_out" for output: $!};
    select $fh_out;
  }
  print if $fh_out;
}

output

UMK004_W13_X3Y12.ppd
UMK004_W13_X13Y10.ppd
UMK004_W13_X15Y4.ppd
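
Note that these names keep the W, X and Y prefixes, whereas the example name in the question is UMK004_13_3_12. If that form is what is needed, the following sketch may help. It assumes every path looks like pat.<lot>_W<wafer>_X<x>Y<y>.dat.trans, which is only inferred from the three sample lines, and the helper name output_name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build a name such as "UMK004_13_3_12.ppd"
# from a run_diagnosis line and an extension with its leading dot.
sub output_name {
    my ($line, $ext) = @_;
    # Assumed layout: pat.<lot>_W<wafer>_X<x>Y<y>. (inferred from the samples)
    my ($lot, $wafer, $x, $y) =
        $line =~ /pat\.([^_.]+)_W(\d+)_X(\d+)Y(\d+)\./
        or return;    # line does not match the assumed layout
    return join('_', $lot, $wafer, $x, $y) . $ext;
}

print output_name(
    'run_diagnosis ./FAILCYCLE/pat.UMK004_W13_X3Y12.dat.trans -cycle_offset 1 -verbose',
    '.ppd',
), "\n";
```

In the program above, the result of output_name($_, $file_ext) could replace the split-based expression assigned to $file_out, with a check that it returned a defined value.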

Answer 3 :

#!/usr/bin/env perl

use v5.20;
use experimental qw/signatures/;
use autodie;

my $i = 0;
my $fh;

while(<>)
{
    if($_ =~ m/run_diagnosis/)
    {
        $i++;
        open $fh, ">", "File_".$i.".txt";
        writeFile($_, $fh);
    }else
    {
        unless($i==0)
        {
            open $fh, ">>", "File_".$i.".txt";
            writeFile($_, $fh)
        }
    }
}

sub writeFile($line, $fh)
{
    print $fh $line;
    close $fh;
}