Perl read seek tell, and text files. Too many bytes being read. Layers and newline handling

Question


Asked by Chris on 22 June 2025 at 13:21 (3.7k views)

I have a Perl script that analyses a text file (which can have UNIX or Windows line endings), storing a file offset each time it finds something of interest:

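open(my $fh, '<', $filename) or die "Can't open $filename: $!";
my $groups = 0;
my %hash;
while (<$fh>) {
   if ($_ =~ /interesting/) {
      # tell() here is the offset just past the interesting line
      $hash{$groups++}{offset} = tell($fh);
   }
}
close $fh;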

Later in the script I want to produce n copies of the text file, each with additional content inserted at one of the 'interesting' locations. To do this I loop through the hash of offsets:

foreach my $group (keys %hash) {
   my $href   = $hash{$group};
   my $offset = $href->{offset};

   my $top;
   open($fh, '<', $filename) or die "Can't open $filename: $!";
   read($fh, $top, $offset);
   my $bottom = do { local $/; <$fh> };
   close $fh;

   $href->{modified} = $top . "Hello World\n" . $bottom;
}

The problem is that read() is reading too many characters. I suspect a line-ending issue, because the number of characters it overshoots by equals the line number of the interesting line. Checking in Notepad++, tell() returns the true offset of the point of interest, but passing that same offset to read() returns characters past that point.
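A minimal reproduction of the symptom, offered as a sketch: it writes a hypothetical three-line CRLF file to a temporary path, takes the true byte offset of line 2 from a :raw handle, then passes that offset to read() on a handle with an explicit :crlf layer (what Perl uses by default for text files on Windows). read() counts characters, and :crlf delivers each CRLF pair as a single "\n", so one extra character comes back for every line ending skipped. The file contents are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical test file: three CRLF-terminated lines, 7 bytes each.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "line1\r\nline2\r\nline3\r\n";
close $tmp or die "Can't close $path: $!";

# True byte offset of the start of line 2, taken from a :raw handle.
open my $raw, '<:raw', $path or die "Can't open $path: $!";
my $first  = <$raw>;
my $offset = tell $raw;
close $raw;

# The same offset passed to read() through a :crlf layer.
open my $fh, '<:crlf', $path or die "Can't open $path: $!";
my $got = read $fh, my $top, $offset;   # LENGTH counts characters
close $fh;

# "line1\r\n" arrives as "line1\n" (6 characters), so a 7th character is
# taken from line 2: one character too many per line ending passed.
print "offset=$offset got=$got top=", join('|', split /\n/, $top), "\n";
```

The printed line is `offset=7 got=7 top=line1|l`: seven bytes on disk, but seven characters reach one byte further.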

I've tried adding binmode($fh) straight after open(), before the read(). That does stop at the correct position in the file, but the output then contains CR + CRLF, so the text file is full of doubled carriage returns.
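Why binmode on the input alone produces CR CR LF, as a sketch: a :raw read keeps each "\r\n" intact, and if that text is then printed through a handle with a :crlf layer (again the Windows default for text output), every "\n" gains another "\r". The temporary file and its contents are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Text read in raw mode still carries its CRLF line endings.
my $raw_text = "line1\r\nline2\r\n";

# Write it through a :crlf output layer, as a Windows text handle would.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':crlf';
print {$out} $raw_text;
close $out or die "Can't close $path: $!";

# Read the bytes back without translation to see what reached the disk.
open my $in, '<:raw', $path or die "Can't open $path: $!";
my $on_disk = do { local $/; <$in> };
close $in;

# Each "\n" became "\r\n", so every original "\r\n" is now "\r\r\n".
(my $shown = $on_disk) =~ s/\r/\\r/g;
$shown =~ s/\n/\\n/g;
print "$shown\n";    # line1\r\r\nline2\r\r\n
```

So if the input is read raw, the output needs to be written raw as well (or the CRs stripped first).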

I've also tried the :crlf and :bytes layers, with no improvement.

Bit stuck!

Original Q&A

There are 3 best solutions below

**cdarke** · Answer 1

From perldoc -f read:

read FILEHANDLE,SCALAR,LENGTH,OFFSET
read FILEHANDLE,SCALAR,LENGTH

So, when you do:

read( $fh, $top, $offset);

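your $offset is actually being used as a length. Decide how many characters you need to read. Note that read counts characters, not bytes: under a :crlf layer (the default on Windows) each CRLF pair is delivered as a single "\n", so reading $offset characters consumes more than $offset bytes, one extra for every line ending passed.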
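One way to keep the original read-based approach working, as a sketch rather than the answer's own code: open both the scan and the copy in raw mode, so tell() and read() both count bytes and the file's own line endings pass through untouched. The inserted text takes its line ending from the line just before the offset. The sub name splice_at_offset and the sample file are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the file's bytes with $text inserted at byte
# offset $offset, using the same line ending as the preceding line.
sub splice_at_offset {
    my ($path, $offset, $text) = @_;
    open my $in, '<:raw', $path or die "Can't open $path: $!";
    my $got = read $in, my $top, $offset;        # bytes, because of :raw
    defined $got && $got == $offset
        or die "Short read from $path";
    my $bottom = do { local $/; <$in> } // '';
    close $in;

    # Match the file's line ending so a CRLF file stays CRLF.
    my $eol = $top =~ /\r\n\z/ ? "\r\n" : "\n";
    return $top . $text . $eol . $bottom;
}

# Hypothetical CRLF input file.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "a\r\ninteresting\r\nb\r\n";
close $tmp or die "Can't close $path: $!";

# First pass, also raw: tell() gives byte offsets.
open my $scan, '<:raw', $path or die "Can't open $path: $!";
my @offsets;
while (<$scan>) {
    push @offsets, tell $scan if /interesting/;
}
close $scan;

my @copies = map { splice_at_offset($path, $_, 'Hello World') } @offsets;
```

Each copy should then be written through a raw handle too (for example `open my $out, '>:raw', $name`), otherwise a :crlf output layer adds the extra CRs described in the question.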

If you want to read a line, don't use read; use:

seek($fh, $offset, 0);
$top = <$fh>;
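Following the suggestion to use readline rather than read, here is a sketch that rebuilds $top line by line: it reads with the default text layers until tell() reaches the saved offset, so $top always ends on a whole line and the result can go to an ordinary text handle without doubled CRs. It assumes, as in the question, that each saved offset was taken right after reading a line with the same kind of handle. The sub name split_at_offset and the sample file are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: split a text file into the part before $offset and
# the part after it, where $offset came from tell() after a readline.
sub split_at_offset {
    my ($path, $offset) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    my $top = '';
    while (tell($fh) < $offset) {
        my $line = <$fh>;
        last unless defined $line;
        $top .= $line;
    }
    my $bottom = do { local $/; <$fh> } // '';
    close $fh;
    return ($top, $bottom);
}

my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "a\ninteresting\nb\n";
close $tmp or die "Can't close $path: $!";

# First pass, as in the question: remember tell() after the match.
open my $scan, '<', $path or die "Can't open $path: $!";
my $offset;
while (<$scan>) {
    if (/interesting/) { $offset = tell $scan; last }
}
close $scan;

my ($top, $bottom) = split_at_offset($path, $offset);
my $copy = $top . "Hello World\n" . $bottom;
```

Because both passes compare tell() values from identically opened handles, the split always falls after the same line, whatever the layer does to byte and character counts.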

Does your file really contain doubled newlines, or are you adding one with a print statement?

**Borodin** · Answer 2

- A hash whose keys are a contiguous range of integers should be an array.
- You are storing a copy of the entire file for every occurrence of /interesting/.

It sounds like what you need to do is this:

open(my $fh, '<', $filename) or die "Can't open $filename: $!";
while (<$fh>) {
  print;
  print "Hello World\n" if /interesting/;
}
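If you do need one copy per occurrence, as in the question, the same line-by-line idea extends without a hash or whole-file copies built in a loop. This sketch records the line numbers ($.) of the interesting lines in an array, then re-reads the file once per occurrence and prints each copy straight to its own output handle. The sub name write_copies and the in-memory output handles used for the demonstration are assumptions; real code would open output files instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: for each line number in @$marks, print a copy of
# $path to the matching handle in @$outs, with $extra after that line.
sub write_copies {
    my ($path, $marks, $outs, $extra) = @_;
    for my $i (0 .. $#$marks) {
        open my $in, '<', $path or die "Can't open $path: $!";
        while (my $line = <$in>) {
            print {$outs->[$i]} $line;
            print {$outs->[$i]} $extra if $. == $marks->[$i];
        }
        close $in;
    }
}

my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "one\ninteresting A\ntwo\ninteresting B\n";
close $tmp or die "Can't close $path: $!";

# First pass: remember line numbers in an array, not offsets in a hash.
my @marks;
open my $scan, '<', $path or die "Can't open $path: $!";
while (<$scan>) { push @marks, $. if /interesting/ }
close $scan;

# In-memory handles stand in for real output files here.
my @bufs = ('') x @marks;
my @outs = map { open my $o, '>', \$bufs[$_] or die "Can't open buffer: $!"; $o }
           0 .. $#marks;
write_copies($path, \@marks, \@outs, "Hello World\n");
close $_ for @outs;
```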

**Joe Z** · Answer 3

My standard way to handle this, when the input file isn't ginormous, is to slurp the file in and normalize line endings, storing each line as an array element. I sometimes have to deal with Windows (CR+LF), UNIX (LF only) and classic Mac (CR only) line endings in the same batch of files, and the same script needs to run correctly on all three platforms too.

I generally take a belt-and-braces approach when having to deal with such things. One way that ought to work:

sub read_file_into_array
{
    my $file = shift;
    my ($len, $cnt, $data, @file);

    # ":raw" matters: under a :crlf layer (the Windows default) read()
    # counts characters, so $cnt would fall short of the byte length $len.
    open my $fh, "<:raw", $file     or die "Can't read $file: $!";
    seek $fh, 0, 2                  or die "Can't seek $file: $!";   # 2 = SEEK_END
    $len = tell $fh;
    seek $fh, 0, 0                  or die "Can't seek $file: $!";   # 0 = SEEK_SET

    $cnt = read $fh, $data, $len;
    close $fh;

    defined $cnt or die "Can't read $file: $!";
    $cnt == $len or die "Attempted to read $len bytes; got $cnt";

    $data =~ s/\r\n/\n/g;       # Convert DOS line endings to UNIX
    $data =~ s/\r/\n/g;         # Convert Mac line endings to UNIX

    @file = split /\n/, $data;  # Split on UNIX line endings

    return \@file;
}
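The two substitutions can also be written as one: `\r\n?` matches either a CRLF pair or a lone CR, and both become `\n`, which gives the same result as the DOS-then-Mac pair. A small sketch with a hypothetical sub name, lines_from_data:

```perl
use strict;
use warnings;

# Hypothetical helper: normalise CRLF (Windows) and lone CR (classic Mac)
# to LF, then split into lines without their terminators.
sub lines_from_data {
    my ($data) = @_;
    $data =~ s/\r\n?/\n/g;
    return [ split /\n/, $data ];
}

my $lines = lines_from_data("dos\r\nmac\runix\nlast");
print join('|', @$lines), "\n";    # dos|mac|unix|last
```

As in the answer's version, `split /\n/` discards trailing empty fields, so blank lines at the very end of a file are not preserved.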

Then do all your processing on the lines in @file. For your 'interesting' tags, store an array index rather than a file offset. The array index is essentially the line number in the original file, counted from 0 instead of 1.

To actually augment the files, instead of looping through hash keys, you could construct a hash of line-number => thing-to-append pairs and generate the augmented file like this:

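sub generate_augmented_file
{
    my $file   = shift @_;   # array ref of lines (no terminators)
    my $extras = shift @_;   # hash ref of line => extra pairs
    my $text   = '';

    # $#$file is the last index; scalar($file) would be the reference itself.
    foreach my $line ( 0 .. $#$file )
    {
        $text .= $file->[$line];
        # The extra is appended to the same line, before its newline.
        $text .= $extras->{$line} if defined $extras->{$line};
        $text .= "\n";
    }

    return $text;
}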
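Tying this back to the question's "one copy per interesting area", here is a sketch that finds the indices with grep and builds one augmented text per index from a one-entry extras hash. It inlines the same join logic as generate_augmented_file so the block runs on its own; $lines stands in for the array ref returned by read_file_into_array, and the sample content is hypothetical.

```perl
use strict;
use warnings;

# Stand-in for the array ref returned by read_file_into_array.
my $lines = [ 'one', 'interesting A', 'two', 'interesting B' ];

# Array indices (0-based line numbers) of the interesting lines.
my @marks = grep { $lines->[$_] =~ /interesting/ } 0 .. $#$lines;

# One augmented copy per mark. The extra text is appended after the
# line's content, so it carries its own leading newline.
my @copies;
for my $mark (@marks) {
    my %extras = ( $mark => "\nHello World" );
    push @copies, join '', map { $lines->[$_] . ($extras{$_} // '') . "\n" }
                               0 .. $#$lines;
}
```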
