I have a Perl script that analyses a text file (which can have UNIX or Windows line endings), storing file offsets when it finds something of interest.
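open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
my $groups = 0;
my %hash;
while (<$fh>) {
    if ($_ =~ /interesting/) {
        # tell() reports the position just after the line that matched
        $hash{$groups++}{offset} = tell($fh);
    }
}
close $fh;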
Then later on in the script I want to produce 'n' copies of the text file but with additional content at each 'interesting' area. To achieve this I loop through the hash of offsets:
foreach my $group (keys %hash) {
    my $href   = $hash{$group};
    my $offset = $href->{offset};
    my $top;
    open($fh, '<', $filename) or die "Cannot open $filename: $!";
    read($fh, $top, $offset);
    my $bottom = do { local $/; <$fh> };
    close $fh;
    $href->{modified} = $top . "Hello World\n" . $bottom;
}
The problem is that the read command is reading too many bytes. I suspect this is a line-ending issue, as the number of extra bytes (chars?) is the same as the line number. Checking in Notepad++, the tell() command is returning the real offset of the point of interest, but using that offset value in read() returns characters past the point of interest.
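A likely explanation, sketched here under stated assumptions: in text mode on Windows the :crlf layer turns each CRLF into a single "\n", tell() still reports byte positions in the file, and read() counts characters after that translation. Each line before the point of interest would then add one extra character to what read() returns, which matches the "same as the line number" observation. The sample data, the temporary file and the explicit :crlf layer (used so the effect can be tried off Windows) are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample: three lines with CRLF endings, written without translation.
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "one\r\ninteresting\r\nthree\r\n";
close $tmp or die "close failed: $!";

# Pass 1: the explicit :crlf layer mimics Perl's default text mode on Windows.
open(my $fh, '<:crlf', $filename) or die "Cannot open $filename: $!";
my ($offset, $wanted) = (0, '');
while (my $line = <$fh>) {
    $wanted .= $line;               # text up to and including the match
    if ($line =~ /interesting/) {
        $offset = tell($fh);        # position in the file, CR bytes included
        last;
    }
}
close $fh;

# Pass 2: read() takes a LENGTH, counted after CRLF has become "\n".
open($fh, '<:crlf', $filename) or die "Cannot open $filename: $!";
defined(read($fh, my $top, $offset)) or die "read failed: $!";
close $fh;

# Compare what was wanted with what read() actually returned.
my $overshoot = length($top) - length($wanted);
print "offset=$offset wanted=", length($wanted),
      " got=", length($top), " overshoot=$overshoot\n";
```

If the overshoot equals the number of lines before the insertion point, the two functions are simply counting in different units.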
I've tried adding binmode($fh) straight after the open() command, prior to the read(). This does find the correct position in the text file, but then I get CR + CRLF in the output and the text file is full of double carriage returns.
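The CR + CRLF output would be consistent with reading in binary mode, which keeps each "\r\n" in the data, and then printing through a text-mode handle, which on Windows expands "\n" to "\r\n" again. The output code is not shown in the question, so this is an assumption. A minimal sketch of one way round it, with both handles opened raw so that tell(), read() and print all work in bytes; the sample data, the temporary files and the $eol detection are hypothetical:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file with CRLF endings, written without translation.
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "one\r\ninteresting two\r\nthree\r\n";
close $tmp or die "close failed: $!";

# Read in raw mode, so tell() and read() both count bytes.
open(my $in, '<:raw', $filename) or die "Cannot open $filename: $!";
my $offset;
while (my $line = <$in>) {
    if ($line =~ /interesting/) {
        $offset = tell($in);
        last;
    }
}
defined $offset or die "no interesting line found";

# Rewind and split the file at the recorded byte offset.
seek($in, 0, 0) or die "seek failed: $!";
defined(read($in, my $top, $offset)) or die "read failed: $!";
my $bottom = do { local $/; <$in> };
$bottom = '' unless defined $bottom;
close $in;

# Reuse the line ending found just before the insertion point.
my $eol = $top =~ /\r\n\z/ ? "\r\n" : "\n";
my $modified = $top . "Hello World" . $eol . $bottom;

# Write in raw mode as well, so "\n" is not expanded to "\r\n" a second time.
my ($out, $outname) = tempfile(UNLINK => 1);
binmode $out;
print {$out} $modified;
close $out or die "close failed: $!";
```

The trade-off is that the script now has to supply the right line ending itself, which is why the sketch copies it from the text just before the insertion point.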
I've played with layers :crlf, :bytes, but no improvement.
Bit stuck!
From perldoc -f read, the third argument to read is a LENGTH, not a position in the file. So, when you do read($fh, $top, $offset), your $offset is actually a length. Decide how many characters you need to read.
read does not respect line endings; it reads the number of characters specified. If you want to read a line, then don't use read; use the readline operator, <$fh>, instead.
Is your file full of two new-lines, or are you adding one with a print statement?
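One way to follow the line-oriented suggestion, as a rough sketch rather than a definitive fix: read the file line by line on the second pass too, and insert the extra text when tell() reaches the stored offset. Both passes go through the same I/O layers, so the offsets are compared like for like and read() is not needed. The helper name insert_at, the sample data and the temporary file are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file with CRLF endings, written without translation.
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "one\r\ninteresting two\r\nthree\r\n";
close $tmp or die "close failed: $!";

# Pass 1: record the position just after each interesting line.
my @offsets;
open(my $in, '<', $filename) or die "Cannot open $filename: $!";
while (my $line = <$in>) {
    push @offsets, tell($in) if $line =~ /interesting/;
}
close $in;

# Pass 2: rebuild the text line by line, adding $extra once tell()
# reports the same position that pass 1 stored.
sub insert_at {
    my ($file, $offset, $extra) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my $out = '';
    while (my $line = <$fh>) {
        $out .= $line;
        $out .= $extra if tell($fh) == $offset;
    }
    close $fh;
    return $out;
}

my $modified = insert_at($filename, $offsets[0], "Hello World\n");
print $modified;
```

Because the text is only ever read and written through ordinary text-mode handles, this version should not need binmode at all.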