How can I prevent Perl from reading past the end of a tied array that shrinks when accessed?


Is there any way to force Perl to call FETCHSIZE on a tied array before each call to FETCH? My tied array knows its maximum size, but it can shrink from that size depending on the results of earlier FETCH calls. Here is a contrived example that filters a list down to its even elements, with lazy evaluation:

use warnings;
use strict;

package VarSize;

sub TIEARRAY { bless $_[1] => $_[0] }
sub FETCH {
    my ($self, $index) = @_;
    splice @$self, $index, 1 while $$self[$index] % 2;
    $$self[$index]
}
sub FETCHSIZE {scalar @{$_[0]}}

my @source = 1 .. 10;

tie my @output => 'VarSize', [@source];

print "@output\n";  # the array changes size as it is read, but Perl only checks
                    # the size at the start, so it runs off the end with warnings
print "@output\n";  # the size is already correct at the start, so no warnings

For brevity, I have omitted a bunch of error-checking code (such as how to deal with accesses that start from an index other than 0).
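One way to handle the omitted case of reads that do not start at index 0 is sketched below. It assumes that re-filtering every slot up to the requested index on each FETCH is acceptable, and it is a variant of the contrived class above (renamed to the hypothetical `VarSizeRandom`), not code from List::Gen. It makes `$output[3]` return the same value a sequential read would, but FETCHSIZE still reports the unfiltered size until every slot has been visited, so it does not fix the list-expansion problem by itself.

```perl
use strict;
use warnings;

package VarSizeRandom;    # hypothetical name: the contrived VarSize class plus random access

sub TIEARRAY { bless $_[1] => $_[0] }

sub FETCH {
    my ($self, $index) = @_;
    # Filter every slot from 0 up to $index, so reading $output[3] first
    # gives the same answer as reading indices 0, 1, 2, 3 in order.
    # Slots that were already filtered are even and are skipped cheaply.
    for my $i (0 .. $index) {
        splice @$self, $i, 1 while $i < @$self && $$self[$i] % 2;
    }
    return $$self[$index];    # undef once $index is past the filtered end
}

sub FETCHSIZE { scalar @{ $_[0] } }

package main;

tie my @output => 'VarSizeRandom', [1 .. 10];

my $fourth = $output[3];    # the fourth even number
my $eighth = $output[7];    # only five evens exist, so this is undef
```

Each FETCH rescans from index 0, so this costs O(index) per call; a real implementation would probably remember how far it has already filtered.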

EDIT: If, instead of the two print statements above, only ONE of the following two lines is used, the first works fine and the second throws warnings.

print "$_ " for @output;   # for loop "iterator context" is fine,
                           # checks FETCHSIZE before each FETCH, ends properly

print join " " => @output; # however a list context expansion 
                           # calls FETCHSIZE at the start, and runs off the end
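The difference between these two lines can be checked with a small self-contained script. It reuses the contrived VarSize class, ties two fresh copies, and traps warnings with a `$SIG{__WARN__}` handler so they can be counted instead of printed. Based on the behaviour described above, the expectation is that the `for` loop collects exactly the five even numbers with no warnings, while `join` produces at least one "uninitialized value" warning.

```perl
use strict;
use warnings;

package VarSize;    # the contrived class from the question

sub TIEARRAY { bless $_[1] => $_[0] }
sub FETCH {
    my ($self, $index) = @_;
    splice @$self, $index, 1 while $$self[$index] % 2;
    $$self[$index]
}
sub FETCHSIZE { scalar @{ $_[0] } }

package main;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# foreach over the array itself: the size is re-checked on every iteration
tie my @by_loop => 'VarSize', [1 .. 10];
my @seen;
push @seen, $_ for @by_loop;
my $loop_warnings = @warnings;

# list expansion: the size is taken once, before any FETCH
tie my @by_join => 'VarSize', [1 .. 10];
my $joined = join ' ' => @by_join;
my $join_warnings = @warnings - $loop_warnings;
```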

Update:

The actual module that implements a variable-sized tied array is List::Gen, which is on CPAN. The function in question is filter, which behaves like grep but works with List::Gen's lazy generators. Does anyone have ideas that could improve the implementation of filter?

(The test function is similar, but it returns undef in failed slots, keeping the array size constant; of course, that has different usage semantics than grep.)

Answer 1:

sub FETCH {
    my ($self, $index) = @_;
    my $size = $self->FETCHSIZE;
    ...
}

Ta da!

I suspect what you're missing is that they're just methods: methods called by tie magic, but still just methods you can call yourself.

Listing out the contents of a tied array basically boils down to this:

my @result;
my $tied_obj = tied @array;    # @array is the tied array being expanded
for my $idx (0..$tied_obj->FETCHSIZE-1) {    # FETCHSIZE is called only once
    push @result, $tied_obj->FETCH($idx);
}

return @result;

So you don't get any opportunity to control the number of iterations. Nor can FETCH reliably tell whether it's being called from @array, $array[$idx] or @array[@idxs]. This sucks. Ties kinda suck, and they're really slow: about 3 times slower than a normal method call and 10 times slower than a regular array.
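Because FETCH and FETCHSIZE are ordinary methods, code that knows the array may shrink can drive the loop itself through the object returned by `tied`, asking for the size before every FETCH instead of once up front. The sketch below assumes the contrived VarSize class from the question; the helper name `read_all` is hypothetical.

```perl
use strict;
use warnings;

package VarSize;    # the contrived class from the question

sub TIEARRAY { bless $_[1] => $_[0] }
sub FETCH {
    my ($self, $index) = @_;
    splice @$self, $index, 1 while $$self[$index] % 2;
    $$self[$index]
}
sub FETCHSIZE { scalar @{ $_[0] } }

package main;

# Hypothetical helper: like expanding @$aref in list context, but it
# re-asks FETCHSIZE before each FETCH, so it stops when the array shrinks.
sub read_all {
    my ($aref) = @_;
    my $obj = tied @$aref or return @$aref;    # untied arrays need no special care
    my @result;
    for (my $idx = 0; $idx < $obj->FETCHSIZE; $idx++) {
        push @result, $obj->FETCH($idx);
    }
    return @result;
}

tie my @output => 'VarSize', [1 .. 10];
my @evens = read_all(\@output);
```

This only helps code you control: `print "@output"`, `join`, `map` and friends still go through Perl's own expansion.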

Your example already breaks expectations about arrays (10 elements go in, 5 elements come out). What happens when a user asks for $array[3]? Do they get undef? One alternative is to just use the object API: if your thing doesn't behave exactly like an array, pretending that it does will only add confusion. Or you can use an object with array dereferencing overloaded.
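A minimal sketch of the overloaded-object alternative, assuming the filtering can be expressed as a predicate over a finite source list. The class name `LazyGrep` and its methods `next_item` and `all` are hypothetical and are not part of List::Gen. Elements are produced lazily through the method API, and the `@{}` overload from the core `overload` pragma forces full evaluation only when the object is dereferenced as an array, so the size Perl sees is always correct.

```perl
use strict;
use warnings;

package LazyGrep;    # hypothetical class, for illustration only

# Dereferencing the object as an array (@$obj, $obj->[$i]) evaluates
# whatever is left and hands Perl a plain, correctly sized array.
use overload '@{}' => sub { $_[0]->all }, fallback => 1;

sub new {
    my ($class, $pred, @source) = @_;
    return bless { pred => $pred, source => \@source, pos => 0, seen => [] }, $class;
}

# Method API: returns the next element that passes the predicate,
# or nothing once the source is exhausted.
sub next_item {
    my $self = shift;
    my $src  = $self->{source};
    while ($self->{pos} < @$src) {
        my $item = $src->[ $self->{pos}++ ];
        if ($self->{pred}->($item)) {
            push @{ $self->{seen} }, $item;
            return $item;
        }
    }
    return;
}

# Evaluates the rest of the source and returns the internal array of
# matches (callers that modify it should copy it first).
sub all {
    my $self = shift;
    $self->next_item while $self->{pos} < @{ $self->{source} };
    return $self->{seen};
}

package main;

my $evens = LazyGrep->new(sub { $_[0] % 2 == 0 }, 1 .. 10);
my $first = $evens->next_item;    # only 1 and 2 have been examined so far
my @all   = @$evens;              # forces the rest
```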

So what you're doing can be done, but it's difficult to get it to work well. What are you really trying to accomplish?

Answer 2:

I think the order in which Perl calls the FETCH/FETCHSIZE methods can't be changed; it's part of Perl's internals. Why not just explicitly avoid the warnings:

sub FETCH {
    my ($self, $index) = @_;
    splice @$self, $index, 1 while ($$self[$index] || 0) % 2;
    exists $$self[$index] ? $$self[$index] : '' ## replace '' with default value
}
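To see what this trade-off looks like in practice, the sketch below plugs this FETCH into the question's class (renamed to the hypothetical `VarSizeQuiet`) and records any warnings. On the first expansion Perl still asks for the original ten slots, so the warnings disappear but the five missing slots come back as the default value, which shows up as trailing separators.

```perl
use strict;
use warnings;

package VarSizeQuiet;    # the question's class with this answer's FETCH

sub TIEARRAY { bless $_[1] => $_[0] }
sub FETCH {
    my ($self, $index) = @_;
    splice @$self, $index, 1 while ($$self[$index] || 0) % 2;
    exists $$self[$index] ? $$self[$index] : ''    # '' stands in for "no element"
}
sub FETCHSIZE { scalar @{ $_[0] } }

package main;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

tie my @output => 'VarSizeQuiet', [1 .. 10];
my $first  = "@output";    # size taken before filtering: 10 slots
my $second = "@output";    # size is now 5
```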