My goal with this piece of code is to sanitize an array of elements (a list of URLs, some with special characters like %) so that I can eventually compare it to another file of URLs and output which ones match. The list of URLs is from a .csv file with the first field being the URL that I want (with some other entries that I skip over with a quick if() statement).
foreach my $var(@input_1) {
    #Skip anything that doesn't start with http:
    if ((/^[#U]/ ) || !(/^h/)) {
        next;
    }
    #Split the .csv into the relevant field:
    my @fields = split /\s?\|\s?/, $_;
    $var = uri_unescape($fields[0]);
}
My delimiter is a | in the csv. In its current setup, and also when I change the $_ to $var, it only returns blank lines. When I remove the $var declaration at the beginning of the loop and use $_, it will output the URLs in the correct format. But in that case, how can I assign the output to the same element in the array? Would this require a second array to output the value to?
I'm relatively new to Perl, so I'm sure there is some stuff that I'm missing. I have no clue at this moment why keeping the $var in the foreach declaration breaks the parsing of the @fields line, but removing it and using $_ doesn't. Reading the perlsyn documentation did not help as much as I would have liked. Any help appreciated!
/^h/ is not bound to anything, so the match happens against $_. With foreach my $var, the loop sets $var rather than $_, so if you want to match $var, you have to bind it with the =~ operator.

The two matches joined with || could probably be folded into a single regular expression with an alternation, i.e. the line has to start with #, U, something other than h, or be empty.

You can populate a new array with the results by using push.

Also note that if your data can contain | quoted or escaped (or newlines etc.), you should use Text::CSV instead of split.
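Putting the three points together, the loop might look something like the sketch below. The sample contents of @input_1 and the name @output are made up for illustration, and the combined regular expression is one possible reading of the "single regex with an alternation" idea rather than the only way to write it. uri_unescape comes from the URI::Escape module on CPAN, as in the question.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_unescape);

# Hypothetical sample input; the real @input_1 would be read from the .csv file.
my @input_1 = (
    '# comment line',
    'URL | title',
    'http://example.com/a%20b | first',
    '',
    'http://example.com/c%2Fd|second',
);

# Hypothetical name for the second array that collects the cleaned URLs.
my @output;

for my $var (@input_1) {
    # Bind the match to $var with =~ so it is not tested against $_.
    # Skips lines starting with # or U, lines not starting with h, and empty lines.
    next if $var =~ /^(?:[#U]|[^h]|$)/;

    # Split on the | delimiter, allowing one optional whitespace character on each side.
    my @fields = split /\s?\|\s?/, $var;

    # Keep only the first field, with percent-escapes decoded.
    push @output, uri_unescape($fields[0]);
}

print "$_\n" for @output;
```

Because the foreach variable is an alias for the current array element, assigning to $var would also change @input_1 in place. The skipped lines would still be left in that array, though, so collecting the results in a second array is probably the tidier option here.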