My goal with this piece of code is to sanitize an array of elements (a list of URLs, some with special characters like %) so that I can eventually compare it to another file of URLs and output which ones match. The list of URLs comes from a .csv file whose first field is the URL that I want (with some other entries that I skip over with a quick if() statement).
foreach my $var (@input_1) {
    #Skip anything that doesn't start with http:
    if ((/^[#U]/ ) || !(/^h/)) {
        next;
    }
    #Split the .csv into the relevant field:
    my @fields = split /\s?\|\s?/, $_;
    $var = uri_unescape($fields[0]);
}
My delimiter is a | in the csv. In its current setup, and also when I change the $_ to $var, it only returns blank lines. When I remove the $var declaration at the beginning of the loop and use $_, it outputs the URLs in the correct format. But in that case, how can I assign the output to the same element in the array? Would this require a second array to output the value to?
I'm relatively new to Perl, so I'm sure there is some stuff that I'm missing. I have no clue at the moment why having the $var in the foreach declaration breaks the parsing of the @fields line, but removing it and using $_ doesn't. Reading the perlsyn documentation did not help as much as I would have liked. Any help appreciated!
/^h/ is not bound to anything, so the match happens against $_. If you want to match $var, you have to bind it with =~ (or !~ for the negated match).
Using || with two matches could probably be incorporated into a single regular expression with an alternative, i.e. the line has to start with #, U, something else than h, or be empty.
You can populate a new array with the results by using push.
Also note that if your data can contain | quoted or escaped (or newlines etc.), you should use Text::CSV instead of split.
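Putting those points together, a minimal sketch might look like the following. The sample lines in @input_1 and the @results array name are made up for illustration, and uri_unescape is assumed to come from URI::Escape, as the question implies.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_unescape);

# Hypothetical sample input standing in for the lines read from the .csv file.
my @input_1 = (
    '# comment line',
    'URL | title',
    'http://example.com/a%20b | first',
    '',
    'http://example.com/50%25 | second',
);

my @results;
foreach my $var (@input_1) {
    # Bind the match to $var explicitly. Skip lines that start with '#',
    # 'U', any character other than 'h', or that are empty.
    next if $var =~ /^(?:[#U]|[^h]|$)/;

    # Split $var (not $_) on the pipe delimiter, allowing one optional
    # whitespace character on either side.
    my @fields = split /\s?\|\s?/, $var;

    # Collect the unescaped URL in a separate array.
    push @results, uri_unescape($fields[0]);
}

print "$_\n" for @results;
```

Assigning to $var inside the loop would also work, since foreach aliases the loop variable to the array element, but the skipped lines would then stay in @input_1 unchanged. Pushing to a separate array leaves only the cleaned URLs.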