I have lines from a web page of the form
<a href="url with spaces">description with spaces</a>
which I want to convert to a CSV format
"url%20with%20spaces","description with spaces"
to feed into a MediaWiki page that expects external links to be [url%20with%20spaces description with spaces] (and I don't want that page to be cluttered with #rreplace)
sed -Ee 's`.*href="(.*)">(.*)</a>.*`"\1","\2"`'
can split out the URL, but I can't see an easy way to do a further substitution of spaces with %20 in just \1 without affecting \2.
You might consider using GNU `awk` for this.

The field separator pattern here is `href="|">|</a>`: it matches either `href="`, or `">`, or `</a>` to split the line into fields. The second field needs additional processing, so `gsub(/ /, "%20",$2)` is used to replace each space with a `%20` substring. The updated Field 2 and Field 3 are then used to form the resulting output.
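Putting that description together, a one-liner along the following lines should do the job. This is only a sketch: the function name `convert_links` is made up for illustration, the `printf` output format is one way to produce the quoted CSV, and it assumes each input line holds exactly one anchor whose only attribute is `href`.

```shell
# convert_links: hypothetical helper name; reads anchor lines on stdin
# and writes "url","description" CSV lines on stdout.
convert_links() {
  awk -F'href="|">|</a>' '{
    gsub(/ /, "%20", $2)                 # encode spaces in the URL field only
    printf "\"%s\",\"%s\"\n", $2, $3     # $3 (the description) is left untouched
  }'
}

# Sample line taken from the question
printf '%s\n' '<a href="url with spaces">description with spaces</a>' | convert_links
```

For the sample line this prints `"url%20with%20spaces","description with spaces"`. Because `gsub` is given `$2` as its target, the spaces in `$3` are never touched, which is exactly what was hard to do with a single `sed` substitution. Lines with extra attributes after `href` or with several links would need a more careful pattern.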