I have an HTML file from which I am trying to remove the first occurrence of <...> on each line with awk's sub/gsub functions.
I used the awk regex operators . * + to match anything between < and >. However, the first occurrence of > seems to be skipped (?). I don't know if there is a workaround.
sample input file.txt (x is added so that the output is not empty):
<div>fruit</div></td>x
<span>banana</span>x
<br/>apple</td>x
code:
awk '{gsub(/^<.*>/,""); print}' file.txt
current output:
x
x
x
desired output:
fruit</div></td>x
banana</span>x
apple</td>x
With your shown samples, please try the following awk code.
Simple explanation: use the sub substitute function of awk to replace everything from the starting < up to and including the first > with NULL in the current line (using [^>] means matching until the first occurrence of > comes), then print the edited or unedited line with 1.
2nd solution: use the match function of awk to match from the 1st occurrence of < to the 1st occurrence of >, and print the rest of the line.
OR, in case you have lines which do not start with < and you want to print them too, use the following:
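The three variants described above can be sketched as follows. This is a minimal sketch, not necessarily the exact commands originally posted: it assumes the input file is named file.txt as in the question, and the wrapper function names (strip_tag_sub, strip_tag_match, strip_tag_match_keep) are hypothetical, introduced only to keep the one-liners apart.

```shell
# Recreate the sample input from the question.
cat > file.txt <<'EOF'
<div>fruit</div></td>x
<span>banana</span>x
<br/>apple</td>x
EOF

# 1st solution (hypothetical wrapper name): sub() removes only the leading <...>.
# [^>]* cannot run past the first >, unlike the greedy .* in the question.
# The trailing 1 is an always-true pattern, so every line is printed.
strip_tag_sub() {
  awk '{ sub(/^<[^>]*>/, "") } 1' "$@"
}

# 2nd solution: match() sets RSTART and RLENGTH for the leading <...>;
# print whatever follows it. Lines not starting with < are not printed.
strip_tag_match() {
  awk 'match($0, /^<[^>]*>/) { print substr($0, RSTART + RLENGTH) }' "$@"
}

# Variant: same as above, but lines not starting with < are printed unchanged.
strip_tag_match_keep() {
  awk 'match($0, /^<[^>]*>/) { print substr($0, RSTART + RLENGTH); next } 1' "$@"
}

strip_tag_sub file.txt
strip_tag_match file.txt
strip_tag_match_keep file.txt
```

On the sample input, each of the three calls prints the desired output shown in the question. They differ only for lines that do not begin with <: the first and third print such lines as they are, while the second drops them.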