Extract substrings between strings

I have a file with text as follows:

###interest1 moreinterest1### sometext ###interest2###
not-interesting-line
sometext ###interest3###
sometext ###interest4### sometext othertext ###interest5### sometext ###interest6###

I want to extract all strings between the ### markers.

My desired output would be something like this:

interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6

I have tried the following:

grep '###' file.txt | sed -e 's/.*###\(.*\)###.*/\1/g'

This almost works but only seems to grab the first instance per line, so the first line in my output only grabs

interest1 moreinterest1

rather than

interest1 moreinterest1
interest2

3
anubhava On BEST ANSWER

Here is a single awk command that uses ### as the field separator and prints each even-numbered field:

awk -F '###' '{for (i=2; i<NF; i+=2) print $i}' file

interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
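To see why the even-numbered fields are the interesting ones, it may help to print every field with its index. This is only an illustrative sketch using the first sample line from the question; the variable names line and fields are made up for the example.

```shell
# First sample line from the question.
line='###interest1 moreinterest1### sometext ###interest2###'

# Print every field together with its index so the split is visible.
fields=$(printf '%s\n' "$line" |
  awk -F '###' '{for (i=1; i<=NF; i++) printf "%d=[%s]\n", i, $i}')

printf '%s\n' "$fields"
```

Whatever sits before the first ### or after the last ### lands in an odd-numbered field (empty here), which is why the loop in the answer starts at 2, steps by 2 and stops before NF.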

Here is an alternative grep + sed solution:

grep -oE '###[^#]*###' file | sed -E 's/^###|###$//g'

This assumes there are no # characters in between ### markers.
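A quick way to check that assumption is to feed both commands a value containing a single #. This is a sketch with a made-up sample string, not input from the question:

```shell
# Hypothetical sample: the wanted text itself contains one '#'.
sample='x ###a#b### y'

# grep/sed variant: [^#]* cannot cross the lone '#', so nothing matches.
# grep exits non-zero when nothing matches, hence the "|| true".
with_grep=$(printf '%s\n' "$sample" |
  grep -oE '###[^#]*###' | sed -E 's/^###|###$//g') || true

# awk variant: only a full '###' sequence separates fields.
with_awk=$(printf '%s\n' "$sample" |
  awk -F '###' '{for (i=2; i<NF; i+=2) print $i}')

printf 'grep+sed: [%s]\nawk: [%s]\n' "$with_grep" "$with_awk"
```

If the text between markers may contain # characters, the awk command is probably the safer of the two.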

0
Wiktor Stribiżew On

You can use pcregrep:

pcregrep -o1 '###(.*?)###' file

The regex, ###(.*?)###, matches ###, then captures into Group 1 any zero or more characters other than line break characters, as few as possible, and then matches the closing ###.

The -o1 option outputs only the Group 1 value.


0
AudioBubble On
sed 't x
s/###/\
/;D; :x
s//\
/;t y
D;:y
P;D' file

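This replaces the first "###" with a newline and uses D to discard everything up to it, then branches conditionally: if a second "###" (the closing marker) can also be replaced, P prints the captured text before D continues with the rest of the line.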

0
Ed Morton On

With GNU awk for multi-char RS:

$ awk -v RS='###' '!(NR%2)' file
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
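Since RS splits the whole input rather than each line, the records can be inspected much like fields. A small sketch, assuming an awk that treats a multi-character RS as a regular expression (GNU awk does; POSIX only guarantees that the first character is used). The sample string and the records variable are made up for illustration:

```shell
# printf '%s' sends the sample without a trailing newline.
records=$(printf '%s' '###a### x ###b### y' |
  awk -v RS='###' '{printf "%d=[%s]\n", NR, $0}')

printf '%s\n' "$records"
```

The wanted values end up in the even-numbered records, which is what !(NR%2) selects. One consequence worth keeping in mind is that an opening ### on one line would pair with a closing ### on a later line, unlike the line-by-line approaches.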
0
potong On

This might work for you (GNU sed):

sed -n 's/###/\n/g;/[^\n]*\n/{s///;P;D}' file

Replace every occurrence of ### with a newline.

If the line then contains a newline, remove everything up to and including the first newline, print the text up to the following newline, delete that text and repeat.
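The same script can be spread over several lines with a comment per step, which may make the loop easier to follow. This is a sketch that assumes GNU sed (for \n in the replacement and inside the bracket expression); the input and result variables are illustrative, and the input is the first three lines of the question's sample:

```shell
# Sample input copied from the question (first three lines).
input='###interest1 moreinterest1### sometext ###interest2###
not-interesting-line
sometext ###interest3###'

result=$(printf '%s\n' "$input" | sed -n '
  # Turn every ### marker into a newline.
  s/###/\n/g
  # While the pattern space still contains a newline:
  /[^\n]*\n/{
    # drop everything up to and including the first newline,
    s///
    # print up to the next newline,
    P
    # delete what was printed and restart the cycle without reading input.
    D
  }
')

printf '%s\n' "$result"
```

The empty pattern in s/// reuses the last regular expression that was applied, here the address /[^\n]*\n/.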