Extract substrings between strings

Asked by Digsby on 24 June 2021 at 14:50

I have a file with text as follows:

###interest1 moreinterest1### sometext ###interest2###
not-interesting-line
sometext ###interest3###
sometext ###interest4### sometext othertext ###interest5### sometext ###interest6###

I want to extract all strings between ### markers.

My desired output would be something like this:

interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6

I have tried the following:

grep '###' file.txt | sed -e 's/.*###\(.*\)###.*/\1/g'

This almost works, but it only seems to grab the first instance per line, so for the first line my output contains only

interest1 moreinterest1

rather than

interest1 moreinterest1
interest2

anubhava On 24 June 2021 at 14:54 BEST ANSWER

Here is a single awk command that makes ### the field separator and prints each even-numbered field:

awk -F '###' '{for (i=2; i<NF; i+=2) print $i}' file

interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
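
Below is a commented version of the same awk program, wrapped in a shell function so it can be reused. The function name extract_between is only an illustrative assumption, and the sample lines are the ones from the question. The comments also spell out why the loop stops at i < NF rather than i <= NF.

```shell
#!/bin/sh
# extract_between is a hypothetical helper name used only for this sketch.
extract_between() {
    awk -F '###' '{
        # With ### as the field separator, text between an opening and a
        # closing marker always lands in an even-numbered field ($2, $4, ...).
        # Stopping at i < NF (rather than i <= NF) guarantees that a closing
        # ### follows the field, so text after an unmatched ### is skipped.
        for (i = 2; i < NF; i += 2)
            print $i
    }' "$@"
}

# Run it on the sample lines from the question.
printf '%s\n' \
    '###interest1 moreinterest1### sometext ###interest2###' \
    'not-interesting-line' \
    'sometext ###interest3###' \
    'sometext ###interest4### sometext othertext ###interest5### sometext ###interest6###' |
    extract_between
```

Called as extract_between file, it reads the file directly, just like the one-liner. On a line such as x ###a### y ###b it prints only a, because the trailing b has no closing marker.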

Here is an alternative grep + sed solution:

grep -oE '###[^#]*###' file | sed -E 's/^###|###$//g'

This assumes there are no # characters in between ### markers.
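
To see why that assumption matters, here is a small check on a made-up line (not from the question) in which one marked string contains a single #. The grep + sed pipeline skips a#b and instead extracts the unmarked text that sits between two different marker pairs. The awk command, which splits on the whole ### string, recovers both marked strings.

```shell
#!/bin/sh
# Hypothetical test line: the first marked string contains a single #.
line='###a#b### text ###c###'

# grep + sed: [^#]* cannot cross the #, so the only match found is
# "### text ###" and the output is " text " (spaces included).
printf '%s\n' "$line" | grep -oE '###[^#]*###' | sed -E 's/^###|###$//g'

# awk: splitting on ### keeps "a#b" intact; it prints a#b, then c.
printf '%s\n' "$line" | awk -F '###' '{for (i=2; i<NF; i+=2) print $i}'
```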

Wiktor Stribiżew On 24 June 2021 at 15:13

You can use pcregrep:

pcregrep -o1 '###(.*?)###' file

The regex, ###(.*?)###, matches ###, then captures into Group 1 zero or more characters other than line break characters, as few as possible, and then matches ###.

The -o1 option outputs only the Group 1 value.
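
If pcregrep is not installed, GNU grep built with PCRE support has a -P option that accepts the same lazy .*? quantifier. It has no counterpart to -o1, so this sketch prints whole matches with -o and strips the markers with sed. Because the match is lazy rather than restricted to [^#]*, single # characters inside the markers survive; the second sample line is a made-up example for that case.

```shell
#!/bin/sh
# Assumes GNU grep compiled with PCRE support (-P).
# .*? is lazy, so each match stops at the nearest closing ###.
printf '%s\n' \
    '###interest1 moreinterest1### sometext ###interest2###' \
    '###a#b### text ###c###' |
    grep -oP '###.*?###' | sed 's/^###//; s/###$//'
```

This prints interest1 moreinterest1, interest2, a#b and c, one per line.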

AudioBubble On 24 June 2021 at 16:05

sed 't x
s/###/\
/;D; :x
s//\
/;t y
D;:y
P;D' file

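This replaces the first "###" with a newline and uses D to delete everything up to that newline. Because D restarts the cycle without reading new input, the substitution flag is still set, so t x branches to x, where s// (reusing the ### regex) replaces the next "###" with a newline. If that succeeds, t y branches to y, where P prints the extracted text and D deletes it before the loop repeats.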
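
The same sed script can be laid out one command per line with comments, which makes the control flow easier to follow. This is a sketch wrapped in a shell function whose name, extract_between_sed, is a hypothetical choice; the commands themselves are unchanged. The continuation lines of the s commands must stay unindented, because leading spaces there would become part of the replacement.

```shell
#!/bin/sh
# extract_between_sed is a hypothetical function name for this sketch.
extract_between_sed() {
    sed '
# The t flag survives a D restart, so this branch is taken right after
# an opening ### has been turned into a newline and stripped off.
t x
# Turn the first ### into a newline; D then deletes up to and including
# it and restarts the cycle without reading input. With no ###, D simply
# discards the line.
s/###/\
/
D
:x
# An empty regex reuses the last one (###): mark the closing ###.
s//\
/
# If a closing ### was found, go print; otherwise drop the rest of the line.
t y
D
:y
# Print the extracted text up to the newline, delete it, and loop.
P
D
' "$@"
}

printf '%s\n' \
    '###interest1 moreinterest1### sometext ###interest2###' \
    'not-interesting-line' \
    'sometext ###interest3###' \
    'sometext ###interest4### sometext othertext ###interest5### sometext ###interest6###' |
    extract_between_sed
```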

Ed Morton On 24 June 2021 at 16:28

With GNU awk for multi-char RS:

$ awk -v RS='###' '!(NR%2)' file
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6

potong On 25 June 2021 at 12:59

This might work for you (GNU sed):

sed -n 's/###/\n/g;/[^\n]*\n/{s///;P;D}' file

Replace every ### with a newline.

If the line then contains a newline, remove everything up to and including the first newline, print the text up to the next newline, then delete that text along with its newline and repeat.
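
Here is a commented sketch of that GNU sed one-liner, split one command per line inside a shell function; the name extract_between_gsed is hypothetical. Note that s/// with an empty regex reuses the last regex used, which here is the address /[^\n]*\n/.

```shell
#!/bin/sh
# Requires GNU sed (\n in the regex and in the replacement).
# extract_between_gsed is a hypothetical function name for this sketch.
extract_between_gsed() {
    sed -n '
# Turn every ### on the line into a newline.
s/###/\n/g
# Only continue if the pattern space holds at least one newline.
/[^\n]*\n/{
# Empty regex reuses [^\n]*\n: remove everything up to and including
# the first newline.
s///
# Print up to the next newline.
P
# Delete through that newline and restart on the rest, without reading input.
D
}
' "$@"
}

printf '%s\n' \
    '###interest1 moreinterest1### sometext ###interest2###' \
    'not-interesting-line' \
    'sometext ###interest3###' \
    'sometext ###interest4### sometext othertext ###interest5### sometext ###interest6###' |
    extract_between_gsed
```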
