Request:
Extract blocks of text that contain 2 or more search terms, something akin to a logical [ AND ] operator in [ awk ].
Preferably run as awk inside a bash/zsh function (a standalone awk script is also OK), accepting the search input in regex style:
[ A|B|C ] = return blocks that contain 'A' or 'B' or 'C'
[ A&B&C ] = return blocks that contain all of 'A', 'B' and 'C'
Context: blocks are separated by at least 5 newlines.
Extra: highlight the search matches.
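A regex record separator can express "at least 5 newlines". This is a minimal sketch that assumes an awk which treats a multi-character RS as a regex (gawk and mawk do; POSIX leaves it unspecified), and it uses a made-up sample. The RS='\n{4}' used below matches a run of exactly four newlines, so any extra newlines would end up at the start of the next record. The pattern here consumes the whole run instead:

```shell
#!/usr/bin/env bash
# Made-up sample: three blocks, separated by 4 and 6 blank lines
# (i.e. 5 and 7 consecutive newlines).
sample=$'1. Fruits\nApple\n\n\n\n\n2. Drinks\nKiwi\n\n\n\n\n\n\n3. Veggies\nTomato\n'

# Multi-character RS is used as a regex: 5 or more newlines end a record.
# Written out instead of \n{5,} because some mawk versions lack intervals.
printf '%s' "$sample" |
    awk -v RS='\n\n\n\n\n+' '{ split($0, line, "\n"); print NR ": " line[1] }'
```

Each record's first line is printed with its record number, so the three blocks come out as records 1, 2 and 3.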
Input
Given [ veganPackage.txt ] input file:
1. Fruits
Apple
Banana
Honey
- tasty combo but too many sugars
- Low prep time
- bad for teeth, cavity warning
2. Drinks
Apple Juice
- served cold and ripe
Add Kiwi
- peel first
Banana Smoothie
- tastes good
- fast power up
3. Veggies
Frillice
Cucumber
Tomato
Want
| Input | Blocks to print | Words to colorize |
|---|---|---|
| Apple\|Banana | Fruits, Drinks | Apple, Banana |
| Apple\|Banana\|Frillice | Fruits, Drinks, Veggies | Apple, Banana, Frillice |
| Apple&Banana | Fruits, Drinks | Apple, Banana |
| Apple&Tomato | nothing | nothing |
| Kiwi&Banana | Drinks | Kiwi, Banana (only in Drinks) |
Tried
Bash function
Named as [ searchBlock ]
searchBlock ()
{
...
awk \
-v RS='\n{4}' \
-v ORS='\n***\n***\n' \
-v color=$colorOut \
-v colorReset=$colorReset \
-v search=$(echo "$searchTerm" | perl -pe 's/(?<!\\)&+/\/&&\//g and s/^/\//g and s/(.)(?=$)/\1\//g') searchTerm \
-v searchAND=$(echo $searchTerm | perl -pe 's/&+/|/g') '$0~search{gsub(searchAND,color"&"colorReset);print}' $file |
vim - -c "/$searchTerm" \
-c ':AnsiEsc' \
-c 'highlight ColorReverse gui=reverse cterm=reverse' \
-c ":match ColorReverse /$searchTerm/"
}
Example call:
searchBlock -s 'Apple|Banana' veganPackage.txt
Rationale:
- if the input contains an OR pattern ([ | ]), do a regular match
- if the input contains an AND pattern ([ & ]), turn [ & ] into [ | ] for colorizing (searchAND), but into [ && ] for pattern matching (search)
- pass the operator as part of the parameter
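The two rewrites in the rationale can also be done without perl. Below is a minimal sketch in plain bash. The helper name toRegexes and its output variables are hypothetical, and unlike the perl version it does not honor an escaped \&:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the question): derive both strings from one term.
#   color_regex   - one alternation for gsub() highlighting:  Apple&Kiwi -> Apple|Kiwi
#   awk_condition - awk pattern code meant to be spliced in:   Apple&Kiwi -> /Apple/ && /Kiwi/
toRegexes() {
    local term=$1 part
    local -a parts
    color_regex= awk_condition=
    IFS='&' read -r -a parts <<< "$term"      # split the term on &
    for part in "${parts[@]}"; do
        [[ -n $part ]] || continue            # skip empty pieces, e.g. from "&&"
        color_regex+="${color_regex:+|}${part}"
        awk_condition+="${awk_condition:+ && }/${part}/"
    done
}

toRegexes 'Apple&Kiwi'
printf '%s\n' "$color_regex" "$awk_condition"   # Apple|Kiwi, then /Apple/ && /Kiwi/
```

An OR-only term such as 'Apple|Banana' passes through unchanged as the color regex and becomes /Apple|Banana/ as the condition. The && only acts as an operator if awk_condition is pasted into the program text rather than passed with -v. That also means the term runs as awk code, so this is only suitable for trusted input.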
Bottleneck
- If I manually feed '/Apple/&&/Kiwi/{gsub(/(Apple|Kiwi)/,color"&"colorReset);print}' veganPackage.txt to awk, the output is as expected:
\*\*\*
\*\*\*
2. Drinks
Apple Juice
- served cold and ripe
Add Kiwi
- peel first
Banana Smoothie
- tastes good
- fast power up
\*\*\*
\*\*\*
However, with '$0 ~ search{gsub(searchAND,color"&"colorReset);print}' the [ AND ] pattern fails: nothing is printed, which is not what I searched for. The highlighting/coloring itself is correct, though.
- $0 ~ search = for every block that matches the regex stored in the awk variable [ search ]
- {gsub(searchAND,color"&"colorReset);print} = globally replace each match of [ searchAND ] with itself surrounded by ANSI escape sequences, then print the block
- [ & ] is double-quoted because in a gsub() replacement it stands for the matched text (NOT to be confused with [ && ], the logical AND in an awk pattern)
It seems that $0 ~ search does NOT work when [ search ] holds /Apple/&&/Kiwi/.
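The behavior in the tests can be reproduced in isolation with made-up lines. A string used on the right of ~ is one dynamic regex, so the slashes and && in /Apple/&&/Kiwi/ are matched as literal characters. A bare variable used as a pattern is a non-empty string and therefore always true, which would explain the "entire file" rows. A logical AND has to be written as separate ~ tests in the awk code itself:

```shell
#!/usr/bin/env bash
search='/Apple/&&/Kiwi/'    # what the perl one-liner produces for Apple&Kiwi

# 1) "/" and "&" are ordinary characters in a dynamic regex, so this asks
#    whether the record contains the literal text /Apple/&&/Kiwi/
printf 'Apple Juice, Add Kiwi\n' | awk -v search="$search" '{ print ($0 ~ search) }'   # 0
printf 'x /Apple/&&/Kiwi/ y\n'   | awk -v search="$search" '{ print ($0 ~ search) }'   # 1

# 2) A bare variable as a pattern: non-empty string, hence always true
printf 'Banana\n' | awk -v search='Apple|Kiwi' 'search'                                 # Banana

# 3) The AND must be awk code: one ~ test per term
printf 'Apple Juice, Add Kiwi\n' | awk -v a='Apple' -v b='Kiwi' '$0 ~ a && $0 ~ b { print "both" }'   # both

# 4) In a gsub() replacement, & stands for the matched text
printf 'Apple Juice, Add Kiwi\n' | awk -v re='Apple|Kiwi' '{ gsub(re, "[&]"); print }'  # [Apple] Juice, Add [Kiwi]
```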
Tests
| Input | Code fragment | Result | Expected |
|---|---|---|---|
| Apple\|Kiwi | $0~search | Fruits, Drinks | Fruits, Drinks |
| Apple&Kiwi | $0~search | nothing | Drinks |
| Apple\|Kiwi | search | entire file | Drinks, Fruits |
| Apple&Kiwi | search | entire file | Drinks |
| Apple&&Kiwi | search | entire file | Drinks |
| /Apple/&&/Kiwi/ | $0~search | nothing | Drinks |
| Apple&&Kiwi | $0~search | nothing | Drinks |
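For comparison, here is a minimal sketch that meets the Want table without generating awk code from strings. The term is split on & inside awk and every piece must match, otherwise the block is skipped. The same term, with each run of & turned into |, drives the highlighting. The function name searchBlockSketch, the color codes, the ORS and the regex RS (which needs gawk or mawk) are assumptions, not the question's code:

```shell
#!/usr/bin/env bash
# searchBlockSketch TERM FILE   (hypothetical name, not the question's searchBlock)
#   TERM 'A|B|C' -> print blocks matching the regex A|B|C
#   TERM 'A&B&C' -> print blocks matching every &-separated regex
# Assumes blocks are separated by 5+ newlines and an awk with regex RS (gawk, mawk).
searchBlockSketch() {
    local term=$1 file=$2
    awk -v RS='\n\n\n\n\n+' -v ORS='\n***\n' -v term="$term" \
        -v on=$'\033[1;31m' -v off=$'\033[0m' '
    BEGIN {
        n = split(term, parts, "&")   # Apple&Kiwi -> parts[1]="Apple", parts[2]="Kiwi"
        hl = term
        gsub(/&+/, "|", hl)           # highlight regex: Apple|Kiwi
    }
    {
        sub(/\n+$/, "")               # drop trailing newline(s) of the last block
        for (i = 1; i <= n; i++)      # AND: every non-empty part must match
            if (parts[i] != "" && $0 !~ parts[i])
                next
        gsub(hl, on "&" off)          # wrap each hit in ANSI color codes
        print
    }' "$file"
}

# Usage: searchBlockSketch 'Kiwi&Banana' veganPackage.txt
```

Because each &-separated piece is still a regex, a mixed term such as 'Apple|Banana&Kiwi' would mean "(Apple or Banana) and Kiwi". Note that -v processes backslash escapes, so a term containing backslashes would need them doubled (or could be passed through ENVIRON instead).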
Focusing solely on awk and the use of dynamically generated regexes ...
Assumptions:
- OP can use something (eg, bash) to parse/reformat the various inputs into formats that are acceptable to awk
General approach:
- pass the gsub() regex in as a -v variable=value clause
- build the search regex directly into the awk script
- modify the gsub() code to bracket the matches with a pair of underscores (__); OP can incorporate the color codes later
A simple data set for demonstration purposes:
We'll use a bash/for loop to test a few different search regexes (the gsub() regex is the same for all 3 search regexes):
NOTE: here's where I'm assuming OP can parse the various input formats into one of these formats for the two regex variables
Where:
- awk script, part 1: 'BEGIN { filler=ignore } '
- awk script, part 2: "${search_regex}" ; must be wrapped in double quotes
- awk script, part 3: ' { gsub(gsub_regex,"__&__") } 1; END { filler=ignore }'
- … '"${search_regex}"' …
Taking for a test drive:
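Below is a minimal sketch of the same splicing idea, using hypothetical sample lines and search regexes (this is not the answer's own script). The search condition is pasted into the program text between quoted pieces, so awk parses && and || as operators, while the gsub() regex still arrives via -v. Records are plain lines here for brevity. A bare 1 after the action would print every record, so this sketch calls print inside the action to keep only matching lines:

```shell
#!/usr/bin/env bash
gsub_regex='Apple|Kiwi'    # same highlight regex for every search below

for search_regex in '/Apple/' '/Apple/ || /Kiwi/' '/Apple/ && /Kiwi/'; do
    echo "search: ${search_regex}"
    # "${search_regex}" becomes awk code; the single-quoted part is the action.
    printf '%s\n' 'Apple Juice' 'Add Kiwi' 'Kiwi and Apple' 'Banana' |
        awk -v gsub_regex="$gsub_regex" \
            "${search_regex}"' { gsub(gsub_regex, "__&__"); print }'
done
```

With '/Apple/ && /Kiwi/' only "Kiwi and Apple" is printed, as "__Kiwi__ and __Apple__". Since search_regex is executed as awk code, it should only come from trusted input.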