Using csplit in Bash script with Form Feed Regex


I have a print output file (uncomp.txt) that contains form feeds. I'm trying to split this single document into multiple documents wherever the \f (form feed) regex matches, and to name the output files with the current epoch time as the prefix.

I've tried this:

$ csplit --prefix=$(date +%s) -s  /tmp/uncomp.txt "/%\f%/+1" "{*}"

as well as this:

$ csplit --prefix=$(date +%s) -s  /tmp/uncomp.txt "/\f/+1" "{*}"

and even this:

$ csplit -s  --prefix=$(date +%s) /tmp/uncomp.txt /\f/ {*}

Each time, though, I end up with a single file, so csplit apparently isn't matching \f as a form feed. What am I doing wrong?



I just tried the following in the standalone Bash shell for Windows:

csplit -z --prefix=Stored dumpstored.sql /^L/ "{*}"

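where ^L stands for a literal form-feed character, which I entered by pressing Ctrl+L. It worked for me. (In a default bash/readline setup, Ctrl+L clears the screen instead, so you may need to press Ctrl+V and then Ctrl+L to insert the character.)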
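Whether the control character comes from the keyboard or from another quoting trick, it helps to confirm that the input really contains form-feed bytes before suspecting the pattern. The sketch below is a minimal check under stated assumptions: sample.txt and the scratch directory are hypothetical stand-ins for the real print file, the Stored prefix is reused from the command above, and it relies on GNU csplit (for -z and '{*}').

```shell
#!/usr/bin/env bash
# Hypothetical sample: sample.txt in a scratch dir stands in for the real print file.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
printf 'intro\n\fpage A\n\fpage B\n' > sample.txt

# Count lines containing a literal 0x0C byte; 0 would mean there is nothing to split on.
ff_lines=$(grep -c $'\f' sample.txt)
echo "lines containing a form feed: $ff_lines"

# od -c prints the form feed as \f, which is easier to spot than an invisible ^L.
od -c sample.txt

# Split before each form-feed line; -z drops empty pieces (GNU csplit option).
csplit -s -z --prefix=Stored sample.txt $'/\f/' '{*}'
echo "pieces written: $(ls Stored* | wc -l)"
```

If the form-feed count is N and no form feed sits on the very first line, you would expect N + 1 pieces.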


bash solution

It appears that csplit requires a literal formfeed in its regex. One way to achieve that is to use bash's $'...' construct:

csplit --prefix=$(date +%s) -s  uncomp.txt $'/\f/+1' "{*}"
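A self-contained sketch of the $'...' approach follows. It assumes GNU csplit and builds a small hypothetical three-page uncomp.txt in a scratch directory, not the real /tmp/uncomp.txt. It also drops the +1 offset, so each piece begins at its form-feed line.

```shell
#!/usr/bin/env bash
# Hypothetical three-page sample; the scratch directory stands in for /tmp.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
printf 'page one\nmore one\n\fpage two\n\fpage three\n' > uncomp.txt

prefix=$(date +%s)
# $'/\f/' is expanded by bash into a pattern containing a real 0x0C byte.
# -s suppresses the byte counts, -z drops empty pieces, '{*}' repeats until EOF.
csplit -s -z --prefix="$prefix" uncomp.txt $'/\f/' '{*}'

ls "$prefix"*
```

Quoting '{*}' keeps the shell from treating the braces specially, and quoting "$prefix" is harmless here but safer if the prefix is ever changed.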

POSIX solution

If you don't have bash, you can use printf:

csplit --prefix=$(date +%s) -s  uncomp.txt "/$(printf "\f")/+1" "{*}"

Or, equivalently:

csplit --prefix=$(date +%s) -s  uncomp.txt "$(printf "/\f/+1")" "{*}"
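Another POSIX-shell variant stores the form feed in a variable once, which keeps longer csplit calls readable. This is a sketch under assumptions: page_sample.txt, the part prefix, and the scratch directory name are hypothetical. Only the shell side is POSIX; -z and '{*}' are GNU csplit extensions, while -f is the POSIX spelling of --prefix.

```shell
#!/bin/sh
# Hypothetical file and prefix names; the shell features used here are POSIX.
set -eu

workdir="${TMPDIR:-/tmp}/csplit-demo.$$"
mkdir "$workdir"
cd "$workdir"

# Command substitution strips trailing newlines only, so the 0x0C byte survives.
ff=$(printf '\f')
printf 'alpha\n\fbeta\n\fgamma\n' > page_sample.txt

# -f is POSIX; -z and '{*}' require GNU csplit.
csplit -s -z -f part page_sample.txt "/$ff/" '{*}'
ls part*
```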

I don't think you want the +1 after the regex. An offset of +1 makes csplit split one line after the line containing the form feed, so for me the first line of each page ended up at the end of the previous file. (For an explanation of the $'...' construct, search the bash man page for the string 'ANSI C'.)
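The effect of the offset can be reproduced with a hypothetical two-page pages.txt. This sketch assumes GNU csplit and writes both variants side by side with the prefixes plain_ and offset_, which are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical two-page file used to compare /\f/ with /\f/+1.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
printf 'p1 line1\np1 line2\n\fp2 line1\np2 line2\n' > pages.txt

# Without an offset, the split happens just before the form-feed line.
csplit -s --prefix=plain_ pages.txt $'/\f/' '{*}'
# With +1, the split happens one line later, so the form-feed line stays behind.
csplit -s --prefix=offset_ pages.txt $'/\f/+1' '{*}'

echo "without +1, piece 00 ends with: $(tail -n 1 plain_00)"
echo "with +1, piece 00 ends with: $(tail -n 1 offset_00 | tr -d '\f')"
```

With +1, the first line of page 2 is the last line of offset_00, which matches the behavior described above.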