When I include the NUL character (\x00) in a regex character range in BSD grep, the result is unexpected: no characters match. Why is this happening?
Here is an example:
$ echo 'ABCabc<>/ă' | grep -o [$'\x00'-$'\x7f']
Here I expect every character except the last one to match; however, there is no output (no matches).
Alternatively, when I start the character range from \x01, it works as expected:
$ echo 'ABCabc<>/ă' | grep -o [$'\x01'-$'\x7f']
A
B
C
a
b
c
<
>
/
Also, here are my grep and BASH versions:
$ grep --version
grep (BSD grep) 2.5.1-FreeBSD
$ echo $BASH_VERSION
3.2.57(1)-release
Noting that $'...' is a shell quoting construct, your command would try to pass a literal NUL character as part of the command line argument to grep. That's impossible to do in any Unix-like system, as the command line arguments are passed to the process as NUL-terminated strings. So in effect, grep sees just the arguments -o and [.
You would need to create some pattern that matches the NUL byte without including it literally. But I don't think grep supports the \000 or \x00 escapes itself. Perl does, though, so it can print the input lines that contain a NUL.
As an aside, at least GNU grep doesn't seem to like that kind of a range expression, so if you were to use that, you'd have to do something different. In the C locale, '[[:cntrl:][:print:]]' might perhaps work to match the characters from \x01 to \x7f, but I didn't check comprehensively. The manual for grep has some descriptions of the classes.
Note also that [$'\x00'-$'\x7f'] has an unquoted pair of [ and ] and so is a shell glob. This isn't related to the NUL byte, but if you had files that match the glob (any one-letter names, if the glob works on your system; it doesn't on my Linux), or had failglob or nullglob set, it would probably give results you didn't want. Instead, quote the brackets too: $'[\x00-\x7f]'.