Modify gettext .pot file output to exclude empty strings or strings containing only spaces


I have a .pot file produced by xgettext from my C++ source code, in this format:

#: file1.cpp:line
#: file2.cpp:line
msgid "" - empty string

#: file1.cpp:line
#: file2.cpp:line
msgid " \t\n\r" - string contains only spaces

#: file1.cpp:line
#: file2.cpp:line
msgid "real text"

Then I use a command like:

grep "#: " "$(POT_FILE)" | sed -e 's/^\(#: \)\(.*\)/\2/'

so that only the file names and line numbers end up in the output.
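The grep and sed steps can also be collapsed into a single sed call: -n turns off automatic printing, and the p flag prints a line only when the substitution succeeded. A minimal sketch; sample.pot and its line numbers are hypothetical, and inside the Makefile you would pass "$(POT_FILE)" instead:

```shell
#!/bin/sh
# Hypothetical sample input shaped like the question's .pot file.
cat > sample.pot <<'POT'
#: file1.cpp:10
#: file2.cpp:20
msgid "real text"
msgstr ""
POT

# -n: print nothing by default; p: print only lines where "#: " was stripped.
sed -n 's/^#: //p' sample.pot
```

This prints file1.cpp:10 and file2.cpp:20, the same result as the grep | sed pipeline, but it still keeps the references of empty or blank strings.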

The problem is that I don't want the file references for strings that are empty or contain only whitespace.

This is tricky because I have to find the msgid line that immediately follows each run of #: lines and, depending on the contents of that string, skip all of those preceding lines.

Can anybody help with such a command?

Thanks!

Answer:

If I understand you correctly, put the following into an executable file:

#!/usr/bin/awk -f

BEGIN { FS="\"" } # split on double quotes so $2 holds the msgid text

# strip the "#: " prefix and store each file:line reference, in order, in array "a"
/^#: / { sub(/^#: /, "", $0); a[i++]=$0 }

/^msgid/ {
    # print the stored references in their original order
    # ("for (j in a)" does not guarantee any particular order)
    if( valid_msgid() ) { for( j = 0; j < i; j++ ) print a[j] }
    reset() # clear array a after every msgid encountered
    }

function reset(   j) {
    for( j in a ) { delete a[j]  }
    i = 0
    }

# put your validity tests here.
# $2 won't contain the entire string if the message contains escaped double quotes (\")
function valid_msgid(   s) {
    s = $2
    gsub(/\\[tnr]|[ \t]/, "", s) # drop \t, \n, \r escapes and literal blanks
    return length(s) > 0         # reject empty and whitespace-only strings
    }

If I put the above into a file called awko, make it executable with chmod +x awko, and run ./awko data.pot, I get the following:

file1.cpp:line
file2.cpp:line

These are the references from your last entry, the one with real text (with actual numbers in place of "line").

The main trick is using " as the field separator, so that $2 holds the text between the first pair of quotes. If a msgid can contain escaped double quotes (\"), $2 stops at the first of them, and you would need more complicated parsing to recover the complete message text.
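Such parsing also matters for long messages and messages containing \n, which xgettext writes as msgid "" followed by "..." continuation lines; splitting on " would mistake those for empty strings. A sketch under those assumptions, which collects the whole quoted msgid before testing it. The names pot_refs.awk and sample.pot are hypothetical, and the sample line numbers are made up:

```shell
#!/bin/sh
# Hypothetical sample .pot; the line numbers are made up.
cat > sample.pot <<'POT'
#: file1.cpp:10
msgid ""
msgstr ""

#: file1.cpp:20
#: file2.cpp:21
msgid " \t\n"
"\r"
msgstr ""

#: file1.cpp:30
#, c-format
msgid ""
"a long message that xgettext wrapped "
"onto continuation lines\n"
msgstr ""

#: file2.cpp:40
msgid "say \"hi\""
msgstr ""
POT

# Hypothetical script name; same idea as awko, but it joins continuation
# lines and does not split on double quotes.
cat > pot_refs.awk <<'AWK'
# Return the text between the quotes of a msgid line or a "..." line.
function payload(line) {
    sub(/^msgid[ \t]+/, "", line)   # drop the keyword, if present
    sub(/^"/, "", line)             # opening quote
    sub(/"[ \t]*$/, "", line)       # closing quote
    return line
}
# Print the stored references unless the collected msgid is blank, then reset.
function finish(   s, j) {
    s = text
    gsub(/\\[tnr]|[ \t]/, "", s)    # drop \t \n \r escapes and literal blanks
    if (length(s) > 0)
        for (j = 0; j < n; j++) print refs[j]
    n = 0; text = ""; in_msgid = 0
}
in_msgid && /^"/ { text = text payload($0); next }   # continuation line
in_msgid         { finish() }          # msgstr (or anything else) ends the msgid
/^#: /           { sub(/^#: /, ""); refs[n++] = $0; next }
/^msgid /        { in_msgid = 1; text = payload($0); next }
END              { if (in_msgid) finish() }
AWK

awk -f pot_refs.awk sample.pot
```

On this sample it prints file1.cpp:30 and file2.cpp:40: the empty and whitespace-only entries are dropped, while the wrapped message and the one with escaped quotes are kept. A #: line that lists several references is passed through as a single line.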

I don't have access to xgettext, so I don't know whether the comments after the - in the example bad lines come from you or from the program. If xgettext does output them, the delimiter could be changed to " - so that valid_msgid() can test those strings.