Reverse BWT without knowing last character

566 Views Asked by At

Usually in Burrows-Wheeler Transform algorithm, a $ character is used to signal the end of string, but in so many cases, this $ is omitted.

I was wondering how it can be reversed without knowing the position of the last character?

For example, I have this BWT:

[[[[[1[[11endgnad1234245ndbnbbb]]]]]]]nnnngnabbbdiaaaiaaii

Following the algorithm, I can easily construct the first column of the BWT matrix, which I choose to represent in a compressed way such as below:

Character : Occurrences
1         : 4
2         : 2
3         : 1
4         : 2
5         : 1
[         : 7
]         : 7
a         : 7
b         : 7
d         : 4
e         : 1
g         : 2
i         : 4
n         : 9

Without knowing which character is the last in the original string, I'm unable to see how I could reconstruct the original string.

Any help is greatly appreciated. Thang

P/S: In case you're wondering what the original string is:

[1]ban[2]banana[3]band[4]bandage[12]bin[14]bind[15]binding

1

There are 1 best solutions below

0
On BEST ANSWER

You can't (but you can try ;-). Your 1st bwt symbol is the last in the original string 'S'. Now you should unroll the original string backward through LF mapping. It's actually bin[sym] + rank(sym, i) + 1 where you start with i = 0. You can easy get bin[] array from occurences. The problem is that once your 'i' is bigger then omitted '$' you shouldn't add this last '1' so you break the string and things go nasty. You can detect the error if you also reconstruct sa[] and overwrite already set index. So you can set arbitrary $ position to '0' and try to recover, then if it fails set it to 1... until you reconstruct correctly. don't know if this could be optimized.

Cheers,

D.