I'm trying to find a way to split this string properly, but so far I keep running into issues
using string.Split, string.Substring, string.IndexOf, string.Replace
and so on.
Here is a sample string that needs to be split into a list:
We are <b><i>very</i></b><b>a</b>mused!\nThank you.
The resulting list should contain these items, in this order:
0: We
1: are
2: <b>
3: <i>
4: very
5: </i>
6: </b>
7: <b>
8: a
9: </b>
10: mused!
11: \n
12: Thank
13: you.
So what I am trying to do is this:
splitStart = baseString.Value.Split(' ');
foreach (string part in splitStart)
{
    if (part.Contains("<"))
    {
        // get the parts <b> <i> <size> <color> </b> </i> </size> </color> \n
        textlist.Add(part); // add each part to list
    }
    else
    {
        textlist.Add(part);
        Debug.Log(part);
    }
}
I tried things like
Contains("<n>")
Replace("<n>", "") and then adding "<n>" to the array,
but that can break the sequence.
Edit: I forgot to say that this is for C#.
I think you need to pre-process the text with an HTML parser such as jsoup, or with a tree-structured algorithm.
One option is to handle this case with the Jsoup library.
1. Java version
First, prepare the word list from the HTML tags.
Then, traverse the HTML content using Jsoup's NodeVisitor class.
Finally, the code is as follows.
The output should look like this:
2. C# version
The source code of the C# version is a little bit different, but the process is the same (with a few small changes).
This is my NodeVisitor version of the code.
First, parse the HTML content.
Second, select the original sentence from the 'body' tag.
The complete code is as follows.
The output should also be:
You can find the NSoup library I used this time (actually not the official version 0.8.0) from the site.
The official NSoup site is here, but it has no visitor interface.
Then you can use your own method to complete the code.
I must tell you that this is just one option for your goal.
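Another option, if you would rather not pull in a parser library, is a plain regular-expression tokenizer. The following is only a minimal sketch in Java using the standard library. The class name TagTokenizer and the method tokenize are made up for illustration, and it assumes that every tag is well formed (a '<' that is always closed by a '>') and that the line break in the string is a real newline character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup or NSoup.
public class TagTokenizer {
    // A token is one of (tried in this order):
    //   1. a tag such as <b>, </i> or <color=red>
    //   2. a single newline character
    //   3. a run of characters containing no whitespace and no '<'
    // Spaces match none of the alternatives, so they are skipped.
    private static final Pattern TOKEN = Pattern.compile("<[^>]+>|\\n|[^\\s<]+");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group()); // tokens are collected in source order
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "We are <b><i>very</i></b><b>a</b>mused!\nThank you.";
        List<String> tokens = tokenize(sample);
        for (int i = 0; i < tokens.size(); i++) {
            // Show the newline token as \n so it stays visible in the console
            String shown = tokens.get(i).equals("\n") ? "\\n" : tokens.get(i);
            System.out.println(i + ": " + shown);
        }
    }
}
```

For the sample string this yields the 14 items listed in the question, in the same order. Note that a stray '<' that is never closed is silently dropped by this pattern, so a real parser is still the safer choice for untrusted input. The same pattern should carry over to C# with Regex.Matches, since it uses only basic regex syntax.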
Regards,