I'm using serialize and deserialize right now, and when decoding the serialized textbuffer with utf-8 I get this:
GTKTEXTBUFFERCONTENTS-0001 <text_view_markup>
<tags>
<tag name="bold" priority="1">
<attr name="weight" type="gint" value="700" />
</tag>
<tag name="#efef29292929" priority="2">
<attr name="foreground-gdk" type="GdkColor" value="efef:2929:2929" />
</tag>
<tag name="underline" priority="0">
<attr name="underline" type="PangoUnderline" value="PANGO_UNDERLINE_SINGLE" />
</tag>
</tags>
<text><apply_tag name="underline">At</apply_tag> the first <apply_tag name="bold">comes</apply_tag> rock! <apply_tag name="underline">Rock</apply_tag>, <apply_tag name="bold">paper,</apply_tag> <apply_tag name="#efef29292929">scissors!</apply_tag></text>
</text_view_markup>
I'm trying to turn these tags into HTML tags such as <u></u> and <b></b>. My earlier question was closed as a duplicate, so I'm asking differently: how can I tell where each tag ends when every one of them closes with a plain </apply_tag> rather than something like </apply_tag name="nameoftag">? I tried this before:
def correctTags(text):
    tags = []
    # Walk the text and queue a closing HTML tag whenever the character
    # 17 positions after a '<' looks like the start of a known tag name.
    for i in range(len(text)):
        if text[i] == '<' and i + 18 <= len(text):
            if text[i+17] == '#':
                tags.append('</font>')
            elif text[i+17] == 'b':
                tags.append('</b>')
            elif text[i+17] == 'u':
                tags.append('</u>')
    # Swap the opening tags for their HTML equivalents.
    newstring = text.replace('<apply_tag name="#', '<font color="#').replace('<apply_tag name="bold">', '<b>').replace('<apply_tag name="underline">', '<u>')
    # Replace each </apply_tag> in the order the tags were queued.
    for j in tags:
        newstring = newstring.replace('</apply_tag>', j, 1)
    return '<text>' + newstring + '</text>'
But there is a problem with nested tags: they get closed in the wrong place. Maybe the answer is gtk.TextBuffer.register_serialize_format, since I think it serializes using the MIME type I pass to it, such as HTML, and then I would know where the tags end. But I haven't found any extensive, beginner-friendly example of its usage.
I found a way to get the tags correctly out of the serialized textbuffer in "Serialising Gtk TextBuffers to HTML". It doesn't use register_serialize_format; as the site says, it is possible to write a serializer, but the documentation is sparse (and I think that is what register_serialize_format is for). Either way, the solution uses html.parser and xml.etree.ElementTree, though it's also possible to use BeautifulSoup.
Basically, the script handles the serialized textbuffer content with an HTML parser. The hard work starts in feed, which receives bytes (the serialized textbuffer content) and returns a string (the formatted text with the HTML tags). First it finds the index of <text_view_markup>, dropping the header GTKTEXTBUFFERCONTENTS-0001. That header is the part that couldn't be decoded with decode('utf-8'), as it results in "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position : invalid start byte". You could use decode('utf-8', errors='ignore') or errors='replace' for that, but since feed drops this part anyway, the content is decoded with a plain .decode().
Then the tags and the text are handled separately. The tags come first, and here I used xml.etree.ElementTree, although BeautifulSoup can be used as in the original script. Once the tags are handled, feed is called with the text; this feed is the HTMLParser method.
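The header-dropping and splitting step described above can be sketched roughly as follows. This is not the original script: split_serialized and the sample bytes are names and data I made up for illustration, and the four bytes after the header exist only to reproduce the decode error.

```python
import xml.etree.ElementTree as ET

MARKUP_START = b"<text_view_markup>"
MARKUP_END = b"</text_view_markup>"


def split_serialized(data):
    """Return (tags_element, text_markup) from serialized TextBuffer bytes."""
    # Drop everything before <text_view_markup>, including the header.
    start = data.index(MARKUP_START)
    end = data.index(MARKUP_END) + len(MARKUP_END)
    markup = data[start:end].decode()  # UTF-8 is the default codec

    # Tag definitions: parsed as XML.
    tags_xml = markup[markup.index("<tags>"):markup.index("</tags>") + len("</tags>")]
    # Text section: kept as a string so it can be fed to an HTMLParser later.
    text_markup = markup[markup.index("<text>"):markup.index("</text>") + len("</text>")]
    return ET.fromstring(tags_xml), text_markup


# Made-up sample: the bytes right after the header are an assumption,
# chosen only so that a plain sample.decode() fails.
sample = (
    b"GTKTEXTBUFFERCONTENTS-0001\x00\x00\x00\xb4"
    b"<text_view_markup>\n <tags>\n"
    b'  <tag name="bold" priority="1">\n'
    b'   <attr name="weight" type="gint" value="700" />\n'
    b"  </tag>\n </tags>\n"
    b'<text><apply_tag name="bold">comes</apply_tag> rock!</text>\n'
    b"</text_view_markup>\n"
)

tags, text_markup = split_serialized(sample)
print([tag.get("name") for tag in tags])
print(text_markup)
```

Slicing the bytes before decoding means no errors= argument is needed, because the undecodable part is never passed to .decode().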
The tags can also cover more than italics, bold, and color; you just need to update the tag2html dictionary.
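As a rough idea of what filling tag2html could look like, here is a minimal sketch. The shape of the dictionary (tag name mapped to an opening/closing pair), the build_tag2html helper, and the choice of a span with an inline style for colors are all my assumptions, not necessarily what the original script does. It only covers the three tags from the output in the question; other attributes would be added as further branches.

```python
import xml.etree.ElementTree as ET


def build_tag2html(tags_xml):
    """Map each serialized tag name to an (opening, closing) HTML pair."""
    tag2html = {}
    for tag in ET.fromstring(tags_xml).iter("tag"):
        opening, closing = "", ""
        for attr in tag.iter("attr"):
            kind, value = attr.get("name"), attr.get("value")
            if kind == "weight" and value == "700":
                pair = ("<b>", "</b>")
            elif kind == "underline" and value == "PANGO_UNDERLINE_SINGLE":
                pair = ("<u>", "</u>")
            elif kind == "foreground-gdk":
                # "efef:2929:2929" -> "#ef2929": keep the high byte of
                # each 16-bit channel.
                color = "#" + "".join(part[:2] for part in value.split(":"))
                pair = ('<span style="color: %s">' % color, "</span>")
            else:
                continue  # attribute not handled in this sketch
            opening += pair[0]
            closing = pair[1] + closing  # close in reverse order
        tag2html[tag.get("name")] = (opening, closing)
    return tag2html


tags_xml = """<tags>
 <tag name="bold" priority="1">
  <attr name="weight" type="gint" value="700" />
 </tag>
 <tag name="#efef29292929" priority="2">
  <attr name="foreground-gdk" type="GdkColor" value="efef:2929:2929" />
 </tag>
 <tag name="underline" priority="0">
  <attr name="underline" type="PangoUnderline" value="PANGO_UNDERLINE_SINGLE" />
 </tag>
</tags>"""

tag2html = build_tag2html(tags_xml)
print(tag2html)
```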
Besides not using BeautifulSoup, I made some other changes. For the tag name: all my tags have names, so they don't use id. My color tag also already has hex values, so I didn't need the pango_to_html_hex method. And here is how it looks right now:
Also, a big thanks to Cyril Danilevski, who wrote this; all credit goes to him. As he explained, "There is also , that mark the beginning and end of a TextBuffer's content." So if you follow along with the example from the site, handle_endtag has self.markup_text += self.current_closing_tags.pop(), which will try to pop from an empty list. I recommend that anyone who wants to handle tags also look at pango_html.py, which handles this by checking that the list is not empty (that check is also in the handle_endtag of the code in this answer); there is also a test file, test_pango_html.py.
Example of usage
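To illustrate the idea, here is a minimal sketch of the stack-based handling with the empty-list check. It is not the original pango_html.py: the ApplyTagConverter class, the to_html helper, and the sample tag2html dictionary are hypothetical names of mine. The attribute names markup_text and current_closing_tags follow the ones mentioned above.

```python
from html import escape
from html.parser import HTMLParser


class ApplyTagConverter(HTMLParser):
    """Turn <apply_tag name="..."> markup into HTML using a closing-tag stack."""

    def __init__(self, tag2html):
        super().__init__()
        self.tag2html = tag2html
        self.markup_text = ""
        self.current_closing_tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "apply_tag":
            return  # e.g. the <text> wrapper: nothing to open
        name = dict(attrs).get("name")
        opening, closing = self.tag2html.get(name, ("", ""))
        self.markup_text += opening
        # Remember which closing tag belongs to this opening tag.
        self.current_closing_tags.append(closing)

    def handle_data(self, data):
        # The parser hands over unescaped text, so escape it again.
        self.markup_text += escape(data, quote=False)

    def handle_endtag(self, tag):
        # Only pop when there is something to pop; </text> and stray
        # </apply_tag> end tags would otherwise hit an empty list.
        if tag == "apply_tag" and self.current_closing_tags:
            self.markup_text += self.current_closing_tags.pop()


def to_html(text_markup, tag2html):
    parser = ApplyTagConverter(tag2html)
    parser.feed(text_markup)
    parser.close()  # flush any buffered trailing text
    return parser.markup_text


sample_tag2html = {"bold": ("<b>", "</b>"), "underline": ("<u>", "</u>")}
nested = ('<text><apply_tag name="bold">a <apply_tag name="underline">b'
          '</apply_tag> c</apply_tag>!</text>')
print(to_html(nested, sample_tag2html))
```

Because the most recently opened tag is always the first one popped, a nested </apply_tag> closes the inner tag before the outer one, which is exactly what the replace-based attempt in the question could not do.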