We were passing the UTF-8 encoded byte string to textwrap, which treats
each byte as a character. Multi-byte UTF-8 characters were therefore
counted as several columns, so lines wrapped too early. Decode to
unicode before wrapping and encode the result again afterwards.
self.state = self.stNONE
whole_para = ' '.join(self.para_lines)
self.addtext(whole_para)
- self.text.write(textwrap.fill(whole_para, 80,
- break_long_words=False,
- break_on_hyphens=False))
+ wrapped = textwrap.fill(whole_para.decode('utf-8'), 80,
+ break_long_words=False,
+ break_on_hyphens=False)
+ self.text.write(wrapped.encode('utf-8'))
self.html.write('</p>')
del self.para_lines[:]
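
The effect of the change can be reproduced outside the patched class. The sketch below is written for Python 3, where textwrap only accepts str, so the old byte-counting behaviour is simulated by decoding the UTF-8 bytes as latin-1 (one code point per byte). The helper names wrap_as_bytes and wrap_as_text are hypothetical and do not appear in the patched code; the width of 11 is chosen only to make the difference visible.

```python
import textwrap


def wrap_as_bytes(text, width):
    """Simulate the old behaviour: every UTF-8 byte counts as one column."""
    # latin-1 maps each byte to exactly one code point, so textwrap
    # measures the line length in bytes rather than characters.
    pseudo = text.encode('utf-8').decode('latin-1')
    wrapped = textwrap.fill(pseudo, width,
                            break_long_words=False,
                            break_on_hyphens=False)
    # Undo the simulation to get readable text back.
    return wrapped.encode('latin-1').decode('utf-8')


def wrap_as_text(text, width):
    """New behaviour: wrap decoded text, so each character is one column."""
    return textwrap.fill(text, width,
                         break_long_words=False,
                         break_on_hyphens=False)


# Two words of five 'e-acute' characters: 11 characters, 21 bytes in UTF-8.
sample = "\u00e9" * 5 + " " + "\u00e9" * 5

print(repr(wrap_as_bytes(sample, 11)))  # wraps after the first word
print(repr(wrap_as_text(sample, 11)))   # fits on a single line
```

With byte counting, the first word alone already occupies 10 of the 11 columns, so the second word is pushed to a new line even though the whole sample is only 11 characters wide.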