>>> You're right, I realised after playing with Tim's example that the
>>> problem was that I wasn't calling close() on the codecs file. Adding
>>> this after the f.write(html_text) seems to flush the buffer, which
>>> means that the content now gets written to the file.
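A minimal Python 3 sketch of the fix described above. It assumes the built-in open() with encoding="utf-8" in place of the codecs file from the thread, and the html_text contents and file name are made up for illustration. Leaving the with-block closes the file, which flushes the buffer the same way the explicit close() did.

```python
import os
import tempfile

# Hypothetical stand-in for the html_text discussed in the thread.
html_text = "<p>caf\u00e9, na\u00efve</p>\n"

def write_utf8(path, text):
    # Leaving the with-block calls f.close(), which flushes any
    # buffered output to disk, even if write() raises.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

# Hypothetical output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "page.html")
write_utf8(path, html_text)

with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(round_trip == html_text)  # True
```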
>>
>> Quick note: it may be important to write and read from the file using
>> binary mode "b". It's not so significant under Unix, but it is more
>> significant under Windows, because otherwise we may get some weird
>> results.
>
> But the file is utf-8 text, ISTM it should be written as text, not
> binary. Why do you recommend binary mode?

Hi Kent,

Oh! I just wrote that out because I had a vague and fuzzy feeling that
utf-8, having high-order binary bits, needed to be written carefully.
But let me examine that unexamined assumption...

No, you're right, we don't have to be so careful here, for carriage
returns and newlines have their standard interpretation under utf-8 too.
Ok, good to know. Thank you!
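A quick way to convince yourself of the newline point: encode some text that mixes line endings and look at the bytes. The sample string is made up; the point is only that CR and LF come out as the same single bytes they would be in ASCII, while the accented letters become two-byte sequences.

```python
# Hypothetical two-line text mixing Windows and Unix line endings.
text = "caf\u00e9\r\nna\u00efve\n"
encoded = text.encode("utf-8")

# CR (0x0D) and LF (0x0A) come out as the same single bytes as in ASCII;
# only the accented letters turn into two-byte sequences.
print(encoded)  # b'caf\xc3\xa9\r\nna\xc3\xafve\n'
print(encoded.count(b"\r\n"), encoded.count(b"\n"))  # 1 2
```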
I'd seen so many problems with Windows and binary data that I use 'rb'
out of habit whenever dealing with high-order binary data. For example,
the byte chr(26) (Ctrl-Z) causes Windows to prematurely truncate the
reading of a file in text mode:

http://mail.python.org/pipermail/python-list/2003-March/154659.html
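Here's a small sketch of why 'rb' is the safe habit for arbitrary bytes: it writes a byte string containing chr(26) and reads it back in binary mode. The file name is made up, and this doesn't reproduce the Windows truncation itself (I'd expect Python 3's own io layer to behave differently from the 2003-era report), so treat it only as a demonstration that binary mode hands back every byte untouched.

```python
import os
import tempfile

# 0x1A is chr(26), Ctrl-Z, the old DOS end-of-file marker.
data = b"before" + bytes([26]) + b"after"

# Hypothetical file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "ctrl_z.bin")
with open(path, "wb") as f:
    f.write(data)

# Binary mode returns every byte unchanged, including the 0x1A.
with open(path, "rb") as f:
    read_back = f.read()

print(len(read_back), read_back == data)  # 12 True
```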
On a closer reading of the utf-8 encoding standard, though, I see that
utf-8 never uses control characters (or any byte below 0x80) when
encoding code points above U+007F, so my caution is unfounded.
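That property is easy to check by brute force. The sketch below encodes every code point above U+007F (skipping the surrogates, which Python won't encode on their own by default) and confirms that none of the resulting bytes falls below 0x80. So a 0x1A byte can only show up in utf-8 output if the text really contains chr(26).

```python
def non_ascii_code_points():
    # Every code point from U+0080 to U+10FFFF, minus the surrogate range
    # U+D800..U+DFFF, which "utf-8" will not encode by default.
    for cp in range(0x80, 0x110000):
        if not 0xD800 <= cp <= 0xDFFF:
            yield chr(cp)

# min() over a bytes object iterates its integer byte values.
all_high = all(min(ch.encode("utf-8")) >= 0x80 for ch in non_ascii_code_points())

print(all_high)                  # True
print(chr(26).encode("utf-8"))   # b'\x1a'
print("\u00e9".encode("utf-8"))  # b'\xc3\xa9'
```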
_______________________________________________
Tutor maillist - Tutor python.org
http://mail.python.org/mailman/listinfo/tutor