Thread: Output as UTF8 not working?

2007-04-13 13:56:33
I am unable to create a UTF-8 encoded file using the
following:

<code start>
use strict;
use warnings;
use open OUT => ':utf8';
use Unicode::String qw(utf8 latin1);

my $output = 'D:\temp\test.txt';
my $content = "This is a test.\n";
my $u = latin1($content);
open(OUT,">:utf8",$output) or die "could not open $output.";
binmode (OUT,":utf8");
print OUT $u->utf8;
close(OUT);
<code end>

To verify the output is UTF-8 encoded, I use a DOS console
with "debug".
Here is a paste of my output:

<output start>
D:\Temp>debug test.txt
-d
1373:0100  54 68 69 73 20 69 73 20-61 20 74 65 73 74 2E 0D   This is a test..
<output end>

The output should look like this, specifically having the
UTF8 marker "EF BB BF":

<output start>
D:\Temp>debug UTF8.txt
-d
1373:0100  EF BB BF 54 68 69 73 20-69 73 20 61 20 74 65 73   ...This is a tes
1373:0110  74 2E 0D 0A 43 20 22 2D-2F 2F 57 33 34 00 62 13   t...C "-//W34.b.
<output end>

Am I missing something here?  Here's my environment:

<output start>
D:\Temp>perl -V
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=MSWin32, osvers=4.0, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DNO_HASH_SEED -DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='12.00.8804', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf  -libpath:"C:\Perl\lib\CORE"  -machine86'
    libpth=lib
    libs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib  uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib msvcrt.lib
    perllibs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib  uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf  -libpath:"C:\Perl\lib\CORE"  -machine86'


Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT
                        PERL_IMPLICIT_SYS PERL_MALLOC_WRAP
                        PL_OP_SLAB_ALLOC USE_ITHREADS USE_LARGE_FILES
                        USE_PERLIO USE_SITECUSTOMIZE
  Locally applied patches:
        ActivePerl Build 820 [274739]
        Iin_load_module moved for compatibility with build 806
        PerlEx support in CGI::Carp
        Less verbose ExtUtils::Install and Pod::Find
        Patch for CAN-2005-0448 from Debian with modifications
        Rearrange @INC so that 'site' is searched before 'perl'
        Partly reverted 24733 to preserve binary compatibility
        29930 win32.c typo in #define MULTIPLICITY
        29868 win32_async_check() can still loop indefinitely
        29690,29732 ANSIfy the PATH environment variable on Windows
        29689 Add error handling to win32_ansipath
        29675 Use short pathnames in $^X and @INC
        29607,29676 allow blib.pm to be used for testing Win32 module
        29605 Implement killpg() for MSWin32
        29598 cwd() to return the short pathname
        29597 let readdir() return the alternate filename
        29590 Don't destroy the Unicode system environment on Perl startup
        29528 get ext/Win32/Win32.xs to compile on cygwin
        29509,29510,29511 Move Win32: functions into Win32 module
        29483 Move Win32 from win32/ext/Win32 to ext/Win32
        29481 Makefile.PL changes to compile Win32.xs using cygwin
        28671 Define PERL_NO_DEV_RANDOM on Windows
        28376 Add error checks after execing PL_cshname or PL_sh_path
        28305 Pod::Html should not convert "foo" into ``foo''
        27833 Change anchor generation in Pod::Html for '=item item 2'
        27832,27847 fix Pod::Html::depod() for multi-line strings
        27719 Document the functions htmlify() and anchorify() in Pod::Html
        27619 Bug in Term::ReadKey being triggered by a bug in Term::ReadLine
        27549 Move DynaLoader.o into libperl.so
        27528 win32_pclose() error exit doesn't unlock mutex
        27527 win32_async_check() can loop indefinitely
        27515 ignore directories when searching @INC
        27359 Fix -d:Foo=bar syntax
        27210 Fix quote typo in c2ph
        27203 Allow compiling swigged C++ code
        27200 Make stat() on Windows handle trailing slashes correctly
        27133 Initialise lastparen in the regexp structure
        27061 L<PerlIO> and Pod::Html
        27034 Avoid "Prototype mismatch" warnings with autouse
        26970 Make Passive mode the default for Net::FTP
        26921 Avoid getprotobyname/number calls in IO::Socket::INET
        26897,26903 Make common IPPROTO_* constants always available
        26670 Make '-s' on the shebang line parse -foo=bar switches
        26637 Make Borland and MinGW happy with change 26379
        26536 INSTALLSCRIPT versus INSTALLDIRS
        26379 Fix alarm() for Windows 2003
        26087 Storable 0.1 compatibility
        25861 IO::File performace issue
        25084 long groups entry could cause memory exhaustion
        24699 ICMP_UNREACHABLE handling in Net::Ping
  Built under MSWin32
  Compiled at Jan 23 2007 15:57:46
  @INC:
    C:/Perl/site/lib
    C:/Perl/lib
    .

D:\Temp>
<output end>

John Poole




_______________________________________________
PDK mailing list
PDK@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

  
RE: Output as UTF8 not working?
2007-04-13 14:32:53
On Fri, 13 Apr 2007, John Poole wrote:
>
> I am unable to create a UTF-8 encoded file using the following:

This isn't really a PDK-related question. I would recommend using the
perl-unicode@perl.org mailing list for Unicode-related questions.

[...]
 
> The output should look like this, specifically having the UTF8 marker
> "EF BB BF":

Perl does not write BOMs (byte order marks) itself; you'll have to write
them yourself.  You may want to take a look at the File::BOM module:

    http://search.cpan.org/~mattlaw/File-BOM-0.14/lib/File/BOM.pm
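
For illustration, writing the BOM by hand might look roughly like the
sketch below. It does not use File::BOM; it prints the character U+FEFF
through a UTF-8 output layer, which encodes it as the bytes EF BB BF.
The file name test.txt and the lexical handle $fh are assumptions made
for this example and are not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical output file, chosen for this sketch only.
my $output = 'test.txt';

# Open with a UTF-8 encoding layer so characters are encoded on output.
open(my $fh, '>:encoding(UTF-8)', $output)
    or die "could not open $output: $!";

# U+FEFF is the byte order mark; the UTF-8 layer writes it as EF BB BF.
print {$fh} "\x{FEFF}";

# The actual content follows the BOM.
print {$fh} "This is a test.\n";

close($fh) or die "could not close $output: $!";
```

Dumping the resulting file the same way as above should then show
EF BB BF ahead of the text.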

Cheers,
-Jan

