Thread: UTF8 flag missing on stringified references

UTF8 flag missing on stringified references
user name
2007-10-17 08:00:45
    juerdlanova:~$ perl -MDevel::Peek -Mutf8 -e'
        { package Føø::Bær; sub new { bless {}, shift } }
        print Dump("".Føø::Bær->new)
    '
    SV = PV(0x8153ba8) at 0x81f86a8
      REFCNT = 1
      FLAGS = (PADTMP,POK,pPOK)
      PV = 0x815aff0 "F\303\270\303\270::B\303\246r=HASH(0x8152d90)"
      CUR = 27
      LEN = 28

I expected this to have:
      PV = 0x1234567 "F\303\270\303\270::B\303\246r=HASH(0x8152d9c)" [UTF8 "F\x{f8}\x{f8}::B\x{e6}r=HASH(0x8152d9c)"]
-- 
Met vriendelijke groet,  Kind regards,  Korajn salutojn,

  Juerd Waalboer:  Perl hacker  <#####juerd.nl>  <http://juerd.nl/sig>
  Convolution:     ICT solutions and consultancy
<salesconvolution.nl>

Re: UTF8 flag missing on stringified references
user name
2007-10-17 08:06:30
Could you file this to perlbug? It's obviously one of the things that
need to be remembered when cleaning up the compiler for accepting utf8
identifiers properly. (So we could put TODO tests and a meta-bug for
all the related issues)

On 17/10/2007, Juerd Waalboer <juerdconvolution.nl> wrote:
>     juerdlanova:~$ perl -MDevel::Peek -Mutf8 -e'
>         { package Føø::Bær; sub new { bless {}, shift } }
>         print Dump("".Føø::Bær->new)
>     '
>     SV = PV(0x8153ba8) at 0x81f86a8
>       REFCNT = 1
>       FLAGS = (PADTMP,POK,pPOK)
>       PV = 0x815aff0 "F\303\270\303\270::B\303\246r=HASH(0x8152d90)"
>       CUR = 27
>       LEN = 28
>
> I expected this to have:
>       PV = 0x1234567 "F\303\270\303\270::B\303\246r=HASH(0x8152d9c)" [UTF8 "F\x{f8}\x{f8}::B\x{e6}r=HASH(0x8152d9c)"]
> --
> Met vriendelijke groet,  Kind regards,  Korajn salutojn,
>
>   Juerd Waalboer:  Perl hacker  <#####juerd.nl>  <http://juerd.nl/sig>
>   Convolution:     ICT solutions and consultancy
<salesconvolution.nl>
>

Re: UTF8 flag missing on stringified references
user name
2007-10-17 08:38:06
Rafael Garcia-Suarez wrote 2007-10-17 15:06 (+0200):
> Could you file this to perlbug? It's obviously one of the things that
> need to be remembered when cleaning up the compiler for accepting utf8
> identifiers properly. (So we could put TODO tests and a meta-bug for
> all the related issues)

Sure
-- 
Met vriendelijke groet,  Kind regards,  Korajn salutojn,

  Juerd Waalboer:  Perl hacker  <#####juerd.nl>  <http://juerd.nl/sig>
  Convolution:     ICT solutions and consultancy
<salesconvolution.nl>

Re: UTF8 flag missing on stringified references
user name
2007-10-17 15:12:42
On Wed, Oct 17, 2007 at 03:00:45PM +0200, Juerd Waalboer wrote:
}     juerdlanova:~$ perl -MDevel::Peek -Mutf8 -e'
}         { package Føø::Bær; sub new { bless {}, shift } }
}         print Dump("".Føø::Bær->new)
}     '
}     SV = PV(0x8153ba8) at 0x81f86a8
}       REFCNT = 1
}       FLAGS = (PADTMP,POK,pPOK)
}       PV = 0x815aff0 "F\303\270\303\270::B\303\246r=HASH(0x8152d90)"
}       CUR = 27
}       LEN = 28

Aren't package names just C-strings? They aren't SVs and can't get
marked with an encoding. Your ae character isn't valid where you used
it.

-- 
Josh

Re: UTF8 flag missing on stringified references
user name
2007-10-17 15:19:19
josh wrote 2007-10-17 13:12 (-0700):
> Aren't package names just C-strings? They aren't SVs and can't get
> marked with an encoding. Your ae character isn't valid where you used
> it.

Identifiers are either ASCII or UTF-8, so the difference can be
detected rather easily. In fact, marking them all with SvUTF8 would
only hurt at a distance: upgrades that some legacy code can very
probably not handle.

It's not invalid; "use utf8" enables exactly this.
-- 
Met vriendelijke groet,  Kind regards,  Korajn salutojn,

  Juerd Waalboer:  Perl hacker  <#####juerd.nl>  <http://juerd.nl/sig>
  Convolution:     ICT solutions and consultancy
<salesconvolution.nl>
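
The point made in the message above, that identifiers are either ASCII
or UTF-8 and the two are easy to tell apart, can be sketched in C as a
scan for any byte with the high bit set. This is only an illustration
under that assumption: name_needs_utf8_flag is a hypothetical helper,
not a function from the perl source, and it does not validate that the
bytes are well-formed UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper (not part of perl): returns 1 if the name contains
 * any byte with the high bit set. Under the assumption that identifiers
 * are either ASCII or UTF-8, such a name would be the UTF-8 case and is
 * the only kind that would need the UTF8 flag. Pure-ASCII names return 0
 * and can stay unflagged, so legacy code never sees an upgrade. */
static int name_needs_utf8_flag(const char *name, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if ((unsigned char)name[i] & 0x80)
            return 1;
    }
    return 0;
}
```

Called on the bytes of "Foo::Bar" this returns 0; called on the 11
bytes of "F\303\270\303\270::B\303\246r" (the package name from the
original report) it returns 1.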

Re: UTF8 flag missing on stringified references
user name
2007-10-18 16:47:39
On Wed, Oct 17, 2007 at 10:19:19PM +0200, Juerd Waalboer wrote:
} josh wrote 2007-10-17 13:12 (-0700):
} > Aren't package names just C-strings? They aren't SVs and can't get
} > marked with an encoding. Your ae character isn't valid where you
} > used it.
} 
} Identifiers are either ASCII or UTF-8, so the difference can be
} detected rather easily. In fact, marking them all with SvUTF8 would
} only hurt at a distance: upgrades that some legacy code can very
} probably not handle.

Ok, what I was remembering is from hv.h, where stash names are stored
in a character array. What I didn't know was that even though a name is
stored as a byte sequence, the very last byte holds some flags which
also tell whether the byte sequence is utf8 encoded or not. Oops.

HEK_UTF8( HvNAME_HEK( hv ) ) ;; is utf8?
HEK_LEN( HvNAME_HEK( hv ) ) ;; how many bytes long?
HvNAME( hv ) ;; the name

} It's not invalid, "use utf8" enables exactly this.

Yep, you're right. I goofed.

-- 
Josh
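
The layout described in the message above (name bytes, then a trailing
byte of flags that records whether the bytes are utf8) can be modeled
with a small standalone C sketch. This is a simplified toy, not perl's
hv.h: struct toy_hek, toy_hek_new, toy_hek_is_utf8 and the TOY_HEK_UTF8
bit value are all names and values assumed here for illustration. The
toy places the flags byte right after the terminating NUL of the name.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed flag bit for this toy model only. */
#define TOY_HEK_UTF8 0x01

/* Toy model of a hash-entry key: the name bytes are followed by a NUL
 * and then by one extra byte that carries the flags. */
struct toy_hek {
    unsigned int hash;  /* precomputed hash, unused in this sketch */
    int          len;   /* name length in bytes, excluding the NUL */
    char         key[]; /* name bytes, NUL, flags byte */
};

/* Allocate a toy key; room is reserved for the NUL and the flags byte. */
static struct toy_hek *toy_hek_new(const char *name, int len, int is_utf8)
{
    struct toy_hek *hek = malloc(sizeof *hek + (size_t)len + 2);

    if (hek == NULL)
        return NULL;
    hek->hash = 0;
    hek->len = len;
    memcpy(hek->key, name, (size_t)len);
    hek->key[len] = '\0';
    hek->key[len + 1] = (char)(is_utf8 ? TOY_HEK_UTF8 : 0);
    return hek;
}

/* Read the flags byte stored just past the NUL terminator. */
static int toy_hek_is_utf8(const struct toy_hek *hek)
{
    return ((unsigned char)hek->key[hek->len + 1] & TOY_HEK_UTF8) != 0;
}
```

Because the flag lives outside the len bytes of the name, code that
treats key as a plain C string still works, which is presumably why the
name looks like "just a C-string" at first glance.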