Thread: printing wchar_t*




printing wchar_t*
2006-04-14 20:27:20
On Fri, Apr 14, 2006 at 12:16:36PM -0700, Jim Blandy wrote:
> The command line and MI already use the ISO C syntax for conveying
> values to the user/consumer.  I'm just saying we should expand our use
> of the syntax we already use.

I don't agree.

Saying "we use ISO C syntax for conveying data" is fairly inaccurate.
We are inconsistent.  Some things are escaped in a C-like fashion.
Other things are escaped in other fashions, with their own quoting
rules.  This is true in both directions, for user input and for output.

Let's consider strings in particular.  Strings are printed using
LA_PRINT_STRING.  As the name implies, the quoting done is adjusted
to match the source language convention.  Asking an FE to grok that
is just impractical.  In data intended for CLI users, we can
prettyprint things any way we want; in data intended for anything
more machinelike, I recommend we define a syntax and stick with it.

Personally, I'd just use UTF-8.  If you want GDB's output, expect it to
be UTF-8.  The MI layer is a "transport", and can add its own necessary
escaping (of quote marks, mostly).  Alternatively, make GDB output in
the current locale's character set.
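
A minimal sketch of the "transport" idea, in Python for brevity: the payload stays UTF-8 text, and the wrapper escapes only what the quoting itself needs (backslash, double quote, newline). The names `mi_quote` and `mi_unquote` are hypothetical and are not GDB functions; the escape set is an assumption for illustration, not the MI specification.

```python
def mi_quote(text: str) -> str:
    """Wrap text in double quotes, escaping only what the transport needs."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        else:
            # Everything else, including non-ASCII, passes through unchanged
            # and is sent as UTF-8 when the record is written out.
            out.append(ch)
    return '"' + "".join(out) + '"'


def mi_unquote(quoted: str) -> str:
    """Inverse of mi_quote: strip the quotes and undo the three escapes."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("not a quoted string")
    body = quoted[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body):
                raise ValueError("dangling backslash")
            nxt = body[i + 1]
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With this shape the consumer undoes one fixed layer of escaping and then has plain text, regardless of the source language of the program being debugged.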

So, if we print a wchar_t string as a string, and the user has conveyed
to us that their wchar_t strings are Unicode code points, then we can
convert that to the appropriate multibyte string on output using the
host character set.

Picked a host character set that can't represent some target characters?
The CLI should fall back to pretty escape sequences.  I don't know what
the MI should do, but probably the answer is unimportant.
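
The fallback described above can be sketched as follows. This is an illustration in Python, not GDB code: `to_host` is a hypothetical name, and the `\uXXXX` / `\UXXXXXXXX` escape format is an assumption about what "pretty escape sequences" might look like.

```python
def to_host(code_points, host_charset):
    """Render Unicode code points for a host charset, escaping what it lacks."""
    out = []
    for cp in code_points:
        ch = chr(cp)
        try:
            # Probe whether the host character set can represent this character.
            ch.encode(host_charset)
            out.append(ch)
        except UnicodeEncodeError:
            # Fall back to a readable escape instead of losing the character.
            if cp <= 0xFFFF:
                out.append("\\u%04x" % cp)
            else:
                out.append("\\U%08x" % cp)
    return "".join(out)
```

For example, `to_host([0x63, 0x61, 0x66, 0xE9], "ascii")` yields `caf\u00e9`, while the same input with `"latin-1"` yields the character itself.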

> My point is, MI consumers are already parsing ISO C strings.  They
> just need to parse more of them.

IMO, we need to make them parse less of them.

Everywhere the MI consumer needs to parse something which originated
as GDB CLI output, things go bad.  For instance, MI consumers may get
confused by the automatic limit imposed by "set print elements", which
truncates strings.

After "set print elements 2":

(gdb) interpreter-exec mi "-var-create - * \"(char *)&__libc_version\""
^done,name="var1",numchild="1",type="char *"
(gdb) 
(gdb) interpreter-exec mi "-var-evaluate-expression var1"
^done,value="0x102a80 \"2.\"..."
(gdb) 

Not very nice of us, was that?
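
To make the burden on the consumer concrete, here is a rough Python sketch of what a front end has to do with the value above once the MI-level quoting is removed, i.e. with the text `0x102a80 "2."...`. The function name and the regular expression are hypothetical; the pattern covers only this one shape (pointer, quoted string, optional trailing dots) and none of the other forms CLI-style output can take.

```python
import re

# Pointer in hex, a space, a double-quoted string with backslash escapes,
# and an optional "..." marker placed outside the closing quote.
_VALUE_RE = re.compile(r'^(0x[0-9a-fA-F]+) "((?:[^"\\]|\\.)*)"(\.\.\.)?$')


def parse_char_pointer_value(value):
    """Return (address, text, truncated), or None if the shape is unexpected."""
    m = _VALUE_RE.match(value)
    if m is None:
        return None
    address = int(m.group(1), 16)
    text = m.group(2)                  # still carries any C-style escapes
    truncated = m.group(3) is not None
    return address, text, truncated
```

The consumer only learns that the string was cut short by noticing the trailing dots, which is exactly the kind of CLI-derived convention a defined machine syntax would avoid.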

> There is no provision in ISO C for variable-size wchar_t encodings.
> The portion of the standard I referred to says that wchar_t "...is an
> integer type whose range of values can represent distinct codes for
> all members of the largest extended character set specified among the
> supported locales".

(A) GDB supports languages other than C.

(B) While I am inclined to agree with you about the language of ISO C,
we don't get to ignore the reality of platforms with a 16-bit wchar_t
which store UTF-16 in it.
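
As an illustration of point (B), a 16-bit wchar_t array holding UTF-16 cannot be treated as one element per character: code points above U+FFFF occupy two units (a surrogate pair). A small Python sketch of the decoding a debugger or front end would need, with a hypothetical function name:

```python
def utf16_units_to_code_points(units):
    """Combine 16-bit UTF-16 code units into Unicode code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # A high surrogate followed by a low surrogate encodes one
            # code point in the range U+10000..U+10FFFF.
            low = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP character, or a lone surrogate passed through unchanged.
            out.append(u)
            i += 1
    return out
```

For instance, the two units `0xD83D, 0xDE00` decode to the single code point U+1F600.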

-- 
Daniel Jacobowitz
CodeSourcery
printing wchar_t*
2006-04-14 22:18:50
As far as conveying strings accurately to GUI's via MI is concerned:

It's fine to improve the way MI conveys data to the front end.  It
seems to me we still need to do things like repetition elimination and
length limiting, but that syntax should certainly be designed to make
the front ends' life easier.
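
One way to picture repetition elimination and length limiting in a machine-friendly form is as structured data rather than text markers. The Python sketch below is purely illustrative: the function name, the (value, count) pairs, and the explicit truncation flag are assumptions, not an existing or proposed MI syntax.

```python
def summarize(elements, limit):
    """Return (runs, truncated): run-length pairs for at most `limit` elements."""
    shown = elements[:limit]
    runs = []
    for value in shown:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1           # extend the current run
        else:
            runs.append([value, 1])    # start a new run
    truncated = len(elements) > limit
    return [tuple(run) for run in runs], truncated
```

A front end receiving `([(65, 3), (66, 1)], False)` knows the exact contents and that nothing was cut off, without parsing any text conventions.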

I'm not so sure about GDB doing character set conversion.  I think I'd
rather see GDB concentrate on accurately and safely conveying target
code points to the front end, and make the front end responsible for
displaying it.  If the front end hasn't asked GDB to "print" the value
in GDB's own way, then the front end has accepted responsibility for
presentation, it seems to me.
printing wchar_t*
2006-04-15 07:14:25
> Date: Fri, 14 Apr 2006 15:18:50 -0700
> From: "Jim Blandy" <jimbred-bean.com>
> 
> As far as conveying strings accurately to GUI's via MI is concerned:
> 
> It's fine to improve the way MI conveys data to the front end.  It
> seems to me we still need to do things like repetition elimination and
> length limiting, but that syntax should certainly be designed to make
> the front ends' life easier.

Do you agree that the array feature suggested by Daniel is a step in
the right direction?

> I'm not so sure about GDB doing character set conversion.  I think I'd
> rather see GDB concentrate on accurately and safely conveying target
> code points to the front end, and make the front end responsible for
> displaying it.

I'd rather see GDB offering something in this area as well, but until
we have a volunteer for this job, this disagreement is academic.