On Fri, Apr 14, 2006 at 12:16:36PM -0700, Jim Blandy wrote:
> The command line and MI already use the ISO C syntax for conveying
> values to the user/consumer. I'm just saying we should expand our use
> of the syntax we already use.

I don't agree. Saying "we use ISO C syntax for conveying data" is
fairly inaccurate. We are inconsistent. Some things are escaped in a
C-like fashion. Other things are escaped in other fashions, with their
own quoting rules. This is true in both directions, for user input and
for output.

Let's consider strings in particular. Strings are printed using
LA_PRINT_STRING. As the name implies, the quoting done is adjusted to
match the source language convention. Asking an FE to grok that is
just impractical. In data intended for CLI users, we can prettyprint
things any way we want; in data intended for anything more machinelike,
I recommend we define a syntax and stick with it.

Personally, I'd just use UTF-8. If you want GDB's output, expect it to
be UTF-8. The MI layer is a "transport", and can add its own necessary
escaping (of quote marks, mostly). Alternatively, make GDB output in
the current locale's character set.
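
To make the "transport" idea concrete, here is a minimal sketch of a
quoting layer that passes UTF-8 text through untouched and escapes only
what the framing needs. It is not GDB's actual MI quoting code; the
names mi_quote and mi_unquote are hypothetical, and the choice to also
escape newlines is an assumption made so that a record stays on one line.

```python
def mi_quote(text: str) -> str:
    """Wrap text in double quotes, escaping only what the transport needs.

    Hypothetical helper, not part of GDB. Non-ASCII characters are left
    alone on the assumption that the stream is UTF-8.
    """
    out = []
    for ch in text:
        if ch in ('"', '\\'):
            out.append('\\' + ch)   # protect the framing characters
        elif ch == '\n':
            out.append('\\n')       # assumed: keep each record on one line
        else:
            out.append(ch)          # everything else passes through as-is
    return '"' + ''.join(out) + '"'


def mi_unquote(quoted: str) -> str:
    """Inverse of mi_quote, as a front end might implement it."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("not a quoted string")
    body = quoted[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '\\' and i + 1 < len(body):
            i += 1
            nxt = body[i]
            out.append('\n' if nxt == 'n' else nxt)
        else:
            out.append(ch)
        i += 1
    return ''.join(out)
```

With a scheme like this the consumer undoes exactly one layer of
escaping and gets the text back, whatever the source language was.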
So, if we print a wchar_t string as a string, and the user has conveyed
to us that their wchar_t strings are Unicode code points, then we can
convert that to the appropriate multibyte string on output using the
host character set.

Picked a host character set that can't represent some target
characters? The CLI should fall back to pretty escape sequences. I
don't know what the MI should do, but the answer is probably
unimportant.
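
As a rough illustration of that conversion and fallback, the sketch
below turns a sequence of code points into a multibyte string in a given
host character set and uses escape sequences for anything the charset
cannot represent. The function name render_wide_string is hypothetical,
and the escape style is simply what Python's "backslashreplace" error
handler produces, not a format GDB defines.

```python
def render_wide_string(code_points, host_charset):
    """Encode Unicode code points in host_charset.

    Characters the host charset cannot represent are written as
    backslash escapes instead of causing an error.
    """
    text = ''.join(chr(cp) for cp in code_points)
    return text.encode(host_charset, errors='backslashreplace')


# 'h', e-acute, euro sign
sample = [0x68, 0xE9, 0x20AC]
as_utf8 = render_wide_string(sample, 'utf-8')     # all three representable
as_latin1 = render_wide_string(sample, 'latin-1')  # euro sign is escaped
as_ascii = render_wide_string(sample, 'ascii')     # both non-ASCII escaped
```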
> My point is, MI consumers are already parsing ISO C strings. They
> just need to parse more of them.

IMO, we need to make them parse less of them.

Everywhere the MI consumer needs to parse something which originated
as GDB CLI output, things go bad. For instance, MI consumers may get
confused by the automatic limit imposed by "set print elements", which
truncates strings.

After "set print elements 2":

(gdb) interpreter-exec mi "-var-create - * \"(char *)&__libc_version\""
^done,name="var1",numchild="1",type="char *"
(gdb)
(gdb) interpreter-exec mi "-var-evaluate-expression var1"
^done,value="0x102a80 \"2.\"..."
(gdb)

Not very nice of us, was that?
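
To show what this pushes onto the consumer, here is a sketch of the
second-level parse a front end would need after it has already undone
the MI quoting. It assumes the value has exactly the shape seen in this
session (address, quoted text, optional trailing "..."); the names
VALUE_RE and parse_char_pointer_value are hypothetical, and other
languages or print settings would need different rules.

```python
import re

# Assumed shape: 0xADDR "text" followed by an optional ... marker.
VALUE_RE = re.compile(r'^(0x[0-9a-fA-F]+) "((?:[^"\\]|\\.)*)"(\.\.\.)?$')


def parse_char_pointer_value(value):
    """Split a CLI-formatted char * value into its parts.

    Returns (address, text, truncated), or None if the value does not
    match the assumed shape. The text is returned still escaped in
    whatever convention the source language used.
    """
    m = VALUE_RE.match(value)
    if m is None:
        return None
    address, text, ellipsis = m.groups()
    return int(address, 16), text, ellipsis is not None


# The value from the session above, after removing the MI-level quoting.
parsed = parse_char_pointer_value('0x102a80 "2."...')
```

Even when this works, the front end only learns that the string was cut
short; it cannot recover the rest without changing the print limit.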
> There is no provision in ISO C for variable-size wchar_t encodings.
> The portion of the standard I referred to says that wchar_t "...is an
> integer type whose range of values can represent distinct codes for
> all members of the largest extended character set specified among the
> supported locales".

(A) GDB supports languages other than C.

(B) While I am inclined to agree with you about the language of ISO C,
we don't get to ignore the reality of platforms with a 16-bit wchar_t
which store UTF-16 in it.
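
For a concrete picture of why that matters, the sketch below decodes an
array of 16-bit units as UTF-16, where one character outside the Basic
Multilingual Plane occupies two array elements. The function name
utf16_units_to_code_points is hypothetical, and passing unpaired
surrogates through unchanged is an assumption made for illustration.

```python
def utf16_units_to_code_points(units):
    """Combine UTF-16 surrogate pairs in a list of 16-bit units.

    Unpaired surrogates are passed through unchanged (an assumption; a
    real consumer might prefer to flag them).
    """
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            low = units[i + 1]
            code_points.append(
                0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2   # one character consumed two array elements
        else:
            code_points.append(unit)
            i += 1
    return code_points


# 'A' followed by U+1D11E, which needs a surrogate pair in UTF-16.
decoded = utf16_units_to_code_points([0x41, 0xD834, 0xDD1E])
```

Three array elements yield two characters here, so element count and
character count diverge on such a platform.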
--
Daniel Jacobowitz
CodeSourcery