Thread: Problems with utf-8




Problems with utf-8
user name
2006-03-21 17:19:23
Hi Julian,
After migrating from SuSE 9.0 to 9.3 I decided to change all my projects and
source files from iso8859-1 to utf-8 using recode. Then I changed the project
and source encoding in DialogBlocks (currently 2.12 Gtk unicode) and everything
works fine, for example having "German Umlauts" in string literals for
buttons or messages. GCC compiles those sources even if wxUSE_UNICODE isn't
set. (BTW, wxGTK-2.6.2 doesn't compile when configured with "--with-odbc" in
unicode mode, but that's another story).

When I switch to VC6 on a WinXP box to compile the same sources, the umlauts in
the string literals are destroyed in the editor and in the compiled program,
too. All strings use the "_()" or "_T()" macro. Searching the web, MSDN and the
VC6 help for hours didn't show me how to do it right.

As I don't believe that you would embed an option for generating Unicode
sources if they didn't work on Windows, the only conclusion is that I'm doing
something wrong. But I'm running out of ideas.

Do you have any ideas where to look? They would be appreciated.
Thanks
Thomas

--
Dipl.-Ing. Thomas Zehbe
INGENION GmbH
Kuhweide 6
31552 Apelern
Fon: 05043 / 40 57 904
Fax: 05043 / 40 57 907
Problems with utf-8
user name
2006-03-22 15:34:02
Hi Thomas,

At 17:19 21/03/2006, you wrote:
>Hi Julian,
>After migrating from SuSE 9.0 to 9.3 I decided to change all my projects and
>source files from iso8859-1 to utf-8 using recode. Then I changed the project
>and source encoding in DialogBlocks (currently 2.12 Gtk unicode) and everything
>works fine, for example having "German Umlauts" in string literals for
>buttons or messages. GCC compiles those sources even if wxUSE_UNICODE isn't
>set. (BTW, wxGTK-2.6.2 doesn't compile when configured with "--with-odbc" in
>unicode mode, but that's another story).
>
>When I switch to VC6 on a WinXP box to compile the same sources, the umlauts in
>the string literals are destroyed in the editor and in the compiled program,
>too. All strings use the "_()" or "_T()" macro. Searching the web, MSDN and the
>VC6 help for hours didn't show me how to do it right.
>
>As I don't believe that you would embed an option for generating Unicode
>sources if they didn't work on Windows, the only conclusion is that I'm doing
>something wrong. But I'm running out of ideas.

Hm. I created a new project from scratch with DB 2.12, set the project and
source encodings to utf-8, and created a dialog with a button label containing
an e-acute and an a-umlaut (pasted from Word). The dialog previews OK, and the
label appears correctly in the source file. However when I compile it as a
Unicode Debug project in VC++ 6, the dialog shows the label with the wrong
characters, as if it's trying to display a Unicode string thinking it's in
ISO-8859-1.

AFAIK what's going on is that the compiler doesn't know the file
is encoded in UTF-8, so the characters are being converted from ISO-8859-1
or similar to Unicode, when they're already in Unicode.
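
To make the suspected double conversion concrete, here is a minimal C++
sketch. It is not taken from DialogBlocks, wxWidgets or either compiler; the
function names decode_as_latin1 and decode_as_utf8 are made up for
illustration, and the decoder only handles 1- and 2-byte UTF-8 sequences,
which is enough for an a-umlaut. It assumes a C++11 compiler, so it would not
build with VC++ 6 itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Widen every byte as if it were one ISO-8859-1 character.
// This mimics a compiler that does not know the source file is UTF-8.
std::u32string decode_as_latin1(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char32_t>(b));
    }
    return out;
}

// Decode UTF-8 properly (illustrative subset: 1- and 2-byte sequences only;
// anything else becomes U+FFFD, the replacement character).
std::u32string decode_as_utf8(const std::string& bytes) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 < 0x80) {
            out.push_back(static_cast<char32_t>(b0));
        } else if ((b0 & 0xE0) == 0xC0 && i + 1 < bytes.size()) {
            const unsigned char b1 = static_cast<unsigned char>(bytes[i + 1]);
            out.push_back(static_cast<char32_t>(((b0 & 0x1F) << 6) | (b1 & 0x3F)));
            ++i;  // the continuation byte has been consumed
        } else {
            out.push_back(static_cast<char32_t>(0xFFFD));
        }
    }
    return out;
}
```

An a-umlaut is stored in a UTF-8 file as the two bytes 0xC3 0xA4. Passing
"\xC3\xA4" to decode_as_latin1 gives the two code points U+00C3 U+00A4 (an
A-tilde followed by a currency sign), which matches the kind of wrong
characters seen in the dialog, while decode_as_utf8 gives the single code
point U+00E4.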

It's normally recommended that C++ source files only contain ASCII, and
catalogs are used to translate from ASCII strings to strings
in whatever encoding you choose. When I switch to XRC mode and regenerate
the code, the dialog does come up with the right text, because it's loaded
from an XRC that is marked with UTF-8.
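
As a rough illustration of the ASCII-only recommendation, the sketch below
spells a label containing non-ASCII characters using escapes only, so the
source file's encoding no longer matters. The label text and the names
kWideLabel and kUtf8Label are hypothetical, and whether a given compiler
(VC++ 6 in particular) accepts \u universal character names is an assumption
to check, not something confirmed in this thread.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// Hypothetical label text: "Gr", u-umlaut, sharp s, "e".

// Wide literal: universal character names give the code points directly.
const wchar_t* const kWideLabel = L"Gr\u00FC\u00DFe";

// Narrow literal: the UTF-8 bytes are written out explicitly. The literal is
// split before the final "e" so the hex escape \x9F does not swallow the
// following 'e' as another hex digit.
const char* const kUtf8Label = "Gr\xC3\xBC\xC3\x9F" "e";
```

The wide form suits a Unicode build where strings are wchar_t based; the
narrow form only displays correctly if whatever consumes the bytes treats
them as UTF-8.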

In theory we should be able to allow UTF-8 source but I don't know how
to get the compiler to recognize the encoding.

Here's an excerpt from a relevant newsgroup article:

====

> If I write in a source file
> wchar_t* st = "something"; what encoding would it be stored as? And
> what about wchar_t* st = L"something"; UTF-8?

Let me quote a post by Ulrich Eckhardt (from
microsoft.public.windowsce.embedded.vc); here is the complete thread
so you can get a better overview of the problem I asked about -
http://tinyurl.com/dbhyj:

"It is invalid C or C++ to embed these characters*** into sourcecode.
You are relying on compiler-specific support.
That said, there is a #pragma to tell MSC which codepage you're using."

====

Regards,

Julian



Problems with utf-8, configuration settings in DB
user name
2006-03-30 17:23:00
Hi Julian,
Here is some information on what I found out while solving my utf8 problem,
plus a question about DB.
A brief look at the XRC approach and a small test with it didn't make me
happy, so I decided to go another way.

First of all I tried to get a "with-odbc with-unicode" version (2.6.2)
compiled on my box to get a comparable status quo on both machines.
After looking deep into the system I found that I wasn't working with the
builtin iodbc as I had assumed, but with a similarly old installed version.

Trying to compile with-odbc=builtin failed, too, as this builtin version has
no Unicode support, or not the right one.

So I installed libiodbc-3.52.2 and got my odbc-unicode version up and
running.
After compiling and linking my app in unicode mode I got the same destroyed
string literals as on Windows when I used utf8 encoding for the source files!
So switching all the sources back to iso8859-1 at least works fine on both
systems.

I assume that in a non-unicode build Gtk displays the utf8-encoded literals
correctly just because my environment is set to de_DE.UTF-8.

Now the question. During my tests I tried to set an additional preprocessor
flag in the configurations dialog. These settings seem to be written
to /dev/null, as they have no effect in my case, even when I add
a nonexistent one. Is this OK? Did I miss something?
I checked the auto settings and found:
-fno-rtti -fno-pcc-struct-return -fstrict-aliasing -Wall -D__WXMSW__
-D__GNUWIN32__ -D__WIN95__ -E
and I am a bit surprised by the Windows-related defines in my GCC configs.
Did I miss something or is it a tiny bug?

BTW. Now I'll download 2.13. Thanks for the "find patch" and all your other
efforts on improving DialogBlocks on a day-to-day basis.
Best Regards
Thomas

Problems with utf-8, configuration settings in DB
user name
2006-03-31 11:28:40
Hi Thomas,

At 18:23 30/03/2006, you wrote:
>First of all I tried to get a "with-odbc with-unicode" version (2.6.2)
>compiled on my box to get a comparable status quo on both machines.
>After looking deep into the system I found that I wasn't working with the
>builtin iodbc as I had assumed, but with a similarly old installed version.
>
>Trying to compile with-odbc=builtin failed, too, as this builtin version has
>no Unicode support, or not the right one.
>
>So I installed libiodbc-3.52.2 and got my odbc-unicode version up and
>running.

Aha, interesting.

>After compiling and linking my app in unicode mode I got the same destroyed
>string literals as on Windows when I used utf8 encoding for the source files!
>So switching all the sources back to iso8859-1 at least works fine on both
>systems.

Right.

>I assume that in a non-unicode build Gtk displays the utf8-encoded literals
>correctly just because my environment is set to de_DE.UTF-8.

I guess so. Anyhow I'm glad you got things working satisfactorily in the
end.

>Now the question. During my tests I tried to set an additional preprocessor
>flag in the configurations dialog. These settings seem to be written
>to /dev/null, as they have no effect in my case, even when I add
>a nonexistent one. Is this OK? Did I miss something?
>I checked the auto settings and found:
>-fno-rtti -fno-pcc-struct-return -fstrict-aliasing -Wall -D__WXMSW__
>-D__GNUWIN32__ -D__WIN95__ -E
>and I am a bit surprised by the Windows-related defines in my GCC configs.
>Did I miss something or is it a tiny bug?

In fact these aren't used, since wx-config is used instead, but I
agree they shouldn't be displayed. Probably I need to check
my inheritance hierarchy.

>BTW. Now I'll download 2.13. Thanks for the "find patch" and all your other
>efforts on improving DialogBlocks on a day-to-day basis.

My pleasure!

Thanks,

Julian

