Thread: Problems with file names in UTF-8 on Windows




Problems with file names in UTF-8 on Windows
user name
2006-08-09 06:25:04
Hello.

First of all, I would like to thank you for libxml. It has turned out
to be a useful and convenient library.

Now to business.

First, the handling of UTF-8 file names on Windows introduced in
version 2.6.24 has led to the following problems:

1. Upgrading the library to the new version breaks programs that use
   file names in the native encoding; all such programs are now forced
   to convert file names to UTF-8.
2. The library has become incompatible with Windows 95/98/ME, because
   the functions _wfopen and _wstat rely on features that are not
   implemented by default in these versions of the OS (bug #346367).

It seems reasonable to process file names in the native encoding by
default, and to set the conversion mode from UTF-8 obviously.
In the attachment there is a corrected variant of xmlIO.c. The name
conversion mode is set with the function xmlSetFileNameMode.

However, in the proposed implementation UTF-8 names can be used only
on Windows NT/2000/XP/... For Windows 9x, a reverse conversion from
Unicode to the native encoding would have to be added.

Second, it would be quite good to add to the library a group of simple
exported functions for read access to the fields of structures.
This would simplify describing the API in other languages and would
make it unnecessary to recompile programs after possible changes to
the library structures.

An example implementation of such functions is in the same archive
(files wrappers.*).
When string fields are read, no copy is made, to reduce call overhead.
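
To illustrate the kind of read accessor meant here, a minimal sketch
follows. The structure demoNode and the function demoNodeGetName are
hypothetical stand-ins chosen for this example; they are not libxml2 API
and not the contents of wrappers.*. The accessor returns the field itself
rather than a copy, so the caller must not free or modify the string.

```c
#include <stddef.h>

/* Hypothetical stand-in for a public library structure. */
typedef struct demoNode {
    int type;          /* node kind */
    const char *name;  /* owned by the node, not by the caller */
} demoNode;

/*
 * Read accessor: returns the name field without copying it.
 * Returns NULL when no node is given.
 */
const char *demoNodeGetName(const demoNode *node)
{
    return (node != NULL) ? node->name : NULL;
}
```

A binding for another language would then only need the function
prototype and would not have to reproduce the structure layout.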

All changes are made on the basis of library version 2.6.26.


With best regards, Emelyanov Alexey.



_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xmlgnome.org
http://mail.gnome.org/mailman/listinfo/xml
Problems with file names in UTF-8 on Windows
user name
2006-08-09 08:46:18
On Wed, Aug 09, 2006 at 10:25:04AM +0400, Emelyanov Alexey wrote:
> Hello.
>
> First of all, I would like to thank you for libxml. It has turned out
> to be a useful and convenient library.
>
> Now to business.
>
> First, the handling of UTF-8 file names on Windows introduced in
> version 2.6.24 has led to the following problems:
>
> 1. Upgrading the library to the new version breaks programs that use
>    file names in the native encoding; all such programs are now forced
>    to convert file names to UTF-8.
> 2. The library has become incompatible with Windows 95/98/ME, because
>    the functions _wfopen and _wstat rely on features that are not
>    implemented by default in these versions of the OS (bug #346367).
>
> It seems reasonable to process file names in the native encoding by
> default, and to set the conversion mode from UTF-8 obviously.

  I don't think it's obvious. Roland Schwingel who provided that patch
argued differently. I don't use Windows, I have no way to test or check,
I have to rely on the expertise of people on the mailing-list in that area.

> In the attachment there is a corrected variant of xmlIO.c. The name
> conversion mode is set with the function xmlSetFileNameMode.

  I'm sorry, send contextual patches, not new files, even worse a bunch
of files. You must send a patch which shows exactly what you modified.
Also you should send a clear explanation of the modifications and why you
changed things. "a corrected variant" is not acceptable for review, sorry.
  Moreover I expect all those changes/diffs to be guarded by #ifdef WIN32
or something similar at the code level, because obviously this should not
affect non-Windows code in any way.
  Last but not least xmlSetFileNameMode() is not acceptable: this means
having to introduce a global variable in the library, and I'm trying to
get rid of them. If you want a different mode of operation for older
Windows versions, find a way to detect that version at compile time or
at runtime, but adding a new API which makes no sense on other platforms
and introduces a global variable is definitely not okay.
  
> However, in the proposed implementation UTF-8 names can be used only
> on Windows NT/2000/XP/... For Windows 9x, a reverse conversion from
> Unicode to the native encoding would have to be added.

  I do not understand clearly what you mean here, is that what you suggest
to do, what your changes should do, or something else ?

> Second, it would be quite good to add to the library a group of simple
> exported functions for read access to the fields of structures.
> This would simplify describing the API in other languages and would
> make it unnecessary to recompile programs after possible changes to
> the library structures.
>
> An example implementation of such functions is in the same archive
> (files wrappers.*).
> When string fields are read, no copy is made, to reduce call overhead.

  Okay, that's not acceptable. Adding a new header involves a lot of work,
not just merely adding a file to the subdir. I think it's frivolous to
add one for the reason exposed. Moreover I disagree with adding accessors
on technical grounds:
   - libxml2 exports a lot of existing structures, containing a lot of
     fields
   - if we start adding accessors, this means a lot of new functions
   - this won't help for the API since existing code uses those structures
   - adding new functions is costly *at runtime*

 To be clear about the last point, libxml2 already has more than 1500
exported entry points. For position independent code in shared libraries
there is a runtime cost of relocating all exported symbols; if we start
adding accessors that's so much more work to be done, so I'm against it
unless it's for new functionalities and it's clear that the number of
entry points is low.

 So overall, I'm sorry I cannot work on your code submission, it's really
too far off from the normal review process, not in line with libxml2
development rules. I suggest you revisit the issue based on my feedback,

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
veillardredhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
Problems with file names in UTF-8 on Windows
user name
2006-08-09 09:54:12

Hi...

The utf-8 support for Windows was my idea and my patch, so I feel responsible for the problems.

> On Wed, Aug 09, 2006 at 10:25:04AM +0400, Emelyanov Alexey wrote:
> > 1. Upgrading the library to the new version breaks programs that use
> >    file names in the native encoding; all such programs are now forced
> >    to convert file names to UTF-8.
UTF-8 is IMHO the best choice to handle nowadays, but well I see the
problem... I think I will modify my patch to have a fallback mode if
the UTF-8 file is not present/accessible.

> > 2. The library has become incompatible with Windows 95/98/ME, because
> >    the functions _wfopen and _wstat rely on features that are not
> >    implemented by default in these versions of the OS (bug #346367).
Ok.. I will address that, too. Did not know that there is a bug report.


At present I am awfully busy, but I hope I can supply my revised patch (based
on libxml 2.6.26) by beginning of next week.

I hope this will solve all problems with win9x and non utf-8 encoding without
adding new api. Would this be ok for everyone?

Roland
Problems with file names in UTF-8 on Windows
user name
2006-08-09 10:04:16
On Wed, Aug 09, 2006 at 11:54:12AM +0200, Roland Schwingel wrote:
> Hi...
>
> The utf-8 support for Windows was my idea and my patch, so I feel
> responsible for the problems.
>
> > On Wed, Aug 09, 2006 at 10:25:04AM +0400, Emelyanov Alexey wrote:
> > > 1. Upgrading the library to the new version breaks programs that use
> > >    file names in the native encoding; all such programs are now
> > >    forced to convert file names to UTF-8.
> UTF-8 is IMHO the best choice to handle nowadays, but well I see the
> problem... I think I will modify my patch to have a fallback mode if
> the UTF-8 file is not present/accessible.
>
> > > 2. The library has become incompatible with Windows 95/98/ME, because
> > >    the functions _wfopen and _wstat rely on features that are not
> > >    implemented by default in these versions of the OS (bug #346367).
> Ok.. I will address that, too. Did not know that there is a bug report.
>
> At present I am awfully busy, but I hope I can supply my revised patch
> (based on libxml 2.6.26) by beginning of next week.
>
> I hope this will solve all problems with win9x and non utf-8 encoding
> without adding new api. Would this be ok for everyone?

  That sounds excellent to me. I didn't expect a new release within a
couple of weeks so even if it takes a bit of time it is not a big deal,

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillardredhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
Problems with file names in UTF-8 on Windows
user name
2006-08-16 14:58:00

Hi...

> > The utf-8 support for Windows was my idea and my patch, so I feel
> > responsible for the problems
> > [...]

> > > > OS (bug #346367).
> > Ok.. I will address that, too. Did not know that there is a bug report.
> >
> > At present I am awfully busy, but I hope I can supply my revised patch
> > (based on libxml 2.6.26) by beginning of next week.
> >
> > I hope this will solve all problems with win9x and non utf-8 encoding
> > without adding new api. Would this be ok for everyone?
>
>   That sounds excellent to me. I didn't expect a new release within a
> couple of weeks so even if it takes a bit of time it is not a big deal,
>
> Daniel

Here comes my revised/extended patch.

What is the state now:
If a path cannot be accessed on disk when it is assumed to be in UTF-8 on
Windows, it is now also tried with the native encoding as a fallback. That
should fix the first part.
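
The fallback described above could be sketched roughly as follows. This is
not the patch itself, only a minimal illustration: the helper name
demo_open_with_fallback is hypothetical, and it is an assumption of this
sketch that the UTF-8 attempt goes through MultiByteToWideChar() and
_wfopen(). On non-Windows builds only the plain fopen() path is compiled.

```c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Hypothetical helper, not libxml2 code: open `path` for reading.
 * On Windows the name is first interpreted as UTF-8; if that fails,
 * the same bytes are retried as a native-encoding name.
 */
static FILE *demo_open_with_fallback(const char *path)
{
    FILE *fd = NULL;

    if (path == NULL)
        return NULL;
#ifdef _WIN32
    {
        /* First call returns the needed length in wide chars, NUL included. */
        int len = MultiByteToWideChar(CP_UTF8, 0, path, -1, NULL, 0);
        if (len > 0) {
            wchar_t *wpath = malloc((size_t) len * sizeof(wchar_t));
            if (wpath != NULL) {
                if (MultiByteToWideChar(CP_UTF8, 0, path, -1, wpath, len) > 0)
                    fd = _wfopen(wpath, L"rb");
                free(wpath);
            }
        }
    }
#endif
    /* Fallback (and the only path elsewhere): native encoding. */
    if (fd == NULL)
        fd = fopen(path, "rb");
    return fd;
}
```

A caller that passes a native-encoding name therefore still gets the file,
at the cost of one failed open attempt on Windows.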

For win9x compatibility it is now decided at runtime whether a system
is capable of calling _wstat()/_wfopen(). If the system is not capable of
doing it, my utf-8 part is invisible. This should also fix bug #346367. But
well, I do not have a win9x installation so I implemented it blind, but it
*should*really* work.
(OT: Is win9x nowadays really of any relevance for professional applications?
 We dropped support for it several years ago, and nobody really complained. But this
 is a different discussion, but someday libxml2 should IMO also declare End-Of-Life
 for win9x.)
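
One way such a runtime decision could look is sketched below. It is an
assumption of this sketch, not a description of the patch, that the check
is based on the platform id reported by GetVersionEx(); the helper name
demo_has_wide_file_api is hypothetical. Newer Windows SDKs mark
GetVersionEx() as deprecated, so a current build may emit a warning.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Hypothetical helper: returns 1 when the wide-character file functions
 * (_wfopen, _wstat) can be expected to work, i.e. on NT-based Windows,
 * and 0 on Windows 9x/ME or on any non-Windows platform.
 */
static int demo_has_wide_file_api(void)
{
#ifdef _WIN32
    OSVERSIONINFO info;

    ZeroMemory(&info, sizeof(info));
    info.dwOSVersionInfoSize = sizeof(info);
    if (!GetVersionEx(&info))
        return 0;   /* unknown system: stay on the native-encoding path */
    return info.dwPlatformId == VER_PLATFORM_WIN32_NT;
#else
    return 0;       /* the UTF-8 to wide conversion is Windows-only */
#endif
}
```

Because the result cannot change while the process runs, a caller could
compute it once and skip the UTF-8 code path entirely when it is 0.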

When doing the patch I found 2 static functions in xmlIO.c doing quite the same thing:
xmlSysIDExists() and xmlNoNetExists(). In favour of simplicity I decided to discard xmlSysIDExists().

So I hope this resolves all pending issues. Feel free to reply in case of any problems.

Roland
Problems with file names in UTF-8 on Windows
user name
2006-08-16 15:05:58
On Wed, Aug 16, 2006 at 04:58:00PM +0200, Roland Schwingel wrote:
> Hi...
>
> > > The utf-8 support for Windows was my idea and my patch, so I feel
> > > responsible for the problems
> > > [...]
> > > > > OS (bug #346367).
> > > Ok.. I will address that, too. Did not know that there is a bug report.
> > >
> > > At present I am awfully busy, but I hope I can supply my revised patch
> > > (based on libxml 2.6.26) by beginning of next week.
> > >
> > > I hope this will solve all problems with win9x and non utf-8 encoding
> > > without adding new api. Would this be ok for everyone?
> >
> >   That sounds excellent to me. I didn't expect a new release within a
> > couple of weeks so even if it takes a bit of time it is not a big deal,
> >
> > Daniel
>
> Here comes my revised/extended patch.

  To follow a good tradition, it seems you forgot the patch

I do that all the time too !

> What is the state now:
> If a path cannot be accessed on disk when it is assumed to be in UTF-8 on
> Windows, it is now also tried with the native encoding as a fallback. That
> should fix the first part.
>
> For win9x compatibility it is now decided at runtime whether a system
> is capable of calling _wstat()/_wfopen(). If the system is not capable of
> doing it, my utf-8 part is invisible. This should also fix bug #346367. But
> well, I do not have a win9x installation so I implemented it blind, but it
> *should*really* work.

  Sounds like famous last words, well I expect people with win9x to try it
out !

> (OT: Is win9x nowadays really of any relevance for professional
>  applications? We dropped support for it several years ago, and nobody
>  really complained. But this is a different discussion, but someday
>  libxml2 should IMO also declare End-Of-Life for win9x.)

  You know we have code in there for VMS and MVS, somehow portability even
to older platforms is a tradition here.

> When doing the patch I found 2 static functions in xmlIO.c doing quite the
> same thing: xmlSysIDExists() and xmlNoNetExists(). In favour of simplicity
> I decided to discard xmlSysIDExists().

  As long as static and identical, fine by me

> So I hope this resolves all pending issues. Feel free to reply in case of
> any problems.

  yup, can we get that sweet patch  ?

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillardredhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/