Thread: Problems with file names in UTF-8 on Windows

Problems with file names in UTF-8 on Windows
2006-08-09 06:25:04
Hello.

First of all, I would like to thank you for libxml. It has turned out
to be a useful and convenient thing.

Now to business.

First, the implementation in version 2.6.24 of file name processing in
the UTF-8 encoding for Windows has led to the following problems:

1. Updating the library to the new version breaks programs which use
file names in the native encoding; all such programs are now compelled
to convert file names to UTF-8.
2. The library became incompatible with Windows 95/98/ME, as the
functions _wfopen and _wstat use features not implemented by default in
these versions of the OS (bug #346367).

It seems reasonable to process file names in native encoding by default,
and establish transformation mode from UTF-8 obviously.

Attached is a corrected variant of xmlIO.c. The name transformation
mode is set by the function xmlSetFileNameMode.

However, in the proposed implementation, UTF-8 names can be used only
on Windows NT/2000/XP/... For Windows 9x, a reverse transformation from
Unicode to the native encoding should be added.

Second, it would be quite good to add to the library a group of simple
exported functions for read access to the fields of structures.
This would simplify describing the API in other languages and would
avoid recompiling programs after possible changes to the library
structures.

An example implementation of such functions is in the same archive
(files wrappers.*).
When string fields are read, no copy is made, to reduce call overhead.
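
A minimal sketch of what such read accessors might look like. The
structure demoNode and the functions demoNodeGetName() and
demoNodeGetLine() are hypothetical stand-ins for illustration; they are
not libxml2 names and are not taken from the wrappers.* files. The
string accessor hands back the internal pointer without copying, so the
caller must not modify or free it.

```c
#include <stddef.h>

/* Hypothetical structure standing in for a library structure;
 * this is NOT a libxml2 type. */
typedef struct demoNode {
    const char *name;   /* owned by the node */
    int line;           /* source line number */
} demoNode;

/* Read accessor for a string field. No copy is made: the returned
 * pointer refers to memory owned by the node, so the caller must not
 * modify or free it. Returns NULL if node is NULL. */
const char *demoNodeGetName(const demoNode *node) {
    return (node != NULL) ? node->name : NULL;
}

/* Read accessor for an integer field. Returns -1 if node is NULL. */
int demoNodeGetLine(const demoNode *node) {
    return (node != NULL) ? node->line : -1;
}
```

With accessors of this kind, a binding in another language would only
need the function prototypes and could treat demoNode as an opaque
pointer.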
All changes are made on the basis of library version 2.6.26.
With best regards, Emelyanov Alexey.
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml gnome.org
http://mail.gnome.org/mailman/listinfo/xml

Problems with file names in UTF-8 on Windows
2006-08-09 08:46:18
On Wed, Aug 09, 2006 at 10:25:04AM +0400, Emelyanov Alexey wrote:
> Hello.
>
> First of all, I would like to thank you for libxml. It has turned out
> to be a useful and convenient thing.
>
> Now to business.
>
> First, the implementation in version 2.6.24 of file name processing in
> the UTF-8 encoding for Windows has led to the following problems:
>
> 1. Updating the library to the new version breaks programs which use
> file names in the native encoding; all such programs are now compelled
> to convert file names to UTF-8.
> 2. The library became incompatible with Windows 95/98/ME, as the
> functions _wfopen and _wstat use features not implemented by default
> in these versions of the OS (bug #346367).
>
> It seems reasonable to process file names in native encoding by default,
> and establish transformation mode from UTF-8 obviously.

I don't think it's obvious. Roland Schwingel, who provided that patch,
argued differently. I don't use Windows, I have no way to test or check,
I have to rely on the expertise of people on the mailing-list in that area.

> Attached is a corrected variant of xmlIO.c. The name transformation
> mode is set by the function xmlSetFileNameMode.

I'm sorry, send contextual patches, not new files, even worse a bunch of
files. You must send a patch which shows exactly what you modified.
Also you should send a clear explanation of the modifications, why you
changed things. "a corrected variant" is not acceptable for review, sorry.
Moreover I expect all those changes/diff to be guarded by #ifdef WIN32
or something similar at the code level, because obviously this should not
affect non-Windows code in any way.
Last but not least, xmlSetFileNameMode() is not acceptable: this means
having to introduce a global variable in the library, and I'm trying to
get rid of them. If you want a different mode of operation for older
Windows versions, find a way to detect that version at compile time or
at runtime, but adding a new API which makes no sense on other platforms
and introduces a global variable is definitely not okay.

> However, in the proposed implementation, UTF-8 names can be used only
> on Windows NT/2000/XP/... For Windows 9x, a reverse transformation from
> Unicode to the native encoding should be added.

I do not clearly understand what you mean here: is that what you suggest
doing, what your changes should do, or something else?

> Second, it would be quite good to add to the library a group of simple
> exported functions for read access to the fields of structures.
> This would simplify describing the API in other languages and would
> avoid recompiling programs after possible changes to the library
> structures.
>
> An example implementation of such functions is in the same archive
> (files wrappers.*).
> When string fields are read, no copy is made, to reduce call overhead.

Okay, that's not acceptable. Adding a new header involves a lot of work,
not just merely adding a file to the subdir. I think it's frivolous to
add one for the reason given. Moreover I disagree with adding accessors
on technical grounds:
- libxml2 exports a lot of existing structures, containing a lot of fields
- if we start adding accessors, this means a lot of new functions
- this won't help the API, since the existing API uses those structures
- adding new functions is costly *at runtime*
To be clear about the last point: libxml2 already has more than 1500
exported entry points. For position-independent code in shared libraries
there is a runtime cost of relocating all exported symbols; if we start
adding accessors, that is much more work to be done, so I'm against it
unless it's for new functionality and it's clear that the number of
entry points is low.

So overall, I'm sorry, I cannot work on your code submission: it's really
too far off from the normal review process and not in line with libxml2
development rules. I suggest you revisit the issue based on my feedback,

Daniel
--
Daniel Veillard | Red Hat http://redhat.com/
veillard redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ |
Rpmfind RPM search engine http://rpmfind.net/

Problems with file names in UTF-8 on Windows
2006-08-09 09:54:12
Hi...

The utf-8 support for Windows was my idea and my patch, so I feel
responsible for the problems.

> On Wed, Aug 09, 2006 at 10:25:04AM +0400, Emelyanov Alexey wrote:
> > 1. Updating the library to the new version breaks programs which use
> > file names in the native encoding; all such programs are now compelled
> > to convert file names to UTF-8.

UTF-8 is IMHO the best choice to handle nowadays, but well, I see the
problem... I think I will modify my patch to have a fallback mode if
the UTF-8 file is not present/accessible.

> > 2. The library became incompatible with Windows 95/98/ME, as the
> > functions _wfopen and _wstat use features not implemented by default
> > in these versions of the OS (bug #346367).

Ok.. I will address that, too. I did not know that there is a bug report.

At present I am awfully busy, but I hope I can supply my revised patch
(based on libxml 2.6.26) by the beginning of next week.

I hope this will solve all problems with win9x and non utf-8 encoding
without adding new API. Would this be ok for everyone?

Roland

Problems with file names in UTF-8 on Windows
2006-08-09 10:04:16
On Wed, Aug 09, 2006 at 11:54:12AM +0200, Roland Schwingel wrote:
> Hi...
>
> The utf-8 support for Windows was my idea and my patch, so I feel
> responsible for the problems.
>
> > On Wed, Aug 09, 2006 at 10:25:04AM +0400, Emelyanov Alexey wrote:
> > > 1. Updating the library to the new version breaks programs which use
> > > file names in the native encoding; all such programs are now
> > > compelled to convert file names to UTF-8.
> UTF-8 is IMHO the best choice to handle nowadays, but well, I see the
> problem... I think I will modify my patch to have a fallback mode if
> the UTF-8 file is not present/accessible.
>
> > > 2. The library became incompatible with Windows 95/98/ME, as the
> > > functions _wfopen and _wstat use features not implemented by default
> > > in these versions of the OS (bug #346367).
> Ok.. I will address that, too. I did not know that there is a bug report.
>
> At present I am awfully busy, but I hope I can supply my revised patch
> (based on libxml 2.6.26) by the beginning of next week.
>
> I hope this will solve all problems with win9x and non utf-8 encoding
> without adding new API. Would this be ok for everyone?

That sounds excellent to me. I didn't expect a new release within a
couple of weeks, so even if it takes a bit of time it is not a big deal,

Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ |
Rpmfind RPM search engine http://rpmfind.net/

Problems with file names in UTF-8 on Windows
2006-08-16 14:58:00
Hi...

> > The utf-8 support for Windows was my idea and my patch, so I feel
> > responsible for the problems.
> > [...]
> > > > OS (bug #346367).
> > Ok.. I will address that, too. I did not know that there is a bug
> > report.
> >
> > At present I am awfully busy, but I hope I can supply my revised patch
> > (based on libxml 2.6.26) by the beginning of next week.
> >
> > I hope this will solve all problems with win9x and non utf-8 encoding
> > without adding new API. Would this be ok for everyone?
>
> That sounds excellent to me. I didn't expect a new release within a
> couple of weeks, so even if it takes a bit of time it is not a big deal,
>
> Daniel

Here comes my revised/extended patch.

What is the state now:
In the case that a path cannot be accessed on disk assuming the path to
be in utf-8 on Windows, it is now also tried with the native encoding as
a fallback. That should fix the first part.
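
A minimal sketch of the fallback described above, under stated
assumptions: the helper name open_utf8_then_native() is hypothetical and
is not taken from the patch. On Windows it relies on the Win32
MultiByteToWideChar() call and the CRT _wfopen(); on other platforms the
Windows branch is compiled out and it simply calls fopen().

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#include <wchar.h>
#endif

/* Hypothetical helper, not part of libxml2: try the path as UTF-8
 * first, then fall back to the native encoding. */
FILE *open_utf8_then_native(const char *path, const char *mode) {
    if (path == NULL || mode == NULL)
        return NULL;
#ifdef _WIN32
    {
        wchar_t wpath[MAX_PATH];
        wchar_t wmode[16];
        /* MB_ERR_INVALID_CHARS makes the conversion fail on bytes
         * that are not valid UTF-8 instead of silently replacing them. */
        if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, path, -1,
                                wpath, MAX_PATH) > 0 &&
            MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16) > 0) {
            FILE *fp = _wfopen(wpath, wmode);
            if (fp != NULL)
                return fp;
        }
    }
#endif
    /* Fallback (and the only path on non-Windows systems): hand the
     * bytes to the C runtime unchanged, i.e. in the native encoding. */
    return fopen(path, mode);
}
```

Trying UTF-8 first keeps callers that already pass UTF-8 names working,
while callers that still pass native-encoding names are caught by the
second attempt.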

Because of win9x compatibility it is now decided at runtime whether a
system is capable of calling _wstat()/_wfopen(). If the system is not
capable of doing it, my utf-8 part is invisible. This should also fix
bug #346367. But well, I do not have a win9x installation, so I
implemented it blind, but it *should*really* work.
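
A minimal sketch of a runtime check of this kind, assuming the platform
family is queried with the Win32 GetVersionEx() call. The function name
wide_file_api_usable() is hypothetical, and the patch itself may detect
this differently. On non-Windows builds the check is compiled out and
reports 0.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper, not part of libxml2: returns 1 if the wide
 * character file functions (_wfopen, _wstat) can be expected to work,
 * 0 otherwise. */
int wide_file_api_usable(void) {
#ifdef _WIN32
    OSVERSIONINFOA info;
    ZeroMemory(&info, sizeof(info));
    info.dwOSVersionInfoSize = sizeof(info);
    /* VER_PLATFORM_WIN32_NT covers NT/2000/XP and later; Windows
     * 95/98/ME report VER_PLATFORM_WIN32_WINDOWS instead. */
    if (GetVersionExA(&info) &&
        info.dwPlatformId == VER_PLATFORM_WIN32_NT)
        return 1;
    return 0;
#else
    /* No wide-character file API to select on other platforms. */
    return 0;
#endif
}
```

The result would be consulted before choosing between the _wfopen()
route and plain fopen(), so the caller never has to set a mode.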

(OT: Is win9x nowadays really of any relevance for professional
applications? We dropped support for it several years ago, and nobody
really complained. But this is a different discussion, but someday
libxml2 should IMO also declare End-Of-Life for win9x.)

When doing the patch I found 2 static functions in xmlIO.c doing quite
the same thing: xmlSysIDExists() and xmlNoNetExists(). In favour of
simplicity I decided to discard xmlSysIDExists().

So I hope this resolves all pending issues. Feel free to reply in case
of any problems.

Roland

Problems with file names in UTF-8 on Windows
2006-08-16 15:05:58
On Wed, Aug 16, 2006 at 04:58:00PM +0200, Roland Schwingel wrote:
> Hi...
>
> > > The utf-8 support for Windows was my idea and my patch, so I feel
> > > responsible for the problems.
> > > [...]
> > > > > OS (bug #346367).
> > > Ok.. I will address that, too. I did not know that there is a bug
> > > report.
> > >
> > > At present I am awfully busy, but I hope I can supply my revised
> > > patch (based on libxml 2.6.26) by the beginning of next week.
> > >
> > > I hope this will solve all problems with win9x and non utf-8
> > > encoding without adding new API. Would this be ok for everyone?
> >
> > That sounds excellent to me. I didn't expect a new release within a
> > couple of weeks, so even if it takes a bit of time it is not a big
> > deal,
> >
> > Daniel
>
> Here comes my revised/extended patch.

To follow a good tradition, it seems you forgot the patch.
I do that all the time too!

> What is the state now:
> In the case that a path cannot be accessed on disk assuming the path to
> be in utf-8 on Windows, it is now also tried with the native encoding
> as a fallback. That should fix the first part.
>
> Because of win9x compatibility it is now decided at runtime whether a
> system is capable of calling _wstat()/_wfopen(). If the system is not
> capable of doing it, my utf-8 part is invisible. This should also fix
> bug #346367. But well, I do not have a win9x installation, so I
> implemented it blind, but it *should*really* work.

Sounds like famous last words. Well, I expect people with win9x to try
it out!

> (OT: Is win9x nowadays really of any relevance for professional
> applications? We dropped support for it several years ago, and nobody
> really complained. But this is a different discussion, but someday
> libxml2 should IMO also declare End-Of-Life for win9x.)

You know we have code in there for VMS and MVS; somehow portability even
to older platforms is a tradition here.

> When doing the patch I found 2 static functions in xmlIO.c doing quite
> the same thing: xmlSysIDExists() and xmlNoNetExists(). In favour of
> simplicity I decided to discard xmlSysIDExists().

As long as they are static and identical, fine by me.

> So I hope this resolves all pending issues. Feel free to reply in case
> of any problems.

Yup, can we get that sweet patch?

Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ |
Rpmfind RPM search engine http://rpmfind.net/