Thread: utf-8 woes

utf-8 woes
user name
2008-02-27 07:49:43
Hi

I use XPathScript to process DocBook XML articles for our wound-care
journal World Wide Wounds (www.worldwidewounds.com).

In recent years we have had lots of articles referencing articles from
Scandinavia, hence a requirement to use ö and other such entities in
authors' names.

Most of the time the processing works OK (these characters usually turn
up in the citations), but at the moment we have an issue where

       ö 

in the body of the text results in the output of a two-byte character,
and yet in a different part of the document (the bibliography) the
conversion works correctly, and the 'o' with two little dots above it
comes out OK when viewed in Firefox using ISO-8859-1 encoding.
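
For reference, this is why a single ö can show up as two characters. The
sketch below uses only Perl's core Encode module; "Lönd" is just the
example word used further down in this message, and hex_bytes is a
made-up helper name for display purposes:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: show a byte string as space-separated hex
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $bytes);
}

# A Perl character string; \x{f6} is U+00F6 (o with diaeresis)
my $chars = "L\x{f6}nd";

# Encoded as UTF-8 the o-umlaut takes two bytes (C3 B6) ...
my $utf8_bytes = encode('UTF-8', $chars);
# ... but encoded as ISO-8859-1 it is a single byte (F6)
my $latin1_bytes = encode('iso-8859-1', $chars);

print 'UTF-8:      ', hex_bytes($utf8_bytes), "\n";     # 4C C3 B6 6E 64
print 'ISO-8859-1: ', hex_bytes($latin1_bytes), "\n";   # 4C F6 6E 64

# A browser told the page is ISO-8859-1 reads each UTF-8 byte as its own
# character, so the one o-umlaut is displayed as two characters.
my $as_seen = decode('iso-8859-1', $utf8_bytes);
print "Seen as ISO-8859-1: $as_seen\n";                  # LÃ¶nd
```

In other words, the "two-byte character" is UTF-8 output being displayed
under an ISO-8859-1 label.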

Clearly this is a problem in the way I am handling the incoming XML.

An example of what doesn't work is text in a sidebar, where my code is:

------------------------------------------------------------------------
# Sidebar
$t->{'sidebar'}{testcode} = sub {
  my ($node, $t) = @_;
  my ($fileref);
  my $id;
  if ($id = findvalue('@id', $node)) {
    # named anchor so the sidebar can be linked to
    $t->{pre} .= qq{<a name="$id"></a>};
  }
  # When we find a sidebar,
  # put link to it
  $t->{pre} .= '<DIV CLASS="sidebar">';
  $t->{post} = '</DIV>';
  return 1;    # 1 = process this element and its children
};

$t->{'para'}{testcode} = sub {
  my ($node, $t2) = @_;
  my $anchor = '';    # initialised so the 'ne' test below never sees undef
  if (my $id = findvalue('@id', $node)) {
    $anchor = $id;
  }
  # we want to get rid of para breaks directly after
  # we start a glossary definition, so check for
  # $removepara > 0
  if ($removepara > 0) {
    $t2->{pre} .= "";
  } else {
    if ($anchor ne "") {
      $t2->{pre} .= qq{<p><a name="$anchor"></a>};
    } else {
      $t2->{pre} .= "<p>";
    }
    # $t2->{post} = "</p>"; # this really should work, but it messes up NS4.x
  }
  #$removepara=0;
  $removepara--;
  return 1;
};
------------------------------------------------------------------------

If I use:

 <sidebar><para>This article was sponsored by an educational grant from
 L&#x00f6;nd Corp</para></sidebar>

the "Lönd Corp" comes out with a two-byte character in it.

Is there some easy hack I am missing which would automatically convert
the text to the appropriate encoding? I suspect it is something I should
be doing in the 'para' subroutines?
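
Not necessarily the right fix for this stylesheet, but one generic
workaround (a sketch, assuming the finished HTML is available as a decoded
Perl character string before it is printed) is to turn every non-ASCII
character into a numeric character reference, the same form the XML source
already uses for &#x00f6;. The page is then plain ASCII and displays the
same whatever charset the browser assumes. ascii_safe is a hypothetical
helper name:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: replace every non-ASCII character with an HTML
# numeric character reference, e.g. "\x{f6}" becomes "&#xf6;".
# It expects Perl *characters*; raw UTF-8 bytes must go through
# Encode::decode('UTF-8', ...) first, or each byte is escaped separately.
sub ascii_safe {
    my ($html) = @_;
    $html =~ s/([^\x00-\x7F])/sprintf('&#x%x;', ord $1)/ge;
    return $html;
}

my $from_chars = ascii_safe("L\x{f6}nd Corp");

my $raw_bytes  = "L\xC3\xB6nd Corp";    # the same text as undecoded UTF-8
my $from_bytes = ascii_safe(decode('UTF-8', $raw_bytes));

print "$from_chars\n$from_bytes\n";     # both: L&#xf6;nd Corp
```

An alternative is to keep everything as characters and choose the output
encoding in one place, e.g. binmode STDOUT, ':encoding(iso-8859-1)', with a
matching charset declared for the page; characters outside Latin-1 will not
survive that layer, whereas numeric references cover any character.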

I'm happy to send the three files (main file plus two library files)
which do the conversion, but in total they come to about 30 pages of code.

Any help much appreciated.

Regards,
Pete
--
Pete Phillips, Acting Director,     |   http://www.smtl.co.uk/
Surgical Materials Testing Lab,     |   http://www.worldwidewounds.com/
Princess of Wales Hospital, S Wales |   http://www.dressings.org/
Tel/Fax: +44 1656-752820/30         |   pete@smtl.co.uk

---------------------------------------------------------------------
To unsubscribe, e-mail: axkit-users-unsubscribe@axkit.org
For additional commands, e-mail: axkit-users-help@axkit.org


Re: utf-8 woes
user name
2008-02-27 16:23:57
Howdie,

On Wed, Feb 27, 2008 at 01:49:43PM +0000, Pete Phillips wrote:
> I use xpathscript to process docbook xml articles for our woundcare
> journal World Wide Wounds (www.worldwidewounds.com).

	Before anything else: do you use the XPathScript bundled with AxKit,
or the standalone version from CPAN?

Joy,
`/anick


Re: utf-8 woes
user name
2008-02-27 17:20:35
>>>>> "yanick" == yanick <yanick@babyl.dyndns.org> writes:

    yanick> Howdie, On Wed, Feb 27, 2008 at 01:49:43PM +0000, Pete
    yanick> Phillips wrote:

    >> I use xpathscript to process docbook xml articles for our
    >> woundcare journal World Wide Wounds (www.worldwidewounds.com).

    yanick> 	Before anything else: do you use the XPathScript bundled
    yanick> with AxKit, or the standalone version from CPAN?

I'm using the standalone version from CPAN.

Pete

