
Thread: Strange conflagration of characters during editing




Strange conflagration of characters during editing
user name
2006-07-24 22:14:15
Hi, just wondering if anyone's seen this before. Using 2.4.18-cvs running
on Tomcat/linux.

I've seen this a few times, thought it was a freak, but it's repeatable
for me.

When editing an article, I pasted in the title of a NYT article* that
contains some curly quote characters. Each time I preview and save,
these characters are replaced by multiple other characters (must be
something in the filters) and continually get "thicker." So whereas
the initial text was

    In the Race With Google, It?s Consistency vs. ?Wow?

after two or three edits that's been amplified to:

  In the Race With Google, It’s
  Consistency vs.
‘Wow’

and will continue in this fashion with each edit. This happens
both within and outside of square bracket wiki link markup.

Anyone seen this before? Know what is causing it? It's entirely
conceivable that it's something I'm doing custom -- am currently
investigating. This seems to *only* happen with curly single and
double quote characters from what I can see.

Thanks,

Murray

* http://www.nytimes.com/2006/07/24/technology/24yahoo.html
...........................................................................
Murray Altheim <murray06altheim.com>                           ===  = =
http://www.altheim.com/murray/                                 = =  ===
SGML Grease Monkey, Banjo Player, Wantanabe Zen Monk           = =  = =

       In the evening
       The rice leaves in the garden
       Rustle in the autumn wind
       That blows through my reed hut.  -- Minamoto no Tsunenobu

_______________________________________________
Jspwiki-users mailing list
Jspwiki-usersecyrd.com
http://ecyrd.com/cgi-bin/mailman/listinfo/jspwiki-users
Strange conflagration of characters during editing
user name
2006-07-25 07:28:01
Sounds like a UTF-8 issue.  Did you check

http://jspwiki.org/wiki/TomcatAndUTF8

?

BTW, if the tick in "it's" is getting changed, then you are using the
wrong tick mark - you should be using Unicode 0027, not 00B4.  This
is a common annoyance for non-Windows users, since the other tick
mark usually gets translated to a question mark.  Ditto with the
double-quote which is not 0022.

/Janne
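
One way to check which tick mark a pasted string really contains is to print its code points. The following is only a minimal sketch, assuming a Java 8 or later runtime; the class and method names are made up for illustration and are not part of JSPWiki. The first sample string uses U+2019 (right single quotation mark), which is the curly apostrophe commonly found in text pasted from web pages, and the second uses the plain apostrophe U+0027.

```java
final class CodePoints {
    /** Returns each code point of the input as a "U+XXXX" label, space-separated. */
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Curly apostrophe, written as an escape so the source file stays ASCII.
        System.out.println(describe("It\u2019s")); // U+0049 U+0074 U+2019 U+0073
        // Plain ASCII apostrophe.
        System.out.println(describe("It's"));      // U+0049 U+0074 U+0027 U+0073
    }
}
```

Anything outside the ASCII range in the output is a character that depends on the encoding being set up consistently end to end.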

Strange conflagration of characters during editing
user name
2006-07-25 21:59:23
Quoting Janne Jalkanen <Janne.Jalkanenecyrd.com>:
>
> Sounds like an UTF-8 issue.  Did you check
>
> http://jspwiki.org/wiki/TomcatAndUTF8

Well, I've made the change suggested there and restarted Tomcat.
Sadly, it didn't make any difference.

> BTW, if the tick in "it's" is getting changed, then you are using the
> wrong tick mark - you should be using Unicode 0027, not 00B4.  This
> is a common annoyance for non-Windows users, since the other tick
> mark gets usually translated to a question mark.  Ditto with the
> double-quote which is not 0022.

The problem here isn't what I do, it's that if any user of the wiki
either types or cuts and pastes one of these characters, it starts
this ugly situation that upon each edit, or upon even each preview,
continually adds and adds the unwanted characters. I think this would
be very confusing, particularly to new users. It's just a strange,
funky behaviour that wouldn't give users any confidence in the
wiki, i.e., it's nothing that they'd ever see in MS Word (though I'd
not want to make that case too strongly).

I just don't quite follow why any character would get translated to
a completely different character by the engine; it seems that if
someone typed in any kind of straight or curly quote it should come
through the processing unscathed (that is, if it's not interpreted
as wiki markup).

Murray

R: Strange conflagration of characters during editing
user name
2006-07-28 14:00:17
Hi,
Yesterday I ran into an issue like yours while setting up a new JSPWiki
site on the internet, hosted on http://eatj.com (great!). Before that I
had only run it on my (Windows) server for the company intranet.

The only real difference between the intranet and internet
configurations is the urlConstructor: I have trouble with
ShortURLConstructor (both on Preview and Save) but not with
ShortViewURLConstructor or DefaultURLConstructor.
This doesn't depend on the template used (default, MediaWiki or
brushed).
I’ve also tried different JSPWiki versions: all OK with 2.2.33, bad
with 2.4.15 and 2.4.24.

Sounds like a bug, don't you think? (But I'm not good enough at Java to
find it in the code.)

Workaround:
jspwiki.urlConstructor = DefaultURLConstructor
or
jspwiki.urlConstructor = ShortViewURLConstructor (OT: there is a
problem with editing a page section on brushed)

I hope this helps.

enricom

Enrico Maria Carmona
U.O. Controllo di Gestione e Programmazione
Azienda Ospedaliera San Gerardo
http://www.hsgerardo.org
Tel. +39-039-233-9077
email: e.carmonahsgerardo.org
 
 
Strange conflagration of characters during editing
user name
2006-07-30 21:37:59
> Well, I've made the change suggested there and restarted Tomcat.
> Sadly, didn't make any difference.

Do you have UTF-8 set up as your jspwiki.encoding?

> The problem here isn't what I do, it's that if any user of the wiki
> either types or cuts and pastes one of these characters, it starts
> this ugly situation that upon each edit, or upon even each preview,

This is a separate issue, I just wanted to point out that those
characters are specific to the Windows version of Latin1, and unless
you are using UTF-8 all around, they will be shown wrong in any other
operating system.

> I just don't quite follow why any character would get translated to
> a completely different character by the engine; it seems that if
> someone typed in any kind of straight or curly quote it should come
> through the processing unscathed (that is, if it's not interpreted
> as wiki markup).

Java converts everything internally to Unicode.  If the character is
posted in Windows-Latin1, and it contains characters that are not in
ISO-Latin1, then they get converted to garbage.

/Janne
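
The Windows-Latin1 versus ISO-Latin1 point above can be illustrated with a single byte. This is a minimal sketch, assuming the JDK in use ships the "windows-1252" charset (most do, but it is not one of the charsets every JVM is required to support); the class and method names are invented for the example. Byte 0x92 is the curly apostrophe in Windows-1252, while ISO-8859-1 maps the same byte to a C1 control character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Cp1252Demo {
    /** Decodes one byte with the given charset and returns the resulting code point. */
    static int decodeByte(int b, Charset cs) {
        return new String(new byte[] { (byte) b }, cs).codePointAt(0);
    }

    public static void main(String[] args) {
        // Assumption: this charset name is available on the running JVM.
        Charset cp1252 = Charset.forName("windows-1252");

        // In Windows-1252, 0x92 is the right single quotation mark.
        System.out.printf("windows-1252: U+%04X%n", decodeByte(0x92, cp1252));

        // In ISO-8859-1, 0x92 is an invisible control character.
        System.out.printf("ISO-8859-1:   U+%04X%n",
                decodeByte(0x92, StandardCharsets.ISO_8859_1));
    }
}
```

So the same byte only turns into the intended quote when both ends agree on the charset.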


Strange conflagration of characters during editing
user name
2006-07-30 22:28:36
Quoting Janne Jalkanen <Janne.Jalkanenecyrd.com>:

>> Well, I've made the change suggested there and restarted Tomcat.
>> Sadly, didn't make any difference.
>
> Do you have UTF-8 set up as your jspwiki.encoding?

Yes, I do. I noticed that a previous page I'd entered with
Japanese characters (that used to look okay) now looks
like muck. So I assume that's because the encoding has
changed to UTF-8.

>> The problem here isn't what I do, it's that if any user of the wiki
>> either types or cuts and pastes one of these characters, it starts
>> this ugly situation that upon each edit, or upon even each preview,
>
> This is a separate issue, I just wanted to point out that those
> characters are specific to the Windows version of Latin1, and unless
> you are using UTF-8 all around, they will be shown wrong in any other
> operating system.

I've got Tomcat set for UTF-8. I've got Firefox set for UTF-8. Not
sure what else I can set. (one doesn't have to set Apache too?)

>> I just don't quite follow why any character would get translated to
>> a completely different character by the engine; it seems that if
>> someone typed in any kind of straight or curly quote it should come
>> through the processing unscathed (that is, if it's not interpreted
>> as wiki markup).
>
> Java converts everything internally to Unicode.  If the character is
> posted in Windows-Latin1, and it contains characters that are not in
> ISO-Latin1, then they get converted to garbage.

It's not so much that they get converted to garbage that troubles me,
as that's kinda expected when there's an incorrect encoding. What I'm
worried about is that upon each preview and each save, the *number*
of bad characters doubles. That could be very irritating to a user,
more irritating than not seeing the correct display, like a character
virus. My guess is that this is a filtering symptom...

Murray

Strange conflagration of characters during editing
user name
2006-07-31 10:12:41
> Yes, I do. I noticed that a previous page I'd entered with
> Japanese characters (that used to look okay) now looks
> like muck. So I assume that's because the encoding has
> changed to UTF-8.

No.  We do two encodings: ISO Latin1 and UTF-8.  You can't show
Japanese characters in Latin1 - the only case where they would show
correctly would be UTF-8.

> I've got Tomcat set for UTF-8. I've got Firefox set for UTF-8. Not
> sure what else I can set. (one doesn't have to set Apache too?)

Well, you need to have

* jspwiki.encoding = UTF-8
* Tomcat context set to UTF-8

Firefox should detect it all automatically.

> It's not so much that they get converted to garbage that troubles me,
> as that's kinda expected when there's an incorrect encoding. What I'm
> worried about is that upon each preview and each save, the *number*
> of bad characters doubles. That could be very irritating to a user,
> more irritating than not seeing the correct display, like a character
> virus. My guess is that this is a filtering symptom...

The duplication occurs because the browser is sending faulty
characters to Tomcat.  Tomcat interprets them using either Latin1 or
UTF-8 (the default is Latin1, and it wasn't until Servlet API 2.3
that this could be changed programmatically.  2.4 does this, 2.2
resorts to hackery), and creates the internal String instances.

You type in a character, which gets sent as UTF-8 (two bytes).
Tomcat reads them in, apparently interprets them as Latin1 (where
each character is only one byte, so suddenly you get two garbage
characters), and shoves them to JSPWiki, which dutifully saves them
in the page repository.  When user edits again, they will see two bad
characters, instead of one.  However, when they save again, they get
again sent as UTF-8 to Tomcat (so two bytes become four), and so on...

I think the problem is to get Tomcat to understand that the encoding
is UTF-8, not Latin1.  Which versions of Tomcat and JSPWiki are you
using, exactly?

/Janne
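
The growth described above can be reproduced outside Tomcat. The following is a rough sketch, not JSPWiki code: the hypothetical faultySave() method stands in for one preview/save cycle in which the browser sends UTF-8 bytes and the server decodes them as Latin1.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeGrowth {
    /** Simulates one faulty save: UTF-8 bytes on the wire, decoded as Latin1. */
    static String faultySave(String pageText) {
        byte[] sent = pageText.getBytes(StandardCharsets.UTF_8);
        return new String(sent, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "\u2019"; // a single curly apostrophe
        for (int saves = 0; saves <= 3; saves++) {
            System.out.println("after " + saves + " faulty saves: "
                    + text.length() + " chars");
            text = faultySave(text);
        }
    }
}
```

For a curly apostrophe the lengths come out as 1, 3, 6 and 12: U+2019 takes three bytes in UTF-8, so the first faulty save triples it, and every resulting garbage character is in the U+0080..U+00FF range, which takes two bytes, so each later save doubles the count. Plain ASCII text passes through unchanged, which fits the observation that only the curly quotes were affected.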
Strange conflagration of characters during editing
user name
2006-08-01 05:04:07
Quoting Janne Jalkanen <Janne.Jalkanenecyrd.com>:
>
>> Yes, I do. I noticed that a previous page I'd entered with
>> Japanese characters (that used to look okay) now looks
>> like muck. So I assume that's because the encoding has
>> changed to UTF-8.
>
> No.  We do two encodings: ISO Latin1 and UTF-8.  You can't show
> Japanese characters in Latin1 - the only case where they would show
> correctly would be UTF-8.

Yes, now that you put it that way, that's the only way they could
have been shown. So that they *were* being displayed correctly
and now aren't has me worried.

>> I've got Tomcat set for UTF-8. I've got Firefox set for UTF-8. Not
>> sure what else I can set. (one doesn't have to set Apache too?)
>
> Well, you need to have
>
> * jspwiki.encoding = UTF-8

Yup.

> * Tomcat context set to UTF-8

I've left it at the default in web.xml (UTF-8), and not sure if
this is relevant, but in server.xml there's a port 8080 connector
with a URIEncoding="UTF-8" parameter. This was the most recent
change to the config in trying to solve this, after which the
Japanese text went hooey. But as a test, I removed that parameter
and restarted Tomcat but it made no difference, so I've lost
exactly where this started. I *used* to have Japanese text. :-(

> Firefox should detect it all automatically.

I looked at the generated HTML and found

   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

so the Content-Type meta tag is at least set for UTF-8.

>> It's not so much that they get converted to garbage that troubles me,
>> as that's kinda expected when there's an incorrect encoding. What I'm
>> worried about is that upon each preview and each save, the *number*
>> of bad characters doubles. That could be very irritating to a user,
>> more irritating than not seeing the correct display, like a character
>> virus. My guess is that this is a filtering symptom...
>
> The duplication occurs because the browser is sending faulty characters
> to Tomcat.  Tomcat interprets them using either Latin1 or UTF-8
> (default is to use Latin1, and it wasn't until Servlet API 2.3 before
> it can be programmatically changed.  2.4 does this, 2.2. resorts to
> hackery), and creates internal String implementations.
>
> You type in a character, which gets sent as UTF-8 (two bytes).  Tomcat
> reads them in, apparently interprets them as Latin1 (where each
> character is only one byte, so suddenly you get two garbage
> characters), and shoves them to JSPWiki, which dutifully saves them in
> the page repository.  When user edits again, they will see two bad
> characters, instead of one.  However, when they save again, they get
> again sent as UTF-8 to Tomcat (so two bytes become four), and so on...

Thanks for the explanation. This almost sounds like it might make a
good FAQ entry.

> I think the problem is to get Tomcat to understand that the encoding is
> UTF-8, not Latin1.  Which versions of Tomcat and JSPWiki are you using,
> exactly?

   Tomcat Version       JVM Version    JVM Vendor
   Apache Tomcat/5.0    1.4.2_10-b03   Sun Microsystems Inc.

   OS Name      OS Version       OS Arch
   Linux        2.6.13-15-smp 	i386

and JSPWiki is version v2.4.18-cvs.

As an experiment, I built and deployed a copy of 2.4.15-beta, but
it did the same thing. Then I built (from a fresh tarball) a copy
of 2.4.18, deployed it, and dammit but it works! *sigh* I'm not
sure what happened...

gotta run, but at least things are working again even if I don't
know why.

Murray

Strange conflagration of characters during editing
user name
2006-08-01 14:05:37
> Yes, now that you put it that way, that's the only way they could
> have been shown. So that they *were* being displayed correctly
> and now aren't has me worried.

It got me worried, too, and I managed to replicate it over on my
server.  It appears that in certain cases some part of JSPWiki
requests parameters from the HttpServletRequest *before* we call
request.setCharacterEncoding().  However, I am not at all sure in
which case it happens, because it still runs perfectly in 2.4.20 at
jspwiki.org - but it seems that it is already occurring in 2.4.15?

So, the patch is pretty simple: call setCharacterEncoding() at
WikiServlet instead of createContext()

Index: WikiServletFilter.java
===================================================================
RCS file: /p/cvs//JSPWiki/src/com/ecyrd/jspwiki/ui/WikiServletFilter.java,v
retrieving revision 1.8
diff -u -r1.8 WikiServletFilter.java
--- WikiServletFilter.java	1 Aug 2006 11:40:11 -0000	1.8
+++ WikiServletFilter.java	1 Aug 2006 14:04:41 -0000
@@ -95,6 +95,8 @@
          //   replace markers with scripts/stylesheet.
          HttpServletRequest httpRequest = (HttpServletRequest) request;
 
+        httpRequest.setCharacterEncoding( m_engine.getContentEncoding() );
+
          NDC.push( m_engine.getApplicationName()+":"+httpRequest.getRequestURL() );
          try

I'm putting it in CVS now...

/Janne
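
Why the order of the calls matters can be shown with a small simulation. This is only a sketch under stated assumptions: FakeRequest is a hypothetical stand-in that decodes its body once and caches the result, and it is not the real HttpServletRequest or anything in JSPWiki. It mimics the documented servlet behaviour that setCharacterEncoding() has no effect once request parameters have been read.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a servlet request; NOT the real servlet API. */
final class FakeRequest {
    private final byte[] rawBody;
    private Charset encoding = StandardCharsets.ISO_8859_1; // container default
    private String parsed;                                  // cached on first read

    FakeRequest(byte[] rawBody) {
        this.rawBody = rawBody;
    }

    /** Only takes effect if called before the first getParameter(). */
    void setCharacterEncoding(Charset cs) {
        if (parsed == null) {
            encoding = cs;
        }
    }

    /** Decodes the body once, then keeps returning the cached value. */
    String getParameter() {
        if (parsed == null) {
            parsed = new String(rawBody, encoding);
        }
        return parsed;
    }
}

final class EncodingOrderDemo {
    /** Some code reads a parameter before the encoding is set. */
    static String readTooEarly(byte[] body) {
        FakeRequest r = new FakeRequest(body);
        r.getParameter();                               // early peek, decoded as Latin1
        r.setCharacterEncoding(StandardCharsets.UTF_8); // too late, ignored
        return r.getParameter();
    }

    /** The encoding is set first, in the spirit of the patch above. */
    static String readAfterSetting(byte[] body) {
        FakeRequest r = new FakeRequest(body);
        r.setCharacterEncoding(StandardCharsets.UTF_8);
        return r.getParameter();
    }

    public static void main(String[] args) {
        byte[] body = "It\u2019s".getBytes(StandardCharsets.UTF_8);
        System.out.println(readTooEarly(body).length());     // 6 (garbled)
        System.out.println(readAfterSetting(body).length()); // 4 (intact)
    }
}
```

Setting the encoding in a filter, before any other code gets a chance to touch the request, removes the dependence on which code path happens to read a parameter first.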


Strange conflagration of characters during editing
user name
2006-08-01 22:18:57
Quoting Janne Jalkanen <Janne.Jalkanenecyrd.com>:

>> Yes, now that you put it that way, that's the only way they could
>> have been shown. So that they *were* being displayed correctly
>> and now aren't has me worried.
>
> It got me worried, too, and I managed to replicate it over on my
> server.  It appears that in certain cases some part of JSPWiki requests
> parameters from the HttpServletRequest *before* we call
> request.setCharacterEncoding().  However, I am not at all sure in which
> case it happens, because it still runs perfectly in 2.4.20 at
> jspwiki.org - but it seems that it is already occurring in 2.4.15?

Yes, whatever the problem was, I did a fresh build of 2.4.15 and it
exhibited the behaviour, as did 2.4.18. I'm glad (sorta) to hear that
you were able to replicate it, and certainly happy to hear that you
found the problem, that something good came out of that exercise...

> So, the patch is pretty simple: call setCharacterEncoding() at
> WikiServlet instead of createContext()

Great -- thanks!

Murray
