|
List Info
Thread: Strange conflagration of characters during editing
|
|
| Strange conflagration of characters
during editing |

|
2006-07-24 22:14:15 |
Hi, just wondering if anyone's seen this before. Using 2.4.18-cvs
running on Tomcat/linux.

I've seen this a few times, thought it was a freak, but it's
repeatable for me.

When editing an article, I pasted in the title of a NYT article* that
contains some curly quote characters. Each time I preview and save,
these characters are replaced by multiple other characters (must be
something in the filters) and continually get "thicker." So whereas
the initial text was
In the Race With Google, It?s Consistency vs. ?Wow?
after two or three edits that's been amplified to:
In the Race With Google, It’s
Consistency vs.
‘Wow’
and will continue in this fashion with each edit. This happens
both within and outside of square bracket wiki link markup.

Anyone seen this before? Know what is causing it? It's entirely
conceivable that it's something I'm doing custom -- am currently
investigating. This seems to *only* happen with curly single and
double quote characters from what I can see.

Thanks,
Murray

* http://www.nytimes.com/2006/07/24/technology/24yahoo.html
............................................................
...............
Murray Altheim <murray06 altheim.com>
=== = =
http://www.altheim.com
/murray/ = = ===
SGML Grease Monkey, Banjo Player, Wantanabe Zen Monk
= = = =
In the evening
The rice leaves in the garden
Rustle in the autumn wind
That blows through my reed hut. -- Minamoto no
Tsunenobu
_______________________________________________
Jspwiki-users mailing list
Jspwiki-users ecyrd.com
http://ecyrd.com/cgi-bin/mailman/listinfo/jspwiki-users
|
|
| Strange conflagration of characters
during editing |

|
2006-07-25 07:28:01 |
Sounds like a UTF-8 issue. Did you check
http://jspwiki.org/wiki/TomcatAndUTF8 ?

BTW, if the tick in "it's" is getting changed, then you are using the
wrong tick mark - you should be using Unicode 0027, not 00B4. This
is a common annoyance for non-Windows users, since the other tick
mark usually gets translated to a question mark. Ditto with the
double-quote which is not 0022.
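
To check which tick mark a pasted string really contains, one can print its code points. The following is only an illustrative sketch: the class and method names (QuoteInspector, codePoints) are made up for this example and are not part of JSPWiki. For reference, the curly apostrophe that word processors and news sites usually produce is U+2019, which is distinct from both U+0027 and U+00B4.

```java
/** Hypothetical helper for telling look-alike quote marks apart; not JSPWiki code. */
final class QuoteInspector {

    /** Formats every code point of the text as U+XXXX, separated by spaces. */
    static String codePoints(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Straight apostrophe, acute accent, right single curly quote.
        System.out.println(codePoints("'\u00b4\u2019"));
    }
}
```

Run on the three characters above, this prints "U+0027 U+00B4 U+2019", so a page title can be checked before it is pasted into the editor.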
/Janne
|
|
| Strange conflagration of characters
during editing |

|
2006-07-25 21:59:23 |
Quoting Janne Jalkanen <Janne.Jalkanen ecyrd.com>:
>
> Sounds like an UTF-8 issue. Did you check
>
> http://jspwiki.org/wiki/TomcatAndUTF8

Well, I've made the change suggested there and restarted Tomcat.
Sadly, didn't make any difference.

> BTW, if the tick in "it's" is getting changed, then you are using the
> wrong tick mark - you should be using Unicode 0027, not 00B4. This
> is a common annoyance for non-Windows users, since the other tick
> mark gets usually translated to a question mark. Ditto with the
> double-quote which is not 0022.

The problem here isn't what I do, it's that if any user of the wiki
either types or cuts and pastes one of these characters, it starts
this ugly situation where each edit, or even each preview, keeps
adding more of the unwanted characters. I think this would be very
confusing, particularly to new users. It's just a strange, funky
behaviour that wouldn't give users any confidence in the wiki, i.e.,
it's nothing that they'd ever see in MS Word (though I'd not want to
make that case too strongly).

I just don't quite follow why any character would get translated to
a completely different character by the engine; it seems that if
someone typed in any kind of straight or curly quote it should come
through the processing unscathed (that is, if it's not interpreted
as wiki markup).

Murray
|
|
| R: Strange conflagration of characters
during editing |

|
2006-07-28 14:00:17 |
Hi,

Yesterday I ran into an issue like yours while setting up a new
JSPWiki site on the internet, hosted on http://eatj.com (great!).
Before that I had only run it on my (Windows) server for the company
intranet.

The only real difference between the intranet and internet
configurations is the urlConstructor: I have trouble with
ShortURLConstructor (both on Preview and Save) but not with
ShortViewURLConstructor or DefaultURLConstructor.

This doesn't depend on the template used (default, MediaWiki or
brushed).

I've also tried different JSPWiki versions: all OK with 2.2.33, bad
with 2.4.15 and 2.4.24.

Sounds like a bug, don't you think? (But I'm not good enough at Java
to find it in the code.)

Workaround:

jspwiki.urlConstructor = DefaultURLConstructor

or

jspwiki.urlConstructor = ShortViewURLConstructor (OT: there is a
problem with editing a page section on brushed)

I hope this helps.
enricom
Enrico Maria Carmona
U.O. Controllo di Gestione e Programmazione
Azienda Ospedaliera San Gerardo
http://www.hsgerardo.org
Tel. +39-039-233-9077
email: e.carmona hsgerardo.org
|
|
| Strange conflagration of characters
during editing |

|
2006-07-30 21:37:59 |
> Well, I've made the change suggested there and restarted Tomcat.
> Sadly, didn't make any difference.

Do you have UTF-8 set up as your jspwiki.encoding?

> The problem here isn't what I do, it's that if any user of the wiki
> either types or cuts and pastes one of these characters, it starts
> this ugly situation that upon each edit, or upon even each preview,

This is a separate issue, I just wanted to point out that those
characters are specific to the Windows version of Latin1, and unless
you are using UTF-8 all around, they will be shown wrong in any other
operating system.

> I just don't quite follow why any character would get translated to
> a completely different character by the engine; it seems that if
> someone typed in any kind of straight or curly quote it should come
> through the processing unscathed (that is, if it's not interpreted
> as wiki markup).

Java converts everything internally to Unicode. If the text is
posted in Windows-Latin1 and contains characters that are not in
ISO-Latin1, those characters get converted to garbage.
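
The Windows-Latin1 versus ISO-Latin1 mismatch can be seen in a few lines of Java. This is only a sketch under assumptions: the class name CurlyQuoteBytes and its methods are invented for the illustration, and it assumes the JVM ships the "windows-1252" charset (common, but not guaranteed by the Java specification).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of a Windows-1252 character that ISO-8859-1 cannot represent. */
final class CurlyQuoteBytes {

    // Assumption: this charset is available on the running JVM.
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    /** Bytes a client would send if it encoded the text as Windows-1252. */
    static byte[] asWindows1252(String text) {
        return text.getBytes(WINDOWS_1252);
    }

    /** What a server ends up with if it decodes those bytes as ISO-8859-1. */
    static String decodedAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** True if every character of the text exists in ISO-8859-1. */
    static boolean fitsLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        byte[] sent = asWindows1252("\u2019"); // right single curly quote
        System.out.printf("byte sent: 0x%02X%n", sent[0] & 0xFF);
        System.out.printf("decoded as Latin1: U+%04X%n", (int) decodedAsLatin1(sent).charAt(0));
        System.out.println("fits Latin1: " + fitsLatin1("\u2019"));
    }
}
```

The curly quote travels as the single byte 0x92, which ISO-8859-1 maps to an invisible control character (U+0092) rather than to a quote, which is one way such text ends up as garbage on the server side.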
/Janne
|
|
| Strange conflagration of characters
during editing |

|
2006-07-30 22:28:36 |
Quoting Janne Jalkanen <Janne.Jalkanen ecyrd.com>:
>> Well, I've made the change suggested there and restarted Tomcat.
>> Sadly, didn't make any difference.
>
> Do you have UTF-8 set up as your jspwiki.encoding?

Yes, I do. I noticed that a previous page I'd entered with
Japanese characters (that used to look okay) now looks
like muck. So I assume that's because the encoding has
changed to UTF-8.

>> The problem here isn't what I do, it's that if any user of the wiki
>> either types or cuts and pastes one of these characters, it starts
>> this ugly situation that upon each edit, or upon even each preview,
>
> This is a separate issue, I just wanted to point out that those
> characters are specific to the Windows version of Latin1, and unless
> you are using UTF-8 all around, they will be shown wrong in any other
> operating system.

I've got Tomcat set for UTF-8. I've got Firefox set for UTF-8. Not
sure what else I can set. (one doesn't have to set Apache too?)

>> I just don't quite follow why any character would get translated to
>> a completely different character by the engine; it seems that if
>> someone typed in any kind of straight or curly quote it should come
>> through the processing unscathed (that is, if it's not interpreted
>> as wiki markup).
>
> Java converts everything internally to Unicode. If the character is
> posted in Windows-Latin1, and it contains characters that are not in
> ISO-Latin1, then they get converted to garbage.

It's not so much that they get converted to garbage that troubles me,
as that's kinda expected when there's an incorrect encoding. What I'm
worried about is that upon each preview and each save, the *number*
of bad characters doubles. That could be very irritating to a user,
more irritating than not seeing the correct display, like a character
virus. My guess is that this is a filtering symptom...

Murray
|
|
| Strange conflagration of characters
during editing |

|
2006-07-31 10:12:41 |
> Yes, I do. I noticed that a previous page I'd entered with
> Japanese characters (that used to look okay) now looks
> like muck. So I assume that's because the encoding has
> changed to UTF-8.

No. We do two encodings: ISO Latin1 and UTF-8. You can't show
Japanese characters in Latin1 - the only case where they would show
correctly would be UTF-8.

> I've got Tomcat set for UTF-8. I've got Firefox set for UTF-8. Not
> sure what else I can set. (one doesn't have to set Apache too?)

Well, you need to have

* jspwiki.encoding = UTF-8
* Tomcat context set to UTF-8

Firefox should detect it all automatically.

> It's not so much that they get converted to garbage that troubles me,
> as that's kinda expected when there's an incorrect encoding. What I'm
> worried about is that upon each preview and each save, the *number*
> of bad characters doubles. That could be very irritating to a user,
> more irritating than not seeing the correct display, like a character
> virus. My guess is that this is a filtering symptom...

The duplication occurs because the browser is sending faulty
characters to Tomcat. Tomcat interprets them using either Latin1 or
UTF-8 (the default is Latin1, and it wasn't until Servlet API 2.3
that this could be changed programmatically; 2.4 does this, 2.2
resorts to hackery), and creates internal String objects.

You type in a character, which gets sent as UTF-8 (two bytes).
Tomcat reads them in, apparently interprets them as Latin1 (where
each character is only one byte, so suddenly you get two garbage
characters), and shoves them to JSPWiki, which dutifully saves them
in the page repository. When the user edits again, they will see two
bad characters instead of one. However, when they save again, those
two characters again get sent as UTF-8 to Tomcat (so two bytes
become four), and so on...
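
The round trip described above can be reproduced outside Tomcat. The following is a toy model, not JSPWiki or Tomcat code: the class MojibakeGrowth and its methods are hypothetical names, and "one save" is simplified to "encode as UTF-8, decode as ISO-8859-1".

```java
import java.nio.charset.StandardCharsets;

/** Toy model of the faulty save cycle; names are invented for this sketch. */
final class MojibakeGrowth {

    /** One faulty save: the browser sends UTF-8 bytes, the server decodes them as Latin1. */
    static String misdecodeOnce(String text) {
        byte[] sent = text.getBytes(StandardCharsets.UTF_8);
        return new String(sent, StandardCharsets.ISO_8859_1);
    }

    /** Undoes exactly one faulty save by reversing the two steps. */
    static String repairOnce(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String page = "\u00e4"; // a-umlaut, two bytes in UTF-8
        for (int saves = 0; saves <= 3; saves++) {
            System.out.println("after " + saves + " saves: " + page.length() + " chars");
            page = misdecodeOnce(page);
        }
    }
}
```

For a two-byte character the length goes 1, 2, 4, 8. A curly quote such as U+2019 is three bytes in UTF-8, so it goes 1, 3, 6, 12, which matches the "thicker" text reported at the start of the thread. Because ISO-8859-1 maps every byte to a character, each faulty save is reversible with repairOnce, so already damaged pages could in principle be cleaned up one layer at a time.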

I think the problem is to get Tomcat to understand that the encoding
is UTF-8, not Latin1. Which versions of Tomcat and JSPWiki are you
using, exactly?
/Janne
|
|
| Strange conflagration of characters
during editing |

|
2006-08-01 05:04:07 |
Quoting Janne Jalkanen <Janne.Jalkanen ecyrd.com>:
>
>> Yes, I do. I noticed that a previous page I'd entered with
>> Japanese characters (that used to look okay) now looks
>> like muck. So I assume that's because the encoding has
>> changed to UTF-8.
>
> No. We do two encodings: ISO Latin1 and UTF-8. You can't show
> Japanese characters in Latin1 - the only case where they would show
> correctly would be UTF-8.

Yes, now that you put it that way, that's the only way they could
have been shown. So that they *were* being displayed correctly
and now aren't has me worried.

>> I've got Tomcat set for UTF-8. I've got Firefox set for UTF-8. Not
>> sure what else I can set. (one doesn't have to set Apache too?)
>
> Well, you need to have
>
> * jspwiki.encoding = UTF-8

Yup.

> * Tomcat context set to UTF-8

I've left it at the default in web.xml (UTF-8), and not sure if
this is relevant, but in server.xml there's a port 8080 connector
with a URIEncoding="UTF-8" parameter. This was the most recent
change to the config in trying to solve this, after which the
Japanese text went hooey. But as a test, I removed that parameter
and restarted Tomcat but it made no difference, so I've lost
exactly where this started. I *used* to have Japanese text. :-(

> Firefox should detect it all automatically.

I looked at the generated HTML and found

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

so the ContentEncoding tag is at least set for UTF-8.

>> It's not so much that they get converted to garbage that troubles me,
>> as that's kinda expected when there's an incorrect encoding. What I'm
>> worried about is that upon each preview and each save, the *number*
>> of bad characters doubles. That could be very irritating to a user,
>> more irritating than not seeing the correct display, like a character
>> virus. My guess is that this is a filtering symptom...
>
> The duplication occurs because the browser is sending faulty characters
> to Tomcat. Tomcat interprets them using either Latin1 or UTF-8
> (default is to use Latin1, and it wasn't until Servlet API 2.3 before
> it can be programmatically changed. 2.4 does this, 2.2. resorts to
> hackery), and creates internal String implementations.
>
> You type in a character, which gets sent as UTF-8 (two bytes). Tomcat
> reads them in, apparently interprets them as Latin1 (where each
> character is only one byte, so suddenly you get two garbage
> characters), and shoves them to JSPWiki, which dutifully saves them in
> the page repository. When user edits again, they will see two bad
> characters, instead of one. However, when they save again, they get
> again sent as UTF-8 to Tomcat (so two bytes become four), and so on...

Thanks for the explanation. This almost sounds like it might make a
good FAQ entry.

> I think the problem is to get Tomcat to understand that the encoding is
> UTF-8, not Latin1. Which versions of Tomcat and JSPWiki are you using,
> exactly?

Tomcat Version      JVM Version     JVM Vendor
Apache Tomcat/5.0   1.4.2_10-b03    Sun Microsystems Inc.

OS Name             OS Version      OS Arch
Linux               2.6.13-15-smp   i386

and JSPWiki is version v2.4.18-cvs.

As an experiment, I built and deployed a copy of 2.4.15-beta, but
it did the same thing. Then I built (from a fresh tarball) a copy
of 2.4.18, deployed it, and dammit but it works! *sigh* I'm not
sure what happened...

gotta run, but at least things are working again even if I don't
know why.

Murray
|
|
| Strange conflagration of characters
during editing |

|
2006-08-01 14:05:37 |
> Yes, now that you put it that way, that's the only way they could
> have been shown. So that they *were* being displayed correctly
> and now aren't has me worried.

It got me worried, too, and I managed to replicate it over on my
server. It appears that in certain cases some part of JSPWiki
requests parameters from the HttpServletRequest *before* we call
request.setCharacterEncoding(). However, I am not at all sure in
which case it happens, because it still runs perfectly in 2.4.20 at
jspwiki.org - but it seems that it is already occurring in 2.4.15?

So, the patch is pretty simple: call setCharacterEncoding() at
WikiServlet instead of createContext()

Index: WikiServletFilter.java
===================================================================
RCS file: /p/cvs//JSPWiki/src/com/ecyrd/jspwiki/ui/WikiServletFilter.java,v
retrieving revision 1.8
diff -u -r1.8 WikiServletFilter.java
--- WikiServletFilter.java 1 Aug 2006 11:40:11 -0000 1.8
+++ WikiServletFilter.java 1 Aug 2006 14:04:41 -0000
@@ -95,6 +95,8 @@
         // replace markers with scripts/stylesheet.
         HttpServletRequest httpRequest = (HttpServletRequest) request;
+        httpRequest.setCharacterEncoding( m_engine.getContentEncoding() );
+
         NDC.push( m_engine.getApplicationName()+":"+httpRequest.getRequestURL() );
         try

I'm putting it in CVS now...
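
Why the order of the calls matters can be shown with a small stand-in for the request object. This is a hypothetical model, not the servlet API and not JSPWiki code: ToyRequest only mimics the documented servlet rule that setCharacterEncoding() has no effect once the request parameters have been read.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a servlet request; models only the ordering rule. */
final class ToyRequest {
    private final byte[] rawBody;
    private Charset encoding = StandardCharsets.ISO_8859_1; // assumed container default
    private String parsedText; // cached after the first read

    ToyRequest(byte[] rawBody) {
        this.rawBody = rawBody.clone();
    }

    /** Ignored once the body has been parsed, like the real servlet call. */
    void setCharacterEncoding(Charset charset) {
        if (parsedText == null) {
            encoding = charset;
        }
    }

    /** Decodes the body on first use and caches the result. */
    String getText() {
        if (parsedText == null) {
            parsedText = new String(rawBody, encoding);
        }
        return parsedText;
    }

    public static void main(String[] args) {
        byte[] posted = "\u2019".getBytes(StandardCharsets.UTF_8);

        ToyRequest early = new ToyRequest(posted);
        early.setCharacterEncoding(StandardCharsets.UTF_8); // set first, as in the patch
        System.out.println("encoding set first: " + early.getText().length() + " char");

        ToyRequest late = new ToyRequest(posted);
        late.getText(); // something reads a parameter too early
        late.setCharacterEncoding(StandardCharsets.UTF_8); // too late, no effect
        System.out.println("encoding set late:  " + late.getText().length() + " chars");
    }
}
```

With the encoding set first, the posted curly quote comes back as one character; with the early read it comes back as three garbage characters, which is the situation that setting the encoding earlier in the request path is meant to avoid.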
/Janne
|
|
| Strange conflagration of characters
during editing |

|
2006-08-01 22:18:57 |
Quoting Janne Jalkanen <Janne.Jalkanen ecyrd.com>:
>> Yes, now that you put it that way, that's the only way they could
>> have been shown. So that they *were* being displayed correctly
>> and now aren't has me worried.
>
> It got me worried, too, and I managed to replicate it over on my
> server. It appears that in certain cases some part of JSPWiki requests
> parameters from the HttpServletRequest *before* we call
> request.setCharacterEncoding(). However, I am not at all sure in which
> case it happens, because it still runs perfectly in 2.4.20 at
> jspwiki.org - but it seems that it is already occurring in 2.4.15?

Yes, whatever the problem was, I did a fresh build of 2.4.15 and it
exhibited the behaviour, as did 2.4.18. I'm glad (sorta) to hear that
you were able to replicate it, and certainly happy to hear that you
found the problem, that something good came out of that exercise...

> So, the patch is pretty simple: call setCharacterEncoding() at
> WikiServlet instead of createContext()

Great -- thanks!
Murray
|
|
|
|