Thread: RE: Fixing MS Word smart quotes/character encoding




RE: Fixing MS Word smart quotes/character encoding
user name
2008-02-22 15:38:35
We ended up going with this method, but I never got the code from
demoroniser to work. Its regexes didn't seem to match the smart quotes
that Word was putting out.

What did work for us was to copy the smart quotes from Word directly
into our filter regexes. We don't need the full gamut of demoroniser
features, so our filter looks like this (at the top of /autohandler.mc):

<%filter>
   s/[‘’]/'/g;
   s/[“”]/"/g;
   s/…/.../g;
   s/–/&ndash;/g;
   s/—/&mdash;/g;
</%filter>

(I hope those characters come through. It may look messed up in some
email clients, which is exactly the problem we're trying to solve.)
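
Since pasted curly quotes may not survive an email client or a list archive, the same substitutions can also be written with Unicode code point escapes. This is only a sketch: `straighten_quotes` is a made-up name, and it assumes the text reaching the filter has already been decoded into Perl characters (not raw UTF-8 or CP1252 bytes), which may not hold for every Bricolage setup.

```perl
use strict;
use warnings;

# Hypothetical helper; assumes $text is a decoded character string.
sub straighten_quotes {
    my ($text) = @_;
    $text =~ s/[\x{2018}\x{2019}]/'/g;   # left/right single curly quotes
    $text =~ s/[\x{201C}\x{201D}]/"/g;   # left/right double curly quotes
    $text =~ s/\x{2026}/.../g;           # horizontal ellipsis
    $text =~ s/\x{2013}/&ndash;/g;       # en dash
    $text =~ s/\x{2014}/&mdash;/g;       # em dash
    return $text;
}
```

Inside the filter block this would be called as `$_ = straighten_quotes($_);`. Plain ASCII hyphens and ordinary runs of three characters are left alone, because only the specific code points are matched.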

-----Original Message-----
From: Terence Bodola [mailto:terence.bodolacbsparamount.com] 
Sent: Monday, February 18, 2008 7:51 AM
To: userslists.bricolage.cc
Subject: Re: Fixing MS Word smart quotes/character encoding

We just use a modified demoroniser script
(www.fourmilab.ch/webtools/demoroniser/), called by the autohandler as a
filter. This seems to work well. I don't see why this would break other
templates unless they are for some reason creating smart quotes that you
want to keep:

<%filter>
$_ = demoroniser($_);
</%filter>


On Feb 15, 2008, at 3:28 PM, Jason Brackins wrote:

> Our Marketing Dept. uses MS Office to write content, and then they copy
> and paste directly into fields in Bricolage.
>
> The problem is that Word, Excel, etc. have a strong affinity for "Smart
> Quotes". These are 'Windows Extended ASCII' characters that aren't
> useful at all in UTF-8 or ISO-8859-1. In fact, they read like gibberish.
>
> We can't make people not use Word. We can do our best to make sure
> everybody turns off smart quotes. But we want to be sure, 100% sure,
> that these characters never make it on to our pages.
>
> I can think of several approaches to translate these into 'good'
> characters.
>
> 1. A utility template that gets called on every single bit of user input
> anywhere. Not ideal; too easy for template writers to mess up.
>
> 2. A custom Distribution Action that translates pre-publish.
>
> 3. What svn would call a 'pre-commit hook'. A bit of code that runs
> during the checkin process to do the translation.
>
> Number two seems to be the easy way, based on my still incomplete
> knowledge of the system. Three would be ideal, since we'd always have
> 'clean' data in the database.
>
> Any advice is appreciated. Even more appreciated is a packaged solution
> ;)
>
> Thanks,
> -jason brackins
>
>
>



Re: Fixing MS Word smart quotes/character encoding
user name
2008-02-28 17:34:27
On Feb 22, 2008, at 13:38, Jason Brackins wrote:

> We ended up going with this method, but I never got the code from
> demoroniser to work. Its regexes didn't seem to match the smart quotes
> that Word was putting out.
>
> What did work for us was to copy the smart quotes from Word directly
> into our filter regexes. We don't need the full gamut of demoroniser
> features, so our filter looks like this (at the top of /autohandler.mc):
>
> <%filter>
>   s/[‘’]/'/g;
>   s/[“”]/"/g;
>   s/…/.../g;
>   s/–/&ndash;/g;
>   s/—/&mdash;/g;
> </%filter>
>
> (I hope those characters come through. It may look messed up in some
> email clients, which is exactly the problem we're trying to solve.)

Encode::ZapCP1252 does the same thing, as Marshall said. Put this in
your bricolage.conf:

PERL_LOADER=use Encode::ZapCP1252;

Then this would be your template:

<%filter>
$_ = zap_cp1252($_);
</%filter>

Best,

David
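
For content that still holds raw Windows-1252 bytes, the kind of substitution discussed in this thread can be sketched with core Perl alone. This is not the code of Encode::ZapCP1252 or demoroniser; `zap_bytes_sketch` and its small table are illustrative assumptions that cover only the quote, ellipsis, and dash bytes.

```perl
use strict;
use warnings;

# Illustrative subset of CP1252 bytes mapped to plain ASCII.
my %CP1252_TO_ASCII = (
    "\x85" => '...',   # ellipsis
    "\x91" => "'",     # left single quote
    "\x92" => "'",     # right single quote
    "\x93" => '"',     # left double quote
    "\x94" => '"',     # right double quote
    "\x96" => '-',     # en dash
    "\x97" => '--',    # em dash
);

# Hypothetical helper; expects a byte string, returns a cleaned copy.
sub zap_bytes_sketch {
    my ($bytes) = @_;
    $bytes =~ s/([\x85\x91-\x94\x96\x97])/$CP1252_TO_ASCII{$1}/g;
    return $bytes;
}
```

If the data has already been converted to UTF-8, these single bytes will not be present, which would be one possible reason a byte-oriented regex fails to match.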
