Thread: Plug-in to transcode messages




Plug-in to transcode messages
United States
2007-03-06 15:54:28
I'm tired of seeing messages written in completely
superfluous encodings (superfluous in that they don't
provide any functionality not already present in the
"open standards" that exist).

Like "gb2312" and "big5" and "windows-1252", etc.

Does anyone have a plug-in that will transcode an
entire message into USASCII, ISO-8859-1, or UTF-8
(depending on the smallest charset it will fit into)?

Yes, I know that the authentication/tamperproofing
people will all want to play the 'stick game' with me
for doing so.

I don't care.

There's no point in maintaining the integrity of a
message if it can't be delivered, or won't be read.

Those come first.

-Philip

_______________________________________________
NOTE: If there is a disclaimer or other legal boilerplate in the above
message, it is NULL AND VOID.  You may ignore it.

Visit http://www.mimedefang.org and http://www.roaringpenguin.com
MIMEDefang mailing list MIMEDefanglists.roaringpenguin.com
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang

Re: Plug-in to transcode messages
Netherlands
2007-03-06 18:00:22
On Tue, 6 Mar 2007, Philip Prindeville wrote:

>There's no point in maintaining the integrity of a
>message if it can't be delivered, or won't be read.

And what's the point of transcoding a Cyrillic or Japanese charset
into differently encoded _but still Cyrillic or Japanese_ chars?
Do you really expect to be able to read the message afterwards?
Are you perhaps also looking for a plugin to _translate_ the message?

>Those come first.

What comes first, if someone wants to communicate with me, is the
selection _by the sender of the message_ of a common language.
If someone sends me an unreadable message, (s)he probably has
no intention of communicating with me in particular. It's simply
cheaper to spam the whole world than to address only the
intended audience.
Transcoding a message doesn't change that. It's just a big waste
of CPU cycles, in my opinion.

Regards,

Kees.

-- 
Kees Theunissen
F.O.M.-Institute for Plasma Physics Rijnhuizen, Nieuwegein, Netherlands
E-mail: theunissrijnh.nl,  Tel: (+31|0)306096724,  Fax: (+31|0)306031204


Re: Plug-in to transcode messages
United States
2007-03-07 13:00:17

On Tue, 6 Mar 2007, Philip Prindeville wrote:

> I'm tired of seeing messages written in completely
> superfluous (in that they don't provide any functionality
> not already present in the "open standards" that exist)
> encodings.
>
> Like "gb2312" and "big5" and "windows-1252", etc.

   On a mail machine that I alone receive email on, I reject those.  On
our campus email relay, I flag them so that a user can reject them if they
wish.  In sub filter(), I have this:

#
#  Look for foreign character sets and create a header that a user can
#  filter against.  13-JUN-2006 JHMc
#
    $head = $entity->head;
    my $charset = $head->mime_attr("content-type.charset");
    if (defined($charset)) {
      # Charset names are case-insensitive, so compare in lower case.
      $charset =~ tr/A-Z/a-z/;
      if ($charset eq "ks_c_5601-1987" or
        $charset eq "euc-kr" or
        $charset eq "koi8-r" or
        $charset eq "gb2312" or
        $charset eq "windows-1251" or
        $charset eq "big5") {
         action_add_header('X-UAH-Foreign-Charset', "$charset");
      }
    }


If you don't want to deal with them at all, action_add_header could just
as easily be a return of action_bounce.  I don't know about converting
them to a "readable" character set.

   HTH...

Jim McCullars
University of Alabama in Huntsville



Re: Plug-in to transcode messages
United States
2007-03-07 14:10:12
Kees Theunissen wrote:
> On Tue, 6 Mar 2007, Philip Prindeville wrote:
>
>> There's no point in maintaining the integrity of a
>> message if it can't be delivered, or won't be read.
>
> And what's the point in transcoding a cyrillic or japanese charset
> into different encoded _but still cyrillic or japanese_ chars?
> Do you realy expect to be able to read the message afterwards?
> Are you perhaps also looking for a plugin to _translate_ the message?

Let op, mijn heer.  Ik caan ein bietje neederlands praten.

Wo dong i dien dien Juong Wen.

Hablo un pocu espagnol.

Je parle francais courrament aussi.

Just because I'm writing in English doesn't mean it's
my only language.

The point is that there was a requirement (or perhaps it
was just a recommendation... I don't have the text at
hand) that all email converge on these 3 encodings.

I rather like the idea of not having redundant encodings.

It makes writing filters a lot easier, for one.  It also
means that my browser doesn't need an N*M matrix of
encoding tables.

I also like the idea of conforming to standards.  But
that's just me.  I'm weird that way.


>> Those come first.
>
> What first comes if someone wants to communicate with me is the
> selection _by the sender of the message_, of a common language.
> If someone sends me an unreadable message (s)he has probably
> no intention to communicate with me in particular. It's simply
> cheaper to spam the whole world instead of only addressing the
> intended audience.
> Transcoding a message doesn't change that. It's just a big waste
> of CPU cycles in my opinion.
>
> Regards,
>
> Kees

Except that most encodings include USASCII, if not
some subset/variant of Latin1 as well.

Why do we need duplicated flavors of Latin1?

Because some moron at M$ wanted to build a
tower of babel...

Someone can write me in English (or Dutch or French
or Spanish or German... without accents, ij, or oe)
in jis2122... or windows-1252... In fact, someone
could write me in Japanese (which would be somewhat
more challenging for me to read since I haven't lived
there in 40 years) in BG5 or ISO2022CN...

So if your argument is predicated on there being a
single unique encoding per language, then that's
simply not the case.

-Philip


Re: Plug-in to transcode messages
Germany
2007-03-08 02:58:07
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 7 Mar 2007, Philip Prindeville wrote:

> Why do we need duplicated flavors of Latin1?

OK, I cannot keep up with all those different languages, but did you
check out Encode in order to transcode one character set into another?
It has been included in the Perl core since v5.8, I think.

I use it for some stuff in my scripts in order to transform raw IBM850 and
Latin1 octets into Perl's UTF8, so I can forget about the character set.
It handles surprisingly many little problems I had with LDAP and Postgres.

However, my impression is that Encode is quite strict, insofar as when you
tell Encode an octet string is USASCII but it contains characters >= 128,
they are replaced by "the substitution character".
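That substitution can be seen in a minimal sketch like the one below. It
assumes a stock Perl 5.8 or later with the core Encode module; the sample
string and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "caf\xE9" is "cafe" with e-acute in ISO-8859-1; byte 0xE9 is outside ASCII.
my $octets = "caf\xE9";

# With the default CHECK argument, decode() does not die on a byte it
# cannot map; it substitutes U+FFFD (REPLACEMENT CHARACTER) instead.
my $as_ascii  = decode('ascii',      $octets);
my $as_latin1 = decode('iso-8859-1', $octets);

printf "declared as ascii:      U+%04X\n", ord(substr($as_ascii,  3, 1));
printf "declared as iso-8859-1: U+%04X\n", ord(substr($as_latin1, 3, 1));
# prints U+FFFD for the first line and U+00E9 for the second
```

Passing Encode::FB_CROAK as a third argument to decode() should make it
die on such bytes rather than substitute silently.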

As stated in another post, how about decoding everything into Latin1 or, for
some incompatible sets, into UTF8?
UTF8 seems to be the best choice these days.
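A rough sketch of the "smallest charset it will fit into" idea from the
start of the thread, using only the core Encode module. The helper name
smallest_charset and the sample inputs are hypothetical, and this works on
a bare octet string; wiring it into a MIMEDefang filter (rewriting the
body and the Content-Type header) is not shown.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: decode $octets from the declared charset, then
# return (target name, re-encoded octets) for the first target that can
# represent every character.
sub smallest_charset {
    my ($octets, $declared) = @_;
    # FB_CROAK: die on unmappable input instead of substituting.
    # LEAVE_SRC: do not modify the source string while checking.
    my $text = decode($declared, $octets, FB_CROAK | LEAVE_SRC);
    for my $target ('US-ASCII', 'ISO-8859-1', 'UTF-8') {
        my $out = eval { encode($target, $text, FB_CROAK | LEAVE_SRC) };
        return ($target, $out) if defined $out;
    }
    die "no target charset could represent the text\n";
}

for my $case (
    [ "hello",            'windows-1252' ],  # pure ASCII
    [ "caf\xE9",          'windows-1252' ],  # e-acute, fits Latin1
    [ "\x80 5",           'windows-1252' ],  # euro sign, not in Latin1
    [ "\xD6\xD0\xCE\xC4", 'gb2312'       ],  # two Chinese characters
) {
    my ($octets, $declared) = @$case;
    my ($target) = smallest_charset($octets, $declared);
    print "$declared -> $target\n";
}
```

Trying the targets from narrowest to widest means plain English text that
arrives labelled windows-1252 or gb2312 comes out as US-ASCII, while
anything outside Latin1 falls through to UTF-8.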

Bye,

- -- 
Steffen Kaiser
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iQEVAwUBRe/QIegJIbZtwg6XAQJWuggAiJeyiKRpI+Wf/hL3zE9rNxQgM0+ny1JX
5KLcMzxfjTNOD6rKobqtqJt5PiXAoWIxh8dIMeLFqAM00ehaqVs4P6wlLp/G0r7q
YrQ4x4JtO3TxdxDovyEQtgwblCcpYVT6RTYp6xniRcnjpH+u9Q4b4hQcNf41tpuL
BFJgPsB7tcdP5y9SWQs3R11hLDMQt7q7nWbjqRaGGNk+z5qiNTdzyAgHI2Ih3Svg
18mN7+D0bP8smKuOsQIiLvSstmQiECcjGwHdqKyB1ipNdWYwTkMCaDohGgY88KnO
SCsNNxC7g3dtEkWQrsBXXFkNpFhaMQ5O2XkRySKswxj7o8vA55xnFw==
=Nsck
-----END PGP SIGNATURE-----
