Thread: PDF spam solutions




PDF spam solutions
Canada
2007-08-13 16:41:19
Like the rest of you, I'm sure, I've been receiving a glut of PDF spam
lately, and I've been experimenting with various tactics for curbing the
onslaught.  Some tactics work better than others, naturally, so I
thought I'd share my results here.


(1) SpamAssassin core rules

To deal with PDF spam, the SpamAssassin developers added a new core rule
called TVD_PDF_FINGER01, which identifies emails that have empty bodies
but contain PDF attachments.  It works well, but its default score of
1.0 is too low to make it the only tool for the job.  Increasing the
score isn't really a good idea, though, since a lot of business users
regularly send PDF attachments with empty mail bodies, and this could
lead to false positives in a hurry.

You can certainly get this new rule for any version of SpamAssassin
(newer than 3.1.1) using sa-update, but now that the 3.2.x series
appears to have stabilized I'd also recommend that you upgrade to 3.2.3
to take advantage of the latest rulesets.


(2) PDFInfo plugin

Available from <http://www.rulesemporium.com/plugins.htm>, this plugin
is a step better in that it tries to identify specific PDF spams by
their characteristics--image dimensions, number of images in the file,
image-to-text ratio, filename, and meta-information (e.g. author,
creator, creation/modified date), as well as fuzzy hashes of the
file itself.

The downside is that it's /too/ specific, and that requires you to
download new versions of the pdfinfo.cf file whenever new signatures are
added, because every new signature is a new rule.  This makes the plugin
very nice for catching PDF spam that's already circulating, but it's not
effective at catching new variants, and updating it is awkward.


(3) PDFText plugin

The PDFText plugin uses the pdftotext and pdfinfo utilities from the xpdf
package to try to extract the text and meta-information from PDF files,
so that they can then be subjected to pattern-based tests for spammy
content.  Two versions are currently available:

For SpamAssassin 3.1.x:

<http://www.mail-archive.com/users@spamassassin.apache.org/msg45465.html>

For SpamAssassin 3.2.x:

<http://www.mail-archive.com/users@spamassassin.apache.org/msg45494.html>

Unfortunately this plugin is still a very early alpha--proof-of-concept,
really--and needs a considerable amount of polishing before it could
be recommended for production use.  It also relies on its own
wordlist for scoring, rather than making the discovered text available
to the full battery of SpamAssassin rules, but the author is apparently
working on that, along with experimental support for using GOCR to scan
the images in PDF files.


(4) FuzzyOCR plugin

There's been some discussion about FuzzyOCR's potential role in catching
PDF spam--at least the PDF spam that incorporates images.  The plugin's
author is reluctant at best: "actually, I will not try to scan PDFs, the
risk of false positives is too high and PDFs do not have a future for
spammers (in my opinion) as most clients do not display them directly.
Sending PDFs is only a desperate try of spammers to circumvent image
scanners, but I don't think this will be the new "trend", neither do I
think that this kind of spam has any future or success, like image spam
has."

That said, he seems to have relented under the pressure, and some basic
support for this was added recently to the svn version with a lot of
disclaimers ("highly experimental and disabled by default", "Enable this
at your own risk, this might lead to false positives and classify
important documents as spam. YOU HAVE BEEN WARNED.").

Since you need to be using the svn version of FuzzyOCR if you're running
SpamAssassin 3.2.x anyway, you may wish to experiment with the
PDF-scanning support, since it won't cost you any resources you aren't
already spending.  If you're /not/ using FuzzyOCR, though, I wouldn't
advise installing it just to solve the PDF spam problem.


(5) Custom rules

Eric A. Hall posted a custom ruleset recently to the SpamAssassin-Users
list that uses the AWL (auto-whitelist) to determine whether the sender
of a binary attachment (major MIME-type of application, image, audio,
video, or model) has sent the recipient mail before.  If this is the
first email the recipient has ever received from this sender, and it
contains such an attachment, it gets penalized accordingly for coming
from a stranger.

You need to have the MIMEHeader plugin installed, but this is included
by default in the newer SpamAssassin 3.2.x series.  The ruleset can be
added easily to your local.cf file:

ifplugin Mail::SpamAssassin::Plugin::MIMEHeader

mimeheader  __L_C_TYPE_APP     Content-Type =~ /^application/i
mimeheader  __L_C_TYPE_IMAGE   Content-Type =~ /^image/i
mimeheader  __L_C_TYPE_AUDIO   Content-Type =~ /^audio/i
mimeheader  __L_C_TYPE_VIDEO   Content-Type =~ /^video/i
mimeheader  __L_C_TYPE_MODEL   Content-Type =~ /^model/i

meta        L_STRANGER_APP     (!AWL && __L_C_TYPE_APP)
score       L_STRANGER_APP     1.0
tflags      L_STRANGER_APP     noautolearn
priority    L_STRANGER_APP     1001 # defer till after AWL
describe    L_STRANGER_APP     Application file sent by a stranger

meta        L_STRANGER_IMAGE   (!AWL && __L_C_TYPE_IMAGE)
score       L_STRANGER_IMAGE   1.0
tflags      L_STRANGER_IMAGE   noautolearn
priority    L_STRANGER_IMAGE   1001 # defer till after AWL
describe    L_STRANGER_IMAGE   Image file sent by a stranger

meta        L_STRANGER_AUDIO   (!AWL && __L_C_TYPE_AUDIO)
score       L_STRANGER_AUDIO   1.0
tflags      L_STRANGER_AUDIO   noautolearn
priority    L_STRANGER_AUDIO   1001 # defer till after AWL
describe    L_STRANGER_AUDIO   Audio file sent by a stranger

meta        L_STRANGER_VIDEO   (!AWL && __L_C_TYPE_VIDEO)
score       L_STRANGER_VIDEO   1.0
tflags      L_STRANGER_VIDEO   noautolearn
priority    L_STRANGER_VIDEO   1001 # defer till after AWL
describe    L_STRANGER_VIDEO   Video file sent by a stranger

meta        L_STRANGER_MODEL   (!AWL && __L_C_TYPE_MODEL)
score       L_STRANGER_MODEL   1.0
tflags      L_STRANGER_MODEL   noautolearn
priority    L_STRANGER_MODEL   1001 # defer till after AWL
describe    L_STRANGER_MODEL   Model file sent by a stranger

endif
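
A PDF-only variant of the same idea might look like the sketch below.
It is not part of Eric A. Hall's posted ruleset: the rule names
__L_C_TYPE_PDF and L_STRANGER_PDF are hypothetical, chosen to follow the
naming pattern above, and the 0.5 score is an arbitrary placeholder.
Since application/pdf also matches __L_C_TYPE_APP, this score would be
added on top of L_STRANGER_APP.  PDFs sent with a generic Content-Type
such as application/octet-stream would not be matched.

```
ifplugin Mail::SpamAssassin::Plugin::MIMEHeader

# Hypothetical rule: matches only MIME parts declared as application/pdf
mimeheader  __L_C_TYPE_PDF     Content-Type =~ /^application\/pdf\b/i

# Fires when the AWL rule did not hit and a PDF part is present
meta        L_STRANGER_PDF     (!AWL && __L_C_TYPE_PDF)
score       L_STRANGER_PDF     0.5  # placeholder value, tune locally
tflags      L_STRANGER_PDF     noautolearn
priority    L_STRANGER_PDF     1001 # defer till after AWL
describe    L_STRANGER_PDF     PDF file sent by a stranger

endif
```

After editing local.cf, running "spamassassin --lint" should report any
syntax errors before the new rules go live.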


(6) SaneSecurity signatures

If you use ClamAV (you do, don't you?), another option is to use the
phishing and scam signatures published by SaneSecurity
<http://www.sanesecurity.co.uk/clamav/>.  These signatures are updated
multiple times a day, and include a lot of PDF spam, making it perhaps
the most responsive solution available at the moment.

These phishing/scam emails get caught by ClamAV rather than
SpamAssassin, so they show up in Maia's "Viruses/Malware" quarantine
instead of the spam quarantine, which is a bit annoying, but that's
something I'll be working to address in future versions.

I can't argue with the effectiveness of SaneSecurity's signatures,
though--they are by far the most effective blockers of PDF spam that
I've found, and I would strongly recommend that you use them.


(7) Other plugins

While rules and plugins that target PDF spam specifically are very
useful, it's worth noting that the bulk of the PDF spam comes from
botnets, so adding the Botnet plugin
<http://people.ucsc.edu/~jrudd/spamassassin/> can catch a lot of these
things on its own, and it provides a nice score supplement to go along
with the PDF-specific rules.  The latest version is 0.8, and it just
needs one small patch (courtesy of Mark Martinec):

--- Botnet.pm.orig	Mon Aug  6 15:59:16 2007
+++ Botnet.pm	Mon Aug  6 16:02:43 2007
@@ -711,5 +711,14 @@
         (defined $max) &&
         ($max =~ /^-?\d+$/) ) {
-      $resolver = Net::DNS::Resolver->new();
+      $resolver = Net::DNS::Resolver->new(
+               udp_timeout => 5,
+               tcp_timeout => 5,
+               retrans => 0,
+               retry => 1,
+               persistent_tcp => 0,
+               persistent_udp => 0,
+               dnsrch => 0,
+               defnames => 0,
+       );
       if ($query = $resolver->search($name, $type)) {
          # found matches
@@ -834,5 +843,14 @@
    my ($ip) = @_;
    my ($query, @answer, $rr);
-   my $resolver = Net::DNS::Resolver->new();
+   my $resolver = Net::DNS::Resolver->new(
+       udp_timeout => 5,
+       tcp_timeout => 5,
+       retrans => 0,
+       retry => 1,
+       persistent_tcp => 0,
+       persistent_udp => 0,
+       dnsrch => 0,
+       defnames => 0,
+       );
    my $name = "";


--
Robert LeBlanc <rjl@renaissoft.com>
Renaissoft, Inc.
Maia Mailguard <http://www.maiamailguard.com/>

_______________________________________________
Maia-users mailing list
Maia-users@renaissoft.com
http://www.renaissoft.com/mailman/listinfo/maia-users

Re: PDF spam solutions
United Kingdom
2007-08-14 18:23:14
Hi Robert

From the patch available at the following URL

http://theinternet.org.uk/downloads/spamshifter-maia102.txt

+# Imported from Amavisd-new 2.5 the ability to make it possible for a virus
+# scanner to derate an infection report to a spam report with some highly
+# ragged bits (if's and dodgy vars) that need working on!!

I have tested it and it works on our devel box, however my poor excuse for
Perl programming and numerous interruptions mean that it will not be up to
your standard, so you will want to check it over more than once.
Especially the bit marked

# - MH Domain for recipient can't remember why this is here!

but you might not need it.

If I get time, I will clean it up and redo the lazy parts.

Cheers
Mike

Robert LeBlanc wrote:
> Like the rest of you, I'm sure, I've been receiving a glut of PDF spam
> lately, and I've been experimenting with various tactics for curbing the
> onslaught.  Some tactics work better than others, naturally, so I
> thought I'd share my results here.
>
[snip]
>
> (6) SaneSecurity signatures
>
> If you use ClamAV (you do, don't you?), another option is to use the
> phishing and scam signatures published by SaneSecurity
> <http://www.sanesecurity.co.uk/clamav/>.  These signatures are updated
> multiple times a day, and include a lot of PDF spam, making it perhaps
> the most responsive solution available at the moment.
>
> These phishing/scam emails get caught by ClamAV rather than
> SpamAssassin, so they show up in Maia's "Viruses/Malware" quarantine
> instead of the spam quarantine, which is a bit annoying, but that's
> something I'll be working to address in future versions.
>
> I can't argue with the effectiveness of SaneSecurity's signatures,
> though--they are by far the most effective blockers of PDF spam that
> I've found, and I would strongly recommend that you use them.

Re: PDF spam solutions
Germany
2007-08-18 03:42:42
Robert LeBlanc wrote:
> Like the rest of you, I'm sure, I've been receiving a glut of PDF spam
> lately, and I've been experimenting with various tactics for curbing the
> onslaught.  Some tactics work better than others, naturally, so I
> thought I'd share my results here.
>
Robert,

I couldn't help noticing that the iXhash plugin I wrote and the
corresponding lists the plugin can use work quite well on this type of spam.
I guess other content-hashing mechanisms (Pyzor, Razor, etc.) will perform
similarly.

You might want to give it a try. See http://ixhash.sf.net, and sorry for
the shameless advertising.

Dirk

Re: PDF spam solutions
Canada
2007-08-20 04:10:19
Dirk Bonengel wrote:

> I couldn't help noticing that the iXhash plugin I wrote and the
> corresponding lists the plugin can use work quite well on this type of spam.
> I guess other content-hashing mechanisms (pyzor,razor etc.) will perform
> similar.
>
> You might want to give it a try. See http://ixhash.sf.net, and sorry for
> the shameless advertising

Thanks Dirk!  Yes, I've been using your iXhash plugin for a few weeks
now, and it's definitely triggering (703 times/day on average).  I
didn't mention it in my list of PDF spam solutions for the same reason
that I didn't mention Razor, Pyzor, or DCC--these are not PDF-specific
solutions, they are general-purpose spam solutions.

On that note, though, I have a question for you.  Will there be a spam
reporting mechanism for iXhash, as there is for Razor/Pyzor/DCC?  At
present it appears to be a read-only database, but if you do eventually
introduce a reporting mechanism I'd like to provide support for it in
the process-quarantine.pl script.

--
Robert LeBlanc <rjl@renaissoft.com>
Renaissoft, Inc.
Maia Mailguard <http://www.maiamailguard.com/>
