List Info

Thread: bloom filter in netfilter?




bloom filter in netfilter?
country flaguser name
Belgium
2007-03-20 10:07:47
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hi,


   I'm wondering if bloom filters could not improve
performance of the
conntracker. For a quick overwiew of bloom filters see
http://www.eecs.harvard.edu/~mich
aelm/NEWWORK/postscripts/BloomFilterSurvey.pdf

In a few words, a bloom filter is a data structure which
represents
concisely a set. When you have a set, you can decide very
quickly if an
element belongs to it.

I was then wondering if we could not get rid of these two
list_for_each_entry in the __nf_conntrack_confirm by using
the bloom
filters.

You can use it as a key for a hash table as well. And
therefore we may
use them at a later stage for the hash table as well. But
let's see
first if you're interested by the previous idea ;)

What do you think?


Regards,
Sebastien Tandel
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org


iD8DBQFF//jDw76McB8jGxkRClszAJ0Zp2Jw1ctmldL3EX4NPhNchz9OZQCg
luPh
i8edz1Zv/JuBUso/mqTx4EQ=
=xkDc
-----END PGP SIGNATURE-----


Re: bloom filter in netfilter?
country flaguser name
Germany
2007-03-20 10:20:30
Sebastien Tandel wrote:
>    I'm wondering if bloom filters could not improve
performance of the
> conntracker. For a quick overwiew of bloom filters see
> http://www.eecs.harvard.edu/~mich
aelm/NEWWORK/postscripts/BloomFilterSurvey.pdf
> 
> In a few words, a bloom filter is a data structure
which represents
> concisely a set. When you have a set, you can decide
very quickly if an
> element belongs to it.
> 
> I was then wondering if we could not get rid of these
two
> list_for_each_entry in the __nf_conntrack_confirm by
using the bloom
> filters.


With a properly sizes hash table the buckets should have
only
a single entry on average. The problem with bloom filters
is
that its quite expensive to calculate enough hash bits for
a large filter with a low probability of false positives.

> You can use it as a key for a hash table as well. And
therefore we may
> use them at a later stage for the hash table as well.
But let's see
> first if you're interested by the previous idea ;)
> 
> What do you think?


I thought about using them for caching packet classification
results
(DROP/ACCEPT) some time ago. Never got around to try it
though.


Re: bloom filter in netfilter?
country flaguser name
Spain
2007-03-20 10:25:25
Hi Sebastien,

Sebastien Tandel wrote:
>    I'm wondering if bloom filters could not improve
performance of the
> conntracker. For a quick overwiew of bloom filters see
> http://www.eecs.harvard.edu/~mich
aelm/NEWWORK/postscripts/BloomFilterSurvey.pdf

Yes, I know that work.

> In a few words, a bloom filter is a data structure
which represents
> concisely a set. When you have a set, you can decide
very quickly if an
> element belongs to it.
> 
> I was then wondering if we could not get rid of these
two
> list_for_each_entry in the __nf_conntrack_confirm by
using the bloom
> filters.

We can't just get rid of it since bloom filters have false
positives, so 
it could happen that we could miss some new connections that
are not 
actually in the conntrack table.

-- 
The dawn of the fourth age of Linux firewalling is coming; a
time of 
great struggle and heroic deeds -- J.Kadlecsik got inspired
by J.Morris


Re: bloom filter in netfilter?
country flaguser name
Germany
2007-03-20 10:26:34
Pablo Neira Ayuso wrote:
>> I was then wondering if we could not get rid of
these two
>> list_for_each_entry in the __nf_conntrack_confirm
by using the bloom
>> filters.
> 
> 
> We can't just get rid of it since bloom filters have
false positives, so
> it could happen that we could miss some new connections
that are not
> actually in the conntrack table.


That wouldn't be a big problem in my opinion, you can freely
tune the
probability.



Re: bloom filter in netfilter?
user name
2007-03-20 10:34:02
> > We can't just get rid of it since bloom filters
have false positives, so
> > it could happen that we could miss some new
connections that are not
> > actually in the conntrack table.
> 
> That wouldn't be a big problem in my opinion, you can
freely tune the
> probability.

Wouldn't two conntrack entries in the hash for the same
tuples be
catastrophic somewhere? (I have no idea, actually)

The other thing about bloom filters that worries me, having
read
the wikipedia entry, is that you cannot remove elements.
Probability
of false positives thus increases as conntracks are added.
And a
future repetition of exactly the same tuple would certainly
be
a false positive. Tuples repeat pretty fast with things like
BGP,
and a filter regeneration seems to involve shuffling all
current
conntracks into a fresh bloom filter.

Either I misunderstand something fundamental, which is not
unlikely, or that kills the idea.

best regards
  Patrick



Re: bloom filter in netfilter?
country flaguser name
Germany
2007-03-20 10:43:30
Patrick Schaaf wrote:
>>>We can't just get rid of it since bloom filters
have false positives, so
>>>it could happen that we could miss some new
connections that are not
>>>actually in the conntrack table.
>>
>>That wouldn't be a big problem in my opinion, you
can freely tune the
>>probability.
> 
> 
> Wouldn't two conntrack entries in the hash for the same
tuples be
> catastrophic somewhere? (I have no idea, actually)


Yes, but bloom filters only have false positives (already
in
hash, so skip confirmation), never false negatives, so that
would still not happen.

> The other thing about bloom filters that worries me,
having read
> the wikipedia entry, is that you cannot remove
elements. Probability
> of false positives thus increases as conntracks are
added. And a
> future repetition of exactly the same tuple would
certainly be
> a false positive. Tuples repeat pretty fast with things
like BGP,
> and a filter regeneration seems to involve shuffling
all current
> conntracks into a fresh bloom filter.
> 
> Either I misunderstand something fundamental, which is
not
> unlikely, or that kills the idea.


You're right, tuple reuse is a problem, it seems counting
bloom
filters would be needed to deal with that, which are a lot
less
memory efficient. The increasing probabilities of false
positives
with increasing number of entries could be dealt with by
using a
second bloom filter that is filled with new entries once a
low
threshold is exceeded and replaces the active one once a
high
threshold is exceeded.

I have to admit that I'm a huge fan of bloom filters and
always
wanted to use them for something cool 



[1-6]

about | contact  Other archives ( Real Estate discussion Medical topics )