Thread: Ad-filter loading and QRegExp performance

Ad-filter loading and QRegExp performance
2007-01-29 08:14:26
Hello,

Out of curiosity, I recently did a little profiling of Konqueror (KDE 4)
startup using callgrind. I was surprised to see that, according to the
output, about 5-10% of the time is spent processing regular expressions
for ad filters.

QRegExp seems to have an internal mechanism that delays parsing the
expression until it is needed (i.e. until indexIn(), exactMatch() or a
similar method is called on the regexp). Whenever a QRegExp is copied,
the engine preparation is performed first, so copying a QRegExp loses
the benefit of this delayed parsing.

KHTML stores the ad filters in a QVector<QRegExp> internally, which
means copies are made inside Qt when the append() method is used to add
a new filter to the internal list. This forces all ad-filter regexps to
be parsed on startup.

Konqueror ships with almost 200 ad filters out of the box, and parsing
all of these takes some time.

Initially I tried replacing QVector<QRegExp> with QVector<QRegExp*>
instead, but I realised that KHTMLSettings needs to be copied, and so
these pointers would need to be shared somehow. I am not sure of the
best way to do that.
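
One way the sharing problem described above could be approached is with reference-counted pointers, so that copying the settings object copies only the pointers. The following is a minimal sketch in standard C++, not KHTML code: `FilterSettings` is a hypothetical stand-in for KHTMLSettings, and `std::regex` and `std::shared_ptr` stand in for QRegExp and whatever sharing mechanism Qt would offer.

```cpp
#include <cassert>
#include <memory>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for KHTMLSettings: it owns a list of filters and
// must stay cheap to copy.
struct FilterSettings {
    // Every copy of the settings shares the same compiled expressions.
    std::vector<std::shared_ptr<const std::regex>> filters;

    // Compiles the pattern once and stores a shared handle to it.
    void addFilter(const std::string &pattern) {
        filters.push_back(std::make_shared<std::regex>(pattern));
    }

    // Returns true if any filter matches somewhere in the URL.
    bool isAdUrl(const std::string &url) const {
        for (const auto &re : filters) {
            if (std::regex_search(url, *re))
                return true;
        }
        return false;
    }
};
```

Copying a `FilterSettings` then duplicates only the pointer array, and the expressions are released when the last copy goes away. Note that this sketch still compiles each pattern when it is added; it addresses the ownership question only, not the startup cost.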

The alternative would be to modify the QRegExp code so that the
assignment operator does not automatically parse the regexp. Who should
I get in touch with to discuss this?
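
To illustrate the behaviour being asked for here, the sketch below shows a pattern object whose copy and assignment never trigger compilation. It is written against standard C++ only; `LazyPattern` is a hypothetical class and says nothing about how QRegExp is actually implemented. It is also not thread-safe.

```cpp
#include <cassert>
#include <memory>
#include <regex>
#include <string>
#include <utility>

// Hypothetical wrapper, not QRegExp itself: it keeps the pattern text and
// compiles it only when a match is first requested.
class LazyPattern {
public:
    explicit LazyPattern(std::string pattern)
        : state_(std::make_shared<State>(std::move(pattern))) {}

    // Copy construction and assignment are compiler-generated: they copy a
    // shared pointer and never compile the expression.

    bool isCompiled() const { return state_->compiled != nullptr; }

    // Compiles on first use, then searches the text for a match.
    bool matches(const std::string &text) const {
        if (!state_->compiled)
            state_->compiled.reset(new std::regex(state_->pattern));
        return std::regex_search(text, *state_->compiled);
    }

private:
    struct State {
        explicit State(std::string p) : pattern(std::move(p)) {}
        std::string pattern;
        std::unique_ptr<std::regex> compiled;  // empty until first use
    };
    std::shared_ptr<State> state_;  // shared between all copies
};
```

Because the copies share one state object, the expression is compiled at most once no matter which copy is used first.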

There is also some discussion about whether it is right for us to ship
any ad filters at all, let alone 200 of them, with Konqueror, but I want
to leave that for another time.

Regards,
Robert.
_______________________________________________
Kde-optimize mailing list
Kde-optimize kde.org
https://mail.kde.org/mailman/listinfo/kde-optimize

Re: Ad-filter loading and QRegExp performance
2007-01-29 09:07:18
On Monday 29 January 2007 15:14, Robert Knight wrote:
[...]
> The alternative, would be to modify the QRegExp code so that the
> assignment operator did not automatically parse the reg exp. Who
> should I get in touch with to discuss this?

It would be practical if you mailed this suggestion to
qt-bugs trolltech.com
so it can possibly go into Qt, yupp.
Cheers,
Frans

Re: Ad-filter loading and QRegExp performance
2007-01-29 09:41:03
On Monday 29 January 2007 15:14, Robert Knight wrote:
> Hello,
>
> Out of curiosity, I did a little profiling of Konqueror ( KDE 4 )
> startup recently using callgrind.
> I was surprised to see that according to the output, about 5-10% of
> the time is spent processing regular expressions for ad filters.

Does this apply to KDE3 as well?

> QRegExp seems to have a mechanism internally to delay parsing the
> expression until it is needed ( ie. until indexIn() , exactMatch() or
> a similar method is called on the regexp). Whenever a QRegExp is
> copied, the engine preparation is performed first, so copying a
> QRegExp loses the benefits of this delayed parsing.
> KHTML stores the ad-filters in a QVector<QRegExp> internally, which
> means copies inside Qt when using the append() method to add a new
> filter to the internal list. This forces all ad-filter reg-exps to be
> parsed on startup.
>
> Konqueror ships with almost 200 ad filters out of the box, and parsing
> all of these takes some time.
>
> Initially I tried replacing QVector<QRegExp> with QVector<QRegExp*>
> instead, but I realised that KHTMLSettings needs to be copied, and so
> these pointers would need to be shared somehow. I am not sure of the
> best way to do that.

A simpler way would be to also store a QVector<QString>, keep the
QVector<QRegExp> empty initially and fill it only when it is needed for
the first time.
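
The suggestion above could look roughly like the following. This is a sketch in standard C++ under stated assumptions: `AdFilterSet` and its member names are hypothetical, and `std::vector`, `std::string` and `std::regex` stand in for QVector, QString and QRegExp.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical filter container: pattern strings are stored up front and
// the compiled expressions are built only on the first lookup.
class AdFilterSet {
public:
    // Cheap: stores the text only, no regex work is done here.
    void addFilter(const std::string &pattern) {
        patterns_.push_back(pattern);
        ready_ = false;  // compiled list is now out of date
    }

    bool isCompiled() const { return ready_; }

    // Compiles all patterns on first use, then checks the URL against them.
    bool isAdUrl(const std::string &url) {
        ensureCompiled();
        for (const std::regex &re : compiled_) {
            if (std::regex_search(url, re))
                return true;
        }
        return false;
    }

private:
    void ensureCompiled() {
        if (ready_)
            return;
        compiled_.clear();
        compiled_.reserve(patterns_.size());  // size is known, avoid regrowth
        for (const std::string &p : patterns_)
            compiled_.emplace_back(p);
        ready_ = true;
    }

    std::vector<std::string> patterns_;
    std::vector<std::regex> compiled_;
    bool ready_ = false;
};
```

With this layout, loading the filter list at startup and copying the settings before first use cost only string copies; the parsing cost is paid when the first URL is checked.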

> The alternative, would be to modify the QRegExp code so that the
> assignment operator did not automatically parse the reg exp. Who
> should I get in touch with to discuss this?

That indeed looks like the best choice.

--
Lubos Lunak
KDE developer
--------------------------------------------------------------
SUSE LINUX, s.r.o.   e-mail: l.lunak suse.cz , l.lunak kde.org
Lihovarska 1060/12   tel: +420 284 028 972
190 00 Prague 9      fax: +420 284 028 951
Czech Republic       http://www.suse.cz

Re: Ad-filter loading and QRegExp performance
2007-01-29 12:36:10
> Does this apply to KDE3 as well?
No, QRegExp handles copying differently in Qt 3.

> A simpler way should be storing also QVector<QString>, initially have
> QVector<QRegExp> empty and fill it only when it's needed for the first
> time.

That makes sense, although since there is plenty of time for Trolltech
to improve QRegExp before KDE 4 is released, I will file a bug report.

Even if/when they are able to make this change, the regular expressions
still have to be parsed at some point before they can be used to filter
images out. It probably wouldn't be a bad idea to cut down on the number
of filters supplied out of the box.

I wouldn't be surprised if a number of them are no longer useful
(because ad providers changed their URLs since the filters were
originally created, or because sites changed the way they display
adverts to their users).

Regards,
Robert

Re: Ad-filter loading and QRegExp performance
2007-02-02 17:22:35
On Thursday 01 February 2007 14:46, Thiago Macieira wrote:
> Lubos Lunak wrote:
> >> Initially I tried replacing QVector<QRegExp> with
> >> QVector<QRegExp*> instead, but I realised that KHTMLSettings needs
> >> to be copied, and so these pointers would need to be shared
> >> somehow. I am not sure of the best way to do that.
> >
> > A simpler way should be storing also QVector<QString>, initially
> > have QVector<QRegExp> empty and fill it only when it's needed for
> > the first time.
>
> Note that QVector stores the elements themselves in the vector. Every
> time it grows, it must copy the elements to the new array. Tulip
> classes are somewhat clever about their growth strategy, but if you
> know beforehand how many items you'll need, you can tell it. It
> should help a lot.
>
> Also note that QList does not have this problem. For a complex type
> like QRegExp, QList does not store the elements themselves in the
> vector. It stores a pointer to them. That means one extra malloc(),
> but no object is copied during resizing.
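
The effect of telling the container its size up front can be demonstrated by counting copies. The sketch below uses `std::vector`, which also stores elements inline, as an assumed stand-in for QVector; `CountedItem` and `copiesWhileAppending` are hypothetical names, and the exact number of extra copies depends on the library's growth strategy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element type that counts how often it is copied. Declaring the copy
// operations suppresses the implicit move operations, so every relocation
// during growth shows up as a copy.
struct CountedItem {
    static std::size_t copies;
    CountedItem() = default;
    CountedItem(const CountedItem &) { ++copies; }
    CountedItem &operator=(const CountedItem &) { ++copies; return *this; }
};
std::size_t CountedItem::copies = 0;

// Appends `count` items and returns how many copies were made in total.
std::size_t copiesWhileAppending(std::size_t count, bool reserveFirst) {
    CountedItem::copies = 0;
    std::vector<CountedItem> items;
    if (reserveFirst)
        items.reserve(count);  // one allocation, no regrowth afterwards
    const CountedItem prototype;
    for (std::size_t i = 0; i < count; ++i)
        items.push_back(prototype);  // one copy into the vector per append
    return CountedItem::copies;
}
```

With the reservation, the only copies are the appends themselves; without it, each regrowth copies every element stored so far on top of that.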

FWIW, at work I noticed that using QList for lists with millions of
entries (the entries are 3D-coordinates) is way slower than QVector. In
particular, the destruction of the QList took ages, while the
destruction of a QVector takes almost no time. My test basically did
the following for QList and QVector:

for ( run = 0; run < maxRuns; run++ ) {
    start timer
    {
        CONTAINER<simple_struct_with_two_float_entries> l;
        for ( i = 0; i < numElements; i++ ) {
            l.append( simple_struct_with_two_float_entries( i, i ) );
        }
        print timer before destruction
    }
    print timer after destruction
}

In debug mode, and for numElements in the hundreds of thousands, QList
was 100 times slower than QVector. In release mode it was just 10 times
slower. (FWIW, this is on Windows.) The slowness of QList seems to be
caused by the fact that deleting hundreds of thousands of objects takes
a lot of time.

So much for "That means one extra malloc()". It means just one extra
malloc() per list.append(), but it also means list.size()+1 calls to
free() when the list is destroyed (compared to 1 free() in the case of
QVector).

But for a few hundred QStrings this is of course irrelevant.
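
The free() counts mentioned above can be reproduced with a counting allocator. This is a sketch in standard C++ that models the two storage layouts rather than using QList or QVector themselves; `CountingAllocator`, `Point2` and both function names are hypothetical, and the per-element layout is only an approximation of how QList is described in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Bumped by the allocator below on every deallocation.
static std::size_t g_frees = 0;

// Minimal allocator that forwards to std::allocator and counts frees.
template <typename T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U> &) {}

    T *allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
    void deallocate(T *p, std::size_t n) {
        ++g_frees;
        std::allocator<T>().deallocate(p, n);
    }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T> &, const CountingAllocator<U> &) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T> &, const CountingAllocator<U> &) { return false; }

struct Point2 { float x, y; };

// Elements stored inline in one buffer (the QVector-like layout).
std::size_t freesForInlineStorage(std::size_t n) {
    g_frees = 0;
    {
        std::vector<Point2, CountingAllocator<Point2>> v;
        v.reserve(n);
        for (std::size_t i = 0; i < n; ++i)
            v.push_back(Point2{static_cast<float>(i), static_cast<float>(i)});
    }  // the buffer is released here
    return g_frees;
}

// One heap block per element plus a pointer array (the layout described
// for QList with complex types).
std::size_t freesForPerElementStorage(std::size_t n) {
    g_frees = 0;
    {
        CountingAllocator<Point2> elementAlloc;
        std::vector<Point2 *, CountingAllocator<Point2 *>> v;
        v.reserve(n);
        for (std::size_t i = 0; i < n; ++i) {
            Point2 *p = elementAlloc.allocate(1);
            ::new (static_cast<void *>(p))
                Point2{static_cast<float>(i), static_cast<float>(i)};
            v.push_back(p);
        }
        for (Point2 *p : v)
            elementAlloc.deallocate(p, 1);  // one free per element
    }  // the pointer array is released here
    return g_frees;
}
```

The per-element layout needs one deallocation per element more than the inline layout, which is the cost that dominates when the container holds hundreds of thousands of small objects.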
Regards,
Ingo