Thread: Mimetype optimization




Mimetype optimization
user name
2008-04-22 13:06:57
Hello,

Profiling Dolphin while opening /usr/bin shows that about 35% of the time is
spent in QRegExp matching called by kmimetypefactory.cpp: matchFileName. This
function already contains optimizations: for some simple (but popular)
patterns like *.something or something* it uses direct comparison instead of
the QRegExp big gun. It looks like:
   if (pattern like *.something && filename long enough to match the pattern)
       compare directly;
However, for short names (lots of them in /usr/bin) it often falls back to
the slow path. The attached patch changes it to:
if (pattern like *.something) {
	if (filename not long enough to match) return false;
	else compare directly;
}

and detects patterns without wildcards (is checking for *, ? and ] enough?).

Results (measured by callgrind):
Unpatched, for 179 calls to KMimeTypeFactory::findFromFileName
- 5554 calls to QRegExp::exactMatch, which accounts for 35% of CPU time

Patched, for 150 calls to KMimeTypeFactory::findFromFileName
- QRegExp matched 13 times. findFromFileName takes 0.67% of CPU time
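
The early-rejection idea described in this message can be sketched roughly
as follows. This is only an illustration, not the attached patch: it uses
std::string instead of QString so that it builds without Qt, and the
function name matchStarSuffix is hypothetical.

```cpp
#include <string>

// Hypothetical stand-in for the "*.something" fast path.
// Returns true if `filename` ends with the text that follows the leading
// '*' in `pattern`. Assumption: the caller has already verified that the
// rest of the pattern contains no further wildcard characters.
bool matchStarSuffix(const std::string& filename, const std::string& pattern)
{
    if (pattern.empty() || pattern[0] != '*')
        return false;                       // not a "*suffix" pattern

    const std::size_t suffixLen = pattern.size() - 1;  // skip the '*'

    // Early rejection: a name shorter than the suffix can never match,
    // so there is no need to fall back to a regular expression.
    if (filename.size() < suffixLen)
        return false;

    // Direct comparison of the tail of the filename with the suffix.
    return filename.compare(filename.size() - suffixLen, suffixLen,
                            pattern, 1, suffixLen) == 0;
}
```

With this shape, a short name such as "ls" tested against "*.doc" is
rejected by the length check alone, which is the case that previously fell
through to the slow path.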


_______________________________________________
Kde-optimize mailing list
Kde-optimize@kde.org
https://mail.kde.org/mailman/listinfo/kde-optimize

  
Re: Mimetype optimization
user name
2008-04-22 15:36:09
On Tuesday 22 April 2008, Jakub Stachowski wrote:

Wow, excellent. Great find.
If the kdecore/tests/kmimetypetest unit test still passes, please commit.

-- 
David Faure, faure@kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).

Re: Mimetype optimization
user name
2008-04-22 16:21:45
On Tuesday, 22 April 2008, David Faure wrote:
> Wow, excellent. Great find.
> If the kdecore/tests/kmimetypetest unit test still passes, please commit.

Actually, I get two failures (at lines 297 and 619, both related to *.doc)
both with and without the patch. Since the patch changes nothing there, I'm
committing.

Re: Mimetype optimization
user name
2008-04-22 16:29:55
On Tuesday 22 April 2008, Jakub Stachowski wrote:
> Actually, I get two failures (at lines 297 and 619, both related to *.doc)
> both with and without the patch. Since the patch changes nothing there,
> I'm committing.

With which version of shared-mime-info?

-- 
David Faure, faure@kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).
Re: Mimetype optimization
user name
2008-04-22 16:01:05
On Tuesday, 22 April 2008, Jakub Stachowski wrote:
> Index: kmimetypefactory.cpp
> ===================================================================
> --- kmimetypefactory.cpp        (revision 799070)
> +++ kmimetypefactory.cpp        (working copy)
> @@ -190,8 +190,10 @@
>      int len = filename.length();
>
>      // Patterns like "*~", "*.extension"
> -    if (pattern[0] == '*' && len + 1 >= pattern_len && pattern.indexOf('[') == -1)
> +    if (pattern[0] == '*' && pattern.indexOf('[') == -1)
>      {
> +        if ( len + 1 < pattern_len ) return false;
> +

What about a pattern like *foo*bar*?
It will no longer match "foobar".
I think you should search for '*' in the pattern.

(Oh, and is there some escaping in regexps, such as "*foo??"?)



Re: Mimetype optimization
user name
2008-04-22 16:41:53
On Tuesday, 22 April 2008, Jakub Stachowski wrote:
> Actually, I get two failures (at lines 297 and 619, both related to *.doc)
> both with and without the patch. Since the patch changes nothing there,
> I'm committing.

0.22, from openSUSE 10.3.
Re: Mimetype optimization
user name
2008-04-23 12:48:18
On Tuesday, 22 April 2008, Olivier Goffart wrote:
> What about a pattern like *foo*bar*?
> It will no longer match "foobar".
> I think you should search for '*' in the pattern.
Right.
>
> (Oh, and is there some escaping in regexps, such as "*foo??"?)

Right. The current parser is somewhat limited: it does not detect ? or ''.
There are no patterns with these characters yet, but they may be added.
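
One way to picture the wildcard detection discussed here is a small
classifier that decides whether a pattern is safe for the direct-comparison
paths or has to go through full glob matching. The sketch below is an
assumption-laden illustration, not the attached patch: the names PatternKind
and classifyPattern are hypothetical, std::string stands in for QString, and
treating a backslash as an escape character is itself an assumption.

```cpp
#include <string>

// Hypothetical classification of a mimetype glob pattern.
enum class PatternKind { Literal, StarSuffix, PrefixStar, Complex };

PatternKind classifyPattern(const std::string& pattern)
{
    // '?', '[' or an (assumed) escape character need the full matcher.
    if (pattern.find_first_of("?[\\") != std::string::npos)
        return PatternKind::Complex;

    const std::size_t firstStar = pattern.find('*');
    if (firstStar == std::string::npos)
        return PatternKind::Literal;        // no wildcard: plain equality

    // A second '*' (as in "*foo*bar*") rules out the simple paths.
    if (pattern.find('*', firstStar + 1) != std::string::npos)
        return PatternKind::Complex;

    if (firstStar == 0)
        return PatternKind::StarSuffix;     // "*.extension", "*~"
    if (firstStar == pattern.size() - 1)
        return PatternKind::PrefixStar;     // "something*"
    return PatternKind::Complex;            // single '*' in the middle
}
```

Only patterns classified as Literal, StarSuffix or PrefixStar would take a
direct-comparison path; a pattern such as *foo*bar* is reported as Complex
and so still reaches the general matcher, where it can match "foobar".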

Attached is a patch to fix these problems.
I added some static QChar() variables (conversion char* -> QChar took ~10%
of matchFileName). Is that a problem?


  
Re: Mimetype optimization
user name
2008-04-23 16:35:35
On Wednesday, 23 April 2008, Jakub Stachowski wrote:
> Attached is a patch to fix these problems.
> I added some static QChar() vars (conversion char* -> QChar took ~10% of
> matchFileName). Is it a problem?

You can do
pattern.contains(ushort('*'))

Since the implicit conversion from ushort to QChar is inline and immediate,
it comes at no cost at all. (QChar is actually a ushort internally.)


Re: Mimetype optimization
user name
2008-04-24 10:09:53
On Tuesday 22 April 2008, Olivier Goffart wrote:
> What about a pattern like *foo*bar*?
> It will no longer match "foobar".
> I think you should search for '*' in the pattern.

Well, I would just like to point out that there is no such pattern in any
mimetype that I can ever think of...
Let's optimize for what is useful; we can always extend the code if we need
some strange mimetype glob to work.
Right now, in both the fdo and the kde mimetypes, the * is always there only
once, and always at the beginning or end.

-- 
David Faure, faure@kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).

Re: Mimetype optimization
user name
2008-04-24 15:02:17
On Wednesday, 23 April 2008, Olivier Goffart wrote:
> You can do
> pattern.contains(ushort('*'))
>
> Since the implicit conversion from ushort to QChar is inline and immediate,
> it comes at no cost at all. (QChar is actually a ushort internally.)

Actually, QLatin1Char('*') is also perfectly optimized.
If it shows up in your profiling information, it is maybe because Qt is
compiled in debug mode and the inline methods are not inlined.

