Thread: Mimetype optimization
|
|
| Mimetype optimization |

|
2008-04-22 13:06:57 |
Hello,

Profiling Dolphin while opening /usr/bin shows that about 35% of the time is
spent in QRegExp matching called by matchFileName in kmimetypefactory.cpp.
This function already contains optimizations: for some simple (but popular)
patterns like *.something or something* it uses direct comparison instead of
the QRegExp big gun. It looks like:

    if (pattern like *.something && filename long enough to match the pattern)
        compare directly;

However, for short names (lots of them in /usr/bin) it often falls back to
the slow path. The attached patch changes it to:

    if (pattern like *.something) {
        if (filename not long enough to match) return false;
        else compare directly;
    }

and detects patterns without wildcards (is checking for *, ? and ] enough?)

Results (measured by callgrind):

Unpatched, for 179 calls to KMimeTypeFactory::findFromFileName:
- 5554 calls to QRegExp::exactMatch, which accounts for 35% of CPU time

Patched, for 150 calls to KMimeTypeFactory::findFromFileName:
- QRegExp matched 13 times. findFromFileName takes 0.67% of CPU time
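
As a rough illustration of the change described above (not the attached patch itself), the early-return version of the "*.something" fast path could look like the sketch below. It uses std::string instead of QString so that it is self-contained, and matchStarSuffix is a hypothetical name, not a function from kmimetypefactory.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the "*suffix" fast path discussed in the thread.
// Precondition: pattern starts with '*' and contains no other wildcard.
bool matchStarSuffix(const std::string& filename, const std::string& pattern)
{
    assert(!pattern.empty() && pattern[0] == '*');

    // Length of the literal part that follows the leading '*'.
    const std::size_t suffixLen = pattern.size() - 1;

    // A filename shorter than the suffix can never match, so reject it
    // here instead of falling through to a regular-expression match.
    if (filename.size() < suffixLen)
        return false;

    // Compare the tail of the filename with the literal suffix.
    return filename.compare(filename.size() - suffixLen, suffixLen,
                            pattern, 1, suffixLen) == 0;
}
```

With this shape, a short name such as "ls" checked against "*.txt" is rejected by the length test alone, which is the case the profile showed falling back to the slow path.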
_______________________________________________
Kde-optimize mailing list
Kde-optimize kde.org
https://mail.kde.org/mailman/listinfo/kde-optimize
|
|
|
| Re: Mimetype optimization |

|
2008-04-22 15:36:09 |
On Tuesday 22 April 2008, Jakub Stachowski wrote:
> [...]
> Patched, for 150 calls to KMimeTypeFactory::findFromFileName
> - QRegExp matched 13 times. findFromFileName takes 0.67% of CPU time
Wow, excellent. Great find.
If the kdecore/tests/kmimetypetest unit test still passes, please commit.
--
David Faure, faure kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).
|
|
| Re: Mimetype optimization |

|
2008-04-22 16:21:45 |
On Tuesday, 22 April 2008, David Faure wrote:
> [...]
> Wow, excellent. Great find.
> If the kdecore/tests/kmimetypetest unit test still passes, please commit
Actually I get two failures (at lines 297 and 619, both related to *.doc)
with and without the patch. Since the patch changes nothing there, I'm
committing.
|
|
| Re: Mimetype optimization |

|
2008-04-22 16:29:55 |
On Tuesday 22 April 2008, Jakub Stachowski wrote:
> [...]
> Actually I get two failures (at lines 297 and 619, both related to *.doc)
> with and without the patch. But nothing changed, so I'm committing.
With which version of shared-mime-info?
--
David Faure, faure kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).
|
|
| Re: Mimetype optimization |

|
2008-04-22 16:01:05 |
On Tuesday, 22 April 2008, Jakub Stachowski wrote:
> Index: kmimetypefactory.cpp
> ===================================================================
> --- kmimetypefactory.cpp (revision 799070)
> +++ kmimetypefactory.cpp (working copy)
> @@ -190,8 +190,10 @@
>      int len = filename.length();
>
>      // Patterns like "*~", "*.extension"
> -    if (pattern[0] == '*' && len + 1 >= pattern_len && pattern.indexOf('[') == -1)
> +    if (pattern[0] == '*' && pattern.indexOf('[') == -1)
>      {
> +        if ( len + 1 < pattern_len ) return false;
> +
What about a pattern like *foo*bar*? It will no longer match "foobar".
I think you should search for '*' in the pattern.

(Oh, and is there some escaping in the regexp? Such as "*foo??")
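
To make the concern concrete: with the early return in the quoted patch, "*foo*bar*" has length 9 and "foobar" has length 6, so `len + 1 < pattern_len` holds and the name is rejected even though the glob should match it. One way to guard against this, sketched here with std::string and hypothetical names (GlobKind, classifyGlob) rather than the real KDE code, is to classify each pattern first and take a fast path only when it has exactly one '*' at the very start or very end and no other special character. Treating '\' as special is an assumption prompted by the escaping question above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical classification of a mimetype glob pattern.
enum class GlobKind { Literal, StarSuffix, StarPrefix, NeedsFullMatcher };

GlobKind classifyGlob(const std::string& pattern)
{
    // '?', '[' or '\' anywhere means the simple comparisons are not safe
    // (treating '\' as an escape character is an assumption).
    if (pattern.find_first_of("?[\\") != std::string::npos)
        return GlobKind::NeedsFullMatcher;

    const std::size_t firstStar = pattern.find('*');
    if (firstStar == std::string::npos)
        return GlobKind::Literal;              // no wildcard at all

    // More than one '*', e.g. "*foo*bar*": leave it to the full matcher.
    if (pattern.find('*', firstStar + 1) != std::string::npos)
        return GlobKind::NeedsFullMatcher;

    if (firstStar == 0)
        return GlobKind::StarSuffix;           // "*.extension", "*~"
    if (firstStar == pattern.size() - 1)
        return GlobKind::StarPrefix;           // "something*"

    return GlobKind::NeedsFullMatcher;         // single '*' in the middle
}
```

Because the pattern set changes rarely, such a classification could be done once per pattern rather than once per filename, but that is a design choice the thread does not settle.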
|
|
| Re: Mimetype optimization |

|
2008-04-22 16:41:53 |
On Tuesday, 22 April 2008, Jakub Stachowski wrote:
> [...]
> Actually I get two failures (at lines 297 and 619, both related to *.doc)
> with and without the patch. But nothing changed, so I'm committing.
shared-mime-info 0.22, from openSUSE 10.3.
|
|
| Re: Mimetype optimization |

|
2008-04-23 12:48:18 |
On Tuesday, 22 April 2008, Olivier Goffart wrote:
> [...]
>
> What about a pattern like *foo*bar*
> this will not match "foobar" anymore.
> I think you should search for '*' in the pattern
Right.
>
> (oh, and is there some escaping in regexp? such as "*foo??" )
Right. The current parser is somewhat limited: it does not detect ? or ''.
There are no patterns with these characters yet, but they may be added.

Attached is a patch to fix these problems.
I added some static QChar() variables (the char* -> QChar conversion took
~10% of matchFileName). Is that a problem?
|
|
|
| Re: Mimetype optimization |

|
2008-04-23 16:35:35 |
On Wednesday, 23 April 2008, Jakub Stachowski wrote:
> Attached is a patch to fix these problems.
> I added some static QChar() vars (conversion char* -> QChar took ~10% of
> matchFileName). Is it a problem?
You can do

    pattern.contains(ushort('*'))

since the implicit conversion from ushort to QChar is inline and immediate;
it comes at no cost at all. (QChar is actually a ushort internally.)
|
|
| Re: Mimetype optimization |

|
2008-04-24 10:09:53 |
On Tuesday 22 April 2008, Olivier Goffart wrote:
> [...]
>
> What about a pattern like *foo*bar*
> this will not match "foobar" anymore.
> I think you should search for '*' in the pattern
Well, I would just like to point out that there is no such pattern in any
mimetype that I can think of...
Let's optimize for what is useful; we can always extend the code if we need
some strange mimetype glob to work.
Right now, in both the fdo and the KDE mimetypes, the * appears only once,
and always at the beginning or the end.
--
David Faure, faure kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).
|
|
| Re: Mimetype optimization |

|
2008-04-24 15:02:17 |
On Wednesday, 23 April 2008, Olivier Goffart wrote:
> You can do
> pattern.contains(ushort('*'))
>
> since the implicit conversion from ushort to QChar is inline and immediate,
> it does with no cost at all. (QChar is actually a ushort internally)
Actually, QLatin1Char('*') is also perfectly optimized.
If it shows up in your profiling information, it may be because Qt is
compiled in debug mode and the inline methods are not inlined.