Thread: Setting up xpath_sifter.py in config.ini

Setting up xpath_sifter.py in config.ini
user name
2006-11-02 21:07:17
Harry Fuecks wrote:
>> Just a guess, but perhaps the problem is the cache?
>>
>> Filters can be used to stop new entries from being written to the cache,
>> but don't remove entries that are already there.  If in your exploration
>> you ever included too much, subsequent changes to the filter won't fix
>> this.  Of course, over time, this will work itself out as the existing
>> entries move down the page and ultimately fall off the bottom.
>
> It's strange. Using reconstitute.py as per earlier examples, it works
> correctly - I only get entries for Jürg Stuker. But I get all when
> using planet.py
>
> I've been deleting the cache (the entire directory in fact) and still
> all entries come through. The log reports it's checking the feed and
> running the XPath filter...
>
> INFO:planet.runner:Updating feed http://blog.namics.com/atom.xml
> http://blog.namics.com/atom.xml
> DEBUG:planet.runner:E-Tag: "8f0ce6-9843-e40dc900"
> DEBUG:planet.runner:Last Modified: Wed Nov  1 18:44:20 2006
> DEBUG:planet.runner:Processing filter /root/venus/filters/xpath_sifter.py using py
> DEBUG:planet.runner:Processing filter /root/venus/filters/xpath_sifter.py using py
> [ same line 12 more times ]
>
> Then adding some simple debugging to xpath_sifter.py, I see it's
> getting the --require option correctly and it's the full xpath
> statement which has been correctly decoded back to UTF-8.
>
> But still the output (http://webtuesday.ch/planet/) shows all authors
> from the namics feed while it's successfully filtered down to "HarryF"
> for the sitepoint feed. And there's definitely only one entry for
> http://blog.namics.com/atom.xml in config.ini (containing the filter).
>
> In spider.py, running the filters...
>
>        for filter in config.filters(feed):
>            output = shell.run(filter, output, mode="filter")
>            if not output: break
>
> Wonder if some combination of factors is causing xpath_sifter.py to
> emit _something_ on STDOUT even though an entry failed a require
> test? That said, it filters correctly with this xpath statement when
> using tests/reconstitute.py, so strange.

Filters are allowed to modify the entry, so output from the filter would
be treated as the modified entry.  The symptoms you are describing
indicate that the filter is not only outputting something, it is
outputting the entry itself.
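
A minimal sketch of the contract described above, assuming each filter can be modelled as a plain Python function from entry text to entry text. The names run_filters, require_author and strip_whitespace are hypothetical; they stand in for the shell.run call in the spider.py excerpt, which is not reproduced here. As that excerpt suggests, empty output ends the chain, while any non-empty output is carried forward as the (possibly modified) entry.

```python
def run_filters(entry, filters):
    """Feed entry through each filter in order, mirroring the quoted loop."""
    output = entry
    for apply_filter in filters:
        output = apply_filter(output)
        if not output:
            # Nothing came back: stop here and treat the entry as rejected.
            break
    return output


def require_author(name):
    """Hypothetical filter: echo the entry when it mentions name, else emit nothing."""
    def apply_filter(entry):
        return entry if name in entry else ""
    return apply_filter


def strip_whitespace(entry):
    """Hypothetical second filter that modifies the entry it is given."""
    return entry.strip()


chain = [require_author("Jürg Stuker"), strip_whitespace]
kept = run_filters("  <name>Jürg Stuker</name>  ", chain)
dropped = run_filters("  <name>Luzia Hafen</name>  ", chain)
print(repr(kept))
print(repr(dropped))
```

Under this model, a filter that echoes its input for an entry it should have rejected lets that entry through unchanged, which is the symptom in question.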

I've tried creating a planet with just those two entries, and only saw
entries from the people indicated in the filters.  I've run it
successfully on both Ubuntu and Windows.

> Perhaps a check for the return code in planet/shell/py.py would catch
> this? Don't want to mess around much more with the "live" planet -
> when I get more time will try to reproduce the problem locally.

Try copying your config.ini, and changing two lines: cache_directory and
output_dir.
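
One way to script that copy, as a sketch only. It assumes cache_directory and output_dir both live in a [Planet] section of config.ini (the thread does not show that part of the file), and retarget_config is a hypothetical helper, not part of Venus. Interpolation is switched off so that percent-encoded filter arguments such as %3A pass through untouched.

```python
import configparser
import io


def retarget_config(text, cache_directory, output_dir):
    """Return a copy of the config text with the two directories replaced."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # Assumption: both options sit in [Planet]; a KeyError here means they do not.
    parser["Planet"]["cache_directory"] = cache_directory
    parser["Planet"]["output_dir"] = output_dir
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Reading config.ini, passing its text through retarget_config with two scratch directories, and writing the result to a second file gives a test planet that leaves the live cache and output alone. Note that configparser drops comments when it rewrites a file.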

- Sam Ruby
-- 
devel mailing list
devel@lists.planetplanet.org

http://lists.planetplanet.org/mailman/listinfo/devel
Setting up xpath_sifter.py in config.ini
user name
2006-11-03 12:51:01
Think I've figured it out - it looks like planet/reconstitute.py (I
guess) is injecting an author into the entry below an atom source tag,
which it got from config.ini

Because my xpath statement starts with //atom, if I have the following
in config.ini;

[http://blog.namics.com/atom.xml]
name = Jürg Stuker
filters = xpath_sifter.py?require=//atom%3Aauthor/atom%3Aname%3D'J%C3%BCrg%20Stuker'

I get _all_ entries from this feed. Dumping the input entry document
passed to xpath_sifter.py I see this;

<author>
<name>Luzia Hafen</name>
</author>
<source>
<id>tag:blog.namics.com,2006://1</id>
<author>
<name>Jürg Stuker</name>
</author>

If, by chance, I use a different name in config.ini it works
correctly e.g.

[http://blog.namics.com/atom.xml]
name = Nobby Clark
filters = xpath_sifter.py?require=//atom%3Aauthor/atom%3Aname%3D'J%C3%BCrg%20Stuker'

So the fix looks like I just need to drop one of the initial
forward slashes in the xpath statement.
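
The effect can be reproduced with a small sketch using Python's standard xml.etree.ElementTree, which is an assumption made for illustration: it supports only a subset of XPath and is not necessarily what xpath_sifter.py uses. The ENTRY document is a hypothetical completion of the fragment above (the enclosing entry element and the closing tags are guesses). A descendant search finds the author name injected under source as well as the entry's own author, while a path anchored at the entry finds only the latter. The sketch uses a relative path to express that intent; how the single-slash form behaves depends on the context node xpath_sifter.py evaluates against.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical entry modelled on the dumped fragment: the entry author is
# Luzia Hafen, the author under <source> comes from config.ini.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <author><name>Luzia Hafen</name></author>
  <source>
    <id>tag:blog.namics.com,2006://1</id>
    <author><name>Jürg Stuker</name></author>
  </source>
</entry>"""


def author_names(entry_xml, path):
    """Return the text of every element matched by path, relative to the entry."""
    root = ET.fromstring(entry_xml)
    return [element.text for element in root.findall(path, ATOM)]


# The require value from config.ini, percent-decoded.
require = unquote("//atom%3Aauthor/atom%3Aname%3D'J%C3%BCrg%20Stuker'")

# Descendant search: matches author names at any depth, including under <source>.
anywhere = author_names(ENTRY, ".//atom:author/atom:name")
# Anchored at the entry: matches only the entry's own author.
entry_only = author_names(ENTRY, "atom:author/atom:name")

print(require)
print(anywhere)
print(entry_only)
```

Because the descendant search also sees "Jürg Stuker" under source, a test for that name succeeds for every entry in the feed, whoever actually wrote it.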

Setting up xpath_sifter.py in config.ini
user name
2006-11-04 17:40:50
Harry Fuecks wrote:
> Think I've figured it out - it looks like planet/reconstitute.py (I
> guess) is injecting an author into the entry below an atom source tag,
> which it got from config.ini

Good catch.  I've fixed it so that this no longer occurs on entries
which have author information.

- Sam Ruby

