Think I've figured it out. It looks like planet/reconstitute.py (I guess)
is injecting an author, taken from config.ini, into the entry below an
atom source tag. Because my xpath statement starts with //atom:author, it
matches that author as well as the entry's own. So if I have the following
in config.ini:
[http://blog.namics.com/atom.xml]
name = Jürg Stuker
filters = xpath_sifter.py?require=//atom%3Aauthor/atom%3Aname%3D'J%C3%BCrg%20Stuker'
I get _all_ entries from this feed. Dumping the input entry document
passed to xpath_sifter.py, I see this:
<author>
<name>Luzia Hafen</name>
</author>
<source>
<id>tag:blog.namics.com,2006://1</id>
<author>
<name>Jürg Stuker</name>
</author>
If, by chance, I use a different name in config.ini, it works correctly, e.g.
[http://blog.namics.com/atom.xml]
name = Nobby Clark
filters = xpath_sifter.py?require=//atom%3Aauthor/atom%3Aname%3D'J%C3%BCrg%20Stuker'
So it looks like the fix is just to drop one of the initial forward
slashes in the xpath statement.
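
For what it's worth, here is a rough sketch of why the leading // matters. It
is not what xpath_sifter.py does internally: it uses Python's
xml.etree.ElementTree (which only supports a subset of XPath) on a trimmed
copy of the dump above. It assumes the elements are in the standard Atom
namespace, adds the closing tags so the fragment parses, and writes the
child-only path relative to the entry element, which is a guess at how the
shortened expression would be anchored. The helper author_names is made up
for this illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Assumption: the dumped entry uses the standard Atom namespace.
NS = {"atom": "http://www.w3.org/2005/Atom"}

# Trimmed copy of the dumped entry; closing tags added so it parses.
ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom">
  <author>
    <name>Luzia Hafen</name>
  </author>
  <source>
    <id>tag:blog.namics.com,2006://1</id>
    <author>
      <name>Jürg Stuker</name>
    </author>
  </source>
</entry>
"""


def author_names(entry, path):
    """Hypothetical helper: text of every element selected by path."""
    return [el.text for el in entry.findall(path, NS)]


entry = ET.fromstring(ENTRY)
required = "Jürg Stuker"

# The require parameter from config.ini, percent-decoded.
decoded = unquote("//atom%3Aauthor/atom%3Aname%3D'J%C3%BCrg%20Stuker'")

# Descendant search: also reaches the author nested under <source>.
anywhere = author_names(entry, ".//atom:author/atom:name")

# Child-only search: sees just the entry's own author.
direct = author_names(entry, "./atom:author/atom:name")

print(decoded)
print("descendant search matches:", required in anywhere)
print("child-only search matches:", required in direct)
```

The descendant search finds Jürg Stuker under the source element, so the
require test passes for every entry. The child-only search sees only the
entry's own author, Luzia Hafen, so this entry would be dropped.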
On 11/2/06, Sam Ruby <rubys intertwingly.net> wrote:
> Harry Fuecks wrote:
> >> Just a guess, but perhaps the problem is the cache?
> >>
> >> Filters can be used to stop new entries from being written to the cache,
> >> but don't remove entries that are already there. If in your exploration
> >> you ever included too much, subsequent changes to the filter won't fix
> >> this. Of course, over time, this will work itself out as the existing
> >> entries move down the page and ultimately fall off the bottom.
> >
> > It's strange. Using reconstitute.py as per earlier examples, it works
> > correctly - I only get entries for Jürg Stuker. But I get all when
> > using planet.py
> >
> > I've been deleting the cache (the entire directory in fact) and still
> > all entries come through. The log reports it's checking the feed and
> > running the XPath filter...
> >
> > INFO:planet.runner:Updating feed http://blog.namics.com/atom.xml
> > http://blog.namics.com/atom.xml
> > DEBUG:planet.runner:E-Tag: "8f0ce6-9843-e40dc900"
> > DEBUG:planet.runner:Last Modified: Wed Nov 1 18:44:20 2006
> > DEBUG:planet.runner:Processing filter /root/venus/filters/xpath_sifter.py using py
> > DEBUG:planet.runner:Processing filter /root/venus/filters/xpath_sifter.py using py
> > [ same line 12 more times ]
> >
> > Then adding some simple debugging to xpath_sifter.py, I see it's
> > getting the --require option correctly and it's the full xpath
> > statement which has been correctly decoded back to UTF-8.
> >
> > But still the output (http://webtuesday.ch/planet/) shows all authors
> > from the namics feed while it's successfully filtered down to "HarryF"
> > for the sitepoint feed. And there's definitely only one entry for
> > http://blog.namics.com/atom.xml in config.ini (containing the filter).
> >
> > In spider.py, running the filters...
> >
> >     for filter in config.filters(feed):
> >         output = shell.run(filter, output, mode="filter")
> >         if not output: break
> >
> > Wonder if some combination of factors is causing xpath_sifter.py to
> > emit _something_ on STDOUT even though an entry failed a require
> > test? That said, it filters correctly with this xpath statement when
> > using tests/reconstitute.py, so strange.
>
> Filters are allowed to modify the entry, so output from the filter would
> be treated as the modified entry. The symptoms you are describing
> indicate that the filter is not only outputting something, it is
> outputting the entry itself.
>
> I've tried creating a planet with just those two entries, and only saw
> entries from the people indicated in the filters. I've run it
> successfully on both Ubuntu and Windows.
>
> > Perhaps a check for the return code in planet/shell/py.py would catch
> > this? Don't want to mess around much more with the "live" planet -
> > when I get more time will try to reproduce the problem locally.
>
> Try copying your config.ini, and changing two lines: cache_directory and
> output_dir.
>
> - Sam Ruby
>
--
devel mailing list
devel lists.planetplanet.org
http://lists.planetplanet.org/mailman/listinfo/devel