Joe Gregorio wrote:
> On 11/4/06, Sam Ruby <rubys intertwingly.net> wrote:
>> I'm still seeing this.
>
> You will have to clear out "sources/http", the non-absolute
> location: uri was stored there. The fixed version of httplib2
> was patched to store the right value, not to
> absolutize the one retrieved from the cache.
Cool.
>> Furthermore, once such an error occurs, it
>> appears that that thread no longer services requests.
>
> Yes, any uncaught exceptions in a thread terminate the thread.
What puzzles me is that the logic I see appears to take great care
not to have any uncaught exceptions: both _spider_proc and the code
that processes the work_queue catch Exception, log it, and should
(to my reading) continue with the loop.
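To make the pattern I have in mind concrete, here is a minimal sketch of a worker loop that logs a failure and moves on to the next item. It is written for present-day Python 3 and is not the actual spider code; the names worker, run and handle are made up for illustration.

```python
import logging
import queue
import threading

log = logging.getLogger("spider")

def worker(work_queue, handle):
    """Process items until a None sentinel arrives; log failures and keep going."""
    while True:
        item = work_queue.get()
        if item is None:
            break
        try:
            handle(item)
        except Exception:
            # Log the traceback, then continue with the next item.
            log.exception("failed to process %r", item)

def run(items, handle):
    """Feed items to a single worker thread and wait for it to finish."""
    work_queue = queue.Queue()
    for item in items:
        work_queue.put(item)
    work_queue.put(None)  # sentinel: tells the worker to stop
    thread = threading.Thread(target=worker, args=(work_queue, handle))
    thread.start()
    thread.join()
```

With a loop shaped like this, one bad item should not stop the thread. Anything raised outside the try block would still end it, though, so that is where I would look first.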
>> > The second type of problem was a failure to resolve the server
>> > name. Now fixed, that type of exception is now caught and logged as an
>> > error.
>>
>> Should I be able to use IRIs if spider_threads is set to a non-zero
>> value?
>
> I would doubt it, httplib2 only understands URIs, I've done nothing
> to enable IRIs.
All it takes is code like the following. If done within httplib2, every
user of that library would benefit:
    # iri support
    try:
        if isinstance(url, unicode):
            url = url.encode('idna')
        else:
            url = url.decode('utf-8').encode('idna')
    except UnicodeError:
        # bad utf-8 or a host that cannot be encoded: leave url as is
        pass
The above should be safe. If the URI does not have any high bit
characters, nothing is done. Also if the input url is not valid utf-8
(a requirement for IRIs), then again, nothing is done. One caveat: the
idna codec works label by label on whatever string it is handed rather
than parsing the URI, so a more careful version would split out the
host portion and encode only that.
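For what it is worth, a more careful variant could split the URL first and hand only the host to the idna codec. The following is a sketch in present-day Python 3, not something that exists in httplib2; the function name iri_to_uri_host is hypothetical, and it leaves any non-ASCII characters in the path or query alone.

```python
from urllib.parse import urlsplit, urlunsplit

def iri_to_uri_host(url):
    """Return url with a non-ASCII host IDNA-encoded; other parts untouched.

    Accepts str or UTF-8 bytes. Input that cannot be handled is returned
    unchanged.
    """
    if isinstance(url, bytes):
        try:
            url = url.decode("utf-8")
        except UnicodeDecodeError:
            return url  # not valid UTF-8, so not an IRI: leave it alone
    parts = urlsplit(url)
    host = parts.hostname
    if not host or host.isascii():
        return url  # nothing to encode
    try:
        ascii_host = host.encode("idna").decode("ascii")
        port = parts.port
    except ValueError:  # UnicodeError is a subclass of ValueError
        return url
    # Rebuild the netloc, keeping any port and userinfo.
    netloc = ascii_host
    if port is not None:
        netloc += ":%d" % port
    if "@" in parts.netloc:
        netloc = parts.netloc.rsplit("@", 1)[0] + "@" + netloc
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Under these assumptions, http://bücher.de/path would come back as http://xn--bcher-kva.de/path, while a URL with an all-ASCII host is returned as given.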
>> Also a new bug report: the way the change to the feed parser was made
>> causes it to no longer respect the default value for xml:base.
>
> I will work on that. (Adding unit tests for these changes to both
> feedparser and venus are on my list of things to do).
It occurs to me that no change to the feed parser is required. The feed
parser is set up to handle arbitrary "file like" objects - gotta love
duck typing. Here's a rough sketch:
    from StringIO import StringIO

    data = StringIO(content)
    setattr(data, 'url', feed)
    setattr(data, 'headers', resp_headers)
    feedparser.parse(data)
Of course, if httplib2 takes care of unzipping and inflating, headers
like content-encoding may need to be removed lest the feedparser try
to unzip the results.
Another thing to take care of in the case of redirects is that the url
property should be set to the value of the location header. Again, it
seems to me that such logic may benefit other users of httplib2.
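Putting the last two points together, a slightly fuller sketch of the wrapper might look like the following. It is written for present-day Python 3. The names FeedData and wrap_response are made up for illustration, and it assumes the body has already been decompressed and that the response headers arrive as a plain mapping.

```python
import io
from urllib.parse import urljoin

class FeedData(io.BytesIO):
    """BytesIO subclass so that extra attributes can be attached."""

def wrap_response(content, request_url, headers):
    """Wrap an already-decoded response body in a file-like object."""
    # Lower-case the header names so lookups are case-insensitive.
    headers = {name.lower(): value for name, value in headers.items()}
    # The body is assumed to be decompressed already, so drop the header
    # that would make a parser try to decompress it a second time.
    headers.pop("content-encoding", None)
    data = FeedData(content)
    # After a redirect, prefer the Location value, resolved against the
    # URL that was requested; with no Location this is just request_url.
    data.url = urljoin(request_url, headers.get("location", ""))
    data.headers = headers
    return data
```

The result could then be handed to feedparser.parse(), on the assumption that the parser picks up url and headers from file-like objects as described above.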
- Sam Ruby
--
devel mailing list
devel lists.planetplanet.org
http://lists.planetplanet.org/mailman/listinfo/devel