Thread: Re: beautifulSoup and .next iteration




Re: beautifulSoup and .next iteration
United States
2007-04-13 20:29:27
> anchors = soup.findAll('a', { 'name' : re.compile('^A.*$')})
> for x in anchors:
>    print x
>    x = x.next
>    while getattr(x, 'name') != 'a':
>      print x

> And get into endless loops. I can't help thinking there are simple and
> obvious ways to do this, probably many, but as a rank beginner, they are
> escaping me.


Hi Jon,

Whenever I hear "infinite loop", I look for "while" loops.  There's
something funky with the while loop in the above code.

     while getattr(x, 'name') != 'a':
         print x

If the test of the while loop ever succeeds once, then there's a
problem: when does the test ever fail?  Nothing in the body of the
while loop changes x, so once the loop is entered, the condition stays
true forever.  So that part doesn't quite work.  For the moment, strip
out the while loop and simplify the code so that it only shows the
anchors:

############################################################

anchors = soup.findAll('a', { 'name' : re.compile('^A.*$')})
for anchor in anchors:
    print anchor
############################################################


This shouldn't get stuck in any infinite loops.
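For comparison, here is a minimal sketch of the loop Jon was reaching for, with the missing step put back: the loop variable advances on every pass, and the walk also stops cleanly at the end of the document. It assumes Python 3 and the current bs4 package, where BeautifulSoup 3's `.next` and `findAll` are spelled `.next_element` and `find_all`; the helper name `walk_to_next_anchor` and the sample HTML are made up for illustration.

```python
import re

from bs4 import BeautifulSoup


def walk_to_next_anchor(anchor):
    """Collect the elements that follow `anchor` in document order,
    stopping at the next <a> tag or at the end of the document.

    Hypothetical helper, written for bs4 (BeautifulSoup 4)."""
    collected = []
    node = anchor.next_element
    # Strings may not carry a useful tag name, so getattr falls back to None.
    while node is not None and getattr(node, 'name', None) != 'a':
        collected.append(node)
        # Advancing here is the step missing from the original while
        # loop; without it the condition never changes.
        node = node.next_element
    return collected


# Made-up sample document for the demonstration.
html = '<a name="A1">first</a><p>one</p><a name="A2">second</a><p>two</p>'
soup = BeautifulSoup(html, 'html.parser')

for anchor in soup.find_all('a', {'name': re.compile('^A.*$')}):
    print(anchor['name'], [str(node) for node in walk_to_next_anchor(anchor)])
```

Because `.next_element` follows parse order, the result for A1 includes the anchor's own text ('first') and the text inside the `<p>` as a separate entry after the `<p>` itself, which is usually not what you want when gathering the content between anchors.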



From your question, it sounds like you want to get a list of the sibling
elements after each particular anchor.  The documentation at:

http://www.crummy.com/software/BeautifulSoup/documentation.html#nextSibling%20and%20previousSibling

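doesn't make this as clear as I'd like, but what you want is probably
not the 'next' attribute of an object, but its 'nextSibling' attribute.
'next' walks the document in parse order, stepping into each tag's
contents, while 'nextSibling' moves to the following element at the
same level of the tree.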
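To see the difference between the two attributes in practice, the sketch below looks at the same anchor both ways. It assumes Python 3 and the current bs4 package, where `.next` and `.nextSibling` are spelled `.next_element` and `.next_sibling`; the sample HTML is made up.

```python
from bs4 import BeautifulSoup

# Made-up sample: an anchor followed by a paragraph with nested markup.
soup = BeautifulSoup('<a name="A1">label</a><p>body <b>bold</b></p>',
                     'html.parser')
anchor = soup.find('a')

# .next_element moves in parse order, so it first steps into the
# anchor and lands on the anchor's own text.
print(str(anchor.next_element))

# .next_sibling stays at the same level of the tree, skipping the
# anchor's contents and landing on the whole <p> element.
print(str(anchor.next_sibling))

# From the <p>, one more step in parse order lands on its first child,
# the string 'body ', not on anything after the <p>.
print(repr(str(anchor.next_sibling.next_element)))
```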


You might find the following definitions helpful:

############################################################
# BeautifulSoup 3 (the version this thread uses); with bs4 use:
#     from bs4 import NavigableString
from BeautifulSoup import NavigableString

def get_siblings_to_next_anchor(anchor):
    """Anchor Tag -> element list

    Given an anchor element, returns all of its next siblings up to
    (but not including) the next anchor, as a list of either Tags or
    NavigableStrings."""

    elt = anchor.nextSibling
    results = []
    # elt advances on every pass, so the loop always terminates.
    while (elt is not None) and (not is_anchor(elt)):
        results.append(elt)
        elt = elt.nextSibling
    return results


def is_anchor(elt):
    """element -> boolean
    Returns True if the element is an anchor Tag."""

    # Text nodes (NavigableStrings) are never anchors.
    if isinstance(elt, NavigableString):
        return False
    else:
        return elt.name == 'a'
############################################################

They should help you get the results you want.
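As a rough usage sketch, the two helpers can be driven from the same `find_all` call as before to map each anchor's name to the elements that follow it. This assumes Python 3 and the current bs4 package (where `nextSibling` is spelled `next_sibling` and `NavigableString` is imported from `bs4`); the `sections` dictionary and the sample HTML are illustrative only.

```python
import re

from bs4 import BeautifulSoup, NavigableString


def is_anchor(elt):
    """Return True if the element is an <a> Tag (strings never are)."""
    if isinstance(elt, NavigableString):
        return False
    return elt.name == 'a'


def get_siblings_to_next_anchor(anchor):
    """Return the next siblings of `anchor` up to, but not including,
    the next anchor, as a list of Tags and NavigableStrings."""
    elt = anchor.next_sibling
    results = []
    while elt is not None and not is_anchor(elt):
        results.append(elt)
        elt = elt.next_sibling
    return results


html = ('<a name="A1">Intro</a><p>one</p><p>two</p>'
        '<a name="A2">More</a><p>three</p>')
soup = BeautifulSoup(html, 'html.parser')

# Hypothetical result structure: anchor name -> list of sibling elements.
sections = {}
for anchor in soup.find_all('a', {'name': re.compile('^A.*$')}):
    sections[anchor['name']] = get_siblings_to_next_anchor(anchor)

for name, elements in sections.items():
    print(name, [str(e) for e in elements])
```

Because the walk uses siblings, each anchor's own text ('Intro', 'More') and the text nested inside each `<p>` are not collected separately; each `<p>` comes back as one Tag.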


Good luck!
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

Re: beautifulSoup and .next iteration
United States
2007-04-15 13:41:15
Daniel,

It was kind of you to respond, and your response was a model of clarity.
You correctly surmised from my awkward framing of the question that what
I wanted was a list of sibling elements between one named anchor and the
next. My problem was, in part, that I still don't think in terms of
functional programming; thus, the function defs you proposed are very
helpful models for me. I do, however, still need to work out how to make
is_anchor() return true only if the anchor's name attribute satisfies a
given regex. Starting with your model, I'm sure I'll be able to figure
this out.
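One possible way to do that, sketched under the same assumptions (Python 3, current bs4 package): let the predicate take the compiled pattern and check the tag's `name` attribute with `Tag.get`, which returns None when the attribute is missing. The names `is_named_anchor` and `siblings_between`, and the sample HTML, are hypothetical and not part of BeautifulSoup.

```python
import re

from bs4 import BeautifulSoup, NavigableString


def is_named_anchor(elt, pattern):
    """True if `elt` is an <a> Tag whose name attribute matches `pattern`."""
    if isinstance(elt, NavigableString) or elt.name != 'a':
        return False
    anchor_name = elt.get('name')
    # Anchors without a name attribute (plain links) never match.
    return anchor_name is not None and pattern.search(anchor_name) is not None


def siblings_between(anchor, pattern):
    """Collect next siblings until the next anchor whose name matches."""
    results = []
    elt = anchor.next_sibling
    while elt is not None and not is_named_anchor(elt, pattern):
        results.append(elt)
        elt = elt.next_sibling
    return results


pattern = re.compile('^A.*$')
html = ('<a name="A1">Intro</a><p>see <a href="#x">link</a></p>'
        '<a name="skip"></a><p>still A1</p><a name="A2">Next</a><p>end</p>')
soup = BeautifulSoup(html, 'html.parser')

start = soup.find('a', {'name': 'A1'})
print([str(e) for e in siblings_between(start, pattern)])
```

Here the `<a name="skip">` anchor does not match `^A.*$`, so it is collected like any other sibling, and the walk only stops at `<a name="A2">`.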

Thanks so much for your time!

Jon

