On Tuesday 04 April 2006 21:44, Mock, George wrote:
> Perl-XML Experts,
>
> I'm parsing large XML files, so I have to use an event based parser.
> Sticking with SAX parsers for now. I have found some really good
> articles on using SAX parsers. They helped me write a filter that
> prunes one tag, and all its related data and sub-tags, from a large
> XML file. Great! Now I need to do something more sophisticated.
>
> Here is a simple XML fragment ...
>
> <ofd>
>   <dwarf>
>     <section>
>       ... many tags later ...
>       <name>fred</name>
>       ...
>     </section>
>     <section>
>       ...
>       <name>wilma</name>
>       ...
>     </section>
>     <section>
>       ...
>       <name>pebbles</name>
>       ...
>     </section>
>     ...
>   </dwarf>
> </ofd>
>
> I want to keep the section named "wilma" and prune the rest of them.
> These sections are very large, thus somehow buffering in memory is not
> an option.
>
> The decision to keep or prune a section cannot be made until the name
> is seen. But by the time the name is seen, many parts of the section
> have already been processed. It seems I need to somehow lookahead
> and know the name of section just as the <section> tag is being
> processed.
> I can't figure out how to do that.
>
> I'd appreciate any pointers to articles, tutorials, whatever, that do
> something similar with SAX parsers.
>
> One limitation is that I cannot change the XML (like add a name
> attribute to <section>). It is auto-generated from a tool that has
> already been released.
>
> A more unfortunate limitation ... I am limited to modules that can
> easily be installed via "ppm". Why? I'd be happy to tell you. But
> it is pretty boring, and there is nothing I can do about it. Don't
> worry with this limitation for the moment. Just know this could be
> why I may not be able to use your favorite module.

Hi Mr. Mock!

Well, the only idea I came up with is to process the XML twice using SAX.
On the first pass, keep a count of the sections and note the number of the
section in which the right <name> tag appears. Then start over, and once
you reach that section number, process it and keep the parts of it that
you need. Neither pass has to hold a whole section in memory, and this
solution needs nothing except SAX.

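
The two passes described above could be sketched roughly as follows. This is
only an illustration, written with Python's standard xml.sax module because
it needs no extra installs; Perl's XML::SAX handlers offer the equivalent
start_element, end_element and characters callbacks. The names SectionFinder,
SectionPruner and keep_section are made up for this sketch, and it assumes
that <name> is a direct child of <section> and that only outermost
<section> elements are counted.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class SectionFinder(xml.sax.ContentHandler):
    """Pass 1: find the 1-based number of the section whose <name> matches."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.count = 0      # outermost <section> elements seen so far
        self.depth = 0      # nesting depth inside the current section
        self.text = None    # text pieces of a direct <name> child (assumption)
        self.found = None   # number of the first matching section

    def startElement(self, name, attrs):
        if self.depth:
            self.depth += 1
            if self.depth == 2 and name == "name":
                self.text = []
        elif name == "section":
            self.count += 1
            self.depth = 1

    def characters(self, content):
        if self.text is not None:
            self.text.append(content)

    def endElement(self, name):
        if not self.depth:
            return
        if self.depth == 2 and self.text is not None:
            if self.found is None and "".join(self.text).strip() == self.wanted:
                self.found = self.count
            self.text = None
        self.depth -= 1


class SectionPruner(xml.sax.ContentHandler):
    """Pass 2: forward every event except those inside unwanted sections."""

    def __init__(self, keep, out):
        super().__init__()
        self.keep = keep
        self.out = out          # downstream handler, here an XMLGenerator
        self.count = 0
        self.depth = 0
        self.skipping = False

    def startDocument(self):
        self.out.startDocument()

    def endDocument(self):
        self.out.endDocument()

    def startElement(self, name, attrs):
        if self.depth:
            self.depth += 1
        elif name == "section":
            self.count += 1
            self.depth = 1
            self.skipping = self.count != self.keep
        if not self.skipping:
            self.out.startElement(name, attrs)

    def characters(self, content):
        if not self.skipping:
            self.out.characters(content)

    def endElement(self, name):
        if not self.skipping:
            self.out.endElement(name)
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.skipping = False   # the pruned section has ended


def keep_section(open_source, wanted, out):
    """Run both passes; open_source() must return a fresh byte stream."""
    finder = SectionFinder(wanted)
    with open_source() as stream:
        xml.sax.parse(stream, finder)
    if finder.found is None:
        raise ValueError("no section named %r" % wanted)
    pruner = SectionPruner(finder.found, XMLGenerator(out, encoding="utf-8"))
    with open_source() as stream:
        xml.sax.parse(stream, pruner)


if __name__ == "__main__":
    sample = (b"<ofd><dwarf>"
              b"<section><x/><name>fred</name></section>"
              b"<section><y/><name>wilma</name></section>"
              b"</dwarf></ofd>")
    result = io.StringIO()
    keep_section(lambda: io.BytesIO(sample), "wilma", result)
    print(result.getvalue())
```

For a real file, something like keep_section(lambda: open("big.xml", "rb"),
"wilma", sys.stdout) would do, with "big.xml" standing in for the actual
path. The first pass remembers only a counter and the text of one <name> at
a time, so memory use does not grow with the size of a section.
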
Regards,
Shlomi Fish
---------------------------------------------------------------------
Shlomi Fish shlomif iglu.org.il
Homepage: http://www.shlomifish.org/
95% of the programmers consider 95% of the code they did not write, in the
bottom 5%.