I'd make use of temp files:
Write everything up until the first <section> to one
temp file.
Starting with the first <section>, write to another.
While you're writing this you will discover whether it's
the right section ... if it's not, just delete the file and
wait for the next <section>.
When you hit the </dwarf> tag, concatenate the two
files (the header, then the section you kept) and then
append all the remaining XML to the result.
It's not fancy, but it will get the job done.
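For what it's worth, here is a minimal sketch of those steps. It is written against Python's standard-library xml.sax rather than a Perl SAX module, purely so that it runs without installing anything; the handler has the same start-element / characters / end-element shape a Perl SAX filter would have. The class name KeepOneSection is made up for illustration. It assumes sections are not nested, that any <name> inside a section identifies it, and that only the first matching section is kept. Whitespace between sections is dropped.

```python
import shutil
import tempfile
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class KeepOneSection(xml.sax.ContentHandler):
    """Hypothetical handler: spool events to temp files, keep one <section>."""

    def __init__(self, wanted, out):
        super().__init__()
        self.wanted = wanted    # text of the <name> that marks the section to keep
        self.out = out          # final destination, a text-mode file object
        # Temp file 1: everything before the first <section>.
        self.head = tempfile.TemporaryFile("w+", encoding="utf-8")
        self.section = None     # temp file 2: the section currently being read
        self.kept = None        # the section file we decided to keep
        self.cur = self.head    # where events are written right now
        self.keep = False       # has the current section matched?
        self.name_text = None   # collects text while inside <name>
        self.flushed = False    # True once </dwarf> has been reached

    def _write(self, text):
        if self.cur is not None:
            self.cur.write(text)

    def startElement(self, tag, attrs):
        if tag == "section" and not self.flushed:
            # Start a fresh temp file for this section.
            self.section = tempfile.TemporaryFile("w+", encoding="utf-8")
            self.cur = self.section
            self.keep = False
        elif tag == "name" and self.section is not None:
            self.name_text = []
        pairs = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        self._write(f"<{tag}{pairs}>")

    def characters(self, text):
        if self.name_text is not None:
            self.name_text.append(text)
        self._write(escape(text))

    def endElement(self, tag):
        if tag == "dwarf" and not self.flushed:
            self._flush()
        self._write(f"</{tag}>")
        if tag == "name" and self.name_text is not None:
            if "".join(self.name_text).strip() == self.wanted:
                self.keep = True
            self.name_text = None
        elif tag == "section" and self.section is not None:
            if self.keep and self.kept is None:
                self.kept = self.section
            else:
                self.section.close()  # wrong section: the temp file is deleted
            self.section = None
            self.cur = None           # ignore text until the next <section>

    def _flush(self):
        # Concatenate the header and the kept section, then stream the rest.
        for part in (self.head, self.kept):
            if part is not None:
                part.seek(0)
                shutil.copyfileobj(part, self.out)
                part.close()
        self.cur = self.out
        self.flushed = True
```

To run it on a file, open the destination in text mode and call xml.sax.parse(path, KeepOneSection("wilma", out)), where path is the input file. Only the section being read and the one kept ever sit on disk, and neither is held in memory as a whole.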
Forrest
not speaking for merrill corporation
> -----Original Message-----
> From: perl-xml-bounces listserv.ActiveState.com
> [mailto:perl-xml-bounces listserv.ActiveState.com] On Behalf Of Mock, George
> Sent: Tuesday, April 04, 2006 1:44 PM
> To: perl-xml listserv.ActiveState.com
> Subject: lookahead pruning with event-based parsers
>
> Perl-XML Experts,
>
> I'm parsing large XML files, so I have to use an event based parser.
> Sticking with SAX parsers for now. I have found some really
> good articles on using SAX parsers. They helped me write a
> filter that prunes one tag, and all its related data and
> sub-tags, from a large XML file. Great! Now I need to do
> something more sophisticated.
>
> Here is a simple XML fragment ...
>
> <ofd>
> <dwarf>
> <section>
> ... many tags later ...
> <name>fred</name>
> ...
> </section>
> <section>
> ...
> <name>wilma</name>
> ...
> </section>
> <section>
> ...
> <name>pebbles</name>
> ...
> </section>
> ...
> </dwarf>
> </ofd>
>
> I want to keep the section named "wilma" and prune the rest of them.
> These sections are very large, thus somehow buffering in
> memory is not an option.
>
> The decision to keep or prune a section cannot be made until
> the name is seen. But by the time the name is seen, many
> parts of the section have already been processed. It seems I
> need to somehow lookahead and know the name of section just
> as the <section> tag is being processed.
> I can't figure out how to do that.
>
> I'd appreciate any pointers to articles, tutorials, whatever,
> that do something similar with SAX parsers.
>
> One limitation is that I cannot change the XML (like add a
> name attribute to <section>). It is auto-generated from a
> tool that has already been released.
>
> A more unfortunate limitation ... I am limited to modules
> that can easily be installed via "ppm". Why? I'd be happy
> to tell you. But it is pretty boring, and there is nothing I
> can do about it. Don't worry with this limitation for the
> moment. Just know this could be why I may not be able to use
> your favorite module.
>
> Many thanks!
>
> -George
>
>
> _______________________________________________
> Perl-XML mailing list
> Perl-XML listserv.ActiveState.com
> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
>