Thread: XML::SAX::PurePerl misses some CDATA end ]]> sequence




XML::SAX::PurePerl misses some CDATA end ]]> sequence
user name
2006-05-10 15:59:48
PurePerl reads buffered input.  If a CDATA end
sequence "]]>" is split between two read buffers,
it's missed.
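
A minimal sketch of the failure mode, using plain strings rather than the real reader. `naive_find_end` is a hypothetical stand-in for a search that looks at one buffer at a time; it is not PurePerl's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a per-buffer search: each chunk is checked
# on its own, so nothing is remembered between chunks.
sub naive_find_end {
    my @chunks = @_;
    for my $i (0 .. $#chunks) {
        return $i if $chunks[$i] =~ /\]\]>/;
    }
    return -1;    # terminator never seen
}

# Terminator wholly inside one chunk: found in chunk 0.
print naive_find_end('some text]]>', '<next/>'), "\n";

# Same bytes, but the chunk boundary falls inside "]]>": prints -1.
print naive_find_end('some text]', ']><next/>'), "\n";
```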

I attach a patch.

The same issue likely affects comment end sequences
"-->" and possibly other areas, but this patch is
limited to fixing the CDATA issue.

--
Matthew van Eerde

--- PurePerl.pm.original	2006-05-09 17:06:37.000000000 -0700
+++ PurePerl.pm	2006-05-09 17:07:49.000000000 -0700
@@ -308,22 +308,45 @@
     
     $self->start_cdata({});
     
+    my $split_end = "";
     $data = $reader->data;
     while (1) {
         $self->parser_error("EOF looking for CDATA section end", $reader)
             unless length($data);
         
+        # watch out for ]]> split between data segments
+        if ($split_end ne "")
+        {
+            if ($split_end eq "]" and $data =~ /^\]>/) {
+                $reader->move_along(2);
+                last;
+            }
+
+            if ($split_end eq "]]" and $data =~ /^>/) {
+                $reader->move_along(1);
+                last;
+            }
+
+            # rescue false positives on real ] and ]]
+            $self->characters({ Data => $split_end });
+            $split_end = "";
+        }
+
         if ($data =~ /^(.*?)\]\]>/s) {
             my $chars = $1;
             $reader->move_along(length($chars) + 3);
             $self->characters({Data => $chars});
             last;
         }
-        else {
-            $self->characters({Data => $data});
-            $reader->move_along(length($data));
-            $data = $reader->data;
+
+        $reader->move_along(length($data)); # do move past ] or ]] in the reader
+
+        if ($data =~ s/(\]\]?)\z//s) {
+            $split_end = $1;
         }
+
+        $self->characters({Data => $data}); # don't write ] or ]] (yet) to the writer
+        $data = $reader->data;
     }
     $self->end_cdata({});
     return 1;
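
The hold-back idea in the patch can also be sketched outside the parser. The version below is an illustration under assumptions, not the module's code: `scan_cdata` is a hypothetical helper that works on a plain list of chunks, and it prepends the held-back brackets to the next chunk before searching, so that a terminator spread over three chunks goes through the same search.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (text, index of the chunk in which "]]>"
# was completed), or (text, -1) if no terminator was found.
sub scan_cdata {
    my @chunks = @_;
    my $text = '';
    my $held = '';    # trailing "]" or "]]" not yet emitted
    for my $i (0 .. $#chunks) {
        my $data = $held . $chunks[$i];
        $held = '';
        if ($data =~ /^(.*?)\]\]>/s) {
            return ($text . $1, $i);
        }
        # Hold back up to two trailing brackets; they may start "]]>".
        if ($data =~ s/(\]\]?)\z//) {
            $held = $1;
        }
        $text .= $data;
    }
    return ($text . $held, -1);
}

my ($text, $where) = scan_cdata('x]', ']', '>rest');
print "text '$text' ended in chunk $where\n";
```

Prepending the held-back characters costs a small string copy per chunk, but it keeps a single match path instead of separate cases for "]" and "]]".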
_______________________________________________
Perl-XML mailing list
Perl-XMLlistserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
XML::SAX::PurePerl misses some CDATA end ]]> sequence
user name
2006-05-11 09:04:02
On Wednesday 10 May 2006 18:59, Matthew van Eerde wrote:
> PurePerl reads buffered input.  If a CDATA end
> sequence "]]>" is split between two read buffers,
> it's missed.
>
> I attach a patch.
>
> The same issue likely affects comment end sequences
> --> and possibly other areas, but this patch is
> limited to fixing the CDATA issue.
>

Very nice. Just a few comments:

1. You should probably file a bug for the patch in the module's Request
Tracker:

http://rt.cpan.org/Public/Dist/Display.html?Name=XML-SAX-PurePerl

That way, it won't be forgotten or lost in confusion.

2. It would be a good idea to supply a regression test script that will fail
on the bug before the patch, and succeed after the patch is applied.
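
A regression test along those lines might look like the sketch below. It rests on assumptions that are not confirmed in this thread: the refill size `$ASSUMED_BUFFER` is a guess and may not match the installed reader, and `CollectText` and `make_doc` are hypothetical helpers. The test is skipped when XML::SAX::PurePerl is not installed.

```perl
use strict;
use warnings;
use Test::More;
use File::Temp qw(tempfile);

# Minimal SAX handler that just concatenates character data.
{
    package CollectText;
    sub new        { return bless { text => '' }, shift }
    sub characters { my ($self, $chunk) = @_; $self->{text} .= $chunk->{Data} }
}

# ASSUMPTION: the reader refills in blocks of this many bytes. This value
# is a guess; adjust it to match the installed version.
my $ASSUMED_BUFFER = 4096;

# Build a document whose "]]>" starts $offset bytes before the assumed
# boundary. Returns the XML and the CDATA text the parser should report.
sub make_doc {
    my ($offset) = @_;
    my $head = '<doc><![CDATA[';
    my $pad  = 'x' x ($ASSUMED_BUFFER - $offset - length $head);
    return ($head . $pad . ']]></doc>', $pad);
}

SKIP: {
    skip 'XML::SAX::PurePerl not installed', 2
        unless eval { require XML::SAX::PurePerl; 1 };

    # Offset 1 splits "]" | "]>", offset 2 splits "]]" | ">".
    for my $offset (1 .. 2) {
        my ($xml, $expected) = make_doc($offset);
        my ($out, $file) = tempfile(UNLINK => 1);
        print {$out} $xml;
        close $out;

        open my $in, '<', $file or die "cannot reopen $file: $!";
        my $handler = CollectText->new;
        my $ok = eval {
            XML::SAX::PurePerl->new(Handler => $handler)->parse_file($in);
            1;
        };
        close $in;
        ok($ok && $handler->{text} eq $expected,
            "CDATA end found with ]]> at boundary offset $offset");
    }
}

done_testing();
```

If the real refill size is unknown, sweeping `$offset` over a wider range (or over several candidate buffer sizes) makes the test less dependent on that guess.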

I should note that the maintainer of this module is very busy, and has many
open bugs in his modules. It might be a good idea to suggest that he make you
a co-maintainer of this module.

Regards,

	Shlomi Fish

---------------------------------------------------------------------
Shlomi Fish      shlomifiglu.org.il
Homepage:        http://www.shlomifish.org/


95% of the programmers consider 95% of the code they did not write, in the
bottom 5%.