List Info

Thread: xmlwriter, attributes and encoding




xmlwriter, attributes and encoding
user name
2006-08-15 23:06:48
When using xmlwriter attributes , document encoding that has
been set is 
not passed to xmlAttrSerializeTxtContent, so character
references are 
written when needed.
The xmlDocPtr passed to xmlAttrSerializeTxtContent is only
used to check 
encoding, so I was wondering if the code in the attached
patch is fine 
to fix this (it fakes a document so any changes within 
xmlAttrSerializeTxtContent to manipulate the passed document
could 
possibly blow up - though i dont see any need/reason to ever
do this).

Rob

Index: xmlwriter.c
============================================================
=======
RCS file: /cvs/gnome/gnome-xml/xmlwriter.c,v
retrieving revision 1.36
diff -c -r1.36 xmlwriter.c
*** xmlwriter.c	27 Apr 2006 08:10:25 -0000	1.36
--- xmlwriter.c	11 Aug 2006 20:02:29 -0000
***************
*** 1425,1432 ****
                      break;
                  case XML_TEXTWRITER_ATTRIBUTE:
                      buf = NULL;
!                    
xmlAttrSerializeTxtContent(writer->out->buffer, NULL,
!                                                NULL,
content);
                      break;
  		default:
  		    break;
--- 1425,1441 ----
                      break;
                  case XML_TEXTWRITER_ATTRIBUTE:
                      buf = NULL;
! 					
!                     if (writer->out->encoder) {
!                         xmlDoc doc;
!                         memset(&doc, 0,
sizeof(xmlDoc));
!                         doc.encoding = (xmlChar
*)writer->out->encoder->name;
!                        
xmlAttrSerializeTxtContent(writer->out->buffer,
&doc,
!                                                    NULL,
content);
!                     } else {
!                        
xmlAttrSerializeTxtContent(writer->out->buffer, NULL,
!                                                    NULL,
content);
!                     }
                      break;
  		default:
  		    break;
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xmlgnome.org
http://mai
l.gnome.org/mailman/listinfo/xml
xmlwriter, attributes and encoding
user name
2006-08-16 08:49:40
On Tue, Aug 15, 2006 at 07:06:48PM -0400, Rob Richards
wrote:
> When using xmlwriter attributes , document encoding
that has been set is 
> not passed to xmlAttrSerializeTxtContent, so character
references are 
> written when needed.
> The xmlDocPtr passed to xmlAttrSerializeTxtContent is
only used to check 
> encoding, so I was wondering if the code in the
attached patch is fine 
> to fix this (it fakes a document so any changes within 
> xmlAttrSerializeTxtContent to manipulate the passed
document could 
> possibly blow up - though i dont see any need/reason to
ever do this).

  Hum, that's a bit nasty, because none of the fields in
the document
structure are initialized (except encoding), so it
internally use an
API based not on the API description but on its
implementation. If at
some point someone add something like entity lookup to
xmlAttrSerializeTxtContent or just check the document type,
suddently
the writer would crash in an unexpected way. 
  So fields should at least be zero initialized, which is
probably
a bit costly if done for every single attribute emitted. So
while passing
a document might be a good idea, it's better to create and
initialize that
document when getting the xmlTextWriterPtr. Maybe adding a
doc in 
struct _xmlTextWriter and making sure it's at least
minimally initialized
and properly freed with the xmlTextWriter is a good idea.
That sounds
far more clean to me, but a bit more work indeed 

Daniel

-- 
Red Hat Virtualization group http://redhat.com/v
irtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillardredhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ |
Rpmfind RPM search engine  http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xmlgnome.org
http://mai
l.gnome.org/mailman/listinfo/xml
xmlwriter, attributes and encoding
user name
2006-08-16 11:54:49
Daniel Veillard wrote:
> On Tue, Aug 15, 2006 at 07:06:48PM -0400, Rob Richards
wrote:
>   
>> When using xmlwriter attributes , document encoding
that has been set is 
>> not passed to xmlAttrSerializeTxtContent, so
character references are 
>> written when needed.
>> The xmlDocPtr passed to xmlAttrSerializeTxtContent
is only used to check 
>> encoding, so I was wondering if the code in the
attached patch is fine 
>> to fix this (it fakes a document so any changes
within 
>> xmlAttrSerializeTxtContent to manipulate the passed
document could 
>> possibly blow up - though i dont see any
need/reason to ever do this).
>>     
>
>   Hum, that's a bit nasty, because none of the fields
in the document
> structure are initialized (except encoding), so it
internally use an
> API based not on the API description but on its
implementation. If at
> some point someone add something like entity lookup to
> xmlAttrSerializeTxtContent or just check the document
type, suddently
> the writer would crash in an unexpected way. 
>   So fields should at least be zero initialized, which
is probably
> a bit costly if done for every single attribute
emitted. So while passing
> a document might be a good idea, it's better to create
and initialize that
> document when getting the xmlTextWriterPtr. Maybe
adding a doc in 
> struct _xmlTextWriter and making sure it's at least
minimally initialized
> and properly freed with the xmlTextWriter is a good
idea. That sounds
> far more clean to me, but a bit more work indeed 
>
> Daniel
>   

Was tying to keep the change as minimal as possible, so how
about this one.
A doc has been added to the struct and is always created
within 
xmlNewTextWriter(). This would at least provide a doc that
could be used 
for this type of thing, rather than checking if one exists
and having to 
create one if needed at different points within the
xmlwriter API. This 
doc is also completely independent of any doc that may exist
within a 
context.

Rob
Index: xmlwriter.c
============================================================
=======
RCS file: /cvs/gnome/gnome-xml/xmlwriter.c,v
retrieving revision 1.36
diff -c -r1.36 xmlwriter.c
*** xmlwriter.c	27 Apr 2006 08:10:25 -0000	1.36
--- xmlwriter.c	16 Aug 2006 10:14:40 -0000
***************
*** 91,96 ****
--- 91,97 ----
      char qchar;                 /* character used for
quoting attribute values */
      xmlParserCtxtPtr ctxt;
      int no_doc_free;
+ 	xmlDocPtr doc;
  };
  
  static void xmlFreeTextWriterStackEntry(xmlLinkPtr lk);
***************
*** 215,220 ****
--- 216,224 ----
                          "xmlNewTextWriter : out of
memory!\n");
          return NULL;
      }
+ 
+ 	ret->doc = xmlNewDoc(NULL);
+ 
      ret->no_doc_free = 0;
  
      return ret;
***************
*** 487,492 ****
--- 491,499 ----
          xmlFreeParserCtxt(writer->ctxt);
      }
  
+ 	if (writer->doc != NULL)
+ 		xmlFreeDoc(writer->doc);
+ 
      if (writer->ichar != NULL)
          xmlFree(writer->ichar);
      xmlFree(writer);
***************
*** 539,544 ****
--- 546,553 ----
      if (encoder != NULL) {
          writer->out->conv =
xmlBufferCreateSize(4000);
          xmlCharEncOutFunc(encoder,
writer->out->conv, NULL);
+ 		if (writer->doc)
+ 			writer->doc->encoding = xmlStrdup((xmlChar
*)writer->out->encoder->name);
      } else
          writer->out->conv = NULL;
  
***************
*** 1425,1432 ****
                      break;
                  case XML_TEXTWRITER_ATTRIBUTE:
                      buf = NULL;
!                    
xmlAttrSerializeTxtContent(writer->out->buffer, NULL,
!                                                NULL,
content);
                      break;
  		default:
  		    break;
--- 1434,1441 ----
                      break;
                  case XML_TEXTWRITER_ATTRIBUTE:
                      buf = NULL;
!                    
xmlAttrSerializeTxtContent(writer->out->buffer,
writer->doc,
!                                                    NULL,
content);
                      break;
  		default:
  		    break;
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xmlgnome.org
http://mai
l.gnome.org/mailman/listinfo/xml
xmlwriter, attributes and encoding
user name
2006-08-16 12:14:37
On Wed, Aug 16, 2006 at 07:54:49AM -0400, Rob Richards
wrote:
> Daniel Veillard wrote:
> >On Tue, Aug 15, 2006 at 07:06:48PM -0400, Rob
Richards wrote:
> >  
> >>When using xmlwriter attributes , document
encoding that has been set is 
> >>not passed to xmlAttrSerializeTxtContent, so
character references are 
> >>written when needed.
> >>The xmlDocPtr passed to
xmlAttrSerializeTxtContent is only used to check 
> >>encoding, so I was wondering if the code in the
attached patch is fine 
> >>to fix this (it fakes a document so any changes
within 
> >>xmlAttrSerializeTxtContent to manipulate the
passed document could 
> >>possibly blow up - though i dont see any
need/reason to ever do this).
> >>    
> >
> >  Hum, that's a bit nasty, because none of the
fields in the document
> >structure are initialized (except encoding), so it
internally use an
> >API based not on the API description but on its
implementation. If at
> >some point someone add something like entity lookup
to
> >xmlAttrSerializeTxtContent or just check the
document type, suddently
> >the writer would crash in an unexpected way. 
> >  So fields should at least be zero initialized,
which is probably
> >a bit costly if done for every single attribute
emitted. So while passing
> >a document might be a good idea, it's better to
create and initialize that
> >document when getting the xmlTextWriterPtr. Maybe
adding a doc in 
> >struct _xmlTextWriter and making sure it's at
least minimally initialized
> >and properly freed with the xmlTextWriter is a good
idea. That sounds
> >far more clean to me, but a bit more work indeed

> >
> >Daniel
> >  
> 
> Was tying to keep the change as minimal as possible, so
how about this one.
> A doc has been added to the struct and is always
created within 
> xmlNewTextWriter(). This would at least provide a doc
that could be used 
> for this type of thing, rather than checking if one
exists and having to 
> create one if needed at different points within the
xmlwriter API. This 
> doc is also completely independent of any doc that may
exist within a 
> context.
> ***************
> *** 539,544 ****
> --- 546,553 ----
>       if (encoder != NULL) {
>           writer->out->conv =
xmlBufferCreateSize(4000);
>           xmlCharEncOutFunc(encoder,
writer->out->conv, NULL);
> + 		if (writer->doc)
> + 			writer->doc->encoding = xmlStrdup((xmlChar
*)writer->out->encoder->name);
>       } else
>           writer->out->conv = NULL;

  I would suggest to guard that a bit better as
    if ((writer->doc != NULL) &&
(writer->doc->encoding == NULL))

just in case writer->doc gets inherited in some way in
the future or if
a default encoding shows up.
Otherwise it looks fine to me, just run the regression tests
for good measure
before commiting, one never knows 

Daniel

-- 
Red Hat Virtualization group http://redhat.com/v
irtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillardredhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ |
Rpmfind RPM search engine  http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xmlgnome.org
http://mai
l.gnome.org/mailman/listinfo/xml
[1-4]

about | contact  Other archives ( Real Estate discussion Medical topics )