|
List Info
Thread: RE: EMPTY_DOC thread stability issues
|
|
| RE: EMPTY_DOC thread stability issues |

|
2007-09-04 16:50:17 |
Hi All,
Joining this thread on Vinh's recommendation.
I believe this issue stems from
https://issues.apache.org/jira/browse/XERCESJ-911, since large parts of
Muse use this functionality.
Regarding the ThreadLocal solution: it doesn't work to cache the
documents themselves. The sdk is very clear that only
DocumentFactory.getInstance is thread safe. You can use ThreadLocal with
DocumentBuilders, however. (Creating new DocumentBuilders is quite
expensive, but new documents are much cheaper.)
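A minimal sketch of the thread-local DocumentBuilder idea, using only
standard JAXP calls. The class name ThreadLocalDocuments and its
createDocument() method are hypothetical and are not part of Muse; the
sketch also assumes Java 8 or later for ThreadLocal.withInitial.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Hypothetical helper, not part of Muse: one DocumentBuilder per thread.
final class ThreadLocalDocuments {
    // Each thread lazily creates and then reuses its own builder.
    private static final ThreadLocal<DocumentBuilder> BUILDER =
        ThreadLocal.withInitial(ThreadLocalDocuments::newBuilder);

    private ThreadLocalDocuments() {}

    private static DocumentBuilder newBuilder() {
        try {
            // The factory is created per thread too, so it is never shared.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable JAXP parser", e);
        }
    }

    /** Returns a new, empty Document; the Document itself is not cached. */
    static Document createDocument() {
        return BUILDER.get().newDocument();
    }
}
```

Only the expensive builder is cached per thread. Every call hands out a
fresh Document, so no Document is shared unless the caller passes it to
another thread.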
The solution below (combined with thread-local DocumentBuilders) should
resolve both the WRONG_DOCUMENT errors and the 911 issue from Xerces. I
read in the rest of the thread that Vinh had a simple test case with
multiple threads. I'm happy to do a quick refactor to put these two
approaches into the code base. Vinh, if you could send me the test case,
I'll get started on it tomorrow.
NB: whilst this "should work", it is still against the spirit of Xerces,
which is share-nothing. I'd also note that, due to the node cache
optimisation within Xerces, it's just not possible to be thread safe
when using NodeLists, even for reads.
copied from muse-user:
Hi,
From what I could work out from the list comments and the code, the
state is stored in the Document itself, and as cloneNode uses
Object.clone and then sets the doc, it won't work. Using importNode
helps a little (as it uses getFirstChild()/getNextSibling()), but it
just pushes the problem to a later stage.
getAllElements just does the same: it calls getChildNodes and then
forces the cache to be used. Deleting the cache just stops the null for
the parent; it doesn't stop incorrect nodes being returned or race
conditions with other nulls.
The simple thing is to stop using getChildNodes; from what I can see in
the code there isn't a need for it. The only place I've seen that
doesn't require all of the nodes anyway is in EndpointReference's
getNumberOfParameters, but that behaviour can be safely cached (it's not
used directly in the project anyway).
Looking further at the use cases in Muse, only the IsolationLayer
(because of the DeferredImpl) needs to call hasChildNodes() on the
document node, to force synchronizeChildren to be called (it's cached
from then on in each node). Then every other piece of code can simply
pointer chase with the getFirstChild()/getNextSibling() approach.
No synchronization required.
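The pointer-chasing approach can be sketched with standard DOM calls as
below. SiblingWalk, childElements() and sample() are hypothetical names
for illustration, not Muse code. The sketch only shows how to avoid
getChildNodes() and its NodeList; it does not by itself make a DOM
thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: walks children via sibling pointers only.
final class SiblingWalk {
    private SiblingWalk() {}

    /** Collects the direct child elements without calling getChildNodes(). */
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    /** Builds <root><a/>text<b/></root> as demo input. */
    static Element sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            root.appendChild(doc.createElement("a"));
            root.appendChild(doc.createTextNode("text"));
            root.appendChild(doc.createElement("b"));
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable JAXP parser", e);
        }
    }
}
```

The loop copies the children into a plain java.util.List, so callers
that previously indexed into a NodeList can index into the list instead.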
Re using other JAXP implementations: the DOM itself makes no statement
about even read thread safety. All of the JAXP impls suffer some form of
threading problem. Considering all of the problems with fighting against
namespace problems (much worse IMO), it makes sense to stick with the
devil you know :-<.
Again, for most of the Xerces releases, using
getFirstChild()/getNextSibling() is a seamless drop-in for the
getChildNodes problem. It's a shame that the Xerces guys are very much
against any form of thread safety (except application enforced). Going
with the standard approach, the only safe thing is to always serialize
to objects / keep the strings around, which would overly complicate the
code.
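For comparison, the "keep the strings around" approach might look
roughly like the sketch below, using the standard JAXP transformer and
parser APIs. XmlSnapshot, serialize() and parse() are hypothetical names,
not Muse code. The idea is that threads share only the immutable String,
and each thread parses its own private DOM from it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: share XML between threads as text, not as DOM nodes.
final class XmlSnapshot {
    private XmlSnapshot() {}

    /** Serializes an element to a String that can be shared freely. */
    static String serialize(Element element) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(element), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize element", e);
        }
    }

    /** Parses the text into a new DOM owned only by the caller. */
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

The cost is a serialize/parse round trip per consumer, which is the
extra complication mentioned above.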
I'm willing to give it a try and send you patched libs to try out (I
don't have a test case for this yet). If it's quick to reproduce, just
let me know. If it works out I can raise a Jira with the patches.
cheers,
Chris
------------------------------------------------------------
---------
To unsubscribe, e-mail: muse-dev-unsubscribe ws.apache.org
For additional commands, e-mail: muse-dev-help ws.apache.org
|
|
| RE: EMPTY_DOC thread stability issues |
  United States |
2007-09-05 02:51:53 |
Hi all,
After doing more testing, I can confirm the issue is indeed caused by
the Xerces limitation.
Both a Document and a Node/Element are *NOT* thread-safe.
The following shows that Document.importNode() is not thread-safe:
17:26:49,730 ERROR [STDERR] java.lang.NullPointerException
17:26:49,730 ERROR [STDERR] at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
17:26:49,730 ERROR [STDERR] at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
17:26:49,730 ERROR [STDERR] at org.apache.muse.ws.addressing.EndpointReference.<init>(EndpointReference.java:186)
17:26:49,730 ERROR [STDERR] at org.apache.muse.ws.notification.impl.SimpleNotificationMessage.setProducerReference(SimpleNotificationMessage.java:209)
17:26:49,730 ERROR [STDERR] at org.apache.muse.ws.notification.impl.SimpleSubscriptionManager.publish(SimpleSubscriptionManager.java:256)
The following shows that reading the children of a Node/Element is not
thread-safe:
00:16:53,400 ERROR [STDERR] java.lang.NullPointerException
00:16:53,400 ERROR [STDERR] at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
00:16:53,400 ERROR [STDERR] at org.apache.xerces.dom.ParentNode.item(Unknown Source)
00:16:53,400 ERROR [STDERR] at org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:883)
00:16:53,400 ERROR [STDERR] at org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:815)
00:16:53,400 ERROR [STDERR] at org.apache.muse.util.xml.XmlUtils.getAllElements(XmlUtils.java:791)
00:16:53,400 ERROR [STDERR] at org.apache.muse.util.xml.XmlUtils.getAllNamespaces(XmlUtils.java:974)
00:16:53,400 ERROR [STDERR] at org.apache.muse.util.xml.XmlUtils.getAllNamespaces(XmlUtils.java:977)
00:16:53,400 ERROR [STDERR] at org.apache.muse.util.xml.XmlUtils.getAllNamespaces(XmlUtils.java:940)
00:16:53,400 ERROR [STDERR] at org.apache.muse.ws.notification.impl.SimpleNotificationMessage.toXML(SimpleNotificationMessage.java:291)
00:16:53,415 ERROR [STDERR] at org.apache.muse.ws.notification.impl.SimpleNotificationMessage.toXML(SimpleNotificationMessage.java:239)
00:16:53,415 ERROR [STDERR] at org.apache.muse.ws.notification.remote.NotificationConsumerClient.notify(NotificationConsumerClient.java:97)
00:16:53,415 ERROR [STDERR] at org.apache.muse.ws.notification.impl.SimpleSubscriptionManager.publish(SimpleSubscriptionManager.java:267)
To work around these problems, we should follow these rules:
1) Do not use XmlUtils.EMPTY_DOC in multi-threaded processes.
2) Do not access the children of a Node/Element from multiple threads.
3) Do not create a Node/Element with XmlUtils.EMPTY_DOC as its owner if
the node's children will be accessed.
4) Do not create a Node/Element with XmlUtils.EMPTY_DOC as its owner if
the node itself can be accessed from multiple threads.
This will most likely affect a lot of Muse code. As Chris pointed out,
Oliver's ThreadLocal solution for the EMPTY_DOC won't work. EMPTY_DOC
can still be used, but any code which uses it must follow the rules
above. Also, the DocumentBuilderFactory can't be a singleton either,
since it's also not thread-safe.
NOTIFICATION FIXES:
The following updates finally fix the multi-thread notifications issue
for me:
1) Updated EndpointReference.java. Changed all XmlUtils.EMPTY_DOC
references to XmlUtils.createDocument(). This is because a resource's
EPR and its Element representation can be accessed from multiple
threads, so all rules above apply.
2) Updated SimpleNotificationMessage.toXML(). Changed the
XmlUtils.EMPTY_DOC reference to XmlUtils.createDocument(). This is
because a root Element is created and XmlUtils.getAllNamespaces(root) is
called on it, so rule #3 applies.
3) Updated NotificationConsumerClient.notify(NotificationMessage[]
messages) to not assume that messages[x].toXML() will return an Element
whose owner is EMPTY_DOC.
Changed these lines:
    Element notify = XmlUtils.createElement(WsnConstants.NOTIFY_QNAME);
    for (int n = 0; n < messages.length; ++n)
        notify.appendChild(messages[n].toXML(), true);
To these lines:
    Document doc = XmlUtils.createDocument();
    Element notify = XmlUtils.createElement(doc, WsnConstants.NOTIFY_QNAME);
    for (int n = 0; n < messages.length; ++n)
        notify.appendChild(doc.importNode(messages[n].toXML(), true));
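The same pattern, written against plain JAXP/DOM so it can be run
outside Muse, might look like the sketch below. ImportIntoFreshDocument,
newDocument() and wrap() are hypothetical names that stand in for the
XmlUtils calls above; this is an illustration of the import step, not
the actual Muse patch.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-in for the XmlUtils-based change described above.
final class ImportIntoFreshDocument {
    private ImportIntoFreshDocument() {}

    /** Creates a new empty Document instead of reusing a shared one. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable JAXP parser", e);
        }
    }

    /**
     * Wraps the given elements, whatever document owns them, in a new
     * wrapper element that lives in its own fresh Document.
     */
    static Element wrap(String wrapperName, Element... children) {
        Document doc = newDocument();
        Element wrapper = doc.createElement(wrapperName);
        doc.appendChild(wrapper);
        for (Element child : children) {
            // importNode makes a deep copy owned by doc; the source is untouched.
            wrapper.appendChild(doc.importNode(child, true));
        }
        return wrapper;
    }
}
```

Because importNode copies rather than moves, the wrapper no longer
depends on which document the message elements came from.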
So now, I no longer get any exceptions when generating notifications
simultaneously from multiple producers.
BUT, now I have another problem: notifications are still being lost
somehow, yet no errors appear at all! I have to investigate this further
to see what's causing it. If anyone encounters this problem, please post
to the group.
|
|
| RE: EMPTY_DOC thread stability issues |

|
2007-09-05 05:56:11 |
Hi Vinh,
Could you add your test to the Jira so we can download and
help out?
cheers,
Chris