Hello,
After a number of fixes and wild goose chases this bug can now be
considered fixed.

There were a number of issues that needed resolving.

First, the Java VM heap size was too small. It is now set to 500 MB via
the -Xms500m -Xmx500m arguments to the VM.

Second, the perm gen space of the VM, where Java stores its internal
references to classes, was far too small. This was the initial reason
for the crash. It is now set to 128 MB with a maximum size of 256 MB via
the -XX:PermSize=128m -XX:MaxPermSize=256m arguments to the VM. The perm
gen size can probably be reduced; more testing is needed to find the
right combination.

After those two issues were resolved the VM still crashed consistently
at around the 800th event added to a collection. Puzzling.

After I changed the garbage collector heuristics, the VM started
reporting an IOException: too many open files. This happens when the
number of open file handles exceeds the limit set for the user. In this
case the limit was 1024.

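A quick way to see how close a process is to that limit can be sketched in shell. This assumes Linux (a /proc filesystem); the function names count_open_fds, fd_limit and report_fd_usage are made up for illustration and are not part of the qacosmo setup.

```shell
#!/bin/sh
# Sketch: compare a process's open file descriptors with the open-file limit.
# Assumes Linux, where /proc/<pid>/fd holds one entry per open descriptor.

# Print the number of open descriptors for the given PID (0 if unreadable).
count_open_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Print the soft limit on open files for the current shell (e.g. 1024).
fd_limit() {
    ulimit -n
}

# Print a one-line report for the given PID.
report_fd_usage() {
    echo "pid $1: $(count_open_fds "$1") open of $(fd_limit) allowed"
}

# Example: inspect the current shell itself.
report_fd_usage $$
```

To watch the Cosmo VM instead, pass its PID in place of $$. Note that fd_limit reports the limit of the shell running the script, which matches the VM's limit only if both were started under the same settings.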
What was weird was that this issue only applied to the Sun JVM. Running
Cosmo under JRockit on qacosmo did not result in a crash. If it really
was an open file handle issue then both VMs should have reported an
error, since file handles are managed by the operating system.

As a test, the number of file handles allowed was increased from 1024 to
4096. Thanks to bear, who put in a good deal of effort to configure
qacosmo to raise the ulimit for file handles. It is not a trivial task.

Well, that test failed. Cosmo still crashed at around the 800th event
added to a calendar collection. So the issue was not file handles after
all.

I looked at the core dump logs from all the Cosmo crashes, and even
though some had exceeded memory limits and others had too many files
open, there was one constant between them. This line:

Current CompileTask:
opto:1623 org.apache.lucene.index.IndexReader$1.doBody()Ljava/lang/Object; (99 bytes)

Typing "org.apache.lucene.index.IndexReader$1.doBody()Ljava/lang/Object"
into Google revealed that others were also seeing this crash with Lucene
under the Sun JVM.

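The comparison across crash logs described above can be sketched as a small shell function. This is only an illustration: the function name current_compile_task is hypothetical, and it assumes each log has the "Current CompileTask:" header followed by the task on the next line, as in the excerpt.

```shell
#!/bin/sh
# Sketch: for each crash log, print the method the compiler was working on.
# Assumes the line after "Current CompileTask:" names the compile task.

current_compile_task() {
    for log in "$@"; do
        # Find the header, read the following line, print it, stop.
        task=$(awk '/Current CompileTask:/ { getline; print; exit }' "$log")
        echo "$log: $task"
    done
}
```

Assuming the crash logs use HotSpot's usual hs_err_pid<pid>.log naming, a call such as `current_compile_task hs_err_pid*.log` would list the compile task for every crash side by side, which makes a common culprit easy to spot.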
It turns out there is a bug in the 1.5 Sun JVM that causes it to crash
when optimizing (compiling) certain classes in a background thread. The
Lucene IndexReader is one of them. The crash is unique to the 1.5.0
release and is a known bug at Sun.

So what is the workaround? I found two ways to prevent Cosmo from
crashing.

First, by passing the "-Xcomp -Xbatch" arguments to the VM, which tell it
to compile all methods to native code (instead of only the frequently
used ones) and to do so in the foreground thread. By default the VM does
native compile optimizations in a background thread.

This fixed the issue but resulted in very long startup times, in excess
of one minute. The JVM was doing all the class-level optimizations
immediately at startup instead of incrementally while the server was
running.

The second way was much harder to figure out but produced the desired
results:

-XX:CompileCommand=exclude,org/apache/lucene/index/IndexReader$1,doBody

This command tells the VM not to compile the doBody method of the
IndexReader$1 class. With this flag the crash goes away.

So, putting it all together, here is how Cosmo now needs to be started:

/home/osafuser/jre1.5.0_06/bin/java -XX:PermSize=128m
-XX:MaxPermSize=256m -Xms500m -Xmx500m
-XX:CompileCommand=exclude,org/apache/lucene/index/IndexReader$1,doBody
-server -Dcom.sun.management.jmxremote
-Dical4j.unfolding.relaxed=true
-Dderby.stream.error.file=logs/derby.log
-Dderby.infolog.append=true
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.endorsed.dirs=/home/osafuser/cosmo/tomcat/common/endorsed
-classpath
:/home/osafuser/cosmo/tomcat/bin/bootstrap.jar:/home/osafuser/cosmo/tomcat/bin/commons-logging-api.jar
-Dcatalina.base=/home/osafuser/cosmo/tomcat
-Dcatalina.home=/home/osafuser/cosmo/tomcat
-Djava.io.tmpdir=/home/osafuser/cosmo/tomcat/temp
org.apache.catalina.startup.Bootstrap start

I have changed the .bashrc arguments for the osafuser on qacosmo to
leverage these new VM commands and am marking this bug fixed.

JAVA_HOME=/home/osafuser/jre1.5.0_06
export JAVA_HOME
GC_OPTS=''
#GC_OPTS='-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution '
COMPILE_OPTS=''
#COMPILE_OPTS='-XX:+PrintCompilation -XX:CICompilerCount=1'
FIX_OPTS=''
#FIX_OPTS='-Xcomp -Xbatch'
FIX_OPTS='-XX:CompileCommand=exclude,org/apache/lucene/index/IndexReader$1,doBody'
JAVA_OPTS="-XX:PermSize=128m -XX:MaxPermSize=256m -Xms500m -Xmx500m $GC_OPTS $COMPILE_OPTS $FIX_OPTS"
export JAVA_OPTS

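The single quotes around the FIX_OPTS value matter, because the class name contains "$1", which the shell treats as a positional parameter inside double quotes. The following sketch shows the difference; GOOD_OPTS, BAD_OPTS and DEMO_JAVA_OPTS are illustrative names only, not variables used on qacosmo.

```shell
#!/bin/sh
# Sketch: single vs. double quotes around a flag that contains "$1".

set --   # clear positional parameters so $1 is empty, as in a plain login shell

# Single quotes keep the text literal, so IndexReader$1 survives.
GOOD_OPTS='-XX:CompileCommand=exclude,org/apache/lucene/index/IndexReader$1,doBody'

# Double quotes expand $1 (empty here), silently dropping it from the name.
BAD_OPTS="-XX:CompileCommand=exclude,org/apache/lucene/index/IndexReader$1,doBody"

# Expanding a variable does not re-scan its value, so the literal $1 is kept
# when the flag is spliced into a larger double-quoted string.
DEMO_JAVA_OPTS="-Xms500m -Xmx500m $GOOD_OPTS"

echo "single quotes: $GOOD_OPTS"
echo "double quotes: $BAD_OPTS"
echo "combined:      $DEMO_JAVA_OPTS"
```

With the double-quoted form the VM would be told to exclude a method of IndexReader rather than IndexReader$1, so the exclusion would not match the method that triggers the crash.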
We still need to provide some user documentation detailing the
workaround for this Sun JVM bug. According to Sun the issue will be
resolved in the 1.5.0_07 release.

--Brian
Brian Kirsch - Cosmo Developer / Chandler Internationalization Engineer
Open Source Applications Foundation
543 Howard St. 5th Floor
San Francisco, CA 94105
http://www.osafoundation.org

John Townsend wrote:
> Hi Brian,
>
> Any news on this bug? I am hoping that we can put out a release
> candidate of Cosmo 0.3 by the middle of this week.
>
> Thanks,
> --> John
>
_______________________________________________
cosmo-dev mailing list
cosmo-dev lists.osafoundation.org
http://lists.osafoundation.org/mailman/listinfo/cosmo-dev