Thread: Bug 5643: Cosmo crashes after multiple large event PUTs




Bug 5643: Cosmo crashes after multiple large event PUTs
user name
2006-04-18 23:46:39
Hello,
After a number of fixes and wild goose chases, this bug can now be
considered fixed.

There were a number of issues that needed resolving.

First, the Java VM heap size was too small. It is now set to 500 MB via
the -Xms500m -Xmx500m arguments to the VM.

Second, the perm gen space of the VM, where Java stores its internal
references to classes, was way too small.
This was the initial reason for the crash. It is now set at 128 MB with
a max size of 256 MB via the -XX:PermSize=128m -XX:MaxPermSize=256m
arguments to the VM. The memory size of the perm gen space can probably
be reduced. More testing is needed to find the right combination.


After those two issues were resolved the VM still crashed
consistently 
at around the 800th event added to a collection.

Puzzling?

After changing the garbage collector heuristics the VM
started reporting 
an IOException: too many open files.

This error occurs when the number of open file handles exceeds the
limit allowed for the user. In this case the limit was 1024.
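
For anyone wanting to check these numbers on their own box, here is a minimal sketch, assuming a bash-like shell on Linux (it reads /proc, so it will not work elsewhere). The function names are made up for illustration, and the process id to inspect is whatever the JVM's pid happens to be:

```shell
# Soft limit on open file descriptors for processes started from this shell.
fd_limit() {
    ulimit -n
}

# Number of descriptors a process currently holds (Linux only: reads /proc).
# $1 is the process id; prints 0 if the directory cannot be read.
count_open_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Example: compare this shell's own usage against its limit.
echo "limit: $(fd_limit), in use by this shell: $(count_open_fds $$)"
```

Pointing count_open_fds at the JVM's pid while events are being added should show whether the count is really climbing toward the limit.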

What was weird was that this issue only applied to the Sun
JVM. Running 
Cosmo under Jrockit on qacosmo did not result in a crash.

If it really was an open file-handle issue then both VMs should have
reported an error, since file handles are managed by the
operating system.

As a test, the number of file-handles allowed was increased
to 4096 from 
1024.

Thanks to bear who put in a good deal of effort to configure
qacosmo to 
raise the ulimit for file-handles. It is not a trivial task.

Well, that test failed.

Cosmo still crashed at around the 800th event added to a
calendar 
collection. So the issue was not file-handles after all.

I looked at the core dump logs from all the Cosmo crashes
and even 
though some had exceeded memory limits and others had too
many files 
open there was one constant between them. This line:

Current CompileTask:
opto:1623      org.apache.lucene.index.IndexReader$1.doBody()Ljava/lang/Object; (99 bytes)

Typing "org.apache.lucene.index.IndexReader$1.doBody()Ljava/lang/Object"
into Google revealed that others were also seeing this crashing issue
with Lucene under the Sun JVM.
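
Spotting that common line by eye across many crash logs is tedious, so here is a rough sketch of how it could be pulled out mechanically. It assumes a POSIX shell with awk, and that each log has the method being compiled on the line right after "Current CompileTask:", as in the excerpt above. The function name and the hs_err_pid*.log file pattern in the usage line are assumptions, not something taken from the qacosmo setup:

```shell
# Print the method named on the line after each "Current CompileTask:" marker.
# The first field of that line (e.g. "opto:1623") is skipped; the second
# field is the method signature.
compile_tasks() {
    awk '/Current CompileTask:/ { if ((getline) > 0) print $2 }' "$@"
}

# Hypothetical usage: tally which methods were being compiled at crash time.
# compile_tasks hs_err_pid*.log | sort | uniq -c | sort -rn
```

If one method dominates the tally regardless of what the top of each log blames, that is a strong hint the compiler itself is the problem.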

It turns out there is a bug in the 1.5 Sun JVM that causes
it to crash 
when optimizing (compiling) certain Classes in a background
thread. The 
Lucene IndexReader is one of them. The crash is unique to
the 1.5.0 
release and is a known bug at Sun.

So what is the workaround?

I found two ways to prevent Cosmo from crashing.

First, by passing the "-Xcomp -Xbatch" arguments to the VM, which tell
it to compile all methods to native code rather than interpreting them
first, and to do the compilation in the foreground thread. By default
the VM does native compile optimizations in a background thread.

This fixed the issue but resulted in very long startup times
in excess 
of one minute. The JVM was doing all the Class level
optimizations 
immediately at startup instead of iteratively while the
server was running.

The second way was much harder to figure out but produced
the results 
desired.

-XX:CompileCommand=exclude,org/apache/lucene/index/IndexReader$1,doBody

This command tells the VM not to compile the doBody method of the
IndexReader$1 class.
With this flag the crash goes away.
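
One practical wrinkle when passing this flag from a shell: the class name contains "$1", which the shell treats as a positional parameter unless the flag is single-quoted. A minimal sketch of the quoting, assuming a bash-like shell; the heap arguments are only there to show the flag being combined with others:

```shell
# Single quotes keep the literal "$1" in the inner class name.
FIX_OPTS='-XX:CompileCommand=exclude,org/apache/lucene/index/IndexReader$1,doBody'

# Expanding $FIX_OPTS inside double quotes does not re-expand the "$1" it contains.
JAVA_OPTS="-Xms500m -Xmx500m $FIX_OPTS"

# Print the result to confirm the flag survived intact.
printf '%s\n' "$JAVA_OPTS"
```

With double quotes or no quotes on the first assignment, the shell would substitute its own (usually empty) $1 and the VM would be told to exclude a class named IndexReader, which does not match the anonymous inner class.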

So putting it all together here is how Cosmo now needs to be
started:

/home/osafuser/jre1.5.0_06/bin/java -XX:PermSize=128m
-XX:MaxPermSize=256m -Xms500m -Xmx500m
-XX:CompileCommand=exclude,org/apache/lucene/index/IndexReader$1,doBody
-server -Dcom.sun.management.jmxremote
-Dical4j.unfolding.relaxed=true
-Dderby.stream.error.file=logs/derby.log
-Dderby.infolog.append=true
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.endorsed.dirs=/home/osafuser/cosmo/tomcat/common/endorsed
-classpath
:/home/osafuser/cosmo/tomcat/bin/bootstrap.jar:/home/osafuser/cosmo/tomcat/bin/commons-logging-api.jar
-Dcatalina.base=/home/osafuser/cosmo/tomcat
-Dcatalina.home=/home/osafuser/cosmo/tomcat
-Djava.io.tmpdir=/home/osafuser/cosmo/tomcat/temp
org.apache.catalina.startup.Bootstrap start

I have changed the .bashrc arguments for the osafuser on
qacosmo to 
leverage these new VM commands and am marking this bug
fixed.

JAVA_HOME=/home/osafuser/jre1.5.0_06
export JAVA_HOME

GC_OPTS=''
#GC_OPTS='-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution'

COMPILE_OPTS=''
#COMPILE_OPTS='-XX:+PrintCompilation -XX:CICompilerCount=1'

FIX_OPTS=''
#FIX_OPTS='-Xcomp -Xbatch'
FIX_OPTS='-XX:CompileCommand=exclude,org/apache/lucene/index/IndexReader$1,doBody'


JAVA_OPTS="-XX:PermSize=128m -XX:MaxPermSize=256m -Xms500m -Xmx500m $GC_OPTS $COMPILE_OPTS $FIX_OPTS"
export JAVA_OPTS    


We still need to provide some user documentation detailing
the 
workaround for this Sun JVM bug.
According to Sun the issue will be resolved in the 1.5.0_07
release.




--Brian


Brian Kirsch - Cosmo Developer / Chandler Internationalization Engineer
Open Source Applications Foundation
543 Howard St. 5th Floor
San Francisco, CA 94105
http://www.osafoundation.org



John Townsend wrote:

> Hi Brian,
>
> Any news on this bug? I am hoping that we can put out a release
> candidate of Cosmo 0.3 by the middle of this week.
>
> Thanks,
> --> John
>
_______________________________________________
cosmo-dev mailing list
cosmo-devlists.osafoundation.org
http://lists.osafoundation.org/mailman/listinfo/cosmo-dev