Thread: Assembly_contd




Assembly_contd
user name
2006-07-07 00:46:10
Hi, thanks very much for the reply.

There seem to be two main streams in assembly apart from that of
TIGR: one is CAP3 and the other is PHRAP (http://www.phrap.org/).
For running jobs more efficiently, there is also PaCE (see
Kalyanaraman et al., IEEE Transactions on Parallel and Distributed
Systems, Vol. 14, No. 12, December 2003), and probably other
tools/algorithms as well.

We prefer the PHRAP suite of assembly tools. It is available for
several platforms, and apart from setting up a Solaris/SPARC SMP for
that purpose, we aimed to try it locally on a small Mac cluster. I do
not know whether PHRAP is compatible with SGE, and I'll find out as
soon as the cluster is up by trying a small set of sequences for
assembly. PHRAP runs OK with smaller sets of sequences on a single Mac.

The qstat -f command outputs:

queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@mac2                     BIP   0/2       -NA-     -NA-          au
----------------------------------------------------------------------------
all.q@mac1                     BIP   0/1       -NA-     -NA-          au
----------------------------------------------------------------------------
all.q@mac3                     BIP   0/2       -NA-     -NA-          au


As for the qmaster messages (there are many of them already!), I'll
try to figure out what's what...

Being able to run simple.sh would already be something, thanks!

- François Fauteux
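The advice quoted later in this thread is to read the last few lines of "qmaster/messages" and "qmaster/schedd/messages" under $SGE_ROOT/<cell>/spool/. The following is a minimal Python sketch of that step, equivalent to running tail on each file. It assumes that spool layout. It also assumes a cell named "default" unless $SGE_CELL says otherwise. The /opt/sge fallback for $SGE_ROOT is hypothetical.

```python
import os
from collections import deque
from pathlib import Path


def tail(path, n=10):
    """Return the last n lines of a text file, or [] if the file is missing."""
    try:
        with open(path, "r", errors="replace") as fh:
            # A deque with maxlen keeps only the final n lines while streaming.
            return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
    except FileNotFoundError:
        return []


def sge_message_logs(sge_root, cell="default"):
    """Build the qmaster and scheduler 'messages' paths.

    Assumes the $SGE_ROOT/<cell>/spool/qmaster/... layout mentioned in the
    thread; the cell name "default" is an assumption, check $SGE_CELL.
    """
    spool = Path(sge_root) / cell / "spool"
    return [spool / "qmaster" / "messages",
            spool / "qmaster" / "schedd" / "messages"]


if __name__ == "__main__":
    root = os.environ.get("SGE_ROOT", "/opt/sge")  # hypothetical fallback path
    cell = os.environ.get("SGE_CELL", "default")
    for log in sge_message_logs(root, cell):
        print(f"==> {log} <==")
        for line in tail(log, n=10):
            print(line)
```

The same tail() can be pointed at the messages file of any exec host that is misbehaving.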

**********


Have not had time to dig into this further, but I'm pulling these
app names from notes I took during a recent conversation about
assembly with someone...

The person was a heavy user of "CAP3" on a 32GB Solaris/SPARC
system and was looking at a program called "PCAP" as a way of running
across a cluster, since the 32GB machine was no longer performing
well on large assembly problems. EULER was also mentioned repeatedly
as a possible parallel, low-memory alternative.

CAP3: http://www.genome.org/cgi/content/full/9/9/868

PCAP and CAP3 seem to be from the same authors, but the main website
cited by Google seems to be down at the moment.

EULER looks pretty interesting and seems to live here:
http://nbcr.sdsc.edu/euler/


-Chris



On Jul 6, 2006, at 7:26 PM, Joe Landman wrote:

> Hi folks:
>
>   I was asked recently about genome assembly, and I gave the answer
> that Chris gave below. What bugs me is that I haven't followed the
> assembly work for a while, and all I remember are the TIGR tools.
>
>   Basically, what I am asking is whether people have built assembly
> algorithms that run on smaller-memory machines, or whether we still
> need large-memory SMPs to do the job. Do we need 64GB and up, or can
> we run some set of tools in under 16GB on lots of cluster nodes?
>
>   Thanks!
>
> Joe
>
> Chris Dagdigian wrote:
>> Hi François,
>> First off, what assembly program are you trying to run on your
>> cluster? Are you sure it is even capable of running in parallel
>> across many machines? Most people I know doing assembly are doing
>> it within a single large SMP system because shared memory is
>> easier/faster and (I think...) there is a relative lack of "true
>> parallel" assembly algorithms.
>> Here are some helpful official Grid Engine URLs:
>> - http://gridengine.sunsource.net (main site for the codebase)
>> - http://docs.sun.com/app/docs/coll/1017.3 (official documentation site)
>> I also run a site at http://gridengine.info but that may not be
>> helpful until you are at least up and running.
>> Some specific suggestions for you and your current setup:
>> (1) Ignore the 'qmon' GUI. You won't be using it anyway with your
>> assembler, and it just gets in the way of the more flexible
>> command-line programs. Stick with the Unix binaries like "qstat",
>> "qrsh" and "qsub". You won't be able to use SGE to its fullest
>> unless you are comfortable with the command-line programs.
>> (2) Send us (or me) the output of the command "qstat -f" when run
>> on your system. It may explain why you could not run the simple.sh
>> example job.
>> (3) Learn where your spool logs are; they will be invaluable in
>> debugging failures. The default location is something along the
>> lines of $SGE_ROOT/<cell>/spool/. In particular, you want to look
>> at the last few lines of "qmaster/messages",
>> "qmaster/schedd/messages" and any messages files belonging to exec
>> hosts that are not behaving.
>> Regards,
>> Chris
>> On Jul 6, 2006, at 4:42 PM, francois.fauteux2@mail.mcgill.ca wrote:
>>> Hi,
>>>
>>> I am totally new to grid computing. I recently tried to run a
>>> sequence assembly process on a G5 (8GB RAM), but the process
>>> required more memory.
>>>
>>> I installed N1SGE6 on 3 Mac G5s running 10.4.7 (connected through
>>> a router; 13GB RAM altogether), and I would like to run the
>>> assembly process in parallel across the cluster, hoping that the
>>> memory resources would be sufficient for the process to complete.
>>>
>>> I would appreciate "for-dummies-fast-how-to" hints on how to
>>> configure the cluster and submit the job properly.
>>>
>>> I installed the master and hosts with default settings. A first
>>> try with examples/simple.sh returns (with qmon):
>>> No free slots for interactive job!
>>> while 5 CPUs are available.
>>>
>>> Any hints on how to properly configure the cluster, projects,
>>> queues and parallel environments, or how to use qsub with useful
>>> options for a quick start, would be greatly appreciated. Thanks.
>>>
>>> François
>>>
>>> _______________________________________________
>>> Bioclusters maillist  -  Bioclusters@bioinformatics.org
>>> https://bioinformatics.org/mailman/listinfo/bioclusters
>
> -- 
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman@scalableinformatics.com
> web  : http://www.scalableinformatics.com
> phone: +1 734 786 8423
> fax  : +1 734 786 8452
> cell : +1 734 612 4615

Assembly_contd
user name
2006-07-07 10:57:17
Hello,

This is your problem:

On Jul 6, 2006, at 8:46 PM, francois.fauteux2@mail.mcgill.ca wrote:

> The qstat -f command outputs:
>
> queuename                      qtype used/tot. load_avg arch          states
> ----------------------------------------------------------------------------
> all.q@mac2                     BIP   0/2       -NA-     -NA-          au
> ----------------------------------------------------------------------------
> all.q@mac1                     BIP   0/1       -NA-     -NA-          au
> ----------------------------------------------------------------------------
> all.q@mac3                     BIP   0/2       -NA-     -NA-          au


The reason you can't run jobs is that you have no available job
slots. The reason you have no job slots is that Grid Engine may not
be running on your three systems, or, if it is running, it is having
firewall, routing or nameserver issues.

The main indication here is the "au" entry in the states column for
each of your queue instances. State "au" means 'alarm' plus 'unknown'
(the host is effectively unreachable, or unheard from). It means
that the SGE qmaster process has not been receiving the periodic
load and status reports from the sge_execd daemons running on the
compute nodes.
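To spot this condition without reading the table by eye, a small Python sketch can scan "qstat -f" output. It flags queue instances whose state field contains 'a' or 'u'. The sketch assumes the column layout shown above, with queue instance names of the form queue@host and the states as the sixth whitespace-separated field. Those field positions are an assumption to check against your own qstat output.

```python
import shutil
import subprocess


def flag_unhealthy_queues(qstat_text):
    """Return {queue_instance: states} for instances showing 'a' or 'u'."""
    flagged = {}
    for line in qstat_text.splitlines():
        fields = line.split()
        # Queue-instance lines look like: name qtype used/tot load_avg arch [states];
        # headers, dashed separators and job lines have no '@' in the first field.
        if len(fields) < 5 or "@" not in fields[0]:
            continue
        states = fields[5] if len(fields) > 5 else ""
        if "a" in states or "u" in states:
            flagged[fields[0]] = states
    return flagged


if __name__ == "__main__":
    if shutil.which("qstat"):
        out = subprocess.run(["qstat", "-f"], capture_output=True, text=True).stdout
        for queue, states in flag_unhealthy_queues(out).items():
            print(f"{queue}: state '{states}', check sge_execd on that host")
```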

On working clusters this almost always means that SGE is simply not
running on the cluster node, and the fix is to restart SGE on the
nodes in question.

I'm not sure about the root cause on your system. Since this is a new
install, it could also be an artifact of a configuration problem or
an installation issue. Typically this would be caused by a firewall
blocking ports that SGE uses, a routing issue or (very, very common)
some sort of hostname or DNS lookup issue.

Hopefully this is just an "SGE is not running" issue. To check this,
log in to one of the compute nodes and run "ps ax | grep sge". You
should at least see an "sge_execd" daemon running on each compute
node. If you don't, simply run the SGE startup script and redo the
"qstat -f" command. If SGE starts up OK, the "au" status will
disappear and you will see real numbers instead of "-NA-".
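The same per-node check can be scripted. Below is a minimal Python sketch, assuming a Unix-like ps that accepts "ax", as on Mac OS X and Linux. It looks for an sge_execd entry in the process list. It skips lines belonging to a grep process, so it also gives the right answer on pasted "ps ax | grep sge_execd" output.

```python
import shutil
import subprocess


def execd_running(ps_output: str) -> bool:
    """Return True if the process listing shows an sge_execd daemon."""
    for line in ps_output.splitlines():
        # Ignore a 'grep sge_execd' line so it is not mistaken for the daemon.
        if "sge_execd" in line and "grep" not in line:
            return True
    return False


if __name__ == "__main__":
    if shutil.which("ps"):
        listing = subprocess.run(["ps", "ax"], capture_output=True, text=True).stdout
        if execd_running(listing):
            print("sge_execd is running on this node")
        else:
            print("sge_execd not found: run the SGE startup script, then re-check qstat -f")
```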


Assembly_contd
user name
2006-07-07 01:50:29
I have a wrapper for phrap that will take sequences and assemble them
in batches, in parallel, using SGE.

The wrapper is here:
http://phage.sdsu.edu/~rob/software/pPhrap.pl

It uses the Schedule::SGE interface available from CPAN, and you
will need to supply phrap, of course.

This was written for a specific assembly problem, but I think it may
work for others. Basically, it takes a FASTA file and a quality
scores file, and assembles them in user-defined subsets of
sequences. Then it takes all of the resulting sequences and can
assemble those too. The key is to randomize the input order of the
sequences each time.
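The shuffle-then-split idea can be illustrated with a short Python sketch. It is not pPhrap.pl and does not use Schedule::SGE. It only shows shuffling FASTA records and cutting them into batches, with an optional seed for reproducible runs. A real phrap run also needs the matching quality file, split in the same order. Each batch would still have to be written to disk and submitted to phrap through qsub, and none of that is shown here.

```python
import random


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def random_batches(records, batch_size, seed=None):
    """Shuffle the records, then split them into batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # new seed => new input order each round
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


def to_fasta(batch, width=60):
    """Serialize a batch back to FASTA text, wrapping sequences at `width`."""
    lines = []
    for header, seq in batch:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

For example, random_batches(read_fasta(text), 500) yields batches of up to 500 reads. Calling it again with a different seed groups the reads differently.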

The usual disclaimers apply: use with caution, not guaranteed under
any circumstances, the assemblies may be wrong, etc., etc. But it
may work, and they may not be.

Rob



On Jul 6, 2006, at 5:46 PM, francois.fauteux2@mail.mcgill.ca wrote:

> Hi, thanks very much for the reply.
>
> There seem to be two main streams in assembly apart from that of
> TIGR: one is CAP3 and the other is PHRAP (http://www.phrap.org/).
> For running jobs more efficiently, there is also PaCE (see
> Kalyanaraman et al., IEEE Transactions on Parallel and Distributed
> Systems, Vol. 14, No. 12, December 2003), and probably other
> tools/algorithms as well.
>
> We prefer the PHRAP suite of assembly tools. It is available for
> several platforms, and apart from setting up a Solaris/SPARC SMP for
> that purpose, we aimed to try it locally on a small Mac cluster. I do
> not know whether PHRAP is compatible with SGE, and I'll find out as
> soon as the cluster is up by trying a small set of sequences for
> assembly. PHRAP runs OK with smaller sets of sequences on a single Mac.
>
