
Thread: LAM: caused collective abort of all ranks




Re: LAM: caused collective abort of all ranks
Germany
2008-02-14 14:07:49
On Thu, 14 Feb 2008, fahad saeed wrote:

> node1 may run --------> ./binary -in file1 -out file1-output
> node2 may run --------> ./binary -in file2 -out file2-output

This is very much not MPI; it has nothing to do with message passing.

You have to look for a batch/queueing system like Torque, SGE, SLURM,
etc. Some of them (SGE for some time, Torque in development) have
support for running job arrays, a concept which fits very well with your
description above (the same job with different inputs and outputs). But
even if you don't use job arrays, they offer a lot more options to let
you decide where and what to run.
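A minimal sketch of the job-array approach, assuming SGE (Grid Engine) is the batch system. The job name `per-file-job`, the fallback to task 1 and the check for a missing `./binary` are illustrative assumptions, not part of the advice above. Each array task derives its input and output file names from its own task index, which reproduces the "node1 runs file1, node2 runs file2" pattern without any MPI:

```shell
#!/bin/bash
# Hypothetical SGE job-array script: one array task per input file.
#$ -N per-file-job
#$ -cwd
#$ -S /bin/bash
#$ -t 1-2

# SGE sets SGE_TASK_ID to the task index (1 or 2 here) in each task.
# Falling back to 1 lets the script be tried by hand outside SGE.
task="${SGE_TASK_ID:-1}"

# Map the task index onto the file names used in the question.
infile="file${task}"
outfile="file${task}-output"

if [ -x ./binary ]; then
  # Each task is an ordinary serial process; no message passing involved.
  ./binary -in "$infile" -out "$outfile"
else
  # ./binary is the placeholder program name from the question.
  echo "./binary not found; would run: ./binary -in $infile -out $outfile" >&2
fi
```

Submitted once with `qsub job.sh`, SGE queues two independent tasks, which it may place on different nodes, with `SGE_TASK_ID` set to 1 and 2 respectively. Adding more files only requires widening the `-t` range.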

-- 
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
E-mail: bogdan.costescuiwr.uni-heidelberg.de
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/

Re: LAM: caused collective abort of all ranks
United States
2008-02-14 14:12:43
Thanks a lot.



Re: LAM: caused collective abort of all ranks
United States
2008-02-14 14:27:13
Thanks a lot.



