

Propagating the environment in MPP

The model requires numerous environment variables to be set in order to run. In the "standard" version, the file ...parexe... is written; it sets up the environment and then runs the executable. Spiffy. On a Cray that works, and presumably it does under Scali/Myrinet too (who knows). Under MPICH it doesn't: jobs are simply rsh'd onto the remote nodes, and *don't* pick up a copy of the environment of their spawning process.

Andy Heaps and I solved this independently. His version (available where?) makes PE0 send the env variables to the other processors. My version makes all the processors read their env from a file. This file has to be created first and, in an irritating wrinkle, the rsh'd jobs don't even start up in the same directory, so the file needs a global name (i.e. a full path, not one relative to the run directory).
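To make the read-the-env-from-a-file idea concrete, here is a minimal C sketch. It is not the contents of wmc-set-env.c: the function name load_env_file and the one-NAME=VALUE-per-line file format are assumptions made purely for illustration, and how the Fortran side would call such a routine is not shown.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sketch only: load NAME=VALUE lines from a file into this process's
 * environment. The function name and the file format are assumptions,
 * not the actual contents of wmc-set-env.c.
 * Lines longer than the buffer are not handled specially.
 * Returns the number of variables set, or -1 if the file cannot be opened.
 */
int load_env_file(const char *path)
{
    char line[4096];
    int count = 0;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        char *eq;

        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] == '\0' || line[0] == '#')
            continue;                          /* skip blanks and comments */

        eq = strchr(line, '=');
        if (eq == NULL || eq == line)
            continue;                          /* not a NAME=VALUE line */

        *eq = '\0';                            /* split into name and value */
        if (setenv(line, eq + 1, 1) == 0)      /* 1 = overwrite existing */
            count++;
    }
    fclose(fp);
    return count;
}
```

Each process would call something like this once at startup, before the model looks at its environment. The path has to be a full one, since the rsh'd jobs do not start in the run directory.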

So my solution is to compile the job, then run it (STEP=4). This puts all the appropriate control files in all the right places (mostly in tmp), but the run crashes because the env variables file is missing. So, once it's crashed, you need to:

  1. in mods-for-the-executable in the umui, include wmc-set-env.f and wmc-set-env.c (one is a Fortran mod, one C; guess which...). Get these from the modsets page.
  2. go to the compile directory (usually a subdir of DATAW); untar the source; edit wsetenv.f (the file created by my mod) and change the filename in it (called something/test-env-RUNID) to whatever name you plan to use
  3. make -f makefile.compile (recompiles wsetenv.f, and rebuilds libum.a)
  4. make -f makefile.link (rebuilds exec)
  5. go to DATAW, and convert the *parexe* into the env vars file:

       make-test-env.pl *parexe* > ~/test-env-RUNID
    

    using whatever name you used in editing wsetenv.f. If you're only doing one run at a time, you can just leave wsetenv.f unedited and skip the recompile steps, as long as the env file is written to the default name.

  6. Run the model under mpich as "usual":
       mpirun -np n -machinefile mf -nolocal RUNID.exe 1>one 2>two </dev/null &
    
    Note that this has the slight advantage that *you* can run it by hand instead of from within a script.
  7. This works fine for NRUNs. CRUNs are a bit of a pain; you have to remember to save the temphist file.
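The conversion in step 5 is done by make-test-env.pl, whose internals are not shown on this page, and neither is the layout of the *parexe* file. As a rough sketch of the kind of transformation involved, and assuming (purely for illustration) that the script sets variables with lines of the form "export NAME=VALUE", a C helper might pull out the NAME=VALUE part like this. The function name export_to_env_line is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch only. ASSUMPTION: the script sets variables with lines like
 *     export NAME=VALUE
 * The real *parexe* layout may differ, and this is not make-test-env.pl.
 * On a match, writes "NAME=VALUE" to out and returns 1; otherwise returns 0.
 */
int export_to_env_line(const char *line, char *out, size_t outsz)
{
    static const char kw[] = "export";
    const size_t kwlen = sizeof kw - 1;
    size_t len;

    assert(line != NULL && out != NULL);

    while (*line == ' ' || *line == '\t')
        line++;                                  /* skip indentation */
    if (strncmp(line, kw, kwlen) != 0)
        return 0;                                /* not an export line */
    line += kwlen;
    if (*line != ' ' && *line != '\t')
        return 0;                                /* e.g. "exported=1" */
    while (*line == ' ' || *line == '\t')
        line++;

    len = strcspn(line, "\r\n");                 /* ignore the line ending */
    if (len == 0 || line[0] == '=' || memchr(line, '=', len) == NULL)
        return 0;                                /* plain "export NAME" */
    if (len + 1 > outsz)
        return 0;                                /* would not fit in out */

    memcpy(out, line, len);
    out[len] = '\0';
    return 1;
}
```

Applied to every line of the script, with the matches printed one per line, this would give a file of NAME=VALUE entries. Quoting and variable references in the values are left untouched here, which a real converter would probably need to deal with.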

And there we are. Did you use it? Did it work for you? Yes/No, let me know... wmc@bas.ac.uk

Page last modified: 16/4/2002   /   wmc@bas.ac.uk

© Copyright Natural Environment Research Council - British Antarctic Survey 2001