XMM
From e-Ciencia
Support for Users of XMM
The following members of the group currently run jobs on the Grid infrastructure:
- Xavier Barcons
- Francisco Carrera
- Maite Ceballos
- José Ramón Rodón
- Francesca Panessa
- Rodrigo Gil-Merino
- Jacobo Enebro
- Amalia Corral
- Angel Ruiz
If you are a member of the XMM group and want to join, first read the [General Support Documentation] and then contact Isabel Campos.
Submission of Serial Batch Jobs
The following describes how to submit serial batch jobs to the Grid resources at IFCA, whatever the project. The examples contain a Requirements line that forces the job to run locally on our machines, both for CPU and for storage.
Advanced users can try removing this restriction to send their jobs to the wider Grid.
- Simple Job without Storage Elements (transferred data under 20 MB)
To submit a serial job to the IFCA Grid batch queues, the job must be described in the language of the Grid batch system, the Job Description Language (JDL). For example:
# Mandatory attributes
Executable = "myexe";
StdOutput = "myexe.out";
StdError = "myexe.err";
# I/O files to be staged from/to the User Interface
InputSandbox = {"myexe","input.dat"};
OutputSandbox = {"myexe.out","myexe.err","myoutput.dat"};
Requirements= other.GlueCEUniqueID == "egeece01.ifca.es:2119/jobmanager-lcgpbs-planck";
This job:
1. Executes the binary myexe, which needs the file input.dat to work.
2. Writes STDOUT and STDERR to myexe.out and myexe.err.
3. Produces the output file myoutput.dat, which is returned through the OutputSandbox.
The amount of data that can be carried in the sandboxes is limited to a few MB, so few real scientific applications can work under such basic conditions. Nevertheless, we strongly suggest that users run simple dummy tests like this one successfully before moving on to more complicated scenarios.
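Before submitting, it can help to check locally that every file listed in the InputSandbox exists and that the total stays under the sandbox limit. The sketch below is a hypothetical helper, not part of the Grid middleware; the 20 MB default is an assumption taken from the heading of this subsection, and the real limit depends on the site.

```shell
# Hypothetical pre-submission check (not a Grid middleware command).
# Verifies that each file in the InputSandbox line of a JDL exists
# locally and that their total size is below a limit given in MB.
# The 20 MB default is an assumption; file names with spaces are not handled.
check_sandbox() {
    jdl=$1
    limit_mb=${2:-20}
    # Keep only what is between the braces, then drop quotes and commas
    files=$(grep '^[[:space:]]*InputSandbox' "$jdl" |
        sed 's/.*{//; s/}.*//; s/"//g; s/,/ /g')
    total=0
    for f in $files; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
        # wc -c reports the size in bytes; tr strips padding some systems add
        size=$(wc -c < "$f" | tr -d ' ')
        total=$((total + size))
    done
    limit=$((limit_mb * 1024 * 1024))
    if [ "$total" -gt "$limit" ]; then
        echo "sandbox too large: $total bytes (limit $limit bytes)" >&2
        return 1
    fi
    echo "sandbox OK: $total bytes"
}
```

For a JDL saved as job.jdl (a hypothetical file name), running `check_sandbox job.jdl && edg-job-submit job.jdl` submits only when the check passes.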
- Using the Storage Elements for massive Input/Output
In most real applications one has to deal with data sizes that do not fit within the sandbox limit. The procedure then has three steps:
1. Store the input data the program needs in the appropriate place on the Grid.
Every VO, including planck, has a dedicated storage directory under /grid. To access it, first set the variable LFC_HOST to the central EGEE file catalog host, then use the following commands:
[rodon@egeeui01 i2g-test]$ export LFC_HOST=lfcserver.cnaf.infn.it
[rodon@egeeui01 i2g-test]$ lfc-ls /grid/planck
ceballos
rodon
corral
Create a directory for the input and set its permissions according to your needs. For example, you might store all your maps in a directory such as XMMDATA and give read access to every planck member:
[rodon@egeeui01 i2g-test]$ lfc-mkdir /grid/planck/XMMDATA
[rodon@egeeui01 i2g-test]$ lfc-chmod 744 /grid/planck/XMMDATA
[rodon@egeeui01 i2g-test]$ lfc-ls -l /grid/planck/
drwxr--r--   0 102   106   0 Feb 15 16:15 XMMDATA
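The 744 passed to lfc-chmod follows the usual Unix octal convention: each digit is the sum of read (4), write (2) and execute (1), for owner, group and others in that order, which is why the listing shows drwxr--r--. The hypothetical helper below decodes such a mode, assuming standard Unix semantics. Note that under those semantics others also need the execute bit on a directory (for example 755) to open files inside it; whether the LFC enforces exactly the same rule is not covered here.

```shell
# Hypothetical helper: turn an octal mode such as 744 into the rwx
# string printed by lfc-ls -l (without the leading file-type letter).
# Assumes standard Unix permission semantics.
mode_to_rwx() {
    rest=$1
    out=""
    while [ -n "$rest" ]; do
        d=${rest%"${rest#?}"}   # first remaining digit
        rest=${rest#?}          # drop it from the string
        # Test the read (4), write (2) and execute (1) bits of this digit
        [ $((d & 4)) -ne 0 ] && out="${out}r" || out="${out}-"
        [ $((d & 2)) -ne 0 ] && out="${out}w" || out="${out}-"
        [ $((d & 1)) -ne 0 ] && out="${out}x" || out="${out}-"
    done
    echo "$out"
}
```

For example, `mode_to_rwx 744` prints rwxr--r--, matching the listing, while `mode_to_rwx 755` prints rwxr-xr-x.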
Next, copy a large data file from your home directory on the User Interface to the Grid, registering it under your personal directory in the catalog (here /grid/planck/rodon):
lcg-cr --vo planck -l lfn:/grid/planck/rodon/mytarball.tar file:///home/rodon/mytarball.tar
The file mytarball.tar is now accessible to jobs running on the Grid through its logical file name, lfn:/grid/planck/rodon/mytarball.tar.
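The local side of lcg-cr and lcg-cp is a file: URL carrying an absolute path, which is why the example above reads file:///home/rodon/mytarball.tar with three slashes. A small hypothetical helper can build that URL from a relative path, so the command does not depend on typing the full path by hand.

```shell
# Hypothetical helper: build a file: URL with an absolute path,
# as used for the local side of lcg-cr and lcg-cp.
to_file_url() {
    case $1 in
        /*) path=$1 ;;          # already absolute
        *)  path=$(pwd)/$1 ;;   # make it absolute from the current directory
    esac
    echo "file://$path"
}
```

Used from /home/rodon, `lcg-cr --vo planck -l lfn:/grid/planck/rodon/mytarball.tar "$(to_file_url mytarball.tar)"` expands to the command shown above.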
2. Write a shell script which, launched from the JDL, makes the input available to the job and stores the output on a Storage Element once the job has completed.
In practice this means copying the required files from a Storage Element to the worker node that carries out the calculation before the program starts. When the job completes, one option is to tar/gzip the results and copy them to a Storage Element. Here is an example script that does this:
cat my-script.sh
#!/bin/sh
# Debug info
echo "| Execution start: `date` | Host: `hostname` | User: `whoami` | Path: `pwd` |"
# Remember the directory the job starts in on the worker node
WORKDIR=`pwd`
# First get the input of the simulation in place
lcg-cp --vo planck lfn:/grid/planck/XMMDATA/data.tar.gz file:///tmp/map.tar.gz
# Create a run directory (not mandatory, but keeps things tidy)
mkdir RUN_febrero15
cp /tmp/map.tar.gz RUN_febrero15/.
cp executable RUN_febrero15/.
cd RUN_febrero15/
tar xzvf map.tar.gz
rm map.tar.gz
# The sandbox transfer may not preserve the execute permission
chmod +x executable
./executable
# The program has ended: tar the result and register it on a Storage Element
cd $WORKDIR
tar czvf run_febrero15.tgz RUN_febrero15/*
lcg-cr --vo planck -l lfn:/grid/planck/rodon/run_febrero15.tgz file://$WORKDIR/run_febrero15.tgz
# Debug info
echo "| Execution end: `date` |"
The result is now accessible from the User Interface the user is logged into, and can be retrieved with lcg-cp:
lcg-cp --vo planck lfn:/grid/planck/rodon/run_febrero15.tgz file://$HOME/gridruns/run_febrero15.tgz
3. Finally, write the JDL description of the job. Note that the Executable is now the script my-script.sh, while the program itself (executable) travels in the InputSandbox:
# Mandatory attributes
Executable = "my-script.sh";
StdOutput = "my-script.out";
StdError = "my-script.err";
# Environment variables
Environment = {"LFC_HOST=lfcserver.cnaf.infn.it"};
# I/O files to be staged from/to the User Interface
InputSandbox = {"my-script.sh","executable"};
OutputSandbox = {"my-script.out","my-script.err"};
Requirements = other.GlueCEUniqueID == "egeece01.ifca.es:2119/jobmanager-lcgpbs-ifusion";
The command edg-job-submit submits this job to the IFCA resources, using the File Catalog set in LFC_HOST as the reference for file staging.
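Since the JDL for a wrapper script always follows the same pattern, it can be generated rather than edited by hand. The sketch below is a hypothetical helper that writes a JDL of the form shown above; the CE identifier is passed as an argument because the right queue depends on your VO, and the LFC_HOST value is copied from the example.

```shell
# Hypothetical helper: write <script>.jdl for a wrapper script and the
# program it runs, following the JDL layout shown above.
# Arguments: wrapper script, program file, CE identifier for Requirements.
write_jdl() {
    script=$1
    exe=$2
    ce=$3
    base=${script%.sh}
    jdl=$base.jdl
    cat > "$jdl" <<EOF
Executable = "$script";
StdOutput = "$base.out";
StdError = "$base.err";
Environment = {"LFC_HOST=lfcserver.cnaf.infn.it"};
InputSandbox = {"$script","$exe"};
OutputSandbox = {"$base.out","$base.err"};
Requirements = other.GlueCEUniqueID == "$ce";
EOF
    echo "$jdl"
}
```

The file it prints can be passed straight to the submission command, for example `edg-job-submit "$(write_jdl my-script.sh executable <CE identifier>)"`.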
Submitting long jobs
An important point to consider is that the proxy of the user who submitted the job must remain valid for the whole time the job is expected to wait in the queue and to run.
By default the proxy is initialized for 12 hours, although it can be initialized for a longer period.
However, the more elegant way to keep a job from dying because the user proxy has expired is to have a proxy server renew the proxy automatically. To use this facility, do the following:
# Create and store a long-term proxy on the proxy server
[rodon@egeeui01 i2g-test]$ myproxy-init -s i2gpx01.ifca.es -d -n
Your identity: /C=ES/O=DATAGRID-ES/O=BIFI/CN=Jose_Ramon_Rodon_Ortiz
Enter GRID pass phrase for this identity:
Creating proxy .......................................... Done
Proxy Verify OK
Your proxy is valid until: Thu Feb 22 17:12:25 2007
A proxy valid for 168 hours (7.0 days) for user /C=ES/O=DATAGRID-ES/O=BIFI/CN=Jose_Ramon_Rodon_Ortiz now exists on i2gpx01.ifca.es.
The command myproxy-init has created a proxy valid for one week on the IFCA proxy server, i2gpx01.ifca.es. Next, add a line to the JDL job description telling the job where the proxy is stored:
MyProxyServer = "i2gpx01.ifca.es";
Once the job is done, destroy the long-term proxy: in general it is not safe to leave long-lived proxies that can impersonate you on the network.
[rodon@egeeui01 i2g-test]$ myproxy-destroy -s i2gpx01.ifca.es -d
Default MyProxy credential for user /C=ES/O=DATAGRID-ES/O=BIFI/CN=Jose_Ramon_Rodon_Ortiz was successfully removed.
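Whether a job needs the proxy server at all can be estimated with simple arithmetic against the 12-hour default lifetime mentioned above. The helper below is a hypothetical sketch; the queue and run times are your own estimates in whole hours.

```shell
# Hypothetical helper: succeed (exit 0) when the expected queue time plus
# run time, in whole hours, fits within the proxy lifetime (12 h by default).
# The comparison is strict, leaving a small safety margin.
proxy_is_enough() {
    queue_h=$1
    run_h=$2
    lifetime_h=${3:-12}
    [ $((queue_h + run_h)) -lt "$lifetime_h" ]
}
```

For instance, `proxy_is_enough 6 10 || echo "store a long-term proxy with myproxy-init"` prints the reminder, since 16 hours exceeds the default lifetime.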
Submission of Parallel MPI Batch Jobs
Dealing with Storage Elements
The following Storage Elements are currently available to the planck Virtual Organization:
[rodon@egeeui01 XMM]$ lcg-infosites --vo planck se
Avail Space(Kb)  Used Space(Kb)  Type  SEs
----------------------------------------------------------
105471484        4780676         n.a   se-ieg.bifi.unizar.es
193471324        923487          n.a   se.i2g.cesga.es
30540000000      7440000000      n.a   dpm.cyf-kr.edu.pl
1131455168       122804          n.a   i2gse01.ifca.es
1926285861       6972            n.a   dcache01.lip.pt
72982596         32840           n.a   i2g-se01.lip.pt
By default the Storage Element at IFCA is used. A different one can be specified with the -d flag:
lcg-cr --vo planck -d se.i2g.cesga.es -l lfn:/grid/planck/rodon/testing file:///home/rodon/FH/FeynHiggs
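When choosing a Storage Element for -d, the lcg-infosites listing above can be filtered automatically. The sketch below is a hypothetical helper that picks the SE with the most available space, assuming the column layout shown (available space, used space, type, SE name); the output format may differ between middleware versions, and free space is not the only sensible criterion (network distance to the worker nodes matters too).

```shell
# Hypothetical helper: read "lcg-infosites --vo planck se" output on stdin
# and print the name of the SE with the most available space.
# Assumes columns: Avail Space(Kb), Used Space(Kb), Type, SE name.
best_se() {
    awk '$1 ~ /^[0-9]+$/ && $1 + 0 > max { max = $1 + 0; se = $4 }
         END { if (se != "") print se }'
}
```

It can then feed the -d flag directly, for example `lcg-cr --vo planck -d "$(lcg-infosites --vo planck se | best_se)" -l lfn:/grid/planck/rodon/testing file:///home/rodon/FH/FeynHiggs`.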
