Grid admin guide
A guide to some common issues at the IFCA Grid site. See also the Troubleshooting page.
Yaim VO configurator
https://cic.gridops.org/yaimtool/yaimtool.py
Enable SWEtest VO
For more information have a look at http://swevo.ific.uv.es/vo/swetest/vo-swetest.html
Steps taken to support this VO at IFCA:
users.conf
23110:swetest001:23110:swetest:swetest::
23111:swetest002:23110:swetest:swetest::
23112:swetest003:23110:swetest:swetest::
23113:swetest004:23110:swetest:swetest::
23114:swetest005:23110:swetest:swetest::
23115:swetest006:23110:swetest:swetest::
23116:swetest007:23110:swetest:swetest::
23117:swetest008:23110:swetest:swetest::
23118:swetest009:23110:swetest:swetest::
23119:swetest010:23110:swetest:swetest::
23120:swetest011:23110:swetest:swetest::
23121:swetest012:23110:swetest:swetest::
23122:swetest013:23110:swetest:swetest::
23123:swetest014:23110:swetest:swetest::
23124:swetest015:23110:swetest:swetest::
23125:swetest016:23110:swetest:swetest::
23126:swetest017:23110:swetest:swetest::
23127:swetest018:23110:swetest:swetest::
23128:swetest019:23110:swetest:swetest::
23129:swetest020:23110:swetest:swetest::
23500:swetestsgm:23500,23110:swetestsgm,swetest:swetest:sgm:
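The twenty pool-account lines above follow a regular pattern, so they can be generated rather than typed. The following is a minimal sketch, assuming the UID range and naming shown above; the function name gen_pool_accounts is only illustrative, and the swetestsgm line still has to be added by hand.

```shell
#!/bin/sh
# Print pool-account lines in the users.conf layout used above:
#   UID:LOGIN:GID:GROUP:VO::
# Arguments: VO name, first UID, GID, number of accounts.
gen_pool_accounts() {
    vo=$1; base_uid=$2; gid=$3; count=$4
    i=1
    while [ "$i" -le "$count" ]; do
        uid=$((base_uid + i - 1))
        # %03d zero-pads the account index (001, 002, ...)
        printf '%d:%s%03d:%d:%s:%s::\n' "$uid" "$vo" "$i" "$gid" "$vo" "$vo"
        i=$((i + 1))
    done
}

gen_pool_accounts swetest 23110 23110 20
```

Redirect the output to a scratch file and review it before merging it into users.conf.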
site-info.def
VOS="... swetest ..."
(...)
VO_SWETEST_USERS="ldap://swevo.ific.uv.es/ou=lcg1,o=swetest,dc=swe,dc=lcg,dc=org"
VO_SWETEST_VOMS_SERVERS="'vomss://swevo.ific.uv.es:8443/voms/swetest?/swetest/'"
VO_SWETEST_SGM="ldap://swevo.ific.uv.es/ou=swadmin,o=swetest,dc=swe,dc=lcg,dc=org"
VO_SWETEST_QUEUES="swetest"
VO_SWETEST_SW_DIR=$VO_SW_DIR/swetest
VO_SWETEST_DEFAULT_SE=$SE_HOST
VO_SWETEST_STORAGE_DIR=$CLASSIC_STORAGE_DIR/swetest
VO_SWETEST_VOMS_EXTRA_MAPS=""
VO_SWETEST_VOMSES="'swetest swevo.ific.uv.es 14000 /C=ES/O=DATAGRID-ES/O=IFIC/CN=swevo.ific.uv.es swetest'"
VO_SWETEST_VOMS_POOL_PATH=""
SWETEST_GROUP_ENABLE="swetest"
groups.conf
"/VO=swetest/GROUP=/swetest/ROLE=lcgadmin":::sgm:
"/VO=swetest/GROUP=/swetest/ROLE=production":::prd:
"/VO=swetest/GROUP=/swetest":::
- Download http://swevo.ific.uv.es/vo/files/swevo.ific.uv.es-oct2006.pem into /etc/grid-security/vomsdir and reconfigure the affected nodes.
Update the certificates
When the certificate of a given machine is updated, it is necessary to install it not only in /etc/grid-security but also in /opt/glite/var/rgma/.certs/hostcert.pem (MON, CE, SE) and in /etc/tomcat5/ (MON).
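Because the same certificate has to land in several places, a small helper reduces the chance of forgetting one. This is only a sketch: the function name install_cert and the source path /root/new-hostcert.pem in the usage comment are assumptions, and any service restarts are left out.

```shell
#!/bin/sh
# Copy one certificate file to every destination given as an argument.
# A destination that is an existing directory receives the file under
# its original name. Stops at the first failed copy.
install_cert() {
    src=$1; shift
    for dest in "$@"; do
        cp -p "$src" "$dest" || return 1
        echo "installed $src -> $dest"
    done
}

# Example for a MON node (run as root; the source path is hypothetical):
# install_cert /root/new-hostcert.pem \
#     /etc/grid-security/hostcert.pem \
#     /opt/glite/var/rgma/.certs/hostcert.pem \
#     /etc/tomcat5/
```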
EGEE and Int.EU.Grid interoperability
See the Int.eu.grid and EGEE Interoperability at IFCA guide.
Accounting data export using NFS
We need to export /var/spool/pbs/server_priv/accounting from our PBS server (torque00) to our CEs (gridce01 and ingrid02; these names are temporary):
On torque00 edit /etc/exports and add:
/var/spool/pbs/server_priv/ gridce01.ifca.es(ro,no_root_squash)
/var/spool/pbs/server_priv/ ingrid02.ifca.es(ro,no_root_squash)
On both CEs edit /etc/fstab and add:
torque00.ifca.es:/var/spool/pbs/server_priv/ /var/spool/pbs/server_priv/ nfs ro 0 0
Create /var/spool/pbs/server_priv/ on the CEs first if it doesn't exist.
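When these edits are scripted, it helps to avoid adding the same line twice on a rerun. A minimal sketch, assuming a hypothetical helper named add_line_once; the commented usage repeats the fstab entry and mount point from above.

```shell
#!/bin/sh
# Append LINE to FILE only if that exact line is not already present,
# so the script can be rerun safely.
add_line_once() {
    line=$1; file=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# On each CE (as root):
# mkdir -p /var/spool/pbs/server_priv
# add_line_once 'torque00.ifca.es:/var/spool/pbs/server_priv/ /var/spool/pbs/server_priv/ nfs ro 0 0' /etc/fstab
# mount /var/spool/pbs/server_priv
```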
Passwordless SSH between CE/BATCH_SYSTEM and WNs
On the CE/BATCH_SYSTEM:
1. The following parameters have to be active in /etc/ssh/sshd_config:
IgnoreUserKnownHosts yes
HostbasedAuthentication yes
2. Create the file /opt/edg/etc/edg-pbs-shostsequiv.conf from the template:
cp /opt/edg/etc/edg-pbs-shostsequiv.conf.template /opt/edg/etc/edg-pbs-shostsequiv.conf
3. Edit /opt/edg/etc/edg-pbs-shostsequiv.conf, setting the NODES parameter to the hostnames of the CE, BATCH_SYSTEM (in case it is separated from the CE) and SE.
4. Delete /etc/ssh/shosts.equiv file (if it exists):
rm -f /etc/ssh/shosts.equiv
5. Run the following command to create the file /etc/ssh/shosts.equiv:
/opt/edg/sbin/edg-pbs-shostsequiv
6. Create the file /opt/edg/etc/edg-pbs-knownhosts.conf from the template:
cp /opt/edg/etc/edg-pbs-knownhosts.conf.template /opt/edg/etc/edg-pbs-knownhosts.conf
7. Edit /opt/edg/etc/edg-pbs-knownhosts.conf with the following information:
NODES = <hostnames of the CE, BATCH_SYSTEM (in case it is separated from the CE) and SE>
KEYTYPES = rsa1,rsa,dsa
KNOWNHOSTS = /etc/ssh/ssh_known_hosts
8. Delete /etc/ssh/ssh_known_hosts file (if it exists):
rm -f /etc/ssh/ssh_known_hosts
9. Run the following command to create the file /etc/ssh/ssh_known_hosts:
/opt/edg/sbin/edg-pbs-knownhosts
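Step 1 is easy to get wrong because sshd_config often ships with the relevant lines commented out. The sketch below checks that an uncommented line sets a given option; the function name sshd_option_is is an assumption, and the check is simplified (it does not model Match blocks or sshd's rule that the first value wins).

```shell
#!/bin/sh
# Succeed if FILE has an uncommented "KEY VALUE" line.
# Keywords are compared case-insensitively; commented lines never
# match because their first field starts with '#'.
sshd_option_is() {
    file=$1; key=$2; value=$3
    awk -v k="$key" -v v="$value" '
        tolower($1) == tolower(k) && $2 == v { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$file"
}

# Example:
# sshd_option_is /etc/ssh/sshd_config HostbasedAuthentication yes || echo "HostbasedAuthentication is not enabled"
# sshd_option_is /etc/ssh/sshd_config IgnoreUserKnownHosts yes || echo "IgnoreUserKnownHosts is not enabled"
```

Remember that sshd has to be restarted after sshd_config is changed.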
On the WNs:
Passwordless SSH between CE/BATCH_SYSTEM and WNs:
- Make sure every WN satisfies step 1, then follow steps 6 to 9 of the CE/BATCH_SYSTEM configuration above.
Passwordless SSH between WNs:
- Edit or create /etc/ssh/shosts.equiv with the CE, BATCH_SYSTEM (in case it is separated from the CE) and SE hostnames.
Testing
- Follow these steps to check that the nodes can communicate without asking for a password:
1. Log in as a non-root user:
su - cms001
2. Try to ssh to the same account on the node you want to check:
ssh gridce01
You should not be asked for a password.
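With many nodes, the manual test can be looped. The sketch below relies on the standard OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password. The function name check_passwordless and the SSH override variable are assumptions added for illustration. Run it as the non-root pool user, not as root.

```shell
#!/bin/sh
# Try a non-interactive login to each host and report the result.
# BatchMode=yes makes ssh fail rather than prompt for a password.
# The SSH variable lets another client command be substituted.
check_passwordless() {
    rc=0
    for host in "$@"; do
        if ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            rc=1
        fi
    done
    return $rc
}

# Example:
# check_passwordless gridce01 ingrid02
```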
Get your nodes monitored with GridIce
In some cases, such as a new WN installation with gLite 3.1 under SLC4, the GridIce sensor packages are not included by default, so those nodes are not monitored by the GridIce tool.
To avoid this situation, the following RPM packages must be installed on your nodes:
- edg-fabricMonitoring-2.5.4-4
- gridice-sensor-1.6.0-23
Install them from the CERN repository:
rpm -ivh http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/rhel30/RPMS.Release3.0/edg-fabricMonitoring-2.5.4-4.i386.rpm
rpm -ivh http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/rhel30/RPMS.externals/gridice-sensor-1.6.0-23.i386.rpm
Note: at the time of this installation we also had to install a dependency of edg-fabricMonitoring, the compat-libstdc++ library:
rpm -ivh http://linuxsoft.cern.ch/cern/slc4X/i386/SL/RPMS/compat-libstdc++-296-2.96-132.7.2.i386.rpm
Now it's time to configure the service that acts as the GridIce collector on this node. As a first step we need to place the daemon configuration file (edg-fmon-agent.conf) in /opt/edg/var/etc. To do so, copy the template file from the GridIce directory to that path:
cp /opt/gridice/monitoring/etc/edg-fmon-agent.conf-wn.template /opt/edg/var/etc/edg-fmon-agent.conf
Note: the gLite 3.1 WN package does not create the etc directory under /opt/edg/var/. If it is missing on your node, create it manually before copying the template:
mkdir /opt/edg/var/etc
Edit /opt/edg/var/etc/edg-fmon-agent.conf file by replacing LEMON-COLLECTOR with the complete hostname of your collector node (in our case the MonBox).
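The placeholder can also be replaced non-interactively. A minimal sketch follows; the function name set_collector is an assumption, and monbox.example.org in the usage comment is a made-up hostname that stands in for your MonBox.

```shell
#!/bin/sh
# Replace every occurrence of the LEMON-COLLECTOR placeholder in CONF
# with the collector's hostname. The result goes through a temporary
# file so a failed sed run leaves the original file untouched.
set_collector() {
    conf=$1; collector=$2
    tmp=$(mktemp) || return 1
    if sed "s/LEMON-COLLECTOR/$collector/g" "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return $rc
}

# Example (hypothetical hostname):
# set_collector /opt/edg/var/etc/edg-fmon-agent.conf monbox.example.org
```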
The second step in the node configuration deals with the services that will be monitored on this node. They are defined in a file called gridice-role-<NODE_TYPE>.cfg, where <NODE_TYPE> is the role of your node (worker-node, ce-access-node, se-access-node, ...). Go to the /opt/gridice/monitoring/etc directory, review the default services file for your node type and, once you are satisfied with it, create the following symbolic link:
ln -s /opt/gridice/monitoring/etc/gridice-role-worker-node.cfg /opt/gridice/monitoring/etc/gridice-role.cfg
Note: please check that every service definition contains the characters ^[\w,\/,-] like in the following example:
fmon-agent ^[\w,\/,-]*fmon-agent
Now we are ready to start edg-fmon-agent and enable the daemon at boot:
service edg-fmon-agent start
chkconfig edg-fmon-agent on
How to switch to another VOMS server?
- Go to /opt/glite/etc/vomses and edit the VO-related config file to point at the VOMS server you want to use.
- Internal note: look for valid config files in workspace/configs/voms.
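For reference, a vomses entry is a single line of five double-quoted fields: alias, host, port, server DN and VO name. The sketch below prints such a line using the swetest values from the site-info.def section of this page; the function name vomses_line is only illustrative.

```shell
#!/bin/sh
# Print one vomses entry: "alias" "host" "port" "server DN" "VO name".
vomses_line() {
    printf '"%s" "%s" "%s" "%s" "%s"\n' "$1" "$2" "$3" "$4" "$5"
}

vomses_line swetest swevo.ific.uv.es 14000 \
    /C=ES/O=DATAGRID-ES/O=IFIC/CN=swevo.ific.uv.es swetest
```

Compare the output with the existing file under /opt/glite/etc/vomses before overwriting anything.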
