I2G-EGEE Node Installation
EGEE
Worker Node (WN) Parallel Installation & Configuration Guide
This section lists the installation and configuration steps for EGEE Worker Nodes with gLite 3.1 running under Scientific Linux 4.5 at the IFCA site.
Since this task usually involves the management of several machines, we will present a parallel version of Worker Nodes installation and configuration.
We will need a head node from which we will run every instruction that follows. In our case this machine is cabezon.ifca.es.
The Worker Nodes are 5 machines with hostnames ingrid01 through ingrid05 (in the domain ifca.es).
1. Operating System installation
gLite version 3.1 runs on Scientific Linux 4, so we have installed the most recent update, SLC 4.5, for i386 (32-bit) machines, since it is the only supported architecture at the moment.
- OS installation repositories
The official repositories can be found in:
http://ftp.scientificlinux.org/linux/scientific/45/i386/
Besides, we run our own repository for OS installation:
http://toots.ifca.es/scientificlinuxcern/slc45/i386/
2. Setting the head node
- Define an environment variable with the hostnames of the machines (Worker Nodes) we are about to install:
NODE_SET="ingrid01 ingrid02 ingrid03 ingrid04 ingrid05"
- Set up SSH keys
Include all authorized keys in your OS installation so that you don't need to copy them.
In this case you only have to run the following commands to access the nodes without being asked for a password:
eval `ssh-agent`
ssh-add
Test access to nodes:
for nodename in $NODE_SET ; do ssh $nodename "hostname" ; done
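Every step in this guide repeats the same loop over NODE_SET. As a minimal sketch, the loop can be wrapped in a helper that also reports which nodes failed. The function name run_on_nodes and the SSH_CMD variable are hypothetical conveniences introduced here, not part of the original procedure:

```shell
# Hypothetical helper: run one command on every node in NODE_SET
# and report the nodes where it failed.
NODE_SET="${NODE_SET:-ingrid01 ingrid02 ingrid03 ingrid04 ingrid05}"
# SSH_CMD is an assumption added for dry runs: SSH_CMD=echo prints
# "<node> <command>" instead of executing anything remotely.
SSH_CMD="${SSH_CMD:-ssh}"

run_on_nodes() {
    failed=""
    for nodename in $NODE_SET ; do
        echo "== $nodename =="
        if ! $SSH_CMD "$nodename" "$1" ; then
            failed="$failed $nodename"
        fi
    done
    if [ -n "$failed" ] ; then
        echo "FAILED:$failed" >&2
        return 1
    fi
    return 0
}
```

For example, run_on_nodes "hostname" performs the same access test as the loop above, and a non-zero return status signals that at least one node could not be reached.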
3. Firewall configuration
- Check that your firewall leaves open the ports required by the services. Some of these ports are:
- 22 (for SSH)
- 15001/15002/15003 (for the Torque pbs_mom service)
- 30000:30101 (port range to allow callbacks for RFIO)
- We are going to overwrite the firewall definition (/etc/sysconfig/iptables) in every node, so it is advisable to back up those files first:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; cp -avf /etc/sysconfig/iptables /etc/sysconfig/iptables.0" ; done
- Write your firewall definition and copy it to every node. In our case the file is /root/FIREWALL/pro/iptables.wn:
for nodename in $NODE_SET ; do scp /root/FIREWALL/pro/iptables.wn $nodename:/etc/sysconfig/iptables ; done
- Restart the firewall in each node and configure it for automatic start:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; chkconfig iptables on ; service iptables stop ; service iptables start" ; done
- Finally check all nodes' firewall activity:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; iptables -L ; echo '---------------------------------'" ; done
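The guide does not show the contents of iptables.wn. As a rough sketch only, a rules file covering the ports listed in this section could be generated as follows. The chain policies, the loopback and ESTABLISHED rules, the TCP-only choice and the function name write_wn_firewall are all illustrative assumptions, not the actual IFCA definition:

```shell
# Hypothetical generator for a Worker Node rules file in the format
# read from /etc/sysconfig/iptables. Review it before use: a wrong
# rule set can lock you out of the node.
write_wn_firewall() {
    cat > "$1" <<'EOF'
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15001:15003 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 30000:30101 -j ACCEPT
COMMIT
EOF
}

# Example: write_wn_firewall /tmp/iptables.wn.example
```

Only TCP is opened here. If your Torque or RFIO setup also needs UDP, or if other services run on the nodes, the rule list has to be extended accordingly.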
4. Node Synchronization
For this step we need the latest version of ntp. If it is not installed, obtain it with APT or YUM.
- Add the following lines at the end of the NTP configuration file (/etc/ntp.conf):
restrict 130.206.3.166 mask 255.255.255.0 nomodify notrap noquery
server hora.rediris.es
To accomplish this, run the following instruction, which also backs up the default ntp.conf as /etc/ntp.conf.0:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; cp -a /etc/ntp.conf /etc/ntp.conf.0 ; echo -e \"restrict 130.206.3.166 mask 255.255.255.0 nomodify notrap noquery\nserver hora.rediris.es\" >> /etc/ntp.conf" ; done
Check the results of the previous command:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; tail -2 /etc/ntp.conf" ; done
- Now we need to configure the ntpd service:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; ntpdate hora.rediris.es ; chkconfig ntpd on ; service ntpd stop ; service ntpd start" ; done
Note that our time server is hora.rediris.es.
- Test correct NTP configuration:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; ntpq -p" ; done
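Because the NTP lines are appended with >>, running that loop twice duplicates them and overwrites the backup with an already modified file. A minimal sketch of an idempotent variant follows. The function name add_ntp_server is hypothetical, and the file path is passed as an argument so the function can be tried on a copy first:

```shell
# Hypothetical helper: append the two NTP lines only if the server
# line is not there yet, backing up the file before the first change.
add_ntp_server() {
    conf="$1"
    if grep -q '^server hora\.rediris\.es' "$conf" ; then
        echo "already configured: $conf"
        return 0
    fi
    cp -a "$conf" "$conf.0"
    printf '%s\n' \
        'restrict 130.206.3.166 mask 255.255.255.0 nomodify notrap noquery' \
        'server hora.rediris.es' >> "$conf"
}
```

One way to run it on the nodes, assuming bash on both the head node and the Worker Nodes, is to ship the function definition inside the SSH command: ssh $nodename "$(declare -f add_ntp_server) ; add_ntp_server /etc/ntp.conf".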
5. Java installation
Download and install the latest Java JDK version from Sun. As stated in [1], the version must be 1.5 or greater.
At the IFCA site there is a repository for Java releases:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; rpm -ivh http://cabezon.ifca.es/java/jdk-1_5_0_12-linux-i586.rpm" ; done
6. Node installation with YUM
- Certification Authorities (CA)
Create a temporary /tmp/lcg-ca.wn.repo file with the content:
[CA]
name=CAs
baseurl=http://linuxsoft.cern.ch/LCG-CAs/current
Copy this file to the YUM path in each node:
for nodename in $NODE_SET ; do scp /tmp/lcg-ca.wn.repo $nodename:/etc/yum.repos.d/lcg-ca.repo ; done
YUM can be slow. As an alternative, this repository can be used with APT:
http://grid-deployment.web.cern.ch/grid-deployment/yaim/repos/lcg-CA.list
Install the latest CA rpms to upgrade the Worker Node CA list:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; yum install -y lcg-CA" ; done
- Node packages
We want to install Worker Nodes with Torque support as a client, so we will create a new temporary file (/tmp/glite.wn.repo) with the location of those packages:
[glite-WN]
name=gLite 3.1 Worker Node
baseurl=http://linuxsoft.cern.ch/EGEE/gLite/R3.1/glite-WN/sl4/i386/
enabled=1
[glite-TORQUE_client]
name=Torque clients
baseurl=http://linuxsoft.cern.ch/EGEE/gLite/R3.1/glite-TORQUE_client/sl4/i386/
enabled=1
Copy the /tmp/glite.wn.repo file to each node, renaming it to /etc/yum.repos.d/glite.repo:
for nodename in $NODE_SET ; do scp /tmp/glite.wn.repo $nodename:/etc/yum.repos.d/glite.repo ; done
And now we are able to install them with YUM:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; yum install -y glite-WN glite-TORQUE_client" ; done
Note: at the time we installed these nodes, the previous instruction failed with the following error:
Processing Dependency: log4j >= 1.2.8 for package: gl
Error: Missing Dependency: log4j >= 1.2.8 is needed by package glite-rgma-stubs-servlet-java
When you get Java dependency errors like the one above, check that your jpackage repository is correctly set up. Take a look at the Generic Installation and Configuration Guide for your gLite version at https://twiki.cern.ch/twiki/bin/view/LCG/LcgDocs.
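A quick first check for the dependency error is whether any YUM repository file on a node mentions jpackage at all. The sketch below is an assumption-laden illustration: the function name find_repo is hypothetical, and "jpackage" as the string to search for is a guess at how the repository is named at your site:

```shell
# Hypothetical helper: list the .repo files in a directory that
# mention a given repository name (case-insensitive).
# Returns non-zero when no file matches.
find_repo() {
    name="$1"
    dir="${2:-/etc/yum.repos.d}"
    grep -il -e "$name" "$dir"/*.repo 2>/dev/null
}

# Example on a node: find_repo jpackage || echo "no jpackage repo found"
```

A match only shows that a repository definition exists. It does not prove that the repository is enabled or reachable, so the gLite guide referenced above remains the place to confirm the correct definition.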
7. Node configuration with YAIM
- Copy YAIM site configuration file
Create/copy site-info.def, users.conf, groups.conf, wn-list.conf and the vo.d directory in /root/INSTALL/EGEE/yaim/siteinfo/ and check the site-info.def variables in order to properly configure the nodes:
YAIM_VERSION=3.1.0-3
JAVA_LOCATION="/usr/java/jdk1.5.0_12"
LCG_REPOSITORY="'baseurl=http://glitesoft.cern.ch/EGEE/gLite/R3.1/'"
REPOSITORY_TYPE="yum"
In addition, it is necessary to include an extra variable, BATCH_SERVER, to avoid errors in the TORQUE_client configuration. Its value is the batch system's hostname:
BATCH_SERVER=torque00.$MY_DOMAIN
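Since a missing variable only shows up later as a YAIM error on every node, it can help to check the file on the head node before copying it. This is a minimal sketch, assuming each variable is assigned at the start of a line. The function name check_siteinfo is hypothetical and the list covers only the variables named in this section:

```shell
# Hypothetical pre-flight check: report which of the variables
# mentioned in this guide are not assigned in a site-info.def file.
check_siteinfo() {
    file="$1"
    rc=0
    for var in YAIM_VERSION JAVA_LOCATION LCG_REPOSITORY \
               REPOSITORY_TYPE BATCH_SERVER ; do
        if ! grep -q "^${var}=" "$file" ; then
            echo "missing: $var"
            rc=1
        fi
    done
    return $rc
}

# Example: check_siteinfo /root/INSTALL/EGEE/yaim/siteinfo/site-info.def
```

The check only looks for an assignment. It does not validate the values, so a wrong JAVA_LOCATION or BATCH_SERVER hostname still has to be verified by hand.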
Create /root/INSTALL in each node:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; mkdir -v /root/INSTALL" ; done
Now copy the whole directory (with the YAIM files) to every node:
for nodename in $NODE_SET ; do scp -r /root/INSTALL/EGEE $nodename:/root/INSTALL ; done
- Run YAIM configuration
With the following instruction (the -s option must point to the site-info file as it was copied to each node):
for nodename in $NODE_SET ; do ssh $nodename "hostname ; /opt/glite/yaim/bin/yaim -c -s /root/INSTALL/GRID/yaim/siteinfo/site-info_glite31_WN_GRID_070724_175500.def -n glite-WN -n TORQUE_client" ; done
Note that the previous instruction configures the nodes serially, one after another. To configure all of them simultaneously, run each SSH session in the background:
for nodename in $NODE_SET ; do ( ssh $nodename "hostname ; /opt/glite/yaim/bin/yaim -c -s /root/INSTALL/GRID/yaim/siteinfo/site-info_glite31_WN_GRID_070724_175500.def -n glite-WN -n TORQUE_client" &) ; done
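With the backgrounded loop above, the output of all nodes is interleaved on the terminal and the head node cannot tell when, or whether, each node finished. A possible refinement, sketched here under stated assumptions, keeps one log file per node and waits for every session. The function name run_parallel, the SSH_CMD variable and the default log directory are hypothetical:

```shell
# Hypothetical helper: run a command on all nodes at once, one log
# file per node, then wait for each session and report its result.
NODE_SET="${NODE_SET:-ingrid01 ingrid02 ingrid03 ingrid04 ingrid05}"
SSH_CMD="${SSH_CMD:-ssh}"   # assumption: overridable for dry runs

run_parallel() {
    cmd="$1"
    logdir="${2:-/tmp/yaim-logs}"
    mkdir -p "$logdir"
    pids=""
    for nodename in $NODE_SET ; do
        # stdin comes from /dev/null so background sessions never
        # compete for the terminal
        $SSH_CMD "$nodename" "$cmd" < /dev/null \
            > "$logdir/$nodename.log" 2>&1 &
        pids="$pids $!:$nodename"
    done
    rc=0
    for entry in $pids ; do
        pid="${entry%%:*}"
        node="${entry#*:}"
        if wait "$pid" ; then
            echo "$node: ok"
        else
            echo "$node: FAILED (see $logdir/$node.log)"
            rc=1
        fi
    done
    return $rc
}
```

Called with the YAIM command line as its first argument, it prints one result line per node once all sessions have ended, and the full YAIM output of each node stays available in its log file.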
8. SSH Configuration
See the Passwordless SSH section in the Grid Administration Guide.
9. Troubleshooting
- Check that the pbs_mom service is enabled.
- You should have the batch system's hostname in /var/spool/pbs/server_name, as well as in the $restricted and $pbsserver variables that can be found in the /var/spool/pbs/mom_priv/config file.
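The second check can be scripted. The sketch below assumes that mom_priv/config holds lines of the form "$pbsserver hostname" and "$restricted hostname" with the exact batch server hostname as the value. The function name check_mom_config and its optional path-prefix argument (useful for testing against a copy of the files) are hypothetical:

```shell
# Hypothetical check: verify that the batch server hostname appears
# in server_name and in the $pbsserver / $restricted entries.
check_mom_config() {
    server="$1"
    root="${2:-}"    # optional prefix, e.g. a directory holding copies
    spool="$root/var/spool/pbs"
    rc=0
    if [ "$(cat "$spool/server_name" 2>/dev/null)" != "$server" ] ; then
        echo "server_name does not contain $server"
        rc=1
    fi
    for key in '$pbsserver' '$restricted' ; do
        # field 1 must be the key, field 2 the batch server hostname
        if ! awk -v k="$key" -v s="$server" \
            '$1 == k && $2 == s { found = 1 } END { exit !found }' \
            "$spool/mom_priv/config" 2>/dev/null ; then
            echo "$key is not set to $server in mom_priv/config"
            rc=1
        fi
    done
    return $rc
}

# Example on a node, assuming MY_DOMAIN is ifca.es:
#   check_mom_config torque00.ifca.es
```

If your site uses a wildcard or several hosts in $restricted, the exact-match comparison is too strict and should be relaxed.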
Int.eu.grid
This section deals with the installation and configuration of Int.eu.grid Worker Nodes running under Scientific Linux 4.5 at the IFCA site. To ease administration, we will again work in parallel mode.
1. Follow the installation and configuration steps given in the EGEE section.
2. Create/update Int.eu.grid repository:
- Create a temporary /tmp/i2g.wn.list file with the content:
rpm http://savannah.fzk.de repository/i2g/production i386 noarch
- Copy this file to the APT path in each node:
for nodename in $NODE_SET ; do scp /tmp/i2g.wn.list $nodename:/etc/apt/sources.list.d/i2g.list ; done
3. Install Int.eu.grid i2g-WN meta-package with APT tools:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; apt-get update ; apt-get -y install i2g-WN" ; done
4. Check your installation
- Type the following command:
rpm -qa | grep i2g
and check that your system packages match the ones listed here:
i2g-openmpi-1.2.2-1
i2g-WN-1.2-0
i2g-profile-0.0.15-1
i2g-mpi-start-0.0.34-1
i2g-version-1.2-0
i2g-vomscerts-1.1.0-1
i2g-yaim-sysconfig-0.0.5-1
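Comparing the package list by eye on five nodes is error-prone, so the comparison can be scripted. This is a minimal sketch that reads the output of rpm -qa on standard input and prints the expected packages that are absent. The function name missing_pkgs and the EXPECTED_PKGS variable are hypothetical, and the match is on the exact name-version-release strings published above:

```shell
# Expected package list, copied from this guide.
EXPECTED_PKGS="i2g-openmpi-1.2.2-1 i2g-WN-1.2-0 i2g-profile-0.0.15-1 i2g-mpi-start-0.0.34-1 i2g-version-1.2-0 i2g-vomscerts-1.1.0-1 i2g-yaim-sysconfig-0.0.5-1"

# Hypothetical helper: read a package list (one per line) on stdin
# and print every expected package that is not in it.
missing_pkgs() {
    installed="$(cat)"
    for pkg in $EXPECTED_PKGS ; do
        printf '%s\n' "$installed" | grep -qxF -e "$pkg" || echo "$pkg"
    done
}

# Example from the head node:
#   for nodename in $NODE_SET ; do
#       echo "== $nodename ==" ; ssh $nodename "rpm -qa" | missing_pkgs
#   done
```

An empty result for a node means every listed package is installed at exactly the listed version. Newer releases of the same packages would be reported as missing, so the list has to be kept in step with the repository.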
