This wiki has been deprecated and will be removed soon.

The new Advanced Computing and e-Science wiki is located at http://grid.ifca.es/wiki.

Please update your bookmarks.

I2G-EGEE Node Installation


EGEE

Worker Node (WN) Parallel Installation & Configuration Guide

This section lists the installation and configuration steps for EGEE Worker Nodes with gLite 3.1 running Scientific Linux 4.5 at the IFCA site.

Since this task usually involves managing several machines, we present a parallel procedure: every step is run on all the Worker Nodes from a single machine.

We need a head node from which every command that follows is run. In our case this machine is cabezon.ifca.es.

The Worker Nodes are 5 machines with hostnames ingrid01 to ingrid05 (domain ifca.es).

1. Operating System installation

gLite 3.1 runs on Scientific Linux 4, so we installed the most recent update, SLC 4.5, for i386 (32-bit) machines, since i386 is the only architecture supported at the moment.

  • OS installation repositories

The official repositories can be found at:

http://ftp.scientificlinux.org/linux/scientific/45/i386/

We also run our own repository for OS installation:

http://toots.ifca.es/scientificlinuxcern/slc45/i386/

2. Setting up the head node

  • Define an environment variable with the hostnames of the machines (Worker Nodes) we are about to install:
NODE_SET="ingrid01 ingrid02 ingrid03 ingrid04 ingrid05"
  • Set up SSH keys

Include the authorized keys in your OS installation so that you don't need to copy them to each node afterwards.

In that case you only have to run the following commands on the head node to access the nodes without being asked for a password:

eval `ssh-agent`
ssh-add 

Test access to nodes:

for nodename in $NODE_SET ; do ssh $nodename "hostname" ; done
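Almost every step below repeats the same "for nodename in $NODE_SET ; do ssh ..." loop. As a convenience, the loop can be wrapped in a small shell function that also tells you which nodes failed. This is only a sketch: the function name on_nodes and the SSH override variable are hypothetical, not part of any gLite tool.

```shell
#!/bin/bash
# Hypothetical helper: run one command on every node in NODE_SET, one node at
# a time, printing a header per node and collecting the nodes that failed.
# SSH may be overridden (e.g. SSH="ssh -o BatchMode=yes"); it defaults to ssh.
on_nodes() {
    local cmd="$1" node failed=""
    for node in $NODE_SET; do
        echo "=== $node"
        # A non-zero status from ssh or from the remote command marks a failure
        if ! ${SSH:-ssh} "$node" "$cmd"; then
            failed="$failed $node"
        fi
    done
    if [ -n "$failed" ]; then
        echo "FAILED on:$failed" >&2
        return 1
    fi
    return 0
}

# Usage (same check as the loop above):
#   NODE_SET="ingrid01 ingrid02 ingrid03 ingrid04 ingrid05"
#   on_nodes hostname
```

The nodes are still visited one after another, so the output stays readable; only the failure summary is new.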

3. Firewall configuration

  • Check that your firewall opens the ports the Worker Node services need in order to run properly. Some of these ports are:
    • 22 (for SSH)
    • 15001/15002/15003 (for Torque: 15001 is the pbs_server port, 15002 and 15003 are used by pbs_mom)
    • 30000:30101 (port range to allow callbacks for RFIO)
  • We are going to overwrite the firewall definition (/etc/sysconfig/iptables) on every node, so back up those files first:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; cp -avf /etc/sysconfig/iptables /etc/sysconfig/iptables.0" ; done
  • Write your firewall definition and copy it to every node. In our case the file is /root/FIREWALL/pro/iptables.wn:
for nodename in $NODE_SET ; do scp /root/FIREWALL/pro/iptables.wn $nodename:/etc/sysconfig/iptables ; done
  • Restart the firewall on each node and enable it at boot:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; chkconfig iptables on ; service iptables stop ; service iptables start" ; done
  • Finally, check the active firewall rules on all nodes:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; iptables -L ; echo '---------------------------------'" ; done
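The content of /root/FIREWALL/pro/iptables.wn is not shown in this guide. The following is only a minimal sketch of what such a file could look like, in the iptables-save format read from /etc/sysconfig/iptables, opening just the ports listed above. The default-deny INPUT policy and the loopback, ICMP and ESTABLISHED/RELATED rules are assumptions, not IFCA's actual rules; extend it for any other service your site needs.

```shell
#!/bin/bash
# Hypothetical helper: write a Worker Node firewall file in iptables-save
# format. Rules below are an illustrative sketch, not the IFCA production set:
#   loopback, ICMP and replies to outgoing connections are accepted,
#   SSH (22), Torque (15001:15003) and RFIO callbacks (30000:30101) are opened,
#   everything else arriving on INPUT is dropped.
write_wn_firewall() {
    local out="${1:-/root/FIREWALL/pro/iptables.wn}"
    cat > "$out" <<'EOF'
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p tcp --dport 22 -j ACCEPT
-A INPUT -p tcp --dport 15001:15003 -j ACCEPT
-A INPUT -p tcp --dport 30000:30101 -j ACCEPT
COMMIT
EOF
}

# Usage: write_wn_firewall, then run the scp loop above to distribute it.
```

Keep SSH open in whatever file you deploy, otherwise the head node loses access to the nodes as soon as the firewall is restarted.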

4. Node Synchronization

Node clocks are synchronized with NTP, so the latest version of the ntp package must be installed on every node. If it is missing, install it with APT or YUM.

  • Add the following lines at the end of the NTP configuration file (/etc/ntp.conf):
restrict 130.206.3.166 mask 255.255.255.0 nomodify notrap noquery
server hora.rediris.es

To do this on every node, run the following command (it first backs up the default ntp.conf as /etc/ntp.conf.0):

for nodename in $NODE_SET ; do ssh $nodename "hostname ; cp -a /etc/ntp.conf /etc/ntp.conf.0 ; echo -e \"restrict 130.206.3.166 mask 255.255.255.0 nomodify notrap noquery\nserver hora.rediris.es\" >> /etc/ntp.conf" ; done

Check the result of the previous command:

for nodename in $NODE_SET ; do ssh $nodename "hostname ; tail -2 /etc/ntp.conf" ; done
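The append command above adds the two lines every time it is run, so running it twice duplicates them in /etc/ntp.conf. A sketch of an idempotent alternative: a small function (hypothetical name append_once) that only appends a line if that exact line is not already present. Shipping it with declare -f assumes root's shell on the nodes is bash, as is the default on SL4.

```shell
#!/bin/bash
# Hypothetical helper: append a line to a file only if that exact line
# (whole-line, fixed-string match) is not already there.
append_once() {
    local file="$1" line="$2"
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage from the head node (declare -f sends the function definition along):
#   for nodename in $NODE_SET ; do
#       ssh $nodename "$(declare -f append_once); append_once /etc/ntp.conf 'restrict 130.206.3.166 mask 255.255.255.0 nomodify notrap noquery'; append_once /etc/ntp.conf 'server hora.rediris.es'"
#   done
```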
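  • Now set the clock once with ntpdate, then enable and start the ntpd service. ntpd is stopped first because ntpdate cannot run while ntpd holds the NTP port:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; service ntpd stop ; ntpdate hora.rediris.es ; chkconfig ntpd on ; service ntpd start" ; done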

Note that our time server is hora.rediris.es.

  • Check the NTP configuration:
for nodename in $NODE_SET ; do ssh $nodename "hostname ; ntpq -p" ; done
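Reading five ntpq tables by eye is error-prone. In the `ntpq -p` peer list, a `*` in the first column marks the peer ntpd has selected as its synchronization source, so a node with no such line is not synchronized yet (right after a restart ntpd may need several minutes to select one). A minimal sketch, with the hypothetical function name ntp_has_sys_peer:

```shell
#!/bin/bash
# Sketch: read `ntpq -p` output on stdin and succeed only if some peer line
# starts with '*', i.e. ntpd has selected a system peer.
ntp_has_sys_peer() {
    grep -q '^\*'
}

# Usage:
#   for nodename in $NODE_SET ; do
#       if ssh $nodename "ntpq -p" | ntp_has_sys_peer ; then echo "$nodename synchronized"
#       else echo "$nodename NOT synchronized yet" ; fi
#   done
```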

5. Java installation

Download and install the latest Java JDK from Sun. As stated in [1], the version must be 1.5 or later.

At the IFCA site there is a local repository of Java releases:

for nodename in $NODE_SET ; do ssh $nodename "hostname ; rpm -ivh http://cabezon.ifca.es/java/jdk-1_5_0_12-linux-i586.rpm" ; done
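To confirm that every node ended up with a JDK of version 1.5 or later, the version printed by `java -version` (on stderr, in the form `java version "1.5.0_12"`) can be parsed and compared. This is a sketch with hypothetical function names; the JDK path in the usage comment is the one this RPM installs and that JAVA_LOCATION points to later in this guide.

```shell
#!/bin/bash
# Extract the version string from `java -version` output read on stdin,
# e.g. 'java version "1.5.0_12"' gives 1.5.0_12.
parse_java_version() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Return 0 if a version string such as 1.5.0_12 is 1.5 or newer.
java_at_least_15() {
    local ver="$1" major minor
    major=${ver%%.*}
    minor=${ver#*.}
    minor=${minor%%.*}
    # Reject empty or non-numeric fields instead of letting [ ] complain.
    case "$major" in ''|*[!0-9]*) return 1 ;; esac
    [ "$major" -gt 1 ] && return 0
    case "$minor" in ''|*[!0-9]*) return 1 ;; esac
    [ "$major" -eq 1 ] && [ "$minor" -ge 5 ]
}

# Usage:
#   for nodename in $NODE_SET ; do
#       v=$(ssh $nodename "/usr/java/jdk1.5.0_12/bin/java -version 2>&1" | parse_java_version)
#       java_at_least_15 "$v" && echo "$nodename: $v OK" || echo "$nodename: '$v' too old or missing"
#   done
```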

6. Node installation with YUM

  • Certification Authorities (CA)

Create a temporary /tmp/lcg-ca.wn.repo file with the content:

[CA]
name=CAs
baseurl=http://linuxsoft.cern.ch/LCG-CAs/current

Copy this file to the YUM path in each node:

for nodename in $NODE_SET ; do scp /tmp/lcg-ca.wn.repo $nodename:/etc/yum.repos.d/lcg-ca.repo ; done

Since YUM is slow, I prefer to use this repository with APT:

http://grid-deployment.web.cern.ch/grid-deployment/yaim/repos/lcg-CA.list

Install the latest CA RPMs to update the Worker Node's list of Certification Authorities:

for nodename in $NODE_SET ; do ssh $nodename "hostname ; yum install -y lcg-CA" ; done
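A quick way to spot a node that missed the CA update is to compare the number of CA certificates on each node. The sketch below assumes the lcg-CA packages put one `<hash>.0` certificate file per CA in /etc/grid-security/certificates (the usual grid location; adjust the directory if your installation differs). The function name count_ca_certs is hypothetical.

```shell
#!/bin/bash
# Sketch: count CA certificate files named <hash>.0 in a certificates
# directory (default /etc/grid-security/certificates, an assumption).
# Prints 0 if the directory is missing or holds no such files.
count_ca_certs() {
    local dir="${1:-/etc/grid-security/certificates}" n=0 f
    for f in "$dir"/*.0; do
        # With no match the glob stays literal, so test that the file exists
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Usage (all nodes should print the same number):
#   for nodename in $NODE_SET ; do
#       echo "$nodename: $(ssh $nodename "$(declare -f count_ca_certs); count_ca_certs")"
#   done
```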
  • Node packages

We want to install the Worker Nodes with the Torque client, so we create a temporary file, /tmp/glite.wn.repo, with the location of those packages:

[glite-WN]
name=gLite 3.1 Worker Node
baseurl=http://linuxsoft.cern.ch/EGEE/gLite/R3.1/glite-WN/sl4/i386/
enabled=1

[glite-TORQUE_client]
name=Torque clients
baseurl=http://linuxsoft.cern.ch/EGEE/gLite/R3.1/glite-TORQUE_client/sl4/i386/
enabled=1

Copy /tmp/glite.wn.repo to each node as /etc/yum.repos.d/glite.repo:

for nodename in $NODE_SET ; do scp /tmp/glite.wn.repo $nodename:/etc/yum.repos.d/glite.repo ; done 
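So that the head node steps can be kept in a script instead of edited by hand, the temporary repository file can be generated with a here-document. This sketch simply reproduces the content given above; the function name write_glite_wn_repo is hypothetical.

```shell
#!/bin/bash
# Sketch: write the gLite 3.1 WN + Torque client YUM repository definition
# shown above to a file (default /tmp/glite.wn.repo).
write_glite_wn_repo() {
    local out="${1:-/tmp/glite.wn.repo}"
    cat > "$out" <<'EOF'
[glite-WN]
name=gLite 3.1 Worker Node
baseurl=http://linuxsoft.cern.ch/EGEE/gLite/R3.1/glite-WN/sl4/i386/
enabled=1

[glite-TORQUE_client]
name=Torque clients
baseurl=http://linuxsoft.cern.ch/EGEE/gLite/R3.1/glite-TORQUE_client/sl4/i386/
enabled=1
EOF
}

# Usage:
#   write_glite_wn_repo
#   for nodename in $NODE_SET ; do scp /tmp/glite.wn.repo $nodename:/etc/yum.repos.d/glite.repo ; done
```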

Now install the packages with YUM:

for nodename in $NODE_SET ; do ssh $nodename "hostname ; yum install -y glite-WN glite-TORQUE_client" ; done

Note: when we installed these nodes, the previous command failed with the following error:

Processing Dependency: log4j >= 1.2.8 for package: gl
Error: Missing Dependency: log4j >= 1.2.8 is needed by package glite-rgma-stubs-servlet-java

When you get Java dependency errors like the one above, check that your JPackage repository is correctly configured. See the Generic Installation and Configuration Guide for your gLite version at https://twiki.cern.ch/twiki/bin/view/LCG/LcgDocs.
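When the installation runs on five nodes, the same dependency error tends to show up five times. If you save each node's YUM output to a log file, a small filter can list the distinct missing packages. This is a sketch: the function name missing_deps and the yum-<node>.log file names are hypothetical, and it relies only on the "Missing Dependency: <name>" wording seen in the error above.

```shell
#!/bin/bash
# Sketch: read YUM output on stdin and print each distinct package name that
# appears after "Missing Dependency:", one per line.
missing_deps() {
    grep -o 'Missing Dependency: [^ ]*' | sed 's/^Missing Dependency: //' | sort -u
}

# Usage:
#   for nodename in $NODE_SET ; do
#       ssh $nodename "yum install -y glite-WN glite-TORQUE_client" > yum-$nodename.log 2>&1
#   done
#   cat yum-*.log | missing_deps
```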

7. Node configuration with YAIM

  • Copy YAIM site configuration file

Create or copy site-info.def, users.conf, groups.conf, wn-list.conf and the vo.d directory into /root/INSTALL/EGEE/yaim/siteinfo/, and check the following site-info.def variables so that the nodes are configured properly:

YAIM_VERSION=3.1.0-3
JAVA_LOCATION="/usr/java/jdk1.5.0_12"
LCG_REPOSITORY="'baseurl=http://glitesoft.cern.ch/EGEE/gLite/R3.1/'"
REPOSITORY_TYPE="yum"

In addition, you must define an extra variable, BATCH_SERVER, to avoid errors in the TORQUE_client configuration. Its value is the batch server's hostname:

BATCH_SERVER=torque00.$MY_DOMAIN
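Because site-info.def is plain shell syntax (note how BATCH_SERVER reuses $MY_DOMAIN), it can be sourced in a subshell to check that the variables above are actually set before copying it to the nodes. A minimal sketch; the function name check_siteinfo is hypothetical, and which variables you pass is up to you.

```shell
#!/bin/bash
# Sketch: source a YAIM site-info.def in a subshell (so nothing leaks into the
# current shell) and report the given variables that are empty or unset.
check_siteinfo() {
    local file="$1"; shift
    (
        . "$file" || exit 2
        missing=""
        for var in "$@"; do
            # ${!var} is bash indirect expansion: the value of the variable named $var
            [ -n "${!var:-}" ] || missing="$missing $var"
        done
        if [ -n "$missing" ]; then
            echo "unset in $file:$missing" >&2
            exit 1
        fi
    )
}

# Usage:
#   check_siteinfo /root/INSTALL/EGEE/yaim/siteinfo/site-info.def \
#       JAVA_LOCATION REPOSITORY_TYPE BATCH_SERVER
```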

Create /root/INSTALL on each node:

for nodename in $NODE_SET ; do ssh $nodename "hostname ; mkdir -v /root/INSTALL" ; done

Now copy the whole directory (with the YAIM files) to every node:

for nodename in $NODE_SET ; do scp -r /root/INSTALL/EGEE $nodename:/root/INSTALL ; done
  • Run YAIM configuration

Run the configuration with the following command:

for nodename in $NODE_SET ; do ssh $nodename "hostname ; /opt/glite/yaim/bin/yaim -c -s /root/INSTALL/EGEE/yaim/siteinfo/site-info_glite31_WN_GRID_070724_175500.def -n glite-WN -n TORQUE_client" ; done

Note that the previous command configures the nodes one after another. To configure them all simultaneously, run each ssh session in the background (the output of all nodes will be interleaved on the terminal):

for nodename in $NODE_SET ; do ( ssh $nodename "hostname ; /opt/glite/yaim/bin/yaim -c -s /root/INSTALL/EGEE/yaim/siteinfo/site-info_glite31_WN_GRID_070724_175500.def -n glite-WN -n TORQUE_client" &) ; done
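The background loop above returns immediately and mixes the output of all nodes. A sketch of a variant that keeps one log per node and waits for every job, reporting the nodes where YAIM (or ssh) exited with an error. The function name parallel_on_nodes and the LOG_DIR and SSH variables are hypothetical; it uses indexed array assignment rather than `+=` so that it also works with the older bash shipped with SL4.

```shell
#!/bin/bash
# Hypothetical helper: start the same command on every node in NODE_SET at
# once, send each node's output to its own log file, then wait for all jobs
# and report the nodes whose ssh session exited with a non-zero status.
# LOG_DIR (default: current directory) and SSH (default: ssh) are assumptions.
parallel_on_nodes() {
    local cmd="$1" dir="${LOG_DIR:-.}" node failed="" n=0 i=0
    local -a pids nodes
    for node in $NODE_SET; do
        ${SSH:-ssh} "$node" "$cmd" > "$dir/$node.log" 2>&1 &
        pids[$n]=$!
        nodes[$n]=$node
        n=$((n + 1))
    done
    # "wait PID" returns the exit status of that particular background job
    while [ "$i" -lt "$n" ]; do
        wait "${pids[$i]}" || failed="$failed ${nodes[$i]}"
        i=$((i + 1))
    done
    if [ -n "$failed" ]; then
        echo "FAILED on:$failed (see the logs in $dir)" >&2
        return 1
    fi
    return 0
}

# Usage (SITE_INFO holds the path of your site-info.def on the nodes):
#   LOG_DIR=/tmp parallel_on_nodes "/opt/glite/yaim/bin/yaim -c -s $SITE_INFO -n glite-WN -n TORQUE_client"
```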

8. SSH Configuration

See the Passwordless SSH section in the Grid Administration Guide.

9. Troubleshooting

  • Check that the pbs_mom service is enabled (chkconfig --list pbs_mom).
  • The batch server's hostname must appear in /var/spool/pbs/server_name and as the value of the $pbsserver and $restricted variables in /var/spool/pbs/mom_priv/config.
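The second check can be scripted. This sketch assumes mom_priv/config holds one directive per line in the form `$pbsserver <host>` and `$restricted <host>` and that both carry exactly the hostname found in server_name (a wildcard $restricted entry would be reported as a mismatch). The function name check_pbs_mom_config and the PBS_HOME override are hypothetical.

```shell
#!/bin/bash
# Sketch: check that the host in $PBS_HOME/server_name also appears as the
# value of $pbsserver and $restricted in $PBS_HOME/mom_priv/config.
check_pbs_mom_config() {
    local home="${PBS_HOME:-/var/spool/pbs}" server key rc=0
    server=$(head -n 1 "$home/server_name" 2>/dev/null) || server=""
    if [ -z "$server" ]; then
        echo "no batch server in $home/server_name" >&2
        return 1
    fi
    for key in '$pbsserver' '$restricted'; do
        # awk exits 0 only if a line has exactly: <key> <server>
        if ! awk -v k="$key" -v s="$server" \
                '$1 == k && $2 == s { found = 1 } END { exit !found }' \
                "$home/mom_priv/config" 2>/dev/null; then
            echo "$key is not $server in $home/mom_priv/config" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   for nodename in $NODE_SET ; do
#       ssh $nodename "hostname ; chkconfig --list pbs_mom ; $(declare -f check_pbs_mom_config); check_pbs_mom_config"
#   done
```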


Int.eu.grid

This section covers the installation and configuration of Int.eu.grid Worker Nodes running Scientific Linux 4.5 at the IFCA site. To ease administration, we again work in parallel from the head node.

1. Follow the installation and configuration steps given in the EGEE section.

2. Create or update the Int.eu.grid repository:

  • Create a temporary /tmp/i2g.wn.list file with the content:
rpm http://savannah.fzk.de repository/i2g/production i386 noarch
  • Copy this file to the APT sources directory on each node:
for nodename in $NODE_SET ; do scp /tmp/i2g.wn.list $nodename:/etc/apt/sources.list.d/i2g.list ; done

3. Install the Int.eu.grid i2g-WN meta-package with APT:

for nodename in $NODE_SET ; do ssh $nodename "hostname ; apt-get update ; apt-get -y install i2g-WN" ; done

4. Check your installation

  • Type the following command:
rpm -qa | grep i2g

and check that your installed packages match the ones published here:

i2g-openmpi-1.2.2-1
i2g-WN-1.2-0
i2g-profile-0.0.15-1
i2g-mpi-start-0.0.34-1
i2g-version-1.2-0
i2g-vomscerts-1.1.0-1
i2g-yaim-sysconfig-0.0.5-1
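With several nodes, the comparison against this list can be automated. The sketch below stores the published list in a variable and prints every package from it that is not installed with exactly that name-version-release; the variable I2G_EXPECTED and the function i2g_missing are hypothetical names.

```shell
#!/bin/bash
# Package list published in this guide (one name-version-release per line).
I2G_EXPECTED="i2g-openmpi-1.2.2-1
i2g-WN-1.2-0
i2g-profile-0.0.15-1
i2g-mpi-start-0.0.34-1
i2g-version-1.2-0
i2g-vomscerts-1.1.0-1
i2g-yaim-sysconfig-0.0.5-1"

# Sketch: read the installed package list on stdin, print each expected
# package that has no exact match, return 1 if anything is missing.
i2g_missing() {
    local installed rc=0
    installed=$(mktemp) || return 2
    cat > "$installed"
    # -v -x -F: expected lines that do not match any installed line exactly
    if printf '%s\n' "$I2G_EXPECTED" | grep -vxF -f "$installed"; then
        rc=1
    fi
    rm -f "$installed"
    return "$rc"
}

# Usage:
#   for nodename in $NODE_SET ; do
#       echo "== $nodename" ; ssh $nodename "rpm -qa | grep i2g" | i2g_missing && echo "all packages present"
#   done
```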