
Configuring an open-source “HPC platform” with ANSYS RSM and a PBS/Torque job scheduler under OpenSuse

One of the main advantages of using the ANSYS Remote Solve Manager (RSM) is that it frees up the engineer's personal computer by sending jobs to a remote machine, so local work can continue as usual.

Going one step further, RSM can be installed on a High Performance Computing (HPC) machine so that jobs run much faster than on the local machine and take full advantage of parallel computing.

When choosing an HPC platform, one has to consider the following components:

  • Hardware (e.g., Intel, AMD, GPU, Xeon Phi, …)
  • OS (e.g., Linux, Windows HPC, …)
  • Network (e.g., Ethernet, Infiniband, …)
  • Job Management Software = Scheduler + Head Node (e.g., LSF (IBM), Microsoft HPC (limited to MS-MPI only), SGE, Altair PBS Pro, Adaptive Computing Moab/Torque, ANSYS RSM, …)

There is a lot of competition among these options, and choosing the right setup is crucial to getting the most out of the available hardware. When the HPC platform is shared among several engineers, the problem of scheduling and distributing resources appears very quickly, so a proper job scheduler has to be deployed to manage access to the compute resources.

Another thing to consider when choosing an HPC platform is the total cost of all the components:

Hardware + OS + Network + Job Management Software + ANSYS Software = $$$

The good news is that with a basic knowledge of Linux commands you can save on two of those components (the OS and the job management software) and invest more in what matters: better hardware, a better network, and more ANSYS licenses!

One of the advantages of Linux is that it is free and open source. For the same calculation running on the same hardware, studies have shown that computation times are essentially the same across operating systems. When it comes to parallel computing, however, the choice of MPI is critical: in some of our tests, an ANSYS FLUENT multiphase analysis running on 20 cores took twice as long with MS-MPI as with Intel MPI on the same hardware. Since Windows HPC is limited to MS-MPI, choosing the right HPC platform becomes crucial.
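
For reference, on Linux the MPI flavor can be chosen at launch time on the FLUENT command line. A minimal sketch (the journal-file and host-file names are hypothetical, and the set of available -mpi options depends on your FLUENT version):

# run a 3-D double-precision case on 20 cores with Intel MPI (illustrative)
fluent 3ddp -g -t20 -mpi=intel -cnf=hosts.txt -i run_case.jou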

This guide therefore describes how to build an HPC platform on a Linux distribution (OpenSuse 13) with the open-source PBS derivative known as PBS/Torque.

For this guide we will use the PBS/Torque v2.5 job manager and its integrated scheduler, pbs_sched. Note that there is also an improved open-source scheduler called Maui, but it is not compatible with the way RSM submits jobs, so we will stick to the basic job scheduling that can be done through RSM and/or pbs_sched.

  • This guide uses OpenSuse, which is free and open source, but the steps are reproducible with your preferred Linux distribution or even with commercial editions (SLES, …).
  • The current setup involves several machines:
    • Your Windows client machine, where ANSYS 15 is already installed and running
    • A Linux manager node called HPC-MANAGER, on which we will install:
      • RSM v15 as well as the ANSYS R15 suite
      • PBS/Torque with the PBS scheduler
    • One or several Linux compute nodes called CLUSTER0*


For this part we will assume that HPC-MANAGER is NOT a compute node. If you want the manager to also act as a compute node, simply repeat the compute-node steps on the manager as well.


Part I – Installing Open-Suse on HPC-MANAGER

  • On the HPC-MANAGER machine (ready to be formatted), insert the OpenSuse installation CD and boot from it
  • Follow the on-screen instructions
  • Untick "automatic configuration"
  • Prepare your partitions with at least a 10 GB swap and a large root (/) partition, so that /tmp can be used for calculations
  • Enable remote administration via VNC (it will be useful later) and SSH
  • Assign the hostname to loopback
  • As a username we will use STG-HPC (it is important to use the same username on the manager and all compute nodes so that you do not have to recreate the user manually each time)
  • Install ANSYS R15 with the RSM packages: mount the .iso images with:

mkdir /mnt/disk1
mount -o loop /…../path/ansys_disk1.iso /mnt/disk1

  • repeat for disk2, disk3, etc.
  • launch the installer
  • this guide skips the details of the ANSYS installation since it is similar to Windows; just check the installation guide for common packages that may be missing if you intend to use the ANSYS GUI on Linux (which we will not do here)
  • one useful package is lsb on OpenSuse, which provides the Qt libraries needed for the graphical windows (an example install command is shown below)
  • If your installation did not create the symbolic link /ansys_inc, create it by typing:


ln -s /usr/ansys_inc /ansys_inc
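
For example, the lsb package mentioned above can be installed from a super-user terminal (only needed if you plan to use the ANSYS GUI on Linux):

zypper install lsb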

Part II – Prepare the shared folders

  • On HPC-MANAGER, install an NFS server
    • In a super-user terminal, type:

zypper install nfsserver

    • once it is installed, the NFS server module will be available in YaST
  • Create the temp folder for calculations; in a super-user (SU) terminal, type:


mkdir /rsm_tmp
chmod 777 /rsm_tmp

  • As STG-HPC, create a folder called ClusterPackages in your home folder (full path: /home/stg-hpc/ClusterPackages)
  • Start the NFS server in YaST

 

[Screenshot: Open Source HPC 1]

 

  • Enter your domain details (you can use the same domain as your Windows domain or create a different one for your Linux machines; since I want the Linux machines accessible from Windows computers, I will use the same) and enable NFSv4

[Screenshot: Open Source HPC 2]

  • Add the RSM temp folder and ClusterPackages to the exports and leave the default settings (an illustrative /etc/exports is shown below):

[Screenshot: Open Source HPC 3]
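
For reference, the resulting /etc/exports on HPC-MANAGER looks roughly like the following (YaST manages this file; the export options shown are illustrative and should be adjusted to what you select):

# /etc/exports on HPC-MANAGER (illustrative)
/rsm_tmp                       *(rw,no_root_squash,sync,no_subtree_check)
/home/stg-hpc/ClusterPackages  *(rw,root_squash,sync,no_subtree_check)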

 

  • If you get an error, just ignore it, open an SU terminal, and type:

service nfs-server restart

Optional: Create Windows shares for Windows-to-Linux transfers

If you are sending simulations from Windows, you can avoid extra transfers between computers and folders; it is recommended to share /rsm_tmp fully with Windows through Samba.

To do that, still in YaST, open the Samba server module and share the /rsm_tmp folder twice (since in our case the Working Directory and the Project Directory will be the same, which is faster).

[Screenshot: Open Source HPC 4]
Then add the following parameters to the shares:

[RSM_Mgr]
path = /rsm_tmp
browseable = yes
writable = yes
create mode = 0664
directory mode = 0775
guest ok = no

[RSM_CS]
path = /rsm_tmp
browseable = yes
writable = yes
create mode = 0664
directory mode = 0775
guest ok = no

You can also add those lines directly to /etc/samba/smb.conf and restart the Samba server.
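
If you edit smb.conf by hand, restart Samba afterwards from a super-user terminal (on recent OpenSuse releases the services are typically named smb and nmb):

service smb restart
service nmb restart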

Part III – Installing the PBS/Torque Job Scheduler

III.1 – Installing the Manager

For help, you can use this guide provided by Adaptive Computing

http://docs.adaptivecomputing.com/torque/2-5-12/torqueAdminGuide-2.5.12.pdf

I will summarize here the important steps to simplify things:

On the HPC-MANAGER, download version 2.5 of Torque to your personal download folder.

As a SUPER-USER:

  • move the package to root's home folder:

mv torque2.5.tar /root/

  • extract it

tar -xvf torque2.5.tar

  • install the prerequisites

zypper install libxml2-devel openssl-devel gcc gcc-c++ boost-devel

  • go into the extracted torque folder and build it:

./configure
make
make install

make install places all the files in /usr/local/bin, /usr/local/lib, /usr/local/sbin, /usr/local/include, and /usr/local/man, and creates the main working folder in /var/spool/torque.

  • Install torque as a service

cp contrib/init.d/suse.pbs_server /etc/init.d/pbs_server
insserv -d pbs_server
cp contrib/init.d/suse.pbs_sched /etc/init.d/pbs_sched
insserv -d pbs_sched

  • Create torque.conf so the dynamic linker can find the Torque libraries

echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf
ldconfig

  • Kill the server if it is running :

qterm

  • Create the basic server configuration by running the following from inside the extracted torque folder:

./torque.setup root

(use root here if you want root to be the Torque manager)

  • Verify your configuration with:

qmgr -c 'p s'
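
The same qmgr tool can also be used later to adjust individual server attributes; for example (illustrative values, not required for this guide):

qmgr -c "set server scheduling = true"
qmgr -c "set server keep_completed = 120"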



III.2 – Creating cluster packages

Still in the torque folder, type as SU:

make packages

Copy the desired packages to your shared location (on a 64-bit system the generated package names will contain x86_64 instead of i686):

cp torque-package-mom-linux-i686.sh /home/stg-hpc/ClusterPackages
cp torque-package-clients-linux-i686.sh /home/stg-hpc/ClusterPackages
cp torque-package-devel-linux-i686.sh /home/stg-hpc/ClusterPackages
cp contrib/init.d/suse.pbs_mom /home/stg-hpc/ClusterPackages

III.3 – Installing Torque on compute Nodes

Torque uses a daemon called MOM (pbs_mom) to communicate with the compute nodes.

To install it on the compute nodes, first install OpenSuse the same way you did for the head node, along with ANSYS v15, on each compute node using the same username; then:

  • Mount the two NFS shares from the manager node with the NFS client (an illustrative /etc/fstab is shown after the screenshot below):
    • Mount /rsm_tmp to the same /rsm_tmp path on the local node
    • Mount …/ClusterPackages to /mnt/ClusterPackages

[Screenshot: Open Source HPC 5]
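
If you prefer to configure these mounts by hand rather than through the YaST NFS client, the corresponding /etc/fstab entries on each compute node would look roughly like this (the hostname hpc-manager and the export paths follow the example names used in this guide):

# /etc/fstab on each compute node (illustrative)
hpc-manager:/rsm_tmp                        /rsm_tmp               nfs  defaults  0 0
hpc-manager:/home/stg-hpc/ClusterPackages   /mnt/ClusterPackages   nfs  defaults  0 0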

  • Install MOM by performing the following actions :

cd /mnt/ClusterPackages
./torque-package-mom-linux-i686.sh --install
./torque-package-clients-linux-i686.sh --install
./torque-package-devel-linux-i686.sh --install
cp suse.pbs_mom /etc/init.d/pbs_mom
insserv -d pbs_mom
echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf
ldconfig

  • Verify that in /var/spool/torque the server_name file contains the correct Manager Node name
  • When compute nodes communicate with the head node, they need to exchange files. The usual way is to send them through SSH (with scp), but we will take advantage of the shared NFS folder to make things even faster. To tell Torque to copy files locally instead of sending them through scp, create a file called config in /var/spool/torque/mom_priv/ on each compute node:


echo '$usecp *:/rsm_tmp /tmp' >> /var/spool/torque/mom_priv/config

  • Start MOM by using:


service pbs_mom start
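
You can quickly confirm that the MOM daemon came up with, for example:

service pbs_mom status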



III.4 – Adding configured nodes

  • Back on the HPC-MANAGER, go into /var/spool/torque/server_priv
  • Create a file called nodes by using:

vim nodes

  • And add a line for each compute node with its number of CPUs (the count can also be auto-detected; a quick way to check it is shown after this listing):

# Nodes 001 to 004 are cluster nodes
cluster01 np=4
cluster02 np=4
cluster03 np=4
cluster04 np=4
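
If you are not sure how many cores a compute node actually has, a quick way to check on that node is:

grep -c ^processor /proc/cpuinfo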

  • Start the services :

service pbs_server start
service pbs_sched start


The PBS/Torque server should now be up and running. Check the server_logs folder (/var/spool/torque/server_logs) to see whether the nodes are responding correctly, and type pbsnodes -a to see their status.

This guide cannot cover the configuration of Torque in depth, but the steps above are the minimum required to run Torque with compute nodes.
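
Before involving RSM, it is worth checking that Torque can run a trivial job end to end. A minimal test, run as stg-hpc (not root), could be:

# submit a one-line job that prints the hostname of the node it ran on
echo "hostname" | qsub
# list queued/running jobs
qstat -a

When the job finishes, its output file (typically STDIN.o<jobid> in the submission directory) should contain a compute node's hostname.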

Part IV – Installing RSM on the Manager Node

Now that Torque is up and running on the HPC-MANAGER, we need RSM running as a service on the same machine.

  • Go into /ansys_inc/v150/RSM/Config/tools/linux
  • Run :

./rsmconfig -mgr -svr -xmlrpc


The installation will create a user called rsmadmin, which is a member of the group rsmadmins.

For an unknown reason, the rsmadmin user is not authorized to administer RSM, so the simplest solution is to add STG-HPC to the rsmadmins group.
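
For example, from a super-user terminal on HPC-MANAGER (the group name comes from the rsmconfig step above):

usermod -a -G rsmadmins stg-hpc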

If you do not want every RSM user to be an administrator, you can instead create a dedicated user (e.g., MYRSMADMIN) and add it as an alternate account in the RSM Manager; it will only be used to configure the server and will not be able to run simulations unless you also perform every step of this tutorial with that user.

  • Once that is done, you can run the RSM Manager either from Linux (via Config/tools/linux/rsmadmin) or from Windows by adding the remote manager.
  • Once the HPC manager is added, you need to specify an alternate account for this machine that is a member of the rsmadmins group (refer to the RSM Tips and Tricks on the SimuTech website for those steps) to obtain the following configuration:


[Screenshot: Open Source HPC 6]

  • The next step is to change the Project Directory so that it is the same as the Working Directory. To do this, right-click on the HPC node > Properties and change it as follows:

 

[Screenshot: Open Source HPC 7]



Part V – Configuring a Custom Cluster

RSM is not fully compatible with older versions of Torque and is now configured by default to work with Moab, which uses a different job submission syntax. To make it work with Torque, we have to create a Custom Cluster with a special submission argument.

To do that, refer to the RSM User's Guide and follow these steps:

  • Edit the properties of the Compute Server newly added to RSM as follows:


[Screenshot: Open Source HPC 8]

[Screenshot: Open Source HPC 9]

 

  • To create the CUSTOM CLUSTER, we need the corresponding files:
    • hpc_commands_PBS_Torque.xml
    • submit_PBS_Torque.xml
    • GenericJobCode_PBS_Torque.xml
  • Navigate to /ansys_inc/v150/RSM/Config/xml/EXAMPLES and:

cp hpc_commands_PBS_EXAMPLE.xml hpc_commands_PBS_Torque.xml
cp GenericJobCode_PBS_EXAMPLE.xml GenericJobCode_PBS_Torque.xml

  • Edit GenericJobCode_PBS_Torque.xml and replace PBS_EXAMPLE with PBS_Torque (2 occurrences)
  • Edit hpc_commands_PBS_Torque.xml and replace PBS_EXAMPLE with PBS_Torque (1 occurrence)
  • Move those 2 files to ../xml/
  • Go into ../Config/scripts/EXAMPLES


cp submit_PBS_EXAMPLE.xml submit_PBS_Torque.xml

  • Edit submit_PBS_Torque.xml and replace the following lines:

if _distributed == None or _distributed == "FALSE":
    _qsub += " -l select=1:ncpus=$RSM_HPC_CORES:mpiprocs=$RSM_HPC_CORES"
else:
    _qsub += " -l select=$RSM_HPC_CORES:ncpus=1:mpiprocs=1"

with:
_qsub += " -l procs=$RSM_HPC_CORES "

  • That’s it
  • Test your configuration in the RSM Manager window to see if the correct script files are used.


Part VI – Optional: Configuring a Multi-Node HPC by Configuring the MPI

In order for your compute nodes to communicate with each other, you need to allow them to SSH into one another without a password. In an SSH connection, a client connects to a server, and the server has to know the client and its key.

In MPI communication, each compute/head node acts alternately as client and server, so you will have to configure every possible connection, i.e. cluster1 as a client to cluster2, 3, 4 and the manager, but also cluster2, 3, 4 as clients connecting to cluster1 as a server.

To be able to SSH without a password, you need to do the following for each client-to-server connection, as the STG-HPC user:

It is very important that these actions are performed as the current user (STG-HPC) and not as root.

  • Create an RSA key on the client
  • ssh-keygen -t rsa (leave everything blank; press Enter three times)
  • This creates two files in /home/stg-hpc/.ssh/: id_rsa and id_rsa.pub
  • Transfer id_rsa.pub to each server and append its content to /home/stg-hpc/.ssh/authorized_keys on that server. You can do both actions in two lines with:

ssh server "mkdir .ssh; chmod 0700 .ssh"
cat id_rsa.pub | ssh stg-hpc@clusterXX \"cat - >> ~/.ssh/authorized_keys\"

After adding the client's key to the server's authorized_keys file, you also need the server's host key in the client's known_hosts file. This happens automatically the first time you connect, so from each client open an SSH connection to each server:

ssh cluster1

The authenticity of host 'example.com (12.33.45.678)' can't be established. RSA key fingerprint is 3c:6d:5c:99:5d:b5:c6:25:5a:d3:78:8e:d2:f5:7a:01. Are you sure you want to continue connecting (yes/no)?

Type yes and the server's host key will be added to the known_hosts file on the client.

For a configuration with N compute nodes and one manager node, each compute node needs N-1 cluster-to-cluster connections plus one connection to the manager, and the same has to be done on every machine, so you will end up repeating this roughly N² times.
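
With several nodes this quickly becomes tedious; a minimal sketch of a loop that pushes the current machine's key to every other node (the hostnames hpc-manager and cluster01–cluster04 are the example names used in this guide) could be:

# run once on each machine, as stg-hpc (not root)
for host in hpc-manager cluster01 cluster02 cluster03 cluster04; do
    ssh stg-hpc@$host "mkdir -p ~/.ssh; chmod 0700 ~/.ssh"
    cat ~/.ssh/id_rsa.pub | ssh stg-hpc@$host "cat - >> ~/.ssh/authorized_keys"
done

Answer yes to the host-key prompts as the loop runs so that known_hosts is populated at the same time.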