Introduction to cluster computing
E. Imamagić, Damir Danijel Žagar, Srce - University
Computing Centre, Croatia
Abstract:
The main goal of this tutorial is to describe computer clusters,
which are a common platform for executing parallel applications.
The tutorial will focus on executing applications through a cluster job
management system.
First, we will give a short introduction to parallel computing
and cluster job management systems. Then, we will describe the various types
of jobs that users can execute on clusters. We will explain how to
describe, execute and manage jobs. Finally, we will describe
additional tools that users can use to monitor their jobs.
For the tutorial, we will provide a real cluster with eight
compute nodes. The cluster is installed at the University Computing Centre
and runs the Rocks cluster distribution. The Sun Grid Engine (SGE) system
is used for job management.
Tutorial overview
- Introduction to clusters & parallel computing
- Cluster environment
- Submitting jobs
  - job description
  - batch jobs
  - interactive jobs
  - parallel jobs
  - job arrays
  - environment variables
  - advanced job description
- Job management
  - monitoring
  - changing job description
  - suspending jobs
  - stopping jobs
- Other tools
  - Ganglia monitoring
  - direct access to nodes
  - executing OS commands
  - listing processes
  - stopping processes
Biography
Emir Imamagić graduated from the Department of Electronics,
Microelectronics, Computer and Intelligent Systems, Faculty of Electrical
Engineering and Computing, University of Zagreb in May 2004. His research
interests are high performance computing, distributed computing, computer
clusters and grid systems.
Before graduating, he worked on the AliEn Grid project at CERN, Switzerland,
in summer 2003 and on the MidArc middleware project at Ericsson Nikola
Tesla in summer 2002. He is currently working as a researcher on the CRO-GRID
Infrastructure project at the University Computing Centre.