Abstract
We will present the design methodology and implementation of the new
Debian Cluster Distribution for high-performance clusters.
Linux clusters are increasingly adopted as supercomputing infrastructure
in scientific research laboratories, and also in other kinds of research
work where high computing power is essential. Thanks to their
price/performance ratio, Linux clusters are steadily taking market share
from specialized supercomputers.
In order to build a "Linux cluster" from a number of standalone
PCs, one must extend a standard Linux distribution with extra
functionality that provides easy installation and administration,
enforcement of the security policy, and monitoring of elementary resources
within the cluster. Although some of the tools already exist, there
are very few complete Linux distributions targeting high-performance
users. The real problem is that all the cluster distributions are based
on Red Hat as the base Linux distribution.
There are many Linux distributions available today, and each of them has
its pros and cons. However, several quality characteristics make Debian
GNU/Linux a prime choice when selecting a platform for intensive,
mission-critical applications.
Practical experience shows that Debian handles security patches in an
admin-friendly way. An average system can be upgraded to up-to-date
package versions in a few minutes by issuing a single command. This is
essential, because no administrator wants to spend time manually resolving
inter-package dependencies. Tools for this task exist for other
distributions as well, but APT (Debian's package tool) was integrated into
the system long ago and has proved to be a very reliable tool, capable
of resolving most complex inter-package dependency problems.
In addition, the Debian security team is one of the most responsive
security teams among those of the Linux distributions. All disclosed
security issues are patched and published on the official Debian
APT mirror sites in less than 48 hours. Debian is often the first Linux
distribution to release a patched package when a security problem
occurs. This is very important for keeping the system's security level
high. The legal aspect also favors Debian. Debian tries hard to be the
'purest' GNU distribution. The social contract assures that all the
developed software remains Open Source. Since it is driven by
'philosophy' rather than by the market, the concept has been fully
functional for more than a decade. Unlike some other distributions,
Debian's first goal is the quality of the released software, and since
it is not driven by the market, there is always enough time to assure
that quality.
From the technical, legal and security points of view, Debian is
the first choice when selecting a Linux distribution for mission-critical
deployment.
Debian Cluster Distribution structure
Debian Cluster Distribution (DCD) is meant to be an add-on set of
packages to be installed on top of a regular Debian Sarge distribution.
By adding an entry to the sources list configuration file and issuing
a single command, one can have a fully functional cluster front-end node
ready for automatic installation and deployment of working nodes in the
cluster.
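The two steps above can be sketched in Python as follows. This is only
an illustration under stated assumptions: the repository URL, the suite and
component, and the package name dcd-frontend are hypothetical placeholders,
since none of them is given here. The sketch also lists the package index
refresh (apt-get update) as a separate step before the install command.

```python
from pathlib import Path

# Hypothetical values: neither the repository URL nor the package name
# is specified in the text; they are placeholders for illustration only.
DCD_REPO_URL = "http://dcd.example.org/debian"
DCD_SUITE = "sarge"
DCD_COMPONENT = "main"
DCD_FRONTEND_PACKAGE = "dcd-frontend"


def sources_entry(url=DCD_REPO_URL, suite=DCD_SUITE, component=DCD_COMPONENT):
    """Build one line in sources.list format: 'deb <uri> <suite> <component>'."""
    return f"deb {url} {suite} {component}"


def add_repository(sources_path, entry):
    """Append the entry to the sources list unless it is already present.

    Returns True if the file was modified, False otherwise.
    """
    path = Path(sources_path)
    text = path.read_text() if path.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line if the existing file lacks a trailing newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(f"{separator}{entry}\n")
    return True


def install_commands(package=DCD_FRONTEND_PACKAGE):
    """Return the apt-get invocations (as argument lists) to run as root."""
    return [
        ["apt-get", "update"],
        ["apt-get", "install", "-y", package],
    ]
```

On a real front-end node one would call add_repository with
/etc/apt/sources.list and pass each list returned by install_commands
to subprocess.run; the sketch deliberately does not execute anything itself.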
There are several problems that DCD tries to cover:
- Automatic installation and deployment of working nodes
- Automatic queueing system configuration
- User files distribution in the cluster
- Monitoring resources and raising alarms in critical situations
- Managing NIS information within a cluster
All the automation mechanisms and extra functionality will be available
as Debian packages in the APT repository, in order to make it easy to
'promote' a stand-alone Debian PC into a front-end node capable of
controlling and managing the whole cluster of working nodes.
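As an illustration of the monitoring and alarming task listed above, a
minimal Python sketch of a threshold check on node load might look like
this. It is not the DCD monitoring mechanism, which is not described here:
the threshold value, the node names and the way nodes report their load to
the front-end node are all assumptions.

```python
import os

# Assumed alarm threshold for the 1-minute load average; no value is
# given in the text.
LOAD_THRESHOLD = 4.0


def local_load():
    """Return the 1-minute load average of this machine (Unix only)."""
    return os.getloadavg()[0]


def load_alarms(loads, threshold=LOAD_THRESHOLD):
    """Return one alarm message per node whose load exceeds the threshold.

    `loads` maps a (hypothetical) node name to its reported 1-minute
    load average. Nodes are processed in name order.
    """
    return [
        f"ALARM: {node} load {load:.2f} exceeds {threshold:.2f}"
        for node, load in sorted(loads.items())
        if load > threshold
    ]
```

Each working node could report the value of local_load() to the front-end
node, which would then call load_alarms on the collected values and forward
any messages to the administrator.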
Expected results
We expect to develop a toolset for easier cluster management, based
on the Debian GNU/Linux distribution. This involves developing automation
mechanisms that provide a flexible platform for high-performance
computation tasks and give the system administrator a secure,
easy-to-maintain, reliable and well-supported cluster distribution.