Getting Started With Condor
Condor is a specialized batch system for managing compute-intensive jobs. Like most batch systems, Condor provides a queueing mechanism, scheduling policy, priority scheme, and resource classifications. Users submit their compute jobs to Condor; Condor puts the jobs in a queue, runs them, and then informs the user of the result. Unlike traditional batch systems, however, Condor is also designed to make effective use of non-dedicated machines. When told to run compute jobs only on machines that are not currently in use (no keyboard activity, no load average, no active telnet users, etc.), Condor can harness otherwise idle machines throughout the network.
How do I know my machine is running Condor?
Type:
$ condor_status
You should see a list of available servers:
Name | OpSys | Arch | State | Activity | LoadAv | Mem | ActvtyTime |
aquarius.phys | LINUX | INTEL | Claimed | Suspended | 0.000 | 61 | 0+00:08:40 |
aries.phys.uv | LINUX | INTEL | Owner | Idle | 0.080 | 61 | 0+00:24:09 |
cancer.phys.u | LINUX | INTEL | Unclaimed | Idle | 0.220 | 61 | 0+00:00:37 |
capricorn.phy | LINUX | INTEL | Claimed | Busy | 0.840 | 61 | 0+00:05:56 |
If not, check with your system administrator to see if Condor is installed on your machine.
How do I use the Condor system?
The road to using Condor effectively is a short one. The basics are quickly and easily learned.
Using Condor can be broken down into the following steps:
Job Preparation.
First, you will need to prepare your job for Condor. This involves preparing it to run as a background batch job, deciding which Condor runtime
environment (or Universe) to use, and possibly relinking your program with the Condor library via the condor_compile command.
Submit to Condor.
Next, you'll submit your program to Condor via the condor_submit command. With condor_submit you'll tell Condor information about the run,
such as what executable to run, what filenames to use for keyboard and screen (stdin and stdout) data, and where to send email when the job
completes. You can also tell Condor how many times to run a program; many users may want to run the same program multiple times with
different data files. Finally, you'll describe to Condor what type of machine you want your program to run on.
Condor Runs the Job.
Once submitted, you'll monitor your job's progress via the condor_q and condor_status commands, and/or possibly modify the order in which
Condor will run your jobs with condor_prio. If desired, Condor can even inform you every time your job is checkpointed and/or migrated to a
different machine.
Job Completion.
When your program completes, Condor will tell you (via email if preferred) the exit status of your program and how much CPU and wall clock
time the program used. You can remove a job from the queue prematurely with condor_rm.
A Condor universe is an execution environment for your job. The three available universes are the Standard Universe, the Vanilla Universe, and the PVM Universe. The Standard Universe provides more services for your job and is generally preferable, but it is only available if you can link your application's object code with the Condor libraries. If your job is an executable for which no source code is available, or which is impractical to relink (e.g. IRAF), then you must use the Vanilla Universe. The PVM Universe provides PVM communication and synchronization services to allow true parallel processing. Your job must already incorporate PVM routines for this universe to be useful.
An example of a service provided by the Standard Universe: you have a job running on machine X, and someone logs into that machine. In the Standard Universe, Condor can save the state of the application (called checkpointing) and resume it where it left off on machine Y. If this were a Vanilla job, Condor could only suspend the job or restart it from the beginning on machine Y. This is a good reason to use the Standard Universe whenever possible.
You can read more about universes here.
The following code calculates the 499th Fibonacci number:
/* fibonacci.c - calculates fibonacci numbers
 *
 */
#define FIB_MAX_NUM 499 /* How many numbers to calculate */
#include <stdio.h>
int main(void) {
    int i;
    double fibo = 1, fib = 1, temp = 0; /* previous value, current value, scratch */
    /* fib starts as the 2nd number; each pass advances it by one */
    for (i = 2; i < FIB_MAX_NUM; i++) {
        temp = fib;
        fib += fibo;
        fibo = temp;
    }
    printf("The %dth calculated fibonacci number is: %g\n", i, fib);
    return 0;
}
To compile this code for the Condor Standard Universe, use the command:
condor_compile cc fibonacci.c -o fibonacci
To submit the file to the Condor system, a submit description file must be created:
##############
# fibonacci.sdf - Fibonacci demo for condor - submit description file
##############
Executable = fibonacci
Output = fib.out
Log = foo.log
Queue 1
Note that:
To submit the executable, type the command:
condor_submit fibonacci.sdf
Condor should respond with some status information:
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 3.
A more advanced submit description file
##############
#
# Fibonacci demo for condor - advanced submit description file
#
##############
Executable = fibonacci
Requirements = Memory >= 32 && OpSys == "IRIX6" && Arch == "SGI"
Rank = Memory
Image_Size = 28 Meg
Error = err.$(Process)
Input = in.$(Process)
Output = fib.out
Log = foo.log
Queue 5
For a complete submit description file reference, read the condor_submit man page.
Each feature of a machine that Condor publishes (Mem, Arch, Mips, etc.) is an attribute of that machine's ClassAd, in Condor terminology. A ClassAd is like a classified advertisement in a newspaper listing the features of the machine, which you can use to determine which machine is most suitable for you.
Statistics for available Condor machines
Here is a list of the lab machines that will accept Condor jobs.
Name | OpSys | CPU | Arch | Memory | Disk | Mips | MFlops | |
aquarius | LINUX | K6II/450 | i586 | 64 | 537 | 52 | ||
cancer | LINUX | K6II/450 | i586 | 64 | 537 | 54 | ||
capricorn | LINUX | K6II/450 | i586 | 64 | 538 | 54 | ||
cod | LINUX | XP1900+ | i686 | 512 | 1915 | 640 | ||
eel | LINUX | Duron/600 | i686 | 256 | 1528 | 512 | ||
gemini | LINUX | K6II/450 | i586 | 64 | 539 | 48 | ||
gull | LINUX | K7/550 | i686 | 128 | 703 | 217 | ||
lab16 | LINUX | P54/200 | i586 | 32 | 188 | 24 | ||
lab30 | LINUX | P55/200 | i586 | 64 | 184 | 24 | ||
lab33 | LINUX | P55/200 | i586 | 64 | 185 | 23 | ||
lab36 | LINUX | P55/200 | i586 | 64 | 183 | 22 | ||
lab37 | LINUX | P55/200 | i586 | 64 | 183 | 22 | ||
leo | LINUX | K6II/450 | i586 | 64 | 520 | 41 | ||
libra | LINUX | K6II/450 | i586 | 64 | 539 | 51 | ||
pisces | LINUX | K6II/450 | i586 | 64 | 522 | 40 | ||
sagittarius | LINUX | K6II/450 | i586 | 64 | 539 | 51 | ||
scorpio | LINUX | K6II/450 | i586 | 64 | 536 | 54 | ||
snapper | LINUX | XP1900+ | i686 | 512 | 2053 | 689 | ||
swan | LINUX | K7/550 | i686 | 128 | 705 | 216 | ||
taurus | LINUX | K6II/450 | i586 | 64 | 542 | 50 | ||
trout | LINUX | K7/550 | i686 | 128 | 703 | 228 | ||
virgo | LINUX | K6II/450 | i586 | 64 | 538 | 50 |
Last modified by Keith Grennan, Feb 06, 2001