Add v0 of scipy on htcondor tools #1

Open · wants to merge 8 commits into base: master
1 change: 1 addition & 0 deletions AUTHORS
@@ -0,0 +1 @@
Steve Goldstein [email protected]
90 changes: 87 additions & 3 deletions README.md
@@ -1,4 +1,88 @@
scipy-on-htcondor
=================
Running python+numpy+scipy on HTCondor
======================================

Author: Steve Goldstein [email protected] July, 2014

To run Python with NumPy or SciPy on HTCondor, you have to ensure that
the execute node has a Python installation that includes those
libraries. CHTC has solved that problem by building such a version of
Python and providing a means to install it on the execute node.

The purpose of the scipy-on-htcondor project is to give you an easy,
widely applicable way to use CHTC's solution. The CHTC web pages
describe the details and a more general implementation.

Step by step instructions:

1. Log on to the CHTC submit node:

```
ssh <netID>@submit-3.chtc.wisc.edu
```

1. Make a directory for your python work.

```
mkdir myProject
cd myProject
```

1. Copy the scipy-on-htcondor.tar.gz tar file to your workspace and extract the
archive.

```
wget <URL here...>
tar zxvf scipy-on-htcondor.tar.gz
```

1. Open another terminal on your own local machine and transfer your program
to the indir/ subdirectory.

```
scp /path/to/my/myPythonProgram.py <netID>@submit-3.chtc.wisc.edu:myProject/indir
```

1. Transfer any other input files to the indir/ subdirectory.

```
scp /path/to/other/necessary_input_files <netID>@submit-3.chtc.wisc.edu:myProject/indir
```

1. Back in the terminal where you are logged into the CHTC submit node,
make sure the first line of your python program looks like this:

```
#!/usr/bin/env python
```

It should match the first line of the pythontest.py program:

```
head -1 indir/*py
```
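The shebang check above can be scripted for every program you place in indir/. A minimal sketch, assuming your scripts live in indir/ as in the earlier steps:

```shell
# Verify that every Python script in indir/ starts with the env-python
# shebang shown above; flag any file that does not.
for f in indir/*.py; do
  if head -1 "$f" | grep -qx '#!/usr/bin/env python'; then
    echo "ok: $f"
  else
    echo "MISSING SHEBANG: $f"
  fi
done
```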

1. Edit the condor submit file, following the (albeit sparse) directions in
that file:

```
nano queue/process.cmd
```

1. Submit the jobs to condor:

```
condor_submit queue/process.cmd
```

1. When the jobs have finished, run the cleanup script:

```
queue/cleanupCHTCPython.pl
```

Your output will be in the outdir directory. Check the files in `chtcOutput/`
for errors.
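A quick way to triage those files is to scan them for common failure markers. A hedged sketch — the keyword list is only a starting point, and CODEBLOWUP is the crash marker the cleanup script moves into chtcOutput/:

```shell
# List any collected wrapper/output files mentioning an error or a
# Python traceback, and flag the wrapper's CODEBLOWUP crash marker.
grep -rilE 'error|traceback' chtcOutput/ 2>/dev/null
if [ -e chtcOutput/CODEBLOWUP ]; then
  echo "a job crashed: see chtcOutput/CODEBLOWUP"
fi
```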


Tools to effectively launch SciPy-based jobs on HTCondor environments
32 changes: 32 additions & 0 deletions indir/pythontest.py
@@ -0,0 +1,32 @@
#!/usr/bin/env python
"""Exercise numpy and scipy on an HTCondor execute node."""

import os
import platform
import sys
import time
import numpy
from scipy import sqrt, pi

print >> sys.stderr, __doc__
print "Version :", platform.python_version()
print "Program :", sys.executable
print 'Script :', os.path.abspath(__file__)
print 'Args :', sys.argv[1:]
print

a = numpy.arange(10000000)
b = numpy.arange(10000000)
c = a + b


h1 = sqrt(pi/2)

print a,b
print c
print h1
print

f = open('DONE','w')
print >>f,'all','done'
f.close()
sys.exit(0)

4 changes: 4 additions & 0 deletions outdir/.gitignore
@@ -0,0 +1,4 @@
# Ignore everything in this directory
*
# Except this file
!.gitignore
45 changes: 45 additions & 0 deletions queue/cleanupCHTCPython.pl
@@ -0,0 +1,45 @@
#!/usr/bin/perl -w


use strict;
use Carp;
use English;
use Getopt::Long;
use Cwd;

GetOptions (

);

my $chtcNoise = "./chtcOutput";
croak "$chtcNoise directory already exists."
if ( -e $chtcNoise);

croak "Can't create $chtcNoise directory"
unless (mkdir $chtcNoise);

my @filelist =
qw
(
ChtcWrapper*out
AuditLog*
CURLTIME*
outdir/*
queue/error/*
DONE
harvest.log
CODEBLOWUP
);

foreach my $fileSpec (@filelist) {
`mv $fileSpec $chtcNoise 2> /dev/null`;
}

opendir D, "." or croak "Can't read directory";
my @outputFiles = grep {/^\d+\.\d+\.out$/} readdir D;
closedir D;

`mv @outputFiles outdir` if @outputFiles;


__END__
84 changes: 84 additions & 0 deletions queue/process.cmd
@@ -0,0 +1,84 @@
###############
universe = vanilla


###################################################
# 1. Edit these 2 lines to run your python program.
###################################################

PythonProgram = pythontest.py
PythonProgramArguments = $(PROCESS)
###################################################

###################################################
# 2. Edit the last line in this file
#
# queue N
#
# to run N instances of this program.
###################################################

###################################################
# 3. Edit these lines if appropriate
###################################################
transfer_input_files = indir/

# Tell Condor how many CPUs (cores), how much memory (MB) and how much
# disk space (KB) each job will need:
request_cpus = 1
request_memory = 1000
request_disk = 1000000
###################################################


###################################################
# 4. Most of the rest of this file does not need editing
###################################################

executable = /squid/example/ChtcRun/chtcjobwrapper
arguments = --type=Other --version=Python-2.7.3 --cmdtorun $(PythonProgram) --unique=$(CLUSTER).$(PROCESS) -- $(PythonProgramArguments)


output = outdir/process.$(CLUSTER).$(PROCESS).out
error = queue/error/process.$(CLUSTER).$(PROCESS).err
log = queue/process.$(CLUSTER).log

#################################################################

requirements = (OpSysAndVer =?= "SL6")

should_transfer_files = YES
when_to_transfer_output = ON_EXIT

#################################################################
# By default, your job will be submitted to the CHTC's HTCondor
# pool only, which is good for jobs that each run less than 24 hours.
#
# If your jobs are less than 4 hours long, "flock" them additionally to
# other HTCondor pools on campus by uncommenting the below line:
#+WantFlocking = true

#
# If your jobs are less than ~2 hours long, "glide" them to the national
# Open Science Grid (OSG) for access to even more computers and the
# fastest overall throughput. Uncomment the below line:
#+WantGlidein = true
###################################################

# Release a job from being on hold after 5 minutes (300 seconds), up to 4 times,
# as long as the executable could be started, the input files and initial directory
# were accessible and the user log could be created. This will help your jobs to retry
# if they happen to fail due to a computer issue (not an issue with your job)
periodic_release = (JobStatus == 5) && ((CurrentTime - EnteredCurrentStatus) > 300) && (JobRunCount < 5) && (HoldReasonCode != 6) && (HoldReasonCode != 14) && (HoldReasonCode != 22)

# If you want your jobs to go on hold because they are
# running longer than expected, uncomment this line and
# change from 24 hours to your desired limit:
#periodic_hold = (JobStatus == 2) && ((CurrentTime - EnteredCurrentStatus) > (60 * 60 * 24))

# We don't want email about our jobs.
notification = never

###################################################
## This must be the last line
queue 2