I am attempting to run a SAS file on a cluster. The contents of the SAS file, myprogram.sas, are shown below:
data a;
input myvar1;
myvar2 = myvar1 + 100 ;
datalines;
0
1
2
3
4
5
;
proc print;
run;
I create a Condor file to execute the SAS file on the cluster. The contents of the Condor file, mycondorcode.condor, are shown below, except that I have altered the email address:
####################
#
# Submit SAS code to Condor cluster
#
# Submit this to run on the cluster with condor_submit THIS-FILENAME.condor
#
####################
UNIVERSE = vanilla
NOTIFICATION = Complete
NOTIFY_USER = [email protected]
REQUIREMENTS = (OpSys == "LINUX" && HAS_SAS )
GETENV = TRUE
EXECUTABLE = /usr/local/bin/sas
ARGUMENTS = -nodms -noterminal
INPUT = myprogram.sas
OUTPUT = $(INPUT).out
ERROR = $(INPUT).err
LOG = $(INPUT).log
QUEUE
I copy the SAS and Condor files to the cluster using an application called WinSCP.exe, which I guess converts the SAS file to a format the cluster can understand, much as a dos2unix command would.
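One way to test that guess about line endings, rather than assume it, is a minimal sketch like the following, run on the cluster from the PuTTY session. It assumes a POSIX shell with grep and tr available. The function names has_crlf and strip_cr are hypothetical helpers made up for illustration, and the commented-out call uses the file name from this question.

```shell
#!/bin/sh
# has_crlf FILE: succeed (exit status 0) if FILE contains at least one
# carriage return, i.e. it probably still has DOS/Windows (CRLF) line endings.
has_crlf() {
    # printf '\r' produces a single carriage-return character to search for
    grep -q "$(printf '\r')" "$1"
}

# strip_cr FILE: write a copy of FILE with all carriage returns removed
# to standard output (roughly what dos2unix does for a plain text file).
strip_cr() {
    tr -d '\r' < "$1"
}

# Example use (hypothetical output file name):
# if has_crlf myprogram.sas; then strip_cr myprogram.sas > myprogram.unix.sas; fi
```

If has_crlf reports nothing for both files, line endings are probably not the cause of the problem.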
Then I submit the SAS file to the cluster using PuTTY by typing:
condor_submit mycondorcode.condor
When I type:
condor_q
I see:
 ID       OWNER   SUBMITTED     RUN_TIME   ST PRI SIZE CMD
58683.0   markm   11/24 14:41   0+00:00:00 I  0   0.0  sas -nodms -noterm
The status (ST) remains I no matter how long I wait.
I can see a text file in my directory called myprogram.sas.log (the name produced by the LOG line in the Condor file), which contains the following (except that I have altered the email address and altered the number that looks like it could be an IP address):
000 (58683.000.000) 11/24 14:41:55 Job submitted from host: <14.4.104.1:42259>
...
022 (58683.000.000) 11/24 14:42:56 Job disconnected, attempting to reconnect
Socket between submit and execute hosts closed unexpectedly
Trying to reconnect to [email protected] <14.4.104.23:50176>
...
024 (58683.000.000) 11/24 14:42:56 Job reconnection failed
Job not found at execution machine
Can not reconnect to [email protected], rescheduling job
...
022 (58683.000.000) 11/24 14:43:56 Job disconnected, attempting to reconnect
Socket between submit and execute hosts closed unexpectedly
Trying to reconnect to [email protected] <14.4.104.23:50176>
...
024 (58683.000.000) 11/24 14:43:56 Job reconnection failed
Job not found at execution machine
Can not reconnect to [email protected], rescheduling job
...
022 (58683.000.000) 11/24 14:44:56 Job disconnected, attempting to reconnect
Socket between submit and execute hosts closed unexpectedly
Trying to reconnect to [email protected] <14.4.104.23:50176>
...
024 (58683.000.000) 11/24 14:44:56 Job reconnection failed
Job not found at execution machine
Can not reconnect to [email protected], rescheduling job
...
022 (58683.000.000) 11/24 14:45:57 Job disconnected, attempting to reconnect
Socket between submit and execute hosts closed unexpectedly
Trying to reconnect to [email protected] <14.4.104.23:50176>
...
024 (58683.000.000) 11/24 14:45:57 Job reconnection failed
Job not found at execution machine
Can not reconnect to [email protected], rescheduling job
...
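The repeating pattern in this log can be summarized by counting how often each event code occurs. The sketch below assumes a POSIX shell with awk and sort, and assumes the log is the file named by the LOG line of the Condor file (myprogram.sas.log). The function name count_events is hypothetical.

```shell
#!/bin/sh
# count_events LOGFILE: print "<event code> <count>" for each event in a
# Condor user log. Event header lines start with a three-digit code
# followed by "(cluster.proc.subproc)"; detail lines and "..." separators
# do not match and are ignored.
count_events() {
    awk '/^[0-9][0-9][0-9] \(/ { n[$1]++ }
         END { for (code in n) print code, n[code] }' "$1" | sort
}

# Example use (log file name assumed from the LOG line in the Condor file):
# count_events myprogram.sas.log
```

For the excerpt above, the only codes present are 000 (submitted), 022 (disconnected), and 024 (reconnection failed). No event in the excerpt shows the job starting to execute.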
I have never successfully used this cluster, but I have run R on a different cluster. I know virtually nothing more about the current cluster. Based on what I have provided above, does it appear that I am doing something incorrectly, or does it appear that there is a connection problem that must be addressed by the IT department that operates the cluster?
Thank you for any suggestions I might try to resolve this problem from my Windows desktop side, given that I am almost entirely unfamiliar with Unix or clusters in general. Perhaps I am doing something incorrectly with WinSCP.exe. Perhaps instead of relying on WinSCP I might try using dos2unix?