Server Scenario:
Ubuntu 12.04 LTS
Torque w/ Maui Scheduler
Hadoop
I am building a small cluster (10 nodes). Users will be able to ssh into any compute node (LDAP auth), but this is largely unnecessary: every computation job they want to run can be submitted on the head node through Torque, Hadoop, or another resource manager tied to a scheduler to ensure priority and proper resource allocation across the nodes. Some users will have priority over others.
Problem:
You can't force a user to use a batch system like Torque. If they want to hog all the resources on one node, or on the head node, they can simply run their script or code directly from their terminal / ssh session.
Solution:
My main users, or "superusers", want me to set up a remote-login timeout, which is what their current cluster uses to eliminate this problem. (I do not have access to that cluster, so I cannot grab its configuration.) I want a 30-minute timeout on any remote session with no keyboard activity; if the session is running processes, I also want it killed along with all of its child processes. This will stop people from bypassing the available batch system / scheduler.
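To illustrate the idle-timeout half of this, something like bash's TMOUT variable is roughly what I have in mind (the profile.d path and the readonly trick are just my guess at a setup; note TMOUT only fires while the shell is waiting at the prompt, so it does not cover the "kill running processes" part):

```shell
# /etc/profile.d/timeout.sh -- sketch of an idle logout for interactive bash sessions
TMOUT=1800          # log the shell out after 30 minutes with no keystroke at the prompt
readonly TMOUT      # keep users from unsetting or overriding it in their own rc files
export TMOUT
```

A session sitting inside a long-running foreground process never returns to the prompt, so TMOUT alone will not kill it; that case would still need something that watches tty activity or process ancestry.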
Question:
How can I implement something like this? Thanks for all the help!
I've mostly seen sysadmins solve this by not allowing ssh access to the compute nodes (often done using the PAM module that ships with TORQUE), but there are other techniques. One is to use pbstools: its reaver script can be set up to kill user processes that aren't part of jobs (or that shouldn't be on those nodes). I believe it can also be configured to simply notify you. Some admins forcibly kill things, others educate users; that part is up to you.
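As a rough sketch of the PAM approach (module name and stack placement here are from memory, so check the module your TORQUE build actually installs; it is typically pam_pbssimpleauth):

```shell
# /etc/pam.d/sshd on each compute node -- hedged sketch, not a drop-in config.
# pam_pbssimpleauth permits ssh logins only for users who currently have a job
# running on that node, which funnels everyone through the batch system.
account    required     pam_pbssimpleauth.so
```

How root and admin accounts are exempted varies (e.g. an additional pam_listfile or pam_access line earlier in the account stack), so test on one node before rolling it out.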
Once you get people submitting jobs instead of ssh'ing in directly, you may also want to look into TORQUE's cpuset feature. It helps confine users to the amount of resources they actually request. Best of luck.
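If you go the cpuset route, the gist (as I understand it; the configure flag and mount point may differ between TORQUE versions) is that pbs_mom needs cpuset support compiled in and the cpuset pseudo-filesystem mounted on each compute node:

```shell
# Hedged sketch: enabling cpusets for pbs_mom (details vary by TORQUE version).
# 1) Build TORQUE with cpuset support on the compute nodes:
./configure --enable-cpuset && make && make install
# 2) Mount the cpuset pseudo-filesystem before starting pbs_mom:
mkdir -p /dev/cpuset
mount -t cpuset none /dev/cpuset
```

With that in place, pbs_mom can pin each job's processes to the CPUs it was allocated instead of letting them spill across the node.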
EDIT: noted that the PAM module is one of the most common ways to restrict ssh access to the compute nodes.