Barley info
From FarmShare
Chekh (Talk | contribs)
Revision as of 13:25, 8 January 2013
Current barley policies
- 480 max jobs per user (look for max_u_jobs in output of 'qconf -sconf')
- 3000 max jobs in the system (look for max_jobs in output of 'qconf -sconf')
- 48hr max runtime for any job in regular queue (look for h_rt in output of 'qconf -sq precise.q')
- 30 days max runtime for the long queue (look for h_rt in output of 'qconf -sq precise-long.q')
- 15min max runtime in test.q
- 4GB default mem_free request per job
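The limits above can be verified directly with the qconf commands mentioned in each bullet. A quick sketch, run from any barley login node:

```shell
# Per-user and system-wide job limits:
qconf -sconf | grep max_u_jobs        # 480 max jobs per user
qconf -sconf | grep max_jobs          # 3000 max jobs in the system

# Queue runtime limits (h_rt is the hard wall-clock limit):
qconf -sq precise.q | grep h_rt       # 48hr regular queue
qconf -sq precise-long.q | grep h_rt  # 30-day long queue
qconf -sq test.q | grep h_rt          # 15min test queue
```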
Technical details
- 19 new machines, AMD Magny-Cours, 24 cores each, 96GB RAM
- 1 new machine, AMD Magny-Cours, 24 cores, 192GB RAM
- ~450GB local scratch on each
- ~7TB in /mnt/glusterfs shared across all barley and corn systems
- Grid Engine v6.2u5 (via standard Debian package)
- 10GbE interconnect (Juniper QFX3500 switch)
How to use the barley machines
To start using these new machines, check out the man pages for 'sge_intro' and for the 'qhost', 'qstat', 'qsub' and 'qdel' commands.
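For orientation, the four commands just mentioned cover the basic job lifecycle (the script name below is a placeholder):

```shell
qhost                 # list execution hosts, their cores, memory, and load
qstat -f              # show the state of all queues and your pending jobs
qsub my_job.script    # submit a batch job; prints the assigned job ID
qdel JOBID            # cancel a job, using the job ID printed by qsub
```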
Initial issues:
- You are limited in space to your AFS homedir ($HOME) and local scratch disk on each node ($TMPDIR)
- The execution hosts don't accept interactive jobs, only batch jobs for now.
- You'll want to make sure you have your Kerberos TGT and your AFS token.
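To check your credentials, assuming the standard Kerberos and OpenAFS client tools are installed on the login nodes, something like the following should work:

```shell
klist     # list Kerberos tickets; look for a valid krbtgt/... entry (your TGT)
tokens    # list AFS tokens and their expiration times
# If either is missing or expired:
kinit     # obtain a fresh Kerberos TGT (prompts for your password)
aklog     # derive an AFS token from the TGT
```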
If you want to use the newer bigger storage:
- log into senpai1: "ssh sunetid@senpai1.stanford.edu"
- cd to /mnt/glusterfs/<your username> (or wait about 5 minutes if it doesn't exist yet)
- write a job script: "$EDITOR test_job.script"
- see 'man qsub' for more info
- use env var $TMPDIR for local scratch
- use /mnt/glusterfs/<your username> for shared data directory
- submit the job for processing: "qsub -cwd test_job.script"
- monitor the jobs with "qstat -f -j JOBID"
- see 'man qstat' for more info
- check the output files that you specified in your job script (the input and output files must be in /mnt/glusterfs/)
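Putting the steps above together, a minimal sketch of a job script and its submission; the script name, the one-hour runtime request, and the placeholder work commands are illustrative, not required values:

```shell
#!/bin/bash
# test_job.script -- a hypothetical minimal SGE job script
#$ -S /bin/bash            # run the job under bash
#$ -cwd                    # start in the directory qsub was run from
#$ -l h_rt=1:00:00         # request 1hr of runtime (48hr max in precise.q)

# Stage work in fast local scratch on the execution host:
cd "$TMPDIR"

# ... your actual computation goes here ...

# Copy results back to the shared filesystem before the job ends,
# since $TMPDIR is wiped when the job finishes:
cp results.out /mnt/glusterfs/$USER/
```

Then, from your directory under /mnt/glusterfs on senpai1, submit and monitor it with "qsub -cwd test_job.script" and "qstat -f -j JOBID", substituting the job ID that qsub prints.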
If you have any questions, please email 'farmshare-discuss@lists.stanford.edu'. Some good introductory usage examples are available at http://gridscheduler.sourceforge.net/howto/basic_usage.html