Main Page

From FarmShare


Revision as of 13:28, 13 December 2011

+-+-+-+-+-+-+-+-+-+
|F|a|r|m|S|h|a|r|e|
+-+-+-+-+-+-+-+-+-+

This wiki is for users of Stanford's shared research computing resources, i.e. the "cardinal", "corn" and "barley" machines.

How to connect

The machines are available to anyone with a SUNetID. Simply "ssh corn.stanford.edu" with your SUNetID credentials. The DNS name "corn.stanford.edu" points to a load balancer, which connects you to a particular corn machine (e.g. corn21) that has relatively low load.
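
As a minimal sketch of the connection step, the helper below logs in through the load balancer and prints the name of the corn machine it picked for you. The function name which_corn is made up for illustration; it assumes a standard OpenSSH client and takes your SUNetID as its only argument.

```shell
# Hypothetical helper: log in via the load balancer and report which corn
# machine was selected. Not part of FarmShare itself.
which_corn() {
    if [ -z "${1:-}" ]; then
        echo "usage: which_corn SUNETID" >&2
        return 1
    fi
    # Run "hostname" on whichever corn the load balancer hands out.
    ssh "$1@corn.stanford.edu" hostname
}
```

Running "which_corn yoursunetid" a few times should show that you do not always land on the same machine.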

The "barley" machines are only accessible via a resource manager (currently Open Grid Engine). You can submit jobs from any corn. To get a directory on /mnt/glusterfs (non-AFS storage), ssh to corn-image-new.stanford.edu and the directory will be created for you. E-mail the barley-alpha mailing list for more info.

cardinal info

The "cardinal" machines are older, slower and smaller, and are intended for long-running processes that are not resource intensive, e.g. mail/chat clients. You could log in to a cardinal and run a screen/tmux session there to do things on other machines.

corn info

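The "corn" machines are general-purpose Ubuntu boxes, and you can run whatever you want on them. Please read the policies and the motd first.

  • Policies: http://lelandpolicy.stanford.edu
  • IT services page: https://itservices.stanford.edu/service/unixcomputing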

Each of the 30 corn machines has 8 cores, 32GB RAM and ~70GB of local disk in /tmp.
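
To compare the machine you landed on against these figures, something like the following could be run after logging in. It is only a sketch using common Unix tools; the function name machine_summary is an illustrative assumption.

```shell
# Print a short summary of the machine you are logged in to.
machine_summary() {
    echo "host:  $(uname -n)"
    # nproc is GNU coreutils; fall back to getconf where it is missing.
    echo "cores: $(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"
    # "free" comes from procps and may be absent on non-Linux systems.
    if command -v free >/dev/null 2>&1; then
        free -g
    fi
    # Space on the local disk that backs /tmp.
    df -h /tmp
}

machine_summary
```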

barley info

The "barley" machines are newer general-purpose Ubuntu boxes that run jobs you submit via the resource manager software.

Technical details:

  • 19 new machines, 24 cores each, 96GB RAM
  • 1 new machine, 24 cores, 192GB RAM
  • ~450GB local scratch on each
  • ~3TB in /mnt/glusterfs
  • Grid Engine v6.2u5 (via standard Debian package)
  • 10GbE interconnect (Juniper QFX3500 switch)

To start using these new machines, check out the man pages for 'sge_intro' and for the 'qhost', 'qstat', 'qsub' and 'qdel' commands.
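
A first look at the cluster with those commands might go roughly as follows. This is a sketch, not an official procedure: the wrapper name cluster_overview is invented here, and it assumes the Grid Engine client commands are on your PATH, as they should be on a corn.

```shell
# Hypothetical wrapper around the Grid Engine client commands.
cluster_overview() {
    if ! command -v qhost >/dev/null 2>&1; then
        echo "Grid Engine commands not found; log in to a corn first." >&2
        return 1
    fi
    qhost                    # execution hosts with core count, load and memory
    qstat -u "$(id -un)"     # your own pending and running jobs
}
```

Call it as "cluster_overview". From there, "qsub SCRIPT" submits a batch job and "qdel JOBID" removes one.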

Initial issues:

  • Storage space is limited to your AFS homedir ($HOME) and the local scratch disk on each node ($TMPDIR).
  • The execution hosts don't accept interactive jobs, only batch jobs for now.
  • You'll want to make sure you have your Kerberos TGT and your AFS token.
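
One way to check the last point before submitting is sketched below. It assumes the usual Kerberos and OpenAFS client tools (klist, tokens, kinit, aklog) are installed; the function name check_credentials and the exact text matched in the "tokens" output are assumptions.

```shell
# Hypothetical helper: report whether a Kerberos TGT and an AFS token are present.
check_credentials() {
    # "klist -s" prints nothing; its exit status says whether a valid ticket exists.
    if klist -s 2>/dev/null; then
        echo "Kerberos TGT: ok"
    else
        echo "Kerberos TGT: missing, run kinit"
    fi
    # "tokens" lists the AFS tokens for this session; ignore expired entries.
    if tokens 2>/dev/null | grep 'tokens for' | grep -qv 'Expired'; then
        echo "AFS token: ok"
    else
        echo "AFS token: missing, run aklog"
    fi
}

check_credentials
```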

If you want to use the newer bigger storage:

  1. log into corn-image-new: "ssh sunetid@corn-image-new.stanford.edu"
  2. cd to /mnt/glusterfs/<your username> (or wait 5 minutes if it doesn't exist yet)
  3. write a job script: "$EDITOR test_job.script"
    1. see 'man qsub' for more info
    2. use env var $TMPDIR for local scratch
    3. use /mnt/glusterfs/<your username> for shared data directory
  4. submit the job for processing: "qsub -cwd test_job.script"
  5. monitor the jobs with "qstat -f -j JOBID"
    1. see 'man qstat' for more info
  6. check the output files that you specified in your job script (the input and output files must be in /mnt/glusterfs/)
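
Steps 3 and 4 could look something like the sketch below, run from /mnt/glusterfs/<your username>. The job name, the output file names and the use of $USER for the directory name are assumptions made for the example; the "#$" lines are standard Grid Engine directives (see 'man qsub').

```shell
# Step 3: write a minimal batch job script.
cat > test_job.script <<'EOF'
#!/bin/bash
#$ -S /bin/bash
#$ -N test_job
#$ -o test_job.out
#$ -e test_job.err

# Work in node-local scratch, then copy the result to the shared directory.
cd "$TMPDIR"
hostname > result.txt
date >> result.txt
cp result.txt "/mnt/glusterfs/$USER/test_job.result"
EOF

# Step 4: submit it, if the Grid Engine client is available on this machine.
if command -v qsub >/dev/null 2>&1; then
    qsub -cwd test_job.script
else
    echo "qsub not found; submit test_job.script from a corn instead."
fi
```

Because the job is submitted with -cwd, the relative paths given to -o and -e end up in the directory you submitted from.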

If you have any questions, please email 'barley-alpha@lists.stanford.edu'.

We plan to roll out to the full Stanford community on Jan 1.

barley software

stock software

The barley machines are running Ubuntu 11.04, and the software is from the Ubuntu repositories.

licensed software

current barley policies

  • 100 max jobs per user
  • 3000 max jobs in the system
  • 48hr max runtime for any job in regular queue
  • one week max runtime for the long queue
  • no memory/CPU limits yet

If you have any concerns or suggestions, please mail barley-alpha.
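
As a rough illustration of the runtime limits above, the sketch below works out which queue class a job of a given length fits into. The function name and the labels it prints are made up for the example and are not actual queue names; whether the limits are enforced through Grid Engine's h_rt resource is also an assumption.

```shell
# Limits from the policy list, in hours.
REGULAR_MAX_HOURS=48
LONG_MAX_HOURS=168   # one week = 7 * 24

# Print which queue class a job of the given length (whole hours) fits in.
queue_for_hours() {
    hours=$1
    if [ "$hours" -le "$REGULAR_MAX_HOURS" ]; then
        echo regular
    elif [ "$hours" -le "$LONG_MAX_HOURS" ]; then
        echo long
    else
        echo none
    fi
}

queue_for_hours 36    # prints "regular"
queue_for_hours 100   # prints "long"
queue_for_hours 200   # prints "none"

# A run-time request can be passed at submission time, for example:
#   qsub -cwd -l h_rt=36:00:00 test_job.script
```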

Monitoring / Status

Current status of farmshare machines: http://barley-monitor.stanford.edu/ganglia/

More detailed graphs: http://barley-monitor.stanford.edu/munin/

For important announcements, we plan to:

  • modify /etc/motd on the corns
  • send a mail to farmshare-announce
  • duplicate that mail also to the farmshare "blog"

Mailing Lists

We have several mailing lists, all @lists.stanford.edu; most are not used.

  • barley-alpha - temp list till end of Oct/Nov 2011, for discussion around testing the barleys
  • farmshare-announce - announce list (new service name)
  • farmshare-discuss - users discussion (new service name)
  • stanford-timeshare-users - users discussion list for the corn users, list to be retired


Links

Want to learn HPC? Free education materials available:

Getting started with MediaWiki

Consult the User's Guide for information on using the wiki software.
