Notes from configuration meeting of 16 August 2007
   Pierre Lermusiaux
   Patrick Haley
   Bryan Hilton
   Aaron Culich

RAID configurations on 3U and 1U servers
   selected configuration for 3U disks
      RAID 1 for the two 160GB system disks
      RAID 5 for the 500GB data disks: 6 in one RAID, 8 in a second RAID

   selected configuration for 1U disks
      Combine boot and swap partitions on one disk
      Apply software RAID across the remaining 3 disks (see sketch below)
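
      A minimal sketch of how the 1U software RAID might be built with mdadm;
      the device names (/dev/sd[bcd]) and the RAID level are assumptions, not
      decisions from the meeting:

         mdadm --create /dev/md0 --level=5 --raid-devices=3 \
               /dev/sdb1 /dev/sdc1 /dev/sdd1
         mkfs.ext3 /dev/md0                        # format the new array
         mdadm --detail --scan >> /etc/mdadm.conf  # so it assembles at boot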

   some reconfiguration of the current disk layout is needed
      top 3U:
          OS ordered disks as:   sda   6 disk RAID
                                sdb   2 disk RAID
                                sdc   8 disk RAID
         and installed OS on sda

          RAIDs were reset and reformatted.

      1U:
         boot and swap partitions currently on separate disks

Ethernet Switch Configuration
   ethernet switches configured and connected for 2 Gbit interconnect
      still need to demonstrate 2 Gbit throughput

   3U and 1U units (+ corresponding ports on switches) need to be configured
   for 2 Gbit (see the bonding sketch below)
      see http://www.zazzybob.com/tips.php?type=view&tip=0043
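
      A sketch of the 2 Gbit set-up on CentOS 4, assuming the two gigabit
      ports are channel-bonded; the interface names, IP address, and bonding
      mode are assumptions (802.3ad needs matching link-aggregation config on
      the switch ports; balance-rr is the switch-agnostic alternative):

         # /etc/modprobe.conf
         alias bond0 bonding
         options bond0 mode=802.3ad miimon=100

         # /etc/sysconfig/network-scripts/ifcfg-bond0
         DEVICE=bond0
         IPADDR=10.1.1.10
         NETMASK=255.255.255.0
         ONBOOT=yes
         BOOTPROTO=none

         # /etc/sysconfig/network-scripts/ifcfg-eth0  (and likewise eth1)
         DEVICE=eth0
         MASTER=bond0
         SLAVE=yes
         ONBOOT=yes
         BOOTPROTO=none
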
   Also, the 3U units are only booting (and hence known to Rocks) through the
   fast Ethernet port.  Do we want to keep the fast Ethernet port as a boot
   port and use the others for data?
      From Irwin:
         The 3U server has three Ethernet ports - two gigabit ports and a fast
         Ethernet 10/100 port.  The Tyan manual I provided you for the Thunder
         K8SD Pro motherboard shows how to set PXE to enabled or disabled for
         gigabit vs. fast Ethernet (10/100) ports on pages 51/52.  This is a
         BIOS configuration option.  By default, the gigabit ports are
         disabled for PXE and the 10/100 port is enabled.

         While you can change these options, I don't believe it is recommended.
         In researching some cluster configuration boards, I found a
         recommendation that PXE and other management be conducted on the
         10/100 port (as is the default) leaving the gigabit ports dedicated
         to data and IO communication.  Here is an example of such a
         discussion: http://www.beowulf.org/archive/2003-January/009274.html.
         I'm sure that Aaron will have an opinion on this.

   Connection between the cluster and the MIT backbone needs to be thought out
      currently, it is only connected through eth1 on the frontend

Tape & back-up configuration
   Aaron configured the tape drive set-up to be accessible on the frontend
   web server
      he needs to provide a description of how we access this
   small areas (e.g. users' home areas) could be snapshotted frequently
   (hourly); see the sketch below
      snapshots do not need to be saved to tape
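
   A sketch of an hourly hard-link snapshot of /home with rsync and cron;
   the /snapshots paths and the retention of four snapshots are assumptions:

      #!/bin/sh
      # e.g. /etc/cron.hourly/home-snapshot
      mkdir -p /snapshots/home.0
      rm -rf /snapshots/home.3
      [ -d /snapshots/home.2 ] && mv /snapshots/home.2 /snapshots/home.3
      [ -d /snapshots/home.1 ] && mv /snapshots/home.1 /snapshots/home.2
      cp -al /snapshots/home.0 /snapshots/home.1   # hard-link copy, cheap
      rsync -a --delete /home/ /snapshots/home.0/  # refresh newest snapshot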

Monitoring & diagnostics
   Nagios (see cluster_software.html)
   SY TechAID   http://www.soyousa.com/products/proddesc.php?id=261
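
   A minimal sketch of a Nagios host/service pair for one of the storage
   nodes; the host name, address, and templates (from the stock sample
   configs) are placeholders:

      define host {
          use        generic-host
          host_name  mseas-3u-1
          address    10.1.1.10
      }
      define service {
          use                  generic-service
          host_name            mseas-3u-1
          service_description  SSH
          check_command        check_ssh
      }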

   Remote Access
      IPKVM (see also http://en.wikipedia.org/wiki/KVM_Switch)
      Raritan cards - remote access from the BIOS level up
                      blades unlikely to have slots for the card
                      3Us have spare room for the card
                      1Us would need to replace the card with NICs 3 & 4
      Teradici cards - same space issue as Raritan
      KVM + power control
         Aaron proposes:
            We need full remote access to the console of the following four
            machines:

            * 1U back-up server
            * 3U storage node (there are two of these)
            * Rocks front-end node (currently this is a blade, but I recommend
              that you purchase a new machine to put in the rack along with the
              3Us that will be the new Rocks front-end + web server + print
              server (the latter two can be virtualized servers running on the
              same box)).

            The console access must be full-control: both text & graphic mode
            capable. We also need to be able to power on/off the machines
            remotely. Maybe that is accomplished with a card inserted into the
            machine (that is probably the costly route); maybe it is a generic
            4 or 8 port KVM along with some remote-control power strip.

Spare part inventory
   The only truly critical spare parts would be hard drives for storage
      at least one of each (160 GB and 500 GB)

Have Aaron document what he's doing.
   How he's configuring things
   What we need to do when things go wrong
     What are the typical things he looks for
   How he was accessing the tape drive

Document cluster set-up
   what OS on each component
   what software on each component
   how many machines for grid (compute nodes)
   what are the primary functions of the 3U and 1U units
   spreadsheet of MAC addresses and IP addresses (see sketch below)
      (this data is available from Rocks after everything is installed)
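
   A rough sketch of how that spreadsheet could be generated after the
   install: take IP/hostname pairs from /etc/hosts (which Rocks keeps up to
   date) and ask each node for its own MAC; the compute-* naming and eth0
   are assumptions:

      ( echo "host,ip,mac"
        awk '/compute-/ {print $1, $2}' /etc/hosts | sort -u |
        while read ip host; do
            mac=$(ssh -n "$host" /sbin/ifconfig eth0 | awk '/HWaddr/ {print $NF}')
            echo "$host,$ip,$mac"
        done ) > cluster_inventory.csv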
   
---- configure Kerberos with Brandon:
 I imagine it wouldn't be too different from following these two articles,
 given Rocks is built on CentOS 4.5, which itself is a free rebuild of RHEL 4
 Update 5.

  * http://itinfo.mit.edu/answer.php?id=7706 - Configuring Kerberos on a RHEL4
                                               machine
  * http://itinfo.mit.edu/answer.php?id=7551 - Using my Kerberos password to
                                               login

 In order to do this, everyone must use their Kerberos username on the server.
 I'd like to test this on another machine first to make sure it works.
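
 A sketch of the relevant pieces of /etc/krb5.conf for the ATHENA.MIT.EDU
 realm; the itinfo articles above are the authoritative recipe, this just
 shows the shape using MIT's standard public KDCs:

     [libdefaults]
         default_realm = ATHENA.MIT.EDU

     [realms]
         ATHENA.MIT.EDU = {
             kdc = kerberos.mit.edu:88
             kdc = kerberos-1.mit.edu:88
             admin_server = kerberos.mit.edu
         }

     [domain_realm]
         .mit.edu = ATHENA.MIT.EDU
         mit.edu = ATHENA.MIT.EDU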

-------------------------------------------------------------------------------
Immediate (minimal) install task list:
--------------------------------------

run insert-ethers on the other devices so Rocks doesn't try to use them as
compute nodes
install Rocks on the 1U system (not as a compute node); make it an I/O node
PVFS2
compilers
back-up
   http://www.bacula.org/
   http://amanda.sourceforge.net/
create user accounts

ssh access:
Aaron suggests login via ssh keys only (keys protected with passphrases);
see the sketch below
key files for user accounts
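
A sketch of the key-only set-up (the frontend host name is a placeholder):

   # on the frontend, in /etc/ssh/sshd_config:
   #     PubkeyAuthentication yes
   #     PasswordAuthentication no
   # then:  service sshd restart
   #
   # each user generates a passphrase-protected key and installs it:
   ssh-keygen -t rsa                                 # prompts for a passphrase
   ssh-copy-id -i ~/.ssh/id_rsa.pub user@frontend    # or append id_rsa.pub to
                                                     # ~/.ssh/authorized_keys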

copy data from Harvard (see the rsync sketch below)
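
One way to do the copy so it survives interruptions (host and paths are
placeholders):

   rsync -avP -e ssh user@harvard-host:/path/to/data/ /data/harvard/
   # -a preserves permissions/times; -P keeps partial files and shows
   # progress, so the transfer can simply be re-run if it is cut off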

PVM - make sure it's installed

matlab
  just the basics; no need for the parallel version yet

screen
   others can attach and watch (see sketch below)
   sessions can be recorded
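
A sketch of a shared, logged screen session (the session name is an example):

   screen -L -S run               # named session, logged to screenlog.0
   #  inside:  C-a :multiuser on       (allow others to attach)
   #           C-a :acladd otheruser   (needs screen installed setuid root)
   screen -x run                  # attach to the same session and watch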

Wiki utility
plone
   could integrate ganglia w/ plone on mseas
   RSS feed of latest results/files

subversion (svn) - newer version-control system (see sketch below)
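
A minimal sketch of a shared repository on the frontend (the repository path
and URLs are placeholders):

   svnadmin create /export/svn/mseas
   svn import ./project file:///export/svn/mseas/project/trunk \
       -m "initial import"
   svn checkout svn+ssh://frontend/export/svn/mseas/project/trunk project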
-------------------------------------------------------------------------------
For longer term management/software plans see
   http://modelseas.mit.edu/download/cluster/manage_notes3.txt

For complete list of software to be installed see
   http://modelseas.mit.edu/download/cluster/cluster_software.html
(includes other candidate software, e.g. Rocks competitors)