Notes from configuration meeting of 16 August 2007
   Pierre Lermusiaux
   Patrick Haley
   Bryan Hilton
   Aaron Culich

RAID configurations on 3U and 1U servers
   selected configuration for 3U disks
      RAID 1 for the two 160 GB system disks
      RAID 5 for the 500 GB data disks: 6 in one RAID, 8 in a second RAID
   selected configuration for 1U disks
      combine boot and swap partitions on one disk
      apply software RAID to the remaining 3 disks
         (see the mdadm sketch below, after the Remote Access notes)
   some reconfiguring needed given the current state of the disks
      top 3U: the OS ordered the disks as
         sda - 6 disk RAID
         sdb - 2 disk RAID
         sdc - 8 disk RAID
      and installed the OS on sda.  RAIDs have been reset and reformatted.
      1U: boot and swap partitions are currently on separate disks

Ethernet Switch Configuration
   ethernet switches are configured and connected for the 2 Gbit interconnect
      still need to demonstrate 2 Gbit throughput
   3U and 1U units (+ corresponding ports on the switches) need to be
   configured for 2 Gbit
      see http://www.zazzybob.com/tips.php?type=view&tip=0043
      (a bonding configuration sketch also appears below, after the
      Remote Access notes)
   Also, the 3U units are only booting (and hence known to Rocks) through the
   fast ethernet port.  Do we want to keep the fast ethernet port as a boot
   port and use the others for data?
   From Irwin:
      The 3U server has three Ethernet ports - two gigabit ports and a fast
      Ethernet 10/100 port.  The Tyan manual I provided you for the Thunder
      K8SD Pro motherboard shows how to set PXE to enabled or disabled for
      gigabit vs. fast Ethernet (10/100) ports on pages 51/52.  This is a
      BIOS configuration option.  By default, the gigabit ports are disabled
      for PXE and the 10/100 port is enabled.  While you can change these
      options, I don't believe it is recommended.  In researching some
      cluster configuration boards, I found a recommendation that PXE and
      other management be conducted on the 10/100 port (as is the default),
      leaving the gigabit ports dedicated to data and IO communication.
      Here is an example of such a discussion:
      http://www.beowulf.org/archive/2003-January/009274.html
      I'm sure that Aaron will have an opinion on this.
   Connection between the cluster and the MIT backbone needs to be thought out
      currently it is only connected through eth1 on the frontend

Tape & back-up configuration
   Aaron configured the tape drive set-up to be accessible on the frontend
   web server
      he needs to provide a description of how we access it
   a small area (e.g. users' home areas) could be snapshotted frequently
   (hourly)
      snapshots do not need to be saved on tape

Monitoring & diagnostics
   Nagios (see cluster_software.html)
   SY TechAID: http://www.soyousa.com/products/proddesc.php?id=261

Remote Access
   IP KVM (see also http://en.wikipedia.org/wiki/KVM_Switch)
   Raritan cards - BIOS level and up, remotely
      blades are unlikely to have slots for the card
      3Us have spare room for the card
      1U would need to replace the card with NICs 3 & 4
   Teradici cards - same space issue as Raritan
   KVM + power control
      Aaron proposes:
         We need full remote access to the console of the following four
         machines:
          * 1U back-up server
          * 3U storage node (there are two of these)
          * Rocks front-end node (currently this is a blade, but I recommend
            that you purchase a new machine to put in the rack along with
            the 3Us that will be the new Rocks front-end + web server +
            print server (the latter two can be virtualized servers running
            on the same box)).
         The console access must be full-control: both text & graphic mode
         capable.  We also need to be able to power on/off the machines
         remotely.  Maybe that is accomplished with a card inserted into the
         machine (that is probably the costly route); maybe it is a generic
         4 or 8 port KVM along with some remote-control power strip.
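   (Command sketch for the 1U software-RAID item above.  This assumes the
   three remaining data disks appear as /dev/sdb, /dev/sdc and /dev/sdd and
   that a single RAID 5 set is wanted across them; the device names and the
   RAID level are assumptions and should be confirmed against the actual
   layout before running anything.)

      # partition each disk with one full-size partition of type "fd"
      # (Linux raid autodetect), then build the array:
      mdadm --create /dev/md0 --level=5 --raid-devices=3 \
            /dev/sdb1 /dev/sdc1 /dev/sdd1

      # record the array so it is assembled at boot, make a filesystem,
      # and watch the initial resync:
      mdadm --detail --scan >> /etc/mdadm.conf
      mkfs.ext3 /dev/md0
      cat /proc/mdstat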
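   (Configuration sketch for the per-node 2 Gbit item above, assuming the
   Linux bonding driver is used to aggregate the two gigabit ports on a
   RHEL4/CentOS 4.5-era node.  The interface names eth0/eth1, the bonding
   mode and the IP address below are placeholders, and the corresponding
   switch ports must be set up to match, e.g. LACP for mode=4.)

      # /etc/modprobe.conf
      alias bond0 bonding
      options bond0 mode=4 miimon=100

      # /etc/sysconfig/network-scripts/ifcfg-bond0
      DEVICE=bond0
      IPADDR=10.1.1.10
      NETMASK=255.255.0.0
      BOOTPROTO=none
      ONBOOT=yes

      # /etc/sysconfig/network-scripts/ifcfg-eth0  (and the same for eth1)
      DEVICE=eth0
      MASTER=bond0
      SLAVE=yes
      BOOTPROTO=none
      ONBOOT=yes

      # after "service network restart", check the aggregate link:
      cat /proc/net/bonding/bond0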
Spare part inventory
   the only truly critical parts would be spare hard drives for storage
      at least one of each (160 GB and 500 GB)

Have Aaron document what he's doing
   how he's configuring things
   what we need to do when things go wrong
   what the typical things he looks for are
   how he was accessing the tape drive

Document the cluster set-up
   what OS is on each component
   what software is on each component
   how many machines for the grid (compute nodes)
   what the primary functions of the 3U and 1U units are
   spreadsheet of MAC addresses and IP addresses
      (this data is available from Rocks after everything is installed)

----
configure Kerberos
   with Brandon:
      I imagine it wouldn't be too different from doing these two articles,
      given Rocks is built on CentOS 4.5, which itself is a free version of
      RHEL 4 Update 5.
       * http://itinfo.mit.edu/answer.php?id=7706 - Configuring Kerberos on
         a RHEL4 machine
       * http://itinfo.mit.edu/answer.php?id=7551 - Using my Kerberos
         password to login
      In order to do this, everyone must use their Kerberos username for the
      server.  I'd like to test this with another machine first to ensure it
      would work.
   (a krb5.conf sketch is appended at the end of these notes)

-------------------------------------------------------------------------------
Immediate (minimal) install task list:
--------------------------------------
   insert-ethers on other devices so Rocks doesn't try to use them as
   compute nodes
   install Rocks on the 1U system (w/o making it a compute node)
      make it an I/O node
      PVFS2
      compilers
   back-up
      http://www.bacula.org/
      http://amanda.sourceforge.net/
   create user accounts
      ssh access: Aaron suggests login via ssh keys only (keys protected
      with passphrases); an sshd_config sketch is appended at the end of
      these notes
      key files for user accounts
   copy data from Harvard
   compilers
   PVM
      make sure it's installed
   matlab
      just the basics; no need for the parallel version yet
   screen
      others can attach and watch
      sessions can be recorded
   Wiki utility
      plone
         could integrate ganglia w/ plone on mseas
         rss feed on latest results/files
   subversion (svn) - version control, newer

-------------------------------------------------------------------------------
For longer term management/software plans see
   http://modelseas.mit.edu/download/cluster/manage_notes3.txt

For the complete list of software to be installed see
   http://modelseas.mit.edu/download/cluster/cluster_software.html
   (includes other candidate software, e.g. Rocks competitors)
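----
Appended configuration sketches

   (krb5.conf sketch for the Kerberos item above.  This shows only the
   client-side pieces on a CentOS 4.5 node; the ATHENA.MIT.EDU realm and KDC
   host names below are the publicly documented MIT ones, not taken from the
   itinfo.mit.edu articles, so they should be checked against those articles
   before rolling anything out.)

      # relevant stanzas of /etc/krb5.conf
      [libdefaults]
          default_realm = ATHENA.MIT.EDU

      [realms]
          ATHENA.MIT.EDU = {
              kdc = kerberos.mit.edu:88
              kdc = kerberos-1.mit.edu:88
              kdc = kerberos-2.mit.edu:88
              admin_server = kerberos.mit.edu
          }

      [domain_realm]
          .mit.edu = ATHENA.MIT.EDU
          mit.edu = ATHENA.MIT.EDU

      # verify a ticket can be obtained before touching the PAM/login side:
      kinit username@ATHENA.MIT.EDU
      klist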
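   (sshd_config sketch for the "ssh keys only" item above.  This is one way
   to enforce key-only logins on the frontend; whether password or Kerberos
   logins should also remain enabled still needs to be decided, so treat the
   settings below as a starting point.)

      # in /etc/ssh/sshd_config on the frontend, then "service sshd restart":
      PubkeyAuthentication yes
      PasswordAuthentication no
      ChallengeResponseAuthentication no

      # each user generates a passphrase-protected key pair on their own
      # machine and installs the public half on the cluster:
      ssh-keygen -t rsa                 # choose a non-empty passphrase
      # copy the resulting id_rsa.pub to the cluster account, then:
      cat id_rsa.pub >> ~/.ssh/authorized_keys
      chmod 700 ~/.ssh ; chmod 600 ~/.ssh/authorized_keys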