Current status of cluster issues as of 3 August 2007
--------------------------------------------------------------------------------------

Needed parts
------------
* new motherboard for 1U (should be on order)
* 2 160 GB disks (arrived 2 Aug 2007)
* new power supply for compute-9-8 (should be on order)
* 3 replacement blades (should be on order)  replace misconfigured
* 2 160 GB disks (arrived 2 Aug 2007)
* 2 HP ProCurve 2848 ethernet switches (should be on order)
* 6 Opteron 250 CPUs (should be on order)  upgrade front-end
* 8 DIMMs (2 Gbyte each)

--------------------------------------------------------------------------------------
1U storage managing system   serial number: 679629
-----------------------------------------------------
(1) Unable to boot. Rocks cannot install operating system because
    front-end does not see DHCP requests from 1U unit.
    * new motherboard should be on order
(2) Missing 2 disks.
    * missing disks arrived (2 Aug 2007)

--------------------------------------------------------------------------------------
Compute 9-8   serial number 687737
-----------------------------------
(1) Series of error messages sent to front-end every few seconds.
    Filled /var partition by 29 Jul 2007.
    Messages indicate CPU or RAM issues to Aaron.
    Field Engineer (Chiu) reseated RAM. Messages have not reappeared yet.
(2) Field Engineer (Chiu) observed inoperable fan on power supply of blade.
    * new power supply should be on order
    * new blade?

--------------------------------------------------------------------------------------
Compute 6-9   serial number 687743
-----------------------------------
(1) Series of error messages sent to front-end every few minutes.
    Helped fill /var partition (29 Jul 2007).
    Messages indicated disk drive failure to Aaron.
    * replacement disks arrived (2 Aug 2007)
    * new blade?

--------------------------------------------------------------------------------------
Compute 9-10   serial number 687748
------------------------------------
(1) Blade was one of many that did not boot up due to improperly
    configured ProCurve ethernet switch. The power switch on the front
    of the blade was intermittently responsive.
    Field Engineer (Chiu) reset all ProCurve ethernet switches to their
    factory default settings. This allowed all compute nodes to be booted.
(2) On boot, only 1 CPU and 2 Gbyte RAM were seen.
    Field Engineer (Chiu) reseated CPUs and RAM. All were seen after booting.
(3) While reseating, Field Engineer (Chiu) noticed that one fan on the
    blade's power supply was frozen.
    Field Engineer (Chiu) replaced the power supply.
(4) After power supply was replaced, again only 1 CPU and 2 Gbyte RAM were seen.
    Field Engineer (Chiu) reseated CPUs and RAM. Blade failed to reboot.
    * blade sent back to Verari (1 Aug 2007)
    * replacement blade should be on order

--------------------------------------------------------------------------------------
Replaced ethernet switches

top 2848 switch   serial number SG429PN060

--------------------------------------------------------------------------------------
Blades with Opteron 248 CPUs

Compute 8-1    serial number 687761
Compute 9-8    serial number 687737
Compute 11-1   serial number 687769

--------------------------------------------------------------------------------------
Blades with insufficient memory

Compute 11-2   serial number 687778  (bad memory slot, repositioning RAM fixed)

--------------------------------------------------------------------------------------
Ethernet switches w/ insufficient ports

three 2824 switches in middle cabinet

--------------------------------------------------------------------------------------
Blades that have been worked on and seem fine now

Compute 3-12   serial number 684456
Compute 5-11   serial number 684458
Compute 6-12   serial number 687770