Cluster Software Notes

    For cluster software installation/management:
  1. Rocks (own Linux distribution based on Red Hat Enterprise Linux 4 (RHEL4); adds capabilities via "Rolls", prepackaged groups of software)
    • Provisioning
    • Monitoring
      • CPU load
      • free memory
      • disk usage
      • network I/O
      • operating system version
      • dead nodes
    • Interprocess: MPI (PVM probably via SGE)
    • Scheduler: SGE, Condor
  2. Oscar (Toolkit on top of other Linux distributions)
    • Provisioning
    • Monitoring
      • CPU load
      • free memory
      • disk usage
      • network I/O
      • operating system version
      • dead nodes
    • Interprocess: PVM, MPI
    • Scheduler: SGE, Maui, Torque
  3. SCore (installs on top of CentOS 4.3, a RHEL4 clone)
    • Provisioning
    • Monitoring
      • Job Queue
    • Interprocess: MPI, PVM (with limitations)
    • Scheduler: PBS, SCore-D
  4. OpenSCE (Toolkit on top of other distributions, works nicely with Rocks)
    • Provisioning
    • Monitoring
      • CPU Utilization and information (brand, clock rate, details)
      • Memory information and usage
      • Disk information and I/O rate
      • Network interface information and usage rate
      • Page/Swap/Context Switching rate
      • Interrupt information
      • System temperature and fan speed via ACPI or LM Sensor
      • File system usage information (per partition basis)
    • Interprocess: MPI
    • Scheduler: SQMS
  5. Perceus (geared more towards diskless clusters, but should still work with local storage; customizable)
    • Provisioning
    Warewulf (geared more towards diskless clusters, but should still work with local storage; customizable)
    • Monitoring
    • Interprocess: MPI (PVM probably via SGE)
    • Scheduler: SGE, Torque
  6. Scali (commercial toolkit on top of other Linux distributions)
    • Provisioning
    • Monitoring
      • Appears to have reboot/shutdown capabilities, possibly via PBS?
    • Interprocess: MPI
    • Scheduler: PBS
  7. Verari Command Center (commercial toolkit for Verari BladeRack2 clusters)
    • Standard
      • Monitoring
        • Blade power management (power up/down individual blades or groups of blades)
        • Blade insertion/removal
        • CPU fan speed
        • memory status
        • rack temperature
        • rack fan speed
        • power status
        • LED status/control
    • Advanced
      • Provisioning
      • Monitoring: Standard +
        • CPU utilization
        • memory usage
        • disk usage
        • network traffic
    • Enterprise
      • Provisioning
      • Monitoring: Advanced +
        • E-mail alerts for specific events
    MPI/Pro® (offered through Verari)
    ClusterController® (Verari scheduler for Microsoft® Windows)
  8. BigBrother
    • Monitoring
  9. Nagios (Open Source)
    • Monitoring
      • Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.)
      • Monitoring of host resources (processor load, disk and memory usage, running processes, log files, etc.)
      • Monitoring of environmental factors such as temperature (requires hardware sensors?)
      • Contact notifications when service or host problems occur and get resolved (via email, pager, or other user-defined method)
  10. Halcyon: "PrimeAlert Adapter for Netcool provides the capability to integrate Sun Management Center (Sun MC) alarms into Netcool." Does this restrict Halcyon to Sun systems?
    • Monitoring
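
As a rough illustration of the host-resource monitoring listed under Nagios above, here is a minimal Python sketch of a disk-usage check. It follows the standard Nagios plugin convention of reporting status as 0 (OK), 1 (WARNING) or 2 (CRITICAL). The function names and the 80%/90% thresholds are assumptions chosen for the example; they do not come from any of the tools in these notes.

```python
import shutil

# Nagios plugin status codes (3 = UNKNOWN is not used in this sketch).
OK, WARNING, CRITICAL = 0, 1, 2


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage onto a Nagios-style status code.

    The default thresholds are illustrative assumptions.
    """
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (status, message) for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    percent = 100.0 * usage.used / usage.total
    status = classify(percent, warn, crit)
    label = ("OK", "WARNING", "CRITICAL")[status]
    return status, f"DISK {label} - {percent:.1f}% used on {path}"
```

A real plugin would print the message and exit with the status code, for example by passing `status` to `sys.exit`, so that the monitoring server can read the result.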
    Rolling Backups
  1. Easy Automated Snapshot-Style Backups with Linux: uses the rsync facility of Linux. Includes links to many other variants.
  2. Bacula: tape backup freeware
  3. Amanda: tape backup freeware
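
The rsync snapshot approach in item 1 can be sketched as follows. This is a minimal Python sketch of the general idea, not the exact script from the linked article: older snapshots are shifted up by one slot, then rsync fills a fresh `snapshot.0`, hard-linking unchanged files against `snapshot.1` via rsync's `--link-dest` option. The directory naming (`snapshot.N`) and the helper names are assumptions, and paths are assumed to be absolute.

```python
from pathlib import Path


def rotation_plan(root, keep):
    """Return (old, new) renames that age each snapshot by one slot.

    The oldest remaining snapshot is renamed first so nothing is
    overwritten. snapshot.(keep-1) must be deleted by the caller
    before these renames are applied.
    """
    root = Path(root)
    return [(root / f"snapshot.{i}", root / f"snapshot.{i + 1}")
            for i in range(keep - 2, -1, -1)]


def rsync_command(source, root):
    """Build the rsync call that creates the new snapshot.0.

    Files unchanged since snapshot.1 become hard links, so each
    snapshot looks complete but only changed files use new space.
    """
    root = Path(root)
    src = str(source).rstrip("/")
    return [
        "rsync", "-a", "--delete",
        f"--link-dest={root / 'snapshot.1'}",
        f"{src}/",                  # trailing slash: copy contents only
        f"{root / 'snapshot.0'}/",
    ]
```

A driver script would remove the oldest snapshot, apply each rename from `rotation_plan`, and then run the command from `rsync_command` (for example with `subprocess.run`), typically from cron.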
    Fast communications over Gigabit Ethernet
  1. GAMMA and MPI/GAMMA: would have required a second LAN for non-GAMMA IP traffic (e.g. NFS)
  2. Parastation: the Opteron version seems to be a beta version.
    • Interprocess: MPI
    • Scheduler: LSF, PBS-Pro, OpenPBS (possibly SGE)
  3. SCore (see above)
  4. Scali MPI (commercial; see above)
    Parallel Filesystems
  1. PVFS2: open source parallel filesystem; does not require kernel modifications
  2. GPFS: IBM product. Appears to be for IBM servers only.
  3. Lustre: very high performance parallel filesystem; requires extensive kernel modifications
  4. SFS: HP's commercial version of Lustre
    Fortran compilers for Linux
  1. A set of compiler comparisons from Polyhedron
    • Opteron benchmark, notable results:
      1. Pathscale EKO Fortran Compiler 2.4
      2. Absoft Pro Fortran 10.0.3 GA
      3. The Portland Group Compiler 6.2-4 (PGI)
      4. Intel Fortran Compiler
        • A couple of very bad results: "Capacita" and "Fatigue"
        • One warning against using the Intel Fortran Compiler on non-Intel chips
    • Intel benchmark, notable results:
      1. Intel Fortran Compiler
      2. Pathscale EKO Fortran Compiler 2.4
      3. Absoft Pro Fortran 10.0.3 GA
      4. The Portland Group Compiler 6.2-4 (PGI)
    • Supported language extensions
    • Diagnostic Capabilities
  2. Himeno Benchmark
    • The Portland and Fujitsu compilers benefit more from tuning compiler flags than the Intel compiler does.
  3. Opteron Benchmark compiled by DisCO
  4. A list of FORTRAN compilers for Linux, including some more benchmarks.
    Software Development Kits
  1. Absoft High Performance Computing Software Development Kits (HPC SDK)
    • Includes Absoft, PathScale, Intel or IBM compilers (F77/90/95, C/C++)
    • Fx2 debugger
    • math libraries
    • MPI distributions
    • tracing tools
  2. Allinea DDT: the Distributed Debugging Tool
  3. MultiCore Plus SDK: no FORTRAN support. May be limited to Mercury systems.
  4. Intel Software Development Products: not a single package but a page listing many Intel products. See also the original Intel page.
    • Intel compilers (F77/90/95, C/C++)
    • Intel Vtune analyzers
    • Intel performance libraries
    • Intel threading analysis tools
    • Intel cluster tools
  5. gdb
    Cluster Set-up
  1. ROCKS+support: support Rolls plus 3rd-party Rolls. Stumbled onto this through one of the older (4.2.1) Rocks Rolls.
    Matlab
  1. The Mathworks Announces Breakthrough Parallel Algorithm Development
    Security
  1. firewall?

Cluster Hardware Notes

    DDR2 memory
  1. DDR vs. DDR2 - What it means to you.
  2. Introduction to DDR-2: The DDR Memory Replacement
  3. DDR2: a Soon-to-be DDR Replacement. Theoretical Basis and First Low-level Test Results

ProCurve Switch 2800 Series specifications