
10. System Monitoring Software

10.1 bWatch

bWatch is a GUI Beowulf Cluster Monitor. It displays load averages, memory, swap, number of processes and users for all nodes in a single window. bWatch is available from http://www.sci.usq.edu.au/staff/jacek/bWatch.

NOTE: The bwatch.rpm shipped with the S.u.S.E Linux distribution installs in /usr/X11R6/bin and assumes that the wish interpreter is also installed under /usr/X11R6/bin. Red Hat Linux installs wish under /usr/bin, so bWatch will not run there as shipped. To fix this, edit the first line of /usr/X11R6/bin/bWatch and change it from #!/usr/X11R6/bin/wish to #!/usr/bin/wish.
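If the same fix has to be applied on several machines, the one-line edit above can be scripted. The following is only a minimal sketch: the function name fix_shebang is hypothetical, and the default paths are simply the two interpreter lines quoted in the note.

```python
def fix_shebang(path, old="#!/usr/X11R6/bin/wish", new="#!/usr/bin/wish"):
    """Rewrite the interpreter line of a script if it matches `old`.

    Returns True if the file was changed, False if it was left alone.
    """
    with open(path, "r") as f:
        lines = f.readlines()
    # Only touch the file when the first line is exactly the old interpreter.
    if not lines or lines[0].strip() != old:
        return False
    lines[0] = new + "\n"
    with open(path, "w") as f:
        f.writelines(lines)
    return True
```

Calling fix_shebang("/usr/X11R6/bin/bWatch") as root would apply the change described in the note; calling it a second time does nothing, because the first line no longer matches.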

10.2 Using httpd and CGI scripts

One way to obtain statistics from your Beowulf cluster is through httpd running on your server node together with a CGI script. The CGI script executes remote shell commands on the node you are querying and formats the retrieved information into an HTML page, which the httpd server sends to your browser. This makes it easy to check system performance from anywhere in the world, as long as you have a browser and an Internet connection. There is an example index.html file at ftp://ftp.sci.usq.edu.au/pub/jacek/beowulf-utils which calls the CGI script getinfo.cgi.
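The idea can be sketched in a few lines. This is not the getinfo.cgi script from the FTP site, only an illustration of the same approach written in Python. The node names, the use of rsh with the uptime command, and the function names query_node and render_page are all assumptions made for the example.

```python
import html
import subprocess

NODES = ["node1", "node2"]  # hypothetical node names; replace with your own


def query_node(node, command=("uptime",), remote_shell="rsh", timeout=10):
    """Run `command` on `node` through a remote shell and return its output."""
    try:
        result = subprocess.run(
            [remote_shell, node, *command],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Remote shell missing, node down, or command took too long.
        return "unreachable: %s" % exc
    return result.stdout.strip() or result.stderr.strip()


def render_page(nodes, query=query_node):
    """Build a CGI response: header line, blank line, then an HTML table."""
    rows = "\n".join(
        "<tr><td>%s</td><td>%s</td></tr>"
        % (html.escape(node), html.escape(query(node)))
        for node in nodes
    )
    body = (
        "<html><head><title>Cluster status</title></head><body>\n"
        "<table border=\"1\">\n"
        "<tr><th>Node</th><th>uptime</th></tr>\n"
        "%s\n"
        "</table>\n</body></html>" % rows
    )
    return "Content-Type: text/html\n\n" + body
```

A CGI script built on this sketch would end with print(render_page(NODES)). The query argument is there so that the page layout can be checked without a working remote shell.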

10.3 Netpipe

NetPIPE is a very good network performance testing tool that lets you check the throughput of TCP, MPI, and PVM for different packet sizes. You can use gnuplot or a spreadsheet to plot the results it produces. You can find NetPIPE at http://www.scl.ameslab.gov/Projects/ClusterCookbook/nprun.html
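Before plotting, it can be handy to pull the interesting numbers out of the results file. The sketch below assumes the output is plain text with whitespace-separated numeric columns, with the message size in the first column and the throughput in the second. That layout is an assumption and differs between NetPIPE versions, so check your own output file and adjust size_col and rate_col. The function names are hypothetical.

```python
def read_netpipe(lines, size_col=0, rate_col=1):
    """Parse whitespace-separated rows into (size, rate) pairs.

    `size_col` and `rate_col` are assumed column positions; adjust them
    to match the output of your NetPIPE version.
    """
    points = []
    for line in lines:
        fields = line.split()
        if len(fields) <= max(size_col, rate_col):
            continue  # skip blank or short lines
        try:
            points.append((float(fields[size_col]), float(fields[rate_col])))
        except ValueError:
            continue  # skip headers or other non-numeric lines
    return points


def peak(points):
    """Return the (size, rate) pair with the highest throughput."""
    return max(points, key=lambda p: p[1])
```

With a results file open as f, peak(read_netpipe(f)) gives the message size at which throughput was highest.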

10.4 Network performance: netperf

Source: http://www.netperf.org/netperf/NetperfPage.html

Run Script:


./netperf -t UDP_STREAM -p 12865 -n 2 -l 60 -H NODE -- -s 65535 -m 1472
./netperf -t TCP_STREAM -p 12865 -n 2 -l 60 -H NODE

NODE is the remote node name.
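To run the same two tests against every node in turn, the command lines above can be generated from a list of node names. This is a minimal sketch: the function names and the node list are hypothetical, and the netperf options are copied unchanged from the run script.

```python
import subprocess


def netperf_commands(node, netperf="./netperf", port=12865, seconds=60):
    """Return the UDP and TCP netperf command lines for one remote node."""
    base = ["-p", str(port), "-n", "2", "-l", str(seconds), "-H", node]
    return [
        # UDP stream test, with the test-specific options after "--".
        [netperf, "-t", "UDP_STREAM"] + base + ["--", "-s", "65535", "-m", "1472"],
        # TCP stream test with default test-specific options.
        [netperf, "-t", "TCP_STREAM"] + base,
    ]


def run_all(nodes, runner=subprocess.run):
    """Run both tests against each node, one after the other."""
    for node in nodes:
        for argv in netperf_commands(node):
            runner(argv)
```

For example, run_all(["node1", "node2"]) would run four tests in sequence, assuming those node names resolve and each node is already running the netperf server.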

10.5 Parallel Performance: NAS Parallel Benchmarks

Source: http://www.nas.nasa.gov/NAS/NPB/

10.6 CMS

There is a package called CMS (Cluster Management System). It is available from http://smile.cpe.ku.ac.th/software/scms/index.html. The current version is new, and we have not yet had time to test it. The previous version worked well except for the remote (real-time) monitoring. It includes a system reboot and shutdown feature.

