DataGrid Wide Area Network Monitoring Infrastructure
(DWMI)
Connie Logg
February 13-17, 2005

History
Originally developed for the SC2001 demo and called IEPM-BW
After SC2001, development continued
FNAL picked up IEPM-BW and adapted it to their site
In Spring 2004 – redesigned for TeraPaths monitoring project
Currently still called IEPM-BW, and deployed at 4 sites

Architecture - I
Use MySQL database
All configuration is in the database so the code can self configure
Allows flexibility for adding new types of data
Written in Perl
Low impact probes (currently abwed, traced, and pingd) have daemons that run independently
High impact probes have a daemon (bw-synchd) which ensures that high impact probes do not run simultaneously and that there is a break between tests.
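
The serialization rule for high impact probes can be illustrated with a minimal sketch. This is not the bw-synchd code (which is written in Perl); the function name `run_serially`, the `pause_seconds` parameter, and the injectable `sleep` argument are assumptions made only for illustration.

```python
import time
from typing import Callable, List


def run_serially(probes: List[Callable[[], str]],
                 pause_seconds: float,
                 sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Run each probe to completion, one at a time, pausing between tests.

    `probes` is a list of zero-argument callables (hypothetical stand-ins
    for high impact tests such as an iperf run).
    """
    results = []
    for index, probe in enumerate(probes):
        if index > 0:
            sleep(pause_seconds)   # the break between consecutive tests
        results.append(probe())    # blocks until this probe has finished
    return results
```

Because each probe is called only after the previous one has returned, two high impact tests can never overlap; the pause is skipped before the first test only.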

Architecture - II
Results from all probes are written to a data directory and loaded by the load-datad daemon, which ensures that the database is not bombarded by hundreds of simultaneous writes.
Analysis scripts run every hour or two depending upon how long they take
Plot data, traceroute reports, master web page generation
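
The single-loader pattern described above might look roughly like the sketch below. It is an illustration only: SQLite stands in for MySQL, and the `results` table, its columns, the `.dat` suffix, and the comma-separated line format are all assumptions, not the actual load-datad design.

```python
import os
import sqlite3


def create_results_table(conn: sqlite3.Connection) -> None:
    # Hypothetical table layout, used only for this sketch.
    conn.execute(
        "CREATE TABLE results (ts INTEGER, node TEXT, value REAL)"
    )


def load_pending_results(conn: sqlite3.Connection, data_dir: str) -> int:
    """Load every *.dat file in data_dir through one connection.

    Each file is inserted in a single transaction and removed afterwards,
    so probes never write to the database directly.
    Returns the number of rows loaded.
    """
    loaded = 0
    for name in sorted(os.listdir(data_dir)):
        if not name.endswith(".dat"):
            continue
        path = os.path.join(data_dir, name)
        with open(path) as handle:
            # Assumed line format: "timestamp,node,value"
            rows = [line.strip().split(",") for line in handle if line.strip()]
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO results (ts, node, value) VALUES (?, ?, ?)", rows
            )
        os.remove(path)
        loaded += len(rows)
    return loaded
```

Probes only ever append files to the directory; the loader is the sole database writer, which is what keeps the number of concurrent writes at one.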

MySQL Database Tables - I
NODES – Each node has an entry and its specs (latitude, longitude, contact, paths, etc.)
MONHOST – Active monitoring host(s) information (web/cgi paths, data analysis specs, etc.)
TOOLSPECS – Probe specifications (probe, probe options, frequency, testtype, etc.)

MySQL Database Tables - II
Many types of tests possible
 background – low impact tests which can run concurrently (traceroute, ping, abwe)
 background-syn – Tests which must be run one at a time (iperf)
 on-demand – to be implemented

MySQL Database Tables - III
SCHEDULE
 scheduler inserts probe requests into the SCHEDULE
Daemons read SCHEDULE table for the probes they are responsible for within the “current” timeframe, and run the probes.
All results are written to a data directory and loaded by the data loading daemon
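
The SCHEDULE flow above can be sketched as follows. This is a hedged illustration, not the real schema: SQLite stands in for MySQL, and the column names (`probe`, `node`, `run_at`, `done`) and the `window_seconds` notion of the "current" timeframe are assumptions.

```python
import sqlite3


def create_schedule(conn: sqlite3.Connection) -> None:
    # Hypothetical columns; the real SCHEDULE table is not shown in the slides.
    conn.execute(
        "CREATE TABLE schedule ("
        "id INTEGER PRIMARY KEY, probe TEXT, node TEXT, "
        "run_at INTEGER, done INTEGER DEFAULT 0)"
    )


def schedule_probe(conn, probe: str, node: str, run_at: int) -> None:
    """Scheduler side: insert one probe request."""
    with conn:
        conn.execute(
            "INSERT INTO schedule (probe, node, run_at) VALUES (?, ?, ?)",
            (probe, node, run_at),
        )


def due_requests(conn, probe: str, now: int, window_seconds: int):
    """Daemon side: pending requests for one probe type whose run time
    falls inside the current timeframe [now, now + window_seconds)."""
    cursor = conn.execute(
        "SELECT id, node, run_at FROM schedule "
        "WHERE probe = ? AND done = 0 AND run_at >= ? AND run_at < ? "
        "ORDER BY run_at",
        (probe, now, now + window_seconds),
    )
    return cursor.fetchall()


def mark_done(conn, request_id: int) -> None:
    """Daemon side: record that a request has been run."""
    with conn:
        conn.execute("UPDATE schedule SET done = 1 WHERE id = ?", (request_id,))
```

Each daemon filters on its own probe name, so the ping daemon never picks up an iperf request even though both live in the same table.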

APIs and other utilities
Fetch-ping-data
Fetch-abwe-data
Fetch-trace-data
Fetch-bw-data (e.g., iperf)
Etc.
All take a nodename and timespan and return a filename where the data is stored
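
The shared calling convention (node name and timespan in, file name out) could be sketched like this. The function `fetch_data`, the in-memory `records` list, and the CSV output are assumptions for illustration; the actual Fetch-* utilities read from the database and their output format is not described here.

```python
import csv
import tempfile
from typing import Iterable, Tuple

Record = Tuple[int, str, float]  # (timestamp, nodename, value), assumed layout


def fetch_data(records: Iterable[Record], nodename: str,
               start: int, end: int) -> str:
    """Write the rows for `nodename` with start <= timestamp < end
    to a temporary CSV file and return that file's name."""
    selected = [r for r in records if r[1] == nodename and start <= r[0] < end]
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", delete=False, newline=""
    ) as out:
        csv.writer(out).writerows(selected)
        return out.name
```

Returning a file name rather than the data itself lets plotting and analysis tools that expect a file path consume the result without further conversion.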

Data Analysis
Time series plots – group and individual
Diurnal analysis & fitting
Traceroute analysis
Bandwidth Change Analysis – will be augmented by other methods currently being researched and developed

CGI Utilities – in development
Add and update NODES
Add and update TOOLSPECS
Add and update MONHOST
Interactive data analysis

Informational Web Pages
Table of defined NODES
Table of defined MONHOST
Table of TOOLSPECS – probe specifications
Description of database tables
Report on data logging for past few weeks
PLM – needs updating
Others to come – every time I have to look at something for validation, I create a web page

Futures
Make data available via web services
Interactive data analysis CGIs
Add additional probe types
Develop complete distribution kit – complicated by differing locations and versions of perl, gnuplot, mysql, graphics libs, ploticus, iperf, etc.
Add additional anomaly detection techniques

Summary
The objective is to provide for regular and reliable network probe testing and data collection from several locations around the world
Make the data available to the community
Provide a framework for the incorporation of a variety of analysis tools

Acknowledgements
Many people have contributed content to this system over the years

Questions & Considerations
BWCTL –
Not installed everywhere, and it is one more thing I would need to install and maintain as part of the distribution kit
Does not do multiple iperf streams
May want other heavyweight tests that BWCTL does not provide for
OWAMP – special NTP configuration