© Copyright 2004, NIIT and SLAC.

MAGGIE

Measurement and Analysis for the Global Grid and Internet End-to-end Performance

Abstract

Introduction

Developing NIMI

Publishing to Grid Middleware

Characterizing the Traffic Mix

Analysis, Reporting and Trouble-Shooting

Security Considerations

Man Power

Support

Bibliography

Abstract

We propose to integrate numerous network and application performance monitoring tools into a scalable and secure infrastructure providing measurements, analysis and access to data. We propose to incorporate and extend the existing IEPM-PingER and IEPM-BW infrastructures, and extend the NIMI configuration and control software. We will also incorporate tools whose development is already being funded by the DoE/MICS SciDAC initiative.

1. Introduction

Scientific research has greater computing and networking power than ever before, but the scientific community recognizes that these increases in power also bring major challenges. The deployment of Grid technologies and a production-quality Grid Service requires detailed information on network performance. What is needed is a ubiquitous monitoring infrastructure that not only provides the measurements seen in multiple monitoring projects but also allows us to co-ordinate and integrate tools in a co-operative framework.

The monitoring project described here is distinguished from other monitoring work in the following ways:

The ability to allow authorized users to run tests on demand. This includes the ability to specify who can run which tests, and even how much bandwidth a test may utilize.

Access to a large number and range of network paths by leveraging the current PingER, IEPM-BW and NIMI deployments.

Project members who are leaders in the GGF, allowing this work to become standardized for the Grid.

Integration of a measurement infrastructure with application adaptation.

The proposal also complements other projects. For example, the Performance Evaluation Station (PIPES) [PIPES] project of the Internet2 End-to-End Performance Initiative (e2epi) is designed to aid end-to-end trouble-shooting for Internet2-connected universities. By working with the PIPES group to implement the MAGGIE publishing scheme, we expect to gain greater insight into the networks used by the DoE science community.

2. Developing NIMI

The following enhancements will be made to the NIMI system and infrastructure.
Support for fundamentally new types of measurement:

Continuous monitoring, in which, rather than scheduling measurements for execution at a particular future time, measurements are conducted in an on-going fashion without requiring external interaction, and the results are streamed to a common data collection point.
Spot measurement, which supports immediate measurement execution and streamlined delivery of results, for use in trouble-shooting.
Adaptive measurement, in which a series of measurements can be scripted by the user, with the execution of the later measurements guided by the immediate results and analysis of the earlier ones, without requiring external coordination or interaction.
Complete development of the "packet injection daemon", which currently exists as a partially implemented prototype, to serve as a common basis for providing customized packet creation in support of a wide variety of measurements.
Design and development of resource control and security policy enforcement components for the packet injection daemon, so it can serve as a single point of policing for all measurements conducted using the NIMI probe.
Modification of tools in the NIMI measurement suite to use the packet injection daemon. Assistance to developers, in particular those being funded by the SciDAC [SciDAC] initiative, in integrating their tools into the NIMI framework.

Porting the full NIMI system, including the measurement tool suite, to Linux.

On-going operation of the Central Point of Contact (CPOC) NIMI administrative center, used for defining and exporting NIMI probe configurations, NIMI user authentication and authorization, and NIMI infrastructure self-monitoring.
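The resource control and policy enforcement proposed for the packet injection daemon could be sketched roughly as follows. This is a minimal illustration only, not the NIMI design: the names `Policy`, `TokenBucket`, and `Policer` are hypothetical, and a token bucket is simply one common way to cap the bandwidth a measurement may use.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-user policy: permitted tools and a bandwidth cap."""
    allowed_tools: frozenset
    max_rate_bps: float   # sustained injection rate, in bits per second
    burst_bytes: float    # largest burst tolerated at once, in bytes


@dataclass
class TokenBucket:
    rate_bps: float
    capacity_bytes: float
    tokens: float = field(init=False)
    last: float = 0.0

    def __post_init__(self):
        # Start with a full bucket so an initial burst is allowed.
        self.tokens = self.capacity_bytes

    def allow(self, size_bytes, now):
        # Refill for the elapsed time, capped at the bucket capacity.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity_bytes,
                          self.tokens + elapsed * self.rate_bps / 8)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


class Policer:
    """Single point of policing: checks who may run what, and how fast."""

    def __init__(self, policies):
        self.policies = policies
        self.buckets = {user: TokenBucket(p.max_rate_bps, p.burst_bytes)
                        for user, p in policies.items()}

    def admit(self, user, tool, size_bytes, now):
        policy = self.policies.get(user)
        if policy is None or tool not in policy.allowed_tools:
            return False  # unknown user, or tool not authorized for this user
        return self.buckets[user].allow(size_bytes, now)
```

In this sketch every measurement tool would call `Policer.admit` before injecting a packet, so authorization and bandwidth limits are enforced in one place rather than in each tool.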


3. Publishing to Grid Middleware

Network monitoring data is particularly important to Grid middleware [Globus] such as the Replica Manager. Selecting the best source to copy the data from requires a prediction of future end-to-end path characteristics between the destination and each possible source. Accurate predictions of the performance obtainable from each source require measurements of available bandwidth (both end-to-end and hop-by-hop), latency, loss, and other characteristics important to file transfer performance.
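As an illustration of how a replica manager might use such predictions, the sketch below picks the source with the lowest estimated transfer time. It assumes a deliberately simple model (one round trip plus size divided by available bandwidth); the names `PathEstimate`, `predicted_transfer_time`, and `best_source` are hypothetical and not part of any Grid middleware API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathEstimate:
    """Hypothetical summary of measurements from one source to the destination."""
    source: str
    available_bps: float  # predicted available bandwidth, bits per second
    rtt_s: float          # predicted round-trip time, seconds


def predicted_transfer_time(path, size_bytes):
    # Assumed model: one round trip of start-up cost plus serialization time.
    return path.rtt_s + size_bytes * 8 / path.available_bps


def best_source(paths, size_bytes):
    # Choose the replica whose predicted transfer completes soonest.
    return min(paths, key=lambda p: predicted_transfer_time(p, size_bytes))
```

Even under this crude model the best source depends on file size: a nearby low-bandwidth replica can win for small files, while a distant high-bandwidth replica wins for large ones.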

The following tasks are required to make this network monitoring information useful to Grid applications and middleware.

Perform a classification of various types of network measurement characteristics, and determine which measurement tools provide each characteristic
Determine "standard" names for the network characteristics that are of use to Grid applications

Work with Grid application and middleware developers to determine their requirements for network measurement data.
Define new, higher-level "derived characteristics" that make it easier for Grid middleware to determine what to do. For example, the "closeness" of two Grid resources, such as storage and compute cycles, is a combination of delay, bandwidth, and loss.
Define standard schemas to describe network measurement data, and define standard publication and archival mechanisms (e.g., SOAP [SOAP], WSDL [WSDL]).
Many of these tasks will be done working closely with the Global Grid Forum, and we will work to standardize this work to enable interoperability between all Grid projects.
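To make the idea of a derived characteristic concrete, the sketch below combines delay, bandwidth, and loss into a single "closeness" score between 0 and 1. The formula and the reference constants are assumptions chosen purely for illustration; the actual definition would come out of the classification and standardization work listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathMetrics:
    rtt_ms: float          # round-trip delay, milliseconds
    available_mbps: float  # available bandwidth, megabits per second
    loss_rate: float       # fraction of packets lost, 0.0 to 1.0


# Hypothetical reference points at which a component score equals 0.5.
REF_RTT_MS = 100.0
REF_MBPS = 100.0


def closeness(m):
    """Assumed scoring: the product of three component scores, each in [0, 1]."""
    delay_score = REF_RTT_MS / (REF_RTT_MS + m.rtt_ms)
    bandwidth_score = m.available_mbps / (REF_MBPS + m.available_mbps)
    loss_score = 1.0 - m.loss_rate
    return delay_score * bandwidth_score * loss_score
```

A product is used here so that a poor value in any one characteristic pulls the overall score down; a weighted sum would be an equally plausible choice.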

After defining what to measure, how to represent the data, and how to publish the data, the following tasks are required:

Develop a GMA-based [GMA] Web Service to publish network measurement data

Integrate this web service with both caches of recently collected data, and with archives of historical data

Integrate this web service with NIMI

Develop and/or integrate a distributed peer-to-peer query mechanism to locate monitoring data from multiple sources or archives.
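The relationship between a cache of recent data and an archive of historical data can be sketched with an in-memory stand-in. This is not a web service and does no distributed querying; the class `MeasurementStore`, its methods, and the characteristic name used in the test are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    source: str
    destination: str
    characteristic: str  # placeholder name, e.g. "delay.roundTrip"
    timestamp: float     # seconds since some epoch
    value: float


class MeasurementStore:
    """In-memory stand-in for a publisher backed by a cache and an archive."""

    def __init__(self, cache_ttl_s):
        self.cache_ttl_s = cache_ttl_s
        self._cache = {}    # (source, destination, characteristic) -> newest
        self._archive = []  # every published measurement

    def publish(self, m):
        self._archive.append(m)
        key = (m.source, m.destination, m.characteristic)
        current = self._cache.get(key)
        if current is None or m.timestamp >= current.timestamp:
            self._cache[key] = m

    def latest(self, source, destination, characteristic, now):
        # Serve from the cache only while the newest result is still fresh.
        m = self._cache.get((source, destination, characteristic))
        if m is not None and now - m.timestamp <= self.cache_ttl_s:
            return m
        return None

    def history(self, source, destination, characteristic, start, end):
        # Historical queries go to the archive, in time order.
        key = (source, destination, characteristic)
        hits = [m for m in self._archive
                if (m.source, m.destination, m.characteristic) == key
                and start <= m.timestamp < end]
        return sorted(hits, key=lambda m: m.timestamp)
```

When `latest` returns nothing, a caller could either fall back to `history` or request a fresh measurement on demand.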

4. Characterizing the Traffic Mix

If demand arises, we will evaluate the use of passive monitoring techniques within the MAGGIE framework. The participants in this project are experienced with various tools, including
Netflow. SLAC has developed a package to analyze data gathered by the Cisco netflow feature.
SNCM. The Self-configuring Network Monitor [SNCM] developed at LBNL is a project addressing the need for a network monitoring infrastructure to support passive network monitoring.
Bro. The Bro package [Bro], also developed at LBNL, is probably best known as intrusion detection software, but is also capable of identifying network performance characteristics.

5. Analysis, Reporting and Trouble-shooting

Monitoring is performed so that measurements can assist with engineering, Grid middleware, trouble-shooting, setting end-user expectations, and a myriad of other tasks. We will undertake the following work to contribute in those areas.
Currently, the measurement results produced via NIMI are packaged and shipped to a predetermined repository referred to as the Data Analysis Client (DAC). Post-processing of the results, when performed, is done outside of the NIMI architecture. On the other hand, post-processing analysis and reporting is an integral part of the IEPM-BW system. We will extend the IEPM-BW analysis and reporting to the NIMI DAC data.

We will compare the various active measurement tools that have been ported to the NIMI environment with each other to identify their regions of applicability, their expected accuracy, the time needed to make a measurement, and the resources required. We will feed this information back to the developers.

We will research and develop conversions between different types of network measurements, that is, predicting the outcome of a data transfer from a low-impact test.
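One well-known example of such a conversion, shown here only as an illustration and not as the method this project would adopt, is the Mathis et al. model, which bounds steady-state TCP throughput using round-trip time and loss rate, both obtainable from low-impact ping-style tests. The function name and the default segment size are assumptions.

```python
import math


def mathis_throughput_bps(rtt_s, loss_rate, mss_bytes=1460):
    """Approximate upper bound on TCP throughput, in bits per second.

    Uses the Mathis et al. model: rate <= (MSS / RTT) * C / sqrt(p),
    with C = sqrt(3/2). It needs a non-zero loss rate and ignores
    receiver-window limits and timeouts.
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)
```

For instance, a 100 ms round-trip time with 0.01% loss and 1460-byte segments gives a bound of roughly 14 Mbit/s under this model, and halving the round-trip time doubles it.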

Monitor, analyze and report on the performance of networks and grids relating to the SciDAC community and major scientific collaborators.

We will develop a web site to explain the project, report the results, and allow navigation around the large volume of results. The reports will allow user selection of metric (e.g. delay, loss), time scale (separation between measurements and time window), and paths (e.g. grouped by affinities such as collaboration).
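The kind of selection such reports would offer can be sketched as a small aggregation routine. The record layout (dictionaries with `time`, `path`, `metric`, and `value` keys), the function name `summarize`, and the choice of the median as the summary statistic are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def summarize(samples, metric, start, end, bin_s, group_of):
    """Median of one metric per (group, time bin) within [start, end).

    samples  -- iterable of dicts with keys: time, path, metric, value
    bin_s    -- width of each time bin, in seconds
    group_of -- function mapping a path to its affinity group
    """
    buckets = defaultdict(list)
    for s in samples:
        if s["metric"] != metric or not (start <= s["time"] < end):
            continue
        # Align each sample to the start of the time bin it falls in.
        bin_start = start + int((s["time"] - start) // bin_s) * bin_s
        buckets[(group_of(s["path"]), bin_start)].append(s["value"])
    return {key: median(values) for key, values in sorted(buckets.items())}
```

Here `group_of` stands in for an affinity lookup, such as mapping each monitored path to the collaboration it serves.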

Visualization of large amounts of data in a meaningful form has always been a challenge to researchers conducting network monitoring. We will work actively with developers in this field to make results compatible with their tools. In particular we will work with CAIDA [CAIDA] to evaluate the applicability of the tools they have developed.

We will work with CAIDA to integrate the data into their Trends project [trends] and leverage the data from other projects in the MAGGIE analysis mechanism.

6. Security Considerations

Security is a primary concern of the NIMI architecture. The Access, Authorization and Authentication aspects of NIMI are already being funded and are not part of this proposal. However, the development of the packet daemon for policy enforcement and policing is part of this proposal and provides extra security. Security in terms of publishing the data is also a serious consideration. The developers will work with the Global Grid Forum to implement standards and enable control of the data.


7. Man-power

We estimate the following full-time equivalent (FTE) effort to complete the tasks described in this document:

NIMI Development (section 2) – 0.75 FTE

CPOC maintenance (section 2) – 0.25 FTE

Publishing to Grid Middleware (section 3) – 1.00 FTE

Troubleshooting (section 5) – 1.00 FTE

Total is 3 FTE.

8. Support

Expression of interest from kc claffy for CAIDA

Deploying and testing network performance tools at high performance ESnet and ESnet collaborator sites will greatly assist in understanding and validating the scaling properties of the proposed tools/algorithms.

CAIDA supports this proposal and would like to collaborate with the MAGGIE team to assist in deploying, integrating and configuring the tools in their measurement probes. CAIDA will also assist in adapting their existing visualization tools.

9. Bibliography

[PingER] http://www-iepm.slac.stanford.edu

[IEPM-BW] http://www-iepm.slac.stanford.edu/bw

[NIMI] http://www.ncne.nlanr.net/nimi

[PIPES] http://e2epi.internet2.edu

[SciDAC] http://www.osti.gov/scidac

[Grid] http://www.gridforum.org

[SOAP] http://www.w3.org/TR/SOAP

[WSDL] http://www.w3.org/TR/wsdl

[GMA] http://www-didc.lbl.gov/GGF-PERF/GMA-WG/

[SNCM] http://www-itg.lbl.gov/Net-Mon/Self-Config.html

[Bro] http://www-nrg.ee.lbl.gov/bro-info.html

[CAIDA] http://www.caida.org

[trends] http://www.caida.org/projects/trends/