We propose to integrate numerous network and application performance monitoring tools into a scalable and secure infrastructure providing measurements, analysis and access to data. We propose to incorporate and extend the existing IEPM-PingER and IEPM-BW infrastructures, and extend the NIMI configuration and control software. We will also incorporate tools whose development is already being funded by the DoE/MICS SciDAC initiative.
Scientific research has more computing and networking power available than ever before, but the scientific community recognizes that these increases also bring major challenges. Deploying Grid technologies and a production-quality Grid service requires detailed information on network performance. It is widely believed that what is needed is a ubiquitous monitoring infrastructure that not only provides the measurements seen in multiple monitoring projects but also, as a novel addition, allows us to co-ordinate and integrate tools in a co-operative framework.
The monitoring project described here is distinguished from other monitoring work in the following ways:
The ability to allow authorized users to run tests on demand, including the ability to specify who can run which tests and even how much bandwidth a test may use.
Access to a large number and range of network paths by leveraging the current PingER, IEPM-BW and NIMI deployments.
Project members who are leaders in the Global Grid Forum (GGF), allowing this work to become standardized for the Grid.
Integration of a measurement infrastructure with application adaptation.
The proposal also complements other projects. For example, the Performance Evaluation Station (PIPES) [PIPES] project of the Internet2 End-to-End Performance Initiative (e2epi) has been designed to aid end-to-end trouble-shooting for Internet2-connected universities. By working with the PIPES group to implement the MAGGIE publishing scheme, we expect to gain greater insight into the networks used by the DoE science community.
The following enhancements will be made to the NIMI system and infrastructure:
Support for fundamentally new types of measurement:
On-going measurement, in which, rather than being scheduled for execution at a particular future time, measurements are conducted continuously without requiring external interaction, and the results are streamed to a common data collection point.
Spot measurement, which supports immediate measurement execution and streamlined delivery of results, for use in trouble-shooting.
Adaptive measurement, in which a series of measurements can be scripted by the user, with the execution of the later measurements guided by the immediate results and analysis of the earlier ones, without requiring external coordination or interaction.
Complete development of the "packet injection daemon", which currently exists as a partially implemented prototype, to serve as a common basis for customized packet creation in support of a wide variety of measurements.
Design and development of resource control and security policy enforcement components for the packet injection daemon, so it can serve as a single point of policing for all measurements conducted using the NIMI probe.
Modification of tools in the NIMI measurement suite to use the packet injection daemon, and assistance to developers, in particular those funded by the SciDAC [SciDAC] initiative, in integrating their tools into the NIMI framework.
Porting the full NIMI system, including measurement tools suite, to Linux.
On-going operation of the Central Point of Contact (CPOC) NIMI administrative center, used for defining and exporting NIMI probe configurations, NIMI user authentication and authorization, and NIMI infrastructure self-monitoring.
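The policing role planned for the packet injection daemon (deciding who may run which tests, and how much bandwidth each test may consume) can be illustrated with a token bucket. This is a minimal sketch under stated assumptions, not the NIMI design: the names `TokenBucket` and `InjectionPolicy`, the per-user grant model, and the byte-based rate limit are all hypothetical.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Caps the average rate at which one user's tests may inject bytes."""
    rate_bytes_per_s: float  # sustained bandwidth allowed by policy (assumed unit)
    burst_bytes: float       # largest burst the bucket can absorb
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.burst_bytes  # start with a full bucket
        self.last = time.monotonic()

    def allow(self, packet_bytes: int, now: float | None = None) -> bool:
        """Debit tokens and return True if the packet fits within policy."""
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        self.last = max(self.last, now)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False


class InjectionPolicy:
    """Hypothetical per-user policy: which tests are allowed, and at what rate."""

    def __init__(self) -> None:
        self._allowed: dict[str, set[str]] = {}
        self._buckets: dict[str, TokenBucket] = {}

    def grant(self, user: str, tests: set[str], rate: float, burst: float) -> None:
        self._allowed[user] = set(tests)
        self._buckets[user] = TokenBucket(rate, burst)

    def permit(self, user: str, test: str, packet_bytes: int) -> bool:
        """Single policing point: check authorization first, then bandwidth."""
        if test not in self._allowed.get(user, set()):
            return False  # user is not authorized for this test type
        return self._buckets[user].allow(packet_bytes)
```

A token bucket lets a test send short bursts of up to `burst_bytes` while holding its long-run rate to `rate_bytes_per_s`, which suits probe trains that are bursty by nature. A real daemon would also have to account for concurrent tests sharing one interface.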
Network monitoring data is particularly important to Grid middleware [Globus] such as the Replica Manager. Selecting the best source from which to copy the data requires a prediction of future end-to-end path characteristics between the destination and each possible source. Accurate predictions of the performance obtainable from each source require measurement of available bandwidth (both end-to-end and hop-by-hop), latency, loss, and other characteristics important to file transfer performance.
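Replica selection as described above amounts to forecasting each source's path and choosing the source with the shortest predicted transfer. The sketch below assumes an exponentially weighted moving average as the forecaster (a simple stand-in; real forecasters may combine several predictors) and ignores loss; `PathForecast`, `best_source` and the smoothing weight `alpha = 0.3` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PathForecast:
    """Exponentially weighted moving averages of recent path measurements."""
    alpha: float = 0.3                  # weight of the newest sample (assumed)
    bandwidth_mbps: float | None = None
    rtt_s: float | None = None

    def update(self, bandwidth_mbps: float, rtt_s: float) -> None:
        if self.bandwidth_mbps is None:
            # First sample seeds the forecast directly.
            self.bandwidth_mbps, self.rtt_s = bandwidth_mbps, rtt_s
        else:
            a = self.alpha
            self.bandwidth_mbps = a * bandwidth_mbps + (1 - a) * self.bandwidth_mbps
            self.rtt_s = a * rtt_s + (1 - a) * self.rtt_s

    def predicted_transfer_s(self, size_mb: float) -> float:
        # One round trip to start, then size / bandwidth (megabytes -> megabits).
        return self.rtt_s + size_mb * 8 / self.bandwidth_mbps


def best_source(forecasts: dict[str, PathForecast], size_mb: float) -> str:
    """Pick the source with the smallest forecast transfer time.

    Sources with no measurements yet are skipped; raises ValueError if none
    has data.
    """
    ready = {s: f for s, f in forecasts.items() if f.bandwidth_mbps is not None}
    return min(ready, key=lambda s: ready[s].predicted_transfer_s(size_mb))
```

Because latency dominates small transfers and bandwidth dominates large ones, the best source can change with file size, which is why the prediction takes the transfer size as an input.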
The following tasks are required to make this network monitoring information useful to Grid applications and middleware.
Perform a classification of various types of network measurement characteristics, and determine which measurement tools provide each characteristic
Determine "standard" names for the network characteristics that are of use to Grid applications
Work with Grid application and middleware developers to determine their requirements for network measurement data.
Define new, higher-level "derived characteristics" that make it easier for Grid middleware to decide what to do. For example, the "closeness" of two Grid resources, such as storage and compute cycles, is a combination of delay, bandwidth, and loss.
Define standard schemas to describe network measurement data, and define standard publication and archival mechanisms (e.g. SOAP [SOAP], WSDL [WSDL]).
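As an illustration of a derived characteristic, the sketch below folds delay, bandwidth and loss into a single "closeness" score. The formula is an assumption made for illustration, not a proposed standard: closeness is taken as the reciprocal of the estimated time to move a reference amount of data, with bandwidth discounted linearly by the loss rate. `PathCharacteristics`, `closeness` and the 100 MB reference size are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathCharacteristics:
    """Measured characteristics of one path; field names are illustrative."""
    rtt_s: float           # round-trip delay, seconds
    bandwidth_mbps: float  # available bandwidth, megabits per second
    loss_fraction: float   # packet loss, 0.0 .. 1.0


def closeness(path: PathCharacteristics, reference_mb: float = 100.0) -> float:
    """Hypothetical derived characteristic: a higher value means 'closer'."""
    if not 0.0 <= path.loss_fraction < 1.0:
        raise ValueError("loss_fraction must be in [0, 1)")
    if path.bandwidth_mbps <= 0.0:
        raise ValueError("bandwidth_mbps must be positive")
    # Assumed model: loss removes a proportional share of usable bandwidth.
    effective_mbps = path.bandwidth_mbps * (1.0 - path.loss_fraction)
    seconds = path.rtt_s + reference_mb * 8 / effective_mbps
    return 1.0 / seconds
```

A linear loss discount understates how sharply TCP throughput falls as loss rises; a TCP throughput model could be substituted without changing the interface middleware sees.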
Many of these tasks will be carried out in close collaboration with the Global Grid Forum, and we will work to standardize the results to enable interoperability among all Grid projects.
After defining what to measure, how to represent the data, and how to publish it, the following tasks are required:
Develop a GMA-based [GMA] Web Service to publish network measurement data
Integrate this web service both with caches of recently collected data and with archives of historical data
Integrate this web service with NIMI
Develop and/or integrate a distributed peer-to-peer query mechanism to locate monitoring data from multiple sources or archives.
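The GMA separates producers of monitoring data, consumers, and a directory through which consumers locate producers. The in-process sketch below shows that pattern for the publishing tasks listed above. It omits the SOAP/WSDL transport, and `Measurement`, `Producer`, `Directory`, `consume` and the characteristic names are hypothetical. Querying several directories and de-duplicating the answers stands in, very loosely, for the distributed query mechanism.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Measurement:
    src: str
    dst: str
    characteristic: str  # a "standard" characteristic name (hypothetical)
    timestamp: float     # seconds since the epoch
    value: float


@dataclass
class Producer:
    """A source of measurements, e.g. a cache of recent data or an archive."""
    name: str
    records: list[Measurement] = field(default_factory=list)

    def query(self, src: str, dst: str, characteristic: str) -> list[Measurement]:
        return [m for m in self.records
                if (m.src, m.dst, m.characteristic) == (src, dst, characteristic)]


class Directory:
    """GMA-style registry: producers advertise which characteristics they hold."""

    def __init__(self) -> None:
        self._index: dict[str, list[Producer]] = {}

    def register(self, producer: Producer, characteristic: str) -> None:
        self._index.setdefault(characteristic, []).append(producer)

    def lookup(self, characteristic: str) -> list[Producer]:
        return list(self._index.get(characteristic, []))


def consume(directories: list[Directory], src: str, dst: str,
            characteristic: str) -> list[Measurement]:
    """Ask every directory, query each producer found, merge in time order."""
    seen: set[Measurement] = set()  # frozen records are hashable, so duplicates collapse
    for directory in directories:
        for producer in directory.lookup(characteristic):
            seen.update(producer.query(src, dst, characteristic))
    return sorted(seen, key=lambda m: m.timestamp)
```

Keeping the cache and the archive behind the same producer interface lets a consumer receive recent and historical data in one answer without knowing where each record lives.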
If demand arises, we will evaluate the use of passive monitoring techniques within the MAGGIE framework. The participants in this project are experienced with various passive monitoring tools, including:
NetFlow. SLAC has developed a package to analyze data gathered by the Cisco NetFlow feature.
SNCM. The Self-Configuring Network Monitor [SNCM], developed at LBNL, addresses the need for an infrastructure that supports passive network monitoring.
Bro. The Bro package [Bro], also developed at LBNL, is probably best known as intrusion detection software, but it can also identify network performance characteristics.
Monitoring is performed so that the measurements can assist with engineering, Grid middleware, trouble-shooting, setting end-user expectations and a myriad of other tasks. We will undertake the following work to contribute in these areas.
Currently, the measurement results produced via NIMI are packaged and shipped to a predetermined repository referred to as the Data Analysis Client (DAC). Post-processing of the results, when performed, is done outside the NIMI architecture. In contrast, post-processing analysis and reporting are an integral part of the IEPM-BW system. We will extend the IEPM-BW analysis and reporting to the NIMI DAC data.
We will compare the various active measurement tools that have been ported to the NIMI environment with one another, to identify their regions of applicability, their expected accuracy, the time needed to make a measurement, and the resources required. We will feed this information back to the developers.
Research and develop conversions between different types of network measurements, that is, predicting the outcome of a data transfer from low-impact tests.
Monitor, analyze and report on the performance of networks and grids relating to the SciDAC community and major scientific collaborators.
We will develop a web site to explain the project, report the results, and allow navigation through the large volume of results. The reports will let users select the metric (e.g. delay, loss), the time scale (separation between measurements and time window), and the paths (e.g. grouped by affinities such as collaboration).
Visualization of large amounts of data in a meaningful form has always been a challenge to researchers conducting network monitoring. We will work actively with developers in this field to make results compatible with their tools. In particular we will work with CAIDA [CAIDA] to evaluate the applicability of the tools they have developed.
We will work with CAIDA to integrate the data into their Trends project [trends] and leverage the data from other projects in the MAGGIE analysis mechanism.
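One well-known conversion of the kind described in the task on predicting a data transfer from low-impact tests, assumed here rather than taken from this proposal, is the TCP throughput model of Mathis et al. It bounds steady-state throughput by (MSS / RTT) * (C / sqrt(p)), using the round-trip time RTT and loss rate p that a low-impact ping measurement provides, with C = sqrt(3/2) (about 1.22). The function names below are hypothetical.

```python
import math

# Constant for periodic loss in the Mathis et al. model, about 1.22.
MATHIS_C = math.sqrt(3.0 / 2.0)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Upper bound on steady-state TCP throughput, in bits per second."""
    if rtt_s <= 0 or not 0 < loss < 1:
        raise ValueError("need rtt_s > 0 and 0 < loss < 1")
    return (mss_bytes * 8 / rtt_s) * (MATHIS_C / math.sqrt(loss))


def predicted_transfer_s(size_bytes: float, mss_bytes: int,
                         rtt_s: float, loss: float) -> float:
    """Rough lower bound on the time to move size_bytes over the path."""
    return size_bytes * 8 / mathis_throughput_bps(mss_bytes, rtt_s, loss)
```

With a 1460-byte MSS, 100 ms RTT and 1% loss the bound is roughly 1.4 Mbit/s, and quadrupling the loss halves it. The model assumes a long-lived, Reno-style TCP flow limited by loss rather than by window size, so it is unbounded when a ping sample sees no loss and should be checked against measured bulk transfers before being relied on.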
Security is a primary concern of the NIMI architecture. The Access, Authorization and Authentication aspects of NIMI are already being funded and are not part of this proposal. However, the development of the packet injection daemon for policy enforcement and policing is part of this proposal and provides additional security. Security in publishing the data is also a serious consideration; the developers will work with the Global Grid Forum to implement standards and enable control of the data.
We estimate the following full time effort (FTE) to complete the tasks described in this document:
NIMI Development (section 3) – 0.75 FTE
CPOC maintenance (section 3) – 0.25 FTE
Publishing to Grid Middleware (section 4) – 1.00 FTE
Troubleshooting (section 6) – 1.00 FTE
Total is 3 FTE.
Expression of interest from kc claffy for CAIDA
Deploying and testing network performance tools at high performance ESnet and ESnet collaborator sites will greatly assist in understanding and validating the scaling properties of the proposed tools/algorithms.
CAIDA supports this proposal and would like to collaborate with the MAGGIE team to assist in deploying, integrating and configuring the tools in their measurement probes. CAIDA will also assist in adapting their existing visualization tools.