Monitoring and Analysis for the Global Grid and
Internet End-to-end Performance
(PI) and Warren Matthews
Stanford Linear Accelerator Center (SLAC)
Pittsburgh Supercomputing Center (PSC)
ICSI Center for Internet Research (ICIR)
Lawrence Berkeley National Laboratory (LBNL)
We propose to integrate numerous network and application
performance monitoring tools into a scalable and secure infrastructure providing
measurements, analysis and access to data. We propose to incorporate and extend
the existing IEPM-PingER and IEPM-BW infrastructures, and extend the NIMI
configuration and control software. We will also incorporate tools whose
development is already being funded by the DoE/MICS SciDAC initiative.
Publishing to Grid Middleware.
Characterizing the Traffic Mix.
Reporting and Trouble-shooting.
Scientific research has access to greater computing and networking
power than ever before, but the scientific community recognizes that
these increases in power also bring major challenges. The deployment of
Grid technologies and a production-quality Grid service requires detailed
information on network performance. It is widely believed that what is needed is
a ubiquitous monitoring infrastructure that not only provides
the measurements available from multiple existing monitoring projects but also, as a novel
addition, allows us to co-ordinate and integrate tools co-operatively.
The monitoring project
described here is distinguished from other monitoring work in the following ways:
- The ability to allow
authorized users to run tests on demand. This includes the ability to specify who
can run which tests, and even how much bandwidth a test may utilize.
- Access to a large number and
range of network paths by leveraging the current PingER, IEPM-BW and NIMI infrastructures.
- Project members who are
leaders in the Global Grid Forum (GGF), allowing this work to become standardized for the Grid.
- Integration of a measurement
infrastructure with application adaptation.
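The first distinguishing feature above, letting authorized users run tests on demand while controlling who may run which tests and how much bandwidth each test may use, amounts to a policy check made before a test starts. The Python sketch below illustrates one possible form of that check. The `Grant` and `Policy` classes, the user names and the test names are hypothetical and are not part of NIMI or any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class Grant:
    """Hypothetical authorization entry for one user."""
    allowed_tests: set[str]  # tests this user may start on demand
    max_mbps: float          # bandwidth cap for any single test


@dataclass
class Policy:
    """Hypothetical policy table mapping user identities to grants."""
    grants: dict[str, Grant] = field(default_factory=dict)

    def authorize(self, user: str, test: str, requested_mbps: float) -> tuple[bool, str]:
        """Return (allowed, reason) for a request to run `test` at `requested_mbps`."""
        grant = self.grants.get(user)
        if grant is None:
            return False, "unknown user"
        if test not in grant.allowed_tests:
            return False, f"{user} may not run {test}"
        if requested_mbps > grant.max_mbps:
            return False, f"{requested_mbps} Mbps exceeds cap of {grant.max_mbps} Mbps"
        return True, "ok"


policy = Policy({
    "alice": Grant({"ping", "bulk-transfer"}, max_mbps=50.0),
    "bob": Grant({"ping"}, max_mbps=1.0),
})
print(policy.authorize("alice", "bulk-transfer", 20.0))  # (True, 'ok')
print(policy.authorize("bob", "bulk-transfer", 0.5))     # (False, 'bob may not run bulk-transfer')
```

Keeping the check in one place, rather than in each measurement tool, matches the idea of a single point of policing for all tests.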
The proposal also complements
other projects. For example, the Performance Evaluation Station (PIPES) [PIPES]
project of the Internet2 End-to-End Performance Initiative (e2epi) is designed to
aid end-to-end trouble-shooting for Internet2-connected universities.
By working with the PIPES group to implement the MAGGIE
publishing scheme, we expect to gain more insight into the
networks used by the DoE science community.
The following enhancements will be made to the NIMI system:
- Support for fundamentally new types of measurement:
- Continuous monitoring, in which, rather than
scheduling measurements for execution at a particular future time,
measurements are conducted in an on-going fashion without requiring external
interaction, and the results are streamed to a common data collection point.
- Spot measurement, which supports immediate
measurement execution and streamlined delivery of results, for use in
- Adaptive measurement, in which a series of
measurements can be scripted by the user, with the execution of the later
measurements guided by the immediate results and analysis of the earlier
ones, without requiring external coordination or interaction.
- Complete development of the "packet injection daemon"
that exists currently as a partially implemented prototype, to serve as a
common basis for providing customized packet creation in support of a wide
variety of measurements.
- Design and development of resource control and
security policy enforcement components for the packet injection daemon, so
it can serve as a single point of policing for all measurements conducted
using the NIMI probe.
- Modification of tools in the NIMI measurement suite to
use the packet injection daemon, and assistance to developers, in particular those
funded by the SciDAC [SciDAC]
initiative, in integrating their tools into the NIMI framework.
- Porting the full NIMI system, including measurement
tools suite, to Linux.
- On-going operation of the Central Point of Contact (CPOC)
NIMI administrative center, used for defining and exporting NIMI probe
configurations, NIMI user authentication and authorization, and NIMI
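Adaptive measurement, as described above, lets the result of each measurement decide what is measured next. As a hedged illustration only (it does not model NIMI's own scripting interface), the Python function below searches for the highest probing rate whose loss stays under a threshold: it doubles the rate until loss appears, then bisects between the last loss-free rate and the first lossy one. `measure_loss` is a hypothetical callback standing in for a real probe; here it is simulated.

```python
from typing import Callable


def adaptive_rate_search(
    measure_loss: Callable[[float], float],
    start_mbps: float = 1.0,
    max_mbps: float = 1000.0,
    loss_threshold: float = 0.01,
    tolerance_mbps: float = 1.0,
) -> float:
    """Estimate the highest probing rate (Mbps) whose measured loss stays below
    `loss_threshold`. Each new probe rate depends on the earlier results."""
    good = 0.0
    rate = start_mbps
    # Phase 1: double the rate until loss appears or the cap is reached.
    while True:
        if measure_loss(rate) >= loss_threshold:
            bad = rate
            break
        good = rate
        if rate >= max_mbps:
            return good  # no loss seen up to the cap
        rate = min(rate * 2, max_mbps)
    # Phase 2: bisect between the last loss-free rate and the first lossy one.
    while bad - good > tolerance_mbps:
        mid = (good + bad) / 2
        if measure_loss(mid) < loss_threshold:
            good = mid
        else:
            bad = mid
    return good


# Simulated path standing in for a real (hypothetical) probe: lossless up to 100 Mbps.
def simulated_loss(rate_mbps: float) -> float:
    return 0.0 if rate_mbps <= 100.0 else 0.5


estimate = adaptive_rate_search(simulated_loss)
print(estimate)  # 100.0
```

Because every step needs the previous result, running such a script on the probe itself avoids a round trip to an external controller for each step.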
Network monitoring data is
particularly important to Grid middleware [Globus]
such as the Replica Manager. Selecting the best source to copy the data from
requires a prediction of future end-to-end path characteristics between the
destination and each possible source. Accurate predictions of the performance
obtainable from each source require measurement of available bandwidth (both
end-to-end and hop-by-hop), latency, loss, and other characteristics important
to file transfer performance.
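To illustrate how measurements could feed replica selection, the Python sketch below forecasts each source's next throughput with an exponentially weighted moving average of past samples and picks the source with the highest forecast. The EWMA forecaster, the smoothing factor `alpha = 0.3` and the site names are assumptions chosen for illustration; a production forecaster would likely combine several characteristics and more elaborate prediction methods.

```python
def ewma_forecast(history: list[float], alpha: float = 0.3) -> float:
    """Forecast the next throughput sample (Mbps) as an exponentially
    weighted moving average; recent samples carry more weight."""
    if not history:
        raise ValueError("need at least one measurement")
    forecast = history[0]
    for sample in history[1:]:
        forecast = alpha * sample + (1 - alpha) * forecast
    return forecast


def select_source(histories: dict[str, list[float]], alpha: float = 0.3) -> str:
    """Return the replica source with the highest forecast throughput."""
    return max(histories, key=lambda src: ewma_forecast(histories[src], alpha))


# Hypothetical throughput histories (Mbps) from each replica to one destination.
histories = {
    "site-a": [80.0, 85.0, 90.0],
    "site-b": [120.0, 40.0, 30.0],  # fast in the past, degraded recently
    "site-c": [60.0, 60.0, 60.0],
}
print(select_source(histories))  # site-a
```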
The following tasks are required to make this network
monitoring information useful to Grid applications and middleware.
- Perform a classification of the various types of network
measurement characteristics, and determine which measurement tools provide each.
- Determine "standard" names for the network
characteristics that are of use to Grid applications
- Work with Grid application and middleware developers to
determine their requirements for network measurement data.
- Define new, higher-level "derived characteristics", that
make it easier for Grid middleware to determine what to do. For example, the
"closeness" of two Grid resources, such as storage and compute cycles, is a
combination of delay, bandwidth, and loss.
- Define standard schemas to describe network measurement
data, and define standard publication and archival mechanisms (e.g., SOAP [SOAP],
WSDL [WSDL]).
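To make the schema and "derived characteristic" ideas concrete, the sketch below defines an illustrative measurement record, serializes it to JSON for publication, and computes one possible "closeness" score from delay, bandwidth and loss. The field names, the reference values `rtt_ref_ms` and `bw_ref_mbps`, and the equal weighting are all assumptions; the actual names and combination would come out of the Global Grid Forum work described here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PathMeasurement:
    """Illustrative record for one end-to-end measurement; the field names are
    placeholders for whatever standard names are eventually agreed on."""
    source: str
    destination: str
    timestamp: float          # seconds since the Unix epoch
    rtt_ms: float             # round-trip delay
    loss_fraction: float      # 0.0 (no loss) to 1.0 (all packets lost)
    available_bw_mbps: float  # end-to-end available bandwidth


def closeness(m: PathMeasurement,
              rtt_ref_ms: float = 100.0,
              bw_ref_mbps: float = 100.0,
              weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Hypothetical derived characteristic; higher means 'closer'. Each term is
    mapped onto [0, 1], so the score is in [0, 1] when the weights sum to 1."""
    delay_term = rtt_ref_ms / (rtt_ref_ms + m.rtt_ms)                    # 1 at zero delay
    bw_term = m.available_bw_mbps / (m.available_bw_mbps + bw_ref_mbps)  # approaches 1 as bandwidth grows
    loss_term = 1.0 - m.loss_fraction                                    # 1 at zero loss
    w_delay, w_bw, w_loss = weights
    return w_delay * delay_term + w_bw * bw_term + w_loss * loss_term


m = PathMeasurement("site-a", "site-b", 1_000_000_000.0, 100.0, 0.0, 100.0)
print(json.dumps(asdict(m)))   # publishable form of the record
print(round(closeness(m), 3))  # 0.667
```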
Many of these tasks will be done
in close collaboration with the Global Grid Forum, and we will work to standardize the results
to enable interoperability between all Grid projects.
- Once we have defined what to measure, how to represent the
data, and how to publish it, the following tasks are required:
- Develop a GMA-based [GMA]
Web Service to publish network measurement data
- Integrate this web service with both caches of
recently collected data, and with archives of historical data
- Integrate this web service with NIMI
- Develop and/or integrate a distributed peer-to-peer
query mechanism to locate monitoring data from multiple sources or archives.
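The Grid Monitoring Architecture separates producers of monitoring data, consumers, and a directory service in which producers register so that consumers can find them. The in-memory Python sketch below shows that lookup-then-query pattern, with a bounded cache standing in for recently collected data and a second producer standing in for an archive. The class names, the path-keyed registration and the record fields are assumptions; a real deployment would expose these roles as web services rather than local objects.

```python
from collections import deque

Path = tuple[str, str]  # (source, destination)


class Producer:
    """Publishes measurement records and keeps a bounded cache of recent ones."""

    def __init__(self, name: str, cache_size: int = 1000):
        self.name = name
        self._cache: deque = deque(maxlen=cache_size)  # oldest records drop out first

    def publish(self, record: dict) -> None:
        self._cache.append(record)

    def query(self, path: Path) -> list[dict]:
        return [r for r in self._cache if (r["source"], r["destination"]) == path]


class Directory:
    """Directory service: records which producers hold data for which paths."""

    def __init__(self):
        self._index: dict[Path, list[Producer]] = {}

    def register(self, producer: Producer, path: Path) -> None:
        self._index.setdefault(path, []).append(producer)

    def lookup(self, path: Path) -> list[Producer]:
        return list(self._index.get(path, []))


def consumer_query(directory: Directory, path: Path) -> list[dict]:
    """Consumer side: find producers via the directory, then query each directly."""
    results: list[dict] = []
    for producer in directory.lookup(path):
        results.extend(producer.query(path))
    return results


directory = Directory()
probe = Producer("probe-cache", cache_size=2)  # recent data
archive = Producer("archive")                  # historical data
for p in (probe, archive):
    directory.register(p, ("site-a", "site-b"))
archive.publish({"source": "site-a", "destination": "site-b", "rtt_ms": 95.0})
probe.publish({"source": "site-a", "destination": "site-b", "rtt_ms": 101.0})
print(len(consumer_query(directory, ("site-a", "site-b"))))  # 2
```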
If demand arises, we will evaluate the use of passive
monitoring techniques within the MAGGIE framework. The participants in this
project are experienced with various tools, including:
- NetFlow. SLAC has developed a package to analyze
data gathered by the Cisco NetFlow feature.
- SNCM. The Self-configuring Network Monitor [SNCM]
developed at LBNL is a project addressing the need for a network monitoring
infrastructure to support passive network monitoring.
- Bro. The Bro package [Bro],
also developed at LBNL, is probably best known as intrusion detection software,
but is also capable of identifying network performance characteristics.
Monitoring is performed so that its measurements can assist with engineering,
Grid middleware, trouble-shooting, setting end-user expectations and a myriad of
other tasks. We will undertake the
following to contribute in those areas:
- Currently, the measurement results produced via NIMI are
packaged and shipped to a predetermined repository referred to as the Data
Analysis Client (DAC). Post-processing of the results, when performed, is
done outside of the NIMI architecture. In contrast, post-processing
analysis and reporting is an integral part of the IEPM-BW system. We will
extend the IEPM-BW analysis and reporting to the NIMI DAC data.
- We will compare and contrast the various active
measurement tools that have been ported to the NIMI environment to identify
each tool's region of applicability, expected accuracy, the time needed to make
a measurement and the resources required. We will
feed this information back to the developers.
- Research and develop conversions between different types
of network measurements, for example predicting the outcome of a data transfer
from low-impact tests.
- Monitor, analyze and report on the performance of
networks and grids relating to the SciDAC community and major scientific
- We will develop a web site to explain the project,
report the results, and allow navigation through the large volume of results.
The reports will allow users to select the metric (e.g. delay, loss), the time scale
(separation between measurements and time window), and the paths (e.g. grouped by
affinities such as collaboration).
- Visualization of large amounts of data in a meaningful
form has always been a challenge to researchers conducting network monitoring.
We will work actively with developers in this field to make results compatible
with their tools. In particular we will work with CAIDA [CAIDA]
to evaluate the applicability of the tools they have developed.
- We will work with CAIDA to integrate the data into their
Trends project [trends] and leverage the data from other projects in the
MAGGIE analysis mechanism.
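One well-known example of converting a low-impact measurement into a transfer prediction is the Mathis et al. approximation for steady-state TCP throughput, BW ≈ (MSS / RTT) × C / sqrt(p), with C ≈ sqrt(3/2), which needs only the round-trip time and the loss rate p that a ping-based test can measure. The Python sketch below applies it. It is offered as an illustration of the kind of conversion meant here, not as the method this project will adopt, and it assumes a single standard TCP stream limited by loss rather than by window size or link capacity.

```python
import math

# Commonly quoted constant for periodic loss with an ACK for every segment.
MATHIS_C = math.sqrt(3 / 2)


def mathis_throughput_mbps(mss_bytes: int, rtt_ms: float, loss_rate: float) -> float:
    """Approximate steady-state throughput of one TCP stream (Mbps) from
    RTT and packet loss, e.g. as measured by a low-impact ping test."""
    if rtt_ms <= 0:
        raise ValueError("RTT must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss rate must be in (0, 1]; with zero loss the formula gives no bound")
    bytes_per_s = (mss_bytes / (rtt_ms / 1000.0)) * MATHIS_C / math.sqrt(loss_rate)
    return bytes_per_s * 8 / 1e6


# 1460-byte segments, 100 ms RTT, 0.01% loss
print(round(mathis_throughput_mbps(1460, 100.0, 1e-4), 2))  # 14.31
```

Note that a path showing zero loss in a short ping sample gives no finite estimate, which is one reason such conversions need validation against real transfers.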
Security is a primary concern of the NIMI architecture. The
Access, Authorization and Authentication aspects of NIMI are already
funded and are not part of this proposal. However, the development of the packet
injection daemon for policy enforcement and policing is part of this proposal and provides
extra security. Security of the published data is also a serious
consideration. The developers will work with the Global Grid Forum to implement
standards and enable control of the data.
We estimate the following full-time equivalent (FTE) effort to
complete the tasks described in this document:
- NIMI Development (section 3) – 0.75 FTE
- CPOC maintenance (section 3) – 0.25 FTE
- Publishing to Grid Middleware (section 4) – 1.00 FTE
- Troubleshooting (section 6) – 1.00 FTE
Total is 3 FTE.
- Expression of interest from kc
claffy for CAIDA
Deploying and testing network
performance tools at high performance ESnet and ESnet collaborator sites will
greatly assist in understanding and validating the scaling properties of the
CAIDA supports this proposal and would
like to collaborate with the MAGGIE team to assist in deploying, integrating and
configuring the tools in their measurement probes. CAIDA will also assist in
adapting their existing visualization tools.