Network Data Operational, Security and Research Stewardship Working Practice

Version 1.0, October 29, 2013 [1]
Adopted as Internal Internet2 Working Practice

Introduction

Internet2 collects network flow data from its network to aid in operational support and for research projects. Anonymized flow data is stored for researchers, and raw flow data is collected for operational forecasting. With the exception of a network flow data policy to encompass IPv6, Internet2 has not had a clearly articulated and publicly available flow data policy or practice. Due to increasingly rich service delivery options and a need to monitor them, a new operational and research-support practice for flow data is necessary.

Summary

Internet2 respects the privacy of traffic flowing over its network and will not disclose netflow data unless authorized.

Internet2 will inspect flow data for operations. It will use such data to respond to network security incidents and it will keep summaries derived from flow data to understand network growth and evolution, to engineer the network to meet demand, and to provide management-level summaries of how the network is being used. Raw flow data will not be kept in an unencrypted format for longer than two weeks. Our intention is that any raw flow data will be kept at locations secured by Internet2 and managed by Internet2 direct employees on devices with limited network access, and will only be accessible by authorized employees and subcontractors. In the case of subcontractors, no data will be shared until a data sharing agreement is executed with the subcontractor.

Internet2 also keeps anonymized data for network research; retention is limited only by available storage. The current policy is to zero the low-order 11 bits of IPv4 addresses and the low-order 80 bits of IPv6 addresses. Researchers must state a purpose for accessing the data, intend to publish the results openly, not disclose any netflow data, and give Internet2 credit. If results are published, researchers are asked to provide citations back to Internet2. All authorizations to access anonymized data will be logged in a standard format in Network Services.

Ownership and Control

Internet2 owns all flow data associated with the Internet2 network. Internet2 will not delegate ownership of or sell its flow data to any other organization, including subcontractors.

Operational Data Policy

Internet2 will inspect sampled network flow data collected throughout the network as well as occasionally take full packet captures of specific links for network operations and security assurance. Internet2 will also process data online to provide operational, engineering, and management summaries.

Internet2 will minimize the amount of raw network flow data or packet captures kept on disk. However, it may be kept for up to two weeks to allow for incident response and correlation in the summaries.

In special instances, data sets may be kept longer; however, the data sets will be destroyed at the end of the specific activity. Any raw flow data is kept inside Internet2, on devices with limited network access, and is only accessible by authorized employees and contractors. An administrative procedure to log and track these exceptions will be kept in the office of the Vice President of Network Services.

Summaries derived from network flow data may be kept indefinitely, but will not identify traffic characteristics that are more granular than participating institution, type of traffic transmitted, and other summary information. Summaries will be used to understand network growth and evolution; to engineer the network to meet demand; and to provide management-level summaries to Internet2 and its members of how the network is being used. Management-level summaries targeted to a particular institution may be shared with that institution.

In the event an Internet2 member requests information about its own usage of the Internet2 network, Internet2 will make reasonable attempts to provide views of data that include only the information that member could reasonably have gathered on its own by analyzing its own network connections to Internet2. Further, before sharing data, Internet2 will seek one or more designated representatives approved by the Office of the CIO at that institution.

Raw packet captures or traces, such as network “sniffer,” pcap, or other capture information, may be required from time to time to analyze network operational issues. Internet2 will treat such data as at least as sensitive as raw flow data, and Internet2 shall assure that the information is managed and shared with third parties only within defined contractual relationships. Internet2 will seek contractual assurances that controls are in place for the data to be destroyed after it is used for diagnostics.

Research Data Policy

Internet2 collects and makes available anonymized network flow data from the Internet2 Network for research and other approved purposes under administrative controls that attempt to balance privacy concerns with the benefits of providing data for research purposes.

IPv4 Address Anonymization

Internet2 sanitizes IPv4 network flow data by zeroing the low-order 11 bits of each unicast IPv4 address before data is released for analysis, leaving the remaining 21 bits of each 32-bit IPv4 address intact.

For context, most sites have subnets somewhere in the /23 to /25 range, while the 21 retained bits correspond to a /21 block. In general, therefore, the masked IP addresses can tie a given network flow record to a particular institution, but they cannot localize IPv4 data down to a unique subnet.

That level of anonymization is designed to ensure that a sufficient amount of user traffic will be inseparably “pooled” or “commingled,” thereby precluding the mapping of any given network flow record to a particular user or other identifiable campus activity.

IPv4 Multicast addresses are not anonymized, as they do not present any privacy risk.
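
As an illustration only, the IPv4 masking rule described above can be sketched in a few lines of Python. This is not Internet2's actual tooling; the function name anonymize_ipv4 and the constant IPV4_MASKED_BITS are hypothetical names chosen for this sketch, and it uses only the standard-library ipaddress module.

```python
import ipaddress

# Number of low-order bits zeroed, as stated in this practice.
IPV4_MASKED_BITS = 11  # hypothetical constant name


def anonymize_ipv4(address: str) -> str:
    """Zero the low-order 11 bits of a unicast IPv4 address.

    Multicast addresses are returned unchanged, matching the
    exemption described in this section.
    """
    addr = ipaddress.IPv4Address(address)
    if addr.is_multicast:
        return str(addr)
    # Keep the top 21 bits and clear the bottom 11.
    keep_mask = (0xFFFFFFFF << IPV4_MASKED_BITS) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(int(addr) & keep_mask))


if __name__ == "__main__":
    # Two hosts in different /24 subnets of the same /21 block
    # become indistinguishable after masking.
    print(anonymize_ipv4("10.1.9.5"))
    print(anonymize_ipv4("10.1.15.250"))
    print(anonymize_ipv4("224.0.0.5"))
```

In this sketch, hosts in 10.1.9.0/24 and 10.1.15.0/24 both map to 10.1.8.0, which is the "pooling" effect the policy relies on.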

IPv6 Address Anonymization

Internet2 sanitizes IPv6 network flow data by having the low-order 80 bits zeroed, leaving only the remaining 48 bits of the IPv6 addresses for analysis. As of 2010, it was determined that only an 80-bit mask could be relied on to adequately protect the privacy of IPv6 user traffic while additional empirical data is collected. To date, Internet2 has not been able to collect IPv6 data. The latest generation of routers installed in Internet2, however, supports Netflow version 9. Thus, as soon as it is practical to collect IPv6 flow data, research as described in the “Interim IPv6 Netflow Anonymization Policy v1.0” should be performed with community members to understand if the mask restriction might be relaxed.

When IPv6 flows are collected with Netflow v9, MAC address fields will be zeroed.

IPv6 Multicast and 6to4 addresses are not anonymized, as they do not present any privacy risk.
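
A comparable sketch for the IPv6 rule follows, again as an assumption-laden illustration rather than the actual implementation. The names anonymize_ipv6, IPV6_MASKED_BITS, and SIX_TO_FOUR are hypothetical; 6to4 addresses are identified here by the standard 2002::/16 prefix. Zeroing of MAC address fields in Netflow v9 records is a separate step and is not shown.

```python
import ipaddress

# Number of low-order bits zeroed, as stated in this practice.
IPV6_MASKED_BITS = 80  # hypothetical constant name

# 6to4 addresses fall under the 2002::/16 prefix.
SIX_TO_FOUR = ipaddress.IPv6Network("2002::/16")


def anonymize_ipv6(address: str) -> str:
    """Zero the low-order 80 bits of an IPv6 address.

    Multicast and 6to4 addresses are returned unchanged, matching
    the exemptions described in this section.
    """
    addr = ipaddress.IPv6Address(address)
    if addr.is_multicast or addr in SIX_TO_FOUR:
        return str(addr)
    full = (1 << 128) - 1
    # Keep the top 48 bits and clear the bottom 80.
    keep_mask = (full << IPV6_MASKED_BITS) & full
    return str(ipaddress.IPv6Address(int(addr) & keep_mask))


if __name__ == "__main__":
    print(anonymize_ipv6("2001:db8:1234:5678:9abc:def0:1234:5678"))
    print(anonymize_ipv6("ff02::1"))
```

With an 80-bit mask only the first 48 bits survive, so the example address is reduced to its 2001:db8:1234::/48 prefix.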

Using Anonymized Flow Data

Researchers desiring to use Internet2’s anonymized flow data must ask for approval so that Internet2 can keep track of the utility of the data, and the projects that use it. Researchers are asked about the project; in general the flow data is intended for research uses where the results are published to advance scientific knowledge. Researchers are asked to use the data only for that project (they may ask again to apply to new projects), acknowledge the use of Internet2 data, assert that the data will not be shared with any 3rd party for any reason and report back publications that use the data.

The current specific procedures for use are obtained by sending a request to rs@internet2.edu with the following information:

  1. A brief description of the research project, including a title
  2. List the project leads and participants
  3. Include URLs if appropriate and available
  4. Indicate any potential issues with data resulting from the project, including any potential privacy issues.
  5. Should the project be listed as a participant on the Internet2 Observatory web page?
  6. Submit an id and password to be used with rsync
  7. Submit a range or a set of individual IP addresses that will be used to access the data (a range can be, e.g., /28, /30, /32, etc.)
  8. Indicate any recommendations for additional data sets.

In addition, please indicate your agreement with the following conditions:

  1. Please ensure that the data remains inaccessible to anyone outside the given project
  2. Please employ the data only for the project/analysis for which it was provided
  3. If it is desired to employ the data for another task, please submit a request to rs@internet2.edu
  4. Researchers are encouraged to cite the use of this data in papers and articles. If so used, please give Internet2 attribution for the data. An example citation might be as follows: “This project has benefited from the use of measurement data collected on the Internet2 network as part of the Internet2 Observatory Project.”
  5. Please return pointers to research results and any papers generated
  6. If Internet2 data is used in research papers or articles, please send future citations to be included with the above information.

Once this request is received, it is reviewed by the Office of the Chief Technology Officer of Internet2. If it is reasonable and passes basic sanity checks, the id, password, and list of IP addresses used for access are added to the anonymized network flow server.
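
To make the address-range part of the request concrete, the following sketch shows how a submitted set of ranges (for example a /28 or a single /32) could be checked against a connecting client. It is an assumption about how such a check might look, not a description of the anonymized flow server's configuration; the function name is_allowed and the example addresses are hypothetical.

```python
import ipaddress
from typing import Sequence


def is_allowed(client_ip: str, allowed_ranges: Sequence[str]) -> bool:
    """Return True if client_ip falls inside any submitted range.

    Ranges are given in CIDR notation; a single host is a /32.
    """
    client = ipaddress.ip_address(client_ip)
    # strict=True rejects ranges with host bits set, e.g. 198.51.100.18/28.
    networks = [ipaddress.ip_network(r, strict=True) for r in allowed_ranges]
    return any(client in net for net in networks)


if __name__ == "__main__":
    # Documentation-reserved example addresses, not real project ranges.
    submitted = ["198.51.100.16/28", "203.0.113.7/32"]
    print(is_allowed("198.51.100.18", submitted))
    print(is_allowed("198.51.100.32", submitted))
```

Smaller ranges such as /28 or /32 keep the set of hosts able to reach the data narrow, which is consistent with the condition that the data remain inaccessible to anyone outside the project.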

Exceptions to Practice

Any request for an exception to this practice must receive written approval of the Vice President for Network Services or the President and CEO of Internet2. A written log of any exceptions must be kept in the office of the Vice President for Network Services.

If required by law and upon advice of legal counsel, Internet2 will comply with lawful requests to disclose netflow data. These requests will be disclosed to the President and CEO and Vice President of Network Services to the extent permitted by law.

[1] This document updates, clarifies, and supersedes the “Interim Internet2 IPv6 Netflow Anonymization Policy,” v1.0, dated April 16, 2010.