*Internet2 QoS Working Group Meeting*
May 7, 2002
Arlington, Virginia

*Attendees*
Ben Teitelbaum (chair), Internet2
Stanislav (Stas) Shalunov, Internet2
Amela Sadagic, Advanced Network and Services
Phil Chimento, Ericsson IPI
Phil Devan, Penn State University
Steve Schroeder, Penn State University
Agatineo Sciuto, NASA/GSFC
Alistair Munro, U4EA
Tim Chown, UKERNA
David Salmon, UKERNA
Brandon Saunders, Ohio University
Jeff Ogden, Merit Network
Hugh Smith, Cal Poly
Olivier Martin, CERN
Ludek Matyska, CESNET
Eric Nielsen, Sylantro Systems
Athanassios Liakopoulos, GRNET
Paul Love, Internet2
Jeff Boote, Internet2
Gerry Creager, TAMU
David Richardson, Pacific Northwest GigaPoP
Ben Chinowsky (scribe), Internet2

*Discussion*

Teitelbaum: The Internet2 working-group model is very much like IETF's -- all meetings are open, and decisions are made by consensus on the mailing list rather than in face-to-face meetings. The answer to any question is pretty much "write a draft" -- for information, for proposals, for whatever. There's no rfc-editor function for Internet2 yet, so documents don't yet have official status beyond QoS WG approval.

Munro: Objectives, deliverables?

Teitelbaum: Deliverables aren't standards, they're Internet2 best practices. To paraphrase our charter, the mission of the QoS Working Group (QoSWG) is "to meet Internet2 needs through packet differentiation."

Ogden: Is the model working? When you say to write a draft, do people do it?

Teitelbaum: It's not working so well so far, but I'm not sure the model is widely understood. Today's agenda:

1. We stopped trying to deploy Premium service more than a year ago. The idea of doing a postmortem is to capture what we've learned from failure; see http://qbone.internet2.edu/arch-dt.shtml. Possibly, we may revive Premium in the future if bandwidth scarcity demands it.
2. Now the focus is on non-elevated services -- services that "drop hints to the network" about how packets should be treated, but without enforcement.
QBone Scavenger Service (QBSS; see http://qbone.internet2.edu/qbss/) is the main example so far.
3. Another area of work is in shifting the focus (and burden) to applications -- "put smarts in the end system to deal with loss and jitter in a more elegant way." Amela Sadagic is chairing a design team in this area. This design team is working with Dimitrios Miras -- a grad student at UCL -- to write a survey paper of application needs and adaptation techniques. See http://www.internet2.edu/qos/wg/apps/.
4. Bandwidth management: two successful sessions so far, one in Tempe (http://www.internet2.edu/qos/wg/200201-Tempe.shtml) and one earlier today here in Arlington (http://www.internet2.edu/qos/wg/200205-Arlington.shtml).
5. Signaling: Phil Chimento plans to join us later and discuss how his group (http://qbone.internet2.edu/bb/) will finish up work on Simple Inter-domain Bandwidth Broker Signalling (SIBBS).
6. QoS needs of the Internet2 Geospatial Working Group (http://www.internet2.edu/html/geospatial.html). Gerry Creager will join us later to explain the QoS needs of this application area and to establish liaison between the QoS working group and the new geospatial working group.

1. Premium Postmortem

Teitelbaum: We still want to fix the holes in the QBone Architecture, including signaling, and include concrete directions for operators for Premium configuration. Our thinking got ahead of what was written down -- we still want to write it down, even though for non-architectural reasons we don't think Premium will be deployed anytime in the foreseeable future. Stas and I have a draft on why we don't think Premium will deploy; the document is now posted for review under the auspices of the QBone Architecture design team. Barring major objections, it will go to the list for comment in a week or two. The document includes lots of stuff on the economics of guaranteed services generally, and their impact on routers, applications and the end-to-end principle.

Munro: What's the timeline for the Premium postmortem?

Teitelbaum: There will be two documents -- the first discusses known non-architectural problems and is >95% done and under review by the Architecture design team; the second document, which will bring closure to the Premium service architecture work, isn't written yet.

2. Scavenger Service

Shalunov: QBSS lets one saturate the link without affecting "normal" traffic, creating a parallel virtual network. Target uses are bulk data transfer (already deployed), distributed applications using idle net capacity (doesn't exist yet, because there's no service that would allow applications to do that), and local policy (we strongly advise against application snooping and using QBSS as a threat against bandwidth-hogging applications). Why use it? There are self-policing users; you can charge less for QBSS service than for best-effort service; administrative policies and social pressure can get people to use it; enforced marking is also a possibility. Status: "it's one year old and it can walk." The service definition has been approved by the QoS WG and has been implemented in Abilene (about 1% of traffic), and various groups have expressed interest. Use on bottleneck links is the intermediate success metric; the ultimate criterion of success would be to have close to 100% link utilization, with Scavenger traffic causing no adverse effects on best-effort traffic.

Xxx: To what degree are people using Scavenger on purpose, and to what degree by accident?

Shalunov: Usage was about 0.03% before Scavenger was announced.

Xxx: What is peak traffic? Does Scavenger make any difference in Abilene?

Shalunov: 2.5 Gbps, and we think not.

Xxx: You should design a constrained-resource net so you can test it.

Teitelbaum: There may be value for it today at the edge.

Shalunov: Right, there are places where QBSS packets get worse service, but not on Abilene and not all the time.
QBSS is only useful for intermittent congestion, not zero or complete congestion. The percentage of capacity allocated to Scavenger should be as small as possible. Most routers work in integer percent increments, so it gets 1%, but less would be better.

Liakopoulos: I'd expect bigger packet sizes -- who generates the Scavenger traffic?

Shalunov: Mostly dorms, from four campuses the last time we checked. We don't know why Scavenger packets are so small.

Xxx: Games?

Shalunov: The percentage of game traffic on Abilene is small; I don't think that's it.

Xxx: If anything should be on Scavenger, it's games.

Richardson: It seems like it should be application-specific -- you discourage good applications by blanket application of Scavenger to dorms.

Shalunov: It's voluntary and doesn't break anything.

Richardson: Wouldn't you want to re-mark traffic based on what application is generating it?

Shalunov: We don't think port snooping is a good idea. This is part of a more general bias against port discrimination, because if we do that, applications will stop letting us know who they are.

...

Chown: Are you saying that TCP itself should reflect back the codepoint?

Shalunov: I should have more options than just "send me a file" -- for example, to request certain buffer sizes or codepoints.

...

Shalunov: Most universities have more outbound than inbound traffic.

Chown: You have an Internet2-centric view. The UK fetches a hundred times more data from North America than it sends to North America.

Xxx: It wasn't always the case that universities were net data exporters; now it's universal.

Shalunov: From a network operator's perspective, Scavenger lets you extend a network's uncongested lifetime and thereby delay upgrades; it may also provide a negotiating tool for the price of packet delivery outside your network. The power-user gains the ability to self-police, thus getting all the bandwidth that no one else wants.
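
As an aside to the minutes, the voluntary, self-policed marking described above can be illustrated with a minimal sketch of a bulk-transfer application marking its own traffic. It assumes the QBSS codepoint is DSCP 001000 (check the QBSS definition at the URL given earlier) and a platform that honors the IP_TOS socket option; the function names are hypothetical and not part of any QBSS tooling.

```python
import socket

# Assumption: QBSS uses DSCP 001000 (decimal 8); verify against the QBSS definition.
QBSS_DSCP = 0b001000


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper six bits of the legacy TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def open_scavenger_socket() -> socket.socket:
    """Create a TCP socket whose outgoing packets request Scavenger treatment.

    The mark is only a hint: routers along the path may honor, ignore,
    or re-mark it, and some operating systems ignore IP_TOS entirely.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(QBSS_DSCP))
    return sock
```

An application would call open_scavenger_socket() in place of its usual socket creation for bulk transfers and leave interactive connections unmarked, which matches the "hint without enforcement" model: nothing breaks if the network does not implement Scavenger.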

Teitelbaum: Scavenger assumes "a separate bottom-feeding queue" -- in the absence of congestion, you raise the risk of packet reordering.

Shalunov: Right.

Teitelbaum: If it's expensive to fix reordering, would a WRED (Weighted Random Early Detection) version of Scavenger -- with no separate queue -- make sense?

Shalunov: We discussed that on the list.

Teitelbaum: Right, but David has brought up the issue.

Richardson: In the multimedia realm, an application might want to control how it will be hurt. For example, a videoconferencing stream might want to prioritize voice over video. In MPEG there are three frame types: snapshots and two kinds of deltas. Loss of some of these types is harder to recover from than loss of others.

Shalunov: Scavenger is not intended for videoconferencing use.

Teitelbaum: But would a different version of QBSS be useful for this purpose?

Richardson: This relates to the application-adaptability agenda item.

Shalunov: Why not just never ask for degraded service?

Richardson: You could keep a coherent service by just keeping the I-frames, for example. Within a flow, some packets may be more valuable than others.

Shalunov: A WRED flavor of QBSS could be of use -- I haven't encouraged it because Scavenger bursts would affect best-effort jitter. You can't hurt others with QBSS, but you can with WRED. You're talking about giving the network more fine-grained information -- I'm not sure the benefit is worth the cost.

Teitelbaum: It's an interesting open question.

Shalunov: What are the applications that could benefit from it?

Richardson: Any multimedia application.

Shalunov: Such applications are less than 2% of Abilene traffic.

Xxx: But those applications will need this kind of differentiation to work.

Chown: Where are you finding Scavenger packets dropped?

Teitelbaum: On Internet1 egress links. Campuses are giving Scavenger traffic Scavenger treatment, not just dropping all of it.

Shalunov: There aren't many places where Scavenger packets are getting dropped, because there isn't much congestion.

Chown: What kind of router do you need?

Shalunov: Any will work; you may need to use the "dirty trick" of classifying by IP address [?] rather than DSCP.

Xxx: TCP vs. UDP...

Teitelbaum: We discourage people from "blasting non-adaptive streams", whether Scavenger or not.

3. Applications QoS

Sadagic: How many applications people and how many network people here?

(Show of hands indicates that most are net folk)

Sadagic: We created a fellowship to produce a survey paper on application QoS needs, expecting to take about six months from start in January 2002. For example, needs of videoconferencing audio vs. needs of telemedicine audio. For multiple forms of information, performance of one form by itself won't be the same as performance with others. For an H.323 application, there are three places you can consider introducing QoS: the network, H.323, and the application. The QoS needs of a particular form of information may vary depending on what it's being used for -- for example, using video for talking heads vs. for a beach cam. Video quality can be assessed both subjectively and objectively. At this point we're working on videoconferencing and VoIP; if you want to add other areas, get in touch.

Liakopoulos: Application writers can't be network-aware, as networks change over time. For example, the first H.323 applications were written for a completely different backbone.

Sadagic: Congestion is congestion wherever you are -- the awareness we're talking about is network-independent.

Liakopoulos: Performance may have to do with protocols being used, not congestion.

Sadagic: You can't cover all possible cases, but you can know some things -- for example, transcontinental service can't be faster than about 40 ms. You have to explore all possible cases where you can improve QoS, but you do have to do best guesses.

Munro: What are the larger goals of this effort?

Teitelbaum: Not to be exhaustive, but to use some selected applications to show the right way to think about measuring and meeting QoS needs.

Munro: From what I hear from industry, there's lots of research on QoS for e-commerce -- anything where there's an interaction with customers.

Nielsen: To me an essential difference is realtime vs. non-realtime -- "better late than never" vs. the reverse. When using TCP for network audio, instantaneous changes in jitter are enough to cause audible problems. There are heuristics to deal with issues like this.

Teitelbaum: Right, this is why we want to get jitter-averse and loss-averse applications out of the same pool.

Nielsen: There are papers that show that when you have a solid stream of bits you get better performance -- there may be a way to take advantage of this.

Sadagic: Re Liakopoulos's issue, it's true that applications need to have knowledge of the particular network they're dealing with.

Boote: Isn't the idea to classify the things applications need, and come up with, for example, standard socket operations to deal with each?

Sadagic: Yes. One difference is between applications that send constantly and applications that send in bursts.

Teitelbaum: Using libraries to make it possible to implement application-level QoS is an attractive idea, but to really maximize app-level QoS, you need to know a lot about how the human brain processes the information provided by the specific application. Generic libraries for adaptation, FEC, etc. will only get you so far.

Richardson: But aren't there some things that might be common across many applications? Try to help applications that just view the network as a black box.

Shalunov: You're talking about congestion control for non-TCP applications. How to do it depends on the nature of the application -- for example, congestion-based vs. rate-based window resizing.

Richardson: I'm interested in how an application can learn what the rate and jitter were over a given time period.
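
As an aside to the minutes, the kind of generic data collection Richardson asks about can be sketched as follows. This is not something presented at the meeting: it borrows the smoothed interarrival jitter estimator used by RTP (RFC 3550) and pairs it with a simple average receive rate. It assumes each packet carries a sender timestamp; the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowStats:
    """Running receive-rate and jitter estimate for one packet flow."""

    jitter: float = 0.0                     # smoothed jitter, in seconds
    first_arrival: Optional[float] = None   # arrival time of the first packet
    last_arrival: Optional[float] = None    # arrival time of the latest packet
    bytes_after_first: int = 0              # payload bytes, excluding packet 1
    _last_transit: Optional[float] = None   # previous (arrival - send) value

    def observe(self, sent_at: float, received_at: float, size: int) -> None:
        """Record one packet given its send time, arrival time and size."""
        transit = received_at - sent_at
        if self._last_transit is not None:
            # A constant clock offset between hosts cancels in this difference.
            d = abs(transit - self._last_transit)
            # RTP-style smoothing with gain 1/16.
            self.jitter += (d - self.jitter) / 16.0
        self._last_transit = transit

        if self.first_arrival is None:
            self.first_arrival = received_at
        else:
            self.bytes_after_first += size
        self.last_arrival = received_at

    def rate_bps(self) -> float:
        """Average receive rate in bits per second over the observed span."""
        if self.first_arrival is None or self.last_arrival == self.first_arrival:
            return 0.0
        span = self.last_arrival - self.first_arrival
        return 8.0 * self.bytes_after_first / span
```

The sketch only gathers the numbers; what to do with them (lower the frame rate, add FEC, back off) is left to the application, which is the division of labor suggested in the exchange above.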

Shalunov: Of course an application can do this, but the best response will depend on what the application is trying to do.

Richardson: Generalize data collection, then let applications do what they want with it.

Shalunov: You have to be TCP-friendly -- otherwise you need to run on a different network.

Xxx: No, you just need a different queue.

Shalunov: Yes, if there's another queue where congestion-friendliness is appropriate. It might be fun to write a library that does equation-based congestion control.

Xxx: Are you looking at P2P?

Teitelbaum: P2P applications almost all use TCP.

Liakopoulos: Are there applications that ask the network "what is the best you can provide me?"

Teitelbaum: No, and we don't think they should. We spent two and a half years working on a service that does this, and we pretty much know how to engineer it, but it was almost impossible to deploy in Abilene, and completely impossible to convince people that it's worth deploying. Premium is possible but not desirable.

Shalunov: We already have a telephone network.

Teitelbaum: Right. If you want hard QoS, buy a circuit -- bundled ISDN or whatever.

Xxx: I'd say don't stick with the dumb and cheap Internet we know -- finding a way for companies to offer services they can charge money for is necessary for the Internet to move forward.

Teitelbaum: It's a rich space -- while ISPs "don't have a God-given right to make a fortune," it would be good if they made enough to expand capacity. That's a good segue to the campus bandwidth management discussion -- where there's been some discussion of pricing.

4. Campus Bandwidth Management

Teitelbaum: There's no charter for this area, but it is part of QoS. Campus links are congested, and there's not enough money to solve the problem by just buying bandwidth. The problem is compounded by P2P traffic being seen as outside the mission of universities. The Tempe meeting discussed middleboxes and various approaches to controlling abusive users.
A synthesis of ideas is starting to emerge from this -- Terry Gray suggested writing a book on it, modeled on Educom books from the 1980s -- an overview, case studies, and conclusions. We're still trying to figure out if we have the resources to make this happen. Less ambitious forms of this are also possibilities. There will, at least, be another session on bandwidth management at Joint Techs in Boulder this summer. The starting point for this work is to look at the bandwidth management strategies that people are actually deploying now.

Chown: How do middleboxes deal with Scavenger?

Shalunov: Usually not at all, usually much more complex -- but you could make one that did.

5. Signaling

Teitelbaum: Phil Chimento chaired the SIBBS working group -- Phil, can you justify its continued existence?

Chimento: It needs closure, then we'll put it to sleep. We basically need to integrate existing stuff into a single web document, then shut down the group.

Xxx: Are SIBBS and the Premium postmortem part of the same thing?

Teitelbaum: Yes. I want Phil's group to summarize what we learned in SIBBS in case we need to take Premium off the shelf in the future.

6. Geospatial

Creager: We want to partner with other groups, including the Internet2 distributed storage, health sciences and QoS working groups and the external OpenGIS group. Our focus is on how to make commercial applications network-cognizant; they haven't done anything with that. For health, this has uses in epidemiology. We're interested in distributed storage because we expect each group to have its own core data of interest distributed around the network. Some of the data can be very large and very process-intensive, like three-dimensional virtual-reality models you can fly through. Atmospheric, topological, and land-use data are going to get more heavily used as time goes by. Re QoS, Premium isn't happening and Scavenger is only tangentially useful.
Maps don't have much need for QoS, but interactive fly-through models have great QoS needs.

Boote: Locally or remotely rendered?

Creager: Locally.

Boote: Latency is more of an issue then.

Creager: Acquiring overhead data and being able to use it quickly -- that's what we're asked for and what will be driving this.

Boote: You're not going to have bus bandwidths on a single system fast enough to do this for several years. An eight-processor Onyx with four pipes can still only render hundreds of millions of polygons per second -- not nearly enough to do what you're talking about. But it's totally data-dependent.

Creager: But we may have a tera-scale grid, and the overhead data is highly parallelizable.

Boote: For both problems you've mentioned, you have more issues locally than on the network.

Creager: We will have an influence on industry, so we want to have our story straight before we go out to talk to them.

Shalunov: The typical case on Abilene is zero loss and zero jitter -- so what do you want?

Creager: I always look at the worst case, so I'm never disappointed. We want to be sure we don't break the rules and that we're telling the right story when we leave Internet2 to go talk to other geospatial organizations. The mass of data being accumulated at distributed sites is what's motivated us to form the working group.

Teitelbaum: Don't dismiss Scavenger entirely -- a main constituency is people who want to move a lot of data, but non-interactively. You might want to look at a way to use this, pushing pre-fetched data to the edges and doing processing there.

Creager: I anticipate a smart caching system emerging as we evolve -- I won't dismiss Scavenger though.

- FIN -