[DE]Glen Posted January 30, 2017

We have a saying at the office, "It's not my fault but it's my problem," and nowhere does this phrase get used more than in networking. If you've been playing Warframe for long enough you'll have seen some pretty epic battles we've fought to work around troublesome networking equipment -- like that time when we set up automatic proxy servers around the world to help people with difficult NAT setups. It's been quite an adventure and I'd like to take a moment to explain some of the changes we've made recently, what problems we encountered, what new problems we recently became aware of, and what we're going to do about them.

First let's go back to Christmas Eve: like the year before, our network was under attack and we were neglecting our families to try to keep the game online. Bourbon helped, but so did a new network that we'd been setting up. We weren't quite ready to deploy it but after 4 attacks in 24 hours we decided to risk it and flipped the switch. We watched anxiously that night and spent the better part of our holiday break watching for signs of trouble that never came -- the new network held!

The great thing about the new network is that it's faster: I could feel the interactions with our servers responding more quickly! I knew that on paper we'd cut a good amount of round-trip time off of most connections, especially for Europeans, but I was thrilled to notice that it actually felt better to me personally (this was particularly impressive during the heavy load of many, many Tenno playing on the holidays!).

Unfortunately not everyone was impressed. I began to hear a lot about the dreaded "Network Not Responding" (NNR) pop-up and began a lengthy investigation with many, many emails back and forth with our network partners.
Before I tell you what caused it let me explain what this warning means: whenever a request to our servers takes longer than 10 seconds to get a response we show the NNR spinner to warn you that there's a problem. It could mean that our servers are being attacked, the hamsters have fallen off their wheels, or it's time for you to go kick your modem -- in any event we thought it would be good to let you know that there was something up so you know why things are taking so long. The nice thing about NNR is that it only shows up for very specific services; you might be able to stay in your mission because the host for your squad might be unaffected. This gives you a chance to wait for the network to settle down before you extract, to make sure we can save your progress.

At least that was the idea until the new network started causing people to get this message for no reason and suddenly it wasn't particularly helpful any more. It took about 3 weeks of diagnostics, conference calls, and all kinds of weird stress-tests before we figured out what was wrong. We even set up cloud servers around the world running tests that emulated how the game connects to us but nothing we did could reproduce the problem. It took diligent and patient help from a number of players, especially Guides of the Lotus, to get us logs and telemetry from people who had the problem.

It turns out that the new network, apart from being faster, was much more generous with how long it would let the game stay connected. To reduce server load the old network had to enforce a limit on how long it would wait for clients to reuse an idle connection; since the new network is much, much bigger it can happily leave connections open so that when the game needs to talk to the server it can skip a bunch of work and just start talking right away. There's just one problem with that: some networks will rudely drop connections that they feel have been quiet for too long.
Maybe it's because they think you've crashed or maybe they're just overloaded, but either way, they would decide to violate the TCP protocol and just forget all about the connection without telling anyone. If the game went to talk to our servers on one of these "zombie" connections it would take a certain amount of time to realize that it was talking to a corpse before it buried it and reconnected; while it was waiting you'd see the annoying NNR pop-up.

The interesting thing about this behavior is that no amount of VPN-testing or cloud-hosted tests would enable us to reproduce it because the data centers that operate these services tend to have good networking equipment. Nobody at the office could reproduce the problem with their home networks either: the only information we had to work with was supplied by players!

It's not our fault, but it's our problem, and once we figured out what was going on it was easy to work around: we just cut the persistent connection idle timeout back to match the old network (because they can't drop your connection if you close it first!). After making the change we anxiously waited for logs from the volunteers who had helped us and were pleased to see that almost all of the new cases they had were from times when our servers were actually having problems (like on Friday we had a 20x spike in bandwidth on the minute Baro arrived but that's a story for another time!).

The trouble with asking for help with an issue like this is that even when you've fixed the main problem people keep sending you logs. In many cases we could correlate their NNR with their PC having trouble connecting to other services as well -- this is a sign of a local problem -- but it occurred to us that it might be something we could help with as well. On Friday we added some new stats to track how stable your router's NAT is but before I show you those numbers let's start with an updated picture of the networks Tenno use to connect to Warframe.
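As an aside on the idle-timeout workaround described above, here is a minimal Python sketch of the idea "close it before they can drop it". The post does not say on which side or with what value the timeout was changed, so everything here is an assumption for illustration: the class name `IdleAwareConnection`, the `max_idle` parameter, and the injectable `connect` and `clock` callables are all hypothetical and are not taken from Warframe's code.

```python
import time
from typing import Callable, Optional


class IdleAwareConnection:
    """Reuse a connection only while it has been idle for less than max_idle seconds.

    Hypothetical illustration: a connection that has sat idle for too long is
    assumed to be a possible "zombie" and is replaced instead of reused.
    """

    def __init__(self, connect: Callable[[], object], max_idle: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._connect = connect      # factory that opens a fresh connection
        self._max_idle = max_idle    # seconds of silence we are willing to trust
        self._clock = clock          # injectable so the logic can be tested
        self._conn: Optional[object] = None
        self._last_used = 0.0
        self.connects = 0            # how many times a fresh connection was opened

    def get(self) -> object:
        now = self._clock()
        if self._conn is not None and now - self._last_used >= self._max_idle:
            # Too quiet for too long: close it ourselves rather than risk
            # talking to a connection the network has silently forgotten.
            close = getattr(self._conn, "close", None)
            if callable(close):
                close()
            self._conn = None
        if self._conn is None:
            self._conn = self._connect()
            self.connects += 1
        self._last_used = now
        return self._conn
```

The trade-off is the one the post describes: a shorter idle limit means more reconnects (and more handshake work), but a request never has to wait on a connection that may already be dead.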
The following data was collected for PC connections on Saturday, January 28th (the day after Baro Ki'Teer arrived).

We've seen the number of players suffering with Strict NAT stay roughly steady over the last year and if anything, the number has increased slightly. We suspect that this is because our automatic proxy service allows people to play even when their network connection isn't as cooperative as we would like.

In terms of how Tenno networks automatically forward ports I was surprised to see that over half of the connections trust their routers to just do the right thing (or had to do it manually, we can't tell): I would have expected to see more cases support the configuration protocols but perhaps this is a sign of IPv6 adoption on the rise (carrier-grade NAT doesn't support those protocols).

And finally, the graph that really blew our minds: Roughly one in twenty Tenno are on a network that will change their public address or their NAT port-mapping without warning! Imagine if your mobile carrier changed your phone number while you were having a call! "Sorry but you're going to have to call them back and to make it a challenge you have no idea what number to dial!"

As we dug into the numbers we found even more horrifying statistics: when your network wants to ruin your day it doesn't just do it once, it does it all the time. If you're one of these unlucky Tenno your network port will change on average 140 times per day and your address on average 5 times!

Luckily computers are pretty good at automating things and we should be able to handle some of these scenarios automatically. As time permits we'll be working on automatic workarounds for this totally crazy behavior because even though it isn't our fault, it's our problem.
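To make the "port changes per day" and "address changes per day" figures above a little more concrete, here is a small Python sketch of how such changes could be counted from a series of observed public endpoints. The post does not describe how the stats are actually gathered, so this is only an assumption: the function name `count_mapping_changes` and the idea of sampling `(address, port)` pairs are hypothetical, and the addresses in the usage note come from the reserved documentation ranges.

```python
from typing import Iterable, Optional, Tuple

Endpoint = Tuple[str, int]  # (public address, public port) as seen from outside the NAT


def count_mapping_changes(samples: Iterable[Endpoint]) -> Tuple[int, int]:
    """Return (address_changes, port_changes) across consecutive samples.

    A sample whose address differs from the previous one counts as an address
    change only; a port change is counted when the address stays the same but
    the port differs.
    """
    address_changes = 0
    port_changes = 0
    previous: Optional[Endpoint] = None
    for address, port in samples:
        if previous is not None:
            if address != previous[0]:
                address_changes += 1
            elif port != previous[1]:
                port_changes += 1
        previous = (address, port)
    return address_changes, port_changes
```

For example, the samples `("203.0.113.5", 4950)`, `("203.0.113.5", 51000)`, `("198.51.100.7", 51000)` contain one port change followed by one address change.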
SpeedOfLightPuncher Posted January 30, 2017
definitely read all of that thanks for the info anyways
NovusNova Posted January 30, 2017
Thanks for the information Glen.
Plushy Posted January 30, 2017
We believe in you Glen-and-team!
Edited January 30, 2017 by Plushy
MagPrime Posted January 30, 2017
Neat
Gio21 Posted January 30, 2017
Nice read and work, keep going :)
TennoSimons Posted January 30, 2017
Thanks so much for this explanation. Much appreciated!
trunks013 Posted January 30, 2017
* Raises hand sadly * I'm in the "Roughly one in twenty Tenno" who get their public IP changed without notice, thanks "honest cable company" ^.^
(PSN)PrincessDark_ Posted January 30, 2017
Thank you for all the information Glen. I know me and a lot of Tenno are very grateful for all your hard work, especially when you shouldn't be working but you're there fighting back attacks <3. It's much appreciated dude.
Edited January 30, 2017 by (PS4)PrincessDark_
Buff00n Posted January 30, 2017
Thanks for the details. Interesting read!
16 minutes ago, [DE]Glen said:
"Roughly one in twenty Tenno are on a network that will change their public address or their NAT port-mapping without warning!"
Or a lot of tenno play using their mobile phone as a hot spot. Either way that's ... appalling.
(PSN)Elvenbane Posted January 30, 2017
I like pie charts. Thanks for taking the time to write this up Glen, very interesting.
(XBOX)x Varda x Posted January 30, 2017
I noticed my ISP (BT) used to change my IP address every few months as Google thought I was somewhere else! Very odd. Don't know if they're still doing it now...
xXDeadsinxX Posted January 30, 2017
Thanks for the information.
Praxxor Posted January 30, 2017
The CS student in me found this very informative. I can only imagine the hell you guys went through; part of the fun in working in this field hides behind a big wall of torture I guess, and that'd be a gross understatement.
ashadcaine Posted January 30, 2017
Guys, I have avoided Warframe for almost 4 years; in fact my last comment here was at the release of the Grustrag Three. I guess a lot of folks won't even remember those days. So as you can probably imagine I was completely blown away when I was finally placed in my Liset: the way it orbits the planet or ship of my last mission, I see it circling on my star map. The immersion of it all is light years away from the game I came to enjoy and eventually forsook all those years ago for lack of meaningful content.

I just wanted to go on record and thank the developers in particular for the cohesive storyline told by the quest chain of Stolen Dreams, Natah, The Second Dream and The War Within. It has been a much more fulfilling experience and I have naturally opened my wallet in appreciation since returning in November. I don't care for the dismal RNG on a lot of stuff, but for those few items that have several drop sources I am grateful since it allows me to change things up and still have a shot at the object I'm after, big plus. The fact that I've completely missed opportunities to get certain things that were either events or vaulted, without buying them off someone else or waiting for them to cycle into Ki'Teer's stock or some future unknown event, is... more than a little disappointing... but I bide my time.

All in all very happy I came back. Game looks fanfreakin tastic. Thanks Devs.
Omnipower Posted January 30, 2017
Even though I'm not 100% sure I'm understanding this properly, that last part with the cellphone example scares me. So you're telling me that my network/modem/router can be screwing me over with something so simple?! Thanks for the info though, I really wish I knew more about networking to fix it myself x.x
Tenarsha Posted January 30, 2017
Excellent article, thanks!
SonicSonedit Posted January 31, 2017
6 hours ago, [DE]Glen said:
"Luckily computers are pretty good at automating things and we should be able to handle some of these scenarios automatically. As time permits we'll be working on automatic workarounds for this totally crazy behavior because even though it isn't our fault, it's our problem."
The problem is that you use almost purely UDP. If you step away from this dogma just a little bit and, say, open 1 TCP keep-alive connection, you can watch its status - there are a few mechanisms that allow you to know when a TCP endpoint changes its address. You can also send more important data via a TCP tunnel, so there would be less desync with the server. Remember when a reward appears in your inventory a few minutes late? Yeah, those ones.
Edited January 31, 2017 by SonicSonedit
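For readers unfamiliar with the TCP keep-alive mechanism mentioned in the comment above, here is a minimal Python sketch of turning it on for a socket. This is not a claim about how Warframe works or will work; the helper name `enable_keepalive` and the timing values are hypothetical. `SO_KEEPALIVE` is a standard socket option, while `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are only exposed on some platforms, so the sketch applies them only where they exist.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 30,
                     interval: int = 10, probes: int = 3) -> None:
    """Ask the OS to probe an idle TCP connection so a dead peer is noticed.

    idle:     seconds of silence before the first probe (hypothetical value)
    interval: seconds between probes (hypothetical value)
    probes:   unanswered probes before the connection is reported dead
    """
    # Standard option: switch keep-alive probing on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Platform-specific tuning knobs; skip any the platform does not offer.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            # The constant exists but the running OS refuses it; keep defaults.
            pass
```

With probing enabled, a connection whose peer or NAT mapping has silently vanished eventually surfaces as an error on the socket instead of hanging until the next request.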
Valthryn Posted January 31, 2017
Fantastic work. Thanks for the update. c:
Evanescent Posted January 31, 2017
Hoho, always excited for more network optimizations.
Lawmonark Posted January 31, 2017
11 hours ago, [DE]Glen said: -snip quote
This is great news. I won't bump a support ticket I put in... but I suspected something like this: any time I left a hosted group, or hosted and then disbanded, the game would just "forget" my ports and UPnP. Glad you guys are working on this.
Edit: One question I have. Why does Warframe have UPnP and NAT-PMP settings? This is the only game I have seen... use these settings. Do console players also get these options? I know by default consoles connect via UPnP and NAT (NOT NAT-PMP).
Edited January 31, 2017 by Krhymez
Bibliothekar Posted January 31, 2017
16 hours ago, [DE]Glen said:
"Roughly one in twenty Tenno are on a network that will change their public address or their NAT port-mapping without warning!"
Question to Legal: Could you disclose a list of ISPs that do that so we (as their customers) could kick them in the bollocks? Or would that only get you into a lot of trouble?
[DE]Glen Posted January 31, 2017 (Author)
9 hours ago, Krhymez said:
"Edit: One question i have. Why does Warframe have upnp and Nat-pmp Settings? This is the only game i have seen... use these settings. Do console players also get these option? I know by default console connect via upnp and Nat. (NOT nat-pmp)"
You mean why is Warframe awesome? I'm sure I can find a gif for that. And that's two questions.
Lawmonark Posted January 31, 2017
30 minutes ago, [DE]Glen said:
"You mean why is Warframe awesome? I'm sure I can find a gif for that. And that's two questions."
Well, I did start with one question in mind. But I would like to know: are those same options on console? While troubleshooting this issue, I found a few threads about these settings. There was one post where it was suggested to disable both if you port forward. Seems it was a post by you:
From trying to "fix" issues I was having... I found that these settings were the issue. Even a few people in game that I have spoken to have or had the same issues. In your statistics NAT-PMP was used very little. If that is the case, why is it enabled by default? Does it cause any issues to have it enabled? I find that it does.
(PSN)HyperInfestation Posted January 31, 2017
Looking good boss. We appreciate all the hard work.