From:  Christopher Barry <>
Date:  07 Dec 2014 08:21:34 Hong Kong Time

Re: Reducing DNS latency


On Sat, 6 Dec 2014 21:47:27 +0000
"Vulimiri, Ashish"  wrote:

>Hi Patrick,
>Thanks for looking into this.
>> (actually the first link is timing out for me right now - so I have
>> only read the second)
>Sorry, looks like the university website went down.  This should be a
>more reliable link:
>> Quick question - you seem to be using a speculating tone in this
>> email thread about a drop in for getaddrinfo() but the paper
>> indicates this experiment was actually executed for a local Firefox
>> build.. is this how it was done? That seems like a reasonable
>> approach but I want to understand if we're speculating or talking
>> about results.
>For the experiments we used a proxy DNS server: a separate process
>that would listen for requests on localhost:proxy_port, replicate
>them, and send answers back.  When testing replication, we’d change
>the OS DNS server settings to point to the proxy, so that all DNS
>requests would go through the proxy.  When testing the unreplicated
>baseline, we’d revert to the ISP default DNS settings.
>We haven’t yet modified Firefox (or any other browser) to directly
>incorporate redundant DNS requests.
>> If that's the case, I am (pleasantly?) surprised you saw such an
>> impact in page load metrics. I'm not especially surprised that you
>> can do better on any particular query, but a lot of the time our
>> page load time isn't actually serialized on the dns lookup latency
>> because of the speculative queries we do. Maybe it's just a
>> manifestation of a huge number of sub-origins or maybe your test
>> methodology effectively bypassed that logic by not finding urls
>> organically. (that would mean telemetry of average browsing behavior
>> would show less of an impact than the lab study).. we've got some
>> additional code coming soon that will link subdomains of origins to
>> your history so that when you revisit an origin the subdomain dns
>> queries will be done in parallel with the origin lookup - I would
>> expect that would mitigate some of the gains you see in real life as
>> well.
>Yes, you’re right, our testing methodology was not very realistic: we
>simply kept repeatedly picking and loading a random website from the
>Alexa top-1000 list.  It is possible that prefetching would perform
>better over a more realistic browsing session.
>> There are two obvious scenarios you see improvement from - 1 is just
>> identifying a faster path, but the other is in having a parallel
>> query going when one encounters a drop and has to retry.. just a few
>> of those retries could seriously change your mean. Do you have data
>> to tease these things apart? Retries could conceivably also be
>> addressed with aggressive timers.
>I can’t speak to total page load times, but one thing I should be able
>to do is look at our raw DNS latency data to see how the improvement
>we’re seeing (in DNS lookup latency) would change if we were to ignore
>all failed requests (no answer before timeout).  This should cut out
>effect #2.  I’m (briefly) traveling soon and won’t have access to our
>archived data but I will figure this out over the next couple of days.
>> It's also concerning that it seems the sum of the data is all based
>> on the comparison of one particular DSL connection and one
>> particular (un-named?) ISP recursive resolver as the baseline. Do I
>> have that right? How do we tell if that's representative or
>> anecdotal? It would be really interesting to graph savings % against
>> rtt to the origin.
>Yes, our page load time numbers are only from two sites -- Firefox on
>an AT&T (Illinois) DSL link vs the ISP’s DNS server, and Chrome on an
>academic network (U of Utah’s) -- and you’re right that a larger scale
>evaluation would be necessary to argue these numbers are
>representative.  But I will note that we did look at raw DNS lookup
>latency a little more extensively, at 15 sites across North America.
>These numbers are in our other paper, the one I linked to above.
>> One of my concerns is that, while I wish it weren't true, there
>> really is more than 1 DNS root on the Internet and the host resolver
>> doesn't necessarily have insight into that - corporate split horizon
>> dns is a definite thing. So silently adding more resolvers to that
>> list will result in inconsistent views.
>Agreed, that is an issue.  One, more limited, implementation that
>would still be feasible would be to see if the OS has multiple DNS
>servers configured, and if yes only replicate queries to those
>servers.  Of course, this is quite different from the scenario we
>tested and would require careful evaluation to see if there’s any
>> also :biesi's concerns are fair to consider.. this is a place where
>> mozilla operating a distributed public service on behalf of its
>> clients might be a reasonable thing to consider if it showed
>> reproducible widespread gains (a mighty big if).. any use of
>> third-party servers (which would include mozilla operated services)
>> also comes with tracking and security concerns which might not be
>> surmountable. All interesting stuff to consider - certainly before
>> any code was integrated.
>> On Dec 6, 2014, at 12:56 PM, Christopher Barry
>>  wrote:
>> On Fri, 05 Dec 2014 22:49:55 +0000
>> Christian Biesinger  wrote:
>>> I think this is something we need to be really careful about,
>>> because this would effectively double (or triple, etc) the load on
>>> the DNS servers we use, I am not sure that the owners of those
>>> servers would be happy.
>> Are you intimating that Firefox has specific builtin DNS servers it
>> uses, independent of the host's configured resolver, or that you as a
>> user, say in a Corporate environment, use multiple DNS servers?
>To lay out the options here, in no particular order:
>1.  Mozilla operates a public-good service, a network of DNS servers
>that Firefox will use to reduce latency.
>2.  Convince a third party to lend their DNS infrastructure out.  Many
>of the DNS servers we used for our experiments publicly advertise
>their DNS service -- Google public DNS, OpenDNS, Level-3 (well, I
>suppose L3 doesn’t quite advertise) -- and it’s conceivable some of
>them could be convinced to support a service like this.  Although I’m
>aware this would be a can of worms.
>3.  Somehow learn a list of DNS servers the user would be willing to
>trust.  Say by checking to see if the OS already has multiple DNS
>servers configured. If I’m not mistaken, Comcast configures
>connections with two different DNS servers; and I’ve been on corporate
>networks with multiple servers configured.
>Advantage from #1 and #2: adding load to the DNS servers would not be
>a concern.
>Advantage from #3: no trust issues.

My strong opinion, and indeed it is the understood expectation of anyone
using any application that requires name resolution, is that all
applications always strictly obey the local resolver configuration of
the host running the application. Period. At no time should any
application bypass the local resolver configuration and use name
servers not explicitly specified by the user - for any reason,
regardless of possible performance benefit. If this is what FF is doing
now, I am extremely disappointed in that decision. That behavior
transcends bad design and approaches malware level.

I can understand it if you included a list of DNS servers *you* trust
for convenience, and distribute that as a text file with the app, but
modifying the system's DNS server list (e.g. adding the servers you
might recommend to the system without specific instructions to do so,
or using them directly from the application) is strictly a root or
administrator-only permissions level decision - never an application's.

This behavior should reside at the resolver and/or dhcp server
level, not in any application. Put your idea into a new kind of
resolver daemon that can select from configured dhcp or statically
configured name servers, and let people run that if they so choose.
This would benefit all name resolution on the system, not just from
within a specific application.
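
As a rough illustration of what I mean, here is a minimal Python sketch of
the core of such a daemon. It only ever uses name servers found in
resolv.conf-style text, sends the same query to each of them, and returns
whichever answer arrives first. The function names (parse_nameservers,
first_answer) and the caller-supplied query(server, name) callable are
hypothetical, made up for this example; a real daemon would send actual DNS
packets and would also need caching, retries, and so on.

```python
import concurrent.futures as cf


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop comments introduced by '#' or ';'.
        line = line.split("#", 1)[0].split(";", 1)[0]
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def first_answer(name, servers, query, timeout=2.0):
    """Ask every configured server for `name`; return (server, answer).

    `query` is a caller-supplied function query(server, name) that performs
    one lookup and raises on failure (an assumption of this sketch).
    """
    if not servers:
        raise ValueError("no name servers configured")
    pool = cf.ThreadPoolExecutor(max_workers=len(servers))
    try:
        futures = {pool.submit(query, s, name): s for s in servers}
        # as_completed yields futures in the order they finish.
        for fut in cf.as_completed(futures, timeout=timeout):
            if fut.exception() is None:
                return futures[fut], fut.result()
        raise LookupError("all configured name servers failed for " + name)
    finally:
        # Do not block on slower servers once we have an answer.
        pool.shutdown(wait=False)
```

Because the server list comes only from the host's own configuration, the
administrator stays in control of which resolvers are ever contacted, and
every application on the system gets the same behavior.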