At Foursquare, tracking and improving server-side response times is a problem many engineers are familiar with. We collect a myriad of server-side timing metrics in Graphite and have automated alerts if server endpoints respond too slowly. However, one critical metric that can be harder to measure for any mobile application is user perceived latency: how long did the user feel like they waited for the application to start up, or for the next screen to load after they tapped a button? Steve Souders covers the perception of latency well in this short talk.
For a mobile application like Foursquare, user perceived latency is composed of several factors. In a typical flow, the client makes an HTTPS request to a server, the server generates the response, and the client receives the response, parses it, and renders it.
We instrumented Foursquare clients to report basic timing metrics in an effort to understand user perceived latency for the home screen. Periodically, the client batches and reports these measured intervals to a server endpoint, which then logs the data into Kafka. For example, one metric the client reports is the delta between when the client initiated a request and when the first byte of the response is received. Another metric the client reports is simply how long the JSON parsing of the response took. On the server side, we also have Kafka logs of how long the server spent generating a response. By combining client-side measured timings with server-side measured timings using Hive, we are able to sketch a rough timeline of user perceived latency with three key components: network transit, server-side time, and client-side parsing and rendering. Note that there are many additional complexities within these components, but this simple breakdown can be a useful starting point for further investigation.
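The join itself happens in Hive, but the arithmetic behind the breakdown is simple enough to sketch. The snippet below is illustrative only (the field names like `request_to_first_byte_ms` are hypothetical, not Foursquare's actual schema): whatever portion of the time-to-first-byte was not spent generating the response on the server is attributed to network transit.

```python
# Sketch of combining client- and server-reported timings for one request
# into a rough latency timeline. All field names are hypothetical.

def latency_breakdown(client_metrics, server_metrics):
    """Split user perceived latency into three rough components."""
    first_byte = client_metrics["request_to_first_byte_ms"]
    server = server_metrics["response_generation_ms"]
    # Time before the first byte that wasn't server work is attributed
    # to network transit (DNS, SSL handshake, transmission, queueing).
    network = first_byte - server
    client = client_metrics["parse_ms"] + client_metrics["render_ms"]
    return {"network_ms": network, "server_ms": server, "client_ms": client}

breakdown = latency_breakdown(
    {"request_to_first_byte_ms": 900, "parse_ms": 120, "render_ms": 300},
    {"response_generation_ms": 250},
)
print(breakdown)  # {'network_ms': 650, 'server_ms': 250, 'client_ms': 420}
```

In practice the client and server clocks are not comparable, which is why each side reports durations it measured itself rather than raw timestamps.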
The above bar chart shows a composite request timeline that is built using the median timing of each component from a sample of 14k Foursquare iPhone home screen requests. In this example, the user might wait nearly two seconds before the screen is rendered, and most of that was actually due to network and client time rather than server response time. Let's dive deeper into network and client time.
The next chart below splits out requests in Brazil versus requests in the US.
The state of wireless networks and the latency to the datacenter are major factors in network transit time. In the comparison above, the median Brazil request takes twice as long as one in the US. At Foursquare, all API traffic goes over SSL to protect user data. SSL is generally fine for a connection that has already been opened, but the initial handshake can be quite costly, as it typically requires two round trips beyond those of a typical HTTP connection. It's absolutely critical for a client to reuse the same SSL connection between requests, or this penalty will be paid each time. Working with a CDN to provide early SSL termination can also be incredibly beneficial in reducing the cost of your first request (often the most important one, since the user is definitely waiting for it to finish). For most connections, transmission time dominates, especially on non-LTE networks. To reduce the number of packets sent over the wire, we eliminated unnecessary or duplicated information in the markup and were able to cut our payload by more than 30%. It turns out, however, that reducing the amount of JSON markup also had a big impact on the time spent in the client.
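As a toy illustration of the deduplication idea (the data and savings here are made up, not our actual markup or our 30% figure): when the same entity appears in many list items, moving it into a lookup table referenced by id can shrink the payload substantially.

```python
import json

# Illustrative only: repeated venue details inlined per item vs. a
# shared lookup table referenced by id.
verbose = {
    "items": [
        {"venue": {"id": "v1", "name": "Clinton St. Baking Co.", "city": "New York"}, "score": 0.9},
        {"venue": {"id": "v1", "name": "Clinton St. Baking Co.", "city": "New York"}, "score": 0.7},
    ]
}
compact = {
    "venues": {"v1": {"name": "Clinton St. Baking Co.", "city": "New York"}},
    "items": [{"venueId": "v1", "score": 0.9}, {"venueId": "v1", "score": 0.7}],
}

before = len(json.dumps(verbose, separators=(",", ":")))
after = len(json.dumps(compact, separators=(",", ":")))
print(f"{before} -> {after} bytes ({1 - after / before:.0%} smaller)")
```

The savings compound with list length, since each additional item repeats the duplicated block in the verbose form but only adds a short reference in the compact one.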
The amount of time spent processing the request on the client is non-trivial and can vary wildly depending on the hardware. The difference in client time between the US and Brazil charts is likely due to the different mix of hardware devices in wide use in each market. For example, if we were to plot the median JSON parsing times across different iPhone hardware, we would see a massive difference from the older iPhone 4 to the latest iPhone 6. Although not as many users are on the older hardware, it's important to understand just how much impact needless JSON markup can have.
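The kind of measurement described above is easy to reproduce in miniature. This sketch (in Python rather than the clients' own languages) times repeated parses of a payload and reports the median, which is less sensitive to one-off outliers like GC pauses than the mean:

```python
import json
import statistics
import time

# Build a synthetic payload roughly shaped like a list-of-venues response.
payload = json.dumps(
    {"venues": [{"id": i, "name": f"venue-{i}"} for i in range(2000)]}
)

samples = []
for _ in range(25):
    start = time.perf_counter()
    json.loads(payload)
    samples.append((time.perf_counter() - start) * 1000.0)  # milliseconds

print(f"median parse time: {statistics.median(samples):.2f} ms")
```

Running the same measurement on payloads before and after trimming redundant markup is a quick way to estimate how much client parse time a slimmer response would save.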
In addition to JSON processing, another important topic for iOS devices is Core Data serialization. In our internal metrics, we found that serializing data into Core Data can be quite time consuming and is similarly more expensive for older hardware models. In the future, we are looking at ways to avoid unnecessary Core Data access.
A similar variation can be found across Android hardware as well. The chart below shows the median JSON parsing times of various Samsung devices. (Note that the Android timing is not directly comparable to the iPhone timing: the Android metric measures parsing the JSON markup into custom data structures, while the iPhone measurement parses straight into simple dictionaries.)
In our next engineering blog post, we will discuss some critical fixes that were made in Android JSON parsing.
Measurement is an important first step toward improving user perceived latency. As Amdahl's law suggests, improvements to the largest components of user perceived latency will of course have the largest user impact. In our case, the measurements pointed us toward a closer look at networking considerations and client processing time.
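To make the Amdahl's law point concrete (with illustrative numbers, not our measured ones): speeding up a component by a factor of s improves the whole by 1 / ((1 - p) + p / s), where p is that component's share of total latency.

```python
def overall_speedup(p, s):
    """Amdahl's law: overall speedup from accelerating a component.

    p -- the component's fraction of total latency (0..1)
    s -- the speedup factor applied to that component
    """
    return 1.0 / ((1.0 - p) + p / s)

# Halving a component that is 50% of total latency: ~1.33x overall.
print(round(overall_speedup(0.50, 2.0), 2))  # 1.33
# Halving a component that is only 15% of total: just ~1.08x.
print(round(overall_speedup(0.15, 2.0), 2))  # 1.08
```

This is why the composite timeline matters: halving server time helps far less than halving network time if the network is where most of the wait actually lives.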