Before diving into lambdas, trees, and joy functions, we'll start this overview of Foursquare's Snap-to-Place technology with background on unique aspects of Foursquare's proprietary dataset and the extensive research we've put into our home-grown algorithms. In this post, we'll explain the power of phone's-eye-view data and how we enhance our location insights via customized applications of machine learning methods.
The Power of Phone's-Eye-View Data
With our massive, long-established dataset of roughly 12 billion check-ins, we're able to frame location intelligence problems as supervised learning problems, with the goal of building a model of the world over time from labeled data points. Many others in our industry, by contrast, have to solve unsupervised learning problems, with the goal of discovering patterns in unlabeled data from which models can be built. As the images below show, the two problems look quite different.
The top image shows how the world looks to a mobile phone (unsupervised setting), with each black dot representing a visit, as viewed by the noisy location measurement mechanisms on a phone. The bottom image shows how that data appears when each point is tied to a user checking in to a venue (supervised setting).
That second image represents what we at Foursquare call phone's-eye-view data. Phones use a variety of black-box methods to estimate a user's latitude and longitude, including Wi-Fi signals and cell-tower triangulation. Occasionally, a mobile device will fire up its GPS sensor when the OS decides it's worth the battery cost. As a result, the error in mobile location data is often greater than the distance between two venues in an urban area, as the image above shows. The red, blue and green rectangles are the cartographic locations of three venues in New York City, and the correspondingly colored dots are labeled check-ins from users. Notice that almost none of the dots are where they're supposed to be (but they're also not random).
The Snap-to-Place algorithm at Foursquare, which we call VenueSearch, is a non-linear model (gradient-boosted decision trees) developed and trained on billions of labeled check-ins, which we use to snap billions of other, unlabeled visits to venues. A venue is any place in the world that we know about, from mom-and-pop restaurants to concert halls to boutique fitness studios. VenueSearch powers the APIs that many companies, such as Uber, Samsung and Twitter, use to identify and tag locations, as well as the visit data panel that drives Foursquare's enterprise business products.
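To make "snapping" concrete, here is a minimal sketch of the general idea, not of VenueSearch itself: score every (visit, candidate venue) pair with a small gradient-boosted ensemble trained on labeled check-ins, then snap the visit to the highest-scoring candidate. The two features (distance to the venue in meters and a popularity score), the training data, and the names `TinyGBDT` and `snap` are hypothetical. The boosting uses depth-1 trees (stumps) on log-loss, with leaves holding mean residuals rather than the second-order leaf values that libraries such as XGBoost compute.

```python
import math
from dataclasses import dataclass


@dataclass
class Stump:
    """A depth-1 regression tree: one feature, one threshold, two leaf values."""
    feature: int
    threshold: float
    left: float   # output when x[feature] <= threshold
    right: float  # output otherwise

    def predict(self, x):
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X, residuals):
    """Greedily pick the split that minimizes squared error on the residuals."""
    best_sse, best = None, None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [r for x, r in zip(X, residuals) if x[f] <= t]
            right = [r for x, r in zip(X, residuals) if x[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best_sse is None or sse < best_sse:
                best_sse, best = sse, Stump(f, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant
        m = sum(residuals) / len(residuals)
        best = Stump(0, float("inf"), m, m)
    return best


class TinyGBDT:
    """Gradient boosting on log-loss: each stump fits the negative gradient y - p."""

    def __init__(self, n_rounds=20, learning_rate=0.3):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.stumps = []

    def decision_function(self, x):
        return sum(self.learning_rate * s.predict(x) for s in self.stumps)

    def fit(self, X, y):
        self.stumps = []
        for _ in range(self.n_rounds):
            residuals = [yi - 1.0 / (1.0 + math.exp(-self.decision_function(x)))
                         for x, yi in zip(X, y)]
            self.stumps.append(fit_stump(X, residuals))
        return self


def snap(model, candidates):
    """Snap a visit to the candidate venue with the highest model score."""
    return max(candidates, key=lambda c: model.decision_function(c[1]))[0]


# Hypothetical (visit, candidate) features: (distance_m, popularity)
X = [(10, 2.0), (25, 1.0), (60, 3.0), (80, 0.5), (15, 0.5), (90, 2.5), (30, 3.0), (70, 1.0)]
y = [1, 1, 0, 0, 1, 0, 1, 0]  # 1 = the user checked in at this candidate
model = TinyGBDT().fit(X, y)
print(snap(model, [("coffee_shop", (20, 1.0)), ("houseware_store", (55, 2.0))]))
```

On this toy data the ensemble only learns that nearby candidates were the ones checked into; a production ranker would see far more signals per candidate.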
Even with these labels, real-world data gets messy, as in the example of a New York City block below. While the colors correspond correctly to venues, it's not obvious where exactly these venues are. So while our data gives us an edge over our competitors, it doesn't make the problem any less complex; instead, it gives us more interesting problems to solve.
Challenges for the Industry: Why is Indoor Retail so Hard?
There are several open challenges in the location intelligence space that remain largely unsolved. Dense locations are an obvious challenge, but sparse areas pose a different one, as do places where the third dimension (altitude) matters, not to mention places that are unpopular. For example, consider a houseware store on the fifth floor of a building that also contains generic office space. Let's say there's a popular coffee shop on the ground floor next to a new restaurant that just opened. A user of our core location technology, which we call Pilgrim, stops somewhere in the vicinity but does not check in. So where did they go? Were they at the houseware store, their office job, the coffee shop or the restaurant?
Indoor retail sits at the confluence of many challenges in the space: dense areas with relatively small venues that change ownership and are stacked vertically. At Foursquare, we have addressed this problem by framing it as an explore/exploit learning problem.
Framing Business Challenges as Explore/Exploit Opportunities
Each venue in the Foursquare database is represented as a probability distribution of visits across space and time, like the example of Madison Square Garden below. The more data we have for a given venue, the sharper that distribution becomes. A major differentiator for us is that these “place shapes” are often offset from where the venue sits cartographically, due to the inherent bias in phone's-eye-view location measurements.
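Under strong simplifying assumptions, a place shape can be sketched as a 2-D Gaussian fitted to a venue's labeled check-in fixes. The sketch below works in local east/north meters instead of latitude and longitude and ignores the time dimension; the Gaussian form, the coordinates, and the function names are all hypothetical. It shows two ideas from the paragraph: the fitted centroid can sit well away from the cartographic pin, and a new fix is better judged by its likelihood under the place shape than by its distance to the pin.

```python
import math
from statistics import fmean


def fit_place_shape(points):
    """Fit a 2-D Gaussian (centroid, covariance) to check-in fixes in local meters."""
    n = len(points)
    mx = fmean(p[0] for p in points)
    my = fmean(p[1] for p in points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def log_density(point, shape):
    """Log-likelihood of a fix under the place shape (bivariate normal)."""
    (mx, my), (sxx, sxy, syy) = shape
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 covariance
    m2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * m2 - math.log(2 * math.pi) - 0.5 * math.log(det)


# Hypothetical venue: map pin at (0, 0); check-ins cluster about 30 m east, 10 m south
pin = (0.0, 0.0)
checkins = [(28, -12), (33, -8), (31, -11), (27, -9), (35, -10), (26, -10)]
shape = fit_place_shape(checkins)
centroid = shape[0]
print("centroid offset from pin (m):", round(math.hypot(centroid[0] - pin[0], centroid[1] - pin[1]), 1))
print(log_density((29, -9), shape) > log_density(pin, shape))
```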
One initiative at Foursquare has been to go after less well-mapped venues by framing our challenges as explore/exploit problems, also known as “multi-armed bandit” problems. In a multi-armed bandit problem, a “gambler” faces many different “games” (the arms) to play. Each game has some probability of a win, but that probability has to be learned by playing. The gambler has to balance risk, opportunity cost and expected value.
Consider the earlier case of the coffee shop near the houseware store. Based on user check-ins, the coffee shop is likely the most popular place in the building. We have historical evidence that tells us our likelihood of success if we guess the coffee shop; imagine that 90% of the times we've guessed it, we were right. But this historical data alone cannot tell us much about the places we might have missed. To become a better gambler, we must put money into other games: we have to predict the houseware store sometimes in order to learn.
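One standard way to decide when to put money into other games is Thompson sampling, shown below as a swapped-in illustration rather than Foursquare's method. Each venue keeps counts of confirmed and rejected guesses; for every visit we draw a plausible confirmation rate for each venue from its Beta posterior and guess the venue with the highest draw, so a rarely tried venue still gets picked while its uncertainty is high. The true confirmation rates (0.9 for the coffee shop, 0.5 for the houseware store) and the simulated user feedback are hypothetical.

```python
import random


def choose_venue(stats, rng):
    """Thompson sampling: draw a rate per venue from Beta(confirmed + 1, rejected + 1)."""
    return max(stats, key=lambda v: rng.betavariate(stats[v][0] + 1, stats[v][1] + 1))


def simulate(true_rates, rounds, seed=0):
    """Simulate guesses and hypothetical user confirmations for one building."""
    rng = random.Random(seed)
    stats = {venue: [0, 0] for venue in true_rates}  # [confirmed, rejected]
    for _ in range(rounds):
        venue = choose_venue(stats, rng)
        if rng.random() < true_rates[venue]:  # simulated user feedback
            stats[venue][0] += 1
        else:
            stats[venue][1] += 1
    return stats


stats = simulate({"coffee_shop": 0.9, "houseware_store": 0.5}, rounds=2000)
for venue, (confirmed, rejected) in stats.items():
    print(venue, "guessed", confirmed + rejected, "times")
```

As evidence accumulates that the coffee shop is the better bet, the houseware store is guessed less often, but it is never ruled out by fiat.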
The framing of VenueSearch as an explore/exploit problem gives us levers to play with. We can decide when to make a safer guess and when we're playing with house money. One of our core services, the contextual ping (shown above), has a required accuracy level that we maintain, call it P. We want P% of our pings to be at the correct venue with no action from the user needed. Using that as our baseline, we have the option to be more or less aggressive based on how well we are hitting our standards. Accuracy is a currency we can spend in the pursuit of improving our map of the world.
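Treating accuracy as a currency suggests a simple feedback loop. The sketch below is a hypothetical epsilon-greedy controller, not a description of how the contextual ping works: it raises the share of exploratory guesses while the observed confirmation rate beats the target P, and lowers it when accuracy falls short. The value P = 0.8, the step size, the cap, and the class name `AccuracyBudget` are assumptions for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class AccuracyBudget:
    """Hypothetical controller that spends accuracy surplus on exploration."""
    target_p: float           # required share of pings at the correct venue (P)
    epsilon: float = 0.05     # current probability of an exploratory guess
    step: float = 0.01
    max_epsilon: float = 0.2
    confirmed: int = 0
    total: int = 0

    def record(self, was_confirmed: bool) -> None:
        self.total += 1
        self.confirmed += int(was_confirmed)

    def update(self) -> float:
        """Explore more while observed accuracy beats P; pull back when it doesn't."""
        if self.total == 0:
            return self.epsilon
        if self.confirmed / self.total > self.target_p:
            self.epsilon = min(self.max_epsilon, self.epsilon + self.step)
        else:
            self.epsilon = max(0.0, self.epsilon - self.step)
        return self.epsilon

    def choose(self, best_guess, alternatives, rng):
        """Epsilon-greedy: usually the safest venue, occasionally an alternative."""
        if alternatives and rng.random() < self.epsilon:
            return rng.choice(alternatives)
        return best_guess


budget = AccuracyBudget(target_p=0.8)
for outcome in [True] * 9 + [False]:  # 90% confirmed: above target, so explore more
    budget.record(outcome)
print(round(budget.update(), 2))
for outcome in [False] * 5:           # now 9/15 = 60%: below target, so pull back
    budget.record(outcome)
print(round(budget.update(), 2))
print(budget.choose("coffee_shop", ["houseware_store", "restaurant"], random.Random(1)))
```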
The chart above shows how we can trade small amounts of accuracy (measured as confirmation rate) for more confirmed visits. For a set of interesting commercial venues we experimented with, we were able to double our volume of confirmed visit predictions at those venues while taking about a 2.6% relative hit in confirmation rate. Observing a set of strategies as we turn our knobs gives us a Pareto frontier to choose from. As we will describe, understanding these Pareto trade-offs lets us express our preferences among them through utility functions; internally we call these joy functions, because they let us think about product trade-offs in terms of their direct benefits to user happiness and client happiness.
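Given a handful of observed strategies, the Pareto frontier is the set that no other strategy beats on both axes at once. The sketch below filters hypothetical strategies, each described by its relative volume of confirmed visit predictions and its confirmation rate; the names and numbers are made up for illustration and are not the experiment's results.

```python
def pareto_frontier(strategies):
    """Return strategies not dominated on (volume, confirmation rate), sorted by volume."""
    frontier = []
    for name, volume, rate in strategies:
        dominated = any(
            v >= volume and r >= rate and (v > volume or r > rate)
            for _, v, r in strategies
        )
        if not dominated:
            frontier.append((name, volume, rate))
    return sorted(frontier, key=lambda s: s[1])


# Hypothetical strategies: (name, relative volume of confirmed predictions, confirmation rate)
strategies = [
    ("baseline", 1.00, 0.900),
    ("mild", 1.40, 0.895),
    ("clumsy", 1.30, 0.880),  # dominated by "mild": less volume and a lower rate
    ("aggressive", 2.00, 0.877),
    ("reckless", 2.10, 0.700),
]
for name, volume, rate in pareto_frontier(strategies):
    print(f"{name}: volume x{volume:.2f}, confirmation rate {rate:.3f}")
```

The frontier only rules out bad options; choosing one point on it still requires stating how much extra volume is worth per point of confirmation rate, which is the role of a utility function.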
Getting the Algorithm to Compromise: Importance Weighting and Lambdas
Continuing on the theme of quantitative compromise, we can coax our model-generation algorithm to care about venue coverage at scale by rewarding it accordingly. Decision tree generation is a greedy process, and the resulting trees, while pseudo-random, optimize an objective function over the provided examples. As far as the algorithm knows, these examples are the only events in the world. Importance weighting lets us steer the model by applying weights that reward it more heavily for getting interesting places right (while still offering no reward for wrong answers).
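To see how per-example weights enter a greedy tree learner, the sketch below computes a weighted Gini impurity, a greedy threshold search, and a weighted leaf prediction in which each label earns the total weight of the examples it would get right and nothing for the ones it would get wrong. Gini impurity is a common split criterion used here for illustration; the post does not say which objective VenueSearch optimizes, and the weights and venue labels are hypothetical.

```python
from collections import defaultdict


def label_totals(labels, weights):
    """Sum example weights per label."""
    totals = defaultdict(float)
    for label, w in zip(labels, weights):
        totals[label] += w
    return totals


def weighted_gini(labels, weights):
    """Gini impurity where each example counts in proportion to its weight."""
    totals = label_totals(labels, weights)
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in totals.values())


def best_split(xs, labels, weights):
    """Greedy search for the threshold that minimizes weighted child impurity."""
    total = sum(weights)
    best_cost, best_t = None, None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [(lab, w) for x, lab, w in zip(xs, labels, weights) if x <= t]
        right = [(lab, w) for x, lab, w in zip(xs, labels, weights) if x > t]
        cost = 0.0
        for side in (left, right):
            side_labels = [lab for lab, _ in side]
            side_weights = [w for _, w in side]
            cost += sum(side_weights) / total * weighted_gini(side_labels, side_weights)
        if best_cost is None or cost < best_cost:
            best_cost, best_t = cost, t
    return best_t


def leaf_prediction(labels, weights):
    """Predict the label whose correct examples carry the most total weight."""
    totals = label_totals(labels, weights)
    return max(totals, key=totals.get)


leaf = ["coffee_shop"] * 3 + ["houseware_store"]
print(leaf_prediction(leaf, [1, 1, 1, 1]))   # unweighted: the majority wins
print(leaf_prediction(leaf, [1, 1, 1, 4]))   # upweighted rare venue flips a close call
wide = ["coffee_shop"] * 8 + ["houseware_store"]
print(leaf_prediction(wide, [1] * 8 + [4]))  # a lopsided leaf does not flip
print(best_split([1, 2, 3, 4], ["c", "c", "h", "h"], [1, 1, 1, 1]))
```

The two leaf examples echo the behavior reported later in this post: upweighting a less-visited venue changes the prediction only when the decision was already close.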
The side-by-side charts above show what this enables. Like many natural phenomena, user check-in behavior follows Zipf's Law (above right), under which a place's visit frequency falls off roughly in inverse proportion to its popularity rank, so unpopular places vastly outnumber popular ones. A naive sampling distribution is shown on the left. A textbook machine learning approach is popularity-biased: we observe popular places too often. By reweighting our examples, we can make our training set match Zipf's Law. The degree of aggressiveness in our reweighting is what we internally call the “lambda factor.” The name comes from lambda, a variable we slide between 0 and 1 that provides a continuous transform between the two distributions above, allowing us to smoothly tune the aggressiveness of our model.
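The post doesn't spell out the lambda transform, so here is one plausible version, stated as an assumption: mix the naive, popularity-biased sample distribution p with a Zipf target q as q_λ = (1 - λ)·p + λ·q, and give every example at venue v the importance weight q_λ(v) / p(v). At λ = 0 every weight is 1 (no change); at λ = 1 the weighted training set matches the Zipf target exactly. The Zipf exponent s = 1, the sample counts, and the function names are hypothetical, and venues are assumed to be listed in descending order of popularity.

```python
def zipf(n, s=1.0):
    """Zipf probabilities for ranks 1..n: proportional to 1 / rank**s."""
    raw = [1.0 / k ** s for k in range(1, n + 1)]
    z = sum(raw)
    return [r / z for r in raw]


def lambda_weights(sample_counts, lam, s=1.0):
    """Importance weights moving a popularity-biased sample toward a Zipf target.

    sample_counts: examples per venue, most popular venue first (all counts > 0).
    lam: 0 keeps the naive sample; 1 matches the Zipf target exactly.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must be between 0 and 1")
    total = sum(sample_counts)
    p = [c / total for c in sample_counts]  # naive sample distribution
    q = zipf(len(sample_counts), s)         # target distribution by rank
    mix = [(1 - lam) * pi + lam * qi for pi, qi in zip(p, q)]
    return [m / pi for m, pi in zip(mix, p)]


counts = [700, 200, 60, 30, 10]  # hypothetical, heavily popularity-biased sample
for lam in (0.0, 0.5, 1.0):
    print(lam, [round(w, 2) for w in lambda_weights(counts, lam)])
```

A linear mixture keeps every weight a simple linear function of λ, so sliding λ changes the aggressiveness smoothly; other transforms, such as raising q/p to the power λ, would serve the same purpose.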
Overall, as the chart above shows, we achieved a 12% increase in distinct venues detected and improved our performance at stores within commercial complexes, at a cost of only 0.5% of our accuracy. Effectively, the model leans toward newer venues, but only in cases where the decision is close. Some venue categories traditionally see less consistent foot traffic volume (e.g., mattress stores, jewelry stores), and this method improves our ability to detect visits for some of these categories by 50% or more.
Joy Functions and the Pursuit of Specific and Diverse Venue Data
As we continue to improve our pioneering Snap-to-Place technology, we will be focusing on a holistic view of the value of commercial visits to our clients and to ourselves, so that we can continue to generate novel information about our commercial and social world.
A subtle distinction that continues to shape our thinking is the difference between a user check-in action and a confirmed visit prediction. The likelihood of a user checking in and the likelihood of a prediction getting confirmed are not the same. For example, it is reasonable for a user to check in at JFK Airport and, at the same time, confirm our predicted visit to the Muji shop inside the terminal. These data points are different but both correct, because what users choose to broadcast socially biases how the data is expressed.
One of our approaches to balancing this is to wrap our visit predictions in linear utility functions, which we call “joy functions.” When we make a prediction speculatively, we aim to predict where the user was in real life; we also factor in how likely they are to agree with our prediction and how informative that data point will be if accurate. In this way, we learn only from true data points while still encouraging our users to respond to predictions that are of higher value and carry greater information density. The more specific, diverse and novel a visit is, the more we learn about our users and how they interact with the world. The correctness of a prediction doesn't capture all aspects of its value to our business and to client happiness; joy functions help us capture more of that value.
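A joy function can be sketched as a weighted sum of terms. In the hypothetical version below, a candidate venue's joy combines the model's probability that the user was there, the probability that the user would confirm the prediction, and the expected information of the data point if correct (measured here, as an assumption, in bits as -log2 of the venue's share of all visits). The weights, probabilities, and visit shares for the JFK example are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    venue: str
    p_correct: float    # model's probability that the user was really here
    p_confirm: float    # probability that the user would confirm this prediction
    visit_share: float  # venue's share of all visits; lower means more specific


def joy(c, w_correct=1.0, w_confirm=0.5, w_info=0.1):
    """Hypothetical joy function: a weighted sum of three terms."""
    info_bits = -math.log2(c.visit_share)  # rarer venues are more informative
    return (w_correct * c.p_correct
            + w_confirm * c.p_confirm
            + w_info * c.p_correct * info_bits)  # information only counts if correct


def pick(candidates, **weights):
    """Return the venue whose prediction has the highest joy."""
    return max(candidates, key=lambda c: joy(c, **weights)).venue


candidates = [
    Candidate("jfk_airport", p_correct=0.70, p_confirm=0.60, visit_share=0.50),
    Candidate("muji_jfk_terminal", p_correct=0.60, p_confirm=0.55, visit_share=0.02),
]
print(pick(candidates))              # the information term favors the specific venue
print(pick(candidates, w_info=0.0))  # without it, the safer airport guess wins
```

Because each term enters with its own weight, tuning the weights is a direct way to state how much confirmation risk a more informative prediction is worth.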
Snap-to-Place technology ties physical locations to meaningful venues and thus enables the connected view of the world we have at Foursquare. Our investment in, and commitment to, machine learning continue to drive our business and our belief that the connectedness and insights of our data are more powerful than its sheer volume.