Last summer, foursquare's employee count had grown a bit beyond our office capacity (as we surged towards 20 employees) and we had people sitting in whatever open space we could find. We were split between floors, parked on folding tables, and crammed into couches and loveseats. In one of those seats, @anoopr was playing around with building a map showing interesting places, which we called “Explore." In the ensuing eight months, we went through several iterations and evolutions to arrive at the recommendations engine we launched two weeks ago as part of the foursquare 3.0 update. This blog post describes the process we used to develop this system, and some interesting tidbits we found along the way.
After that initial discussion, we quickly set up an API endpoint for Explore and started adding and tweaking features. On the server side, there was a web-based test view directly hitting the API with a list of places and a map with pins. We generalized the structure coming through, and started throwing more and more interesting data at it. We started with “popular" — which, as you will see in a moment, is not the easiest thing to define — and tried to put in some time sensitivity. We iterated, quickly. We deployed new functionality multiple times a day and had meetings with the product team every few days to tweak our trajectory and revisit various concepts.
With the results we were seeing, we could already sense that Explore was going to become something awesome. To get some more live feedback, we built a mobile-web version that sensed location and had touch controls. By using mobile-web, we could update the code instantly and tweak often without having users install a new app. Foursquare employees started carrying it around with them and testing out searches to find lunch and dinner.
Our mobile web test client
At this point, it was time to build in some personalization into the algorithm. Collaborative filtering is a common way of personalizing content based on your check-in history or the check-ins of “people like you." We used both, limiting to just people inside your friend graph for the latter. By then we had approximately 10M venues, which means that if we want to compute the similarity between every pair of venues, that is 100 trillion computations (which is a lot!). Luckily by this time we had a more sophisticated grasp on internal analytics (see @rathboma's analytics blog post), so we had a Hadoop cluster setup as part of our Hive deployment. On top of this we layered a package called Mahout, which was relatively easy to modify in order to compute custom similarity scores. By setting some constraints on which scores were significant, it was possible to build the resulting similarity matrix in less than an hour on a 40-machine cluster. By extrapolating how long it would take with a straightforward score calculator I wrote in Scala, computing the full matrix would have measured into days, or even weeks.
One of the hardest parts of building this was determining what the algorithm should do. For example, take the “cold start" — the case where you don't have enough data for a user to provide personalized results. If you want to order these to optimize the average rank in the list returned, you could simply sort by pure check-in count or a moving average of the same. With that method, you get some interesting places, but you don't find some of the hidden gems like Alidoro or Curry-Ya, foursquare staff favorites. Restaurants with larger capacity and longer hours tend to outweigh others, despite the fact that they may be more interesting to the average user. Also, ranking against the user-based feedback (star ratings, for example) from external sources, runs into other problems, like relative differences between rankings. For example, someone could rank both Per Se and Shake Shack five stars, but are they really equal in weight? Something needed to be done to account for all of these deficiencies.
While we're keeping the new “cold start" algorithm as part of our secret sauce, we wanted to give you a closer look into the data that fed the ranking. To get you started, see below for a graphic of 50 randomly sampled, highly-frequented venues from New York state plotted on two axes: number of unique users and average visits per user. You can think of the distance from the origin as the overall popularity. Notice how categories of venues tend to fall in different regions. As one would expect, offices are visited often by a relatively small number of users, and most restaurants are frequented by many people but tend to have a lower average. However, once you zoom into a specific type of venue, you can see the difference between “must see" places like Juniors and places like Meze Grill, a spot at which people tend to be repeat customers.
Although a big component of building a recommendation system is the math, another major component is telling users why they should go to a place. Here at foursquare, we believe very strongly in social encouragement; we tell you which of your friends went to a place and which places you have gone that people also visit. We also believe in the power of loyalty, which means we tell you how many times you have been to a place before. These reasons not only influence the ranking decisions we make but also supplement the experience, maybe sparking a conversation with your friend or remembering a place you haven't been in a while.
Reasons backing up recommendations
The algorithms were a challenge to build and required significant research into both common and esoteric statistical methods. This was trumped only by the massive engineering effort we put in place to power this system. Aside from the pre-computed similarities, all of the retrieval, ranking, and rendering is done online with an average server response time of less than 100ms. Generally, our largest latencies come from database queries; here are a few that were especially tricky:
- Querying an initial corpus to rank — We retrieve a set of venues within the radius you specify as candidate recommendations. For performance reasons, we pull back a fixed number of entries (pro tip: this means that the smaller the radius you set, the more venues we look at per unit area, potentially giving you better results). Given these constraints, we need to have a way of querying our venue database for the top N values sorted by some criteria within a specific geocircle. Most databases simply cannot handle this (at our scale), as it first requires the ability to index geographically and then further requires a way to compound the geo index. Even scanning each record within the circle is prohibitively expensive. There aren't many databases in the world that can do this, but after some tuning and custom logic, MongoDB is able to do it in less than 30ms for a 10mi radius in New York City (hundreds of thousands of venues) using some magic from the folks over at 10gen.
- Where have your friends been? — Pulling back the history of your friends' check-ins at all of the candidate venues is a potentially gargantuan task. Imagine you have 50 friends and an initial corpus of 250 venues. If each friend has checked in twice, that's a potential of 25,000 check-ins we could pull back. At our scale and query volume, this is impossible to process. To get around this, we built a special cache on top of our check-ins that we call the “intersection server" which essentially stores light weight aggregate data about the interaction between users and venues, including check-in count. Now we have an aggregate form, half as many potential records, and lighter weight documents, which are manageable to pull back given our performance constraints.
What's next? We will continue to make improvements to the engine, open up to more platforms, add new features to drive the algorithm, gather feedback, and generally make it more awesome. We've gotten great feedback so far and are glad Explore is the tool so many of you already use to explore your city!
Justin Moore (@injust)
P.S. If you love data as much as we do and want to bring Explore to the next level, foursquare is hiring!