Earlier this summer, we shipped an update to Foursquare on Android and iOS focused on giving each user a selection of “top picks” as soon as they open the app. Our goals with this new recommendation system were to improve the level of personalization and deliver fresh suggestions every day. Under the hood, this meant a rethinking of our previous recommendations flow.
Previous iterations of our recommendation flow relied on a fairly traditional search pipeline that ran exclusively online in our search and recommendations service.
- O(100s) of candidate venues are retrieved from a non-personalized store, such as our venues Elasticsearch index
- Personalized data, such as prior visit history, similar-venue visits, and friend/follower history, is retrieved and used to rank within these candidate venues.
- For the top-ranked venues we choose to show the user, short justification snippets are then generated to demonstrate why each venue matches the user's search
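Condensed into illustrative Scala, the three online stages look roughly like this (all names and data shapes here are hypothetical stand-ins, not our production code):

```scala
// Illustrative sketch of the online pipeline: retrieve -> rank -> justify.
case class Venue(id: Long, name: String, rating: Double)

// Stage 1: non-personalized retrieval, e.g. the top few hundred venues by rating.
def retrieveCandidates(index: Seq[Venue], limit: Int = 300): Seq[Venue] =
  index.sortBy(-_.rating).take(limit)

// Stage 2: re-rank candidates using a per-user affinity signal
// (visit history, similar venues, friend/follower activity).
def rank(candidates: Seq[Venue], affinity: Map[Long, Double]): Seq[Venue] =
  candidates.sortBy(v => -(v.rating + affinity.getOrElse(v.id, 0.0)))

// Stage 3: attach a short justification to each venue shown.
def justify(venue: Venue, visited: Set[Long]): String =
  if (visited.contains(venue.id)) s"You've been to ${venue.name} before"
  else s"${venue.name} is highly rated"
```

Note how personalization only enters at stage 2 — whatever stage 1 misses, no amount of re-ranking can recover, which is the limitation discussed below.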
This works well for intentful searches such as “pizza”, where a user is looking for something specific, but it is limiting for broader, query-less recommendations. For broad recommendations, the depth of personalization is capped by the size of the initial set of non-personalized candidate venues in the retrieval phase. Simply increasing the size of this candidate set online would be computationally expensive and would push request latencies past acceptable limits, so we looked toward making better use of offline computation.
To establish a larger pool of personalized candidate venues, we created an offline recommendation pipeline. Thanks to the technology we call Pilgrim, we have a detailed understanding of the neighborhoods and locations each user frequents. Given these locations, we generate a ranked, personalized list of recommendations via a set of Scalding jobs on our Hadoop cluster. These jobs are scheduled at regular intervals by the Luigi workflow manager, and their output is served online by HFileService, our immutable key-value data service built on the HFile format from HBase.
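The core join in that pipeline can be sketched in plain Scala (the real jobs run in Scalding over Hadoop; the names and shapes below are illustrative only): frequented areas go in, and a ranked per-user recommendation list comes out, keyed by user id and ready for an immutable key-value store.

```scala
// Hypothetical sketch of the offline flow.
case class Geohash(prefix: String)          // stand-in for a frequented area
case class Rec(venueId: Long, score: Double)

// Stand-in for a Scalding job: join each user's frequented areas against
// the venues in those areas, then keep a ranked top-N per user.
def buildRecommendations(
    userAreas: Map[Long, Seq[Geohash]],
    venuesByArea: Map[Geohash, Seq[Rec]],
    topN: Int = 300
): Map[Long, Seq[Rec]] =
  userAreas.map { case (userId, areas) =>
    val recs = areas.flatMap(venuesByArea.getOrElse(_, Nil))
    userId -> recs.sortBy(-_.score).take(topN)
  }
```

The output map is exactly the shape a key-value service like HFileService wants: look up a user id, get back a precomputed ranked list.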
The personalized sources of candidate venues come from a set of offline “fetchers”, also implemented in Scalding:
- Places that friends and people you follow have been to, left tips at, liked, or saved
- Venues similar to those you've liked in the past, both near and far
- Places that match your explicit tastes
For our more active users, there can be thousands of candidate venues produced by these fetchers, an order of magnitude more than our online approach. We can afford to consider such a large set since we're processing them offline, out of the critical request path back to the user.
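Each fetcher shares the same basic shape: it emits candidate venues per user from one personalized signal, and the pipeline unions the outputs, deduplicating venues surfaced by more than one fetcher. A minimal sketch, with hypothetical names:

```scala
// Each offline "fetcher" emits (user, venue, source) candidates
// from one personalized signal.
case class Candidate(userId: Long, venueId: Long, source: String)

trait Fetcher {
  def fetch(): Seq[Candidate]
}

// Union all fetcher outputs into a deduplicated candidate set per user.
def unionCandidates(fetchers: Seq[Fetcher]): Map[Long, Set[Long]] =
  fetchers
    .flatMap(_.fetch())
    .groupBy(_.userId)
    .map { case (userId, cs) => userId -> cs.map(_.venueId).toSet }
```

Keeping the `source` field around is useful downstream: knowing *why* a venue became a candidate feeds both scoring and justifications.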
Several non-personalized sources of candidate venues are also used:
- The highest rated venues of various common intents (dinner, nightlife, etc).
- Venues that are newly opened and trending in recent weeks
- Venues that are popular with out-of-town visitors (if the user is traveling)
- Venues that are vetted by expert sources like Eater, the Michelin Guide, etc.
The non-personalized sources not only provide a robust set of candidates for new users we don't yet know much about, but also surface novel, high-quality venues for existing users. While personalization should skew a user's recommendations toward places they'll find relevant and intriguing, we want to avoid creating a “personalization bubble” that misses great places just because the user has no personal connection to them.
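One simple way to guard against that bubble (a sketch of the idea, not our actual blending logic) is to guarantee that some non-personalized candidates always survive the merge, even for users with thousands of personalized ones:

```scala
// Hypothetical blend: carry at least `minEditorial` non-personalized
// candidates through, regardless of how many personalized ones exist.
def blend(personalized: Seq[Long], editorial: Seq[Long], minEditorial: Int): Seq[Long] = {
  val guaranteed = editorial.filterNot(personalized.contains).take(minEditorial)
  (personalized ++ guaranteed).distinct
}
```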
For each homepage request, the recommendation server logs which venues were shown, writing to HDFS via Kafka. These server-side logs are combined with client-side reporting of scroll depth, giving us a combined impressions log of which venues each user has actually seen, so we can avoid repeating recommendations. This impression information is used for both ranking and diversification.
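The simplest use of that log is a filter: drop candidates the user has already scrolled past. A sketch with illustrative names:

```scala
// A server-side impression joined with client-side scroll reporting:
// a venue only counts as "seen" if it actually scrolled into view.
case class Impression(userId: Long, venueId: Long, scrolledIntoView: Boolean)

def unseen(candidates: Seq[Long], impressions: Seq[Impression], userId: Long): Seq[Long] = {
  val seen = impressions
    .filter(i => i.userId == userId && i.scrolledIntoView)
    .map(_.venueId)
    .toSet
  candidates.filterNot(seen)
}
```

The `scrolledIntoView` check matters: a venue the server sent but the user never scrolled to shouldn't be penalized as a repeat.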
Each candidate venue is scored individually with a combination of signals, balancing factors such as venue novelty, distance, personalized taste match, similarity to favorites, and friend/follower activity. The top ~300 candidates per user are then written nightly to an HFile and served from there.
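The shape of that scoring step is a weighted combination followed by a sort-and-truncate. The weights and signal definitions below are made up for illustration — the real scorer is more involved:

```scala
// Hypothetical per-venue signals, each normalized to [0, 1].
case class Signals(novelty: Double, proximity: Double, tasteMatch: Double,
                   similarity: Double, friendActivity: Double)

// Illustrative linear combination; the actual weights are not these.
def score(s: Signals): Double =
  0.2 * s.novelty + 0.2 * s.proximity + 0.3 * s.tasteMatch +
  0.15 * s.similarity + 0.15 * s.friendActivity

// Score every candidate, sort descending, keep the top ~300 for the HFile.
def topCandidates(all: Map[Long, Signals], keep: Int = 300): Seq[Long] =
  all.toSeq.sortBy { case (_, s) => -score(s) }.map(_._1).take(keep)
```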
Some product requirements are difficult to fulfill solely by scoring each venue independently. For instance, it is undesirable to show too many venues of the same category, or only newly opened restaurants. To introduce diversity, before selecting the final set of venues to show, we enforce a set of constraints while maintaining the ranked order of the candidate venues.
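A greedy pass captures the idea: walk the ranked list in order, and skip any venue that would violate a constraint. Here is a sketch with a single, simplified constraint (at most N venues per category) — the real constraint set is richer, but the rank-preserving shape is the same:

```scala
// Rank-preserving diversification under a per-category cap (illustrative).
case class Ranked(venueId: Long, category: String)

def diversify(ranked: Seq[Ranked], maxPerCategory: Int, limit: Int): Seq[Ranked] = {
  val counts = scala.collection.mutable.Map.empty[String, Int].withDefaultValue(0)
  ranked.filter { r =>
    val ok = counts(r.category) < maxPerCategory
    if (ok) counts(r.category) += 1   // admit this venue, count its category
    ok
  }.take(limit)
}
```

Because we only ever skip (never reorder), a venue that survives the constraints appears in the same relative position the scorer gave it.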
Selecting a list of venues to show isn't the end of the process. Every venue that makes it onto a user's home screen comes with a brief explanation of what we believe is interesting to them about that venue. These “justifications” are the connective tissue between the sophisticated data-processing pipeline and the user's experience. Each explanation provides not only a touch of personality but also a glimpse into the wealth of data that powers these recommendations.
To accomplish this, the “justifications service” (as we call it internally) is responsible for assembling all the information we know about a venue and a user, combining it, ranking it, and generating a human-readable explanation of the single most meaningful and personalized reason the user may be interested in this place.
Broadly speaking, the process can be divided into four stages: Data fetching -> Module execution -> Mixing/Ranking -> Diversification/Selection. Each type of justification the system can produce is represented by an independent “module”. The module interface is a simple IO contract: it describes a set of input data and returns one or more justifications, each with a generated string and a score. Because each module is designed to run independently, once all the data is fetched, the set of eligible modules runs in parallel. When every module has had an opportunity to produce a justification, the candidates are merged and sorted. A final pass selects a single justification per venue, ensuring not only that the most relevant justifications are chosen but also that there is some diversity in the insights provided. All of this happens at runtime, on every request.
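Rendered as Scala, that IO contract and the mixing step might look like this (the names are illustrative — this is not the internal service's actual interface):

```scala
// A scored, human-readable justification for one venue.
case class Justification(venueId: Long, text: String, score: Double)

// The module contract: given the pre-fetched data, emit zero or more
// scored justifications.
trait JustificationModule {
  def run(data: Map[String, Any]): Seq[Justification]
}

// Mixing/ranking + selection: run every module, merge the candidates,
// and keep the single best justification per venue.
def bestPerVenue(modules: Seq[JustificationModule],
                 data: Map[String, Any]): Map[Long, Justification] =
  modules
    .flatMap(_.run(data))   // in production the modules run in parallel
    .groupBy(_.venueId)
    .map { case (venueId, js) => venueId -> js.maxBy(_.score) }
```

The contract keeps modules ignorant of each other, which is what makes the parallel-execution and final-selection stages possible.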
Here are just a few examples of how the finished product appears in the app:
With the product in the hands of users, we're working on learning from user clicks to improve the quality of our recommendations. We're also running live traffic experiments to test different improvements to our scorers, diversifiers and justifications. Finally, we're improving the online layer so recommendations can quickly update in response to activity in the app such as liking or saving a venue to a to-do list. If you're interested in working on search, recommendations or personalization in San Francisco or New York, check out foursquare.com/jobs!