Over the past decade, changes in retail usage have increasingly led to venue closures, from small music stores to bookshops. These trends have been attributed to multiple factors, including a shift towards e-commerce and changing spending preferences. Venue closure, however, is a complex issue that often results from many intertwined factors. To better understand and account for some of these factors, we built a machine learning model that predicted venue closure in ten cities around the world with 80% accuracy.
Social media provides a rich source of data to examine the patterns of its users through their posts, interactions, and movements. These datasets have created opportunities for research as their fine granularity can help build robust models with a complex understanding of user trends. In our research, we model the movements of individuals to venues in urban areas to predict whether a given venue will close down. This research could be used to inform policymakers and business owners. Further, it has the potential to impact licensing agreements by local authorities where an analysis of the likelihood of venue closure in an area could be considered as a factor in the process.
Machine learning is a powerful tool that has quickly risen in prominence for its ability to automatically identify patterns in data. Using venue demand and transportation data, we devised a number of metrics, which we then analyzed for their effectiveness as predictors of venue closure. Our first dataset came from Foursquare, a location recommendation platform; it included check-in details of anonymous users and represented venue demand over time. We additionally used transportation data from taxi trajectories, which detailed the history of pickup and drop-off points of thousands of anonymous users; these represented dynamics between different areas of a city.
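To make "venue demand over time" concrete, here is a minimal sketch of how check-ins could be aggregated into an hourly demand profile per venue. The record layout (a venue identifier paired with an ISO timestamp) and the function name `hourly_demand` are assumptions made for illustration; they are not the Foursquare data format or the pipeline used in our paper.

```python
from collections import defaultdict
from datetime import datetime


def hourly_demand(checkins):
    """Count check-ins per venue for each hour of the day (0-23).

    `checkins` is an iterable of (venue_id, iso_timestamp) pairs.
    This layout is a hypothetical simplification.
    """
    profiles = defaultdict(lambda: [0] * 24)
    for venue_id, timestamp in checkins:
        hour = datetime.fromisoformat(timestamp).hour
        profiles[venue_id][hour] += 1
    return dict(profiles)


# Toy check-ins, invented for this example.
checkins = [
    ("cafe", "2018-03-01T08:15:00"),
    ("cafe", "2018-03-01T08:40:00"),
    ("cafe", "2018-03-02T13:05:00"),
    ("bar", "2018-03-01T22:30:00"),
]
profiles = hourly_demand(checkins)
```

Each venue ends up with a 24-element list of counts, which is a convenient starting point for time-based metrics.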
The metrics we devised broadly fell into three categories: neighborhood profile, customer visit patterns, and business attributes. When looking at the neighborhood profile, we examined the area surrounding a venue, such as the diversity of different venue types as well as competition. Customer visit patterns represented popularity trends of the venue of interest. These metrics included the range of time during which a venue was popular and the similarity of a venue’s popularity times to those of its local competitors. Business attributes defined basic properties such as the price tier and venue category.
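As an illustration of the neighborhood profile, the sketch below computes two simple quantities for a venue: the diversity of nearby venue categories and the number of nearby competitors. It assumes Shannon entropy as the diversity measure, a 0.5 km radius, and that a competitor is any nearby venue of the same category. These are choices made for the example, not necessarily the definitions used in our paper, and the venue dictionaries are hypothetical.

```python
import math
from collections import Counter


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def neighborhood_profile(venue, others, radius_km=0.5):
    """Diversity (Shannon entropy of categories) and competitor count
    among the venues within `radius_km` of `venue`."""
    nearby = [
        v for v in others
        if haversine_km(venue["lat"], venue["lon"], v["lat"], v["lon"]) <= radius_km
    ]
    counts = Counter(v["category"] for v in nearby)
    total = sum(counts.values())
    diversity = 0.0
    if total:
        diversity = -sum((c / total) * math.log(c / total) for c in counts.values())
    return {
        "diversity": diversity,
        "competitors": counts[venue["category"]],
        "neighbors": total,
    }


# Toy venues, invented for this example.
venue = {"lat": 51.5000, "lon": -0.1200, "category": "cafe"}
others = [
    {"lat": 51.5005, "lon": -0.1200, "category": "cafe"},
    {"lat": 51.5010, "lon": -0.1210, "category": "bar"},
    {"lat": 51.5020, "lon": -0.1190, "category": "bookshop"},
    {"lat": 51.6000, "lon": -0.1200, "category": "cafe"},  # too far away
]
profile = neighborhood_profile(venue, others)
```

Higher entropy means the surrounding venues are spread over more categories; an entropy of zero means every neighbor belongs to the same category.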
These metrics enabled us to model how closure predictions differ between new and established venues, how these predictions vary across cities, and which metrics are the most significant predictors of closure. We were able to predict closure more accurately for established venues, which suggests that new venues are subject to greater variation in their causes of closure. When comparing multiple cities, we found that the types of metrics that were useful predictors varied from city to city. This suggests that the factors behind closure affect cities in different ways. However, consistently across almost all ten cities, we saw the following three indicators as significant predictors of a venue’s closure. The first, temporal popularity skew, represents the range of time during which a venue was popular. Our results suggest that venues which cater only to specific customer segments (e.g., lunchtime office workers) are more likely to close. The second, temporal misalignment of the neighborhood, measures the difference between a venue’s popularity times and those of its competitors in the neighborhood. This finding suggests that venues that are popular outside the typical hours of other venues in their area tend to survive longer. Lastly, the diversity of venue types represents the variation of venue categories in an area. Our results show that a decrease in diversity tends to increase the likelihood of closure, suggesting that venues in diverse neighborhoods, with multiple types of venue categories, tend to survive longer.
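The two temporal indicators can be illustrated with a small sketch that works on 24-hour popularity profiles (one count per hour of the day). Here, skew is assumed to be one minus the normalized entropy of the profile, and misalignment is assumed to be one minus the cosine similarity between a venue's profile and the average profile of its competitors. Both formulas and the function names are stand-ins chosen for the example; our paper may define these metrics differently.

```python
import math


def normalize(profile):
    """Scale a list of hourly counts so that it sums to 1."""
    total = sum(profile)
    return [x / total for x in profile] if total else [0.0] * len(profile)


def popularity_skew(profile):
    """0 when visits are spread evenly over all hours, 1 when they all
    fall in a single hour (assumed definition: 1 - normalized entropy)."""
    if sum(profile) == 0:
        return 0.0
    p = normalize(profile)
    entropy = -sum(x * math.log(x) for x in p if x > 0)
    return 1.0 - entropy / math.log(len(profile))


def misalignment(profile, competitor_profiles):
    """0 when a venue is busy at the same hours as its competitors, 1 when
    the busy hours do not overlap (assumed definition: 1 - cosine similarity)."""
    p = normalize(profile)
    comps = [normalize(c) for c in competitor_profiles]
    mean = [sum(col) / len(comps) for col in zip(*comps)]
    dot = sum(a * b for a, b in zip(p, mean))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in mean))
    return 1.0 - dot / norm if norm else 0.0


# Toy profiles, invented for this example.
all_day = [5] * 24
lunch_only = [0] * 24
lunch_only[12], lunch_only[13] = 30, 20
evening_only = [0] * 24
evening_only[20], evening_only[21] = 10, 10

skew_all_day = popularity_skew(all_day)
skew_lunch = popularity_skew(lunch_only)
same_hours = misalignment(lunch_only, [lunch_only])
different_hours = misalignment(evening_only, [lunch_only])
```

In this toy data the lunch-only venue has a much higher skew than the all-day venue, and the evening venue is fully misaligned with a lunch-only competitor.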
Our models present a novel approach to venue closure prediction, building metrics from multiple datasets that are more fine-grained than those in previous research in this space. It is important to recognize that our data, like any dataset, are biased in some ways. However, by using multiple datasets that target different user segments, we hope to mitigate this bias. Further, the consistency of our analysis across multiple cities gives us confidence in our results. Our methodology of combining venue demand and transportation data has tremendous potential to study user behavior and inform future urban planning decisions.
More details of our work can be found in our conference paper, published in October 2018 at the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp).
About the author
Krittika is a Computer Science PhD student and Gates scholar at the University of Cambridge. Her research harnesses traditional machine learning algorithms and network metrics to reveal properties of cities worldwide. This includes work on venue demand as well as closure predictions. She also collaborates with the United Nations Global Pulse Lab to model patterns of behavior after natural disasters and support future disaster relief efforts.