In 2012, I was among the early data scientists in the New York City Mayor’s Office during the administration of then-Mayor Michael Bloomberg. It was a particularly exciting time in my career: there was electricity in the air, and a great many opportunities to effect change by reaching into unfathomable troves of untapped data. For a computational statistician, it was a dream come true.
While the pace of innovation in government was only starting to gain steam, the tech sector was already publishing advancements in data science almost daily. One paper in particular I still remember clearly to this day. Google’s X lab, the tech giant’s stealth experimental division, released a paper titled “Building high-level features using large scale unsupervised learning”. For those new to machine learning parlance, large-scale unsupervised learning is not a massive undirected study hall session in a high school, but rather the task of having an algorithm group and classify data without human intervention. To illustrate its capabilities, the authors let their algorithm loose on millions of unlabeled stills sampled from YouTube videos, where it learned to identify cats and thousands of other objects without human guidance. Upon reading the paper, I naturally fantasized about the possibilities of having access to a computing cluster with 16,000 cores and devising the elegant mathematics and code that would course through its electronic veins.
In the spirit of the technocratic-leaning Bloomberg administration, I decided to share these findings and their applicability to government operations at a Monday morning staff meeting. I imagined that the news of this new development would be met with fanfare. Instead, a few of my colleagues fixated on the mention of cats. Why cats? Why not potholes, cars, or airplanes? Cats and algorithms?
Clearly, my point was missed.
A wise policy advisor pulled me aside and suggested that perhaps I was the one who had missed the point: my colleagues’ reactions were a clear indication that data science and machine learning were still new concepts in public service. After all, human nature tends to favor the familiar (cats, in this case) over the abstract (algorithms). And while I enjoyed the support of a few policy advisors and had the freedom to write elegant code and launch sophisticated predictive initiatives, a bigger question loomed: could data science be a sustainable policy option in the long run? The path to effecting change might not be through building the models myself, but rather through building an organization’s technical capacity to the point that others are ready to rise to the challenge.
Data science capabilities have since grown throughout government and produced notable advancements. In city government, for example, agencies with traditional field operations have adopted predictive modeling, such as the NYC Fire Department’s FireCast model for risk-based inspections and the City of Chicago’s restaurant inspection forecasting. At the global level, a 2018 survey of 39 national statistical offices and international agencies found that 21 had machine learning projects underway, some exploratory and others in production. These efforts reflect just a few of the agencies at the forefront. For newcomers to data science, how might other government institutions approach capacity building?
Perhaps data science is not the thing on everyone’s mind right now [read: pandemic and politics]. But it is important not to lose sight of how data science and machine learning already play a silent but ever-growing role. Data products are the invisible hands that guide social media and e-commerce. Computer vision algorithms help drive cars and shape everyday life. Governments can and should find fair and equitable strategies to integrate this resource into their operations and, at the very least, understand how corporations and citizens wield it for their own interests.
Drawing from my experience in data science roles in two mayoral administrations in the largest city in the United States, as well as in two US presidential administrations, my work with the Bennett Institute for Public Policy will explore strategies for building data science capacity and illustrate the role of data products. Through a series of articles, I will investigate the issues surrounding data science capacity by sharing the experiences of public servants from around the globe as well as my own. My hope is to bridge the technical and social aspects of data science and offer practical recommendations. For executive audiences, I will highlight the common challenges that agencies face when launching data products and investigate strategies employed in a variety of fields, ranging from economics and national statistics to emergency services. For the aspiring data scientist, I will offer a candid perspective as a former insider. Being a data scientist in government requires the skill to “read the room” so that uncomfortable facts surfaced by research can be delivered in a politically palatable format. It requires humbling oneself to investigate hypotheses proposed by stakeholders even when this seems scientifically infeasible. It requires knowledge not only of mathematical theory, but of how to win support for a data science initiative and make it actionable.
About the author
Jeff Chen, Affiliated Researcher
Jeff Chen is a computational statistician who has led public-facing data science initiatives in over 40 fields. Currently, he is the Vice President of Data Science at Nordic Entertainment Group where he leads machine learning (ML) and data engineering for personalizing one of Europe’s leading ...