A new working paper by Diane Coyle and Annabel Manley aims to contribute to the ‘grand challenge’ of understanding the value of data to our society.
Datasets, and the inferences made from them, are generating an increasing amount of value in modern economies. However, this value is typically not well captured in GDP, and more generally, the absence of markets for data assets means there is no easy way to measure the value of data. Yet given the potential value that can be created by investing in data and making it available, this gap in measurement could lead to underinvestment in data, or to too little access to it.
Data has certain economic characteristics that make market-based valuation methods insufficient for understanding its true potential value to society.
First is its non-rival nature, in that one person or company’s use of a dataset does not affect whether another person or company can also use it.
Second, datasets often involve externalities. For example, information externalities mean that adding one data point can increase the value of all the other data points in the dataset. Conversely, a loss of privacy would be a negative externality. For the same reason, the potential to link two datasets complicates valuation, because the combined dataset may be worth more than the sum of its parts. These characteristics mean that private markets will not deliver the socially efficient level of data availability, and that market prices will not reflect social value.
The experiment
In our new working paper we test one potential method of determining the social value of a dataset: discrete choice analysis.
Discrete choice analysis is a type of ‘contingent valuation’ method used to elicit individuals’ willingness to pay, which is used to measure consumer surplus. The method we tested is frequently used in marketing research to set pricing strategies, so a number of software tools will automate the survey design and analysis (we used conjoint.ly). More recently, contingent valuation methods have also been used to value ‘free’ digital goods[1][2][3], and in a pilot study by the ONS to value its own datasets.
For this method, the ‘good’ in question (in this case a dataset) is split into its component characteristics. By combining different levels of these characteristics with different prices, and asking respondents which combination they prefer, it is possible to work out which characteristics they value most and how much they would be willing to pay for them. To capture the social value of the dataset, given that most people are unlikely to use it directly, we framed the price as the amount an individual would be willing to pay to ensure open access to the dataset, rather than for their own personal access.
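To make the last step concrete: in a standard conditional logit model of these choices, each level of a characteristic gets a utility weight (a ‘part-worth’) and price gets a negative coefficient, and willingness to pay for a level is the ratio of the two. The Python below is a minimal sketch under that assumption. The level names and coefficient values are hypothetical placeholders, not estimates from our survey, and conjoint.ly’s own estimation may differ.

```python
# Minimal sketch: converting conditional-logit coefficients into willingness
# to pay (WTP). All numbers are HYPOTHETICAL placeholders, not survey results.

# Hypothetical part-worth utilities for the levels of one characteristic,
# measured relative to a baseline level whose utility is normalised to zero.
level_utilities = {
    "baseline": 0.0,
    "level_2": 0.6,
    "level_3": 1.1,
}

# Hypothetical price coefficient: change in utility per extra dollar paid.
# It should be negative, since a higher price makes an option less attractive.
price_coefficient = -0.02


def willingness_to_pay(part_worth: float, price_coef: float) -> float:
    """Dollar WTP for moving from the baseline level to a given level.

    Dividing the utility gain by minus the price coefficient converts
    utility units into dollars (a marginal rate of substitution).
    """
    if price_coef >= 0:
        raise ValueError("the price coefficient should be negative")
    return part_worth / -price_coef


for level, utility in level_utilities.items():
    print(f"{level}: ${willingness_to_pay(utility, price_coefficient):.2f}")
```

With these made-up numbers, a part-worth of 0.6 utility units against a price coefficient of -0.02 per dollar translates into a willingness to pay of about $30.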
For our proof-of-concept experiment, we chose a well-known dataset and tested a few of its characteristics. We used the World Development Indicators (WDI) as our example dataset, and varied its timeliness (how frequently the data is updated), interoperability (how easy it is to download and use) and granularity (how detailed its coverage is). The survey included three levels of each characteristic and three price levels.
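With three levels for each of the three characteristics and three price levels, there are 3 × 3 × 3 × 3 = 81 possible bundles. The sketch below enumerates that full design in Python; the level labels are hypothetical placeholders for the actual wording shown in Figure 1. Since each respondent answered only eight questions, each person saw only a subset of these bundles rather than all 81.

```python
from itertools import product

# HYPOTHETICAL level labels; the actual wording of each level is shown in
# Figure 1 of the survey design and is not reproduced here.
ATTRIBUTES: dict[str, list[str]] = {
    "timeliness": ["low", "medium", "high"],
    "interoperability": ["low", "medium", "high"],
    "granularity": ["low", "medium", "high"],
    "price": ["price_1", "price_2", "price_3"],
}


def full_factorial(attributes: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return every combination of one level per attribute as a bundle."""
    names = list(attributes)
    return [dict(zip(names, combo)) for combo in product(*attributes.values())]


bundles = full_factorial(ATTRIBUTES)
print(len(bundles))  # 81, i.e. 3 ** 4
print(bundles[0])
```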
For the survey, these levels are combined into different ‘bundles’. Each respondent is asked which bundle they prefer, and then whether they would prefer it to having no open access to the WDI at all. Figure 1 below shows an example of the survey design, along with all the levels we included for each characteristic.
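One way to model this two-step question is sketched below, assuming a multinomial logit for the choice between bundles and a binary logit for the follow-up comparison against no open access, whose utility is normalised to zero. The utilities, and the assumption of three bundles per question, are hypothetical; the sketch illustrates the structure of the question rather than the model conjoint.ly actually fits.

```python
import math

# Minimal sketch of the two-step question, under ASSUMED models:
# step 1 is a multinomial logit choice among the bundles shown;
# step 2 is a binary logit comparing the chosen bundle with
# "no open access", whose utility is normalised to zero.


def bundle_choice_probabilities(utilities: dict[str, float]) -> dict[str, float]:
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
    top = max(utilities.values())  # subtract the max for numerical stability
    weights = {name: math.exp(v - top) for name, v in utilities.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def prob_prefers_to_no_access(utility: float) -> float:
    """Binary logit against an opt-out with utility 0: 1 / (1 + exp(-V))."""
    if utility >= 0:
        return 1.0 / (1.0 + math.exp(-utility))
    e = math.exp(utility)  # equivalent form that avoids overflow for large negative V
    return e / (1.0 + e)


# HYPOTHETICAL utilities for three bundles shown in one question.
question = {"bundle_A": 0.8, "bundle_B": 0.3, "bundle_C": -0.2}

first_step = bundle_choice_probabilities(question)
for name, p in first_step.items():
    second = prob_prefers_to_no_access(question[name])
    print(f"P(choose {name}) = {p:.3f}; P(prefer it to no open access) = {second:.3f}")
```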
Our sample consisted of 401 members of the American Economic Association (AEA), all of whom had agreed to receive surveys through a service run by the AEA. Each answered eight of these multiple-choice questions, which provided enough information to rank the characteristics by their importance in determining value, and to estimate willingness to pay for each level of each characteristic.
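A common convention in conjoint analysis ranks characteristics by ‘relative importance’: the range of each characteristic’s part-worth utilities, divided by the sum of those ranges. The Python sketch below applies that convention to hypothetical part-worths (not our results). It compares only the three dataset characteristics, and we are not asserting that this is exactly how conjoint.ly computes its ranking.

```python
# HYPOTHETICAL part-worth utilities (not our survey results), with the lowest
# level of each characteristic normalised to zero.
part_worths: dict[str, dict[str, float]] = {
    "timeliness": {"low": 0.0, "medium": 0.4, "high": 0.9},
    "interoperability": {"low": 0.0, "medium": 0.7, "high": 1.0},
    "granularity": {"low": 0.0, "medium": 0.2, "high": 0.3},
}


def relative_importance(pw: dict[str, dict[str, float]]) -> dict[str, float]:
    """Share of the total part-worth range attributable to each characteristic."""
    ranges = {attr: max(lv.values()) - min(lv.values()) for attr, lv in pw.items()}
    total = sum(ranges.values())
    if total == 0:
        raise ValueError("all part-worth ranges are zero")
    return {attr: r / total for attr, r in ranges.items()}


importance = relative_importance(part_worths)
for attr, share in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{attr}: {share:.0%}")
```

The range is used because it measures how much switching between the worst and best level of a characteristic can move a respondent’s overall utility.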
Conclusions
The method has some potential uses, particularly for datasets consisting of aggregated data (rather than personalised or individual-level data). Our experiment also revealed two key limitations, mainly concerning survey design. First, participants often found the survey confusing, even though the sample was highly data-literate and only three characteristics were varied. Future implementations should therefore include both screening questions to check understanding and an example product, to make sure respondents are clear about what they are valuing. Second, there is no accepted ‘standard’ set of characteristics on which to value a dataset, and many characteristics could affect its value. Two-stage surveys, in which the first stage identifies the most important characteristics, may help to address this.
However, these limitations are not insurmountable, and further experiments would help to standardise the survey design process. In particular, this method would work very well for bodies considering a ‘freemium’ business model, in which a ‘standard’ product is freely available while a ‘premium’ data product is offered for a fee. It would also work well for public bodies deciding which forms of their data to make openly available, and estimating the social return to their data investments. Ultimately, in many settings the wedge between private and social value means there seem to be few alternatives to discrete choice or other contingent valuation methods for understanding the full social value of data.
[1] The Welfare Effects of Social Media (Allcott et al 2020)
[2] Measuring the Impact of Free Goods on Real Household Consumption (Brynjolfsson et al 2020)
[3] Free goods and economic welfare (Coyle and Nguyen 2020)
We are grateful for funding for this work from the Omidyar Network.
The views and opinions expressed in this post are those of the author(s) and not necessarily those of the Bennett Institute for Public Policy.