Turning Data into a Commodity

Imagine you could do the seemingly impossible and poll every eligible person in the U.S. about their voting preference in the upcoming elections. What would that information be worth?

In this instance, it is worth considering the question in terms of economic value. That value would be weighed against the survey's cost, which sets the bare minimum: collecting, formatting, and storing the polling data. Consequently, a buyer interested in the data would be rationally skeptical and would wish to perform due diligence, or to have a trusted third party conduct and sign off on an audit.

Due diligence would include sampling the data to build confidence that the voting population was indeed covered, and drilling down into that sample to assess the accuracy of each questionnaire. What incentive does anyone have to fill out a questionnaire? The point is that the higher the required level of confidence in the data, the more the cost of verification trends toward the original cost of the survey. The result is inefficiency through duplicated effort.

Our economic and legal structures suggest that if the original surveyor represents an institution, the institution will deem the data valuable and represent it as such. And if the institution is reputable, anyone relying on it, like a buyer, would also consider the data valuable. This is how it should work. In practice, however, the further someone is from the original surveyor, the less confidence they have in the data; even with due diligence, they will place less value on the data than the original surveying company would. Why?

The value decreases for two reasons. First, stakeholders outside the original surveyor cannot prove where the data came from or whether it was collected and represented appropriately. Second, stakeholders cannot see what incentive respondents had to contribute to the survey. Therefore, no matter how sophisticated the application, the process introduces subjectivity and noise, leading to high variance, loss of confidence, and decreasing value with each added degree of separation from the original surveyor.

Chaining the Data Together

The solution is to apply a bit of blockchain thinking. Let's assume a hundred companies would find the polling data valuable. Then assume a platform exists where these companies use a proof-of-work derivative to verify that the data captured from each U.S. citizen is valid. These companies, acting as nodes, could collect each poll as a qualified transaction and place it in a pool; the platform would programmatically create blocks once certain conditions or specifications are met. In essence, each block created would represent a data asset. So, instead of creating a block of transactions, the network of companies would create a chain of valuable assets - the block IS the asset. Once the protocol has created the asset, the network, through a consensus mechanism, approves the block and adds it to a workchain - not a blockchain - as the asset reflects the value of the work done.
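The pool-then-block flow described above can be sketched in a few lines. This is a minimal illustration, not the platform's actual implementation; the quorum condition, names, and data are all hypothetical:

```python
from dataclasses import dataclass

# Hypothetical specification: seal a block once a county's pool
# reaches a quorum of validated polls.
QUORUM = 2

@dataclass
class DataAsset:
    """The block IS the asset: a batch of qualified poll transactions."""
    county: str
    polls: list

pool = {}      # county -> validated polls awaiting block creation
assets = []    # sealed blocks, each representing a data asset

def submit_poll(county, response):
    """A node submits a validated poll as a qualified transaction."""
    pool.setdefault(county, []).append(response)
    if len(pool[county]) >= QUORUM:          # programmatic condition met
        assets.append(DataAsset(county, pool.pop(county)))

for county, resp in [("A", "X"), ("B", "Y"), ("A", "Y"), ("A", "X")]:
    submit_poll(county, resp)

print(len(assets), assets[0].county)  # -> 1 A
```

County A reaches the quorum and is sealed into an asset, while county B's lone poll waits in the pool - the block-creation condition, not any individual node, decides when an asset exists.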

The network of companies could predefine several types of blocks to represent different things, such as a demographic in a county; mining this type of block thus creates a data asset with clear representative and intrinsic value. In this example, the data asset would be valuable to a company, a municipality, or a politician looking to unlock value by contributing to or developing underrepresented communities.

Each block created in this network is signed off, hashed, and linked to the previous block, creating a chain of data assets with full transparency and provable provenance. The inherent benefit of this approach is not just the ability to create several different data assets from a given dataset but also the ability to combine data assets into new ones that are more valuable than the sum of their parts. Taking it a step further, combined blocks could represent the full spectrum of demographics in one county, or a cross-section of a specific demographic across all counties in a state.
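The hash-and-link step is what makes provenance provable: anyone holding the chain can recompute every link and detect tampering anywhere upstream. A minimal sketch, with illustrative data and function names:

```python
import hashlib
import json

def block_hash(records, prev_hash):
    """Deterministic hash binding a block's records to its predecessor."""
    payload = json.dumps({"records": records, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def verify_chain(chain):
    """Recompute each link; a tampered record breaks its block's hash."""
    prev = "0" * 64
    for block in chain:
        if block["hash"] != block_hash(block["records"], prev):
            return False
        prev = block["hash"]
    return True

# Build a two-block chain of demographic data assets (illustrative data).
chain, prev = [], "0" * 64
for records in ([{"county": "A", "age": "18-29"}],
                [{"county": "B", "age": "30-44"}]):
    h = block_hash(records, prev)
    chain.append({"records": records, "hash": h})
    prev = h

print(verify_chain(chain))            # True: provenance intact
chain[0]["records"][0]["county"] = "Z"
print(verify_chain(chain))            # False: tampering is detectable
```

This is why a buyer several degrees removed from the original surveyor no longer needs to repeat the due diligence: the chain itself carries the proof of origin.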

What about the supply side? What is in it for the people contributing to the poll? Assuming the voting population has access to the Internet (a big if), they would have rights attached to how their transactions within the assets (blocks) are used. In other words, a data market could open up where any value created by the assets would be shared, incentivizing the voting population to contribute and allowing their contributions to be used fully. Since data generally has more than one use but loses value over time, the shared value in this example could take the form of a cash flow that decreases over time.
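One way such a decreasing cash flow could be scheduled is a simple geometric decay of each contributor's payout as the data ages. The rates and numbers below are purely illustrative assumptions, not a proposed economic model:

```python
# Hypothetical revenue-sharing schedule: a contributor's payout from an
# asset decays each year as the underlying data loses value.
def contributor_cash_flow(asset_revenue_per_year, share, decay, years):
    """Yearly payout to a single poll contributor, decreasing over time."""
    return [round(asset_revenue_per_year * share * (1 - decay) ** t, 2)
            for t in range(years)]

# Assume the asset earns 100 units/year, the contributor's share is 1%,
# and the data's value halves every year.
payouts = contributor_cash_flow(asset_revenue_per_year=100.0,
                                share=0.01, decay=0.5, years=4)
print(payouts)  # -> [1.0, 0.5, 0.25, 0.12]
```

The exact schedule would be set by the network's protocol; the point is only that the payout stream is programmable and auditable alongside the asset itself.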

This proactive solution builds value by including due diligence before polling takes place. As a result, the data captured from the polling has value because it was "mined" through a rigorously predetermined process. The companies in the network would access this information and extract value for themselves, either directly by using the asset or indirectly by providing access to these assets outside the network. Either way, the voting public gets a share of that value. The differentiator is on the demand side: users engaging with the network would have higher confidence in the data assets' value because they would have access to the data along with proof of its origin and creation. That proof maintains the requisite confidence in the data as an asset, decreases variability, and mitigates the value erosion caused by degrees of separation.

Limitless Possibilities

The above is a hypothetical example of one application of such a platform, creating several types of data assets; the capability to create data assets, however, is limitless. Generally, the platform solution could be applied to any task that produces an agreed-upon valuable outcome captured in a data asset with price-discoverable utility, generating intrinsic asset value and valuable cash flows for the owner of and participants in that asset.

The real surprise is that the platform for creating these data assets already exists and is being used in established industries (pharmaceuticals, gaming, carbon credits, sustainable farming) to solve challenging problems and unlock new value through creative data asset design.

The company behind this platform is The Data Economics Company.