New York City
Is the future of open data in NYC going to be decentralized storage?
The city's open data on demographics, air quality and legal notices was uploaded to Filecoin’s distributed network to evaluate its security and reliability.
This story was originally published by GCN.
Filecoin, an open source decentralized file storage network, is testing out its service by duplicating New York City's open data and hosting the information on its platform.
Protocol Labs, an open source research, development and deployment laboratory, and the Filecoin Foundation will store and maintain city data on demographics, air quality and legal notices on the network – at no cost for at least the next five years.
The Filecoin Foundation is an independent organization that supports the year-old Filecoin network and promotes the growth of the decentralized web, also known as dWeb or Web3. The idea is to reorganize the web so that most online data is no longer managed on private servers operated by major companies such as Amazon, Google and Microsoft, but is instead stored and managed using other computing resources, such as mobile devices.
“New York City continuously looks ahead to better understand how technology can help it deliver for New Yorkers,” NYC’s outgoing CTO John Paul Farmer wrote in an email to GCN. “That includes identifying efficiencies, improving governmental resilience, and safeguarding data – all of which are addressed by the Filecoin Foundation’s test of the decentralized web.”
The foundation identified the datasets and uploaded them to the network via Estuary, open source software that allows public data to be sent to the Filecoin network and retrieved from anywhere.
“The Filecoin Foundation and Protocol Labs downloaded the data and conducted the storage operation, making this a very light lift for the City of New York,” Farmer said.
The project came about late this year, and the foundation announced it Dec. 16. Farmer said the city will assess the effort in early 2022 to determine next steps, such as adding datasets. The data will be available at NYC Open Data’s website and also via Estuary, Filecoin and the InterPlanetary File System, a peer-to-peer network.
“One key decision that allowed the collaboration to move quickly is that we decided to test using open data – which is by definition already approved for broad availability and use – as a place to start,” he said. “A benefit of this test run is that it doesn’t require any behavior change by users of NYC Open Data. In fact, the experiment doesn’t remove data at all, but copies it and stores it on the decentralized web, allowing the City to A/B test impacts on key considerations such as cost, redundancy, completeness and data access latency. All of this makes the experiment an essentially no-risk opportunity for New York City to deepen its understanding of how the decentralized web can bring benefit to its daily operations.”
A/B, or split, testing compares two versions of something – in this case, storage systems – to measure which performs better.
“The NYC Open Data datasets stored on the Filecoin network are additive – the datasets are now also being stored on the decentralized web, in addition to however they were stored before,” Marta Belcher, head of policy at Protocol Labs and chairwoman of the foundation, wrote in an email to GCN. “This project adds a new, more secure, more robust way to store these important datasets. That’s really what Filecoin is designed to do – to store humanity’s most important information.”
Benefits of using the decentralized web include greater security and reliability, Belcher added, citing major outages such as the one Amazon Web Services experienced Dec. 8 that brought much of the internet to a halt.
“That’s the problem with having single points of failure,” she said. “We believe you can create a better version of the web if you combine the storage capacity and computing power on all of our individual devices into a supercomputer-like network, and store multiple copies of data across those devices. On this decentralized version of the internet, websites will stay up even if some nodes fail, and the availability of information is not dependent on any one server or company.”
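The redundancy argument can be illustrated with a little arithmetic. The sketch below is not Filecoin's actual reliability model; it assumes each copy sits on a separate node and that nodes fail independently with the same probability, which real networks only approximate. The function name and parameters are hypothetical.

```python
def availability(copies: int, node_failure_prob: float) -> float:
    """Chance that at least one copy of a file is reachable.

    Assumes (for illustration only) that every copy lives on a different
    node and that nodes fail independently with the same probability.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if not 0.0 <= node_failure_prob <= 1.0:
        raise ValueError("node_failure_prob must be between 0 and 1")
    # The data is unavailable only if every copy is down at the same time.
    return 1.0 - node_failure_prob ** copies


if __name__ == "__main__":
    # Compare a single server with a few replicated copies.
    for n in (1, 2, 3, 5):
        print(n, availability(n, 0.1))
```

Under those assumptions, each additional copy multiplies the chance of a total outage by the per-node failure probability, which is why spreading copies across many nodes removes the single point of failure.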
Filecoin uses cryptography and blockchain to secure data and has nodes worldwide, which Belcher said reduces the risk of outages.
The Filecoin network kicked off in October 2020. Users pay with cryptocurrency to store their files on storage miners, which are computers that must prove they have stored the files correctly over time, according to Filecoin’s website. Anyone who wants to store their own files or get paid (in cryptocurrency) for storing others’ files can join Filecoin, according to its website.
Storage on the Filecoin network costs 0.02% of what it costs to store the same data on Amazon S3, according to the foundation. More than 3,500 storage providers use the network to offer more than 13 exbibytes of storage; one exbibyte is roughly 1.15 million terabytes.
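To put those figures in familiar units, here is a minimal sketch of the arithmetic. It assumes the binary definition of an exbibyte (2^60 bytes) and the decimal definition of a terabyte (10^12 bytes); the helper names and the example bill are hypothetical, and the 0.02% figure is the foundation's claim, not an independently verified price.

```python
EXBIBYTE_BYTES = 2 ** 60   # binary prefix: 1 EiB
TERABYTE_BYTES = 10 ** 12  # decimal prefix: 1 TB


def exbibytes_to_terabytes(exbibytes: float) -> float:
    """Convert exbibytes to decimal terabytes."""
    return exbibytes * EXBIBYTE_BYTES / TERABYTE_BYTES


def cost_at_percent(reference_cost: float, percent: float) -> float:
    """Return `percent` percent of a reference cost (0.02 means 0.02%)."""
    return reference_cost * percent / 100.0


if __name__ == "__main__":
    # Network capacity cited in the article, expressed in terabytes.
    print(exbibytes_to_terabytes(13))
    # Hypothetical example: what 0.02% of a $100 storage bill comes to.
    print(cost_at_percent(100.0, 0.02))
```

On these definitions, 13 exbibytes works out to about 15 million terabytes, and 0.02% of a $100 bill is 2 cents.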
Correction: An earlier version of this story mischaracterized New York City's role in this project.