Public blockchains are not suited to function as a data storage provider. Since they are append-only ledgers, storing many large files would dramatically increase the size of the whole distributed ledger. That would force the network nodes to store huge databases, leaving the network in the hands of just a few providers. Likewise, the use of a public blockchain as a computing engine has its limitations. Smart contracts on a blockchain are executed in a decentralized way, i.e., every miner verifies the correct execution of the contract. Hence, to prevent the network from becoming paralysed, running computationally intensive programs is very expensive, for example on the Ethereum blockchain.
However, public blockchains can be used for a decentralized open market platform where users can trade cloud infrastructure resources. This way, a distributed storage system could be connected directly to the client without relying on a third party, providing features like commitment policies, data access policies, or micropayment systems. Similarly, off-chain distributed computation services could run computationally intensive programs and might even take care of privacy issues.
Storj, one of the projects in this field, describes the use of blockchain technology for distributed storage as a possible way to move away from the traditional large storage provider monoliths. They also point to the problem that client-side encryption is usually not standard, which risks exposing private consumer and corporate data. Client-side encryption can easily be integrated in decentralized market platforms. In general, distributed storage can guarantee data availability since there is no single point of failure, allows redundant storage, and can usually offer lower service fees than existing service providers.
General Picture
The key role of a blockchain in distributed storage or distributed computing lies in its capability to securely keep a record of all services performed in the cloud, and to automate (micro-)payments accordingly. The challenge we face is to find a secure way to connect the off-chain services with the blockchain, i.e., to ensure that all records have a publicly verifiable proof which guarantees the correctness of the off-chain service.
The cloud service could be the provision of storage, of computation resources, or of an application service. Many of the existing proposals for distributed storage systems make use of distributed hash tables (DHT). We quickly recall the very basics of DHTs in the next section. Kademlia is a well-known protocol built on a DHT, which some of the proposals extend in order to connect the distributed storage network with the blockchain. Redundancy schemes may be employed to ensure availability and the capability to restore data in case some hosts go offline.
Those platforms which offer distributed computation services either rely on some kind of multi-party computation (MPC) protocols or employ distributed grid computing. We recap both approaches in the next section and give references to survey articles. In both cases, a proof of correct execution or proof of contribution is needed.
The blockchain may be used to store data or to act as an intermediary in the following steps:
- Audits and Proofs: Clients may ask for a proof of retrievability, a proof of data integrity, or a proof of correct execution which are then recorded on the blockchain.
- Security deposits: Coins can be locked as collateral to guarantee correct service execution. Time-locked coins can serve the host as a payment guarantee.
- Payment service: Smart contracts enable automatically enforced payments that rely on the verification of proofs recorded on-chain. Off-chain payment channels (such as the Lightning or Raiden networks) can be used for exact billing.
- Orderbook: An on-chain orderbook might be used to intermediate the “trade” between client and host.
- Access control: Cryptography can be used to control the access to the cloud or to perform computations on stored data.
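How deposits and proof-dependent payments interact can be sketched with a toy escrow. This is only an illustration under stated assumptions: the class `StorageEscrow`, its fields, and the payout rule (the host receives payment and collateral on a valid proof, the client receives both otherwise) are hypothetical and not taken from any of the projects discussed here.

```python
from dataclasses import dataclass


@dataclass
class StorageEscrow:
    """Toy escrow (hypothetical): locks a client payment and a host collateral."""
    client_payment: int   # coins locked by the client
    host_collateral: int  # coins locked by the host
    settled: bool = False

    def settle(self, proof_ok: bool) -> dict:
        """Release the locked coins depending on the audit result."""
        if self.settled:
            raise RuntimeError("contract already settled")
        self.settled = True
        total = self.client_payment + self.host_collateral
        if proof_ok:
            # Valid proof: host is paid and gets its collateral back.
            return {"host": total, "client": 0}
        # Invalid or missing proof: client is refunded and receives the collateral.
        return {"host": 0, "client": total}


escrow_ok = StorageEscrow(client_payment=10, host_collateral=5)
payout_ok = escrow_ok.settle(proof_ok=True)

escrow_bad = StorageEscrow(client_payment=10, host_collateral=5)
payout_bad = escrow_bad.settle(proof_ok=False)
```

On a real platform the `proof_ok` flag would be the result of verifying an on-chain proof, and the payout rule would be fixed by the contract terms.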
Many proposed off-chain networks can be connected to a suitable existing blockchain. Nodes participating in the consensus protocol (“miners”) check, in addition to the usual transactions of the blockchain, the proofs that some off-chain service was correctly completed. The reward for their work depends on the blockchain used.
In some proposals miners are directly involved in providing storage, offering computing services, or providing proofs related to these services. These systems have their own blockchain and reward the miners in their native coins, which can be used to pay for services or can be exchanged for other currencies on an exchange platform.
Building blocks and methods which are often used in distributed cloud services:
Distributed Hash Tables for Storage
One way to store data between peers in a network is through distributed hash tables. The hash table consists of pairs (key, value), where key is the hash of the data value. Each node of the network receives an ID of the same format as key. In the Kademlia protocol, a metric based on XOR introduces a topology on the network, and a pair (key, value) is assigned to the nodes which are “nearest” to key with respect to this metric. This can be used to distribute data in the network and to look up the data in logarithmic time. The protocol has four remote procedure calls (RPCs): Ping, Store, Find_Node, and Find_Value. For further details see Kademlia.
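The XOR metric can be illustrated in a few lines. The following is a minimal sketch, not an implementation of the Kademlia protocol: it has no routing tables or RPCs, and the helper names (`key_of`, `xor_distance`, `closest_nodes`) and the node labels are assumptions made for illustration. SHA-1 is used only to obtain 160-bit identifiers.

```python
import hashlib


def key_of(value: bytes) -> int:
    """Derive a 160-bit identifier as the SHA-1 hash of the given bytes."""
    return int.from_bytes(hashlib.sha1(value).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Distance between two identifiers: their bitwise XOR read as an integer."""
    return a ^ b


def closest_nodes(node_ids, target, k=2):
    """Return the k node IDs nearest to target with respect to the XOR metric."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]


# Hypothetical network of eight nodes; IDs share the format of the keys.
node_ids = [key_of(f"node-{i}".encode()) for i in range(8)]

value = b"hello world"
key = key_of(value)

# The pair (key, value) would be stored on the nodes nearest to key.
responsible = closest_nodes(node_ids, key, k=2)
```

The distance is zero only between identical identifiers and is symmetric, which is what allows every node to agree on which peers are "nearest" to a key.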
Proof of Retrievability / Proof of SpaceTime
A standard method to guarantee the integrity and existence of a certain piece of data on a remote host uses Merkle trees and Merkle proofs. These are hash trees of the kind also used inside the blocks of Bitcoin or Ethereum. The client sends a challenge and the host has to respond with a correct hash calculation to prove the existence of the data. There are ways to reduce the message size in the challenge-response interaction (see e.g. the Storj whitepaper, section 2.3). Filecoin uses the advanced cryptographic tool zk-SNARK (see our blog post for a short introduction), which produces succinct proofs of knowledge of the response. They propose a Proof of SpaceTime that can even provide evidence that some data was stored throughout a period of time. This might be achieved by using iterated challenge-response interactions.
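A Merkle-based challenge-response audit can be sketched as follows. This is a simplified illustration and not the exact scheme of Storj, Sia, or Filecoin: the function names are assumptions, SHA-256 is an arbitrary choice, odd levels are padded by duplicating the last node, and leaf and inner hashes are not domain-separated as a production design would require.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(chunks):
    """Return all tree levels, from the leaf hashes up to the root."""
    level = [h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pad odd levels by duplicating the last node
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels, index):
    """Host side: collect the sibling hashes on the path from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(chunk, proof, root):
    """Client side: recompute the root from the chunk and the sibling hashes."""
    node = h(chunk)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


chunks = [f"chunk-{i}".encode() for i in range(5)]
levels = merkle_levels(chunks)
root = levels[-1][0]             # the client keeps only this value

challenge = 3                    # the client asks for chunk number 3
proof = merkle_proof(levels, challenge)
audit_passed = verify(chunks[challenge], proof, root)
```

The client only needs to remember the root hash, and the proof grows logarithmically with the number of chunks. Repeating such challenges at different times is one simple way to approximate evidence of storage over a period.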
Multi-party Computation and Proof of Correctness
In multi-party computation (MPC) protocols, each party has a secret value, and the task is to compute a common function over all of their inputs without revealing any information about the individual inputs. In A.C. Yao’s work in the 1980s, garbled circuits were proposed to solve this problem. In Enigma, the entire encrypted data is split into n shards and distributed between n peers. A kind of homomorphic encryption allows peers to perform basic arithmetic operations on their shards, which makes it possible to construct a circuit for any arithmetic function. To obtain Turing-completeness, the protocol has to provide control flow to the system as well. The ultimate goal is to achieve privacy, liveness, and correctness of the distributed computation. Please read here for more information about MPC in general.
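The idea of computing on shards can be illustrated with additive secret sharing. This is a minimal sketch and not Enigma's actual protocol: the modulus `P` and the function names are assumptions, only addition and multiplication by a public constant are shown, and multiplying two shared values or proving correctness needs additional machinery that is omitted here.

```python
import secrets

P = 2**61 - 1  # a prime modulus, chosen here only for illustration


def share(secret: int, n: int):
    """Split secret into n additive shares that sum to it modulo P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares


def reconstruct(shares):
    """Recover the secret by adding all shares modulo P."""
    return sum(shares) % P


def add_shares(xs, ys):
    """Each peer adds its two local shares; no communication is needed."""
    return [(x + y) % P for x, y in zip(xs, ys)]


def scale_shares(xs, c: int):
    """Each peer multiplies its local share by a public constant c."""
    return [(c * x) % P for x in xs]


a_shares = share(20, 3)  # three peers, one share each
b_shares = share(22, 3)

sum_value = reconstruct(add_shares(a_shares, b_shares))
scaled_value = reconstruct(scale_shares(a_shares, 3))
```

Any proper subset of the shares is uniformly random and therefore reveals nothing about the secret, yet the peers can jointly evaluate linear functions on the hidden values.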
Distributed grid computing and Proof of Contribution
Grid computing coordinates resource sharing by offering an infrastructure that couples computers, often constructed across LAN or Internet backbone networks, or with IPFS. The Gridcoin project employs the desktop grid computing platform BOINC in combination with a proof of contribution which they call Proof of Research. The iEx.ec project follows the same lines, adding a reputation mechanism to the protocol, which runs on a Proof of Stake blockchain. Please read here for more information about grid computing.
Binex (No documentation available)
Binex is a storage and computing marketplace in beta version (not public). The Binex platform connects to all data, computation, and application services. The Binex token can be used for payments.
- Storage: Any providing service
- Computation: Distributed computing, grid computing, …
- Audit/Proofs: No information available.
- Security deposits in native coins: No information available.
- Payment service: No information available.
- Orderbook: No information available.
- Access control: No information available.
No information available.
Enigma
The Enigma project started from a research group at MIT aiming at a private and scalable smart contract platform. Enigma network can be connected to any smart contract blockchain and provides off-chain distributed data storage and private off-chain computation (private contracts). Distributed computing accelerates code execution and comes with a Proof of Correct Execution.
Enigma’s network protocol is based on the Kademlia DHT protocol for storage extended by private communication channels. The encrypted data stored in the DHT are referenced and can be used for privacy-preserving multi-party computation (MPC).
The network is connected to the blockchain via shared identities and predicates which are stored on the blockchain and govern access control. This way, access to any off-chain resources is moderated in a decentralized manner, and the (encrypted) data cannot be found without verifying ownership.
- Storage: modified Kademlia DHT protocol with secure communication channel.
- Computation: Scripting language with state-of-the-art MPC. Interpreter breaks down the execution and distributes code resulting in improved run-time.
- Audit/Proofs: Proof of Correctness uses SPDZ (a kind of “homomorphic encryption”). The proofs are made “publicly verifiable” through commitment schemes.
- Security deposits (in native coins): Nodes of the Enigma network have to deposit collateral on the blockchain. In case of misbehaviour (not completing execution) the deposit is split between the honest nodes.
- Payment service: The storage contract is automatically renewed using the owner’s account balance. If the balance is too low, access to the data is restricted unless additional funds are deposited, and the data are deleted after some time. Computation fees are split automatically between the participating nodes of the Enigma network.
- Orderbook: Not explicitly mentioned.
- Access control: Shared identity addresses are stored on the blockchain and regulate two kinds of operations. Store & Load: the storing party provides a custom predicate for verifying who can read the data, and receives a pointer to the data. Share & Compute: data stored in the DHT carry permissions that allow the network to work on them internally.
Enigma connects to any smart contract blockchain, hence consensus depends on host blockchain.
Golem
Golem is building a decentralized marketplace of computing power on top of the Ethereum blockchain. There are three groups of participants in the Golem network: clients, providers of CPU, and software developers. Clients can ask for computation resources or software services. The network makes matches between clients and providers.
There will be an Application Registry where software developers can offer their software programs. Computation audits will be available in the form of challenge-response protocols in the style of TrueBit. Services are paid in Golem tokens (GNT), and there will be many options for secure payments including deposits and reputation systems.
- Computation: Volunteer grid computing and application services.
- Audit/Proofs: TrueBit-style proofs in the future.
- Security deposits: Client may require and providers may accept only time locked GNT payments.
- Payment service: Golem tokens; planned off-chain payment channels (Raiden), escrows, and more.
- Orderbook: Golem network; in the future, the Application Registry will provide a list of software services.
- Access control: Golem token.
Built on top of Ethereum blockchain.
Gridcoin
Gridcoin is a decentralized cryptocurrency in which volunteers are rewarded for their contribution to the BOINC distributed grid computing project. The BOINC project is based at the University of California, Berkeley, and enables the sharing of computing resources for scientific research purposes. Researchers who contribute computational power towards BOINC research are compensated in Gridcoin.
The blockchain is based on an energy-efficient Proof-of-Stake process and a Proof of BOINC (Work), called Proof of Research. This consensus protocol allows the computing power to be used almost exclusively for scientific research rather than for maintaining the blockchain, replacing the cryptographic puzzle of “classical” PoW by useful work. The mining reward is calculated using the research age of the miner.
- Computation: Volunteer grid computing
- Audit/Proofs: part of CPID hashing algorithm.
- Security deposits in native coins: Not needed.
- Payment service: Block reward for Proof of Research.
- Orderbook: Not on chain. There is a whitelist of BOINC projects.
Proof of Stake and Proof of BOINC (WORK) hybrid (“Proof of Research”). Reward in Gridcoins.
iEx.ec
iEx.ec is a project (in Proof of Concept version) led by researchers from the CNRS in France. Their project shares similarities with the Gridcoin project and uses desktop grid computing. Unlike Enigma, but similar to Golem, they plan to integrate any legacy applications or libraries through a Proof of Contribution, instead of specializing in off-chain computation. The Proof of Contribution protocol should make it possible to build consensus between the blockchain and off-chain resources (no details available yet).
iEx.ec will rely on a blockchain to coordinate the access of distributed applications to computing resources. A multicriteria scheduling smart contract is among their innovations; it covers different customer needs such as fast execution or low cost. iEx.ec will be based on a decentralized network with Proof of Stake consensus and trusted nodes. Trust is based on reputation, which is built up through backward mutability (Gridcoin) and certifications (Sarmenta) in the early days after the blockchain’s launch.
- Computation: Volunteer grid computing
- Audit/Proofs: Proof of Contribution.
- Security deposits in native coins: No.
- Payment service: Automated post-execution payments using smart contracts.
- Orderbook: Yes. Matchmaking smart contracts. Multicriteria scheduling smart contract.
- Access control: Yes. Blockchain to coordinate access to application resources.
Proof of Stake protocol with trusted nodes. Trust is built on reputation which gets built with backward mutability (Gridcoin) and certifications (Sarmenta).
Sia
Sia is a peer-to-peer decentralized network for cloud storage created by Nebulous Labs. This storage marketplace is built on top of the Sia blockchain and permits providers to rent out their unused storage space.
Storage contracts provide the option to fix an allowance of the client, i.e., a prepaid amount of Siacoins to pay for storage and bandwidth (uploads, downloads) for a certain period of time, and to lock a collateral of the host. Clients can ask for a Proof of Retrievability based on Merkle proofs. Off-chain payment channels, similar to payment channels in Bitcoin, provide a secure way to pay per bandwidth used within the timeframe written in the storage contract.
The blockchain automatically enforces the final payments associated with the storage contracts. Hosts are compensated with fees, and in case of non-compliance the client or host is punished with a percentage of their allowance or collateral. There is a file repair mechanism: if a host goes offline, a redundancy protocol ensures that the data are uploaded to a new host.
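The interplay of a prepaid allowance, off-chain revisions, and a final on-chain settlement can be sketched with a toy payment channel. This is a hypothetical model and not Sia's contract format: the class `PaymentChannel` and its fields are assumptions, and the signatures that both parties would attach to every revision are omitted.

```python
from dataclasses import dataclass


@dataclass
class PaymentChannel:
    """Toy channel (hypothetical): the renter prepays an allowance, revisions move it off-chain."""
    allowance: int          # coins locked by the renter when the contract is formed
    revision: int = 0       # counter of off-chain revisions
    paid_to_host: int = 0   # cumulative amount owed to the host

    def revise(self, amount: int):
        """Off-chain update: pay `amount` more to the host, e.g. for bandwidth used."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.paid_to_host + amount > self.allowance:
            raise ValueError("allowance exceeded")
        self.revision += 1
        self.paid_to_host += amount
        return (self.revision, self.paid_to_host)

    def settle(self) -> dict:
        """Final payout, which the blockchain would enforce from the latest revision."""
        return {"host": self.paid_to_host, "renter": self.allowance - self.paid_to_host}


channel = PaymentChannel(allowance=100)
channel.revise(30)               # first download
latest = channel.revise(25)      # second download
final_payout = channel.settle()
```

Only the opening of the channel and the final settlement would touch the blockchain; every intermediate revision is exchanged directly between renter and host.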
- Storage: Host address database on blockchain. Reed-Solomon redundancy algorithm.
- Audit/Proofs: Proof of Retrievability using file size and Merkle Proofs.
- Security deposits in native coins: Allowance of client and collateral of host in Siacoins.
- Payment service: Storage contract and off-chain payment channels between renter and host. Revisions (money movement in contract — not on blockchain) allow updating the actual bandwidth consumption.
- Orderbook: Best 50 hosts of ranking system.
- Access control: Private key for decryption.
PoW Blake2b algorithm. Mining rewarded in Siacoins.
Storj
Storj is a P2P cloud storage network built on top of any blockchain (Storjcoin X token on Bitcoin blockchain or Storj on Ethereum blockchain). Access to and communication in the network is regulated by master nodes functioning as bridges.
If a client wants to upload a file, it is split into pieces, which are transferred to different nodes via HTTP. The Storj network is based on the Kademlia DHT, extended by new message types for contracting: Offer, Consign, Mirror, and Retrieve. Clients can ask for a Proof of Retrievability via Audit, which employs Merkle trees and Merkle proofs. Partial audits are available to lower the computational overhead. There is a redundancy scheme (erasure coding: k out of m shards are necessary to restore the data) to secure the availability of the data, a KFS local file store for scaling, and NAT traversal and tunneling to protect against the participation of adversarial peers not belonging to the network.
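The principle of such a redundancy scheme can be shown with the simplest possible erasure code, a single XOR parity shard, which tolerates the loss of any one shard (k out of k + 1). This is only an illustration and an assumption on our side: production systems typically use Reed-Solomon codes, which tolerate several losses, and the function names below are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int):
    """Split data into k equally sized shards and append one XOR parity shard."""
    size = -(-len(data) // k)                    # ceiling division
    padded = data.ljust(size * k, b"\0")         # zero-pad to a multiple of k
    shards = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = shards[0]
    for shard in shards[1:]:
        parity = xor_bytes(parity, shard)
    return shards + [parity]


def recover(shards, length: int) -> bytes:
    """Restore the data from k + 1 shards of which at most one is None."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("a single parity shard can repair only one loss")
    if missing:
        present = [s for s in shards if s is not None]
        rebuilt = present[0]
        for shard in present[1:]:
            rebuilt = xor_bytes(rebuilt, shard)  # XOR of the rest equals the lost shard
        shards = shards[:missing[0]] + [rebuilt] + shards[missing[0] + 1:]
    return b"".join(shards[:-1])[:length]        # drop parity shard and padding


data = b"distributed storage"
shards = encode(data, k=3)       # 3 data shards + 1 parity shard

damaged = list(shards)
damaged[1] = None                # one host went offline
restored = recover(damaged, len(data))
```

Each shard would be placed on a different host, so the file survives as long as enough hosts remain online.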
Client and host can negotiate via secure messaging a contract including payments. There are standard contracts provided. Upon receipt and validation of an audit, the data owner must issue payment to the farmer according to agreed-upon terms in the contract through payment channels.
- Storage: Sharding and modified Kademlia DHT protocol extended by efficient messaging. Redundancy scheme (k out of m shards), network access authentication via Bridge (master node), which organizes network transfer and mirror creation.
- Audit/Proofs: Proof of Retrievability. The host must respond with a Merkle proof. Clients can ask for partial audits to lower the computational overhead.
- Security deposits: none.
- Payment service: Payments with Storjcoin X or Storj tokens (so far) via channels (opening channel transaction and then updating) or any other currency per bandwidth usage.
- Orderbook: Directly between peers (reputation not yet implemented).
- Access control: Not through blockchain, but via master node of Storj network.
Storj connects to any smart contract blockchain, hence consensus depends on the host blockchain. At the moment it connects to the Bitcoin blockchain (via Counterparty) and to the Ethereum blockchain.
Swarm
Swarm is a distributed storage platform (Proof of Concept version). Swarm primarily aims at decentralized and redundant storage of Ethereum’s blockchain data and at off-loading dapp code. Swarm is a network whose nodes are connected to the Ethereum blockchain.
The storage network is organized by a modification of the Kademlia DHT system. Data are split into pieces and reassembled by an interface called “Chunker”. A manifest is a data structure describing collections, which allows URL-based access to content.
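The splitting and reassembling of data can be sketched with content-addressed chunks. This is a simplified illustration and not Swarm's actual Chunker: the chunk size, the use of SHA-256, the flat list of chunk keys, and the function names are all assumptions, and a Python dictionary stands in for the DHT.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size in bytes


def chunk_and_store(data: bytes, store: dict, chunk_size: int = CHUNK_SIZE):
    """Split data into chunks, store each under the hash of its content, return the keys."""
    keys = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        store[key] = chunk       # in a real network: a store request into the DHT
        keys.append(key)
    return keys


def assemble(keys, store: dict) -> bytes:
    """Fetch the chunks by key, check their integrity, and join them in order."""
    parts = []
    for key in keys:
        chunk = store[key]
        if hashlib.sha256(chunk).hexdigest() != key:
            raise ValueError("chunk does not match its key")
        parts.append(chunk)
    return b"".join(parts)


store = {}                        # stands in for the distributed hash table
data = bytes(range(256)) * 40     # 10240 bytes of sample content
keys = chunk_and_store(data, store)
reassembled = assemble(keys, store)
```

Because every chunk is addressed by its own hash, a retrieving node can check the integrity of each piece without trusting the peer that served it.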
The Ethereum name service (ENS) enables content retrieval based on mnemonic names. Swarm will have an accounting protocol and secure automated ways to pay storage services.
- Storage: Sharding (Chunker) and modified DHT Kademlia protocol.
- Audit/Proofs: Challenge-response audit proofs using Merkle Proofs.
- Security deposits in native coins: SWINDLE – Secured With INsurance Deposit Litigation and Escrow. SWEAR – Storage With Enforced Archiving Rules, or Swarm Enforcement And Registration.
- Payment service: SWAP – Swarm Accounting Protocol, or Secured With Automated Payment.
- Orderbook: Not needed. Upload to Swarm.
- Access control: Manifest – data structure for data access.
Built on top of Ethereum blockchain.
Some links to other projects:
Academic (Scalability, Proof of Space, useful PoW):
Dynamic Distributed Storage for Scaling Bitcoin (Scalability)
Rollerchain (Proof of Retrievability, useful PoW)
Permacoin (Proof of Retrievability, useful PoW)
SpaceMint (Proof of Space, useful PoW)
Sigwo technologies (Blockchain to store small data)
Decent (Decentralized content providers – not for storage)