Decentralized index for faster dApps: The Graph
A few days ago we had a good meeting with the team behind The Graph, an infrastructure project that creates a decentralized index for the data we store on blockchains.
We were very skeptical about the project before the meeting. Why do we need a decentralized index? How fast will this index perform? What about adoption — who is going to use it and why?
Being engineers ourselves, we could see the problems. After discussing them with the team in person, things became clearer.
Before we explain the project, let’s see what an index is in the first place.
What is an index anyway?
Databases and file systems define the way data is stored and organized. Indexes, in turn, are the “address book” of that data: they tell us how to search for and retrieve data efficiently. Imagine that each index is a catalog that tells us the location of every file (or data structure) on our storage device. We can organize files by name, by extension, by file type, or by content.
Each time we organize something a different way, we create a new index.
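The idea that the same stored data can have several indexes can be sketched in a few lines of Python. This is only an illustration: the file names are made up, and "position on the storage device" is simplified to a list position. Each function builds a separate lookup structure over the same list of files.

```python
from collections import defaultdict
from typing import Dict, List

# The "storage device": files in the order they were written (illustrative names).
FILES: List[str] = ["report.pdf", "photo.jpg", "notes.txt", "scan.pdf", "avatar.jpg"]


def index_by_name(files: List[str]) -> Dict[str, int]:
    """Index 1: file name -> position on the storage device."""
    return {name: pos for pos, name in enumerate(files)}


def index_by_extension(files: List[str]) -> Dict[str, List[int]]:
    """Index 2: extension -> positions of every file with that extension."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos, name in enumerate(files):
        extension = name.rsplit(".", 1)[-1]
        index[extension].append(pos)
    return dict(index)


by_name = index_by_name(FILES)
by_ext = index_by_extension(FILES)
print(by_name["notes.txt"])  # 2
print(by_ext["pdf"])         # [0, 3]
```

Both indexes point into the same data; organizing it a different way simply means building (and maintaining) one more structure like these.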
So what do indexes actually do?
Answer: They speed up data retrieval, often by orders of magnitude.
Let us explain a bit more.
A fridge is a typical example of a database and an index in our daily life.
Here is a real-world analogy. You have a refrigerator; consider it your storage layer. The way you store items inside the fridge lets you retrieve the ones you want more efficiently. For example, you may keep the most frequently used items at the front of a shelf and the less frequently used ones at the back.
This is what a database (like Bluzelle) does: it lets you organize your data in a way that makes sense to you. Now imagine your friend asks you to bring her some milk from the fridge. This request is called “the query”: it tells you which data to retrieve. If you don’t know where the milk is, you have to go over every item in the fridge until you find it.
This is where the index comes in. Imagine that you have a diagram showing where each item is inside the fridge. Even before you open it, you can look at the diagram and know that the milk is on the second shelf, bottom right. That diagram is the index.
On the left, data as it arrives at the server. On the right, the same data after it has been indexed in a database.
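The fridge example maps directly onto code. In this minimal sketch (the items and locations are invented for illustration), searching without an index means checking items one by one, while the "diagram" is a dictionary built once and then consulted with a single lookup.

```python
from typing import Dict, List, Optional, Tuple

# The fridge as an unindexed list of (item, location) pairs, in the order
# things were put in. Items and locations are illustrative.
FRIDGE: List[Tuple[str, str]] = [
    ("butter", "door, top"),
    ("eggs", "first shelf, left"),
    ("cheese", "first shelf, right"),
    ("juice", "door, bottom"),
    ("milk", "second shelf, bottom right"),
]


def find_by_scan(items: List[Tuple[str, str]], wanted: str) -> Tuple[Optional[str], int]:
    """No index: look at every item until we find the one we want.
    Returns the location and how many items we had to check."""
    checked = 0
    for item, location in items:
        checked += 1
        if item == wanted:
            return location, checked
    return None, checked


def build_index(items: List[Tuple[str, str]]) -> Dict[str, str]:
    """The 'diagram' of the fridge: item -> location, built once up front."""
    return {item: location for item, location in items}


diagram = build_index(FRIDGE)
print(find_by_scan(FRIDGE, "milk"))  # ('second shelf, bottom right', 5)
print(diagram.get("milk"))           # second shelf, bottom right
```

The scan has to check more items as the fridge fills up (linear time), whereas the dictionary lookup takes roughly the same time no matter how many items there are (constant time on average). That gap is where the "orders of magnitude" come from on large data sets.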
Why do we need an index for blockchain data?
For the simple reason that there is no other way to build a fast dApp. On Ethereum, retrieving data directly from the chain can take around 10 seconds. Imagine waiting 10 seconds while staring at a spinning wheel. In the centralized world, the same lookup takes about 10 ms.
Why do we need a decentralized index? Why not just use a centralized one?
This is the most interesting part. In a centralized world, whoever controls the index can alter it and point you in the wrong direction or to the wrong file. Imagine looking up the account balance of one of your clients. You ask the index for it, and because the index is centralized, it can point you to any version of the balance. You want the current balance, which is $10, but it points you to the balance from 3 days ago, which was $10,000.
A decentralized index avoids this problem by using a network of nodes, each holding a copy of the index. The nodes maintain it in a trustless way, the same way everything else on a blockchain works.
If something goes wrong and one node’s copy of the index fails, we can simply query another node, so the index stays available even when individual nodes go down.
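The two properties described above, that one tampered copy cannot pass off a wrong answer and that a failed node can simply be skipped, can be illustrated with a toy model. To be clear, this is not how The Graph verifies answers; the node functions, the client ID and the simple-majority rule below are all assumptions made for the sake of the example.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Toy model (assumption): a node is a function that answers a lookup,
# or raises ConnectionError when it is offline.
Node = Callable[[str], str]


def query_decentralized_index(nodes: List[Node], key: str) -> Optional[str]:
    """Ask every node, skip the ones that are down, and accept an answer only
    if a strict majority of all nodes returned it."""
    answers: List[str] = []
    for node in nodes:
        try:
            answers.append(node(key))
        except ConnectionError:
            continue  # this node failed; the others can still answer
    if not answers:
        return None
    value, votes = Counter(answers).most_common(1)[0]
    return value if votes > len(nodes) // 2 else None


BALANCES: Dict[str, str] = {"client-42": "$10"}  # hypothetical current state


def honest_node(key: str) -> str:
    return BALANCES[key]


def tampered_node(key: str) -> str:
    return "$10,000"  # points at a stale balance from three days ago


def offline_node(key: str) -> str:
    raise ConnectionError("node is down")


nodes = [honest_node, honest_node, honest_node, tampered_node, offline_node]
print(query_decentralized_index(nodes, "client-42"))  # $10
```

With three honest nodes out of five, the tampered answer is outvoted and the offline node is ignored. If too few nodes agree, the sketch returns None rather than trusting a single copy.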
How does The Graph work?
The Graph is a decentralized index that works across blockchains (i.e., it can index data on multiple blockchains such as Ethereum and Bitcoin, and also on IPFS and Filecoin). It monitors these sources for new data and updates the index every time new data appears. Once the index is updated, the nodes that maintain it try to reach consensus on it. Once consensus is reached, The Graph ensures that users of the index get the latest data available.
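The update-then-agree cycle described above can be sketched as follows. This is a simplified illustration under assumed details, not The Graph's implementation: the block format, the `IndexNode` class and the "all digests match" agreement check are hypothetical. Each node applies new blocks in order and summarizes its index as a SHA-256 digest, so nodes with identical state report identical digests.

```python
import hashlib
import json
from typing import Dict, List, TypedDict


class Block(TypedDict):
    """Hypothetical block format: a block number plus new account balances."""
    number: int
    updates: Dict[str, int]


class IndexNode:
    """A toy indexing node: applies new blocks in order and can report a
    fingerprint (digest) of its current index."""

    def __init__(self) -> None:
        self.index: Dict[str, int] = {}
        self.head = -1  # number of the last block applied

    def apply_block(self, block: Block) -> None:
        if block["number"] != self.head + 1:
            raise ValueError("blocks must be applied in order")
        self.index.update(block["updates"])
        self.head = block["number"]

    def digest(self) -> str:
        # Serialize deterministically so identical state gives identical hashes.
        state = json.dumps({"head": self.head, "index": self.index}, sort_keys=True)
        return hashlib.sha256(state.encode("utf-8")).hexdigest()


def in_consensus(nodes: List[IndexNode]) -> bool:
    """Nodes agree when every node reports the same digest."""
    return len({node.digest() for node in nodes}) == 1


chain: List[Block] = [
    {"number": 0, "updates": {"alice": 100, "bob": 50}},
    {"number": 1, "updates": {"alice": 70, "carol": 30}},
]

nodes = [IndexNode() for _ in range(3)]
for block in chain:  # every node sees each new block
    for node in nodes:
        node.apply_block(block)
print(in_consensus(nodes))  # True

lagging = IndexNode()
lagging.apply_block(chain[0])  # has not processed block 1 yet
print(in_consensus(nodes + [lagging]))  # False
```

Comparing short digests instead of whole indexes keeps the agreement step cheap, but a real network also has to decide what happens when nodes disagree, which this sketch leaves out.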
One of our concerns, though, is that updating the index might become a bottleneck once blockchains get faster. On Ethereum, new blocks arrive roughly every 12 to 15 seconds. The Graph currently updates its index in about 1 second.
The index updates much faster than data comes in, but what happens when 10 different chains each bring in new data every second? Will the index be able to update and also reach consensus among nodes quickly enough that everyone has the same, fresh state?
During our meeting, the team explained their ideas for scaling on this front. Their goal is to prototype and experiment with different network topologies before locking themselves into any design.
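The throughput question can be put in numbers with a simple back-of-the-envelope model. It assumes a single indexer that processes one block at a time, a fixed time per update, and ignores consensus entirely, so it is a lower bound on the work rather than a description of The Graph's architecture.

```python
from typing import List


def indexer_utilization(block_intervals_s: List[float], update_time_s: float) -> float:
    """Share of each second a single, sequential indexer spends updating.

    block_intervals_s: average seconds between blocks, one entry per chain.
    update_time_s: seconds the indexer needs to process one block.
    A value above 1.0 means blocks arrive faster than they can be indexed,
    so a backlog keeps growing."""
    arrivals_per_second = sum(1.0 / interval for interval in block_intervals_s)
    return arrivals_per_second * update_time_s


# One Ethereum-like chain, one block every ~12 s, ~1 s per index update.
print(round(indexer_utilization([12.0], 1.0), 3))  # 0.083
# Ten chains producing a block every second each, same update time.
print(indexer_utilization([1.0] * 10, 1.0))  # 10.0
```

Under this model, the ten-chain case would need about ten updates running in parallel (or updates ten times faster) just to keep up, before any time is spent reaching consensus.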
For the query language, The Graph uses GraphQL, a language developed by Facebook. It is easy to understand, and many developers are already familiar with it.
Here is how it works: when the app executes a query, The Graph routes it through a gateway node to the query nodes that hold the index. The query nodes return the result to the gateway node, which passes it back to the app.
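The routing flow above can be sketched with two small classes. Everything here is hypothetical: the text does not say how the gateway picks a node, so this sketch assumes simple round-robin among the nodes that hold the requested index, and it reduces a GraphQL query to an (index name, key) pair instead of parsing real GraphQL.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class QueryNode:
    """A hypothetical query node holding copies of one or more indexes."""
    name: str
    indexes: Dict[str, Dict[str, str]]

    def execute(self, index_name: str, key: str) -> str:
        return self.indexes[index_name][key]


class Gateway:
    """A hypothetical gateway: forwards each query to a node that holds the
    requested index, rotating between eligible nodes (round-robin), and
    passes the result back to the caller."""

    def __init__(self, nodes: List[QueryNode]) -> None:
        self.nodes = nodes
        self._requests = 0

    def route(self, index_name: str, key: str) -> Tuple[str, str]:
        holders = [n for n in self.nodes if index_name in n.indexes]
        if not holders:
            raise LookupError(f"no query node holds index {index_name!r}")
        node = holders[self._requests % len(holders)]
        self._requests += 1
        return node.name, node.execute(index_name, key)


gateway = Gateway([
    QueryNode("node-a", {"balances": {"alice": "10"}}),
    QueryNode("node-b", {"balances": {"alice": "10"}, "transfers": {"tx1": "alice->bob"}}),
])
print(gateway.route("balances", "alice"))  # ('node-a', '10')
print(gateway.route("balances", "alice"))  # ('node-b', '10')
print(gateway.route("transfers", "tx1"))   # ('node-b', 'alice->bob')
```

Because the app only ever talks to the gateway, query nodes can be added or removed without the app noticing; round-robin is just the simplest way to spread load across them.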
The token is used to secure and govern the network and to incentivize behaviors that are critical to it. In brief, tokens are bonded by query nodes, and they can either be staked or used as a medium of exchange for using the network (we will share more about this soon).
Another challenge we see is the governance of the network. Who decides who gets to be a node? Who decides when changes are deployed, and how are those changes adopted? The team is still working on this, but we loved that they envision a large open-source community maintaining the network in a decentralized way.
You can read more about the project here: https://medium.com/graphprotocol