DipDup is a framework for building selective indexers for Tezos dapps. It helps to reduce boilerplate code and lets developers focus on what's important — the business logic. It works on top of TzKT API, which provides normalized and humanized blockchain data via REST and WebSocket endpoints.
This article will guide you through the recent DipDup changes with code snippets and demo samples.
It's been a month since the very first stable release of DipDup. We have received a lot of feedback, which is great, and we are very grateful for it. Please continue sharing your thoughts in Baking Bad chats anytime; let's shape the future of DipDup together!
Today's changelog is pretty huge. DipDup is still at an early stage of development, and there's a lot of work to do. However, we'll try to establish shorter and more predictable release cycles to make adopting new versions easier.
# Breaking changes
# Hasura v1.0 is no longer supported
A new major version of the Hasura GraphQL engine has been released recently, bringing lots of improvements such as support for multiple database engines, REST endpoints, and better scalability. You can learn how DipDup benefits from these features later in this article.
# Migration
As Hasura documentation states, Hasura v2 is backwards compatible with Hasura v1. Hence, simply updating the Hasura docker image version number and restarting your Hasura instance should work seamlessly.
So if you're using docker-compose, replace the following line after upgrading DipDup, and you're ready to go:
docker-compose.yml
services:
hasura:
- image: hasura/graphql-engine:v1.3.3
+ image: hasura/graphql-engine:v2.0.1
# BigMapDiff.action values renamed
Handlers of big_map indexes now receive all kinds of events, including big map allocation and removal. Items of the BigMapAction enumeration were renamed to avoid ambiguity:
- ALLOCATE -> ALLOCATE
- ADD -> ADD_KEY
- UPDATE -> UPDATE_KEY
- REMOVE -> REMOVE_KEY
- (no previous item) -> REMOVE
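To illustrate, a big map handler can now branch on the action it receives. Here's a minimal sketch; the handler name, its arguments, and the diff attributes other than action are placeholders for the example, so check the stubs generated by dipdup init for the exact signatures:
from dipdup.models import BigMapAction

async def on_ledger_update(ctx, diff) -> None:
    # allocation and removal events are now delivered to handlers as well
    if diff.action == BigMapAction.ALLOCATE:
        return  # nothing to index yet
    if diff.action in (BigMapAction.REMOVE_KEY, BigMapAction.REMOVE):
        ...  # drop the corresponding rows from your models
        return
    # ADD_KEY / UPDATE_KEY: process the updated value as before
    ...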
# Migration
All the existing expressions containing BigMapAction items in your handlers will be updated once you run dipdup migrate.
# SQL scripts: on_restart and on_reindex
This change was introduced in DipDup 1.1.0, so if you use SQL scripts in your project you might have already noticed it. Here are some details about the change:
- Scripts from the sql/on_restart directory are executed each time you run your indexer. Those scripts may contain CREATE OR REPLACE VIEW or similar non-destructive operations.
- Scripts from the sql/on_reindex directory are executed after the database schema is created based on the models.py module, but before indexing starts. It may be useful to change the database schema in ways Tortoise ORM cannot, e.g. to create a composite primary key:
sql/on_reindex/00-composite-key.sql
ALTER TABLE dex.trade DROP CONSTRAINT trade_pkey;
ALTER TABLE dex.trade ADD PRIMARY KEY (id, timestamp);
- Both types of scripts are executed without being wrapped in a transaction. It's generally a good idea to avoid touching table data in scripts.
- Scripts are executed in alphabetical order. If you're getting SQL engine errors, try to split large scripts into smaller ones.
- SQL scripts are ignored when the SQLite backend is used.
NOTE
The SQL snippet above prepares the table to be converted into a TimescaleDB "hypertable". TimescaleDB is a PostgreSQL-compatible database with advanced features for handling time-series data. It works with DipDup, so give it a try.
# Migration
- Run dipdup init to update the project structure.
- Move existing SQL scripts from the sql directory to either sql/on_restart or sql/on_reindex depending on your case.
# Hasura improvements
# hasura configure CLI command
This command applies the Hasura configuration without restarting the indexer. By default, DipDup will merge existing Hasura metadata (queries, REST endpoints, etc.). Use the --reset option to wipe the metadata before configuring Hasura.
$ dipdup hasura configure [--reset]
# Camel case for field names
Developers from the JavaScript world may be more familiar with using camelCase for variable names instead of the snake_case Hasura uses by default. DipDup can now convert all field and table names automatically:
dipdup.yml
hasura:
url: http://hasura:8080
...
camel_case: True
Now this example query to the hic et nunc demo indexer...
query MyQuery {
hic_et_nunc_token(limit: 1) {
id
creator_id
}
}
...will become this one:
query MyQuery {
hicEtNuncToken(limit: 1) {
id
creatorId
}
}
All fields autogenerated by Hasura will be renamed accordingly: hic_et_nunc_token_by_pk to hicEtNuncTokenByPk, delete_hic_et_nunc_token to deleteHicEtNuncToken, and so on. To return to the defaults, set camel_case to False and run dipdup hasura configure again.
# REST endpoints
One of the most anticipated features of Hasura 2.0 is the ability to expose arbitrary GraphQL queries via REST endpoints. By default, DipDup will generate GET and POST endpoints that fetch rows by primary key for all tables available:
$ curl http://127.0.0.1:8080/api/rest/hicEtNuncHolder?address=tz1UBZUkXpKGhYsP5KtzDNqLLchwF4uHrGjw
{
"hicEtNuncHolderByPk": {
"address": "tz1UBZUkXpKGhYsP5KtzDNqLLchwF4uHrGjw"
}
}
However, there's a limitation dictated by the way Hasura parses HTTP requests: only models with primary keys of basic types (int, string, and so on) can be fetched with GET requests. An attempt to fetch a model with a BIGINT primary key will lead to an error: Expected bigint for variable id got Number. A workaround that allows fetching any model is to send a POST request containing a JSON payload with a single key:
$ curl -d '{"id": 152}' http://127.0.0.1:8080/api/rest/hicEtNuncToken
{
"hicEtNuncTokenByPk": {
"creatorId": "tz1UBZUkXpKGhYsP5KtzDNqLLchwF4uHrGjw",
"id": 152,
"level": 1365242,
"supply": 1,
"timestamp": "2021-03-01T03:39:21+00:00"
}
}
We hope to get rid of this limitation someday and will let you know as soon as it happens.
Now the interesting part. You can put any number of .graphql files into the graphql directory in your project root, and DipDup will generate REST endpoints for each of those queries. Let's say we want to fetch not only a specific token, but also the number of all tokens minted by its creator:
query token_and_mint_count($id: bigint) {
hicEtNuncToken(where: {id: {_eq: $id}}) {
creator {
address
tokens_aggregate {
aggregate {
count
}
}
}
id
level
supply
timestamp
}
}
Save this query as graphql/token_and_mint_count.graphql and run dipdup hasura configure. Now this query is available via the REST endpoint at http://127.0.0.1:8080/api/rest/token_and_mint_count.
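If you prefer Python over curl, the same endpoint can be called with a plain POST request. A minimal sketch using the requests library and the token id from the example above:
import requests

# call the custom REST endpoint generated from graphql/token_and_mint_count.graphql
response = requests.post(
    'http://127.0.0.1:8080/api/rest/token_and_mint_count',
    json={'id': 152},
)
response.raise_for_status()
print(response.json())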
You can also disable the creation of REST endpoints in the config:
dipdup.yml
hasura:
...
rest: False
# Limitations for unauthorized users
DipDup creates a user role which is allowed to perform queries without authorization. Now you can limit the maximum number of rows such queries return, and also disable aggregation queries that are automatically generated by Hasura:
hasura:
url: http://hasura:8080
...
select_limit: 100
allow_aggregations: False
Note that with limits enabled, you have to use either offset-based or cursor-based pagination on the client side, as shown in the sketch below.
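For example, with select_limit: 100 a client has to page through results explicitly. Here is a minimal sketch of offset-based pagination against the hic et nunc demo schema used above, sending plain POST requests to Hasura's /v1/graphql endpoint (the URL and page size are assumptions matching the earlier examples):
import requests

QUERY = '''
query Tokens($limit: Int!, $offset: Int!) {
  hic_et_nunc_token(limit: $limit, offset: $offset, order_by: {id: asc}) {
    id
  }
}
'''

offset = 0
while True:
    response = requests.post(
        'http://127.0.0.1:8080/v1/graphql',
        json={'query': QUERY, 'variables': {'limit': 100, 'offset': offset}},
    )
    response.raise_for_status()
    rows = response.json()['data']['hic_et_nunc_token']
    if not rows:
        break
    # process the current page of rows here
    offset += len(rows)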
# Scheduled jobs
In some cases, it may come in handy to be able to run some code on a schedule. For example, you may want to calculate some statistics once per hour, not on every block. Add the following section to your DipDup config:
jobs:
midnight_stats:
callback: calculate_stats
crontab: "0 0 * * *"
args:
major: True
atomic: True
leet_stats:
callback: calculate_stats
interval: 1337 # in seconds
args:
major: False
atomic: True
Run dipdup init to generate corresponding handlers in the jobs directory. You can use a single callback multiple times with different arguments.
When the atomic parameter is set, the job will be wrapped in a SQL transaction and rolled back in case of failure.
If you're not familiar with crontab syntax, there's an online service crontab.guru that will help you to build the desired expression.
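The generated callback receives the arguments defined in the config. A rough sketch of what jobs/calculate_stats.py could look like; the module layout, context type, and exact signature produced by dipdup init may differ:
import logging

_logger = logging.getLogger(__name__)

async def calculate_stats(ctx, major: bool, atomic: bool) -> None:
    # `major` and `atomic` come from the `args` section of the job config
    _logger.info('Calculating %s stats', 'major' if major else 'minor')
    # query your models and store the aggregated results here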
# Datasources
# Coinbase datasource
A connector for the Coinbase Pro API providing get_candles and get_oracle_data methods. It may be useful for enriching indexes of DeFi contracts with off-chain data.
datasources:
coinbase:
kind: coinbase
Please note that you can't use Coinbase as an index datasource instead of TzKT. It can be accessed via the ctx.datasources mapping in either handlers or jobs, as sketched below. See the datasource compatibility docs for details.
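For illustration, a handler or a job could enrich its data with off-chain quotes roughly like this. The handler name and the get_candles arguments are placeholders, so consult the datasource reference for the exact signatures:
async def on_trade(ctx, trade) -> None:
    # 'coinbase' is the datasource name from the config above
    coinbase = ctx.datasources['coinbase']
    # the arguments are illustrative only
    candles = await coinbase.get_candles(...)
    ...  # combine candles with on-chain data from `trade`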
# Tuning API connectivity
All datasources now share the same code under the hood to communicate with underlying APIs via HTTP. The configs of all datasources, and also Hasura's, can contain an optional http section with any number of the following parameters set:
datasources:
tzkt:
kind: tzkt
...
http:
cache: True
retry_count: 10
retry_sleep: 1
retry_multiplier: 1.2
ratelimit_rate: 100
ratelimit_period: 60
connection_limit: 25
batch_size: 10000
hasura:
url: http://hasura:8080
http:
...
Each datasource has its own defaults. Usually, there's no reason to alter these settings unless you use your own instance of TzKT or BCD.
By default, DipDup retries failed requests infinitely with an exponentially increasing delay between attempts. Set the retry_count parameter to limit the number of attempts.
The batch_size parameter is TzKT-specific. By default, DipDup limits requests to 10000 items, the maximum value allowed on the public instances provided by Baking Bad. Decreasing this value will reduce the time required for TzKT to process a single request and thus reduce the load. To achieve the same effect for multiple indexes synchronizing at once, reduce the connection_limit parameter.
# Docker deployment
# Base images available at DockerHub
Now you need only two lines in the Dockerfile to build a Docker image for your project:
FROM dipdup/dipdup:2.0.2
COPY demo_hic_et_nunc /home/dipdup/demo_hic_et_nunc
$ docker build . -t dipdup-project
# Generating inventory templates
A new command dipdup docker init is available to generate a compose-based setup.
$ dipdup [-c dipdup.yml] docker init [-i dipdup/dipdup] [-t 2.0.2] [-e dipdup.env]
The following files will be created:
docker
├── dipdup.env
├── dipdup.env.example
├── docker-compose.yml
└── Dockerfile
Environment files are generated with substitution expressions (${VARIABLE:-default_value}) from the DipDup configs provided through the dipdup -c option.
Now navigate to the created directory, edit the environment file, and run the project with docker-compose:
$ cd project/docker
$ nano dipdup.env
$ docker-compose up -d
$ docker-compose logs -f
By default, PostgreSQL and Hasura are exposed to localhost only, on ports 5432 and 8080 respectively. Edit the docker-compose.yml file according to your needs.
Finally, all the demo projects in DipDup now have Docker templates generated. To spin up a demo, run docker-compose up in the <demo_project>/docker directory.
# Images with PyTezos included
In some cases you might need PyTezos to deal with Michelson values stored in big maps. Here's an example of how to unpack such values (without knowing the exact type):
from pytezos.michelson.micheline import blind_unpack
packed_micheline = '05020000001e070401000000083631363136313631010000000a37333634363436343634'
expected_michelson = '{ Elt "61616161" "7364646464" }'
micheline_bytes = bytes.fromhex(packed_micheline)
michelson = blind_unpack(micheline_bytes)
assert expected_michelson == michelson, (expected_michelson, michelson)
Separate images with PyTezos preinstalled are created on every release; those tags have a -pytezos postfix. To set up a local development environment, run make install PLUGINS=pytezos. Note that PyTezos requires some system dependencies: on Ubuntu-based distributions, run apt install libsodium-dev libsecp256k1-dev libgmp-dev (see the PyTezos repo for more information).
# Additional dependencies
Things get more complicated when your project requires additional Python dependencies. At Baking Bad, we use Poetry for Python package management and encourage you to do the same. A typical pyproject.toml for a DipDup project will look like this:
[tool.poetry.dependencies]
python = "^3.8"
dipdup = "^2.0.0"
some_dependency = "^1.2.3"
To install dependencies missing in the base DipDup image, add the following lines to your Dockerfile:
FROM dipdup/dipdup:2.0.2
COPY pyproject.toml pyproject.toml
COPY poetry.lock poetry.lock
RUN sudo ./inject_pyproject
COPY demo_hic_et_nunc demo_hic_et_nunc
Note that your project will use the DipDup version included in the base image instead of the one specified in your pyproject.toml file.
# Single level rollbacks
It's important for DipDup to be able to handle chain reorgs, since reindexing from scratch leads to several minutes of downtime. Single level rollbacks are now processed in the following way (see the sketch after the list):
- If the new block has the same set of operations as the replaced one, do nothing;
- If the new block has all the operations from the replaced one AND several new operations, process those new operations;
- If the new block misses some operations from the replaced one, trigger full reindexing.
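Schematically, the decision boils down to comparing the sets of operations in the replaced and the new block. An illustrative sketch of that logic (not DipDup's actual implementation):
def rollback_decision(old_ops: set, new_ops: set) -> str:
    if new_ops == old_ops:
        return 'ignore'        # same operations, nothing to do
    if old_ops <= new_ops:
        return 'process_new'   # handle only the operations added in the new block
    return 'reindex'           # some operations are gone, full reindexing required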
We'll continue to improve rollback handling; read the notes on the next steps below.
# Miscellaneous
# Sentry integration
Sentry is error tracking software available both in the cloud and on-premise. It greatly improves the troubleshooting experience and requires nearly zero configuration. To start catching exceptions with Sentry in your project, add the following section to your dipdup.yml config:
sentry:
dsn: https://...
environment: dev
The Sentry DSN can be obtained from the web interface at Settings -> Projects -> project_name -> Client Keys (DSN). A cool thing is that if you catch an exception and suspect there's a bug in DipDup, you can share the event with us using a public link (created via the Share menu).
# Immune tables
There's a new option in PostgresDatabaseConfig to keep some tables from being dropped on reindexing.
database:
kind: postgres
...
immune_tables:
- some_huge_table
Here's a real-world use case for this feature. Let's say you're developing an indexer for an NFT marketplace contract and have decided to store JSON fetched from IPFS in the PostgreSQL database. Now you've changed some application logic, which requires reindexing. But there's no need to drop the results of IPFS lookups, as those lookups are "expensive" and will always return the same data.
Keep in mind that reindexing may be triggered by a change of the database schema as well as by a rollback or by hand. Immune tables won't be dropped in any of these cases, so you need to handle their migrations manually (or just drop them and let DipDup re-create and refill them from scratch).
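For instance, the table with IPFS lookup results from that example could be backed by a regular Tortoise ORM model; the model, field, and table names below are hypothetical:
from tortoise import fields
from tortoise.models import Model

class IpfsMetadata(Model):
    # results of "expensive" IPFS lookups that should survive reindexing
    ipfs_hash = fields.CharField(max_length=64, pk=True)
    metadata = fields.JSONField()

    class Meta:
        table = 'ipfs_metadata'  # add this name to `immune_tables` in the config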
# Protect typeclasses from being overwritten during init
In some cases, you may want to make manual changes to typeclasses and ensure they won't be lost on init. Let's say you want to reuse a typename for multiple contracts providing the same interface (like FA1.2 and FA2 tokens) but having a different storage structure. You can comment out differing fields which are not important for your index.
types/contract_typename/storage.py
# dipdup: ignore
...
class ContractStorage(BaseModel):
some_common_big_map: Dict[str, str]
# unique_big_map_a: Dict[str, str]
# unique_big_map_b: Dict[str, str]
Files starting with # dipdup: ignore won't be overwritten on init.
# Auto-generated fully-typed JavaScript SDKs
So far we have mostly focused on the backend aspect of DipDup, but it's also very important to use it properly on the client side. There are numerous GraphQL clients: some of them are relatively lightweight, while others try to cover as many cases as possible and are thus pretty heavy and complex.
In DipDup, we are dealing with blockchain data (which is a special case in itself) and are also bound by Hasura engine specifics. Here are some important things to keep in mind when choosing a GraphQL library (and designing the front-end in general):
- We will likely not need mutations (thus no need for complex state management)
- Hasura subscriptions are actually live queries (thus no need for the query + subscribe-to-more pattern)
Also, remember that GraphQL queries are just POST requests, and subscriptions over WebSockets are powered by a simple standalone module that can be used as-is without complex wrapping.
However, there's a great solution with a perfect usability/complexity balance that we want to share with you. It's called GenQL, and basically all you have to do to create a ready-to-use, fully-typed SDK for your DipDup project is write a minimal package.json. The rest is done automatically; isn't that just marvelous?
{
"name": "%PACKAGE_NAME%",
"version": "0.0.1",
"main": "dist/index.js",
"types": "dist/index.d.ts",
"devDependencies": {
"@genql/cli": "^2.6.0"
},
"dependencies": {
"@genql/runtime": "2.6.0",
"graphql": "^15.5.0"
},
"scripts": {
"build": "genql --esm --endpoint %GRAPHQL_ENDPOINT% --output ./dist"
}
}
# What's next?
Our main goals have remained the same since the 1.0 release:
- Better reorg handling with "backward handlers";
- Integration with mempool and metadata plugins written in Go;
- More deployment options (Hasura Cloud, Heroku, AWS) and examples;
- Ready-to-use client side libraries for our mempool and metadata indexers.
DipDup is a free open-source project driven by the needs of you, fellow Tezos developers. Let us know what you think about the recent changes and our further plans! Come join the Baking Bad Telegram group, the #baking-bad channel in the tezos-dev Slack, and our Discord server.