DipDup v2.0 brings extended Hasura support, REST endpoints, cron jobs, and much more

Check out all the new features added to our full-stack dapp development framework.

Lev

DipDup is a framework for building selective indexers for Tezos dapps. It helps to reduce boilerplate code and lets developers focus on what's important — the business logic. It works on top of the TzKT API, which provides normalized and humanized blockchain data via REST and WebSocket endpoints.
This article will guide you through the recent DipDup changes with code snippets and demo samples.

It's been a month since the very first stable release of DipDup. We have received a lot of feedback, which is great, and we are very grateful for it. Please continue sharing your thoughts in the Baking Bad chats anytime; let's shape the future of DipDup together!

Today's changelog is pretty huge. DipDup is still at an early stage of development, and there's a lot of work to do. However, we'll try to establish shorter and more predictable release cycles to make adopting new versions easier.

# Breaking changes

# Hasura v1.0 is no longer supported

A new major version of the Hasura GraphQL engine has been released recently, bringing lots of improvements such as support for multiple database engines, REST endpoints, and better scalability. You can learn how DipDup benefits from these features later in this article.

# Migration

As the Hasura documentation states, Hasura v2 is backwards compatible with Hasura v1. Hence, simply updating the Hasura Docker image version and restarting your Hasura instance should work seamlessly.

So if you're using docker-compose, replace the following line after upgrading DipDup, and you're ready to go:

docker-compose.yml

 services:
   hasura:
-    image: hasura/graphql-engine:v1.3.3
+    image: hasura/graphql-engine:v2.0.1

# BigMapDiff.action values renamed

Handlers of big_map indexes now receive all kinds of events, including big map allocation and removal. Items of the BigMapAction enumeration were renamed to avoid ambiguity (see the handler sketch after the list):

  • ALLOCATE -> ALLOCATE
  • ADD -> ADD_KEY
  • UPDATE -> UPDATE_KEY
  • REMOVE -> REMOVE_KEY
  • [missing] -> REMOVE
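
To illustrate, here's a rough sketch of a big_map handler branching on the new action values. The handler name, the loose typing, and the elided logic are placeholders: the actual stubs with precise generic typing are generated by dipdup init for your project.

from typing import Any

from dipdup.models import BigMapAction, BigMapDiff


async def on_ledger_update(ctx: Any, ledger: BigMapDiff) -> None:
    # Placeholder signature: real stubs are generated by `dipdup init`
    if ledger.action == BigMapAction.ALLOCATE:
        return  # the big map was just created, nothing to index yet
    if ledger.action in (BigMapAction.ADD_KEY, BigMapAction.UPDATE_KEY):
        ...  # upsert a row based on ledger.key / ledger.value
    elif ledger.action in (BigMapAction.REMOVE_KEY, BigMapAction.REMOVE):
        ...  # delete the affected row(s), or wipe everything on REMOVE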

# Migration

All the existing expressions containing BigMapAction items in your handlers will be updated once you run dipdup migrate.

# SQL scripts: on_restart and on_reindex

This change was introduced in DipDup 1.1.0, so if you use SQL scripts in your project, you might have already noticed it. Here are some details about the change:

  • Scripts from the sql/on_restart directory are executed each time you run your indexer. Those scripts may contain CREATE OR REPLACE VIEW or similar non-destructive operations.
  • Scripts from the sql/on_reindex directory are executed after the database schema is created from the models.py module, but before indexing starts. This may be useful for changing the database schema in ways Tortoise ORM cannot, e.g. to create a composite primary key:

sql/on_reindex/00-composite-key.sql

ALTER TABLE dex.trade DROP CONSTRAINT trade_pkey;
ALTER TABLE dex.trade ADD PRIMARY KEY (id, timestamp);
  • Both types of scripts are executed without being wrapped in a transaction. It's generally a good idea to avoid touching table data in scripts.
  • Scripts are executed in alphabetical order. If you're getting SQL engine errors, try to split large scripts into smaller ones.
  • SQL scripts are ignored when using the SQLite backend.

NOTE

The SQL snippet above prepares the table to be converted to a TimescaleDB "hypertable". TimescaleDB is a PostgreSQL-compatible database with advanced features for handling time-series data. And it works with DipDup, so give it a try.

# Migration

  • Run dipdup init to update the project structure.
  • Move existing SQL scripts from the sql directory to either sql/on_restart or sql/on_reindex depending on your case.

# Hasura improvements

# hasura configure CLI command

Applies Hasura configuration without restarting the indexer. By default, DipDup will merge existing Hasura metadata (queries, REST endpoints, etc.). Use the --reset option to wipe the metadata before configuring Hasura.

$ dipdup hasura configure [--reset]

# Camel case for field names

Developers from the JavaScript world may be more familiar with using camelCase for variable names instead of the snake_case Hasura uses by default. DipDup can now convert all field and table names automatically:

dipdup.yml

hasura:
  url: http://hasura:8080
  ...
  camel_case: True

Now this example query to the hic et nunc demo indexer...

query MyQuery {
  hic_et_nunc_token(limit: 1) {
    id
    creator_id
  }
}

...will become this one:

query MyQuery {
  hicEtNuncToken(limit: 1) {
    id
    creatorId
  }
}

All fields autogenerated by Hasura will be renamed accordingly: hic_et_nunc_token_by_pk to hicEtNuncTokenByPk, delete_hic_et_nunc_token to deleteHicEtNuncToken and so on. To return to defaults, set camel_case to False and run dipdup hasura configure again.

# REST endpoints

One of the most anticipated features of Hasura 2.0 is the ability to expose arbitrary GraphQL queries via REST endpoints. By default, DipDup will generate GET and POST endpoints that fetch rows by primary key for all tables available:

$ curl http://127.0.0.1:8080/api/rest/hicEtNuncHolder?address=tz1UBZUkXpKGhYsP5KtzDNqLLchwF4uHrGjw
{
  "hicEtNuncHolderByPk": {
    "address": "tz1UBZUkXpKGhYsP5KtzDNqLLchwF4uHrGjw"
  }
}

However, there's a limitation dictated by the way Hasura parses HTTP requests: only models with primary keys of basic types (int, string, and so on) can be fetched with GET requests. An attempt to fetch a model with a BIGINT primary key will lead to an error: Expected bigint for variable id got Number. A workaround for fetching any model is to send a POST request containing a JSON payload with a single key:

$ curl -d '{"id": 152}' http://127.0.0.1:8080/api/rest/hicEtNuncToken
{
  "hicEtNuncTokenByPk": {
    "creatorId": "tz1UBZUkXpKGhYsP5KtzDNqLLchwF4uHrGjw",
    "id": 152,
    "level": 1365242,
    "supply": 1,
    "timestamp": "2021-03-01T03:39:21+00:00"
  }
}

We hope to get rid of this limitation someday and will let you know as soon as it happens.

Now the interesting part. You can put any number of .graphql files into the graphql directory in your project root, and DipDup will generate REST endpoints for each of those queries. Let's say we want to fetch not only a specific token, but also the number of all tokens minted by its creator:

query token_and_mint_count($id: bigint) {
  hicEtNuncToken(where: {id: {_eq: $id}}) {
    creator {
      address
      tokens_aggregate {
        aggregate {
          count
        }
      }
    }
    id
    level
    supply
    timestamp
  }
}

Save this query as graphql/token_and_mint_count.graphql and run dipdup hasura configure. Now this query is available via a REST endpoint at http://127.0.0.1:8080/api/rest/token_and_mint_count.
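
For example, assuming the requests library is installed, calling the new endpoint from Python could look like this (since the id variable is a bigint, we pass it in a JSON body as described above):

import requests

response = requests.post(
    'http://127.0.0.1:8080/api/rest/token_and_mint_count',
    json={'id': 152},  # bigint variables go in the JSON body, as explained above
)
print(response.json())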

You can also disable creation of REST endpoints in the config:

dipdup.yml

hasura:
  ...
  rest: False

# Limitations for unauthorized users

DipDup creates a user role that is allowed to perform queries without authorization. Now you can limit the maximum number of rows such queries return and also disable aggregation queries that are automatically generated by Hasura:

hasura:
  url: http://hasura:8080
  ...
  select_limit: 100
  allow_aggregations: False

Note that with limits enabled, you have to use either offset-based or cursor-based pagination on the client side.
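
As an illustration, here's a minimal sketch of offset-based pagination over the plain GraphQL endpoint using the requests library; the table name is taken from the hic et nunc examples above (default snake_case naming), and the page size is assumed to stay within select_limit:

import requests

QUERY = '''
query PagedTokens($limit: Int!, $offset: Int!) {
  hic_et_nunc_token(order_by: {id: asc}, limit: $limit, offset: $offset) {
    id
  }
}
'''

offset, limit = 0, 100  # keep `limit` at or below the configured select_limit
while True:
    response = requests.post(
        'http://127.0.0.1:8080/v1/graphql',
        json={'query': QUERY, 'variables': {'limit': limit, 'offset': offset}},
    )
    rows = response.json()['data']['hic_et_nunc_token']
    if not rows:
        break
    # process `rows` here
    offset += limit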

# Scheduled jobs

In some cases, it may come in handy to be able to run some code on a schedule. For example, you may want to calculate some statistics once per hour, not on every block. Add the following section to the DipDup config:

jobs:
  midnight_stats:
    callback: calculate_stats
    crontab: "0 0 * * *"
    args:
      major: True
    atomic: True
  leet_stats:
    callback: calculate_stats
    interval: 1337  # in seconds
    args:
      major: False
    atomic: True

Run dipdup init to generate corresponding handlers in the jobs directory. You can use a single callback multiple times with different arguments.

When the atomic parameter is set, the job will be wrapped in a SQL transaction and rolled back in case of failure.
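
The callback for the config above could look roughly like this; the context type and signature are simplified here, since dipdup init produces the actual stub for you:

jobs/calculate_stats.py

from typing import Any


async def calculate_stats(ctx: Any, major: bool) -> None:
    # `major` comes from the `args` section of the job config.
    # With `atomic: True` the whole callback runs in a single SQL transaction
    # and is rolled back if an exception is raised.
    ...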

If you're not familiar with crontab syntax, there's an online service crontab.guru that will help you to build the desired expression.

# Datasources

# Coinbase datasource

A connector for the Coinbase Pro API. It provides the get_candles and get_oracle_data methods and may be useful for enriching indexes of DeFi contracts with off-chain data.

datasources:
  coinbase:
    kind: coinbase

Please note that you can't use Coinbase as an index datasource instead of TzKT. It can be accessed via the ctx.datasources mapping in either handlers or jobs. See the datasource compatibility docs for details.
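
For instance, inside a handler or a job callback the datasource configured above could be reached like this; the surrounding function is a placeholder, and the exact method signatures are described in the datasource compatibility docs:

from typing import Any


async def on_swap(ctx: Any, transaction: Any) -> None:
    # `coinbase` matches the datasource name from the config snippet above;
    # it exposes the get_candles and get_oracle_data methods mentioned in this section
    coinbase = ctx.datasources['coinbase']
    ...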

# Tuning API connectivity

All datasources now share the same code under the hood to communicate with the underlying APIs via HTTP. The configs of all datasources, as well as Hasura's, can contain an optional http section with any number of the following parameters set:

datasources:
  tzkt:
    kind: tzkt
    ...
    http:
      cache: True
      retry_count: 10
      retry_sleep: 1
      retry_multiplier: 1.2
      ratelimit_rate: 100
      ratelimit_period: 60
      connection_limit: 25
      batch_size: 10000
hasura:
  url: http://hasura:8080
  http:
    ...

Each datasource has its own defaults. Usually, there's no reason to alter these settings unless you use your own instance of TzKT or BCD.

By default, DipDup retries failed requests infinitely with an exponentially increasing delay between attempts. Set the retry_count parameter in order to limit the number of attempts.

The batch_size parameter is TzKT-specific. By default, DipDup limits requests to 10000 items, the maximum value allowed on the public instances provided by Baking Bad. Decreasing this value will reduce the time required for TzKT to process a single request and thus reduce the load. To achieve the same effect for multiple indexes synchronizing at once, reduce the connection_limit parameter.

# Docker deployment

# Base images available at DockerHub

Now you need only two lines in the Dockerfile to build a Docker image for your project:

FROM dipdup/dipdup:2.0.2
COPY demo_hic_et_nunc /home/dipdup/demo_hic_et_nunc

$ docker build . -t dipdup-project

# Generating inventory templates

A new command dipdup docker init is available to generate a compose-based setup.

$ dipdup [-c dipdup.yml] docker init [-i dipdup/dipdup] [-t 2.0.2] [-e dipdup.env]

The following files will be created:

docker
├── dipdup.env
├── dipdup.env.example
├── docker-compose.yml
└── Dockerfile

Environment files are generated using the substitution expressions (${VARIABLE:-default_value}) found in the DipDup configs provided through the dipdup -c option.
Now navigate to the created directory, edit the environment file, and run the project with docker-compose:

$ cd project/docker
$ nano dipdup.env
$ docker-compose up -d
$ docker-compose logs -f

By default, PostgreSQL and Hasura are exposed to localhost only, on ports 5432 and 8080 respectively. Edit the docker-compose.yml file according to your needs.

Finally, all the demo projects in DipDup have Docker templates generated. In order to spin up a demo, run docker-compose up in the <demo_project>/docker directory.

# Images with PyTezos included

In some cases you might need PyTezos to deal with Michelson values stored in big maps. Here's an example of how to unpack such values (without knowing the exact type):

from pytezos.michelson.micheline import blind_unpack

# A PACK'ed Micheline value as a hex string and its expected Michelson representation
packed_micheline = '05020000001e070401000000083631363136313631010000000a37333634363436343634'
expected_michelson = '{ Elt "61616161" "7364646464" }'

# Decode the hex payload and unpack it without providing a type expression
micheline_bytes = bytes.fromhex(packed_micheline)
michelson = blind_unpack(micheline_bytes)

assert expected_michelson == michelson, (expected_michelson, michelson)

Separate images containing preinstalled PyTezos are created on every release; those tags have a -pytezos postfix. To set up a local development environment, run make install PLUGINS=pytezos. Note that there are system dependencies required to install PyTezos. On Ubuntu-based distributions, run apt install libsodium-dev libsecp256k1-dev libgmp-dev (see more information in the PyTezos repo).

# Additional dependencies

Things get more complicated when your project requires additional Python dependencies. At Baking Bad we use Poetry for Python package management and encourage you to do the same. A typical pyproject.toml for a DipDup project will look like this:

[tool.poetry.dependencies]
python = "^3.8"
dipdup = "^2.0.0"
some_dependency = "^1.2.3"

To install dependencies missing from the base DipDup image, add the following lines to your Dockerfile:

FROM dipdup/dipdup:2.0.2

COPY pyproject.toml pyproject.toml
COPY poetry.lock poetry.lock
RUN sudo ./inject_pyproject

COPY demo_hic_et_nunc demo_hic_et_nunc

Note that your project will use the DipDup version included in the base image instead of the one specified in your pyproject.toml file.

# Single level rollbacks

It's important for DipDup to be able to handle chain reorgs, since reindexing from scratch leads to several minutes of downtime. Single-level rollbacks are now processed in the following way:

  • If the new block has the same set of operations as the replaced one, do nothing;
  • If the new block has all the operations from the replaced one AND several new operations, process those new operations;
  • If the new block is missing some operations from the replaced one, trigger full reindexing.

We'll continue to improve rollback handling; read the notes on the next steps below.

# Miscellaneous

# Sentry integration

Sentry is error-tracking software available both in the cloud and on-premise. It greatly improves the troubleshooting experience and requires nearly zero configuration. To start catching exceptions with Sentry in your project, add the following section to the dipdup.yml config:

sentry:
  dsn: https://...
  environment: dev

The Sentry DSN can be obtained from the web interface at Settings -> Projects -> project_name -> Client Keys (DSN). The cool thing is that if you catch an exception and suspect there's a bug in DipDup, you can share the event with us using a public link (created via the Share menu).

# Immune tables

There's a new option in PostgresDatabaseConfig to keep some tables from being dropped on reindexing.

database:
  kind: postgres
  ...
  immune_tables:
    - some_huge_table

Here's a real-world use case for this feature. Let's say you're developing an indexer for an NFT marketplace contract and have decided to store JSON fetched from IPFS in the PostgreSQL database. Now you've changed some application logic, which requires reindexing. But there's no need to drop the results of the IPFS lookups, as those lookups are "expensive" and will always return the same data.

Keep in mind that reindexing may be triggered by a change of the database schema as well as by a rollback or by hand. Immune tables won't be dropped in any of these cases, so you need to handle migrations of immune tables manually (or just drop them and let DipDup re-create and refill them from scratch).

# Protect typeclasses from being overwritten during init

In some cases, you may want to make manual changes to typeclasses and ensure they won't be lost on init. Let's say you want to reuse a typename for multiple contracts providing the same interface (like FA1.2 and FA2 tokens) but having different storage structures. You can comment out the differing fields which are not important for your index.

types/contract_typename/storage.py

# dipdup: ignore

...

class ContractStorage(BaseModel):
    some_common_big_map: Dict[str, str]
    # unique_big_map_a: Dict[str, str]
    # unique_big_map_b: Dict[str, str]

Files starting with # dipdup: ignore won't be overwritten on init.

# Auto-generated fully-typed Javascript SDKs

So far we have been mostly focused on the backend aspect of DipDup, but how to use it properly on the client side is also very important. There are numerous GraphQL clients: some of them are relatively lightweight, others try to cover as many cases as possible and are thus pretty heavy and complex.
In DipDup we are dealing with blockchain data (which is a special case in itself) and are also bound to Hasura engine specifics. Here are some important things to keep in mind when choosing a GraphQL library (and designing the front-end in general):

  • We will likely not need mutations (thus no need for complex state management)
  • Hasura subscriptions are actually live queries (thus no need for the query + subscribe-to-more pattern)

Also, remember that GraphQL queries are just POST requests, and subscriptions over WebSockets are powered by a simple standalone module that can be used as-is without complex wrapping.

However, there's a great solution with a perfect usability/complexity balance that we want to share with you. It's called GenQL, and basically all you have to do to create a ready-to-use, fully-typed SDK for your DipDup project is write a minimal package.json. The rest is done automatically. Isn't that just marvelous?

{
  "name": "%PACKAGE_NAME%",
  "version": "0.0.1",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "devDependencies": {
    "@genql/cli": "^2.6.0"
  },
  "dependencies": {
    "@genql/runtime": "2.6.0",
    "graphql": "^15.5.0"
  },
  "scripts": {
    "build": "genql --esm --endpoint %GRAPHQL_ENDPOINT% --output ./dist"
  }
}

# What's next?

Our main goals remain the same as at the 1.0 release:

  • Better reorg handling with "backward handlers";
  • Integration with mempool and metadata plugins written in Go;
  • More deployment options (Hasura Cloud, Heroku, AWS) and examples;
  • Ready-to-use client side libraries for our mempool and metadata indexers.

DipDup is a free, open-source project driven by your needs, fellow Tezos developers. Let us know what you think about the recent changes and our further plans! Come join the Baking Bad Telegram group, the #baking-bad channel in the tezos-dev Slack, and our Discord server.