DipDup is a framework for building selective indexers for Tezos DApps. It helps to reduce boilerplate code and lets developers focus on what's important — the business logic. It works on top of TzKT API, which provides normalized and humanized blockchain data via REST and WebSocket endpoints. This article will guide you through the recent DipDup changes with code snippets and demo samples.
As time goes by, more and more cool projects on the Tezos blockchain choose DipDup as a backend solution. Besides being a joyful event for us, this also reveals new challenges for the framework.
Today we are proud to introduce the next major DipDup version. This time it's marked as a pre-release, which means we will continue to support the 2.0 branch until the release of the stable version. If you're asking yourself, "Should I upgrade now?" the answer is simple. There are three reasons not to wait for the stable 3.0 release:
- You are experiencing issues when using index factories (processing originations matched by `source`/`similar_to` fields)
- You need to conveniently execute lots of SQL scripts and scheduled jobs
- You just want to be an early adopter and provide some valuable feedback 😃
In any case, think twice before using this version in production environments. Since almost every change in this version breaks backward compatibility, there will be no separate "Breaking Changes" paragraph this time. Instead, look for the warning ⚠ emoji in a paragraph header to know whether any action on your side is needed to perform the migration.
# New entity: Hooks
Before version 3.0.0-rc1, every project had two handlers called "default": `on_configure`, fired before indexing starts, and `on_rollback`, fired when a TzKT datasource receives a reorg message. In addition, arbitrary SQL scripts from the `sql/on_restart` and `sql/on_reindex` project directories could be executed on restart and reindex, respectively.
Later we realized there are some flaws in this approach:
- "Default handlers" are not exactly handlers since they are not linked to any index.
- Adding new events when needed could be painful.
- There is no way to invoke SQL scripts from handlers and jobs.
- Jobs are very similar to default handlers and SQL scripts: arbitrary code executed on a specific event (on schedule, in this case).
To solve these problems, we decided to significantly redesign this part of the framework and introduce hooks. Hooks are user-defined callbacks called either from the `ctx.fire_hook` method or by the scheduler (the `jobs` config section; we'll return to this topic later).
Let's assume we want to calculate some statistics on-demand to avoid blocking an indexer with heavy computations. Add the following lines to DipDup config:
```yaml
hooks:
  calculate_stats:
    callback: calculate_stats
    atomic: False
    args:
      major: bool
      depth: int
```
A couple of things here to pay attention to:
- The `atomic` option defines whether the hook callback will be wrapped in a single SQL transaction or not. If this option is set to true, the main indexing loop will be blocked until hook execution is complete. Some statements, like `REFRESH MATERIALIZED VIEW`, do not need to be wrapped in transactions, so choosing the right value of the `atomic` option can decrease the time needed to perform initial indexing.
- Values of the `args` mapping are used as type hints in the signature of the generated callback. We will return to this topic later in this article.
Now it's time to call `dipdup init`. The following files will be created in the project's root:
```
├── hooks
│   └── calculate_stats.py
└── sql
    └── calculate_stats
        └── .keep
```
Content of the generated callback stub:
```python
from dipdup.context import HookContext


async def calculate_stats(
    ctx: HookContext,
    major: bool,
    depth: int,
) -> None:
    await ctx.execute_sql('calculate_stats')
```
By default, hooks execute SQL scripts from the corresponding subdirectory of `sql`. Remove or comment out the `execute_sql` call to prevent this. This way, both Python and SQL code may be executed in a single hook if needed.
# ⚠ Default handlers require manual migration
Now it's time to get rid of deprecated "default handlers". Here's a mapping of old and new callbacks for internal DipDup events:
| handlers (old) | sql | hooks (new) |
|---|---|---|
| on_configure | on_restart | on_restart |
| | on_reindex | on_reindex |
| on_rollback | | on_rollback |
Perform the following actions:
- If you have any custom logic implemented in default handlers, move it to the corresponding hooks, using the table above to find the right destination.
- Remove default handlers from the project's `handlers` directory. The `sql` directory can be left as it is.
As in previous releases, an unprocessed rollback leads to reindexing. Other events have no default action.
# ⚠ `jobs` become schedules for hooks
Since we already have an entity for user-defined callbacks (both Python and SQL ones), `jobs` can refer to hooks without having their own callbacks.
```yaml
jobs:
  daily_cron_stats:
    hook: calculate_stats
    crontab: 0 0 * * * *
    args:
      major: True
      depth: 9000
  leet_interval_stats:
    hook: calculate_stats
    interval: 1337
    args:
      major: False
      depth: 1
```
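Conceptually, a job is now just a schedule that fires an existing hook. The interval-based flavor can be sketched in a few lines; this is not DipDup's actual scheduler, and `fake_fire_hook` is a stand-in for `ctx.fire_hook`:

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def run_interval_job(
    fire_hook: Callable[..., Awaitable[None]],
    hook_name: str,
    interval: float,
    args: Dict[str, Any],
    iterations: int,
) -> None:
    # Fire the given hook every `interval` seconds, `iterations` times.
    for _ in range(iterations):
        await fire_hook(hook_name, **args)
        await asyncio.sleep(interval)


# Demo with a recording stand-in for ctx.fire_hook:
calls = []


async def fake_fire_hook(name: str, **kwargs: Any) -> None:
    calls.append((name, kwargs))


asyncio.run(run_interval_job(fake_fire_hook, 'calculate_stats', 0.01, {'major': False, 'depth': 1}, 3))
print(calls[0])  # ('calculate_stats', {'major': False, 'depth': 1})
```

The crontab flavor works the same way, except the delay between firings is derived from the cron expression.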
If you already had job callbacks implemented in your project before updating to 3.0.0, you should convert those callbacks to hooks manually:
- Comment out the `jobs` section in config. Add new items to the `hooks` section.
- Call `dipdup init` to update the project structure and generate callback stubs.
- Move code from old job callbacks to new hook callbacks.
- Remove the `jobs` directory from your project's root.
- Restore the `jobs` section in config, describing schedules for the freshly created hooks as in the example above.
# Argument type checking
DipDup will ensure that arguments passed to the hooks have correct types when possible; a `CallbackTypeError` exception will be raised otherwise. Values of an `args` mapping in a hook config should be either built-in types or the `__qualname__` of an external type like `decimal.Decimal`. Generic types are not supported: hints like `Optional[int] = None` will be correctly parsed during codegen but ignored during type checking.
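The idea behind such a check is straightforward. Here is a simplified illustration of it — not DipDup's actual implementation, and `CallbackTypeError` is redefined locally as a stand-in:

```python
from typing import Any, Dict, Type


class CallbackTypeError(Exception):
    """Stand-in for DipDup's exception of the same name."""


def verify_arguments(args: Dict[str, Any], hints: Dict[str, Type]) -> None:
    # Compare each passed value against the type declared in the hook config.
    for name, value in args.items():
        hint = hints.get(name)
        if hint is None:
            continue  # no hint for this argument — nothing to check
        if not isinstance(value, hint):
            raise CallbackTypeError(
                f'Argument `{name}` has type {type(value).__name__}, expected {hint.__name__}'
            )


hints = {'major': bool, 'depth': int}
verify_arguments({'major': True, 'depth': 9000}, hints)  # passes silently
try:
    verify_arguments({'major': 'yes', 'depth': 1}, hints)
except CallbackTypeError as e:
    print(e)  # prints the mismatch message
```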
# Context (`ctx`)
A brief reminder of what context is: the first argument of every callback in a DipDup project is called a context. Hook and handler callbacks receive instances of `dipdup.context.HookContext` and `dipdup.context.HandlerContext`, respectively. For now, these classes mostly share the same helper methods.
# ⚠ `add_contract` and `add_index` methods return coroutines
This change aims to save contracts and indexes spawned from within factories as soon as possible and thus correctly maintain the state of index factories.
# ⚠ `commit` and `reset` methods removed
Those methods were used to notify DipDup that the config had been modified during callback execution and it was time to spawn missing indexes. Now the only correct way to add a new index at runtime is to call the `add_index` method. Be careful! Modifying config via `ctx.config` is not explicitly forbidden (this requirement is hard to enforce without extra CPU ticks), but adding a new item to the `indexes` section will have no effect.
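In practice this means index-spawning code must now be awaited. A sketch with a stand-in context — `FakeContext`, `registry_template`, and the `address` value are hypothetical names for illustration; consult your generated project for the real call signature:

```python
import asyncio
from typing import Any, Dict, List, Tuple


class FakeContext:
    """Stand-in for dipdup.context.HandlerContext, recording spawned indexes."""

    def __init__(self) -> None:
        self.spawned: List[Tuple[str, str, Dict[str, Any]]] = []

    async def add_index(self, name: str, template: str, values: Dict[str, Any]) -> None:
        # In DipDup 3.0 this is a coroutine, so index factory state can be
        # persisted as soon as possible — hence the `await` at the call site.
        self.spawned.append((name, template, values))


async def on_origination(ctx: FakeContext) -> None:
    # Hypothetical template and values, for illustration only:
    await ctx.add_index('registry_kt1abc', 'registry_template', {'address': 'KT1...'})


ctx = FakeContext()
asyncio.run(on_origination(ctx))
print(ctx.spawned[0][0])  # registry_kt1abc
```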
# New methods: `fire_hook`, `execute_sql`
You can trigger hook execution either from a handler callback or by a job schedule. Or even from another hook if you're brave enough.

```python
ctx.fire_hook('calculate_stats', major=True, depth=1)
```

The same applies to the `execute_sql` method.

```python
ctx.execute_sql('calculate_stats')
```
The `execute_sql` argument can be either the name of a file or directory inside the `sql` project directory, or an absolute/relative path. If the path is a directory, all scripts with the `.sql` extension within it will be executed in alphabetical order.
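The lookup can be pictured like this — a simplified sketch of the resolution logic, not DipDup's actual code:

```python
import tempfile
from pathlib import Path
from typing import List


def collect_sql_scripts(path: Path) -> List[Path]:
    # A single file is executed as-is; a directory yields its *.sql files
    # in alphabetical order. Non-SQL files are ignored.
    if path.is_file():
        return [path]
    return sorted(path.glob('*.sql'))


# Demo on a throwaway directory:
root = Path(tempfile.mkdtemp())
(root / '01_views.sql').write_text('REFRESH MATERIALIZED VIEW stats;')
(root / '00_tables.sql').write_text('CREATE TABLE IF NOT EXISTS stats ();')
(root / 'notes.txt').write_text('ignored')

print([p.name for p in collect_sql_scripts(root)])  # ['00_tables.sql', '01_views.sql']
```

Prefixing script names with numbers, as above, is a convenient way to control execution order.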
# Hasura
# ⚠ Hasura integration requires `schema_name` to be `public`
The current version of Hasura GraphQL Engine treats `public` and other schemas differently. A table `schema.customer` becomes the `schema_customer` root field (or `schemaCustomer` if the `camel_case` option is enabled in DipDup config), while a table `public.customer` becomes the `customer` field, without a schema prefix. There's no way to remove this prefix for now; you can track the related issue on Hasura's GitHub to know when the situation changes. Since 3.0.0-rc1, DipDup enforces the `public` schema to avoid ambiguity and issues with the GenQL library. You can still use any schema name if Hasura integration is not enabled.
# Internal models
The internal table `dipdup_state`, used by DipDup to keep track of its own state, was removed. Four new models come to replace it:
| model | table | description |
|---|---|---|
| dipdup.models.Schema | dipdup_schema | Hash of the database schema to detect changes that require reindexing. |
| dipdup.models.Index | dipdup_index | Indexing status, level of the latest processed block, template, and template values if applicable. Relates to Head when status is REALTIME (see dipdup.models.IndexStatus for possible values of the status field). |
| dipdup.models.Head | dipdup_head | The latest block received by a datasource from a WebSocket connection. |
| dipdup.models.Contract | dipdup_contract | Nothing useful for us humans. It helps DipDup to keep track of dynamically spawned contracts. A Contract with the same name from the config takes priority over one from this table if {any, exists, provided?}. |
With the help of these tables, you can set up monitoring of a DipDup deployment to know when something goes wrong:

```sql
SELECT NOW() - timestamp FROM dipdup_head;
```
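Alerting on top of `dipdup_head` boils down to comparing the stored timestamp against the clock. A minimal sketch — the two-minute threshold is an arbitrary assumption, not a DipDup recommendation:

```python
from datetime import datetime, timedelta, timezone


def head_is_stale(
    head_timestamp: datetime,
    now: datetime,
    threshold: timedelta = timedelta(minutes=2),
) -> bool:
    # dipdup_head stores the timestamp of the latest block received over
    # WebSocket; a large lag means the indexer has likely stalled.
    return now - head_timestamp > threshold


now = datetime(2021, 10, 1, 12, 0, tzinfo=timezone.utc)
print(head_is_stale(now - timedelta(seconds=30), now))  # False
print(head_is_stale(now - timedelta(minutes=10), now))  # True
```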
# Index factories
# ⚠ `stateless` config option is removed
Index factories are now processed the same way as regular indexes. DipDup will apply the following logic while restoring the states of indexes on restart:
- Regular index: verify config hash and continue indexing
- Templated index: recreate index config from the template using saved values, verify config hash
- Templated index, but a template is missing: reindex
- Regular index, but missing in config: ignore (maybe it's just commented out for a while)
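The decision tree above can be sketched as a single function — a simplified model of the restore logic described in this section, not DipDup's actual code:

```python
from typing import Optional


def restore_action(in_config: bool, template: Optional[str], template_exists: bool) -> str:
    # Decide what to do with an index state saved in the database on restart.
    if template is not None:
        if not template_exists:
            return 'reindex'           # templated index, but template is missing
        return 'verify_and_continue'   # recreate from template, verify config hash
    if not in_config:
        return 'ignore'                # maybe it's just commented out for a while
    return 'verify_and_continue'       # regular index: verify config hash, continue


print(restore_action(True, None, False))        # verify_and_continue
print(restore_action(False, None, False))       # ignore
print(restore_action(True, 'registry', False))  # reindex
```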
# Miscellaneous
- ⚠ `first_block`/`last_block` fields were renamed to `first_level` and `last_level` respectively (used with the `--oneshot` CLI flag only).
- ⚠ The `init` command no longer overwrites typeclasses that have already been generated. Use the `--overwrite-types` flag if that's not the desired behavior.
- A long-awaited fix for graceful shutdown. No more ugly stack traces on SIGTERM 🎉
- SQL scripts are executed with one transaction per statement. Queries that need to run in a single transaction can now be put in the same file.
- Exceptions that occur during job callback execution are now considered critical and lead to a DipDup crash.
- Fixed an issue where views and some other database entities survived reindexing.
- If callback execution takes longer than one second, a warning will be printed. Increase the level of the `dipdup.callbacks` logger to print it every time.
# ⚠ Known issues
Multiple issues related to the WebSocket connection have been reported. TzKT outages are not processed correctly. We are aware of these issues and will fix them as soon as possible. DipDup crashes caused by WebSocket issues do not corrupt already-indexed data, so a simple restart of the application is enough.
# What's next?
- The most critical task is the ability to subscribe to operations by entrypoint rather than by specific addresses. This change should drastically reduce the load on the TzKT server for index factories with hundreds of originations.
- Rollbacks of more than one block are infrequent but inevitable. We are going to implement the hotswap of database schemas to preserve data processed before rollback until reindexing is complete.
- Support streaming replication to make DipDup more scalable.
- Support sending transactions from DipDup in addition to indexing them. This is not a 20-minute adventure, so no ETA yet.
DipDup is a free, open-source project driven by the needs of you, fellow Tezos developers. Let us know what you think about the recent changes and our further plans! Join the Baking Bad Telegram group, the #baking-bad channel in the tezos-dev Slack workspace, and our Discord server.