Contributing

Create or maintain fetchers

Tasks

Report problems with data

If you notice wrong data on the website, you can help by contributing at different levels.

First of all, you can tell the DBnomics core team about the problem by creating a new issue and filling in the template named "Problem with data". This template contains placeholders that you can replace with real values. The idea is to give as many details as possible to help the DBnomics team investigate!

Then you can try to solve the issue yourself if you'd like to. Once you have identified the source code repository of the fetcher, you can fork it and submit a merge request. We recommend doing that after a discussion with the DBnomics core team on the issue you created.

In any case, thank you for your contribution!

Validate data produced by a fetcher

Suppose you just finished writing or fixing a fetcher. Now you'd like to check the validity of data produced by convert.py. Run your fetcher if not already done:

mkdir source-data json-data
python download.py source-data
python convert.py source-data json-data

Now install the validation script and run it:

pip install dbnomics-data-model
dbnomics-validate --all-series --all-observations --developer-mode json-data

Example output:

- Series "RBA/A3-4/AFROMOTD" at location AFROMOTD.tsv (line 3)
  Error code: duplicated-observations-period
  Message: Duplicated period
  Context:
    period: '2013-11-11'

- Series "RBA/A3-4/AFROMOTD" at location AFROMOTD.tsv (line 5)
  Error code: duplicated-observations-period
  Message: Duplicated period
  Context:
    period: '2013-11-12'

[...]

Encountered errors codes:
    - duplicated-observations-period: 12448

At the end of the output you'll find a summary of the count of errors by type.
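When the validator reports duplicated-observations-period, as in the example output above, it can help to list the offending periods of a single series file before fixing convert.py. The following is a minimal sketch, not part of dbnomics-data-model: the function name find_duplicated_periods is hypothetical, and it assumes that the TSV file starts with a header row and stores the period in the first column.

```python
import csv
from collections import Counter
from pathlib import Path


def find_duplicated_periods(tsv_path):
    """Return the sorted periods that appear more than once in a series TSV file.

    Assumption: the first row is a header and the first column holds the period.
    """
    with Path(tsv_path).open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader, None)  # skip the assumed header row
        counts = Counter(row[0] for row in reader if row)
    return sorted(period for period, n in counts.items() if n > 1)


if __name__ == "__main__":
    import sys

    # Usage: python find_duplicates.py path/to/SERIES.tsv [...]
    for path in sys.argv[1:]:
        for period in find_duplicated_periods(path):
            print(f"{path}: duplicated period {period}")
```

This check only looks at one file at a time and only at duplicated periods; the validation script remains the reference for all other error codes.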

The --developer-mode option displays all errors, including the non-fatal ones, to help you improve the quality of your fetcher. In production, this option is left out in order to speed up validation.

If your fetcher writes a huge quantity of data, you can remove the --all-series option to validate only a randomly chosen sample of series per dataset. You can also remove the --all-observations option to validate only a few observations per series.
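If you switch between full and sampled validation often, a small wrapper can assemble the command line for you. This is only a sketch: build_validate_command and run_validation are hypothetical helper names, and the only flags used are the three described above. It assumes dbnomics-validate is installed and on your PATH.

```python
import subprocess


def build_validate_command(json_dir, all_series=True, all_observations=True,
                           developer_mode=True):
    """Build the dbnomics-validate argument list for a converted data directory."""
    cmd = ["dbnomics-validate"]
    if all_series:
        cmd.append("--all-series")  # validate every series, not a sample
    if all_observations:
        cmd.append("--all-observations")  # validate every observation per series
    if developer_mode:
        cmd.append("--developer-mode")  # also report non-fatal errors
    cmd.append(json_dir)
    return cmd


def run_validation(json_dir, **options):
    """Run the validator and return its exit code.

    check=False is used because this sketch makes no assumption about
    which exit codes the validator uses.
    """
    return subprocess.run(build_validate_command(json_dir, **options),
                          check=False).returncode
```

For a large fetcher, run_validation("json-data", all_series=False, all_observations=False) would run the sampled validation while keeping --developer-mode.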

Run a local instance of DBnomics

See dbnomics-docker.