Contributing

You can contribute in several ways:

Report problems with data

If you notice incorrect data on the website, you can help at several levels.

First, you can notify the DBnomics core team about the problem by creating a new issue and filling in the template named "Problem with data". This template contains placeholders that you can replace with real values. The goal is to provide as much detail as possible to help the DBnomics team investigate.

Then, if you'd like, you can try to solve the problem yourself. Once you have identified the source code repository of the fetcher, you can fork it and submit a merge request. We recommend doing this after discussing the problem with the DBnomics core team in the issue you created.

In any case, thank you for your contribution.

Validate data produced by a fetcher

Suppose you have just finished writing or fixing a fetcher and would like to check the validity of the data produced by convert.py. Run your fetcher if you have not already done so:

mkdir source-data json-data
python download.py source-data
python convert.py source-data json-data

Now install the validation script and run it:

pip install dbnomics-data-model
dbnomics-validate --all-series --all-observations --developer-mode json-data

Example output:

- Series "RBA/A3-4/AFROMOTD" at location AFROMOTD.tsv (line 3)
  Error code: duplicated-observations-period
  Message: Duplicated period
  Context:
    period: '2013-11-11'

- Series "RBA/A3-4/AFROMOTD" at location AFROMOTD.tsv (line 5)
  Error code: duplicated-observations-period
  Message: Duplicated period
  Context:
    period: '2013-11-12'

[...]

Encountered errors codes:
    - duplicated-observations-period: 12448

At the end of the output you'll find a summary giving the number of errors for each error code.
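The built-in summary counts errors per code only. To see which series account for most of the errors, the report can be tallied per series as well. The following is a minimal sketch, assuming the report has been saved to a text file and that its lines follow the layout of the example output above. The regular expressions and the `tally_errors` helper are hypothetical and not part of dbnomics-data-model.

```python
import re
import sys
from collections import Counter
from typing import Tuple

# Patterns inferred from the example output; the real format may differ.
SERIES_RE = re.compile(r'^- Series "([^"]+)"')
CODE_RE = re.compile(r"^\s*Error code:\s*(\S+)")


def tally_errors(report: str) -> Tuple[Counter, Counter]:
    """Return (errors per error code, errors per series) for a report."""
    per_code: Counter = Counter()
    per_series: Counter = Counter()
    current_series = None
    for line in report.splitlines():
        series_match = SERIES_RE.match(line)
        if series_match:
            # Remember the series so the next error code is attributed to it.
            current_series = series_match.group(1)
            continue
        code_match = CODE_RE.match(line)
        if code_match:
            per_code[code_match.group(1)] += 1
            if current_series is not None:
                per_series[current_series] += 1
    return per_code, per_series


if __name__ == "__main__" and len(sys.argv) > 1:
    # The report path is given as the first command-line argument.
    with open(sys.argv[1], encoding="utf-8") as report_file:
        codes, series = tally_errors(report_file.read())
    for name, count in series.most_common(10):
        print(f"{count:8d}  {name}")
```

Sorting by series makes it easier to tell whether a problem is confined to a few series or affects the whole dataset.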

The --developer-mode option displays all errors, including the non-fatal ones, to help you improve the quality of your fetcher. In production this option is omitted so that validation runs faster.

If your fetcher writes a huge quantity of data, you can remove the --all-series option to validate only a randomly chosen sample of series per dataset. You can also remove the --all-observations option to validate only a few observations per series.
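When switching between a full check and a quick sampled check, it can help to build the command from a few switches. The following is a minimal sketch using only the options mentioned on this page. The `build_validate_command` and `run_validation` helpers are hypothetical names introduced for illustration, and the sketch makes no assumption about the validator's exit code.

```python
import shlex
import subprocess
from typing import List


def build_validate_command(
    json_dir: str,
    all_series: bool = True,
    all_observations: bool = True,
    developer_mode: bool = True,
) -> List[str]:
    """Build the dbnomics-validate argument list for the chosen options."""
    command = ["dbnomics-validate"]
    if all_series:
        command.append("--all-series")
    if all_observations:
        command.append("--all-observations")
    if developer_mode:
        command.append("--developer-mode")
    command.append(json_dir)
    return command


def run_validation(json_dir: str, **options: bool) -> int:
    """Run the validator and return its exit code without raising on failure."""
    completed = subprocess.run(build_validate_command(json_dir, **options), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Quick pass for a large fetcher: sample series and observations.
    quick = build_validate_command("json-data", all_series=False, all_observations=False)
    print(shlex.join(quick))
```

A sampled run like this is useful while iterating on a fetcher. A full run with all three options remains the thorough check before submitting a merge request.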