
Fetchers

What is a fetcher?

In DBnomics, data is acquired by fetchers: small programs that download data from a provider's infrastructure and convert it to a common data model and format.

There is one fetcher per data provider.

Here is a diagram that shows the main steps of a fetcher:

flowchart LR
    provider@{ shape: cloud, label: "Data Provider<br/>(e.g. OECD)" }

    subgraph fetcher["Fetcher"]
        direction LR
        download["Download"]
        source_data[("Source<br/>data")]
        convert["Convert"]
        converted_data[("Converted<br/>data")]
    end

    style fetcher fill:#ffffff,stroke:#000000,stroke-dasharray: 3 3

    download -->|download| provider
    download -->|write| source_data
    convert -->|read| source_data
    convert -->|write| converted_data
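The two steps in the diagram can be sketched as two functions that communicate only through files on disk: one writes the source data, the other reads it and writes the converted data. This is an illustrative Python sketch, not code from any DBnomics fetcher. The function names, the `fake_fetch` stand-in for an HTTP request, the CSV columns `series_code,period,value` and the simplified `series.jsonl` output are all hypothetical and do not describe the actual DBnomics data model.

```python
"""Minimal sketch of a fetcher's two steps: download, then convert.

Hypothetical assumptions: the provider serves one CSV file with columns
series_code, period, value; the converted output is a simplified JSON Lines
file of series. Neither matches the real DBnomics data model exactly.
"""
from __future__ import annotations

import csv
import json
import tempfile
from pathlib import Path
from typing import Callable


def download(fetch: Callable[[], bytes], source_dir: Path) -> Path:
    """Fetch raw bytes from the provider and store them unchanged as source data."""
    source_dir.mkdir(parents=True, exist_ok=True)
    source_file = source_dir / "data.csv"
    source_file.write_bytes(fetch())
    return source_file


def convert(source_dir: Path, target_dir: Path) -> Path:
    """Read the source data and write it in a common (here: simplified) format."""
    series: dict[str, list[list]] = {}
    with (source_dir / "data.csv").open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Group observations (period, value) by series code.
            observations = series.setdefault(row["series_code"], [])
            observations.append([row["period"], float(row["value"])])

    target_dir.mkdir(parents=True, exist_ok=True)
    target_file = target_dir / "series.jsonl"
    with target_file.open("w", encoding="utf-8") as f:
        # One JSON object per series, sorted by code for stable output.
        for code, observations in sorted(series.items()):
            f.write(json.dumps({"code": code, "observations": observations}) + "\n")
    return target_file


if __name__ == "__main__":

    def fake_fetch() -> bytes:
        # Hypothetical stand-in for an HTTP request to the provider.
        return b"series_code,period,value\nA.FR,2023,1.5\nA.FR,2024,1.7\nA.DE,2024,0.9\n"

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        download(fake_fetch, root / "source-data")
        out = convert(root / "source-data", root / "converted-data")
        print(out.read_text(encoding="utf-8"))
```

Keeping the raw download separate from the conversion means conversion can be rerun, for example after a bug fix, without downloading the data again.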

The source code of the fetchers is hosted in a GitLab group: https://git.nomics.world/dbnomics-fetchers. Each repository is dedicated to a fetcher, for example https://git.nomics.world/dbnomics-fetchers/insee-fetcher.

To write a new fetcher or maintain an existing one, see the Contributing page.

Fetcher pipeline

The fetcher pipeline is the part of the DBnomics infrastructure that runs fetchers and makes their output available on the DBnomics website and Web API.

The source code of the fetcher pipeline is available in the dbnomics-fetcher-pipeline repository.

Fetcher pipelines are scheduled regularly (daily by default) in order to keep DBnomics data up to date.

flowchart LR
    scheduler["GitLab scheduler"]

    subgraph fetcher["Fetcher domain"]
        direction LR
        download["Download job"]
        source_data[("Source data")]
        convert["Convert job"]
        converted_data[("Converted data")]
    end

    index["Index job"]
    validate["Data validation job"]
    deploy["Deploy job"]

    style fetcher fill:#ffffff,stroke:#000000,stroke-dasharray: 6 4

    scheduler --> download
    download --> convert
    convert --> index
    convert --> validate
    download -.->|push| source_data
    convert -.->|push| converted_data
    index --> deploy
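The job dependencies in the diagram form a directed acyclic graph, so a valid execution order can be derived with a topological sort. The following Python sketch only models the edges shown in the diagram using the standard `graphlib` module; it is not how dbnomics-fetcher-pipeline is implemented (the jobs are run by GitLab), and the short job keys such as `validate` are illustrative names for the diagram's boxes.

```python
"""Model of the fetcher pipeline's job dependencies (illustrative only).

Encodes the edges of the pipeline diagram and groups jobs into stages
that could run one after another. Job keys are shortened, hypothetical names.
"""
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it starts.
PIPELINE = {
    "download": set(),
    "convert": {"download"},
    "index": {"convert"},
    "validate": {"convert"},  # the "Data validation job"
    "deploy": {"index"},
}


def execution_stages(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into stages; jobs in the same stage do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(execution_stages(PIPELINE), start=1):
        print(number, stage)
```

With the edges above, index and validation land in the same stage because both depend only on conversion, and deployment waits for indexing alone, as in the diagram.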

Dashboard

The dashboard shows the status of the latest pipeline executions for each fetcher.

[Screenshot: DBnomics dashboard]

Its source code is available in the dbnomics-dashboard repository.
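Showing "the status of the latest pipeline executions for each fetcher" amounts to keeping, per fetcher, the most recent run. The Python sketch below illustrates that selection under a hypothetical data model: the `PipelineRun` record, its fields and the status strings are assumptions and are not taken from dbnomics-dashboard.

```python
"""Pick the latest pipeline execution of each fetcher, as a dashboard might.

Hypothetical data model: each run record has a fetcher name, a start time
and a status string. The real dashboard's data source and fields may differ.
"""
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PipelineRun:
    fetcher: str
    started_at: datetime
    status: str  # hypothetical values, e.g. "success" or "failed"


def latest_status(runs: list[PipelineRun]) -> dict[str, str]:
    """Return {fetcher: status of its most recent run}, sorted by fetcher name."""
    latest: dict[str, PipelineRun] = {}
    for run in runs:
        current = latest.get(run.fetcher)
        # Keep the run with the most recent start time for each fetcher.
        if current is None or run.started_at > current.started_at:
            latest[run.fetcher] = run
    return {name: run.status for name, run in sorted(latest.items())}


if __name__ == "__main__":
    runs = [
        PipelineRun("insee", datetime(2024, 1, 1), "failed"),
        PipelineRun("insee", datetime(2024, 1, 2), "success"),
        PipelineRun("oecd", datetime(2024, 1, 1), "failed"),
    ]
    print(latest_status(runs))
```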