Design goals
Redistribute data from providers as-is
We want our users to be aware that the data found on DBnomics stays faithful to the provider data. At the same time, we want to spare our users from dealing with the specifics of each provider's data representation.
As a consequence, DBnomics distinguishes data from its format, and simplifies format only.
If DBnomics simplified or harmonized provider data, it would require more manual work (i.e. data curation), which is incompatible with DBnomics automatic data fetching (see next section). It would also make it impossible for users to know what the original provider data was. Data curation is therefore left to the user.
The following items are kept as-is from the provider:
- time series and their observations
- dataset dimensions: DBnomics does not harmonize dimension names and values.
- NA (non-available) values usage: DBnomics does not add or remove them. If a provider distributes a time series with an incomplete calendar (with some missing periods), DBnomics does not try to complete it.
However, some data formatting is harmonized:
- periods: some providers use different codes to represent them (202001, 2020M01 for January, 2020). DBnomics always uses 2020-01. See below for all period formats.
- NA (non-available) values: some providers use NaN, some others -9999, etc. DBnomics always uses NA.
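As a minimal sketch of the NA harmonization described above, the following Python snippet maps provider-specific missing-value markers to NA and leaves every other value untouched. The names `PROVIDER_NA_MARKERS` and `normalize_value` are hypothetical and chosen for illustration; a real fetcher would define its markers from its knowledge of the provider.

```python
# Hypothetical set of markers: each fetcher would define its own,
# based on what its provider actually uses.
PROVIDER_NA_MARKERS = {"NaN", "-9999"}


def normalize_value(raw: str) -> str:
    """Return "NA" for a provider NA marker, else the value unchanged."""
    value = raw.strip()
    return "NA" if value in PROVIDER_NA_MARKERS else value


# Observations as (period, value) pairs, as a provider might distribute them.
observations = [("2020-01", "1.5"), ("2020-02", "-9999"), ("2020-03", "NaN")]

# Only the representation of NA changes: no observation is added or removed.
normalized = [(period, normalize_value(value)) for period, value in observations]
```

Note that the list keeps the same length: only the way NA is written changes, in line with the as-is principle.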
Some providers distribute time series with no observation, or with only NA values, and DBnomics keeps them as-is as well. Here are some examples:
Update data regularly
We want up-to-date data on DBnomics, so data has to be updated automatically.
Data acquisition is done by small programs called DBnomics fetchers which are run automatically by the DBnomics platform.
Any manual data acquisition (e.g. copy-pasting values from a spreadsheet) would lead to outdated data.
We also want to keep track of the execution of fetchers, and that's why we have a dashboard.
Keep versions of provider data
Access data from programming languages
- Python (cf download data page)
- R
- Julia
- Stata
Access data from external software
- Gretl with dbnomics addon
- LibreOffice Calc (cf download data page)
Harmonized data model
Period format
Dimensions are provided as-is from provider data.
Period format is normalized:
- YYYY for years
- YYYY-MM for months (e.g. 2000-01, 2000-11)
- YYYY-MM-DD for days (MUST be padded for MM and DD)
- YYYY-Q[1-4] for year quarters
  - example: 2018-Q1 represents jan to mar 2018, and 2018-Q4 represents oct to dec 2018
- YYYY-S[1-2] for year semesters (aka bi-annual, semi-annual)
  - example: 2018-S1 represents jan to jun 2018, and 2018-S2 represents jul to dec 2018
- YYYY-B[1-6] for pairs of months (aka bi-monthly)
  - example: 2018-B1 represents jan + feb 2018, and 2018-B6 represents nov + dec 2018
- YYYY-W[01-53] for year weeks (MUST be padded)
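The formats listed above can be checked mechanically. The following is a sketch using regular expressions; the names `PERIOD_PATTERNS` and `period_frequency` are hypothetical and not part of DBnomics. It only checks the shape of the string, so it does not verify, for instance, that a given day exists in its month.

```python
import re
from typing import Optional

# One pattern per normalized period format (hypothetical helper table).
PERIOD_PATTERNS = {
    "annual": re.compile(r"[0-9]{4}"),
    "monthly": re.compile(r"[0-9]{4}-(0[1-9]|1[0-2])"),
    "daily": re.compile(r"[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])"),
    "quarterly": re.compile(r"[0-9]{4}-Q[1-4]"),
    "semi-annual": re.compile(r"[0-9]{4}-S[12]"),
    "bi-monthly": re.compile(r"[0-9]{4}-B[1-6]"),
    "weekly": re.compile(r"[0-9]{4}-W(0[1-9]|[1-4][0-9]|5[0-3])"),
}


def period_frequency(period: str) -> Optional[str]:
    """Return the name of the matching format, or None if none matches."""
    for name, pattern in PERIOD_PATTERNS.items():
        # fullmatch ensures the whole string has the expected shape.
        if pattern.fullmatch(period):
            return name
    return None
```

With this sketch, an unpadded value such as 2018-W7 or 2000-1 matches no pattern, which reflects the padding requirement.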
Normalization is done by each fetcher based on the knowledge of the provider data.
For example, a period like 2000-qII would be normalized as 2000-Q2 by the conversion script of the fetcher.
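To illustrate this kind of per-provider normalization, here is a sketch of a conversion function that handles the provider codes mentioned on this page (202001, 2020M01, 2000-qII). The function name `normalize_period` and the set of accepted input shapes are assumptions made for the example; a real fetcher only needs to handle the formats its own provider uses.

```python
import re

# Roman numerals used by the hypothetical provider for quarters.
ROMAN_QUARTERS = {"I": 1, "II": 2, "III": 3, "IV": 4}


def normalize_period(raw: str) -> str:
    """Convert a provider period code to the normalized period format."""
    # 202001 or 2020M01 -> 2020-01
    match = re.fullmatch(r"([0-9]{4})M?(0[1-9]|1[0-2])", raw)
    if match:
        return f"{match.group(1)}-{match.group(2)}"
    # 2000-qII -> 2000-Q2
    match = re.fullmatch(r"([0-9]{4})-q(I|II|III|IV)", raw)
    if match:
        return f"{match.group(1)}-Q{ROMAN_QUARTERS[match.group(2)]}"
    # Fail loudly rather than guess at an unknown format.
    raise ValueError(f"Unsupported period format: {raw!r}")
```

Raising an error on unknown input is a design choice of this sketch: silently passing an unrecognized code through would let non-normalized periods slip into the converted data.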
Note: when the periods of a time series use the daily format but the series actually has a lower frequency (e.g. monthly), the period format is simplified to match the frequency. For example, periods like 2000-01-01, 2000-02-01, 2000-03-01 are simplified as 2000-01, 2000-02, 2000-03, but periods like 2000-01-15, 2000-02-15, 2000-03-15 can't be simplified because we would lose the day information they convey.
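The rule in this note can be sketched as follows: daily periods are shortened to monthly ones only when every period falls on the first day of a month. The function name `simplify_daily_to_monthly` is hypothetical, and the sketch assumes the caller already knows the series is monthly; only the daily-to-monthly case from the example is covered.

```python
from typing import List


def simplify_daily_to_monthly(periods: List[str]) -> List[str]:
    """Shorten YYYY-MM-DD periods to YYYY-MM when no information is lost."""
    # Simplify only if every period is the first day of its month;
    # otherwise the day carries information and must be kept.
    if periods and all(len(p) == 10 and p.endswith("-01") for p in periods):
        return [p[:7] for p in periods]
    return list(periods)
```

For instance, 2000-01-01, 2000-02-01, 2000-03-01 would become 2000-01, 2000-02, 2000-03, while 2000-01-15, 2000-02-15, 2000-03-15 would be returned unchanged.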
Support different data models
DBnomics defines a data model inspired by SDMX, which has to be compatible with all supported providers, even if their own data model is not SDMX-compliant.
As a consequence, the DBnomics data model defines some hard constraints, while other constraints have to be soft (cf data model).
Single instance
- it is technically possible to install new instances of DBnomics
- but it is preferable to have a single public instance, so that users do not have to search across many of them
- DBnomics was not designed as federated software