Data model
Entities
The DBnomics data model is defined by the following entities:
Provider
code: to have a well-defined URL, a provider MUST have a code (example)
Dataset
code: to have a well-defined URL, a dataset MUST have a code (example)
Dataset releases
Sometimes providers publish datasets with releases.
In DBnomics, each dataset release is a regular dataset named after the pattern {dataset_code}:{release_code}.
For example, the IMF publishes the WEO every 6 months (e.g. 2019-04, 2019-10, 2020-04, etc.).
The corresponding DBnomics datasets would be named WEO:2019-04, WEO:2019-10, WEO:2020-04, etc.
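The naming pattern can be illustrated with a minimal Python sketch. The helper names `make_release_dataset_code` and `split_dataset_code` are hypothetical and not part of any DBnomics library, and the sketch assumes that a bare dataset code never contains a colon.

```python
from typing import NamedTuple, Optional


class DatasetRef(NamedTuple):
    """A dataset code split into its base code and optional release code."""

    dataset_code: str
    release_code: Optional[str]


def make_release_dataset_code(dataset_code: str, release_code: str) -> str:
    """Build a code following the {dataset_code}:{release_code} pattern."""
    return f"{dataset_code}:{release_code}"


def split_dataset_code(code: str) -> DatasetRef:
    """Split a dataset code on its first colon.

    Assumption: a code without a colon designates a dataset without releases.
    """
    base, separator, release = code.partition(":")
    return DatasetRef(base, release if separator else None)


ref = split_dataset_code(make_release_dataset_code("WEO", "2019-04"))
print(ref.dataset_code, ref.release_code)
```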
The release code latest is reserved for accessing the latest release of a dataset. This is supported by the DBnomics website, the Web API, and all DBnomics clients, as long as they follow HTTP redirections.
See also: dataset releases.
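To make the meaning of the reserved latest release code concrete, here is a hedged sketch of how it could be resolved against a list of known release codes. In DBnomics the resolution happens server-side through HTTP redirections; the function `resolve_release` is hypothetical, and it assumes that release codes sort chronologically as strings, which holds for codes shaped like 2019-04 but not necessarily for every provider.

```python
from typing import Sequence

LATEST = "latest"  # reserved release code


def resolve_release(
    dataset_code: str, release_code: str, known_releases: Sequence[str]
) -> str:
    """Return the concrete dataset code for a requested release.

    Assumption: the lexicographically greatest release code is the latest one.
    """
    if release_code == LATEST:
        if not known_releases:
            raise ValueError(f"dataset {dataset_code!r} has no releases")
        release_code = max(known_releases)
    elif release_code not in known_releases:
        raise KeyError(f"unknown release {release_code!r} for {dataset_code!r}")
    return f"{dataset_code}:{release_code}"


releases = ["2019-04", "2020-04", "2019-10"]
print(resolve_release("WEO", "latest", releases))
```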
Series
code: to have a well-defined URL, a series MUST have a code (example)
name: SHOULD be unique
TODO: if I generate a series code from a name, what characters are valid?
Duplicate series names
Some providers give the same name to many series. The DBnomics data model accepts duplicate series names, although this is not recommended.
The user will be able to distinguish those time series with the same name by looking at their code or dimensions.
Remember that one of DBnomics' features is to redistribute provider data as-is. The situation would be the same when accessing the data from the provider's website.
However, the data validation script displays an error if run in developer mode.
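A check for duplicate series names could look like the sketch below. It is not the actual data validation script: the function `find_duplicate_names` and the representation of a series as a dict with "code" and "name" keys are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def find_duplicate_names(series: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Map each series name used more than once to the codes that share it."""
    codes_by_name: Dict[str, List[str]] = defaultdict(list)
    for item in series:
        codes_by_name[item["name"]].append(item["code"])
    return {name: codes for name, codes in codes_by_name.items() if len(codes) > 1}


# Hypothetical sample data: two series share a name but differ by code.
sample = [
    {"code": "A.FR", "name": "Unemployment"},
    {"code": "Q.FR", "name": "Unemployment"},
    {"code": "A.DE", "name": "Unemployment, Germany"},
]
print(find_duplicate_names(sample))
```

Because the codes are reported alongside each duplicated name, the output also shows how a user can still tell the series apart.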
Missing series codes
Some providers distribute time series with arbitrary codes (e.g. AMECO/UING/AUT.1.0.0.0.UING), whereas others don't give codes to series, only dimensions (e.g. {"FREQ": "A", "REF_AREA": "FR"}). In the latter case, since the series code is a hard constraint, it has to be generated from the dimensions (e.g. A.FR).
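Generating a series code from dimensions can be sketched as follows. The function `series_code_from_dimensions` is hypothetical; joining the dimension values with a dot and passing the dimension order explicitly are assumptions inferred from the A.FR example, not a documented rule.

```python
from typing import Mapping, Sequence


def series_code_from_dimensions(
    dimensions: Mapping[str, str], dimension_order: Sequence[str]
) -> str:
    """Join dimension values with '.' in the given dimension order.

    Assumption: every dimension listed in dimension_order must have a value.
    """
    missing = [code for code in dimension_order if code not in dimensions]
    if missing:
        raise KeyError(f"missing dimension values: {missing}")
    return ".".join(dimensions[code] for code in dimension_order)


code = series_code_from_dimensions(
    {"FREQ": "A", "REF_AREA": "FR"}, ["FREQ", "REF_AREA"]
)
print(code)  # A.FR
```

An explicit order is used so that the generated code does not depend on the order in which the provider happens to list the dimensions.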
Dimensions
Ideally, giving a value to every dimension of a dataset should identify a unique time series. But sometimes, because of errors in provider data or modeling choices made by providers, datasets are distributed with more than one time series per dimension set.
For example, this search by dimension, where every dimension has a value selected, matches 3 time series (see also API link).
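Dimension sets shared by several series can be found with a short grouping pass, sketched below. The function `find_ambiguous_dimension_sets` and the dict layout of a series (a "code" key and a "dimensions" mapping) are assumptions for illustration, not the DBnomics storage format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

DimensionSet = Tuple[Tuple[str, str], ...]


def find_ambiguous_dimension_sets(
    series: Iterable[Mapping],
) -> Dict[DimensionSet, List[str]]:
    """Map each fully specified dimension set to its codes when it matches several series."""
    codes_by_dimensions: Dict[DimensionSet, List[str]] = defaultdict(list)
    for item in series:
        # Sort the pairs so that the key does not depend on dimension order.
        key = tuple(sorted(item["dimensions"].items()))
        codes_by_dimensions[key].append(item["code"])
    return {key: codes for key, codes in codes_by_dimensions.items() if len(codes) > 1}


# Hypothetical sample data: S1 and S2 have identical dimension values.
sample_series = [
    {"code": "S1", "dimensions": {"FREQ": "A", "REF_AREA": "FR"}},
    {"code": "S2", "dimensions": {"REF_AREA": "FR", "FREQ": "A"}},
    {"code": "S3", "dimensions": {"FREQ": "A", "REF_AREA": "DE"}},
]
print(find_ambiguous_dimension_sets(sample_series))
```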
TODO: if I generate a dimension code from a label, what characters are valid?