Data sources

SDMX makes a distinction between data providers and sources:

  • a data provider is the original publisher of statistical information and metadata.

  • a data source is a specific web service that provides access to statistical information.

A data source might aggregate and provide data or metadata from multiple data providers. Alternatively, an agency might operate a data source that contains only information it provides itself; in this case, the source and provider are identical.

pandaSDMX identifies each data source using a string such as 'ABS', and has built-in support for a number of data sources. Use list_sources() to list these. Read the following sections, or the file sources.json in the package source code, for more details.

pandaSDMX also supports adding other data sources; see add_source() and Source.

Data source limitations

Each SDMX web service provides a subset of the full SDMX feature set, so the same request made to two different sources may yield different results, or an error message.

A key difference is between sources offering SDMX-ML and SDMX-JSON APIs. SDMX-JSON APIs support only data queries, not metadata or structure queries.

Note

For JSON APIs, start by browsing the source’s website to retrieve the dataflow you’re interested in. Then try to fine-tune a planned data request by providing a valid key (= selection of series from the dataset). Because structure metadata is unavailable, pandaSDMX cannot automatically validate keys.
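Since keys for JSON sources have to be assembled by hand, it can help to build the key string from an explicit list of values per dimension. The following is a minimal sketch and not part of pandaSDMX: build_key is a hypothetical helper and the dimension values are made up. It assumes the usual SDMX REST key syntax, in which dimensions are separated by ".", alternative values are joined by "+", and an empty position selects all values of that dimension.

```python
def build_key(values_per_dimension):
    """Join per-dimension value lists into an SDMX REST key string.

    Hypothetical helper for illustration; the order of the lists must
    match the dimension order published by the data source.
    """
    parts = []
    for values in values_per_dimension:
        # "+" joins alternatives; an empty list leaves the position blank,
        # which selects every value of that dimension.
        parts.append("+".join(values))
    return ".".join(parts)


# Made-up example: one country, any value for the second dimension,
# and two alternatives for the third.
key = build_key([["AUS"], [], ["A", "Q"]])
print(key)
```

The resulting string still has to be checked against the source's website, because nothing here verifies that the codes exist.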

In order to anticipate and handle these differences:

  1. add_source() accepts “data_content_type” and “unsupported” keys. For example:

    [
      {
        "id": "ABS",
        "data_content_type": "JSON"
      },
      {
        "id": "UNESCO",
        "unsupported": ["datastructure"]
      }
    ]
    

    pandaSDMX will raise NotImplementedError on an attempt to query the “datastructure” API endpoint of either of these data sources.

  2. pandasdmx.source includes adapters (subclasses of Source) with hooks used when querying sources and interpreting their HTTP responses. These are documented below: ABS, ESTAT, ILO, and SGR.
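The effect of the two keys in item 1 can be sketched as follows. This is an illustration only, not pandaSDMX's implementation: check_endpoint is a hypothetical function, and it assumes that a source marked as JSON answers data queries only, as described above.

```python
def check_endpoint(source_info, resource_type):
    """Raise NotImplementedError if the source cannot serve resource_type.

    Hypothetical stand-in for the check pandaSDMX performs internally.
    """
    source_id = source_info["id"]
    # Assumption: SDMX-JSON sources only answer data queries.
    if source_info.get("data_content_type") == "JSON" and resource_type != "data":
        raise NotImplementedError(
            f"{source_id} is an SDMX-JSON source; no {resource_type} queries"
        )
    # Endpoints listed explicitly as unsupported are rejected as well.
    if resource_type in source_info.get("unsupported", []):
        raise NotImplementedError(
            f"{source_id} does not support {resource_type} queries"
        )


SOURCES = [
    {"id": "ABS", "data_content_type": "JSON"},
    {"id": "UNESCO", "unsupported": ["datastructure"]},
]

for info in SOURCES:
    try:
        check_endpoint(info, "datastructure")
    except NotImplementedError as exc:
        print(exc)
```

Failing before any HTTP request is sent keeps the error message specific, instead of surfacing whatever error page the remote service happens to return.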

ABS: Australian Bureau of Statistics

SDMX-JSON — Website

class pandasdmx.source.abs.Source(**kwargs)
handle_response(response, content)

Handle ABS’ own text/html error page for some endpoints.

ESTAT: Eurostat

SDMX-ML — Website

  • Thousands of dataflows on a wide range of topics.

  • No categorisations available.

  • Long response times are reported. Increase the timeout attribute to avoid timeout exceptions.

  • Does not return DSDs for dataflow requests with the references='all' query parameter.

class pandasdmx.source.estat.Source(**kwargs)

Handle Eurostat’s mechanism for large datasets.

For some requests, ESTAT returns a DataMessage that has no content except for a <footer:Footer> element containing a URL where the data will be made available as a ZIP file.
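To illustrate what such a footer-only message might look like and how the URL could be pulled out of it, here is a sketch using only the standard library. The sample document, its wording, and the URL are made up, and find_footer_url is a hypothetical helper. The namespace URIs are assumed to be those of SDMX-ML 2.1, and the sketch assumes the URL appears as plain text inside the footer. pandaSDMX's own parsing may differ.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: namespace URIs as used in SDMX-ML 2.1 messages.
NS = {
    "footer": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer",
    "common": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common",
}

# Made-up sample; a real response has a header and a real URL.
SAMPLE = """<message:GenericData
    xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:footer="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer"
    xmlns:common="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common">
  <footer:Footer>
    <footer:Message>
      <common:Text>The data will be available at URL: https://example.org/files/data.zip</common:Text>
    </footer:Message>
  </footer:Footer>
</message:GenericData>"""


def find_footer_url(xml_text):
    """Return the first URL found in the footer text, or None."""
    root = ET.fromstring(xml_text)
    for text in root.iterfind(".//footer:Footer//common:Text", NS):
        match = re.search(r"https?://\S+", text.text or "")
        if match:
            return match.group(0)
    return None


print(find_footer_url(SAMPLE))
```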

To configure finish_message(), pass its get_footer_url argument to pandasdmx.api.Request.get().

New in version 0.2.1.

finish_message(message, request, get_footer_url=(30, 3), **kwargs)

Handle the initial response.

This hook identifies the URL in the footer of the initial response, makes a second request (polling as indicated by get_footer_url), and returns a new DataMessage with the parsed content.

Parameters

get_footer_url ((int, int)) – Tuple of the form (seconds, attempts), controlling the interval between attempts to retrieve the data from the URL, and the maximum number of attempts to make.
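The (seconds, attempts) tuple describes a simple polling loop, which might look like the sketch below. It is not the library's code: poll and fetch are hypothetical names, fetch is assumed to return None while the file is not yet available, and waiting before each attempt (rather than after) is an assumption. The sleep function is a parameter so that the loop can be exercised without actually waiting.

```python
import time


def poll(fetch, get_footer_url=(30, 3), sleep=time.sleep):
    """Call fetch() up to `attempts` times, waiting `seconds` before each."""
    seconds, attempts = get_footer_url
    for _ in range(attempts):
        sleep(seconds)
        result = fetch()
        if result is not None:
            return result
    raise RuntimeError(f"no data after {attempts} attempts")


# Simulated server: not ready on the first attempt, ready on the second.
responses = iter([None, b"<xml/>"])
waits = []  # records the requested delays instead of sleeping
data = poll(lambda: next(responses), get_footer_url=(30, 3), sleep=waits.append)
print(data, waits)
```

With the default (30, 3), such a loop gives up after roughly 90 seconds; for very large datasets a longer interval or more attempts may be needed.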

handle_response(response, content)

Handle the polled response.

The request for the indicated ZIP file URL returns an octet-stream; this handler saves it, opens it, and returns the content of the single contained XML file.
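Reading the single XML file out of the downloaded archive can be sketched with the standard library alone. Unlike the handler described above, this sketch keeps everything in memory rather than saving the file first. read_single_xml is a hypothetical name, and the archive is built locally to stand in for a real download.

```python
import io
import zipfile


def read_single_xml(zip_bytes):
    """Return the bytes of the only member of a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
        if len(names) != 1:
            raise ValueError(f"expected one file in archive, found {len(names)}")
        return archive.read(names[0])


# Build a small archive in memory to stand in for the downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.xml", "<data/>")

content = read_single_xml(buffer.getvalue())
print(content)
```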

modify_request_args(kwargs)

Modify arguments used to build query URL.

This hook is called by pandasdmx.Request.get() to modify the keyword arguments before the query URL is built.

The default implementation handles requests for ‘structure-specific data’ by adding an HTTP ‘Accept:’ header when a ‘dsd’ is supplied as one of the kwargs.

See SGR for an example override.

Return type

None
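The default behaviour described above, modifying kwargs in place and returning None, might be sketched as follows. This is not the library's code: add_accept_header is a hypothetical name, and the assumption that headers travel in a “headers” entry of kwargs is mine. The media type is the one the SDMX 2.1 REST API defines for structure-specific data.

```python
# Media type for structure-specific data in the SDMX 2.1 REST API.
STRUCTURE_SPECIFIC = "application/vnd.sdmx.structurespecificdata+xml;version=2.1"


def add_accept_header(kwargs):
    """Hypothetical stand-in for the default hook; edits kwargs in place."""
    if kwargs.get("dsd") is not None:
        # Assumption: request headers are passed in kwargs["headers"].
        headers = kwargs.setdefault("headers", {})
        # Do not overwrite a header the caller has already chosen.
        headers.setdefault("Accept", STRUCTURE_SPECIFIC)


request_args = {"resource_id": "SOME_FLOW", "dsd": object()}
add_accept_header(request_args)
print(request_args["headers"])
```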

ECB: European Central Bank

SDMX-ML — Website

  • Supports categorisations of dataflows.

  • Supports preview_data and series-key based key validation.

  • Response times are generally short.

ILO: International Labour Organization

SDMX-ML — Website

  • pandasdmx.source.ilo.Source handles some particularities of the ILO web service. Others that are not handled:

    • Dataflow IDs take on the role of a filter. E.g., there are dataflows for individual countries, ages, sexes, etc., rather than merely for different indicators.

    • The service returns 413 Payload Too Large errors for some queries, with messages like: “Too many results, please specify codelist ID”. Test for pandasdmx.exceptions.HTTPError (= requests.exceptions.HTTPError) and/or specify a resource_id.

  • It is highly recommended to read the API guide.

class pandasdmx.source.ilo.Source(**kwargs)
modify_request_args(kwargs)

Handle two limitations of ILO’s REST service.

  1. The service returns SDMX-ML 2.0 by default, whereas pandaSDMX only supports SDMX-ML 2.1. The adapter sets the ?format=generic_2_1 query parameter.

  2. The service does not support the values ‘parents’, ‘parentsandsiblings’ (the pandaSDMX default), and ‘all’ for the references query parameter. The adapter overrides the default with ?references=none.

    Note

    Valid values are: none, parents, parentsandsiblings, children, descendants, all, or a specific structure reference such as ‘codelist’.
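The two adjustments above could be expressed in a hook along these lines. This is a sketch under assumptions, not the adapter's actual code: ilo_request_args is a hypothetical name, query parameters are assumed to live in a “params” entry of kwargs, and leaving an explicitly chosen, supported references value untouched is a design choice of this sketch.

```python
# Values of "references" that the ILO service is said not to support.
UNSUPPORTED_REFERENCES = {"parents", "parentsandsiblings", "all"}


def ilo_request_args(kwargs):
    """Hypothetical hook; adjusts query parameters in place."""
    # Assumption: query parameters are passed in kwargs["params"].
    params = kwargs.setdefault("params", {})
    # Ask for SDMX-ML 2.1 instead of the service's 2.0 default.
    params["format"] = "generic_2_1"
    # A missing value would fall back to the 'parentsandsiblings' default,
    # which is unsupported, so it is treated the same way.
    if params.get("references", "parentsandsiblings") in UNSUPPORTED_REFERENCES:
        params["references"] = "none"


request_args = {}
ilo_request_args(request_args)
print(request_args)
```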

IMF: International Monetary Fund’s “SDMX Central” source

SDMX-ML — Website

  • Subset of the data available on http://data.imf.org.

  • Supports series-key-only and hence dataset-based key validation and construction.

INEGI: National Institute of Statistics and Geography (Mexico)

SDMX-ML — Website

  • Spanish name: Instituto Nacional de Estadística y Geografía.

INSEE: National Institute of Statistics and Economic Studies (France)

SDMX-ML — Website

  • French name: Institut national de la statistique et des études économiques.

Warning

An issue has been reported, apparently due to a missing pericite codelist in StructureMessages. This may cause crashes, so avoid downloading this type of message. Instead, prepare the key as a string using the web interface and simply download a dataset.

ISTAT: National Institute of Statistics (Italy)

SDMX-ML — Website

  • Italian name: Istituto Nazionale di Statistica.

  • Similar server platform to Eurostat, with similar capabilities.

NB: Norges Bank (Norway)

SDMX-ML — Website

  • Offers few dataflows, so do not use categoryscheme.

  • It is unknown whether NB supports series-keys-only.

OECD: Organisation for Economic Co-operation and Development

SDMX-JSON — Website

SGR: SDMX Global Registry

SDMX-ML — Website

class pandasdmx.source.sgr.Source(**kwargs)
handle_response(response, content)

SGR responses do not specify content-type; set it directly.

modify_request_args(kwargs)

SGR is a data source but not a data provider.

Override the agency argument by setting agency='all' to retrieve all data republished by SGR from different providers.

UNSD: United Nations Statistics Division

SDMX-ML — Website

  • Supports preview_data and series-key based key validation.

Warning

UNSD supports categoryscheme even though it offers very few dataflows. Use this feature with caution. Moreover, the categories appear to include dataflows that UNSD does not actually provide, which is confusing.

UNESCO: UN Educational, Scientific and Cultural Organization

SDMX-ML — Website

  • Free registration is required; user credentials must be provided with each request, either as a parameter or as an HTTP header.

Warning

An issue with structure-specific datasets has been reported. It seems that Series are not recognized due to some oddity in the XML format.

WB: World Bank Group’s “World Integrated Trade Solution”

SDMX-ML — Website