pandaSDMX: Statistical Data and Metadata eXchange in Python

pandaSDMX is an Apache 2.0-licensed Python package that aims to be an intuitive and versatile tool for retrieving statistical data and metadata disseminated in SDMX format. Out of the box, it supports the SDMX services of the European statistics office (Eurostat), the European Central Bank (ECB), the French National Institute of Statistics and Economic Studies (INSEE), and the OECD (JSON only). pandaSDMX can export data and metadata as pandas DataFrames, the de facto standard for data analysis in Python. From pandas, you can export data and metadata to Excel, R and other tools. As of version 0.4, pandaSDMX can also export data to many other file formats and database backends via Odo.

Main features

  • support for many SDMX features including
    • generic data sets in SDMXML format
    • compact data sets in SDMXJSON format (OECD only)
    • data structure definitions, code lists and concept schemes
    • dataflow definitions and content-constraints
    • categorisations and category schemes
  • pythonic representation of the SDMX information model
  • validation of column selections against code lists and content-constraints, if available, when requesting data sets
  • export of data and structural metadata, such as code lists, as multi-indexed pandas DataFrames or Series, and to many other formats and database backends via Odo
  • read and write SDMX messages to and from local files
  • configurable HTTP connections
  • support for requests-cache, which can cache SDMX messages in memory, MongoDB, Redis or SQLite
  • extensible through custom readers and writers for alternative input and output formats of data and metadata
  • growing test suite


Example

Suppose we want to analyze annual unemployment data for some European countries. All we need to know in advance is the data provider, Eurostat. pandaSDMX makes it easy to search the directory of dataflows and to explore the complete structural metadata of the data sets available through a selected dataflow. We skip this step here; impatient readers may jump directly to Basic usage. The dataflow with the ID ‘une_rt_a’ contains the data we want. Its dataflow definition references a data structure definition with the ID ‘DSD_une_rt_a’, which contains or references all the metadata describing the data sets available through this dataflow: the dimensions, concept schemes and corresponding code lists.

In [1]: from pandasdmx import Request

In [2]: estat = Request('ESTAT')

# Download the metadata and expose it as a dict mapping resource names to pandas DataFrames
In [3]: metadata = estat.datastructure('DSD_une_rt_a').write()

# Show some code lists
In [4]: metadata.codelist.loc[['AGE', 'UNIT']]
             dim_or_attr                             name
AGE  AGE               D                              AGE
     TOTAL             D                            Total
     Y25-74            D              From 25 to 74 years
     Y_LT25            D               Less than 25 years
UNIT UNIT              D                             UNIT
     PC_ACT            D  Percentage of active population
     PC_POP            D   Percentage of total population
     THS_PER           D                 Thousand persons
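
This kind of lookup can be tried without network access. The following is a minimal sketch, assuming only pandas: it rebuilds the code list excerpt shown above by hand (the variable names are ours, not part of pandaSDMX) and selects from it with .loc.

```python
import pandas as pd

# Rebuild the excerpt shown above by hand: the outer index level is the
# code list ID, the inner level is the code itself.
index = pd.MultiIndex.from_tuples(
    [
        ("AGE", "AGE"),
        ("AGE", "TOTAL"),
        ("AGE", "Y25-74"),
        ("AGE", "Y_LT25"),
        ("UNIT", "UNIT"),
        ("UNIT", "PC_ACT"),
        ("UNIT", "PC_POP"),
        ("UNIT", "THS_PER"),
    ]
)
codelist = pd.DataFrame(
    {
        "dim_or_attr": ["D"] * 8,
        "name": [
            "AGE",
            "Total",
            "From 25 to 74 years",
            "Less than 25 years",
            "UNIT",
            "Percentage of active population",
            "Percentage of total population",
            "Thousand persons",
        ],
    },
    index=index,
)

# All codes of one code list; the outer index level is dropped
unit_codes = codelist.loc["UNIT"]

# Several code lists at once; both index levels are kept
age_and_unit = codelist.loc[["AGE", "UNIT"]]

# Human-readable name of a single code
label = codelist.loc[("UNIT", "PC_ACT"), "name"]
print(label)
```

Looking up names this way is handy for labelling plots or tables with readable text instead of raw codes.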

Next we download a data set. We use codes from the code list ‘GEO’ to obtain data on Greece, Ireland and Spain only.

In [5]: resp = estat.data('une_rt_a', key={'GEO': 'EL+ES+IE'}, params={'startPeriod': '2007'})
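
For readers curious about what the key argument stands for: in the SDMX REST syntax, a series key is a dot-separated string with one position per dimension, in the order given by the data structure definition. A '+' joins alternative codes, and an empty position matches all codes. The sketch below illustrates that syntax only. build_sdmx_key is a hypothetical helper, not a pandaSDMX function, and the dimension order is assumed for illustration; the real order comes from ‘DSD_une_rt_a’.

```python
def build_sdmx_key(dimension_order, selection):
    """Join per-dimension selections into a dot-separated SDMX key string.

    Hypothetical helper for illustration; not part of pandaSDMX.
    """
    unknown = set(selection) - set(dimension_order)
    if unknown:
        raise ValueError("unknown dimensions: %s" % sorted(unknown))
    parts = []
    for dim in dimension_order:
        # A dimension without a selection becomes an empty position (wildcard)
        codes = selection.get(dim, "")
        # Accept either 'EL+ES+IE' or ['EL', 'ES', 'IE']
        if not isinstance(codes, str):
            codes = "+".join(codes)
        parts.append(codes)
    return ".".join(parts)


# Assumed dimension order, for illustration only
dims = ["FREQ", "AGE", "UNIT", "SEX", "GEO"]

key = build_sdmx_key(dims, {"GEO": "EL+ES+IE"})
print(key)  # ....EL+ES+IE
```

Passing a dict rather than a hand-written string spares us from counting dots and, as noted under Main features, lets the selection be validated against code lists and content-constraints where available.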

# We use a generator expression to narrow down the column selection
# and write these columns to a pandas DataFrame
In [6]: data = resp.write(s for s in resp.data.series if s.key.AGE == 'TOTAL')

# Explore the data set. First, show dimension names
In [7]: data.columns.names
Out[7]: FrozenList(['UNIT', 'AGE', 'SEX', 'GEO', 'FREQ'])

# and corresponding dimension values
In [8]: data.columns.levels
Out[8]: FrozenList([['PC_ACT', 'PC_POP', 'THS_PER'], ['TOTAL'], ['F', 'M', 'T'], ['EL', 'ES', 'IE'], ['A']])

# Show aggregate unemployment rates across ages and sexes as
# percentage of active population
In [9]: data.loc[:, ('PC_ACT', 'TOTAL', 'T')]
GEO            EL    ES    IE
FREQ            A     A     A
2015         24.9  22.1   9.4
2014         26.5  24.5  11.3
2013         27.5  26.1  13.1
2012         24.5  24.8  14.7
2011         17.9  21.4  14.7
2010         12.7  19.9  13.9
2009          9.6  17.9  12.0
2008          7.8  11.3   6.4
2007          8.4   8.2   4.7
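
The selection in the last step relies on plain pandas partial indexing of multi-indexed columns. As a minimal offline sketch, the frame below is rebuilt by hand from the first three rows of the output above (it is not fetched from Eurostat), and the same .loc call is applied to it.

```python
import pandas as pd

# Column index with the same five levels as the real data set
columns = pd.MultiIndex.from_tuples(
    [
        ("PC_ACT", "TOTAL", "T", "EL", "A"),
        ("PC_ACT", "TOTAL", "T", "ES", "A"),
        ("PC_ACT", "TOTAL", "T", "IE", "A"),
    ],
    names=["UNIT", "AGE", "SEX", "GEO", "FREQ"],
)

# Values copied from the output above (2015 to 2013)
data = pd.DataFrame(
    [[24.9, 22.1, 9.4], [26.5, 24.5, 11.3], [27.5, 26.1, 13.1]],
    index=["2015", "2014", "2013"],
    columns=columns,
)

# A partial tuple fixes the first three levels (UNIT, AGE, SEX);
# pandas drops them and keeps the remaining levels GEO and FREQ.
rates = data.loc[:, ("PC_ACT", "TOTAL", "T")]

# Selecting one remaining column yields a Series indexed by period
greece = rates[("EL", "A")]
print(rates.columns.names)
```

Because the fixed levels are dropped, the result is compact enough to pass straight to further pandas operations such as plotting or computing differences between periods.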
