Ten-line usage example

Suppose we want to analyze annual unemployment data for some European countries. All we need to know in advance is the data provider: Eurostat.

pandaSDMX makes it easy to search the directory of dataflows and to retrieve the complete structural metadata about the datasets available through a selected dataflow. (This example skips these steps; see the walkthrough.)

The data we want is in a dataflow with the identifier UNE_RT_A. This dataflow references a data structure definition (DSD) with the same ID. The DSD contains or references all the metadata describing the data sets available through this dataflow: the concepts, the things measured, the dimensions, and the lists of codes used to label each dimension.

In [1]: import pandasdmx as sdmx

In [2]: estat = sdmx.Request('ESTAT')

Download the metadata:

In [3]: metadata = estat.datastructure('UNE_RT_A')

In [4]: metadata
Out[4]: 
<pandasdmx.StructureMessage>
  <Header>
    id: 'DSD1676610120'
    prepared: '2023-02-17T05:02:00.711000+00:00'
    sender: <Agency ESTAT>
    source: 
    test: False
  response: <Response [200]>
  Codelist (6): UNIT AGE GEO OBS_FLAG SEX FREQ
  ConceptScheme (1): UNE_RT_A
  DataStructureDefinition (1): UNE_RT_A

Explore the contents of some code lists:

In [5]: for cl in 'AGE', 'UNIT':
   ...:     print(sdmx.to_pandas(metadata.codelist[cl]))
   ...: 
                              name parent
AGE                                      
TOTAL                        Total    AGE
LFD              Late foetal death    AGE
LFD1   Late foetal death (group 1)    AGE
LFD2   Late foetal death (group 2)    AGE
MN0                   Zero minutes    AGE
...                            ...    ...
AVG                        Average    AGE
NRP                    No response    AGE
NSP                  Not specified    AGE
OTH                          Other    AGE
UNK                        Unknown    AGE

[654 rows x 2 columns]
                                                            name parent
UNIT                                                                   
TOTAL                                                      Total   UNIT
NR                                                        Number   UNIT
NR_HAB                                     Number per inhabitant   UNIT
THS                                                     Thousand   UNIT
MIO                                                      Million   UNIT
...                                                          ...    ...
PD_PCH_SM_NAC  Price index (implicit deflator), percentage ch...   UNIT
CRC_MEUR                 Current replacement costs, million euro   UNIT
CRC_MNAC       Current replacement costs, million units of na...   UNIT
PYR_MEUR           Previous year replacement costs, million euro   UNIT
PYR_MNAC       Previous year replacement costs, million units...   UNIT

[695 rows x 2 columns]

Next we download a dataset. To obtain data on Greece, Ireland and Spain only, we use codes from the code list ‘GEO’ to specify a key for the dimension named ‘geo’. We also use a query parameter, ‘startPeriod’, to limit the scope of the data returned:

In [6]: resp = estat.data(
   ...:     'UNE_RT_A',
   ...:     key={'geo': 'EL+ES+IE'},
   ...:     params={'startPeriod': '2015'},
   ...:     )
   ...: 
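
As a rough illustration of what the key and params arguments stand for: in the SDMX REST syntax, a key is a dot-separated string with one position per dimension, `+` joins alternative codes, and an empty position matches every code. The sketch below builds such a string by hand. The helpers `build_key` and `build_query` are hypothetical and are not pandaSDMX internals, and the dimension order is an assumption made for illustration; the real order is defined by the DSD.

```python
from urllib.parse import urlencode

# Assumed dimension order, for illustration only; the real order comes from the DSD.
DIM_ORDER = ["freq", "age", "unit", "sex", "geo"]


def build_key(key, dim_order=DIM_ORDER):
    """Return an SDMX REST key string such as '....EL+ES+IE'.

    Hypothetical helper, not part of pandaSDMX.
    """
    unknown = set(key) - set(dim_order)
    if unknown:
        raise ValueError(f"unknown dimension(s): {sorted(unknown)}")
    parts = []
    for dim in dim_order:
        value = key.get(dim, "")  # an empty position matches all codes
        if not isinstance(value, str):
            value = "+".join(value)  # several codes are joined with '+'
        parts.append(value)
    return ".".join(parts)


def build_query(flow_id, key, params):
    """Return the path-and-query part of a data request (no base URL)."""
    path = f"data/{flow_id}/{build_key(key)}"
    return f"{path}?{urlencode(params)}" if params else path


print(build_query("UNE_RT_A", {"geo": "EL+ES+IE"}, {"startPeriod": "2015"}))
```

Passing the key as a dict, as in the call above, avoids having to count dots by hand.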

resp is a DataMessage object. We use its to_pandas() method to convert it to a pandas.DataFrame, then select the code 'Y15-74' on the age dimension we saw in the metadata above:

In [7]: data = resp.to_pandas(
   ...:     datetime={'dim': 'TIME_PERIOD', 'freq': 'freq'}
   ...: ).xs('Y15-74', level='age', axis=1, drop_level=False)
   ...: 
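
To see what the cross-section step does in isolation, here is a minimal sketch on a toy frame. The level names match the example, but the frame and its numbers are made up and nothing is downloaded.

```python
import pandas as pd

# Toy column index with the same level names as the example; values are made up.
columns = pd.MultiIndex.from_product(
    [["Y15-24", "Y15-74"], ["PC_ACT"], ["F", "T"], ["EL", "IE"]],
    names=["age", "unit", "sex", "geo"],
)
toy = pd.DataFrame(
    [list(range(8)), list(range(8, 16))],
    index=pd.Index([2015, 2016], name="TIME_PERIOD"),
    columns=columns,
)

# Keep only the columns whose 'age' label is 'Y15-74'.
# drop_level=False keeps 'age' as a column level instead of removing it.
subset = toy.xs("Y15-74", level="age", axis=1, drop_level=False)

print(list(subset.columns.names))
print(subset.columns.get_level_values("age").unique().tolist())
print(subset.shape)
```

Slicing does not prune a MultiIndex, so `columns.levels` can still list codes that no longer occur in the selection; `columns.remove_unused_levels()` drops them.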

We can now explore the data set as expressed in a familiar pandas object. First, show dimension names:

In [8]: data.columns.names
Out[8]: FrozenList(['age', 'unit', 'sex', 'geo'])

…and corresponding key values along these dimensions:

In [9]: data.columns.levels
Out[9]: FrozenList([['Y15-24', 'Y15-29', 'Y15-74', 'Y20-64', 'Y25-54', 'Y25-74', 'Y55-74'], ['PC_ACT', 'PC_POP', 'THS_PER'], ['F', 'M', 'T'], ['EL', 'ES', 'IE']])

Select some data of interest: show aggregate unemployment rates across ages (‘Y15-74’ on the AGE dimension) and sexes (‘T’ on the SEX dimension), expressed as a percentage of active population (‘PC_ACT’ on the UNIT dimension):

In [10]: data.loc[:, ('Y15-74', 'PC_ACT', 'T')]
Out[10]: 
geo            EL    ES   IE
TIME_PERIOD                 
2015         25.0  22.1  9.9
2016         23.9  19.6  8.4
2017         21.8  17.2  6.7
2018         19.7  15.3  5.8
2019         17.9  14.1  5.0
2020         17.6  15.5  5.9
2021         14.7  14.8  6.2
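
The selection above passes a tuple that is shorter than the number of column levels. A minimal sketch of that behaviour on a toy frame (made-up numbers, same level names) may help:

```python
import pandas as pd

# Toy frame with four column levels; the values are made up for illustration.
columns = pd.MultiIndex.from_product(
    [["Y15-74"], ["PC_ACT", "THS_PER"], ["F", "T"], ["EL", "IE"]],
    names=["age", "unit", "sex", "geo"],
)
toy = pd.DataFrame(
    [[float(i) for i in range(8)]],
    index=pd.Index([2015], name="TIME_PERIOD"),
    columns=columns,
)

# A three-element tuple matches the leading levels (age, unit, sex).
# The matched levels are dropped, leaving only 'geo' on the columns.
rates = toy.loc[:, ("Y15-74", "PC_ACT", "T")]

print(list(rates.columns.names))
print(list(rates.columns))
```

This is why the output above shows a single header row labelled geo rather than the full four-level header.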
