How to…¶
Access other SDMX data sources¶
pandaSDMX ships with a file, sources.json, that includes information about the capabilities of many data sources.
However, any data source that generates SDMX 2.1 messages is supported.
There are multiple ways to access these:
1. Create a pandasdmx.Request without a named data source, then call the get() method using the url argument:
   import pandasdmx as sdmx
   req = sdmx.Request()
   req.get(url='https://sdmx.example.org/path/to/webservice', ...)
2. Call add_source() with a JSON snippet describing the data provider.
3. Create a subclass of Source, providing attribute values and optional implementations of hooks.
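The add_source() route might look roughly like the following minimal sketch. The source ID, name, and URL are placeholders, and the helper names make_source_snippet and register_and_connect are hypothetical. It assumes that add_source() accepts a JSON string with at least the keys id, name, and url, and that the new ID can then be passed to Request.

```python
import json


def make_source_snippet(source_id, name, url):
    """Return a JSON snippet describing a data provider (hypothetical helper)."""
    return json.dumps({"id": source_id, "name": name, "url": url})


def register_and_connect(snippet):
    """Register the provider with pandaSDMX and return a Request for it."""
    # Imported lazily so the helper above can be used without pandaSDMX installed.
    import pandasdmx as sdmx

    # Assumption: add_source() accepts a JSON string with these keys.
    sdmx.add_source(snippet)
    return sdmx.Request(json.loads(snippet)["id"])


# Placeholder values; this is not a real web service.
snippet = make_source_snippet(
    "EXAMPLE", "Example agency", "https://sdmx.example.org/path/to/webservice"
)
```

Calling register_and_connect(snippet) would then return a Request bound to the new source, provided the assumptions above hold for the installed version.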
View log messages¶
See the description of pandasdmx.logger.
Use the ‘references’ query parameter¶
SDMX web services support a references parameter in HTTP requests which can take values such as ‘all’, ‘descendants’, etc.
This parameter instructs the web service to include, when generating a Data- or StructureMessage, the objects implicitly designated by the references parameter alongside the explicit resource.
For example, for the request:
>>> response = some_agency.dataflow('SOME_ID', params={'references': 'all'})
the response will include:
- the dataflow ‘SOME_ID’ explicitly specified,
- the DSD referenced by the dataflow’s structure attribute,
- the code lists referenced by the DSD, and
- any content-constraints which reference the dataflow or the DSD.
It is much more efficient to request many objects in a single request.
Thus, pandaSDMX provides default values for references in common queries.
For instance, when a single dataflow is requested by specifying its ID, pandaSDMX sets references to ‘all’.
On the other hand, when the dataflow ID is wildcarded, it is more practical not to request all referenced objects, as the response would likely be excessively large. In that case the user is assumed to want a bird’s-eye view (the list of dataflows) before focusing on a particular dataflow and its descendants and ancestors.
The default value for the references parameter can be overridden.
Some web services, for instance ESTAT, differ in how they handle references.
See Data sources for details.
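The defaulting rule described above can be illustrated with a small standalone sketch. It is not pandaSDMX's implementation: the function names default_references and structure_query are hypothetical, and the URL layout is simplified rather than that of any particular web service. It only shows how a default for references is chosen and how an explicit value overrides it.

```python
from urllib.parse import urlencode


def default_references(resource_id):
    """Pick a default for 'references' following the rule described above."""
    # A specific ID: fetch the object together with everything related to it.
    # No ID (wildcard): leave the parameter unset to keep the response small.
    return "all" if resource_id else None


def structure_query(base_url, resource, resource_id=None, references=None):
    """Build a query URL; an explicit `references` overrides the default."""
    if references is None:
        references = default_references(resource_id)
    path = f"{base_url}/{resource}"
    if resource_id:
        path += f"/{resource_id}"
    params = {"references": references} if references else {}
    return path + ("?" + urlencode(params) if params else "")


base = "https://sdmx.example.org/ws"  # placeholder URL
single = structure_query(base, "dataflow", "SOME_ID")
listing = structure_query(base, "dataflow")
custom = structure_query(base, "dataflow", "SOME_ID", references="descendants")
```

Here single carries references=all, listing carries no references parameter, and custom uses the explicitly supplied value.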
Use category schemes to explore data¶
SDMX supports category schemes to categorize dataflow definitions and other objects. This helps to find, e.g., a dataflow of interest. Note that not all agencies support category schemes. A good example of one that does is the ECB. However, as the ECB’s SDMX service offers fewer than 100 dataflows, using category schemes is not strictly necessary. A counter-example is Eurostat, which offers more than 6000 dataflows yet does not categorize them. Hence, the user must search through the flat list of dataflows.
To search the list of dataflows by category, we request the category scheme from the ECB’s SDMX service and explore the response:
In [1]: import pandasdmx as sdmx
In [2]: ecb = sdmx.Request('ecb')
In [3]: cat_response = ecb.categoryscheme()
Like any other scheme, a category scheme is essentially a dict mapping IDs to the actual SDMX objects. To display the categories it contains, in our case those of the MOBILE_NAVI scheme, which include the category on exchange rates, we convert the scheme with to_pandas():
In [4]: sdmx.to_pandas(cat_response.category_scheme.MOBILE_NAVI)
Out[4]:
                                                          name       parent
MOBILE_NAVI
01                                         Monetary operations  MOBILE_NAVI
02                    Prices, output, demand and labour market  MOBILE_NAVI
03                           Monetary and financial statistics  MOBILE_NAVI
04                                          Euro area accounts  MOBILE_NAVI
05                                          Government finance  MOBILE_NAVI
06                         External transactions and positions  MOBILE_NAVI
07                                              Exchange rates  MOBILE_NAVI
08           Payments and securities trading, clearing, set...  MOBILE_NAVI
09                                         Banknotes and Coins  MOBILE_NAVI
10                         Indicators of Financial Integration  MOBILE_NAVI
11                      Real Time Database (research database)  MOBILE_NAVI
In [5]: cat_response.category_scheme.MOBILE_NAVI
Out[5]: <CategoryScheme ECB:MOBILE_NAVI(1.0) (11 items): Economic concepts>
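Where an agency does not categorize its dataflows, searching the flat list by name is the fallback mentioned above. A minimal sketch of such a search follows. It works on a plain dict of IDs and names; the IDs and names shown are hypothetical, and building that dict from an actual response is left out.

```python
def search_dataflows(names, term):
    """Return the {id: name} entries whose name contains `term`, ignoring case."""
    term = term.lower()
    return {
        flow_id: name for flow_id, name in names.items() if term in name.lower()
    }


# Hypothetical IDs and names standing in for a real list of dataflows.
names = {
    "FLOW_A": "Exchange rates",
    "FLOW_B": "Government finance",
    "FLOW_C": "Effective exchange rate indices",
}
matches = search_dataflows(names, "exchange")
```

With these sample entries, matches contains FLOW_A and FLOW_C only.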
New in version 0.5.
Select data frame layouts returned by to_pandas()¶
to_pandas() provides multiple ways to customize the type and layout of pandas objects returned for DataMessage input.
One is the datetime argument; see Convert dimensions to pandas.DatetimeIndex or PeriodIndex.
The other is the rtype argument.
To select the same behaviour as pandaSDMX 0.9, give rtype = ‘compat’. This value is the default in pandaSDMX 1.0, but may change in a future version. With ‘compat’, the returned layout varies with the concept of “dimension at the observation level,” as follows:
Dimension At Observation Level |
Return Type |
---|---|
|
|
Same as datetime =
|
|
Other |
|
Limitations:
- pandaSDMX can only obey rtype = ‘compat’ when reading or converting an entire DataMessage; not a DataSet. While the concept of “dimension at observation level” is mentioned in the IM in relation to data sets, it is not formally included as an attribute of any class, or with any default value. (For instance, it is not included in the DimensionDescriptor of a DataStructureDefinition.) It can only be determined from the header of an SDMX-ML or -JSON data message.
- Except for AllDimensions, each row and column of the returned data frame contains multiple observations, so attributes cannot be included without ambiguity about which observation(s) have the attribute. In these cases, attributes are omitted; use rtype = ‘rows’ to retrieve them.
With the argument rtype = ‘rows’, or by setting DEFAULT_RTYPE to ‘rows’:
In [6]: sdmx.writer.DEFAULT_RTYPE = 'rows'
…data are always returned with one row per observation.
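Why attributes fit a one-row-per-observation layout but not a wide one can be seen in a plain-Python stand-in. This is not pandaSDMX output: the dimension names, attribute name, and values below are made up for illustration.

```python
# Hypothetical observations:
# (TIME_PERIOD, CURRENCY) -> (value, OBS_STATUS attribute)
observations = {
    ("2020-01", "USD"): (1.11, "A"),
    ("2020-01", "JPY"): (121.4, "A"),
    ("2020-02", "USD"): (1.09, "P"),
}

# 'rows'-style layout: one record per observation, so the attribute has a place.
rows = [
    {"TIME_PERIOD": period, "CURRENCY": currency, "value": value, "OBS_STATUS": status}
    for (period, currency), (value, status) in observations.items()
]

# Wide layout: one row per period, one column per currency. Each row holds
# several observations, so a per-observation attribute has nowhere to go.
wide = {}
for (period, currency), (value, _status) in observations.items():
    wide.setdefault(period, {})[currency] = value
```

In the wide mapping only the values survive, which mirrors the reason attributes are omitted outside the ‘rows’ layout.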
Convert SDMX data to other formats¶
Pandas supports output to many popular file formats.
Call these methods on the objects returned by to_pandas().
For instance:
msg = sdmx.read_sdmx('data.xml')
sdmx.to_pandas(msg).to_excel('data.xlsx')
pandaSDMX can also be used with odo by registering methods for discovery and conversion:
import odo
from odo.utils import keywords
import pandas as pd
import pandasdmx as sdmx
from toolz import keyfilter
import toolz.curried.operator as op

class PandaSDMX(object):
    """Wrap the URI of an SDMX file so that odo can dispatch on it."""
    def __init__(self, uri):
        self.uri = uri

@odo.resource.register(r'.*\.sdmx')
def _resource(uri, **kwargs):
    return PandaSDMX(uri)

@odo.discover.register(PandaSDMX)
def _discover(obj):
    return odo.discover(sdmx.to_pandas(sdmx.read_sdmx(obj.uri)))

@odo.convert.register(pd.DataFrame, PandaSDMX)
def _convert(obj, **kwargs):
    msg = sdmx.read_sdmx(obj.uri)
    # Pass on only the keyword arguments accepted by the writer.
    # Assumption: the filter is based on the signature of sdmx.to_pandas.
    return sdmx.to_pandas(msg, **keyfilter(op.contains(keywords(sdmx.to_pandas)),
                                           kwargs))
Deprecated since version 1.0: odo appears unmaintained since about 2016, so pandaSDMX no longer provides built-in registration.
New in version 0.4: pandasdmx.odo_register() was added, providing automatic registration.
Validate SDMX-ML files against the official XML schemas¶
You can validate SDMX-ML messages against the XML schemas included in the SDMX 2.1 standard. To do this, you need to download the schemas from the sdmx.org website and copy them to a local path. The convenience function pandasdmx.api.install_schemas() does this for you. By default, the schemas are installed in the platform- and user-specific appdata dir. On Linux this is in /etc, on Windows in c:/users/<username>/appdata/local/pandasdmx/…
Optionally, you can pass a custom schema_dir.
After installing the schemas, you can request some SDMX message as usual and pass it to pandasdmx.api.Request.validate(). Alternatively, you can pass a file-like object containing the XML data.
Here is an example:
import pandasdmx
pandasdmx.install_schemas()
ecb = pandasdmx.Request("ECB")
exr = ecb.dataflow("EXR")
ecb.validate(exr) # should return True