5. Basic usage
5.1. Overview
This chapter illustrates the main steps of a typical workflow, namely:
- Choose a data provider
- Download the catalogue of dataflows available from the data provider and select a dataflow for further inspection
- Download metadata on the selected dataflow, including the datastructure definition, concepts, codelists and content constraints describing the datasets available through that dataflow
- Analyze the metadata as pandas DataFrames or by directly inspecting the Pythonic information model
- Specify the needed portions of the data from the dataflow by constructing a selection (“key”) of series and a period/time range for the prospective dataset
- Download the actual dataset specified by dataflow ID, key and period/time range
- Write the dataset, or selected series thereof, to a pandas DataFrame or Series for analysis
All of these steps share common tasks which flow from the architecture of pandaSDMX:
- Use a pandasdmx.api.Request instance to get an SDMX message from a web service or file.
- Explore the returned pandasdmx.api.Response instance. The SDMX message is contained in its msg attribute. Note that there are two types of message: DataMessage and StructureMessage. The former contains a data set; the latter contains structural metadata about one or more dataflows, most importantly one or more dataflow definitions and related metadata such as the datastructure definition, codelists and constraints.
- Check for errors.
- Explore the SDMX message contained in the pandasdmx.api.Response instance.
- Write data or metadata to a pandas DataFrame or Series by calling pandasdmx.api.Response.write() on the Response instance.
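The common tasks above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name fetch_catalogue is hypothetical, and the resource_type='dataflow' argument is assumed from the 0.x API of pandaSDMX, so check it against the installed version. The helper takes an already constructed Request instance so that it does not open a network connection by itself.

```python
def fetch_catalogue(request):
    """Get the dataflow catalogue through a Request-like object.

    `request` is expected to be a pandasdmx.api.Request instance.
    Assumption: `resource_type='dataflow'` selects the catalogue of
    dataflows, as in the 0.x API.
    """
    # Step 1: get an SDMX message from the web service.
    response = request.get(resource_type='dataflow')
    # Step 2: the SDMX message itself lives in the `msg` attribute.
    message = response.msg
    # Step 3: write the (meta)data to pandas objects.
    tables = response.write()
    return message, tables


# Hypothetical usage (requires pandaSDMX and network access):
# from pandasdmx import Request
# message, tables = fetch_catalogue(Request('ECB'))
```

Passing the Request in, rather than creating it inside the helper, keeps the agency and the HTTP configuration under the caller's control.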
5.2. Connecting to an SDMX web service, caching
First, we instantiate pandasdmx.api.Request. The constructor accepts an optional agency ID as a string. The list of supported agencies can be viewed here, or as shown below.
In [1]: from pandasdmx import Request # '*' would do the same
In [2]: ecb = Request('ECB')
ecb is now configured to make requests to the European Central Bank. If you want to send requests to multiple agencies, instantiate multiple Request objects.
5.2.1. Configuring the HTTP connection
To pre-configure the HTTP connections to be established by a Request instance, you can pass all keyword arguments consumed by the underlying HTTP library requests. For a complete description of the options see the requests documentation.
For example, a proxy server can be specified for subsequent requests like so:
In [3]: ecb_via_proxy = Request('ECB', proxies={'http': 'http://1.2.3.4:5678'})
HTTP request parameters are exposed through a dict. It may be modified between requests.
In [4]: ecb_via_proxy.client.config
Out[4]: {'proxies': {'http': 'http://1.2.3.4:5678'}, 'stream': True, 'timeout': 30.1}
The Request.client attribute acts a bit like a requests.Session in that it conveniently stores the configuration for subsequent HTTP requests. Modify it to change the configuration. For convenience, pandasdmx.api.Request has a timeout property to set the timeout in seconds for HTTP requests.
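As a minimal sketch of changing the configuration between requests, the helper below edits the config dict and sets the timeout property described above. The function name adjust_connection is hypothetical. The 'verify' key is a standard keyword argument of the requests library and is used here only as an example of an option one might change.

```python
def adjust_connection(request, timeout=10.0, verify=True):
    """Change HTTP settings on an existing Request-like object.

    `request` is expected to be a pandasdmx.api.Request instance.
    """
    # The config dict is passed on to `requests` for subsequent calls,
    # so any keyword argument understood by `requests` may be stored here.
    request.client.config['verify'] = verify
    # Convenience property: timeout in seconds for HTTP requests.
    request.timeout = timeout
    return request


# Hypothetical usage (requires pandaSDMX):
# from pandasdmx import Request
# ecb = adjust_connection(Request('ECB'), timeout=5.0)
```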
5.2.2. Caching received files
Since v0.3.0, requests-cache is supported. To use it, pass an optional cache keyword argument to the Request() constructor. If given, it must be a dict whose items will be passed to the requests_cache.install_cache function. Use it to cache SDMX messages in databases such as MongoDB, Redis or SQLite. See the requests-cache docs for further information.
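A cache configuration might look like the sketch below. The keys cache_name, backend and expire_after are parameters of requests_cache.install_cache; the cache name 'sdmx_cache' and the one-hour expiry are arbitrary values chosen for illustration, and the helper name make_cached_request is hypothetical. The import is deferred so that the dict can be inspected without pandaSDMX installed.

```python
# Options forwarded to requests_cache.install_cache().
cache_options = {
    'cache_name': 'sdmx_cache',  # illustrative name; with SQLite this names the file
    'backend': 'sqlite',         # other backends include 'mongodb' and 'redis'
    'expire_after': 3600,        # seconds before a cached response expires
}


def make_cached_request(agency_id='ECB', options=None):
    """Create a Request whose HTTP responses are cached."""
    from pandasdmx import Request  # deferred: needs pandaSDMX and requests-cache

    return Request(agency_id, cache=options or cache_options)
```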
5.2.3. Loading a file instead of requesting it via HTTP
Request instances can load SDMX messages from local files. Issuing r = Request() without passing any agency ID instantiates a Request object not tied to any agency. It may only be used to load SDMX messages from files, unless a pre-fabricated URL is passed to pandasdmx.api.Request.get().
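The two ways of using an agency-less Request can be sketched as follows. The keyword names fromfile and url are assumptions based on the 0.x signature of Request.get() and should be verified against the installed version; the function names are hypothetical, and the caller supplies the path or URL.

```python
def load_from_file(request, path):
    """Load an SDMX message from a local file.

    Assumption: Request.get() accepts a `fromfile` keyword.
    """
    return request.get(fromfile=path)


def load_from_url(request, url):
    """Fetch an SDMX message from a pre-fabricated URL.

    Assumption: Request.get() accepts a `url` keyword.
    """
    return request.get(url=url)


# Hypothetical usage (requires pandaSDMX):
# from pandasdmx import Request
# r = Request()  # not tied to any agency
# response = load_from_file(r, 'saved_message.xml')
```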
5.3. Obtaining and exploring metadata about datasets
This section illustrates how to download and explore metadata. Assume we are looking for time-series on exchange rates. Our best guess is that the European Central Bank provides a relevant dataflow. We could google for the dataflow ID or browse the ECB’s website. However, we choose to use SDMX metadata to get a complete overview of the dataflows the ECB provides.