8.1. pandasdmx package

8.1.2. Submodules

8.1.3. pandasdmx.api module

This module defines two classes: pandasdmx.api.Request and pandasdmx.api.Response. Together, these form the high-level API of pandasdmx. Requesting data and metadata from an SDMX server requires a good understanding of this API and a basic understanding of the SDMX web service guidelines. Only the chapters on REST services are relevant, as pandasdmx does not support the SOAP interface.

class pandasdmx.api.Request(agency='', cache=None, log_level=None, **http_cfg)

Bases: object

Get SDMX data and metadata from remote servers or local files.

agency
clear_cache()
get(resource_type='', resource_id='', agency='', version=None, key='', params={}, headers={}, fromfile=None, tofile=None, url=None, get_footer_url=(30, 3), memcache=None, writer=None, dsd=None)

Get SDMX data or metadata and return it as a pandasdmx.api.Response instance.

While ‘get’ can load any SDMX file (also as a zip file) specified by ‘fromfile’, it can only construct URLs for the SDMX service set for this instance. Hence, you have to instantiate a pandasdmx.api.Request instance for each data provider you want to access, or pass a pre-fabricated URL through the url parameter.
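The two modes described above (one Request instance per provider, or a local file) might be used as in the following sketch. To keep it runnable without network access or an installed pandasdmx, the Request class is passed in as the argument `request_cls`; that argument, the helper name, the agency ID 'ECB' and the dataflow ID 'EXR' are illustrative assumptions, not part of the library.

```python
def load_exchange_rates(request_cls, fromfile=None):
    """Fetch a dataset online, or read it from a local SDMX file.

    request_cls is expected to behave like pandasdmx.api.Request.
    'ECB' and 'EXR' are example IDs chosen for illustration only.
    """
    if fromfile is not None:
        # No agency is needed to read a local file; arguments relating
        # to URL construction would be ignored anyway.
        return request_cls().get(fromfile=fromfile)
    # One Request instance per data provider.
    ecb = request_cls('ECB')
    return ecb.get(resource_type='data', resource_id='EXR',
                   key={'CURRENCY': 'USD+JPY'})
```

With the real library, the helper would be called as load_exchange_rates(pandasdmx.api.Request).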

Parameters:
  • resource_type (str) – the type of resource to be requested. Values must be one of the items in Request._resources such as ‘data’, ‘dataflow’, ‘categoryscheme’ etc. It is used for URL construction, not to read the received SDMX file. Hence, if fromfile is given, resource_type may be ‘’. Defaults to ‘’.
  • resource_id (str) – the id of the resource to be requested. It is used for URL construction. Defaults to ‘’.
  • agency (str) – ID of the agency providing the data or metadata. Used for URL construction only. It tells the SDMX web service which agency the requested information originates from. Note that an SDMX service may provide information from multiple data providers. May be ‘’ if fromfile is given. Not to be confused with the agency ID passed to __init__(), which specifies the SDMX web service to be accessed.
  • key (str, dict) – select columns from a dataset by specifying dimension values. If type is str, it must conform to the SDMX REST API, i.e. dot-separated dimension values. If ‘key’ is of type ‘dict’, it must map dimension names to allowed dimension values. Two or more values can be separated by ‘+’ as in the str form. The DSD will be downloaded and the items are validated against it before downloading the dataset.
  • params (dict) – defines the query part of the URL. The SDMX web service guidelines (www.sdmx.org) explain the meaning of permissible parameters. It can be used to restrict the time range of the data to be delivered (startperiod, endperiod), whether parents, siblings or descendants of the specified resource should be returned as well (e.g. references=’parentsandsiblings’). Sensible defaults are set automatically depending on the values of other args such as resource_type. Defaults to {}.
  • headers (dict) – http headers. Given headers will overwrite instance-wide headers passed to the constructor. Defaults to None, i.e. use defaults from agency configuration
  • fromfile (str) – path to the file to be loaded instead of accessing an SDMX web service. Defaults to None. If fromfile is given, args relating to URL construction will be ignored.
  • tofile (str) – file path to write the received SDMX file on the fly. This is useful if you want to load data offline using fromfile or if you want to open an SDMX file in an XML editor.
  • url (str) – URL of the resource to download. If given, any other arguments such as resource_type or resource_id are ignored. Default is None.
  • get_footer_url ((int, int)) – tuple of the form (seconds, number_of_attempts). Determines the behavior when the received SDMX message has a footer in which one of the lines is a valid URL: up to number_of_attempts requests are made for the resource at that URL, each after waiting the given number of seconds. This behavior is useful when requesting large datasets from Eurostat. Other agencies do not seem to send such footers. Once an attempt to get the resource has been successful, the original message containing the footer is dismissed and the dataset is returned. The tofile argument is propagated. Note that the written file may be a zip archive. pandaSDMX handles zip archives since version 0.2.1. Defaults to (30, 3).
  • memcache (str) – if given, return the Response instance if it is already in self.cache (a dict); otherwise download the resource and cache the Response instance.
  • writer (str) – optional custom writer class. Should inherit from pandasdmx.writer.BaseWriter. Defaults to None, i.e. one of the included writers is selected as appropriate.
  • dsd (model.DataStructure) – DSD to be passed on to the sdmxml reader to process a structure-specific dataset without an incidental http request.
Returns: instance containing the requested SDMX message.
Return type: pandasdmx.api.Response
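The relationship between the dict and str forms of ‘key’ can be illustrated with a small sketch. It is not the library's implementation: the helper name and the dimension order are assumptions, and only dimension names are checked here, whereas the description above says the library validates the items against the downloaded DSD.

```python
def key_to_str(key, dimension_order):
    """Render a dict key as a dot-separated SDMX REST key string.

    dimension_order lists the dimension IDs in the order defined by the
    DSD. Dimensions missing from key are left empty, which acts as a
    wildcard in the SDMX REST syntax.
    """
    unknown = set(key) - set(dimension_order)
    if unknown:
        raise ValueError('unknown dimensions: %s' % sorted(unknown))
    return '.'.join(key.get(dim, '') for dim in dimension_order)


# Hypothetical dimension order, for illustration only.
dims = ['FREQ', 'CURRENCY', 'CURRENCY_DENOM']
print(key_to_str({'CURRENCY': 'USD+JPY', 'FREQ': 'D'}, dims))  # D.USD+JPY.
```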
classmethod list_agencies()

Return a sorted list of valid agency IDs. These can be used to create Request instances.

classmethod load_agency_profile(source)

Classmethod loading metadata on a data provider. source must be a JSON-formatted string or file-like object describing one or more data providers (URL of the SDMX web API, resource types, etc.). The dict Request._agencies is updated with the metadata from the source.

Returns None
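The "string or file-like object" contract can be sketched as follows. This is an illustration of the pattern only: the function name, the registry argument, and the profile's field names and URL are hypothetical and do not reproduce the library's actual profile schema.

```python
import io
import json


def load_profile(source, registry):
    """Merge provider metadata from a JSON string or file-like object."""
    if hasattr(source, 'read'):
        profiles = json.load(source)   # file-like object
    else:
        profiles = json.loads(source)  # JSON-formatted string
    registry.update(profiles)          # returns None, like the classmethod


# Hypothetical profile; field names and URL are placeholders.
profile = ('{"MYORG": {"name": "My Organisation", '
           '"url": "https://example.org/sdmx/rest"}}')
agencies = {}
load_profile(profile, agencies)
load_profile(io.StringIO(profile), agencies)
print(sorted(agencies))  # ['MYORG']
```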

preview_data(flow_id, key=None, count=True, total=True)

Get keys or number of series for a prospective dataset query allowing for keys with multiple values per dimension. It downloads the complete list of series keys for a dataflow rather than using constraints and DSD. This feature is, however, not supported by all data providers. ECB and UNSD are known to work.

Args:

flow_id (str): dataflow id

key (dict): optional key mapping dimension names to values or lists of values. Must have been validated before. It is not checked if key values are actually valid dimension names and values. Default: {}

count (bool): if True (default), return the number of series of the dataset designated by flow_id and key. If False, the actual keys are returned as a pandas DataFrame or dict of DataFrames, depending on the value of ‘total’.

total (bool): if True (default), return the aggregate number of series or a single DataFrame (depending on the value of ‘count’). If False, return a dict mapping keys to DataFrames of series keys. E.g., if key={‘COUNTRY’:’IT+CA+AU’}, the dict will have 3 items describing the series keys for each country respectively. If ‘count’ is True, dict values will be int rather than pandas.DataFrame.

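How the ‘count’ and ‘total’ switches interact can be shown on plain Python data. The sketch below is an assumption-laden stand-in: it works on a list of dicts instead of pandas DataFrames, uses tuples as dict keys, and the function name and sample series are invented for illustration.

```python
from itertools import product


def preview(series_keys, key, count=True, total=True):
    """Mimic the count/total switches on a plain list of series keys.

    series_keys: list of dicts mapping dimension names to values.
    key: dict mapping dimension names to '+'-separated values.
    """
    dims = list(key)
    values = [key[d].split('+') for d in dims]

    def select(allowed_per_dim):
        return [sk for sk in series_keys
                if all(sk[d] in allowed
                       for d, allowed in zip(dims, allowed_per_dim))]

    if total:
        hits = select(values)  # any listed value per dimension matches
        return len(hits) if count else hits
    result = {}
    for combo in product(*values):  # one entry per value combination
        hits = select([[v] for v in combo])
        result[combo] = len(hits) if count else hits
    return result


series = [
    {'COUNTRY': 'IT', 'FREQ': 'A'},
    {'COUNTRY': 'CA', 'FREQ': 'A'},
    {'COUNTRY': 'IT', 'FREQ': 'M'},
    {'COUNTRY': 'DE', 'FREQ': 'A'},
]
print(preview(series, {'COUNTRY': 'IT+CA+AU'}))               # 3
print(preview(series, {'COUNTRY': 'IT+CA+AU'}, total=False))
# {('IT',): 2, ('CA',): 1, ('AU',): 0}
```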
series_keys(flow_id, cache=True)

Get an empty dataset with all possible series keys.

Return a pandas DataFrame. Each column represents a dimension, each row a series key of datasets of the given dataflow.

timeout
class pandasdmx.api.ResourceGetter(resource_type)

Bases: object

Descriptor to wrap Request.get for convenient calls without specifying the resource as arg.

class pandasdmx.api.Response(msg, url, headers, status_code, writer=None)

Bases: object

Container class for SDMX messages.

It is instantiated by pandasdmx.api.Request.get().

msg

pandasdmx.model.Message – a pythonic representation of the SDMX message

status_code

int – the status code from the http response, if any

url

str – the URL, if any, that was sent to the SDMX server

headers

dict – http response headers returned by ‘requests’

write()

wrapper around the writer’s write method. Arguments are propagated to the writer.

write(source=None, **kwargs)

Wrapper to call the writer’s write method if present.

Parameters:source (pandasdmx.model.Message, iterable) – the object to be written. If a pandasdmx.model.Message is given, the writer itself must determine what to write unless specified in the keyword arguments. If an iterable is given, the writer should write each item. Keyword arguments may specify what to do with the output depending on the writer’s API. Defaults to self.msg.
Returns:anything the writer returns.
Return type:type
write_source(filename)

Write the XML file by calling the ‘write’ method of the lxml root element. Useful for saving the XML source file for offline use. Similar to passing the tofile argument to Request.get().

Parameters:filename (str) – name/path of target file
Returns:whatever the LXML deserializer returns.
exception pandasdmx.api.SDMXException

Bases: Exception

8.1.4. pandasdmx.model module

This module is part of the pandaSDMX package

SDMX 2.1 information model
(c) 2014 Dr. Leo (fhaxbox66@gmail.com)
class pandasdmx.model.AnnotableArtefact(reader, elem, **kwargs)

Bases: pandasdmx.model.SDMXObject

annotations
class pandasdmx.model.Annotation(reader, elem, **kwargs)

Bases: pandasdmx.model.SDMXObject

annotationtype
id
text
title
url
class pandasdmx.model.AttributeDescriptor(*args, **kwargs)

Bases: pandasdmx.model.ComponentList

class pandasdmx.model.Categorisation(*args, **kwargs)

Bases: pandasdmx.model.MaintainableArtefact

class pandasdmx.model.Categorisations(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject, pandasdmx.utils.DictLike

class pandasdmx.model.Category(*args, **kwargs)

Bases: pandasdmx.model.Item

class pandasdmx.model.CategoryScheme(*args, **kwargs)

Bases: pandasdmx.model.ItemScheme

class pandasdmx.model.Code(*args, **kwargs)

Bases: pandasdmx.model.Item

class pandasdmx.model.Codelist(*args, **kwargs)

Bases: pandasdmx.model.ItemScheme

class pandasdmx.model.Component(*args, **kwargs)

Bases: pandasdmx.model.IdentifiableArtefact

concept
concept_identity
local_repr
class pandasdmx.model.ComponentList(*args, **kwargs)

Bases: pandasdmx.model.IdentifiableArtefact, pandasdmx.model.Scheme

class pandasdmx.model.Concept(*args, **kwargs)

Bases: pandasdmx.model.Item

class pandasdmx.model.ConceptScheme(*args, **kwargs)

Bases: pandasdmx.model.ItemScheme

class pandasdmx.model.Constrainable

Bases: object

class pandasdmx.model.Constraint(*args, **kwargs)

Bases: pandasdmx.model.MaintainableArtefact

class pandasdmx.model.ContentConstraint(*args, **kwargs)

Bases: pandasdmx.model.Constraint

class pandasdmx.model.CubeRegion(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

class pandasdmx.model.DataAttribute(*args, **kwargs)

Bases: pandasdmx.model.Component

related_to
usage_status
class pandasdmx.model.DataMessage(*args, **kwargs)

Bases: pandasdmx.model.Message

class pandasdmx.model.DataSet(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

dim_at_obs
iter_groups
obs(with_values=True, with_attributes=True)

Return an iterator over observations in a flat dataset. An observation is represented as a namedtuple with 3 fields (‘key’, ‘value’, ‘attrib’).

obs.key is a namedtuple of dimensions. Its field names represent dimension names, its values the dimension values.

obs.value is a string that can in most cases be interpreted as float64.

obs.attrib is a namedtuple of attribute names and values.

with_values and with_attributes: If one or both of these flags is False, the respective value will be None. Use these flags to increase performance. The flags default to True.
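The shape of an observation as described above can be reproduced with collections.namedtuple. In this sketch the dimension and attribute names, the sample values, and the helper function are hypothetical; a real DataSet.obs() call would yield the observations instead of the hand-built list.

```python
from collections import namedtuple

Obs = namedtuple('Obs', ['key', 'value', 'attrib'])
# Hypothetical dimension and attribute names.
Key = namedtuple('Key', ['FREQ', 'CURRENCY', 'TIME_PERIOD'])
Attrib = namedtuple('Attrib', ['OBS_STATUS'])


def to_float_pairs(observations):
    """Yield (key, float value) pairs, skipping observations without a value."""
    for obs in observations:
        if obs.value is None:  # e.g. when with_values=False was passed
            continue
        yield obs.key, float(obs.value)


sample = [
    Obs(Key('D', 'USD', '2016-01-04'), '1.0898', Attrib('A')),
    Obs(Key('D', 'USD', '2016-01-05'), None, None),
]
pairs = list(to_float_pairs(sample))
print(pairs[0][0].CURRENCY, pairs[0][1])  # USD 1.0898
```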

series

return an iterator over Series instances in this DataSet. Note that DataSets in flat format, i.e. header.dim_at_obs = “AllDimensions”, have no series. Use DataSet.obs() instead.

class pandasdmx.model.DataStructureDefinition(*args, **kwargs)

Bases: pandasdmx.model.Structure, pandasdmx.model.Constrainable

class pandasdmx.model.DataflowDefinition(*args, **kwargs)

Bases: pandasdmx.model.StructureUsage, pandasdmx.model.Constrainable

class pandasdmx.model.Dimension(*args, **kwargs)

Bases: pandasdmx.model.Component

class pandasdmx.model.DimensionDescriptor(*args, **kwargs)

Bases: pandasdmx.model.ComponentList

class pandasdmx.model.Facet(facet_type=None, facet_value_type='', itemscheme_facet='', *args, **kwargs)

Bases: object

facet_type = {}
facet_value_type = ('String', 'Big Integer', 'Integer', 'Long', 'Short', 'Double', 'Boolean', 'URI', 'DateTime', 'Time', 'GregorianYear', 'GregorianMonth', 'GregorianDate', 'Day', 'MonthDay', 'Duration')
itemscheme_facet = ''
class pandasdmx.model.Footer(reader, elem, **kwargs)

Bases: pandasdmx.model.SDMXObject

code
severity
text
class pandasdmx.model.Group(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

class pandasdmx.model.Header(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

error
id
prepared
receiver
sender
class pandasdmx.model.IdentifiableArtefact(*args, **kwargs)

Bases: pandasdmx.model.AnnotableArtefact

uri
class pandasdmx.model.Item(*args, **kwargs)

Bases: pandasdmx.model.NameableArtefact

children
class pandasdmx.model.ItemScheme(*args, **kwargs)

Bases: pandasdmx.model.MaintainableArtefact, pandasdmx.model.Scheme

is_partial
class pandasdmx.model.KeyValue(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

class pandasdmx.model.MaintainableArtefact(*args, **kwargs)

Bases: pandasdmx.model.VersionableArtefact

is_external_ref
is_final
maintainer
service_url
structure_url
class pandasdmx.model.MeasureDescriptor(*args, **kwargs)

Bases: pandasdmx.model.ComponentList

class pandasdmx.model.MeasureDimension(*args, **kwargs)

Bases: pandasdmx.model.Dimension

class pandasdmx.model.Message(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

class pandasdmx.model.NameableArtefact(*args, **kwargs)

Bases: pandasdmx.model.IdentifiableArtefact

description
name
class pandasdmx.model.PrimaryMeasure(*args, **kwargs)

Bases: pandasdmx.model.Component

class pandasdmx.model.Ref(reader, elem, **kwargs)

Bases: pandasdmx.model.SDMXObject

agency_id
id
maintainable_parent_id
package
ref_class
resolve()
version
class pandasdmx.model.ReportingYearStartDay(*args, **kwargs)

Bases: pandasdmx.model.DataAttribute

class pandasdmx.model.Representation(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

class pandasdmx.model.SDMXObject(reader, elem, **kwargs)

Bases: object

class pandasdmx.model.Scheme(*args, **kwargs)

Bases: pandasdmx.utils.DictLike

aslist()
class pandasdmx.model.Series(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

group_attrib

Return a namedtuple containing all attributes attached to the groups of which the given series is a member.

obs(with_values=True, with_attributes=True, reverse_obs=False)

return an iterator over observations in a series. An observation is represented as a namedtuple with 3 fields (‘key’, ‘value’, ‘attrib’). obs.key is a namedtuple of dimensions, obs.value is a string value and obs.attrib is a namedtuple of attributes. If with_values or with_attributes is False, the respective value is None. Use these flags to increase performance. The flags default to True.

class pandasdmx.model.Structure(*args, **kwargs)

Bases: pandasdmx.model.MaintainableArtefact

class pandasdmx.model.StructureMessage(*args, **kwargs)

Bases: pandasdmx.model.Message

class pandasdmx.model.StructureUsage(*args, **kwargs)

Bases: pandasdmx.model.MaintainableArtefact

structure
class pandasdmx.model.TimeDimension(*args, **kwargs)

Bases: pandasdmx.model.Dimension

class pandasdmx.model.VersionableArtefact(*args, **kwargs)

Bases: pandasdmx.model.NameableArtefact

valid_from
valid_to
version

8.1.5. pandasdmx.remote module

This module is part of pandaSDMX. It contains classes for http access.

class pandasdmx.remote.REST(cache, http_cfg)

Bases: object

Query SDMX resources via REST or from a file

The constructor accepts arbitrary keyword arguments that will be passed to the requests.get function on each call. This makes the REST class somewhat similar to a requests.Session. E.g., proxies or authorisation data need only be provided once. The keyword arguments are stored in self.config. Modify this dict to issue the next ‘get’ request with changed arguments.
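The "configure once, reuse on every call" pattern described above can be sketched without any HTTP library. The class and method names below are hypothetical, the proxy address is a placeholder, and how the real REST class merges per-call arguments is not stated in this reference, so the merge order shown is an assumption.

```python
class ConfiguredClient:
    """Remember default keyword arguments and merge them into each call."""

    def __init__(self, **http_cfg):
        self.config = dict(http_cfg)

    def call_kwargs(self, **overrides):
        # Assumption: per-call values take precedence over stored defaults.
        merged = dict(self.config)
        merged.update(overrides)
        return merged


client = ConfiguredClient(
    timeout=30, proxies={'https': 'https://proxy.example.org:3128'})
client.config['timeout'] = 60  # affects every later call
print(client.call_kwargs(headers={'Accept': 'application/xml'}))
```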

get(url, fromfile=None, params={}, headers={})

Get SDMX message from REST service or local file

Parameters:
  • url (str) – URL of the REST service without the query part. If None, fromfile must be set. Default is None.
  • params (dict) – will be appended as query part to the URL after a ‘?’
  • fromfile (str) – path to SDMX file containing an SDMX message. It will be passed on to the reader for parsing.
  • headers (dict) – http headers. Overwrite instance-wide headers. Default is {}
Returns:

three objects:

  1. file-like object containing the SDMX message
  2. the complete URL, if any, including the query part constructed from params
  3. the status code

Return type:

tuple

Raises:

HTTPError – if the SDMX service responded with status code 401. Otherwise, the status code is returned.
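How params end up as the query part of the complete URL can be illustrated with the standard library. This is a sketch only, since the text above says the arguments are handed to requests.get; the helper name is an assumption, example.org is a placeholder rather than a real SDMX endpoint, and startPeriod follows the spelling of the SDMX REST guidelines.

```python
from urllib.parse import urlencode


def full_url(url, params=None):
    """Append params as the query part of url, after a '?'."""
    if not params:
        return url
    return url + '?' + urlencode(params)


print(full_url('https://example.org/sdmx/rest/data/EXR/D.USD.EUR.SP00.A',
               {'startPeriod': '2016'}))
# https://example.org/sdmx/rest/data/EXR/D.USD.EUR.SP00.A?startPeriod=2016
```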

max_size = 16777216
request(url, params={}, headers={})

Retrieve SDMX messages. If needed, override in subclasses to support other data providers.

Parameters:url (str) – The URL of the message.
Returns:the xml data as file-like object
pandasdmx.remote.is_url(s)

Return True if s (str) is a valid URL, False otherwise.

8.1.6. Module contents

pandaSDMX - a Python package for SDMX - Statistical Data and Metadata eXchange

class pandasdmx.Request(agency='', cache=None, log_level=None, **http_cfg)

Bases: object

Get SDMX data and metadata from remote servers or local files. This is the same class as pandasdmx.api.Request, documented in section 8.1.3.
