Implementation notes

pandasdmx.model implements the SDMX Information Model (IM), version 2.1. (What is an ‘information model’?) This page gives brief explanations of how pandaSDMX implements the standards, focusing on additional features, conveniences, or interpretations/naming choices that are not strictly determined by the standards.

Although this page is organized to correspond to the standards, it neither recapitulates them nor sets out to teach all their details. For those purposes, see Resources, or the Walkthrough, which includes some incidental explanations.

Abstract classes and data types

Many classes inherit from one of the following. For example, every Code is a NameableArtefact; [1] this means it has name and description attributes. Because every NameableArtefact is an IdentifiableArtefact, a Code also has id, URI, and URN attributes.

AnnotableArtefact

IdentifiableArtefact

  • has an id, URI, and URN.

  • is “annotable”; this means it also has the annotations attribute of an AnnotableArtefact.

The id uniquely identifies the object against others of the same type in an SDMX message. The URI and URN are globally unique. See Wikipedia for a discussion of the differences between the two.

NameableArtefact

  • has a name and description, and

  • is identifiable, therefore also annotable.

VersionableArtefact

  • has a version number,

  • may be valid between certain times (valid_from, valid_to), and

  • is nameable, identifiable, and annotable.

MaintainableArtefact

  • is under the authority of a particular maintainer, and

  • is versionable, nameable, identifiable, and annotable.

In an SDMX message, a maintainable object might not be given in full, but only as a reference (with is_external_reference set to True). If so, it might have a structure_url, where the maintainer provides more information about the object.

The API reference for pandasdmx.model shows the parent classes for each class, to describe whether they are versionable, nameable, identifiable, and/or maintainable.
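The inheritance chain described above can be illustrated with a minimal sketch. The dataclasses below are simplified stand-ins, not the actual pandasdmx.model classes; the lower-case uri and urn attribute names, the field types, and the example identifier and URL are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnnotableArtefact:
    # Plain strings stand in for annotation objects in this sketch.
    annotations: List[str] = field(default_factory=list)


@dataclass
class IdentifiableArtefact(AnnotableArtefact):
    id: Optional[str] = None
    uri: Optional[str] = None
    urn: Optional[str] = None


@dataclass
class NameableArtefact(IdentifiableArtefact):
    name: Optional[str] = None
    description: Optional[str] = None


@dataclass
class VersionableArtefact(NameableArtefact):
    version: Optional[str] = None
    valid_from: Optional[str] = None
    valid_to: Optional[str] = None


@dataclass
class MaintainableArtefact(VersionableArtefact):
    maintainer: Optional[str] = None
    is_external_reference: bool = False
    structure_url: Optional[str] = None


# A maintainable object given only as a reference (hypothetical values).
ref = MaintainableArtefact(
    id="CL_FREQ",
    maintainer="SDMX",
    is_external_reference=True,
    structure_url="https://example.org/codelist/CL_FREQ",
)
```

Because each class extends the previous one, the reference object carries every inherited attribute (for example annotations and name), even though only a few are filled in.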

Because SDMX is used worldwide, an InternationalString type is used in the IM—for instance, the name of a Nameable object is an InternationalString, with zero or more localizations in different locales.
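As a rough illustration of the idea, a string with zero or more localizations can be sketched as a mapping from locale to text. The class name LocalizedText, its localized() method, and the fallback order are hypothetical and are not the pandaSDMX API.

```python
from typing import Dict, Optional


class LocalizedText:
    """Hypothetical stand-in for an InternationalString."""

    def __init__(self, localizations: Optional[Dict[str, str]] = None):
        # Zero or more localizations, keyed by locale.
        self.localizations = dict(localizations or {})

    def localized(self, locale: str, default: str = "") -> str:
        # Assumed fallback order: exact locale, then the first available
        # localization, then the supplied default.
        if locale in self.localizations:
            return self.localizations[locale]
        for text in self.localizations.values():
            return text
        return default


name = LocalizedText({"en": "Frequency", "fr": "Fréquence"})
empty = LocalizedText()
```

With these objects, name.localized("fr") gives the French text, a request for a missing locale falls back to the first stored localization, and the empty object returns the default.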

Items and schemes

ItemScheme, Item

These abstract classes allow for the creation of flat or hierarchical taxonomies.

ItemSchemes are maintainable (see above); their items attribute is a collection of Items. See the class documentation for details.
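To make the flat versus hierarchical distinction concrete, here is a minimal sketch with simplified stand-in classes. They are not the pandasdmx classes: storing items in a dict keyed by id, the parent and children attributes, and the path() helper are all assumptions for illustration.

```python
class Item:
    """Simplified stand-in for an IM Item."""

    def __init__(self, id):
        self.id = id
        self.parent = None
        self.children = []

    def add_child(self, item):
        # Link both directions so the hierarchy can be walked either way.
        item.parent = self
        self.children.append(item)
        return item


class ItemScheme:
    """Simplified stand-in for an IM ItemScheme."""

    def __init__(self, id):
        self.id = id
        self.items = {}

    def add(self, item):
        self.items[item.id] = item
        return item


def path(item):
    """Return the ids from the top-level ancestor down to `item`."""
    parts = []
    while item is not None:
        parts.append(item.id)
        item = item.parent
    return "/".join(reversed(parts))


scheme = ItemScheme("GEO")
europe = scheme.add(Item("EUR"))
france = scheme.add(europe.add_child(Item("FR")))
asia = scheme.add(Item("ASIA"))
```

A flat taxonomy is simply the case in which no item has a parent; here "FR" sits below "EUR", while "ASIA" is a top-level item.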

Data

Observation

A single data point/datum. The value is stored as the value attribute.

DataSet

A collection of Observations, SeriesKeys, and/or GroupKeys.

Note

There are no ‘Series’ or ‘Group’ classes in the IM!

Instead, the idea of ‘data series’ within a DataSet is modeled as:

  • SeriesKeys and GroupKeys are associated with a DataSet.

  • Observations are each associated with one SeriesKey and, optionally, referred to by one or more GroupKeys.

One can choose to think of a SeriesKey and the associated Observations, collectively, as a ‘data series’. But, in order to avoid confusion with the IM, pandaSDMX does not provide ‘Series’ or ‘Group’ objects.
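The association between series keys and observations can be sketched with plain Python data. This is only an illustration of the model, not pandaSDMX code: the dict layout, the observations_by_series() helper, and all values are made up.

```python
from collections import defaultdict

# Keys are tuples of (dimension id, value) pairs so they can be dict keys.
USD = (("FREQ", "A"), ("CURRENCY", "USD"))
JPY = (("FREQ", "A"), ("CURRENCY", "JPY"))

# Each observation has a value, an observation-level key, and a reference
# to exactly one series key.
observations = [
    {"series_key": USD, "dimension": (("TIME_PERIOD", "2018"),), "value": 1.18},
    {"series_key": USD, "dimension": (("TIME_PERIOD", "2019"),), "value": 1.12},
    {"series_key": JPY, "dimension": (("TIME_PERIOD", "2019"),), "value": 122.0},
]


def observations_by_series(observations):
    """Collect the observations that share each series key."""
    series = defaultdict(list)
    for obs in observations:
        series[obs["series_key"]].append(obs)
    return dict(series)


series = observations_by_series(observations)
usd_values = [obs["value"] for obs in series[USD]]
```

The "data series" is therefore just a view: a series key plus whichever observations point to it.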

pandaSDMX provides:

Depending on its structure, a DataSet may be flat, cross-sectional or time series.

Key

Values (Key.values) for one or more Dimensions. The meaning varies:

Ordinary Keys, e.g. Observation.dimension

The dimension(s) varying at the level of a specific observation.

SeriesKey

The dimension(s) shared by all Observations in a conceptual series.

GroupKey

The dimension(s) comprising the group. These may be a subset of all the dimensions in the DataSet, in which case all matching Observations are considered part of the ‘group’—even if they are associated with different SeriesKeys.

GroupKeys are often used to attach AttributeValues; see below.
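Group membership by a subset of dimensions can be sketched as follows. The in_group() helper and the dict-based keys are hypothetical; the sketch only shows that observations with different series keys can match the same group key.

```python
def in_group(group_key, series_key, obs_dimension):
    """True if every dimension value in the group key matches the observation."""
    # The full key of an observation combines series-level and
    # observation-level dimensions.
    full_key = {**series_key, **obs_dimension}
    return all(full_key.get(dim) == value for dim, value in group_key.items())


# The group is defined by a subset of the dimensions: CURRENCY only.
group_key = {"CURRENCY": "USD"}

# (series key, observation-level key) pairs; values are made up.
observations = [
    ({"FREQ": "A", "CURRENCY": "USD"}, {"TIME_PERIOD": "2019"}),
    ({"FREQ": "M", "CURRENCY": "USD"}, {"TIME_PERIOD": "2019-01"}),
    ({"FREQ": "A", "CURRENCY": "JPY"}, {"TIME_PERIOD": "2019"}),
]

members = [in_group(group_key, sk, dim) for sk, dim in observations]
```

The first two observations belong to different series (annual and monthly) yet both fall in the group; the third does not.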

AttributeValue

Value (AttributeValue.value) for a DataAttribute (AttributeValue.value_for).

May be attached to any of: DataSet, SeriesKey, GroupKey, or Observation. In the first three cases, the attachment means that the attribute applies to all Observations associated with the object.
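One way to picture the attachment levels is to collect, for a single observation, the attribute values attached at each level. The effective_attributes() helper below is a hypothetical sketch, not the pandaSDMX API; in particular, letting narrower levels override broader ones on a name clash is an assumption that the text above does not specify.

```python
def effective_attributes(dataset_attrs, series_attrs, group_attrs_list, obs_attrs):
    """Combine attribute values from every level that applies to one observation."""
    combined = {}
    # Assumed order: broadest level first, so later (narrower) levels win
    # if the same attribute id appears twice.
    for attrs in (dataset_attrs, series_attrs, *group_attrs_list, obs_attrs):
        combined.update(attrs)
    return combined


# Made-up attribute ids and values.
attrs = effective_attributes(
    dataset_attrs={"DECIMALS": "2"},
    series_attrs={"UNIT": "USD"},
    group_attrs_list=[{"TITLE": "US dollar exchange rates"}],
    obs_attrs={"OBS_STATUS": "A"},
)
```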

Data structures

Concept, ConceptScheme

An abstract idea or general notion, such as ‘age’ or ‘country’.

Concepts are one kind of Item, and are collected in an ItemScheme subclass called ConceptScheme.

Dimension, DataAttribute

These are Components of a data structure, linking a Concept (concept_identity) to its Representation (local_representation); see below.

A component can be either a DataAttribute that appears as an AttributeValue in data sets; or a Dimension that appears in Keys.

Representation, Facet

For example, the concept ‘country’ can be represented:

  • as a value of a certain type (e.g. ‘Canada’, a str), called a Facet;

  • using a Code from a specific Codelist (e.g. ‘CA’); multiple lists of codes are possible (e.g. ‘CAN’). See below.
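The two kinds of representation can be contrasted with a small sketch. The is_valid() helper and the two code mappings are hypothetical and much simpler than the IM classes; they only show that the same concept may be checked against a plain type or against one of several lists of codes.

```python
def is_valid(value, *, codes=None, value_type=str):
    """Check a value against an enumerated or a non-enumerated representation."""
    if codes is not None:
        # Enumerated: the value must be one of the codes in the list.
        return value in codes
    # Non-enumerated: the value only needs to have the expected type.
    return isinstance(value, value_type)


# Two made-up lists of codes for the same concept, 'country'.
alpha2 = {"CA": "Canada", "MX": "Mexico"}
alpha3 = {"CAN": "Canada", "MEX": "Mexico"}

checks = [
    is_valid("Canada"),             # any str is acceptable
    is_valid("CA", codes=alpha2),   # a code from the first list
    is_valid("CAN", codes=alpha2),  # not a code in the first list
    is_valid("CAN", codes=alpha3),  # but valid in the second list
]
```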

DataStructureDefinition (DSD)

Collects structures used in data sets and data flows. These are stored as dimensions, attributes, group_dimensions, and measures.

For example, dimensions is a DimensionDescriptor object that collects a number of Dimensions in a particular order. Data that is “structured by” this DSD must have all the described dimensions.
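The requirement that structured data carry every described dimension, in the described order, can be sketched as below. The ordered_key() helper and the plain list standing in for a DimensionDescriptor are assumptions for illustration, not pandaSDMX code.

```python
def ordered_key(dimension_order, values):
    """Return (dimension, value) pairs in the order given by the structure.

    Raises ValueError if a described dimension is missing or an unknown
    dimension is supplied.
    """
    missing = [dim for dim in dimension_order if dim not in values]
    unknown = [dim for dim in values if dim not in dimension_order]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")
    return [(dim, values[dim]) for dim in dimension_order]


# Made-up dimension ids, listed in the order the structure defines.
dimension_order = ["FREQ", "CURRENCY", "TIME_PERIOD"]

key = ordered_key(
    dimension_order,
    {"TIME_PERIOD": "2019", "CURRENCY": "USD", "FREQ": "A"},
)
```

The values may be supplied in any order; the result follows the order of the descriptor, and an incomplete set of values is rejected.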

See the API documentation for details.

Metadata

Code, Codelist

Category, CategoryScheme, Categorisation

Categories serve to classify or categorise things like dataflows, e.g. by subject matter.

A Categorisation links the thing to be categorised, e.g., a DataflowDefinition, to a particular Category.

Constraints

Constraint, ContentConstraint

Classes that specify a subset of data or metadata to, for example, limit the contents of a data flow.

A ContentConstraint may have:

  1. Zero or more CubeRegion stored at data_content_region.

  2. Zero or one DataKeySet stored at Constraint.data_content_keys.

Currently, ContentConstraint.to_query_string(), used by Request.get() to validate keys based on a data flow definition, only uses data_content_region, if any. data_content_keys are ignored. None of the data sources supported by pandaSDMX appears to use this latter form.
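The idea of validating a key against a region of allowed values can be sketched as follows. This is not how ContentConstraint or to_query_string() is implemented; the key_in_region() helper and the dict-of-sets region are hypothetical, and only "included" values are modelled.

```python
def key_in_region(key, region):
    """True if the key's value is allowed for every constrained dimension."""
    # Dimensions that the region does not mention are left unconstrained.
    return all(key.get(dim) in allowed for dim, allowed in region.items())


# Made-up region: allowed values for two of the three dimensions.
region = {"FREQ": {"A", "M"}, "CURRENCY": {"USD", "JPY"}}

ok = key_in_region(
    {"FREQ": "A", "CURRENCY": "USD", "CURRENCY_DENOM": "EUR"}, region
)
rejected = key_in_region(
    {"FREQ": "D", "CURRENCY": "USD", "CURRENCY_DENOM": "EUR"}, region
)
```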

Formats

The IM provides terms and concepts for data and metadata, but does not specify how that (meta)data is stored or represented. The SDMX standards include multiple ways to store data, in the following formats:

SDMX-ML

Based on eXtensible Markup Language (XML). SDMX-ML provides a complete specification: it can represent every class and property in the IM.

Reference: https://sdmx.org/?page_id=5008

  • An SDMX-ML document contains exactly one Message. See pandasdmx.message for the different types of Messages and their component parts.

  • See reader.sdmxml.

SDMX-JSON

Based on JavaScript Object Notation (JSON). The SDMX-JSON format is only defined for data, not metadata.

Reference: https://github.com/sdmx-twg/sdmx-json

  • See reader.sdmxjson.

New in version 0.5: Support for SDMX-JSON.

SDMX-CSV

Based on Comma-Separated Value (CSV). Like SDMX-JSON, the SDMX-CSV format is only defined for data, not metadata.

Reference: https://github.com/sdmx-twg/sdmx-csv

pandaSDMX does not currently support SDMX-CSV.

pandaSDMX:

  • reads all kinds of SDMX-ML and SDMX-JSON messages.

  • contains, in the tests/data/ source directory, specimens of messages in both data formats. These are used by the test suite to check that the code functions as intended, but can also be viewed to understand the data formats.

Web services

The SDMX standards describe both RESTful and SOAP web service APIs. See Resources for the SDMX Technical Working Group’s specification of the REST API. The Eurostat and ECB help materials provide descriptions and examples of constructing HTTP queries using URLs, parameters, and headers.

pandaSDMX supports:

  • REST web services, i.e. not SOAP services;

  • Data retrieved in SDMX version 2.1 formats. Some existing services offer a parameter to select SDMX 2.1 or 2.0 format; pandaSDMX does not support the latter. Other services only provide SDMX 2.0-formatted data; these cannot be used with pandaSDMX.

Request constructs valid URLs and automatically adds some parameter and header values. These can be overridden; see Request.get(). In some cases, Request will make an additional query to fetch metadata and validate a query.
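For orientation, the general shape of an SDMX REST data query URL can be sketched with the standard library. This is not the code Request uses: the data_url() helper, the base URL, and the data flow id are hypothetical. The key syntax (dimension values joined by ".", alternatives joined by "+", an empty position as a wildcard) follows the SDMX REST convention.

```python
from urllib.parse import urlencode


def data_url(base_url, flow_id, key, params=None):
    """Build a data query URL of the form {base}/data/{flow}/{key}?{params}.

    `key` lists, in the order of the structure's dimensions, the codes
    requested for each dimension; an empty list means "any value".
    """
    key_string = ".".join("+".join(codes) for codes in key)
    url = f"{base_url.rstrip('/')}/data/{flow_id}/{key_string}"
    if params:
        url += "?" + urlencode(params)
    return url


# Hypothetical service and data flow.
url = data_url(
    "https://example.org/sdmx",
    "EXR",
    [["M"], ["USD", "JPY"], []],
    {"startPeriod": "2019"},
)
# url == "https://example.org/sdmx/data/EXR/M.USD+JPY.?startPeriod=2019"
```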

pandasdmx.Source and its subclasses handle idiosyncrasies of the web services operated by different agencies, such as:

  • parameters or headers that are not supported, or must take very specific, non-standard values, or

  • unusual ways of returning data.

For data sources that support it, pandaSDMX automatically adds the HTTP header Accept: application/vnd.sdmx.structurespecificdata+xml; when the dsd argument is provided to Request.get().
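The behaviour described above amounts to setting a default that the caller can still override. A minimal sketch of that pattern follows; build_headers() and its arguments are hypothetical, not part of pandaSDMX, and the media type is the one named in the text, written without the trailing semicolon.

```python
STRUCTURE_SPECIFIC = "application/vnd.sdmx.structurespecificdata+xml"


def build_headers(dsd_provided, source_supports=True, overrides=None):
    """Return request headers, adding a default Accept value when applicable."""
    headers = {}
    if dsd_provided and source_supports:
        headers["Accept"] = STRUCTURE_SPECIFIC
    # Values supplied by the caller take precedence over the defaults.
    headers.update(overrides or {})
    return headers


default_headers = build_headers(dsd_provided=True)
no_dsd_headers = build_headers(dsd_provided=False)
custom_headers = build_headers(True, overrides={"Accept": "application/xml"})
```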

See Data sources and the source code for the details for each data source.