Statistical Data and Metadata eXchange (SDMX) for the Python data ecosystem
pandaSDMX is an Apache 2.0-licensed Python library that implements SDMX 2.1 (ISO 17369:2013), a format for the exchange of statistical data and metadata used by national statistical agencies, central banks, and international organisations.
pandaSDMX can be used to:
- explore the data available from over 20 data providers such as the World Bank, BIS, ILO, ECB, Eurostat, OECD, UNICEF and United Nations;
- parse data and metadata in SDMX-ML (XML) or SDMX-JSON formats, either:
  - from local and remote files, or
  - retrieved from SDMX web services, with query validation and caching;
- convert data and metadata into pandas objects, for use with the analysis, plotting, and other tools in the Python data science ecosystem (see also the companion project intake_sdmx, https://intake_sdmx.readthedocs.io, a plugin for the intake data acquisition and distribution framework);
- apply the SDMX Information Model to your own data;
- validate SDMX files against the official XML schemas;
- …and much more.
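To give a feel for the SDMX-ML format mentioned above, here is a minimal sketch that reads a tiny generic data message using only the Python standard library. The sample message is hand-written and heavily trimmed (no header, invented values), and `read_observations` is a hypothetical helper, not part of pandaSDMX. In practice pandaSDMX does this parsing for you and handles far more of the standard.

```python
import xml.etree.ElementTree as ET

# Namespaces used by SDMX-ML 2.1 generic data messages.
NS = {
    "message": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message",
    "generic": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic",
}

# Hand-written, trimmed sample; the series key and values are made up.
SAMPLE = """\
<message:GenericData
    xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:generic="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic">
  <message:DataSet>
    <generic:Series>
      <generic:SeriesKey>
        <generic:Value id="FREQ" value="A"/>
        <generic:Value id="CURRENCY" value="USD"/>
      </generic:SeriesKey>
      <generic:Obs>
        <generic:ObsDimension value="2020"/>
        <generic:ObsValue value="1.5"/>
      </generic:Obs>
      <generic:Obs>
        <generic:ObsDimension value="2021"/>
        <generic:ObsValue value="1.25"/>
      </generic:Obs>
    </generic:Series>
  </message:DataSet>
</message:GenericData>
"""


def read_observations(xml_text):
    """Return {(series key values..., period): value} for each observation."""
    root = ET.fromstring(xml_text)
    result = {}
    for series in root.iterfind(".//generic:Series", NS):
        # The series key is the ordered tuple of dimension values.
        key = tuple(
            v.get("value")
            for v in series.iterfind("generic:SeriesKey/generic:Value", NS)
        )
        for obs in series.iterfind("generic:Obs", NS):
            period = obs.find("generic:ObsDimension", NS).get("value")
            value = float(obs.find("generic:ObsValue", NS).get("value"))
            result[key + (period,)] = value
    return result


observations = read_observations(SAMPLE)
```

Each observation ends up keyed by its dimension values plus the time period, which is roughly the shape that a multi-indexed pandas object would carry.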
Get started
Assuming Python 3.9 or higher is installed on your system, you can install pandaSDMX either with pip from the command prompt:
$ pip install pandasdmx
or, in a conda environment, from the conda-forge channel:
$ conda install pandasdmx -c conda-forge
Next, look at a usage example in only 10 lines of code. Then dive into the longer, narrative walkthrough and finally peruse the more advanced chapters as needed.
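As a rough idea of what a first session can look like, the sketch below wraps a single query in a function. It assumes the ECB source and its `EXR` (exchange rates) dataflow with a `CURRENCY` dimension; the function name and default start period are illustrative only, and the call needs network access. Treat it as a starting point under those assumptions rather than a canonical recipe.

```python
def fetch_exchange_rates(start_period="2016"):
    """Download USD and JPY exchange-rate series from the ECB.

    Requires network access and an installed pandasdmx.
    """
    # Imported lazily so the function can be defined before pandasdmx
    # is installed; a module-level import works just as well.
    import pandasdmx as sdmx

    # A Request object is a client bound to one data source ID.
    ecb = sdmx.Request("ECB")

    # "EXR" and the CURRENCY dimension are assumptions about the ECB source.
    data_msg = ecb.data(
        "EXR",
        key={"CURRENCY": ["USD", "JPY"]},
        params={"startPeriod": start_period},
    )

    # Convert the SDMX data message into pandas objects.
    return data_msg.to_pandas()
```

Calling `fetch_exchange_rates()` performs the web request and returns pandas objects that can be inspected with the usual tools, for example `.head()`.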
Bear in mind that SDMX was designed to be flexible enough to accommodate almost any data. This also means it is complex, with many abstract concepts for describing data, metadata, and their relationships. These are called the “SDMX Information Model” (IM).
This documentation gently explains the functionality provided by pandaSDMX itself, enabling you to make the best use of all supported data sources. A decent understanding of the IM is conveyed in passing. If you get hooked, follow the list of resources and references, or read the API documentation and implementation notes for the pandasdmx.model and pandasdmx.message modules, which implement the IM.
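As a mental picture of what the Information Model describes, the toy sketch below models a data structure definition as a list of dimensions, each with a set of allowed codes, and checks an observation key against it. All class and method names here are hypothetical simplifications for illustration; they are not the classes in pandasdmx.model, which are considerably richer.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dimension:
    """Toy dimension: an ID plus the codes it may take."""
    id: str
    codes: frozenset


@dataclass
class DataStructure:
    """Toy counterpart of a data structure definition (DSD)."""
    id: str
    dimensions: list = field(default_factory=list)

    def check_key(self, key):
        """Raise ValueError unless `key` has an allowed code per dimension."""
        expected = {d.id for d in self.dimensions}
        if set(key) != expected:
            raise ValueError(f"key must cover exactly {sorted(expected)}")
        for dim in self.dimensions:
            if key[dim.id] not in dim.codes:
                raise ValueError(f"{key[dim.id]!r} is not a code of {dim.id}")


@dataclass
class Observation:
    """Toy observation: a key (dimension ID -> code) and a value."""
    key: dict
    value: float


# Made-up structure and data, for illustration only.
dsd = DataStructure(
    "TOY_EXR",
    [
        Dimension("FREQ", frozenset({"A", "M"})),
        Dimension("CURRENCY", frozenset({"USD", "JPY"})),
    ],
)
obs = Observation({"FREQ": "A", "CURRENCY": "USD"}, 1.5)
dsd.check_key(obs.key)  # passes silently because the key is valid
```

The point of the sketch is the separation it shows: the structure is described once, and every observation is interpreted and validated against it.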
pandaSDMX user guide
- Data sources
  - Data source limitations
  - ABS: Australian Bureau of Statistics
  - ABS_XML: Australian Bureau of Statistics - XML-based API
  - BBK: Bundesbank (German Central Bank)
  - BIS: Bank for International Settlements
  - EC_COMP: European Commission - DG Competition
  - EC_EMPL: European Commission - DG Employment
  - EC_GROW: European Commission - DG Growth
  - ECB: European Central Bank
  - ESTAT: Eurostat
  - ILO: International Labour Organization
  - IMF: International Monetary Fund’s “SDMX Central” source
  - INEGI: National Institute of Statistics and Geography (Mexico)
  - INSEE: National Institute of Statistics and Economic Studies (France)
  - ISTAT: National Institute of Statistics (Italy)
  - LSD: National Institute of Statistics (Lithuania)
  - NB: Norges Bank (Norway)
  - NBB: National Bank of Belgium
  - OECD: Organisation for Economic Cooperation and Development
  - SGR: SDMX Global Registry
  - SPC: Pacific Data Hub
  - STAT_EE: Statistics Estonia
  - UNSD: United Nations Statistics Division
  - UNICEF: UN International Children’s Emergency Fund
  - CD2030: Countdown 2030
  - WB: World Bank Group’s “World Integrated Trade Solution”
  - WB_WDI: World Bank Group’s “World Development Indicators”
- API reference
- Implementation notes
- How to…
- What’s new?
- Development roadmap
Contributing to pandaSDMX and getting help
Report bugs, suggest features or view the source code on GitHub.
The sdmx-python Google Group and mailing list may have answers for some questions.