---
title: "Quick start guide to pixieweb"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Quick start guide to pixieweb}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette provides a quick start guide to get up and running with `pixieweb` as fast as possible. For a more comprehensive introduction to the `pixieweb` package, see [Introduction to pixieweb](introduction-to-pixieweb.html).

```{r setup}
library("pixieweb")
```

In this guide we walk you through five steps: connect to a PX-Web API, find a table, inspect its variables, download data and plot it. We'll use Statistics Sweden (SCB) as our example throughout.

PX-Web tables are *multi-dimensional data cubes*. Each table has one or more **variables** (dimensions), for example region, sex and year, and one or more **content codes** (what is being measured). To download data you specify which values you want along each variable. See `vignette("introduction-to-pixieweb")` for a deeper explanation of the data model.

### 1. Connect to an API

`pixieweb` ships with a catalogue of known PX-Web instances. `px_api()` accepts either a known alias (such as `"scb"`, `"ssb"`, `"statfi"`) or a full base URL.

```{r api_mock, eval = FALSE}
scb <- px_api("scb", lang = "en")
scb
```

```{r api, echo = FALSE}
scb <- pixieweb:::vd_scb
scb
```

To see all known APIs, run `px_api_catalogue()`.

### 2. Find a table

PX-Web organises data into **tables**. Use `get_tables()` to search the catalogue (the search runs server-side on v2 APIs and as a folder walk on v1):

```{r tables_mock, eval = FALSE}
tables <- get_tables(scb, query = "population")
dplyr::glimpse(tables)
```

```{r tables, echo = FALSE}
tables <- pixieweb:::vd_tables
dplyr::glimpse(tables)
```

The result is a tibble.
You can narrow it further on the client side with `table_search()`, and inspect candidate tables with `table_describe()`:

```{r table_search}
tables |>
  table_search("municipal") |>
  table_describe(max_n = 2, format = "md", heading_level = 4)
```

`table_describe()` shows the subject path, time period range, and data source alongside the title, making it much easier to pick the right table.

### 3. Explore variables

Once you have a table ID, inspect what variables (dimensions) it has. `get_variables()` returns a tibble with one row per variable:

```{r variables_mock, eval = FALSE}
vars <- get_variables(scb, "TAB638")
vars |> variable_describe()
```

```{r variables, echo = FALSE}
vars <- pixieweb:::vd_variables
vars |> variable_describe()
```

You can inspect the available values for any single variable with `variable_values()`:

```{r variable_values_mock, eval = FALSE}
vars |> variable_values("Region")
```

```{r variable_values, echo = FALSE}
pixieweb:::vd_region_values
```

### 4. Get data

Now we know which variables the table has and what values are available. Pass your selections to `get_data()`:

- **`ContentsCode`** tells the API *what* to measure (population, deaths, etc.). `"*"` means "all measures in this table".
- Variables you **omit** are *eliminated*: the API returns a pre-computed aggregate (for example, omitting a `Sex` variable gives totals for both sexes). Not every variable is eliminable; see `vignette("introduction-to-pixieweb")` for the distinction between mandatory and eliminable variables.
- Selection helpers like `px_top()`, `px_from()` and `px_range()` let you select values without knowing exact codes. Use them when you want "the latest N periods" or "everything from 2020 onward".

```{r get_data_mock, eval = FALSE}
pop <- get_data(scb, "TAB638",
  Region = c("0180", "1480", "1280"),
  ContentsCode = "*",
  Tid = px_top(5)
)
dplyr::glimpse(pop)
```

```{r get_data, echo = FALSE}
pop <- pixieweb:::vd_pop
dplyr::glimpse(pop)
```

Notice the `_text` suffix: `get_data()` returns both raw code columns (`Region = "0180"`) and human-readable label columns (`Region_text = "Stockholm"`). Use `_text` columns for display and plotting; use the raw codes for filtering and joining.

### 5. Inspect and visualise results

Finally, time to plot our data:

```{r plot, fig.width = 7, fig.height = 4}
library("ggplot2")

pop_plot <- pop |>
  # Keep only the Population content code (the table also has
  # "Population growth"); convert year to Date for nice axis breaks
  dplyr::filter(ContentsCode == "BE0101N1") |>
  dplyr::mutate(year = as.Date(paste0(Tid, "-01-01")))

ggplot(pop_plot, aes(year, value, colour = Region_text)) +
  # One line per region
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  # Years as dates: ensures whole-year breaks, not decimals
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "Population, Sweden's three most populous municipalities",
    x = "Year",
    y = "Population",
    colour = NULL,
    caption = px_cite(pop)
  ) +
  theme_minimal() +
  theme(legend.position = "top")
```

Note the use of `px_cite()` to produce a citation for the downloaded data, and the conversion of `Tid` to `Date` before plotting: years as `Date` keep `scale_x_date()` on whole years (e.g. 2020, 2021, 2022) rather than producing decimal breaks like 2020, 2022.5, 2025. This is a pattern you will want to reuse for any time-series analysis across `pixieweb`, `rKolada` and `rTrafa`.
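
The year-to-`Date` conversion can be tried on its own, without any API call. The following is a minimal sketch in base R; the `tid` vector is made-up example data standing in for a `Tid` column, not output from `get_data()`.

```r
# Hypothetical year codes, standing in for a `Tid` column (assumed example data)
tid <- c("2019", "2020", "2021", "2022", "2023")

# Append a month and day so that as.Date() parses each year as 1 January
year <- as.Date(paste0(tid, "-01-01"))

# Formatting with "%Y" recovers the plain year, as date_labels = "%Y" does on the axis
year_labels <- format(year, "%Y")

class(year)
year_labels
```

Because `year` is a `Date` vector, a date scale can place its breaks on calendar years instead of on arbitrary numeric positions.
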

Other useful helpers you may want to explore next:

- `data_minimize()`: remove columns where all values are identical
- `data_legend()`: generate a caption string from variable metadata
- `prepare_query()`: build queries with sensible defaults for tables with many variables

## Next steps

- **Deeper walkthrough**: `vignette("introduction-to-pixieweb")` covers the full data model, codelists, wide output, saved queries and advanced query composition.
- **Multiple countries**: `vignette("multi-api")` shows how to compare data across national statistics agencies (SCB, SSB, Statistics Finland and more).
- **ggplot2 reference**: for more on visualisation.

## Related packages

`pixieweb` is part of a family of R packages for Swedish and Nordic open statistics that share the same design philosophy:

- [rKolada](https://lchansson.github.io/rKolada/): R client for the [Kolada](https://kolada.se/) database of Swedish municipal and regional Key Performance Indicators
- [rTrafa](https://lchansson.github.io/rTrafa/): R client for the [Trafa](https://api.trafa.se/) API of Swedish transport statistics