---
title: "Quick start guide to pixieweb"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Quick start guide to pixieweb}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette gets you up and running with `pixieweb` as quickly as possible. For a more comprehensive introduction to the package, see [Introduction to pixieweb](introduction-to-pixieweb.html).

```{r setup}
library("pixieweb")
```

This guide walks through five steps: connecting to a PX-Web API, finding a table, inspecting its variables, downloading data and plotting it. We'll use Statistics Sweden (SCB) as our example throughout.

PX-Web tables are *multi-dimensional data cubes*. Each table has one or more **variables** (dimensions) — for example, region, sex and year — and one or more **content codes** (what is being measured). To download data you specify which values you want along each variable. See `vignette("introduction-to-pixieweb")` for a deeper explanation of the data model.
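
As a rough picture of the cube model, the sketch below builds every combination of three made-up dimensions in base R. The variable names and values are hypothetical and do not come from a real SCB table; the point is only that the number of cells is the product of the number of values selected along each variable.

```{r cube_sketch}
# Hypothetical dimensions, for illustration only (not a real table)
cube <- expand.grid(
  Region = c("0180", "1480"),
  Sex = c("1", "2"),
  Tid = c("2021", "2022", "2023"),
  stringsAsFactors = FALSE
)

# One row per cell: 2 regions x 2 sexes x 3 years
nrow(cube)

# Selecting values along each variable picks out a sub-cube
subset(cube, Region == "0180" & Tid == "2023")
```

Each content code would then contribute one measured value per cell.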

### 1. Connect to an API

`pixieweb` ships with a catalogue of known PX-Web instances. `px_api()` accepts either a known alias (such as `"scb"`, `"ssb"`, `"statfi"`) or a full base URL.

```{r api_mock, eval = FALSE}
scb <- px_api("scb", lang = "en")
scb
```

```{r api, echo = FALSE}
scb <- pixieweb:::vd_scb
scb
```

To see all known APIs, run `px_api_catalogue()`.

### 2. Find a table

PX-Web organises data into **tables**. Use `get_tables()` to search the catalogue. On v2 APIs the search runs server-side; on v1 APIs `pixieweb` walks the folder tree instead:

```{r tables_mock, eval = FALSE}
tables <- get_tables(scb, query = "population")

dplyr::glimpse(tables)
```

```{r tables, echo = FALSE}
tables <- pixieweb:::vd_tables
dplyr::glimpse(tables)
```

The result is a tibble. You can narrow it further on the client side with `table_search()`, and inspect candidate tables with `table_describe()`:

```{r table_search}
tables |>
  table_search("municipal") |>
  table_describe(max_n = 2, format = "md", heading_level = 4)
```

`table_describe()` shows the subject path, time period range, and data source alongside the title — making it much easier to pick the right table.

### 3. Explore variables

Once you have a table ID, inspect what variables (dimensions) it has. `get_variables()` returns a tibble with one row per variable:

```{r variables_mock, eval = FALSE}
vars <- get_variables(scb, "TAB638")
vars |> variable_describe()
```

```{r variables, echo = FALSE}
vars <- pixieweb:::vd_variables
vars |> variable_describe()
```

You can inspect the available values for any single variable with `variable_values()`:

```{r variable_values_mock, eval = FALSE}
vars |> variable_values("Region")
```

```{r variable_values, echo = FALSE}
pixieweb:::vd_region_values
```

### 4. Get data

Now we know which variables the table has and what values are available. Pass your selections to `get_data()`:

- **`ContentsCode`** tells the API *what* to measure (population, deaths, etc.). `"*"` means "all measures in this table".
- Variables you **omit** are *eliminated* — the API returns a pre-computed aggregate (for example, omitting a `Sex` variable gives totals for both sexes). Not every variable is eliminable; see `vignette("introduction-to-pixieweb")` for the distinction between mandatory and eliminable variables.
- Selection helpers like `px_top()`, `px_from()` and `px_range()` let you select values without knowing exact codes. Use them when you want "the latest N periods" or "everything from 2020 onward".

```{r get_data_mock, eval = FALSE}
pop <- get_data(scb, "TAB638",
  Region = c("0180", "1480", "1280"),
  ContentsCode = "*",
  Tid = px_top(5)
)

dplyr::glimpse(pop)
```

```{r get_data, echo = FALSE}
pop <- pixieweb:::vd_pop
dplyr::glimpse(pop)
```
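
To picture what elimination means in the list above, here is a base R sketch with made-up counts. It assumes the measure is a simple count, so that the aggregate equals a sum over the omitted variable. The API itself returns whatever pre-computed aggregate the table publisher defined, which need not be a plain sum for other kinds of measures.

```{r elimination_sketch}
# Hypothetical counts by region and sex, for illustration only
cells <- data.frame(
  Region = c("0180", "0180", "1480", "1480"),
  Sex = c("1", "2", "1", "2"),
  value = c(480, 495, 290, 300),
  stringsAsFactors = FALSE
)

# Eliminating Sex: one total per region instead of one row per sex
totals <- aggregate(value ~ Region, data = cells, FUN = sum)
totals
```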

Notice the `_text` suffix: `get_data()` returns both raw code columns (`Region = "0180"`) and human-readable label columns (`Region_text = "Stockholm"`). Use `_text` columns for display and plotting; use the raw codes for filtering and joining.
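
A small base R sketch of that division of labour, using hypothetical rows shaped like the output above. The `value` numbers are made up, and the labels for the second and third municipality are assumed. The sketch relies on one property of Swedish municipality codes: the first two digits are the county code.

```{r text_columns_sketch}
# Hypothetical rows mimicking the code/label column pairs
pop_demo <- data.frame(
  Region = c("0180", "1480", "1280"),
  Region_text = c("Stockholm", "Göteborg", "Malmö"),
  value = c(100, 60, 35),
  stringsAsFactors = FALSE
)

# Raw codes are stable keys: derive the county code for a later join
pop_demo$county_code <- substr(pop_demo$Region, 1, 2)

# Filter on the code, display the label
pop_demo[pop_demo$Region == "0180", c("Region_text", "value")]
```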

### 5. Inspect and visualise results

Finally, plot the data:

```{r plot, fig.width = 7, fig.height = 4}
library("ggplot2")

pop_plot <- pop |>
  # Keep only the Population content code (the table also has
  # "Population growth"); convert year to Date for nice axis breaks
  dplyr::filter(ContentsCode == "BE0101N1") |>
  dplyr::mutate(year = as.Date(paste0(Tid, "-01-01")))

ggplot(pop_plot, aes(year, value, colour = Region_text)) +
  # One line per region
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  # Years as dates — ensures whole-year breaks, not decimals
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "Population, Sweden's three most populous municipalities",
    x = "Year",
    y = "Population",
    colour = NULL,
    caption = px_cite(pop)
  ) +
  theme_minimal() +
  theme(legend.position = "top")
```

Note the use of `px_cite()` to produce a citation for the downloaded data, and the conversion of `Tid` to `Date` before plotting. With years stored as `Date`, `scale_x_date()` places breaks on whole years (2020, 2021, 2022), whereas a numeric year axis can produce decimal breaks such as 2022.5. This pattern is worth reusing for any time-series analysis across `pixieweb`, `rKolada` and `rTrafa`.
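
If you reuse this conversion often, it may be worth wrapping it in a small helper. The function below is a hypothetical sketch and not part of `pixieweb`. It handles yearly codes such as `"2020"` and, as an assumption about how monthly periods are coded, strings of the form `"2020M03"`. Anything else becomes `NA`, so unexpected formats stay visible rather than being silently misparsed.

```{r period_to_date_sketch}
# Hypothetical helper, not part of pixieweb
period_to_date <- function(x) {
  x <- as.character(x)
  yearly <- grepl("^[0-9]{4}$", x)
  monthly <- grepl("^[0-9]{4}M[0-9]{2}$", x)  # assumed monthly format

  # Build ISO date strings; unrecognised codes stay NA
  iso <- ifelse(
    yearly,
    paste0(x, "-01-01"),
    ifelse(
      monthly,
      sub("^([0-9]{4})M([0-9]{2})$", "\\1-\\2-01", x),
      NA_character_
    )
  )
  as.Date(iso, format = "%Y-%m-%d")
}

period_to_date(c("2020", "2021M03", "not-a-period"))
```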

Other useful helpers you may want to explore next:

- `data_minimize()` — remove columns where all values are identical
- `data_legend()` — generate a caption string from variable metadata
- `prepare_query()` — build queries with sensible defaults for tables with many variables
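
To make the first helper's description concrete, here is the underlying idea in base R on a made-up data frame. This only illustrates "drop columns where all values are identical"; it is not `data_minimize()` itself, whose arguments and exact behaviour may differ.

```{r constant_columns_sketch}
# Hypothetical data: ContentsCode is the same in every row
demo <- data.frame(
  Region = c("0180", "1480"),
  ContentsCode = c("BE0101N1", "BE0101N1"),
  value = c(10, 20),
  stringsAsFactors = FALSE
)

# Keep only the columns that take more than one distinct value
varies <- vapply(demo, function(col) length(unique(col)) > 1, logical(1))
demo[, varies, drop = FALSE]
```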

## Next steps

- **Deeper walkthrough** — `vignette("introduction-to-pixieweb")` covers
  the full data model, codelists, wide output, saved queries and
  advanced query composition.
- **Multiple countries** — `vignette("multi-api")` shows how to compare
  data across national statistics agencies (SCB, SSB, Statistics
  Finland and more).
- **ggplot2 reference** — <https://ggplot2-book.org/> for more on
  visualisation.

## Related packages

`pixieweb` is part of a family of R packages for Swedish and Nordic
open statistics that share the same design philosophy:

- [rKolada](https://lchansson.github.io/rKolada/) — R client for the
  [Kolada](https://kolada.se/) database of Swedish municipal and
  regional Key Performance Indicators
- [rTrafa](https://lchansson.github.io/rTrafa/) — R client for the
  [Trafa](https://api.trafa.se/) API of Swedish transport statistics