New tools for visualising and explaining multivariate spatio-temporal data

H. Sherry Zhang, PhD student

Monash University, Australia

Apr 17, 2023

Hi!

  • A final year PhD student in the Department of Econometrics and Business Statistics, Monash University, Australia

  • My research centers on exploring multivariate spatio-temporal data with data wrangling and visualisation tools.

  • Find me on

  • slides for today: https://sherryzhang-ireland2023.netlify.app/

Visual diagnostics for constrained optimisation with application to guided tours

Projection pursuit for dimension reduction

Optimisation in projection pursuit

Data: \(\mathbf{X}_{n \times p}\);

Projection matrix: \(\mathbf{A}_{p\times d}\)

Projection: \(\mathbf{Y}_{n \times d} = \mathbf{X} \cdot \mathbf{A}\)

Index function \(f: \mathbb{R}^{n \times d} \mapsto \mathbb{R}\)

Optimisation: \[\arg \max_{\mathbf{A}} f(\mathbf{X} \cdot \mathbf{A}) ~~~ s.t. ~~~ \mathbf{A}^{\prime} \mathbf{A} = I_d\]
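
The objects above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data are toy standard-normal draws, and `index_fn` is a hypothetical kurtosis-based index chosen for simplicity, not necessarily one used in the talk.

```r
set.seed(1)

n <- 200  # observations
p <- 5    # original dimension
d <- 2    # projection dimension

# Data matrix X (n x p): toy standard-normal data, purely illustrative
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Random orthonormal projection matrix A (p x d) from a QR decomposition
A <- qr.Q(qr(matrix(rnorm(p * d), nrow = p, ncol = d)))

# Projection Y (n x d)
Y <- X %*% A

# Hypothetical index function f: maps an n x d projection to one number
# (here: mean absolute excess kurtosis of the projected columns)
index_fn <- function(Y) {
  mean(apply(Y, 2, function(y) {
    z <- (y - mean(y)) / sd(y)
    abs(mean(z^4) - 3)
  }))
}

# Largest deviation of A'A from the identity I_d (the constraint)
constraint_gap <- max(abs(crossprod(A) - diag(d)))
index_value <- index_fn(Y)
```

An optimiser then searches over matrices `A` that keep `constraint_gap` at (numerically) zero while increasing `index_value`.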

Visualise the projection matrix space

Simulation:

  • simulated 5D data project to 1D
  • two optimisers
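
A setup of this kind could look like the sketch below. The slide does not name the two optimisers, so the naive random search here is a stand-in written for illustration only; the bimodal variable, the index function and all parameter values are likewise assumptions.

```r
set.seed(2023)

n <- 500
p <- 5

# Simulated 5D data: standard-normal noise plus one bimodal variable (assumed)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
X[, 2] <- c(rnorm(n / 2, mean = -3), rnorm(n / 2, mean = 3))
X <- scale(X)

# Normalise a vector to unit length so that A'A = 1 (the constraint for d = 1)
unit <- function(a) a / sqrt(sum(a^2))

# Hypothetical index: absolute excess kurtosis of the 1D projection
index_fn <- function(y) {
  z <- (y - mean(y)) / sd(y)
  abs(mean(z^4) - 3)
}

# Naive random search: perturb the current basis and keep the candidate
# only if it improves the index value
random_search <- function(X, n_iter = 500, step = 0.5) {
  a <- unit(rnorm(ncol(X)))
  best <- index_fn(X %*% a)
  trace <- numeric(n_iter)
  for (i in seq_len(n_iter)) {
    cand <- unit(a + step * rnorm(ncol(X)))
    val <- index_fn(X %*% cand)
    if (val > best) {
      a <- cand
      best <- val
    }
    trace[i] <- best
  }
  list(basis = a, index = best, trace = trace)
}

res <- random_search(X)
```

Recording every basis visited, rather than only the best index value as `trace` does here, is what makes it possible to plot the path of an optimiser through the projection matrix space.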

Cubble: An R package for organizing and wrangling multivariate spatio-temporal data

Motivation

Australian weather station data

stations
# A tibble: 88 × 6
   id            lat  long  elev name              wmo_id
   <chr>       <dbl> <dbl> <dbl> <chr>              <dbl>
 1 ASN00001006 -15.5  128.   3.8 wyndham aero       95214
 2 ASN00002032 -17.0  128. 203   warmun             94213
 3 ASN00003080 -17.6  124.  77.5 curtin aero        94204
 4 ASN00005007 -22.2  114.   5   learmonth airport  94302
 5 ASN00006044 -25.9  114.   9   denham             94402
 6 ASN00007600 -28.1  118. 407   mount magnet aero  94429
 7 ASN00008296 -29.2  116. 271.  morawa airport     94417
 8 ASN00009114 -31.0  115.   4   lancelin           95606
 9 ASN00009240 -32.0  116. 384   bickley            95610
10 ASN00009542 -33.7  122. 142   esperance aero     95638
# … with 78 more rows

ts
# A tibble: 32,208 × 5
   id          date        prcp  tmax  tmin
   <chr>       <date>     <dbl> <dbl> <dbl>
 1 ASN00001006 2020-01-01   164  38.3  25.3
 2 ASN00001006 2020-01-02     0  40.6  30.5
 3 ASN00001006 2020-01-03    16  39.7  27.2
 4 ASN00001006 2020-01-04     0  38.2  27.3
 5 ASN00001006 2020-01-05     2  39.3  26.7
 6 ASN00001006 2020-01-06    60  32.9  25.6
 7 ASN00001006 2020-01-07   146  34.1  25.5
 8 ASN00001006 2020-01-08    40  36.6  26.2
 9 ASN00001006 2020-01-09     0  38.2  27.6
10 ASN00001006 2020-01-10     0  38.9  29.7
# … with 32,198 more rows

What’s available for spatio-temporal data? - stars

Cubble: a spatio-temporal vector data structure

Cubble is a nested object built on tibble that allows easy pivoting between the spatial and temporal forms.
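
The idea of a nested form and a long form can be illustrated with plain tibbles, without cubble itself. The sketch below uses `tidyr::nest()` and `tidyr::unnest()` on made-up toy data; it shows the underlying data layout only and is not how cubble is implemented.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Toy data (made up for illustration): two stations, three days each
spatial <- tibble(
  id   = c("a", "b"),
  long = c(145.0, 151.2),
  lat  = c(-37.8, -33.9)
)
temporal <- tibble(
  id   = rep(c("a", "b"), each = 3),
  date = rep(as.Date("2020-01-01") + 0:2, times = 2),
  tmax = c(25.1, 30.4, 28.0, 27.3, 26.8, 29.5)
)

# "Nested" layout: one row per station, time series stored in a list-column
nested <- temporal %>%
  nest(ts = c(date, tmax)) %>%
  left_join(spatial, by = "id")

# "Long" layout: one row per station-date combination
long <- nested %>%
  select(id, ts) %>%
  unnest(ts)
```

In the nested layout each station occupies one row, which suits spatial operations; in the long layout each row is one station-date pair, which suits temporal operations.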

Cast your data into a cubble

library(cubble)
library(dplyr)

(weather <- as_cubble(
  list(spatial = stations, temporal = ts),
  key = id, index = date, coords = c(long, lat)
))
# cubble:   id [88]: nested form
# bbox:     [113.53, -43.49, 153.64, -10.58]
# temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
   id            lat  long  elev name              wmo_id ts                
   <chr>       <dbl> <dbl> <dbl> <chr>              <dbl> <list>            
 1 ASN00001006 -15.5  128.   3.8 wyndham aero       95214 <tibble [366 × 4]>
 2 ASN00002032 -17.0  128. 203   warmun             94213 <tibble [366 × 4]>
 3 ASN00003080 -17.6  124.  77.5 curtin aero        94204 <tibble [366 × 4]>
 4 ASN00005007 -22.2  114.   5   learmonth airport  94302 <tibble [366 × 4]>
 5 ASN00006044 -25.9  114.   9   denham             94402 <tibble [366 × 4]>
 6 ASN00007600 -28.1  118. 407   mount magnet aero  94429 <tibble [366 × 4]>
 7 ASN00008296 -29.2  116. 271.  morawa airport     94417 <tibble [366 × 4]>
 8 ASN00009114 -31.0  115.   4   lancelin           95606 <tibble [366 × 4]>
 9 ASN00009240 -32.0  116. 384   bickley            95610 <tibble [366 × 4]>
10 ASN00009542 -33.7  122. 142   esperance aero     95638 <tibble [366 × 4]>
# … with 78 more rows
  • the spatial data (stations) can be an sf object and temporal data (ts) can be a tsibble object.

Switch between the two forms

long form

(weather_long <- weather %>% 
  face_temporal())
# cubble:  date, id [88]: long form
# bbox:    [113.53, -43.49, 153.64, -10.58]
# spatial: lat [dbl], long [dbl], elev [dbl],
#   name [chr], wmo_id [dbl]
   id          date        prcp  tmax  tmin
   <chr>       <date>     <dbl> <dbl> <dbl>
 1 ASN00001006 2020-01-01   164  38.3  25.3
 2 ASN00001006 2020-01-02     0  40.6  30.5
 3 ASN00001006 2020-01-03    16  39.7  27.2
 4 ASN00001006 2020-01-04     0  38.2  27.3
 5 ASN00001006 2020-01-05     2  39.3  26.7
 6 ASN00001006 2020-01-06    60  32.9  25.6
 7 ASN00001006 2020-01-07   146  34.1  25.5
 8 ASN00001006 2020-01-08    40  36.6  26.2
 9 ASN00001006 2020-01-09     0  38.2  27.6
10 ASN00001006 2020-01-10     0  38.9  29.7
# … with 32,198 more rows

back to the nested form:

(weather_back <- weather_long %>% 
   face_spatial())
# cubble:   id [88]: nested form
# bbox:     [113.53, -43.49, 153.64, -10.58]
# temporal: date [date], prcp [dbl], tmax [dbl],
#   tmin [dbl]
   id        lat  long  elev name  wmo_id ts      
   <chr>   <dbl> <dbl> <dbl> <chr>  <dbl> <list>  
 1 ASN000… -15.5  128.   3.8 wynd…  95214 <tibble>
 2 ASN000… -17.0  128. 203   warm…  94213 <tibble>
 3 ASN000… -17.6  124.  77.5 curt…  94204 <tibble>
 4 ASN000… -22.2  114.   5   lear…  94302 <tibble>
 5 ASN000… -25.9  114.   9   denh…  94402 <tibble>
 6 ASN000… -28.1  118. 407   moun…  94429 <tibble>
 7 ASN000… -29.2  116. 271.  mora…  94417 <tibble>
 8 ASN000… -31.0  115.   4   lanc…  95606 <tibble>
 9 ASN000… -32.0  116. 384   bick…  95610 <tibble>
10 ASN000… -33.7  122. 142   espe…  95638 <tibble>
# … with 78 more rows
identical(weather_back, weather)
[1] TRUE

Access variables in the other form

Reference temporal variables with $

weather %>% 
  mutate(avg_tmax = mean(ts$tmax, na.rm = TRUE))
# cubble:   id [88]: nested form
# bbox:     [113.53, -43.49, 153.64, -10.58]
# temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
   id            lat  long  elev name              wmo_id ts                 avg_tmax
   <chr>       <dbl> <dbl> <dbl> <chr>              <dbl> <list>                <dbl>
 1 ASN00001006 -15.5  128.   3.8 wyndham aero       95214 <tibble [366 × 4]>     36.7
 2 ASN00002032 -17.0  128. 203   warmun             94213 <tibble [366 × 4]>     35.8
 3 ASN00003080 -17.6  124.  77.5 curtin aero        94204 <tibble [366 × 4]>     35.9
 4 ASN00005007 -22.2  114.   5   learmonth airport  94302 <tibble [366 × 4]>     33.2
 5 ASN00006044 -25.9  114.   9   denham             94402 <tibble [366 × 4]>     27.1
 6 ASN00007600 -28.1  118. 407   mount magnet aero  94429 <tibble [366 × 4]>     30.3
 7 ASN00008296 -29.2  116. 271.  morawa airport     94417 <tibble [366 × 4]>     28.8
 8 ASN00009114 -31.0  115.   4   lancelin           95606 <tibble [366 × 4]>     24.8
 9 ASN00009240 -32.0  116. 384   bickley            95610 <tibble [366 × 4]>     22.8
10 ASN00009542 -33.7  122. 142   esperance aero     95638 <tibble [366 × 4]>     22.8
# … with 78 more rows
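
The `ts$tmax` reference works one station at a time. A rough plain-dplyr analogue, assuming the nested form behaves like a row-wise tibble with a list-column, is sketched below on made-up toy values.

```r
library(tibble)
library(dplyr)

# Toy nested table (made-up values): one row per station,
# with the time series stored in the list-column `ts`
nested <- tibble(
  id = c("a", "b"),
  ts = list(
    tibble(tmax = c(25, 30, NA)),
    tibble(tmax = c(20, 22, 24))
  )
)

# Row-wise evaluation: inside mutate(), `ts` is the current row's tibble,
# so `ts$tmax` is that station's temperature series
res <- nested %>%
  rowwise() %>%
  mutate(avg_tmax = mean(ts$tmax, na.rm = TRUE)) %>%
  ungroup()
```

Without `rowwise()`, `ts` would be the whole list-column and `ts$tmax` would not pick out a single station's series.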

Move spatial variables into the long form

weather_long %>% unfold(long, lat)
# cubble:  date, id [88]: long form
# bbox:    [113.53, -43.49, 153.64, -10.58]
# spatial: lat [dbl], long [dbl], elev [dbl], name [chr], wmo_id [dbl]
   id          date        prcp  tmax  tmin  long   lat
   <chr>       <date>     <dbl> <dbl> <dbl> <dbl> <dbl>
 1 ASN00001006 2020-01-01   164  38.3  25.3  128. -15.5
 2 ASN00001006 2020-01-02     0  40.6  30.5  128. -15.5
 3 ASN00001006 2020-01-03    16  39.7  27.2  128. -15.5
 4 ASN00001006 2020-01-04     0  38.2  27.3  128. -15.5
 5 ASN00001006 2020-01-05     2  39.3  26.7  128. -15.5
 6 ASN00001006 2020-01-06    60  32.9  25.6  128. -15.5
 7 ASN00001006 2020-01-07   146  34.1  25.5  128. -15.5
 8 ASN00001006 2020-01-08    40  36.6  26.2  128. -15.5
 9 ASN00001006 2020-01-09     0  38.2  27.6  128. -15.5
10 ASN00001006 2020-01-10     0  38.9  29.7  128. -15.5
# … with 32,198 more rows
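
Conceptually, moving spatial variables into the long form resembles joining the selected spatial columns onto the long table by the key. The sketch below shows that analogy with `dplyr::left_join()` on made-up toy data; it is not a description of how `unfold()` is implemented.

```r
library(tibble)
library(dplyr)

# Toy spatial and long-form temporal tables (made up for illustration)
spatial <- tibble(
  id   = c("a", "b"),
  long = c(145.0, 151.2),
  lat  = c(-37.8, -33.9),
  elev = c(31, 39)
)
long_form <- tibble(
  id   = rep(c("a", "b"), each = 2),
  date = rep(as.Date("2020-01-01") + 0:1, times = 2),
  tmax = c(25.1, 30.4, 27.3, 26.8)
)

# Attach only the requested spatial columns to every station-date row
unfolded <- long_form %>%
  left_join(select(spatial, id, long, lat), by = "id")
```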

Explore temporal pattern across space with a glyph map

Why do you need a glyph map?

Glyph map transformation

DATA %>%
  ggplot() +
  geom_glyph(
    aes(x_major = X_MAJOR, x_minor = X_MINOR,
        y_major = Y_MAJOR, y_minor = Y_MINOR)) +
  ...
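
The four aesthetics map onto a linear transformation: the minor variables are rescaled and then offset by the major (map) coordinates. A minimal base-R sketch is given below, assuming the minor variables are rescaled to [-1, 1] within the supplied vectors; `rescale11()` and `glyph_coords()` are hypothetical helpers written for this illustration, and the monthly values are made up.

```r
# Rescale a numeric vector linearly to [-1, 1]
rescale11 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  2 * (x - rng[1]) / (rng[2] - rng[1]) - 1
}

# Hypothetical helper: turn (major, minor) pairs into plotting coordinates.
# The glyph is centred on (x_major, y_major) and spans `width` by `height`.
glyph_coords <- function(x_major, x_minor, y_major, y_minor,
                         width = 1, height = 1) {
  data.frame(
    x = x_major + rescale11(x_minor) * width / 2,
    y = y_major + rescale11(y_minor) * height / 2
  )
}

# Toy example: 12 monthly values drawn around (long = 145, lat = -37.8)
month <- 1:12
tmax  <- c(26, 26, 24, 20, 17, 14, 13, 15, 17, 20, 22, 24)
g <- glyph_coords(145, month, -37.8, tmax, width = 1.3, height = 0.5)
```

Each station's series therefore becomes a small line drawn inside a box of size `width` by `height` centred on the station's location.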

Avg. max. temperature on the map

cb <- as_cubble(
  list(spatial = stations, temporal = ts),
  key = id, index = date, coords = c(long, lat)
)

cb_glyph <- cb %>%
  face_temporal() %>%
  group_by(month = lubridate::month(date)) %>% 
  summarise(tmax = mean(tmax, na.rm = TRUE)) %>% 
  unfold(long, lat)

library(ggplot2)

# `oz_simp` is assumed to be an sf polygon of Australia prepared beforehand
cb_glyph %>% 
  ggplot(aes(x_major = long, x_minor = month,
             y_major = lat, y_minor = tmax)) +
  geom_sf(data = oz_simp, fill = "grey90", 
          color = "white", inherit.aes = FALSE) +
  geom_glyph_box(width = 1.3, height = 0.5) + 
  geom_glyph(width = 1.3, height = 0.5) + 
  ggthemes::theme_map()

Additional Information

Slides created with Quarto, available at

https://sherryzhang-ireland2023.netlify.app/

All the materials used to prepare the slides are available at

https://github.com/huizezhang-sherry/ireland2023


cubble package: https://CRAN.R-project.org/package=cubble

ferrn package: https://CRAN.R-project.org/package=ferrn

ferrn paper: https://doi.org/10.32614/RJ-2021-105


H. Sherry Zhang

Supervised by Dianne Cook, Patricia Menéndez, Ursula Laa, and Nicolas Langrené