H. Sherry Zhang, PhD student
Monash University, Australia
Apr 17, 2023
A final year PhD student in the Department of Econometrics and Business Statistics, Monash University, Australia
My research centers on exploring multivariate spatio-temporal data with data wrangling and visualisation tools.
Find me on huizezhangsh and huizezhang-sherry; slides for today: https://sherryzhang-ireland2023.netlify.app/
Data: \(\mathbf{X}_{n \times p}\);
Projection matrix: \(\mathbf{A}_{p\times d}\)
Projection: \(\mathbf{Y}_{n \times d} = \mathbf{X} \cdot \mathbf{A}\)
Index function \(f: \mathbb{R}^{n \times d} \to \mathbb{R}\)
Optimisation: \[\arg \max_{\mathbf{A}} f(\mathbf{X} \cdot \mathbf{A}) ~~~ s.t. ~~~ \mathbf{A}^{\prime} \mathbf{A} = I_d\]
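The setup above can be sketched in base R. This is only an illustration under stated assumptions: the toy data, the helper names `random_basis()` and `holes_index()`, and the pure random search are all hypothetical choices, not the optimiser or index used in the talk. The index is a holes-style index, picked because it is short to write down.

```r
set.seed(2023)
n <- 200; p <- 4; d <- 2

# Toy data X (n x p): the first column is bimodal, the rest are Gaussian noise.
X <- cbind(c(rnorm(n / 2, -3), rnorm(n / 2, 3)),
           matrix(rnorm(n * (p - 1)), nrow = n))
X <- scale(X)  # centre and scale each column

# Draw a random p x d matrix with orthonormal columns (A'A = I_d) via QR.
random_basis <- function(p, d) qr.Q(qr(matrix(rnorm(p * d), nrow = p)))

# Holes-style index f(Y) on a projection Y (n x d):
# larger when few projected points sit near the centre.
holes_index <- function(Y) {
  (1 - mean(exp(-0.5 * rowSums(Y^2)))) / (1 - exp(-ncol(Y) / 2))
}

# Pure random search: keep the candidate basis with the largest index value.
best_A <- random_basis(p, d)
best_val <- holes_index(X %*% best_A)
for (i in seq_len(500)) {
  A <- random_basis(p, d)
  val <- holes_index(X %*% A)
  if (val > best_val) {
    best_A <- A
    best_val <- val
  }
}

Y <- X %*% best_A              # projected data, n x d
round(crossprod(best_A), 10)   # the constraint: should print the d x d identity
```

Sampling each candidate through a QR decomposition keeps every basis on the constraint set, so the search never has to re-orthonormalise.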
Simulation:
# A tibble: 88 × 6
id lat long elev name wmo_id
<chr> <dbl> <dbl> <dbl> <chr> <dbl>
1 ASN00001006 -15.5 128. 3.8 wyndham aero 95214
2 ASN00002032 -17.0 128. 203 warmun 94213
3 ASN00003080 -17.6 124. 77.5 curtin aero 94204
4 ASN00005007 -22.2 114. 5 learmonth airport 94302
5 ASN00006044 -25.9 114. 9 denham 94402
6 ASN00007600 -28.1 118. 407 mount magnet aero 94429
7 ASN00008296 -29.2 116. 271. morawa airport 94417
8 ASN00009114 -31.0 115. 4 lancelin 95606
9 ASN00009240 -32.0 116. 384 bickley 95610
10 ASN00009542 -33.7 122. 142 esperance aero 95638
# … with 78 more rows
# A tibble: 32,208 × 5
id date prcp tmax tmin
<chr> <date> <dbl> <dbl> <dbl>
1 ASN00001006 2020-01-01 164 38.3 25.3
2 ASN00001006 2020-01-02 0 40.6 30.5
3 ASN00001006 2020-01-03 16 39.7 27.2
4 ASN00001006 2020-01-04 0 38.2 27.3
5 ASN00001006 2020-01-05 2 39.3 26.7
6 ASN00001006 2020-01-06 60 32.9 25.6
7 ASN00001006 2020-01-07 146 34.1 25.5
8 ASN00001006 2020-01-08 40 36.6 26.2
9 ASN00001006 2020-01-09 0 38.2 27.6
10 ASN00001006 2020-01-10 0 38.9 29.7
# … with 32,198 more rows
A cubble is a nested object built on tibble that allows easy pivoting between the spatial and temporal forms.
(weather <- as_cubble(
  list(spatial = stations, temporal = ts),
  key = id, index = date, coords = c(long, lat)
))
# cubble: id [88]: nested form
# bbox: [113.53, -43.49, 153.64, -10.58]
# temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
id lat long elev name wmo_id ts
<chr> <dbl> <dbl> <dbl> <chr> <dbl> <list>
1 ASN00001006 -15.5 128. 3.8 wyndham aero 95214 <tibble [366 × 4]>
2 ASN00002032 -17.0 128. 203 warmun 94213 <tibble [366 × 4]>
3 ASN00003080 -17.6 124. 77.5 curtin aero 94204 <tibble [366 × 4]>
4 ASN00005007 -22.2 114. 5 learmonth airport 94302 <tibble [366 × 4]>
5 ASN00006044 -25.9 114. 9 denham 94402 <tibble [366 × 4]>
6 ASN00007600 -28.1 118. 407 mount magnet aero 94429 <tibble [366 × 4]>
7 ASN00008296 -29.2 116. 271. morawa airport 94417 <tibble [366 × 4]>
8 ASN00009114 -31.0 115. 4 lancelin 95606 <tibble [366 × 4]>
9 ASN00009240 -32.0 116. 384 bickley 95610 <tibble [366 × 4]>
10 ASN00009542 -33.7 122. 142 esperance aero 95638 <tibble [366 × 4]>
# … with 78 more rows
Spatial data (stations) can be an sf object and temporal data (ts) can be a tsibble object.
long form:
# cubble: date, id [88]: long form
# bbox: [113.53, -43.49, 153.64, -10.58]
# spatial: lat [dbl], long [dbl], elev [dbl],
# name [chr], wmo_id [dbl]
id date prcp tmax tmin
<chr> <date> <dbl> <dbl> <dbl>
1 ASN00001006 2020-01-01 164 38.3 25.3
2 ASN00001006 2020-01-02 0 40.6 30.5
3 ASN00001006 2020-01-03 16 39.7 27.2
4 ASN00001006 2020-01-04 0 38.2 27.3
5 ASN00001006 2020-01-05 2 39.3 26.7
6 ASN00001006 2020-01-06 60 32.9 25.6
7 ASN00001006 2020-01-07 146 34.1 25.5
8 ASN00001006 2020-01-08 40 36.6 26.2
9 ASN00001006 2020-01-09 0 38.2 27.6
10 ASN00001006 2020-01-10 0 38.9 29.7
# … with 32,198 more rows
back to the nested form:
# cubble: id [88]: nested form
# bbox: [113.53, -43.49, 153.64, -10.58]
# temporal: date [date], prcp [dbl], tmax [dbl],
# tmin [dbl]
id lat long elev name wmo_id ts
<chr> <dbl> <dbl> <dbl> <chr> <dbl> <list>
1 ASN000… -15.5 128. 3.8 wynd… 95214 <tibble>
2 ASN000… -17.0 128. 203 warm… 94213 <tibble>
3 ASN000… -17.6 124. 77.5 curt… 94204 <tibble>
4 ASN000… -22.2 114. 5 lear… 94302 <tibble>
5 ASN000… -25.9 114. 9 denh… 94402 <tibble>
6 ASN000… -28.1 118. 407 moun… 94429 <tibble>
7 ASN000… -29.2 116. 271. mora… 94417 <tibble>
8 ASN000… -31.0 115. 4 lanc… 95606 <tibble>
9 ASN000… -32.0 116. 384 bick… 95610 <tibble>
10 ASN000… -33.7 122. 142 espe… 95638 <tibble>
# … with 78 more rows
[1] TRUE
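The round trip between the two forms can be sketched as below. It assumes the `climate_aus` dataset bundled with cubble stands in for `weather`, and uses `face_temporal()` and `face_spatial()` as the pivoting verbs. The call that produced the `[1] TRUE` above is not shown in the slides, so the comparison here is only a guess at its spirit.

```r
library(cubble)
library(dplyr)

# climate_aus ships with cubble and starts in the nested form:
# one row per station, with the daily series in the list-column `ts`.
long <- climate_aus %>% face_temporal()   # long form: one row per station-day
nested_again <- long %>% face_spatial()   # back to one row per station

# The number of stations should survive the round trip.
nrow(nested_again) == nrow(climate_aus)
```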
Reference temporal variables with $
# cubble: id [88]: nested form
# bbox: [113.53, -43.49, 153.64, -10.58]
# temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
id lat long elev name wmo_id ts avg_tmax
<chr> <dbl> <dbl> <dbl> <chr> <dbl> <list> <dbl>
1 ASN00001006 -15.5 128. 3.8 wyndham aero 95214 <tibble [366 × 4]> 36.7
2 ASN00002032 -17.0 128. 203 warmun 94213 <tibble [366 × 4]> 35.8
3 ASN00003080 -17.6 124. 77.5 curtin aero 94204 <tibble [366 × 4]> 35.9
4 ASN00005007 -22.2 114. 5 learmonth airport 94302 <tibble [366 × 4]> 33.2
5 ASN00006044 -25.9 114. 9 denham 94402 <tibble [366 × 4]> 27.1
6 ASN00007600 -28.1 118. 407 mount magnet aero 94429 <tibble [366 × 4]> 30.3
7 ASN00008296 -29.2 116. 271. morawa airport 94417 <tibble [366 × 4]> 28.8
8 ASN00009114 -31.0 115. 4 lancelin 95606 <tibble [366 × 4]> 24.8
9 ASN00009240 -32.0 116. 384 bickley 95610 <tibble [366 × 4]> 22.8
10 ASN00009542 -33.7 122. 142 esperance aero 95638 <tibble [366 × 4]> 22.8
# … with 78 more rows
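A call of roughly this shape would give an `avg_tmax` column like the one above. It is a sketch that again assumes the bundled `climate_aus` data in place of `weather`; the exact expression used in the talk is not shown.

```r
library(cubble)
library(dplyr)

# In the nested form each row is one station, so `ts` refers to that
# station's own daily tibble and `ts$tmax` to its maximum temperatures.
with_avg <- climate_aus %>%
  mutate(avg_tmax = mean(ts$tmax, na.rm = TRUE))

head(with_avg$avg_tmax)
```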
Move spatial variables into the long form
# cubble: date, id [88]: long form
# bbox: [113.53, -43.49, 153.64, -10.58]
# spatial: lat [dbl], long [dbl], elev [dbl], name [chr], wmo_id [dbl]
id date prcp tmax tmin long lat
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ASN00001006 2020-01-01 164 38.3 25.3 128. -15.5
2 ASN00001006 2020-01-02 0 40.6 30.5 128. -15.5
3 ASN00001006 2020-01-03 16 39.7 27.2 128. -15.5
4 ASN00001006 2020-01-04 0 38.2 27.3 128. -15.5
5 ASN00001006 2020-01-05 2 39.3 26.7 128. -15.5
6 ASN00001006 2020-01-06 60 32.9 25.6 128. -15.5
7 ASN00001006 2020-01-07 146 34.1 25.5 128. -15.5
8 ASN00001006 2020-01-08 40 36.6 26.2 128. -15.5
9 ASN00001006 2020-01-09 0 38.2 27.6 128. -15.5
10 ASN00001006 2020-01-10 0 38.9 29.7 128. -15.5
# … with 32,198 more rows
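The output above matches what `unfold()` does when applied to the long form. A minimal sketch, assuming the bundled `climate_aus` data in place of `weather`:

```r
library(cubble)
library(dplyr)

# Switch to the long form, then copy the station coordinates
# onto every daily row so they can be used in plots or models.
long_with_coords <- climate_aus %>%
  face_temporal() %>%
  unfold(long, lat)

names(long_with_coords)
```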
Modified from "Glyph-maps for Visually Exploring Temporal Patterns in Climate Data and Models" (Wickham et al., 2012)
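library(cubble)
library(dplyr)
library(ggplot2)

# Build the cubble from the station metadata and the daily series
cb <- as_cubble(
  list(spatial = stations, temporal = ts),
  key = id, index = date, coords = c(long, lat)
)

# Monthly mean maximum temperature per station, with coordinates attached
cb_glyph <- cb %>%
  face_temporal() %>%
  group_by(month = lubridate::month(date)) %>%
  summarise(tmax = mean(tmax, na.rm = TRUE)) %>%
  unfold(long, lat)

# Glyph map: one small monthly tmax series drawn at each station location.
# oz_simp is assumed to be an sf polygon layer of Australia defined elsewhere.
cb_glyph %>%
  ggplot(aes(x_major = long, x_minor = month,
             y_major = lat, y_minor = tmax)) +
  geom_sf(data = oz_simp, fill = "grey90",
          color = "white", inherit.aes = FALSE) +
  geom_glyph_box(width = 1.3, height = 0.5) +
  geom_glyph(width = 1.3, height = 0.5) +
  ggthemes::theme_map()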
Slides created via quarto available at
All the materials used to prepare the slides are available at
cubble package: https://CRAN.R-project.org/package=cubble
ferrn package: https://CRAN.R-project.org/package=ferrn
ferrn paper: https://doi.org/10.32614/RJ-2021-105
H. Sherry Zhang
Supervised by Dianne Cook, Patricia Menéndez, Ursula Laa, and Nicolas Langrené