class: inverse middle
background-image: url(images/people2.png)
background-position: 99% 98%
background-size: 55%

# *The Paradox of the Positive*

## Exploratory tools for visualising the individuals in (multivariate) longitudinal data

### Di Cook, Monash University

<br>
.small[Joint with Nick Tierney and Tania Prvan]
<br>
Caucus for Women in Statistics Webinar
<br>
.tiny[October 6/7 2022]
<br>

.footnote[Image credit: 2020 Australian Open spectators by Di Cook]

---
background-image: url(images/singer_willett.png)
background-size: 50%

.huge[ ☀️ ]

.footnote[Example from Singer and Willett (2003) *Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence*]

---
class: inverse

<video width="750" height="563" controls="controls" name="Exploring longitudinal data" src="http://ggobi.org/book/chap-misc/Longitudinal.mov"> </video>

.footnote[[Exploring Longitudinal Data with GGobi](http://ggobi.org/book/chap-misc/Longitudinal.mov) by Di Cook on the [GGobi website](http://ggobi.org)]

---
class: inverse middle

.huge[ 🌧 ]

# Shiver.

--

<p>
The variation from individual to individual is much greater than the overall trend. While there may be an overall trend that matches our common belief, many individuals have a different experience.

---
background-image: url(https://suziegruber.com/wp-content/uploads/2018/06/Frayed-Rope-2-Deposit-web.jpg)
background-size: cover
class: inverse center

# A divergence of purpose

<br> <br> <br> <br> <br> <br> <br> <br> <br> <br>

.pull-left[
Statistics<br> for policy
]
.pull-right[
Statistics <br> for the public
]

.footnote[Image source: [Suzie Gruber](https://suziegruber.com/wp-content/uploads/2018/06/Frayed-Rope-2-Deposit-web.jpg)]

<!-- <img src="slides_files/figure-html/dichotomy-1.png" width="40%" /> -->

---
background-image: url(https://upload.wikimedia.org/wikipedia/commons/2/21/Frederick_Douglass_by_Samuel_J_Miller%2C_1847-52.png)
background-size: 15%
background-position: 100% 0%

# Paradox of the positive

> *Douglass orates that positive statements about American values, such as liberty, citizenship, and freedom, were an offense to the enslaved population of the United States because of their lack of freedom, liberty, and citizenship. As well, Douglass referred not only to the captivity of enslaved people, but to the merciless exploitation and the cruelty and torture that slaves were subjected to in the United States. Rhetoricians R.L. Heath and D. Waymer called this topic the "paradox of the positive" because it highlights how something positive and meant to be positive can also exclude individuals.*

.footnote[[Wikipedia: What to the Slave Is the Fourth of July?](https://en.wikipedia.org/wiki/What_to_the_Slave_Is_the_Fourth_of_July%3F)]

---

.pull-left[
Aside: Should race even be a variable used in analyses?
]
.pull-right[
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">"First, if racism is a principal factor organizing social life, why not study racism rather than race? Second, why use an unscientific system of classification in scientific research?" AJPH 22 years ago, loud and clear, in plain sight, <a href="https://twitter.com/mindphul?ref_src=twsrc%5Etfw">@mindphul</a> <a href="https://t.co/aLQ5BqquIS">https://t.co/aLQ5BqquIS</a></p>— Melanie Wall (@mwallbiostat) <a href="https://twitter.com/mwallbiostat/status/1282418693750894594?ref_src=twsrc%5Etfw">July 12, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
]

---

.large[I'm going to talk about]

--
<i class="fas fa-hand-pointer fa-2x faa-float animated faa-slow " style=" color:#75A34D;"></i>
.large[.purple[new tools for longitudinal data]]

--
<i class="fas fa-hand-spock fa-2x faa-wrench animated faa-slow " style=" color:#75A34D;"></i>
.large[.green[to explore the individuals]]

--
<i class="fas fa-hand-peace fa-2x faa-vertical animated faa-slow " style=" color:#75A34D;"></i>
.large[.orange[in the R package `brolgar`.]]

---
# What is the data structure?

`brolgar` builds on `tsibble`, by Earo Wang.

```
*## # A tsibble: 6,402 x 9 [!]
*## # Key:       id [888]
##       id ln_wages    xp   ged xp_since_ged black hispanic high_grade unemploy_…¹
##    <int>    <dbl> <dbl> <int>        <dbl> <int>    <int>      <int>       <dbl>
##  1    31     1.49 0.015     1        0.015     0        1          8        3.21
##  2    31     1.43 0.715     1        0.715     0        1          8        3.21
##  3    31     1.47 1.73      1        1.73      0        1          8        3.21
##  4    31     1.75 2.77      1        2.77      0        1          8        3.3
##  5    31     1.93 3.93      1        3.93      0        1          8        2.89
##  6    31     1.71 4.95      1        4.95      0        1          8        2.49
##  7    31     2.09 5.96      1        5.96      0        1          8        2.6
##  8    31     2.13 6.98      1        6.98      0        1          8        4.8
##  9    36     1.98 0.315     1        0.315     0        0          9        4.89
## 10    36     1.80 0.983     1        0.983     0        0          9        7.4
## # … with 6,392 more rows, and abbreviated variable name ¹unemploy_rate
```
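---
# Defining the key and index

A minimal sketch of how a tsibble like this could be constructed with `tsibble::as_tsibble()`. The input `wages_df` is a hypothetical plain data frame; the `wages` object shipped with `brolgar` already comes as a tsibble with `id` as the key and `xp` as the index.

```r
library(tsibble)

# wages_df (hypothetical): one row per person (id) per
# measurement occasion (xp = years of experience)
wages_ts <- as_tsibble(
  wages_df,
  key = id,        # variable identifying each individual
  index = xp,      # time-like index
  regular = FALSE  # individuals are measured at irregular times
)
```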
---
# Making spaghetti

.pull-left[

```r
wages %>%
  ggplot(aes(x = xp, y = ln_wages, group = id)) +
  geom_line(alpha = 0.3) +
  invthm
```
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-2-1.svg" width="100%" />
]

---
class: inverse middle center

# from a spaghetti mess

<img src="spaghetti_mess.gif" width="640" height="480">

.footnote[Source: giphy]

---
class: inverse middle center

# to controlled spaghetti handling

<img src="spaghetti_clean.gif" width="640" height="480">

.footnote[Source: giphy]

---
class: inverse middle center

# to perfection

<img src="spaghetti_perfect.gif" width="640" height="480">

.footnote[Source: giphy]

---
# It's not regular

.pull-left[
Using `features()`, compute the number of measurements for each subject

```r
wages %>%
*  features(ln_wages, n_obs) %>%
  ggplot(aes(x = n_obs)) +
  geom_bar() +
  xlab("Number of observations") +
  invthm
```
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-3-1.svg" width="100%" />
]

---
# We could filter on this

.pull-left[

```r
*wages <- wages %>% add_n_obs()
wages %>%
*  filter(n_obs > 3) %>%
  select(id, ln_wages, xp, n_obs)
```
]

.pull-right[

```
## # A tsibble: 6,145 x 4 [!]
*## # Key:       id [764]
##       id ln_wages    xp n_obs
##    <int>    <dbl> <dbl> <int>
##  1    31     1.49 0.015     8
##  2    31     1.43 0.715     8
##  3    31     1.47 1.73      8
##  4    31     1.75 2.77      8
##  5    31     1.93 3.93      8
##  6    31     1.71 4.95      8
##  7    31     2.09 5.96      8
##  8    31     2.13 6.98      8
##  9    36     1.98 0.315    10
## 10    36     1.80 0.983    10
## # … with 6,135 more rows
```
]

---
# Subjects don't all start at the same time

.pull-left[
Using `features()` to extract the minimum time

```r
wages %>%
*  features(xp, list(min = min)) %>%
  ggplot(aes(x = min)) +
  geom_histogram(binwidth = 0.5) +
  xlim(c(0, 13)) +
  xlab("First time in study") +
  invthm
```
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-5-1.svg" width="100%" />
]

---
# There's a range of experience

.pull-left[
Using `features()` to extract the range of the time index

```r
wages_xp_range <- wages %>%
*  features(xp, feat_ranges)

ggplot(wages_xp_range, aes(x = range_diff)) +
  geom_histogram() +
  xlab("Range of experience") +
  invthm
```
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-6-1.svg" width="100%" />
]

---
# Small spoonfuls of spaghetti

.pull-left[
Sample some individuals

```r
set.seed(20200720)
wages %>%
*  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp, y = ln_wages, group = id)) +
  geom_line() +
  xlim(c(0, 13)) +
  ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```

.tiny[Log-wage conversion: 0.5 ≈ $1.65; 4.5 ≈ $90]
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-7-1.svg" width="100%" />
]

---
count: false

# Small spoonfuls of spaghetti

.pull-left[
Sample some individuals

```r
wages %>%
*  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp, y = ln_wages, group = id)) +
  geom_line() +
  xlim(c(0, 13)) +
  ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```

.tiny[Log-wage conversion: 0.5 ≈ $1.65; 4.5 ≈ $90]
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-8-1.svg" width="100%" />
]

---
count: false

# Small spoonfuls of spaghetti

.pull-left[
Sample some individuals

```r
wages %>%
*  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp, y = ln_wages, group = id)) +
  geom_line() +
  xlim(c(0, 13)) +
  ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```

.tiny[Log-wage conversion: 0.5 ≈ $1.65; 4.5 ≈ $90]
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-9-1.svg" width="100%" />
]

---
# Take a spoonful of different lengths

.pull-left[
Sample experienced individuals

```r
wages %>%
  add_n_obs() %>%
*  filter(n_obs > 7) %>%
  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp, y = ln_wages, group = id)) +
  geom_line() +
  xlim(c(0, 13)) +
  ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```

.tiny[Log-wage conversion: 0.5 ≈ $1.65; 4.5 ≈ $90]
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-10-1.svg" width="100%" />
]

---
# Take a spoonful of different lengths

.pull-left[
Sample INexperienced individuals

```r
wages %>%
  add_n_obs() %>%
*  filter(n_obs < 5) %>%
  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp, y = ln_wages, group = id)) +
  geom_line() +
  xlim(c(0, 13)) +
  ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```

.tiny[Log-wage conversion: 0.5 ≈ $1.65; 4.5 ≈ $90]
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-11-1.svg" width="100%" />
]

---
# Take a spoonful of different lengths

.pull-left[
Sample individuals with average experience

```r
wages %>%
  add_n_obs() %>%
*  filter(n_obs > 4, n_obs < 8) %>%
  sample_n_keys(size = 10) %>%
  ggplot(aes(x = xp, y = ln_wages, group = id)) +
  geom_line() +
  xlim(c(0, 13)) +
  ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```

.tiny[Log-wage conversion: 0.5 ≈ $1.65; 4.5 ≈ $90]
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-12-1.svg" width="100%" />
]

---
# Also we can

- `facet_strata`: show the whole pot, neatly separated into equal portions
- `facet_sample`: show most of the pot, in neatly separated portions

---

<img src="slides_files/figure-html/unnamed-chunk-13-1.svg" width="100%" />
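---
# Sketching the facet calls

The code behind the previous plot is not shown in the deck; here is a minimal sketch of how `facet_sample()` and `facet_strata()` could be called. The argument names `n_per_facet`, `n_facets` and `n_strata` follow the `brolgar` documentation and should be treated as assumptions.

```r
library(brolgar)
library(ggplot2)

# facet_sample(): a few randomly sampled individuals per panel
ggplot(wages, aes(x = xp, y = ln_wages, group = id)) +
  geom_line() +
  facet_sample(n_per_facet = 3, n_facets = 12)

# facet_strata(): every individual appears, split across panels
ggplot(wages, aes(x = xp, y = ln_wages, group = id)) +
  geom_line() +
  facet_strata(n_strata = 12)
```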
---
# Special features

Compute longnostics (longitudinal cognostics) for each subject:

- Slope and intercept from a simple linear model
- Variance, standard deviation
- Jumps, differences
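---
# Computing several longnostics at once

A sketch of how a set of longnostics could be computed in one call, using `features()` with a named list of summary functions, in the same way as the earlier `features(xp, list(min = min))` example. The particular summaries listed here are illustrative, not the set used in the following slides.

```r
library(brolgar)
library(dplyr)

# One row per subject, one column per longnostic
wages_longnostics <- wages %>%
  features(ln_wages,
           list(min    = min,
                median = median,
                max    = max,
                sd     = sd,
                iqr    = IQR))

# brolgar also ships ready-made feature sets,
# e.g. feat_ranges, used earlier for the range of xp
```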
---
# Increasing

.pull-left[

```r
wages_slope <- wages %>%
  add_n_obs() %>%
  filter(n_obs > 4) %>%
*  add_key_slope(ln_wages ~ xp) %>%
  as_tsibble(key = id, index = xp)

wages_slope %>%
*  filter(.slope_xp > 0.4) %>%
  ggplot(aes(x = xp, y = ln_wages, group = id)) +
  geom_line() +
  ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-14-1.svg" width="100%" />
]

---
# Decreasing

.pull-left[

```r
wages_slope %>%
*  filter(.slope_xp < (-0.7)) %>%
  ggplot(aes(x = xp, y = ln_wages, group = id)) +
  geom_line() +
  ylim(c(0, 4.5)) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm
```
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-15-1.svg" width="100%" />
]

---
background-image: url(https://cdn.mos.cms.futurecdn.net/xVDtHe3txNCijnegF8y4d6-970-80.jpg)
background-size: 30%
background-position: 70% 99%

# A different style of five-number summary

Who is average? Who is different? Find the individuals who are representative of the minimum, median, maximum, etc., of growth, using `keys_near()`.

.footnote[Image credit: Flickr/paul dynamik]

---

.pull-left[

```r
wages_threenum <- wages %>%
  add_n_obs() %>%
  filter(n_obs > 4) %>%
  key_slope(ln_wages ~ xp) %>%
*  keys_near(key = id,
*            var = .slope_xp,
*            funs = l_three_num) %>%
  left_join(wages, by = "id") %>%
  as_tsibble(key = id, index = xp)
```
]

.pull-right[
<img src="slides_files/figure-html/three_number_plot-1.svg" width="100%" />
]
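---

The five-number version on the next slide follows the same recipe. A sketch, assuming `brolgar` also provides an `l_five_num` list of summary functions analogous to `l_three_num` above; treat that name as an assumption.

```r
library(brolgar)
library(dplyr)
library(tsibble)

# Same steps as wages_threenum, but asking keys_near() for the
# individuals closest to five summary values of the slope
# (l_five_num is assumed to exist alongside l_three_num)
wages_fivenum <- wages %>%
  add_n_obs() %>%
  filter(n_obs > 4) %>%
  key_slope(ln_wages ~ xp) %>%
  keys_near(key = id,
            var = .slope_xp,
            funs = l_five_num) %>%
  left_join(wages, by = "id") %>%
  as_tsibble(key = id, index = xp)
```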
---

<img src="slides_files/figure-html/five_number-1.svg" width="100%" />

---
# Sculpting spaghetti

.pull-left[
Mixed effects model: education as a fixed effect, with a random intercept and slope in experience for each subject.

```r
# lmer() is from lme4; add_predictions()/add_residuals() from modelr
wages_fit_int <-
*  lmer(ln_wages ~ xp + high_grade +
*         (xp | id), data = wages)

wages_aug <- wages %>%
  add_predictions(wages_fit_int, var = "pred_int") %>%
  add_residuals(wages_fit_int, var = "res_int")
```
]

.pull-right[
<img src="slides_files/figure-html/model_plot-1.svg" width="100%" />
]

---
# Sample and show the data, too

```r
set.seed(1)
wages_aug %>%
  add_n_obs() %>%
  filter(n_obs > 4) %>%
  sample_n_keys(size = 12) %>%
*  ggplot(aes(x = xp, y = pred_int, group = id, colour = factor(id))) +
  geom_line() +
*  geom_point(aes(x = xp, y = ln_wages, colour = factor(id))) +
  # scale_colour_ochre() is from the ochRe package
  scale_colour_ochre(palette = "emu_woman_paired") +
  facet_wrap(~id, ncol = 4) +
  xlab("Years of experience") +
  ylab("Log wages") +
  invthm +
  theme(legend.position = "none")
```

---

<img src="slides_files/figure-html/unnamed-chunk-16-1.svg" width="100%" />

---

.pull-left[
# Multivariate

Multiple response variables, expecting some association between them.

- `ln_wages`
- `expens` (simulated)
- `savings` (simulated)
]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-17-1.svg" width="100%" />
]

---
# Multivariate

```r
library(tourr)

# wages_mv: wages with the simulated expens and savings columns added
wages_mv_df <- wages_mv %>%
  select(ln_wages, expens, savings, id) %>%
  as.data.frame()

# Edges joining the consecutive observations of subjects 12227 and 735
# (row indices in wages_mv_df)
wages_12227 <- data.frame(from = 6137:6146, to = 6138:6147) %>% as.matrix()
wages_735 <- data.frame(from = 408:418, to = 409:419) %>% as.matrix()
edges <- rbind(wages_12227, wages_735)

animate_xy(wages_mv_df[, 1:3], axes = "bottomleft", col = "grey90",
           edges = wages_12227)
animate_xy(wages_mv_df[, 1:3], axes = "bottomleft", col = "grey90",
           edges = wages_735)

render_gif(wages_mv_df[, 1:3], grand_tour(),
           display_xy(axes = "bottomleft", col = "grey90", edges = wages_12227),
           gif_file = "./tour1.gif", apf = 1/15, frames = 200)
render_gif(wages_mv_df[, 1:3], grand_tour(),
           display_xy(axes = "bottomleft", col = "grey90", edges = wages_735),
           gif_file = "./tour2.gif", apf = 1/15, frames = 200)
```

---

.pull-left[
Subject 12227

<img src="tour1.gif" width="90%">
]

--

.pull-right[
Subject 735

<img src="tour2.gif" width="90%">
]

---
class: inverse middle center

# Some of these techniques for exploring and describing individuals are already being used in practice

---
background-image: url(images/NYTimes1.png)
background-size: 80%

.footnote[[NYTimes Coronavirus coverage](https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html)]

---
background-image: url(images/NYTimes2.png)
background-size: 100%

.footnote[[NYTimes Coronavirus coverage](https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html)]

---
# What I hope you have heard today

- We need .orange[more research on], and .orange[acceptance of], methods for communicating the individual experience.
- When the variation is large, summarising the variation is the honest thing to do.
- .orange[Statistics needs to address the individual experience], so we can better engage with the public.

---
class: inverse middle center

# Wrapping up

---
background-image: url(images/mouldy-spaghetti.jpg)
background-size: cover
class: inverse center

# This wages data is stale

.footnote[Image source: [https://www.stayathomemum.com.au](https://www.stayathomemum.com.au/my-lifestyle/mouldy-food-how-far-is-too-far-gone/)]

---

The openly available data is from https://www.nlsinfo.org/content/cohorts/nlsy79/get-data

.pull-left[
It's messy, but it's too clean

.small[
- The original cohort included 12,686 individuals, but this data has only 888
- The process used to produce the data was not transparent
- There are some discrepancies from the original source
]
]

.pull-right[
New data is available

.small[
- Refreshed data is available in the CRAN package `yowie`: https://numbats.github.io/yowie/
- The process for preparing the new wages data from the NLSY is documented in [Amaliah et al (2022)](https://www.tandfonline.com/doi/epub/10.1080/26939169.2022.2094300).
]
]

---
background-image: url(https://upload.wikimedia.org/wikipedia/commons/9/99/Brolga_%2835984742503%29.jpg)
background-size: 40%
background-position: 99% 99%
class: inverse

### Read more about the `brolgar` package at
# http://brolgar.njtierney.com

<br>

**br**owse **o**ver **l**ongitudinal data <br> **g**raphically and **a**nalytically in **R**

.footnote[Image source: [wikicommons](https://upload.wikimedia.org/wikipedia/commons/9/99/Brolga_%2835984742503%29.jpg)]

---
class: inverse middle
background-image: url(images/people2.png)
background-position: 99% 1%
background-size: 35%

# Acknowledgements

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan), with the **iris theme** created with [xaringanthemer](https://github.com/gadenbuie/xaringanthemer).

The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](http://yihui.name/knitr), and [R Markdown](https://rmarkdown.rstudio.com).

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

.footnote[Image credit: 2020 Australian Open spectators by Di Cook]