Making a mess again - with the data

SISBID 2025
https://github.com/dicook/SISBID

Warmup

Turn the french_fries data from wide format into a long format with variables type and rating.

# A tibble: 6 × 9
  time  treatment subject   rep potato buttery grassy rancid painty
  <fct> <fct>     <fct>   <dbl>  <dbl>   <dbl>  <dbl>  <dbl>  <dbl>
1 1     1         3           1    2.9     0      0      0      5.5
2 1     1         3           2   14       0      0      1.1    0  
3 1     1         10          1   11       6.4    0      0      0  
4 1     1         10          2    9.9     5.9    2.9    2.2    0  
5 1     1         15          1    1.2     0.1    0      1.1    5.1
6 1     1         15          2    8.8     3      3.6    1.5    2.3

05:00

What would you like to find out about the french fries data set?

Put your questions in the chat!

What would we like to know?

  • Is the design complete?
  • Are replicates like each other?
  • How do the ratings on the different scales differ?
  • Are raters giving different scores on average?
  • Do ratings change over the weeks?

Each of these questions requires a different summary of the data.

Pivot french fries to long

# A tibble: 6 × 6
  time  treatment subject   rep type    rating
  <fct> <fct>     <fct>   <dbl> <chr>    <dbl>
1 1     1         3           1 potato     2.9
2 1     1         3           1 buttery    0  
3 1     1         3           1 grassy     0  
4 1     1         3           1 rancid     0  
5 1     1         3           1 painty     5.5
6 1     1         3           2 potato    14  

Pivot long to wide

Examples:

  • Are replicates like each other?
    • compare rep 1 to rep 2 values
  • How do the ratings on the different scales differ?
    • compare ratings across scales
  • Are raters giving different scores on average?
    • compare ratings across raters
  • Do ratings change over the weeks?
    • compare ratings week-by-week

Pivot to wide form

tidyr’s pivot_wider function creates variables with comparable values

Long form:

time treatment subject rep type rating
1 1 3 1 potato 2.9
1 1 3 1 buttery 0.0
1 1 3 1 grassy 0.0
1 1 3 1 rancid 0.0

Pivot to wide form

tidyr’s pivot_wider function creates variables with comparable values

Wide form:

treatment subject rep type 1 2 3 4 5 6 7 8 9 10
1 3 1 potato 2.9 9.0 11.8 13.6 14.0 0.4 2.9 3.5 1.1 NA
1 3 1 buttery 0.0 0.3 0.2 0.1 0.3 1.2 0.0 0.5 0.4 NA
1 3 1 grassy 0.0 0.1 0.0 0.0 0.0 0.0 0.0 1.3 0.0 NA

Pivot to wide form

# A tibble: 6 × 14
  treatment subject   rep type     `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`
  <fct>     <fct>   <dbl> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1         3           1 potato   2.9   9    11.8  13.6  14     0.4   2.9   3.5
2 1         3           1 butte…   0     0.3   0.2   0.1   0.3   1.2   0     0.5
3 1         3           1 grassy   0     0.1   0     0     0     0     0     1.3
4 1         3           1 rancid   0     5.8   6     1.7   0     0     0     0  
5 1         3           1 painty   5.5   0.3   0     0     1.7   9.5   5.5   3.8
6 1         3           2 potato  14     5.5   7.8   5.3  12.9   3.3   0.8   0.6
# ℹ 2 more variables: `9` <dbl>, `10` <dbl>

pivot_wider:

  • creates a new column for each value of the variable in names_from
  • fills values in using the variable in values_from

Comparing ratings: different weeks

Note the use of the backtick for variable names with special characters or numbers.

Your turn: Are the replicates similar?

Goal: Plot the replicates against each other using a scatterplot.

  1. Convert the data into long form
  2. Get the replicates spread into separate columns by replicate.
  3. Make the plot.

05:00

Are ratings similar across scales?

  • Scales: potato-y, buttery, grassy, rancid and painty?
  • Pivot into long form, plot with facet_...(~scale).
# A tibble: 6 × 6
  time  treatment subject   rep type    rating
  <fct> <fct>     <fct>   <dbl> <chr>    <dbl>
1 1     1         3           1 potato     2.9
2 1     1         3           1 buttery    0  
3 1     1         3           1 grassy     0  
4 1     1         3           1 rancid     0  
5 1     1         3           1 painty     5.5
6 1     1         3           2 potato    14  

Are ratings similar across scales?

Side-by-side boxplots

Your turn: Correlation b/w scales?

  • Create a wide form of the data by type of scale.

  • cor allows you to create a correlation matrix.
    Run ?cor to look up how to get rid of NA values in the result.

  • Draw a scatterplot of two scales with the highest (positive or negative) correlation value.

05:00

Ratings by week

Use the long form of the data and plot:

Your turn: Ratings by time & scale?

  • Find a linear model describing the average rating by week (time) and type of scale as shown below.
  • Which form of the dataset should we use?
  • Challenge: can you plot the fitted lines from the model?

05:00

Resources