An alternative workflow to Data frame Manipulation with tidyr

Created by César Herrera

tidyr use can be consulted here

Here, I present an alternative way to manipulate data frames, converting them from wide to long format and vice versa, using the package reshape2.

reshape2

Load libraries

library(here)
library(reshape2)

Working environment

here::here()

Load data

gdata <- read.csv("gapminder_data.csv")
gdata_wide <- read.csv("gapminder_wide.csv")

Convert wide data to long data

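# id.vars stay as identifier columns; every other column is stacked into
# a label column (variable_year) and a value column (value)
gdata_long_r2 <- melt(gdata_wide,
                      variable.name = "variable_year",
                      value.name = "value",
                      id.vars = c("continent", "country"))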
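
To see what melt does without loading the gapminder files, here is a minimal sketch on a made-up table. The data frame toy_wide and its numbers are hypothetical and only mimic the layout of gapminder_wide.csv.

```r
library(reshape2)

# Hypothetical two-country table in the same layout as the wide data
toy_wide <- data.frame(
  continent = c("Africa", "Europe"),
  country = c("Kenya", "Spain"),
  pop_1952 = c(6.5, 28.5),
  pop_2007 = c(35.6, 40.4),
  stringsAsFactors = FALSE
)

# Stack the two measured columns into label/value pairs
toy_long <- melt(toy_wide,
                 id.vars = c("continent", "country"),
                 variable.name = "variable_year",
                 value.name = "value")

nrow(toy_long)   # 2 countries x 2 measured columns = 4 rows
names(toy_long)  # "continent" "country" "variable_year" "value"
```

Each measured column becomes one block of rows, so the long table has as many rows as countries times measured columns.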

The variable_year column combines the variable type and the year. We will split it into two columns.

# Refer to the column by its full name rather than relying on partial matching of `$`
columns_long <- colsplit(gdata_long_r2$variable_year, "_", c("variable", "year"))

Adding our new columns (variable, year) to the long format data

gdata_long_r2 <- cbind(gdata_long_r2[,-3], columns_long)
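
The split-and-bind step can be tried on a small made-up table. This is only a sketch: toy_long and its values are hypothetical, and the combined column is dropped by name instead of by position, which is an alternative to the [,-3] used above.

```r
library(reshape2)

# Hypothetical long-format rows with a combined "variable_year" label
toy_long <- data.frame(
  country = c("Kenya", "Kenya", "Spain"),
  variable_year = c("pop_1952", "lifeExp_2007", "gdpPercap_1952"),
  value = c(6.5, 54.1, 3834.0),
  stringsAsFactors = FALSE
)

# Split each label at the underscore into a variable name and a year
parts <- colsplit(toy_long$variable_year, "_", c("variable", "year"))

# Drop the combined column by name, then attach the two new columns
toy_split <- cbind(toy_long[, names(toy_long) != "variable_year"], parts)

names(toy_split)  # "country" "value" "variable" "year"
```

Dropping by name keeps the code working even if the column order changes.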

Create summaries with reshape2

tidy_summary_r2 <- dcast(gdata_long_r2, continent+country~variable, value.var='value',
                         fun.aggregate = mean, 
                         na.rm = TRUE)
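
As a rough illustration of how dcast aggregates, the sketch below uses a made-up long table (toy_long, hypothetical values). Because year is left out of the formula, the values for all years of each country and variable are averaged.

```r
library(reshape2)

# Hypothetical long-format data: one country, two variables, two years
toy_long <- data.frame(
  continent = "Africa",
  country = "Kenya",
  variable = c("pop", "pop", "lifeExp", "lifeExp"),
  year = c(1952, 2007, 1952, 2007),
  value = c(6, 36, 42, 54),
  stringsAsFactors = FALSE
)

# One row per continent/country, one column per variable;
# the two years of each variable are collapsed with mean()
toy_summary <- dcast(toy_long, continent + country ~ variable,
                     value.var = "value",
                     fun.aggregate = mean, na.rm = TRUE)

toy_summary$pop      # mean of 6 and 36 is 21
toy_summary$lifeExp  # mean of 42 and 54 is 48
```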

Another alternative: creating summaries with base R and reshape2 pivot

Create summaries using base R

gdata_long_r2_summary <- by(gdata_long_r2, 
                            INDICES = list(gdata_long_r2$continent, 
                                           gdata_long_r2$country,
                                           gdata_long_r2$variable),
                            FUN = function(x){
                              data.frame(continent = unique(x$continent),
                                         country = unique(x$country),
                                         variable = unique(x$variable),
                                         mean = mean(x$value))
                            })

Then combine the results into a data frame

gdata_long_r2_summary <- do.call(rbind, gdata_long_r2_summary)
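
The by and do.call(rbind, ...) pattern can be checked on a small made-up table. In this sketch, toy_long, pieces and toy_means are hypothetical names: by returns one small data frame per group, and rbind stacks them.

```r
# Hypothetical long-format data for two countries
toy_long <- data.frame(
  country = c("Kenya", "Kenya", "Spain", "Spain"),
  variable = c("pop", "pop", "pop", "pop"),
  value = c(6, 36, 28, 40),
  stringsAsFactors = FALSE
)

# One single-row data frame per country/variable combination
pieces <- by(toy_long,
             INDICES = list(toy_long$country, toy_long$variable),
             FUN = function(x) {
               data.frame(country = unique(x$country),
                          variable = unique(x$variable),
                          mean = mean(x$value))
             })

# Stack the pieces into one data frame
toy_means <- do.call(rbind, pieces)

toy_means$mean[toy_means$country == "Kenya"]  # mean of 6 and 36 is 21
toy_means$mean[toy_means$country == "Spain"]  # mean of 28 and 40 is 34
```

Combinations that do not occur in the data come back from by as NULL, and rbind skips them.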

Pivot from the long format to the wide format

tidy_summary <- dcast(gdata_long_r2_summary, continent+country~variable, value.var='mean')