R pivot_longer() and Python melt()

R pivot_longer() and Python melt()

Reshaping a DataFrame from wide to long format

Author

Inma using Quarto

Published

January 6, 2024

What does pivot_longer() do? :

  • Reshapes data from a wide format to a longer format.

  • Transforms multiple columns representing different variables into two key-value columns:

    • Key column: Holds the names of the original variables.

    • Value column: Contains the corresponding values from those variables.

Arguments:

  • data: The data frame to be reshaped.

  • cols: The columns to be pivoted into longer format (can be specified by name or position).

  • names_to: The name of the new key column containing the original variable names.

  • values_to: The name of the new value column containing the corresponding values.

A simple example:

Loading the necessary libraries: The pivot_longer() function is accessed from the tidyr package, which is part of the tidyverse suite.

library(tidyverse)

We create a mock dataset:

# Create a dataframe with 10 rows and 6 columns
clothes <- data.frame(
  item = c("t-shirts", "jackets", "hoodies", "jeans", "shorts", "dresses", "skirts", "scarfs", "leggings", "gloves"),
  price = c(10.05, 24.50, 39.90, 44.95, 9.90, 29.95, 17.25, 8.45, 12.99, 7.55),
  pink = c(2,6,3,4,2,5,2,0,2,8),
  blue = c(3,4,3,5,2,0,2,6,3,1),
  yellow = c(1,5,8,5,8,4,2,5,2,0)  
 )

We print our dataframe:

# Printing clothes
print(clothes)
       item price pink blue yellow
1  t-shirts 10.05    2    3      1
2   jackets 24.50    6    4      5
3   hoodies 39.90    3    3      8
4     jeans 44.95    4    5      5
5    shorts  9.90    2    2      8
6   dresses 29.95    5    0      4
7    skirts 17.25    2    2      2
8    scarfs  8.45    0    6      5
9  leggings 12.99    2    3      2
10   gloves  7.55    8    1      0

We apply the pivot_longer function to our clothes data-frame, choosing the columns 3,4 and 5 to be pivoted into a column named “colors” and the values to the column named quantities:

clothes_long <- pivot_longer(clothes,
                             3:5,
                             names_to = "colors",
                             values_to = "quantities")

We print the pivoted version of our data:

# Printing clothes_long
print(clothes_long)
# A tibble: 30 × 4
   item     price colors quantities
   <chr>    <dbl> <chr>       <dbl>
 1 t-shirts  10.0 pink            2
 2 t-shirts  10.0 blue            3
 3 t-shirts  10.0 yellow          1
 4 jackets   24.5 pink            6
 5 jackets   24.5 blue            4
 6 jackets   24.5 yellow          5
 7 hoodies   39.9 pink            3
 8 hoodies   39.9 blue            3
 9 hoodies   39.9 yellow          8
10 jeans     45.0 pink            4
# ℹ 20 more rows

See the pivot_longer documentation for more details.

Equivalente en Python

Venga, este trozo lo hacemos en español.

¿Qué hace la funcion melt() de Pandas? :

  • Lo mismo que la funcion pivot_longer() de tidyr

Usaremos el mismo ejemplo:

Creamos el DataFrame, después de importar Pandas:

import pandas as pd
ropas = pd.DataFrame({
    "articulo": ["camisetas", "chaquetas", "sudaderas", "pantalones", "bermudas", "vestidos", "faldas", "bufandas", "mallas", "guantes"],
    "precio": [10.05, 24.50, 39.90, 44.95, 9.90, 29.95, 17.25, 8.45, 12.99, 7.55],
    "rosa": [2, 6, 3, 4, 2, 5, 2, 0, 2, 8],
    "azul": [3, 4, 3, 5, 2, 0, 2, 6, 3, 1],
    "amarillo": [1, 5, 8, 5, 8, 4, 2, 5, 2, 0]
})

print(ropas)
     articulo  precio  rosa  azul  amarillo
0   camisetas   10.05     2     3         1
1   chaquetas   24.50     6     4         5
2   sudaderas   39.90     3     3         8
3  pantalones   44.95     4     5         5
4    bermudas    9.90     2     2         8
5    vestidos   29.95     5     0         4
6      faldas   17.25     2     2         2
7    bufandas    8.45     0     6         5
8      mallas   12.99     2     3         2
9     guantes    7.55     8     1         0

Y le damos la vuelta a las columnas de los colores:

# Pivot longer using melt
ropas_larga = pd.melt(ropas, id_vars=['articulo', 'precio'],
                    value_vars=['rosa', 'azul', 'amarillo'],
                    var_name='cantidad', 
                    value_name='colores')

print(ropas_larga)
      articulo  precio  cantidad  colores
0    camisetas   10.05      rosa        2
1    chaquetas   24.50      rosa        6
2    sudaderas   39.90      rosa        3
3   pantalones   44.95      rosa        4
4     bermudas    9.90      rosa        2
5     vestidos   29.95      rosa        5
6       faldas   17.25      rosa        2
7     bufandas    8.45      rosa        0
8       mallas   12.99      rosa        2
9      guantes    7.55      rosa        8
10   camisetas   10.05      azul        3
11   chaquetas   24.50      azul        4
12   sudaderas   39.90      azul        3
13  pantalones   44.95      azul        5
14    bermudas    9.90      azul        2
15    vestidos   29.95      azul        0
16      faldas   17.25      azul        2
17    bufandas    8.45      azul        6
18      mallas   12.99      azul        3
19     guantes    7.55      azul        1
20   camisetas   10.05  amarillo        1
21   chaquetas   24.50  amarillo        5
22   sudaderas   39.90  amarillo        8
23  pantalones   44.95  amarillo        5
24    bermudas    9.90  amarillo        8
25    vestidos   29.95  amarillo        4
26      faldas   17.25  amarillo        2
27    bufandas    8.45  amarillo        5
28      mallas   12.99  amarillo        2
29     guantes    7.55  amarillo        0

Aunque nos lo ha ordenado por colores en vez de por artículo. Pero bueno, nos vale así ¿verdad?

Te puedes bajar este documento en formato Quarto en mi GitHub