R pivot_longer() and Python melt()
Reshaping a DataFrame from wide to long format
Inma using Quarto
January 6, 2024
What does pivot_longer() do? :
Reshapes data from a wide format to a longer format.
Transforms multiple columns representing different variables into two key-value columns:
Key column: Holds the names of the original variables.
Value column: Contains the corresponding values from those variables.
Arguments:
data
: The data frame to be reshaped.cols
: The columns to be pivoted into longer format (can be specified by name or position).names_to
: The name of the new key column containing the original variable names.values_to
: The name of the new value column containing the corresponding values.
A simple example:
Loading the necessary libraries: The pivot_longer()
function is accessed from the tidyr
package, which is part of the tidyverse suite.
We create a mock dataset:
# Create a dataframe with 10 rows and 6 columns
clothes <- data.frame(
item = c("t-shirts", "jackets", "hoodies", "jeans", "shorts", "dresses", "skirts", "scarfs", "leggings", "gloves"),
price = c(10.05, 24.50, 39.90, 44.95, 9.90, 29.95, 17.25, 8.45, 12.99, 7.55),
pink = c(2,6,3,4,2,5,2,0,2,8),
blue = c(3,4,3,5,2,0,2,6,3,1),
yellow = c(1,5,8,5,8,4,2,5,2,0)
)
We print our dataframe:
item price pink blue yellow
1 t-shirts 10.05 2 3 1
2 jackets 24.50 6 4 5
3 hoodies 39.90 3 3 8
4 jeans 44.95 4 5 5
5 shorts 9.90 2 2 8
6 dresses 29.95 5 0 4
7 skirts 17.25 2 2 2
8 scarfs 8.45 0 6 5
9 leggings 12.99 2 3 2
10 gloves 7.55 8 1 0
We apply the pivot_longer function to our clothes data-frame, choosing the columns 3,4 and 5 to be pivoted into a column named “colors” and the values to the column named quantities:
We print the pivoted version of our data:
# A tibble: 30 × 4
item price colors quantities
<chr> <dbl> <chr> <dbl>
1 t-shirts 10.0 pink 2
2 t-shirts 10.0 blue 3
3 t-shirts 10.0 yellow 1
4 jackets 24.5 pink 6
5 jackets 24.5 blue 4
6 jackets 24.5 yellow 5
7 hoodies 39.9 pink 3
8 hoodies 39.9 blue 3
9 hoodies 39.9 yellow 8
10 jeans 45.0 pink 4
# ℹ 20 more rows
See the pivot_longer documentation for more details.
Equivalente en Python
Venga, este trozo lo hacemos en español.
¿Qué hace la funcion melt() de Pandas? :
- Lo mismo que la funcion pivot_longer() de
tidyr
Usaremos el mismo ejemplo:
Creamos el DataFrame, después de importar Pandas:
import pandas as pd
ropas = pd.DataFrame({
"articulo": ["camisetas", "chaquetas", "sudaderas", "pantalones", "bermudas", "vestidos", "faldas", "bufandas", "mallas", "guantes"],
"precio": [10.05, 24.50, 39.90, 44.95, 9.90, 29.95, 17.25, 8.45, 12.99, 7.55],
"rosa": [2, 6, 3, 4, 2, 5, 2, 0, 2, 8],
"azul": [3, 4, 3, 5, 2, 0, 2, 6, 3, 1],
"amarillo": [1, 5, 8, 5, 8, 4, 2, 5, 2, 0]
})
print(ropas)
articulo precio rosa azul amarillo
0 camisetas 10.05 2 3 1
1 chaquetas 24.50 6 4 5
2 sudaderas 39.90 3 3 8
3 pantalones 44.95 4 5 5
4 bermudas 9.90 2 2 8
5 vestidos 29.95 5 0 4
6 faldas 17.25 2 2 2
7 bufandas 8.45 0 6 5
8 mallas 12.99 2 3 2
9 guantes 7.55 8 1 0
Y le damos la vuelta a las columnas de los colores:
# Pivot longer using melt
ropas_larga = pd.melt(ropas, id_vars=['articulo', 'precio'],
value_vars=['rosa', 'azul', 'amarillo'],
var_name='cantidad',
value_name='colores')
print(ropas_larga)
articulo precio cantidad colores
0 camisetas 10.05 rosa 2
1 chaquetas 24.50 rosa 6
2 sudaderas 39.90 rosa 3
3 pantalones 44.95 rosa 4
4 bermudas 9.90 rosa 2
5 vestidos 29.95 rosa 5
6 faldas 17.25 rosa 2
7 bufandas 8.45 rosa 0
8 mallas 12.99 rosa 2
9 guantes 7.55 rosa 8
10 camisetas 10.05 azul 3
11 chaquetas 24.50 azul 4
12 sudaderas 39.90 azul 3
13 pantalones 44.95 azul 5
14 bermudas 9.90 azul 2
15 vestidos 29.95 azul 0
16 faldas 17.25 azul 2
17 bufandas 8.45 azul 6
18 mallas 12.99 azul 3
19 guantes 7.55 azul 1
20 camisetas 10.05 amarillo 1
21 chaquetas 24.50 amarillo 5
22 sudaderas 39.90 amarillo 8
23 pantalones 44.95 amarillo 5
24 bermudas 9.90 amarillo 8
25 vestidos 29.95 amarillo 4
26 faldas 17.25 amarillo 2
27 bufandas 8.45 amarillo 5
28 mallas 12.99 amarillo 2
29 guantes 7.55 amarillo 0
Aunque nos lo ha ordenado por colores en vez de por artículo. Pero bueno, nos vale así ¿verdad?
Te puedes bajar este documento en formato Quarto en mi GitHub