You can access the R code for this analysis in my GitHub: https://github.com/InmaculadaRM/LiveBirthsMap
# I load my favorite packages (I don't always use all of them, but I keep them
# all in my template).
library(tidyverse)
library(janitor)
library(lubridate)
library(kableExtra)
library(formatR)
library(scales)
library(sp)
library(sf)
library(gridExtra)
library(latticeExtra)
library(cowplot)
To perform this analysis, two datasets were retrieved from the Scottish Health and Social Care Open Data platform, and one from the Spatial Data Metadata Portal, Scotland’s catalogue of spatial data:
Births by hospital: 8266 observations of 5 variables, giving the number of live births and stillbirths by hospital of birth, sourced from the Scottish Morbidity Record 02 (SMR02).
Hospitals in Scotland: 277 observations of 16 variables, listing all NHS hospitals across Scotland.
Geographical spatial data for the Scottish Health Boards: an ESRI shapefile defining the boundaries of the NHS Health Boards in Scotland.
Reading the data and cleaning variable names:
# Read in the .csv files and clean the column names with clean_names()
births <- read_csv("https://www.opendata.nhs.scot/dataset/df10dbd4-81b3-4bfa-83ac-b14a5ec62296/resource/d534ae02-7890-4fbc-8cc7-f223d53fb11b/download/10.3_birthsbyhospital.csv") %>%
  clean_names() %>%
  # keep only the first part of the financial year as "year"
  separate(financial_year, into = c("year", NA), sep = "/")
hospitals <- read_csv("https://www.opendata.nhs.scot/dataset/cbd1802e-0e04-4282-88eb-d7bdcfb120f0/resource/c698f450-eeed-41a0-88f7-c1e40a568acc/download/current-hospital_flagged20211216.csv") %>%
  clean_names()
# Read in the .shp file
# You need to download all the shapefile's files to your computer and change the path below
path <- "D:/SpatiaDataFiles/SG_NHS_HealthBoards_2019.shp"
hb_spatial <- st_read(path)
## Reading layer `SG_NHS_HealthBoards_2019' from data source
## `D:\SpatiaDataFiles\SG_NHS_HealthBoards_2019.shp' using driver `ESRI Shapefile'
## Simple feature collection with 14 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 5512.998 ymin: 530250.8 xmax: 470332 ymax: 1220302
## CRS: NA
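The output above reports `CRS: NA`, meaning the layer was read without a coordinate reference system. The following is a minimal sketch of how a CRS could be attached with `sf`, using a toy polygon rather than the real file. It assumes the coordinates are British National Grid (EPSG:27700), which the bounding box values suggest but which should be checked against the portal's metadata before relying on it.

```r
library(sf)

# Toy square created without a CRS, mimicking the "CRS: NA" situation
square <- st_polygon(list(rbind(
  c(0, 0), c(1000, 0), c(1000, 1000), c(0, 1000), c(0, 0)
)))
toy_sf <- st_sf(HBCode = "toy", geometry = st_sfc(square))

is.na(st_crs(toy_sf))  # TRUE: no CRS attached yet

# Assumption: the coordinates are British National Grid (EPSG:27700)
toy_sf <- st_set_crs(toy_sf, 27700)
st_crs(toy_sf)
```

`st_set_crs()` only labels the coordinates; it does not reproject them, so it is appropriate only when the true CRS of the data is known.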
We can see that the ‘outcome’ variable can take the values ‘Live’, ‘Still’ or ‘Unknown’. We are going to represent live births.
head(births)
## # A tibble: 6 × 5
## year ca hospital outcome smr02births
## <chr> <chr> <chr> <chr> <dbl>
## 1 1997 RA2704 A103H Live 11
## 2 1997 RA2704 A103H Still 1
## 3 1997 RA2704 B120H Live 39
## 4 1997 RA2704 C206H Live 1
## 5 1997 RA2704 C313H Live 2
## 6 1997 RA2704 C418H Live 2
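Note that `year` is shown as `<chr>` above: `separate()` keeps the pieces as text. Here is a small sketch of what that step does, on toy data. The `"1997/98"` format of the financial year is an assumption inferred from the `sep = "/"` argument and the resulting `year` values.

```r
library(tibble)
library(tidyr)

# Toy data; the "YYYY/YY" format is assumed, not taken from the real file
toy <- tibble(
  financial_year = c("1997/98", "2021/22"),
  smr02births    = c(11, 5)
)

# into = c("year", NA) keeps the first piece and discards the second
toy_year <- separate(toy, financial_year, into = c("year", NA), sep = "/")

# year is still character; convert it if a numeric axis or arithmetic is needed
toy_year$year_num <- as.integer(toy_year$year)
print(toy_year)
```

Comparisons such as `year == 2022` still work on the character column because R coerces the number to `"2022"` before comparing.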
table(births$outcome)
##
## Live Still Unknown
## 7440 1329 65
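Keep in mind that `table()` counts rows (one per hospital, year and outcome combination), not births. A minimal sketch of the difference, using toy values rather than the real dataset:

```r
library(dplyr)
library(tibble)

# Toy rows in the same shape as the births data
toy_births <- tibble(
  hospital    = c("A", "A", "B"),
  outcome     = c("Live", "Still", "Live"),
  smr02births = c(11, 1, 39)
)

# Number of records per outcome
rows_by_outcome <- table(toy_births$outcome)

# Number of births per outcome: sum the smr02births column instead
births_by_outcome <- toy_births %>%
  group_by(outcome) %>%
  summarise(births = sum(smr02births))

print(rows_by_outcome)
print(births_by_outcome)
```

This is why the totals further down are computed with `sum(smr02births)` rather than by counting rows.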
births %>%
filter(year==2022 & outcome=="Live") %>%
summarize(new_babies_2022 = sum(smr02births))
## # A tibble: 1 × 1
## new_babies_2022
## <dbl>
## 1 45061
#number of babies born at home
births %>%
filter(year==2022 & outcome=="Live" & hospital=="D201N") %>%
summarize("Babies born at home in 2022"= sum(smr02births))
## # A tibble: 1 × 1
## `Babies born at home in 2022`
## <dbl>
## 1 209
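Taking the two totals printed above at face value, the share of recorded live births that took place at home in 2022 can be worked out directly. This is only a quick arithmetic sketch, and it inherits any under-recording of home births in the source data.

```r
# Totals copied from the outputs above
live_2022 <- 45061
home_2022 <- 209

# Home births as a percentage of all recorded live births
home_share_pct <- 100 * home_2022 / live_2022
round(home_share_pct, 2)  # 0.46
```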
baby_year <- births %>% filter (outcome=="Live") %>%
group_by(year) %>%
summarise(number_of_babies = sum(smr02births))
kable(baby_year)
year | number_of_babies |
---|---|
1997 | 58282 |
1998 | 56471 |
1999 | 54073 |
2000 | 52498 |
2001 | 50799 |
2002 | 50977 |
2003 | 52585 |
2004 | 53366 |
2005 | 52971 |
2006 | 54982 |
2007 | 57983 |
2008 | 58525 |
2009 | 58066 |
2010 | 57696 |
2011 | 57952 |
2012 | 56406 |
2013 | 55274 |
2014 | 55365 |
2015 | 54571 |
2016 | 53644 |
2017 | 51938 |
2018 | 50556 |
2019 | 48642 |
2020 | 46158 |
2021 | 47518 |
2022 | 45061 |
ggplot(baby_year, aes(year, number_of_babies)) +
  geom_col(fill = "#0097a7", alpha = 0.3) +
  geom_text(aes(label = number_of_babies), vjust = -0.3, size = 2.8, color = "#005B70") +
  labs(
    title = "Number of live births in Scottish hospitals",
    subtitle = "(by financial year)",
    caption = "Data from: Public Health Scotland") +
  ylab("number of births")
still_year <- births %>%
filter (outcome=="Still") %>%
group_by(year) %>%
summarise(still_births = sum(smr02births))
kable(still_year)
year | still_births |
---|---|
1997 | 307 |
1998 | 318 |
1999 | 240 |
2000 | 286 |
2001 | 254 |
2002 | 245 |
2003 | 322 |
2004 | 256 |
2005 | 269 |
2006 | 295 |
2007 | 293 |
2008 | 298 |
2009 | 297 |
2010 | 277 |
2011 | 260 |
2012 | 231 |
2013 | 235 |
2014 | 200 |
2015 | 199 |
2016 | 216 |
2017 | 214 |
2018 | 173 |
2019 | 155 |
2020 | 190 |
2021 | 175 |
2022 | 157 |
ggplot(still_year, aes(year, still_births)) + geom_col(fill="brown", alpha=0.4) +
ylim(0, 1000)
Live births at home (this dataset may not record every home birth):
home <- births %>%
  # D201N is the code for home births; keep live births only
  filter(hospital == "D201N", outcome == "Live") %>%
  group_by(year) %>%
  summarize(home_delivered = sum(smr02births))
ggplot(home, aes(year, home_delivered)) +
  geom_col(fill = "#0097a7", alpha = 0.3) +
  ylab("Number of babies") +
  geom_text(aes(label = home_delivered), vjust = -0.1, size = 3, color = "#0097a7") +
  labs(
    title = "Trends in home delivery births in Scotland",
    subtitle = "(by financial year)",
    caption = "Data from: Public Health Scotland")
# Subsetting live births in 2022, grouped by hospital
newborns22 <- births %>%
  # D201N is the code for home births (52 births in 2021); they are excluded here
  filter(year == 2022 & outcome == "Live" & hospital != "D201N") %>%
  group_by(hospital) %>%
  summarize(babies_2022 = sum(smr02births)) %>%
  arrange(desc(babies_2022))
head(hospitals)
## # A tibble: 6 × 15
## hospital_code hospital_name address_line1 address_line2 address_line2qf
## <chr> <chr> <chr> <chr> <chr>
## 1 A101H Arran War Memorial … Lamlash Isle of Arran <NA>
## 2 A103H Ayrshire Central Ho… Kilwinning R… Irvine <NA>
## 3 A105H Kirklandside Hospit… Kirklandside Kilmarnock <NA>
## 4 A110H Lady Margaret Hospi… College St Millport <NA>
## 5 A111H University Hospital… Kilmarnock R… Kilmarnock <NA>
## 6 A112H Brooksby Day Hospit… 18 Greenock … Largs <NA>
## # ℹ 10 more variables: address_line3 <chr>, address_line3qf <chr>,
## # address_line4 <chr>, address_line4qf <chr>, postcode <chr>,
## # health_board <chr>, hscp <chr>, council_area <chr>,
## # intermediate_zone <chr>, data_zone <chr>
Listing the column names in the hospitals dataset:
names(hospitals)
## [1] "hospital_code" "hospital_name" "address_line1"
## [4] "address_line2" "address_line2qf" "address_line3"
## [7] "address_line3qf" "address_line4" "address_line4qf"
## [10] "postcode" "health_board" "hscp"
## [13] "council_area" "intermediate_zone" "data_zone"
Joining births dataset with hospital dataset:
births_2022 <- newborns22 %>%
left_join(hospitals, by=c("hospital" = "hospital_code")) %>%
select(hospital, hospital_name, health_board, babies_2022)
kable(births_2022,
caption = "Live births in Scottish hospitals in 2022") %>%
kable_styling(latex_options = "striped", font_size = 12)
hospital | hospital_name | health_board | babies_2022 |
---|---|---|---|
S314H | Royal Infirmary of Edinburgh at Little France | S08000024 | 5534 |
G405H | Queen Elizabeth University Hospital | S08000031 | 5154 |
G108H | The Princess Royal Maternity Unit | S08000031 | 4586 |
N161H | Aberdeen Maternity Hospital | S08000020 | 4509 |
L308H | University Hospital Wishaw | S08000032 | 4042 |
C418H | Royal Alexandra Hospital | S08000031 | 3201 |
T101H | Ninewells Hospital | S08000030 | 3184 |
V217H | Forth Valley Royal Hospital | S08000019 | 2774 |
A111H | University Hospital Crosshouse | S08000015 | 2690 |
F705H | Victoria Maternity Unit | S08000029 | 2519 |
S308H | St John’s Hospital | S08000024 | 2351 |
H202H | Raigmore Hospital | S08000022 | 1818 |
Y146H | Dumfries & Galloway Royal Infirmary | S08000017 | 1075 |
B120H | Borders General Hospital | S08000016 | 663 |
W107H | Western Isles Hospital | S08000028 | 119 |
T304H | Arbroath Infirmary | S08000030 | 106 |
N411H | Dr Gray’s Hospital | S08000020 | 96 |
Z102H | Gilbert Bain Hospital | S08000026 | 95 |
T202H | Perth Royal Infirmary | S08000030 | 92 |
N333H | Peterhead Community Hospital | S08000020 | 75 |
N331H | Inverurie Hospital | S08000020 | 67 |
R103H | The Balfour | S08000025 | 50 |
C121H | Lorn & Islands Hospital | S08000022 | 11 |
H212H | Belford Hospital | S08000022 | 10 |
C106H | Cowal Community Hospital | S08000022 | 9 |
H103H | Caithness General Hospital | S08000022 | 9 |
C313H | Inverclyde Royal Hospital | S08000031 | 5 |
C206H | Vale of Leven General Hospital | S08000031 | 4 |
H224H | Mid-Argyll Community Hospital and Integrated Care Centre | S08000022 | 3 |
W108H | Uist & Barra Hospital | S08000028 | 1 |
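Because `left_join()` keeps every row of the left table, a hospital code missing from the hospitals listing would pass through silently with `NA` for its name and health board. A hedged sketch of one way to check for that with `anti_join()`, on toy data (the code `X999H` is made up for illustration):

```r
library(dplyr)
library(tibble)

# Toy versions of the two tables; "X999H" is a hypothetical unmatched code
toy_newborns <- tibble(
  hospital    = c("A111H", "X999H"),
  babies_2022 = c(10, 2)
)
toy_hospitals <- tibble(
  hospital_code = "A111H",
  hospital_name = "University Hospital Crosshouse",
  health_board  = "S08000015"
)

# Rows of the left table with no match in the right table
unmatched <- toy_newborns %>%
  anti_join(toy_hospitals, by = c("hospital" = "hospital_code"))

print(unmatched)
```

An empty result from the same check on the real tables would indicate that every birth record was assigned to a health board.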
#calculate births for each health board
births_hb<- births_2022 %>%
group_by(health_board) %>%
summarise(Newborns = sum(babies_2022)) %>%
arrange(desc(Newborns))
kable(births_hb,
caption = "Live births by Health Boards in 2022") %>%
kable_styling(latex_options = "striped", font_size = 12)
health_board | Newborns |
---|---|
S08000031 | 12950 |
S08000024 | 7885 |
S08000020 | 4747 |
S08000032 | 4042 |
S08000030 | 3382 |
S08000019 | 2774 |
S08000015 | 2690 |
S08000029 | 2519 |
S08000022 | 1860 |
S08000017 | 1075 |
S08000016 | 663 |
S08000028 | 120 |
S08000026 | 95 |
S08000025 | 50 |
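As a quick consistency check, the health board totals in the table above can be added up and compared with the national figure. The sketch below simply retypes the numbers printed in this post; it does not rerun the pipeline.

```r
# Health board totals copied from the table above
hb_totals <- c(12950, 7885, 4747, 4042, 3382, 2774, 2690,
               2519, 1860, 1075, 663, 120, 95, 50)

hospital_total <- sum(hb_totals)  # 44852 live births in hospitals
home_2022      <- 209             # home births, excluded from the table
national_2022  <- 45061           # all live births in 2022

# Hospital births plus home births should match the national total
hospital_total + home_2022 == national_2022  # TRUE
```

The figures reconcile exactly, which also suggests that no hospital births were lost to an unmatched health board in the join.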
Joining our births & hospital data with the spatial data for the NHS Health boards boundaries:
# Join the spatial data with the births-per-health-board totals
births_spatial <- hb_spatial %>%
left_join(births_hb, by = c("HBCode" = "health_board"))
baby2022_map <- ggplot(births_spatial, aes(fill = Newborns)) +
  geom_sf(size = 0.1, color = "#0097a7") +
  scale_fill_viridis_c(option = "mako", direction = -1) +
  labs(
    title = "Live births in Scotland 2022",
    subtitle = "by Health Boards",
    caption = "Data from: Public Health Scotland & Scottish Government spatial data") +
  coord_sf() +
  theme_void()
baby2022_map
See more data fun and drawings on the author's website: www.inmaruiz.com
R: R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
janitor: Firke S (2021). janitor: Simple Tools for Examining and Cleaning Dirty Data. R package version 2.1.0, https://CRAN.R-project.org/package=janitor.
tidyverse: Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686.
knitr: Xie Y (2022). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.40.
kableExtra: Zhu H (2021). kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. R package version 1.3.4, https://CRAN.R-project.org/package=kableExtra.
ggplot2: Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
formatR: Xie Y (2023). formatR: Format R Code Automatically. R package version 1.14, https://CRAN.R-project.org/package=formatR.
lubridate: Grolemund G, Wickham H (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. URL https://www.jstatsoft.org/v40/i03/.
rgdal: Bivand R, Keitt T, Rowlingson B (2023). rgdal: Bindings for the ‘Geospatial’ Data Abstraction Library. R package version 1.6-4, https://CRAN.R-project.org/package=rgdal.
sp: Pebesma EJ, Bivand RS (2005). Classes and methods for spatial data in R. R News 5 (2), https://cran.r-project.org/doc/Rnews/. Bivand RS, Pebesma E, Gomez-Rubio V (2013). Applied spatial data analysis with R, Second edition. Springer, NY. https://asdar-book.org/
sf: Pebesma E (2018). Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal 10 (1), 439-446, https://doi.org/10.32614/RJ-2018-00.
gridExtra: Auguie B (2017). gridExtra: Miscellaneous Functions for “Grid” Graphics. R package version 2.3, https://CRAN.R-project.org/package=gridExtra.
latticeExtra: Sarkar D, Andrews F (2022). latticeExtra: Extra Graphical Utilities Based on Lattice. R package version 0.6-30, https://CRAN.R-project.org/package=latticeExtra.
cowplot: Wilke C (2020). cowplot: Streamlined Plot Theme and Plot Annotations for ‘ggplot2’. R package version 1.1.1, https://CRAN.R-project.org/package=cowplot.
Spatial Data Metadata Portal, Scotland’s catalogue of spatial data.