Exploring the K-means clustering algorithm with Python (real data)

Analyzing historical weather data with Python: a machine learning clustering exercise with Scikit-Learn, plus a cool map visualization of the clusters with Folium.

There is a 100% reproducible Jupyter notebook workflow available on my GitHub site, where you can:

  • Follow the cleaning process of the data.
  • Learn how to decide the optimal number of clusters using the Elbow plot.
  • Train the K-Means algorithm using the Scikit-Learn package.
  • Learn how to plot the obtained clusters on a map with the help of the Folium package.
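
As a rough illustration of the idea behind the Elbow plot, here is a minimal sketch of K-means written in plain Python, so it runs without any packages. It is not the notebook's code: the notebook uses Scikit-Learn, and the toy "weather" points below (temperature, rainfall) are invented for the example. For each candidate number of clusters k it computes the inertia, that is, the sum of squared distances from each point to its nearest centroid.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, k, n_iter=100, seed=0):
    """Basic K-means (Lloyd's algorithm). Returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the centroid to the mean of its members.
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep the old centroid if no point was assigned to it.
                updated.append(centroids[j])
        if updated == centroids:  # nothing moved: converged
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    inertia = sum(squared_distance(p, centroids[lab])
                  for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Invented toy data: three groups of (mean temperature, rainfall) pairs.
rng = random.Random(42)
centres = [(5.0, 120.0), (12.0, 60.0), (20.0, 20.0)]
points = [(rng.gauss(cx, 1.0), rng.gauss(cy, 5.0))
          for cx, cy in centres for _ in range(30)]

# Inertia for each candidate k; plotting these values gives the Elbow plot.
inertias = {k: kmeans(points, k)[2] for k in range(1, 7)}
for k, value in inertias.items():
    print(f"k={k}: inertia={value:.1f}")
```

Plotting k against the inertia and looking for the point where the curve stops dropping sharply is the Elbow method. In Scikit-Learn, the same quantity is available as the `inertia_` attribute of a fitted `KMeans` model. With real data, the features would usually be scaled first so that no single variable dominates the distances.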

Clusters of similar weather.

health data science

Data Analysis of Unintentional Injuries in Scotland – data from 2011 to 2020 – (R code)

  • Having a fall was the most common reason for hospital admission for all age groups and sexes.
  • Among people under 75, there were more hospital admissions for males than for females, with the largest difference in the 15-44 age group. This may be because males are more prone to engage in risky activities and behaviors than their female counterparts.
  • In the 75+ age group, there were more admissions of females than of males, probably because females outnumber males in the total population within this age bracket.
  • Males aged 15 to 44 have the highest total number of admissions for poisoning, accidental exposure, other injuries, traffic accidents and "struck by" injuries.
  • Among all the unintentional injury causes recorded at admission, poisoning has the highest death rate (0.283), followed by transport accidents (0.059) and falls (0.024).

Births in Scotland 2021 - Spatial data with R

Live births in Scotland by health board, visualized using the R programming language. Find the code in a reproducible R Markdown file on my GitHub page.

Are you curious about the most prescribed drugs in Scotland?

If so, you’re in luck! I’ve written an R Markdown document that compares code in R and Python for the most common and basic functions in exploratory data analysis.

The dataset I used is a real-world dataset from Public Health Scotland that contains information on community pharmacy activity. After running the code, you will find the top 40 prescribed items in Scotland in February 2023.
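
To give a flavor of the "top N items" step, here is a minimal plain-Python sketch. It is not the code from the repository: the column names (`item`, `paid_quantity`) and the rows are placeholders invented for the example, and the real Public Health Scotland file has its own column names.

```python
import csv
import io
from collections import Counter

# Toy stand-in for the real file; column names and rows are hypothetical.
TOY_CSV = """item,paid_quantity
Item A,120
Item B,45
Item A,30
Item C,80
Item B,10
"""


def top_items(csv_file, n):
    """Sum the quantity per item and return the n largest as (item, total) pairs."""
    totals = Counter()
    for row in csv.DictReader(csv_file):
        totals[row["item"]] += int(row["paid_quantity"])
    return totals.most_common(n)


top2 = top_items(io.StringIO(TOY_CSV), n=2)
print(top2)  # [('Item A', 150), ('Item C', 80)]
```

With the real dataset, the same function could be called with an open file handle and `n=40`, after adjusting the two column names.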

So, whether you’re a Python user who needs to use R, or an R user who wants to learn Python, I encourage you to check out this exercise. It’s a great way to learn how to use both languages for data analysis. The code is 100% reproducible, so you can run it yourself to see how it works.

Here’s the link: https://github.com/InmaculadaRM/Top40Drugs

If you run the code, were those items what you were expecting? (I was surprised to see a group of drugs that weren’t as fashionable when I used to work in the pharmacy. Can you guess which group?)

Machine Learning classification with Python

Drawings and storytelling in Data Science can add interest, explainability and engagement to subjects and concepts that are otherwise hard to follow or understand. Here is a summary drawn from the results of my Machine Learning course assignment at the University of Stirling.

Visualizing common heart diseases in Scotland by age group for 2021. The datasets used for these plots are available at the Scottish Health and Social Care Open Data platform.

Data wrangling and machine learning with R in Spark – Drawings for my project in the Big Data Analytics course at the University of Edinburgh, with an R Markdown document containing the code and a presentation for a broader audience.


some more drawings with the data science theme

When I was a kid, in my home city, all chicken eggs were white, and quite often you were lucky enough to find two yolks in one egg. Now most eggs are brown, and it is very rare to find a double-yolk egg.
I used invented probabilities of white and brown eggs having two yolks in order to explain conditional probability … mainly for myself, as I keep forgetting the concept.
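
The egg example can also be worked through in a few lines of Python. The numbers below are invented for this sketch and are not necessarily the ones used in the drawing. It first applies the law of total probability to get the overall chance of two yolks, then Bayes' rule to answer the reverse question: given a double-yolk egg, how likely is it to be white?

```python
from fractions import Fraction

# Invented probabilities, for illustration only.
p_white = Fraction(3, 10)                # P(white)
p_brown = 1 - p_white                    # P(brown)
p_double_given_white = Fraction(1, 10)   # P(two yolks | white)
p_double_given_brown = Fraction(1, 100)  # P(two yolks | brown)

# Law of total probability: P(two yolks) over both egg colors.
p_double = p_double_given_white * p_white + p_double_given_brown * p_brown

# Bayes' rule: P(white | two yolks).
p_white_given_double = p_double_given_white * p_white / p_double

print(p_double)              # 37/1000
print(p_white_given_double)  # 30/37
```

So with these made-up numbers, only 30% of eggs are white, yet about 81% of double-yolk eggs are white, because white eggs are so much more likely to have two yolks.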

I’m taking my first steps in learning HTML, CSS, SVG, D3.js and React, with the aim of building interactive data visualizations. This is what I am capable of so far: My first in viz. (Not too much, I know … but it’s just the starting point.)