Visualizing Obesity Prevalence Across US States with R

Idrissa Tankari
Apr 1, 2024


Download The Dataset

Setting Up the Environment

Retrieving Geographic Data:

We begin by loading the packages the analysis needs. The map_data() function from ggplot2 (which reads from the maps package) returns the outlines of the contiguous US states as a data frame of polygon vertices, which serves as the base layer for the obesity prevalence map.

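library(ggplot2)  # provides map_data(); needs the maps package installed
library(maps)
library(usdata)
library(tools)    # provides toTitleCase()
library(dplyr)    # provides rename()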

# Retrieve state-level geographic data
us.states <- map_data("state")
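A quick look at what map_data() returns can make the later steps easier to follow. The sketch below assumes the ggplot2 and maps packages are installed; the object name states.check is only illustrative.

```r
library(ggplot2)  # map_data() is exported by ggplot2 and reads from the maps package

states.check <- map_data("state")

# One row per polygon vertex: long/lat coordinates, a polygon id (group),
# the drawing order of the vertices (order), and lower-case state names (region)
str(states.check)
head(unique(states.check$region))
```

The group and order columns matter later: group identifies each individual polygon, and order records the sequence in which its vertices should be connected.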

Refining State Names:

Consistency is key when joining datasets. map_data() returns state names in lower case, so we apply toTitleCase() from the tools package to standardize them, then rename the region column to State so it matches the key in our obesity prevalence dataset. This prepares the data for the merge that follows.

us.states$region <- toTitleCase(us.states$region)
us.states <- rename(us.states, State = region)
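Before merging, it may be worth confirming that the standardized names really line up with the names in the prevalence file. The following is a minimal sketch with made-up name vectors (map.names and csv.names are illustrative, not taken from the actual data):

```r
library(tools)  # ships with R; provides toTitleCase()

# Lower-case names in the style returned by map_data("state")
map.names <- c("new york", "north carolina", "texas")
# Hypothetical names as they might appear in the obesity CSV
csv.names <- c("New York", "North Carolina", "Texas", "Guam")

std.names <- toTitleCase(map.names)

# Names on one side with no partner on the other would drop out of an inner merge
unmatched.in.map <- setdiff(std.names, csv.names)
unmatched.in.csv <- setdiff(csv.names, std.names)
```

An empty unmatched.in.map suggests every state on the map will find a prevalence value; anything left in unmatched.in.csv is a row that will not be drawn.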

Reading and Filtering Obesity Prevalence Data:

We import obesity prevalence statistics for 2022 from a CSV file. Because the base map covers only the contiguous United States, we filter out Alaska, Hawaii, and the territories (Puerto Rico, the Virgin Islands, and Guam), so the analysis covers the contiguous United States rather than all 50 states.

obesity.data <- read.csv("obesity_prevalence_2022.csv")
obesity.data.2 <- subset(obesity.data, State != "Alaska" & State != "Hawaii" & State != "Puerto Rico" &
State != "Virgin Islands" & State != "Guam")
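The same filter can be written more compactly with %in%, which also makes the exclusion list easier to maintain. This is a sketch on a toy data frame; the prevalence values are made up for illustration.

```r
# Toy stand-in for the obesity CSV (values are invented)
obesity.toy <- data.frame(
  State = c("Alabama", "Alaska", "Guam", "Texas"),
  Prevalence = c(38.3, 32.1, 33.6, 35.5)
)

# Places that do not appear on the contiguous-states map
excluded <- c("Alaska", "Hawaii", "Puerto Rico", "Virgin Islands", "Guam")

# Keep every row whose State is not in the exclusion list
obesity.toy.2 <- subset(obesity.toy, !(State %in% excluded))
```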

Merging and Categorizing Data:

With the geographic and prevalence datasets prepared, we merge them on the State column so that every polygon vertex carries its state's prevalence value. We then bin prevalence into 5-point ranges, which are easier to read on a map than a continuous scale.

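merged.data.2b <- merge(us.states, obesity.data.2, by = "State")
# merge() does not guarantee row order; restore the polygon drawing order
merged.data.2b <- merged.data.2b[order(merged.data.2b$order), ]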
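Two details of merge() are easy to miss: by default it drops rows without a match, and it may return rows in a different order than the input. The sketch below uses toy tables (shapes and values are illustrative names with made-up numbers) to show one way of handling both.

```r
# Toy stand-ins for the map and prevalence tables (values are made up)
shapes <- data.frame(
  State = c("B", "B", "A", "A", "C", "C"),
  long  = c(0, 1, 2, 3, 4, 5),
  lat   = c(0, 1, 0, 1, 0, 1),
  order = 1:6
)
values <- data.frame(State = c("A", "B"), Prevalence = c(30.1, 35.2))

# all.x = TRUE keeps map rows that have no match; their Prevalence becomes NA
merged <- merge(shapes, values, by = "State", all.x = TRUE)

# merge() may reorder rows, so restore the original vertex order
merged <- merged[order(merged$order), ]

# States present on the map but missing from the data
missing.states <- unique(merged$State[is.na(merged$Prevalence)])
```

Keeping unmatched states with all.x = TRUE means they can still be drawn and labeled as having insufficient data instead of leaving a hole in the map.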

# Categorize obesity prevalence into ranges
merged.data.2b$cat <- "n/a"
merged.data.2b$cat[merged.data.2b$Prevalence >= 20 & merged.data.2b$Prevalence < 25] <- "20%-<25%"
merged.data.2b$cat[merged.data.2b$Prevalence >= 25 & merged.data.2b$Prevalence < 30] <- "25%-<30%"
merged.data.2b$cat[merged.data.2b$Prevalence >= 30 & merged.data.2b$Prevalence < 35] <- "30%-<35%"
merged.data.2b$cat[merged.data.2b$Prevalence >= 35 & merged.data.2b$Prevalence < 40] <- "35%-<40%"
merged.data.2b$cat[merged.data.2b$Prevalence >= 40 & merged.data.2b$Prevalence < 45] <- "40%-<45%"
merged.data.2b$cat[merged.data.2b$Prevalence >= 45 & merged.data.2b$Prevalence < 50] <- "45%-<50%"
merged.data.2b$cat[merged.data.2b$Prevalence >= 50] <- ">=50%"  # >= so that exactly 50 is not left as "n/a"
merged.data.2b$cat[is.na(merged.data.2b$Prevalence)] <- "Insufficient data"
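The same binning can be expressed with base R's cut(), which avoids repeating the comparison for every range. This is a sketch on a made-up vector; the variable names and the ">=50%" label are illustrative choices.

```r
# Made-up prevalence values, including boundary cases and a missing value
prevalence <- c(24.3, 29.9, 30, 41, 50, NA)

breaks <- c(20, 25, 30, 35, 40, 45, 50, Inf)
labels <- c("20%-<25%", "25%-<30%", "30%-<35%", "35%-<40%",
            "40%-<45%", "45%-<50%", ">=50%")

# right = FALSE makes each interval closed on the left: [20, 25), [25, 30), ...
category <- as.character(cut(prevalence, breaks = breaks, labels = labels, right = FALSE))

# Values below 20 fall outside the breaks; missing values get their own label
category[is.na(category) & !is.na(prevalence)] <- "n/a"
category[is.na(prevalence)] <- "Insufficient data"
```

With right = FALSE, a value of exactly 30 lands in "30%-<35%" and a value of exactly 50 lands in the top range, matching the lower-bound-inclusive ranges used in the labels.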

Visualization with ggplot2

Visualizing Obesity Prevalence

With the dataset prepared, we use ggplot2 to build a choropleth map showing how obesity prevalence varied across US states in 2022:

Constructing the Choropleth Map:

Using ggplot() and geom_polygon(), we draw each state polygon and fill it with the color of its obesity prevalence category. theme_void() removes axes and gridlines, and coord_map() applies an Albers projection, so differences between states are easy to see at a glance.

library(ggplot2)

# Create ggplot for visualization
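# group = group keeps each polygon separate (some states consist of several pieces)
e1 <- ggplot(merged.data.2b, aes(x = long, y = lat, group = group, fill = cat)) +
  geom_polygon(color = "black", linewidth = 0.1) +
  theme_void() +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +  # requires the mapproj package
  labs(fill = "")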
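Some states in the maps data consist of more than one polygon, so the choice of grouping variable matters. The toy example below (a hypothetical "state" made of two squares, assuming ggplot2 is installed) sketches why grouping by the polygon id is the safer choice.

```r
library(ggplot2)

# A hypothetical "state" made of two separate squares (think mainland plus island)
toy <- data.frame(
  State = "Toyland",
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c(1, 2), each = 4)
)

# Grouping by State would join all eight vertices into one shape;
# grouping by the polygon id draws two clean squares
p.pieces <- ggplot(toy, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "tan1", color = "black") +
  coord_fixed() +
  theme_void()
```

Printing p.pieces shows two separate squares; swapping in group = State connects them with a stray edge, which is the kind of artifact that can appear on a real state map.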

Customizing Aesthetics:

To improve readability, we replace the default palette with a custom set of fill colors running from green for the lowest range to dark red for the highest, with white as the last color, and we add a descriptive title.

# Unnamed colors are matched to the categories present in sorted order
fill_colors <- c("darkolivegreen3", "yellow", "tan1", "orangered", "brown3", "white")

# Customize fill colors and add title
e2 <- e1 + scale_fill_manual(values = fill_colors)
e3 <- e2 + labs(title = "Prevalence of Self-Reported Obesity Among US Adults by State, 2022")

# Display the finished map
e3
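An unnamed color vector depends on which categories happen to be present in the data. A named vector ties each label to a color explicitly. The pairing below is an assumption about the intended palette, shown on a toy data frame so the sketch runs on its own.

```r
library(ggplot2)

# Named vector: each category label is tied to a color explicitly
# (this label-to-color pairing is an assumption, not taken from the article)
fill.colors.named <- c(
  "20%-<25%"          = "darkolivegreen3",
  "25%-<30%"          = "yellow",
  "30%-<35%"          = "tan1",
  "35%-<40%"          = "orangered",
  "40%-<45%"          = "brown3",
  "Insufficient data" = "white"
)

# Toy data with only two of the categories present
toy <- data.frame(x = c(1, 2), y = c(1, 1),
                  cat = c("30%-<35%", "Insufficient data"))

p.named <- ggplot(toy, aes(x = x, y = y, fill = cat)) +
  geom_tile(color = "black") +
  scale_fill_manual(values = fill.colors.named) +
  labs(fill = "")
```

With names in place, "30%-<35%" is drawn in tan1 whether or not the lower ranges appear in the data, so the legend stays consistent from year to year.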

Conclusion

Careful data preparation and a few deliberate design choices turn a raw table of state-level figures into a map that makes regional differences in obesity prevalence easy to see, giving stakeholders a clearer basis for planning interventions.
