Lab 2: Spatial Analysis and Visualization

Healthcare Access and Equity in Pennsylvania

Author

Dalanda Jalloh

Published

February 26, 2026

Assignment Overview

Learning Objectives: - Apply spatial operations to answer policy-relevant research questions - Integrate census demographic data with spatial analysis - Create publication-quality visualizations and maps - Work with spatial data from multiple sources - Communicate findings effectively for policy audiences

Part 1: Healthcare Access for Vulnerable Populations

Research Question

Which Pennsylvania counties have the highest proportion of vulnerable populations (elderly + low-income) living far from hospitals?

Your analysis should identify counties that should be priorities for healthcare investment and policy intervention.

Required Analysis Steps

Complete the following analysis, documenting each step with code and brief explanations:

Step 1: Data Collection (5 points)

Load the required spatial data: - Pennsylvania county boundaries - Pennsylvania hospitals (from lecture data) - Pennsylvania census tracts

Your Task:

Questions to answer: - How many hospitals are in your dataset? #222 - How many census tracts? #3445 - What coordinate reference system is each dataset in? #WGS84 (hospitals) #NAD83 (census_tracts)

Step 2: Get Demographic Data

Use tidycensus to download tract-level demographic data for Pennsylvania.

Required variables: - Total population - Median household income - Population 65 years and over (you may need to sum multiple age categories)

Your Task:

vars_65_plus <- c(
  "B01001_020","B01001_021","B01001_022",
  "B01001_023","B01001_024","B01001_025",
  "B01001_044","B01001_045","B01001_046",
  "B01001_047","B01001_048","B01001_049") 

dem_tract_data <- get_acs(
  geography = "tract",
  variables = c(
    total_pop = "B01003_001",
    median_income = "B19013_001"
  ),
  state = "PA",
  year = 2023,
  output = "wide"  # Makes analysis easier
) %>% 
  select(-total_popM)

dem_tract_age <- get_acs(
  geography = "tract",
  variables = vars_65_plus,
  state = "PA",
  year = 2023,
) %>% 
group_by(GEOID, NAME) %>% 
  summarise(pop_65up= sum(estimate)) %>% 
  ungroup()

dem_data_total<- left_join(dem_tract_age, dem_tract_data) 

# Join to tract boundaries

census_tracts <- tracts(state = "PA", cb = TRUE)

totalpt2 <- census_tracts %>%
  inner_join(dem_data_total, by = "GEOID") %>% 
  select(-NAME.x)

missing_income<- totalpt2 %>% 
  filter(is.na(median_incomeE)) %>% 
  tally()

median_income_tracts<- totalpt2 %>% 
  summarise(median_income = median(median_incomeE, na.rm = TRUE))

Questions to answer: - What year of ACS data are you using? #2023 - How many tracts have missing income data? #65 - What is the median income across all PA census tracts? #72943 —

Step 3: Define Vulnerable Populations

Identify census tracts with vulnerable populations based on TWO criteria: 1. Low median household income (choose an appropriate threshold) 2. Significant elderly population (choose an appropriate threshold)

Your Task:

# Filter for vulnerable tracts based on your criteria
hist(dem_data_total$median_incomeE)

quantile(dem_data_total$median_incomeE, na.rm = TRUE)

       0%       25%       50%       75%      100% 
 13307.00  57863.50  72943.50  96690.75 250001.00

hist(dem_data_total$pop_65up)

quantile(dem_data_total$pop_65up)

  0%  25%  50%  75% 100% 
   0  429  665  945 2541

#income_cutoff<- 57863.50 or less 
#elderly_cutoff<- 945 and up 

totalpt2<- totalpt2 %>% 
  mutate( 
    vuln_income=case_when(
      median_incomeE <= 57863.50 ~ "vuln_income"), 
    vuln_elderly=case_when(
       pop_65up>= 945 ~ "vuln_elderly")) 
 
  
totalvuln<- totalpt2 %>% 
  filter(vuln_income== vuln_income | vuln_elderly== vuln_elderly) %>% 
  tally() 

totalvulnper <- totalpt2 %>%
  summarise(
    vulnerable = sum(vuln_income == "vuln_income" |
                     vuln_elderly == "vuln_elderly", na.rm = TRUE),
    total_tracts = n(),
    percent_vulnerable = vulnerable / total_tracts * 100
  )

Questions to answer: - What income threshold did you choose and why? I choose a median income threshold of 57863.50 or less. I plotted the histogram of the values and choose the 25% quartile of values which would represent the lowest 25% of median income’s in Pennsylvania. - What elderly population threshold did you choose and why? I selected tracts with 945 or more elderly residents because the value represents the 75th percentile of the distribution, meaning those tracts contain the highest number of 65 and up year olds.

How many tracts meet your vulnerability criteria? 1622 tracts
What percentage of PA census tracts are considered vulnerable by your definition? 47% —

Step 4: Calculate Distance to Hospitals

For each vulnerable tract, calculate the distance to the nearest hospital.

Your Task:

Requirements: - Use an appropriate projected coordinate system for Pennsylvania - Calculate distances in miles - Explain why you chose your projection I choose 5070 because it seemed to be universal in the US while the PA state plane was divided by south and north.

Questions to answer:

# What is the average distance to the nearest hospital for vulnerable tracts?
avg_distance<- totalvuln_proj %>% 
  summarise(avg_dis = mean(dist_to_nearest_hosp_miles, na.rm = TRUE))
#3.87 miles 
          
#What is the maximum distance?
max_distance<- totalvuln_proj %>% 
  summarise(max_dis= max(dist_to_nearest_hosp_miles, na.rm = TRUE))
#25.93

#How many vulnerable tracts are more than 15 miles from the nearest hospital?
vul_tract_more_15 <- totalvuln_proj %>%
  filter(dist_to_nearest_hosp_miles >= 15) %>%
  tally()
#39

Step 5: Identify Underserved Areas

Define “underserved” as vulnerable tracts that are more than 15 miles from the nearest hospital.

Your Task:

# Create underserved variable
totalvuln_proj<- totalvuln_proj %>% 
  mutate(underserved= dist_to_nearest_hosp_miles >= 15)

Questions to answer: - How many tracts are underserved? #39 - What percentage of vulnerable tracts are underserved #2.4% - Does this surprise you? Why or why not? #The percentage of vulnerable tracts is relatively low (2.4%). I am surpised because usually we assume that vulnerable populations are not in proximity to a hospital. But the numbers show that that might not be the case. Access to care might be a question of cost, transportation and available services.

Step 6: Aggregate to County Level

Use spatial joins and aggregation to calculate county-level statistics about vulnerable populations and hospital access.

Your Task:

# Spatial join tracts to counties
# Aggregate statistics by county

totalvuln_projtest<- totalvuln_proj %>% 
  group_by(NAMELSADCO) %>% 
  summarise(
    vulnerable= sum(vuln_income == "vuln_income" | vuln_elderly == "vuln_elderly", na.rm = TRUE),
    underserved= sum(underserved, na.rm = TRUE),
    pct_underserved = underserved / 1622 * 100,
    avg_dist_hosp = mean(dist_to_nearest_hosp_miles, na.rm = TRUE),
    total_vulnerable_population = sum(total_popE, na.rm = TRUE)
  )

Required county-level statistics: - Number of vulnerable tracts - Number of underserved tracts
- Percentage of vulnerable tracts that are underserved - Average distance to nearest hospital for vulnerable tracts - Total vulnerable population

Questions to answer: - Which 5 counties have the highest percentage of underserved vulnerable tracts? ##Clearfield County #Bradford County #Monroe County #Chester County #Crawford County - Which counties have the most vulnerable people living far from hospitals? #Forest County #Clearfield County #Synder County #Perry County #Tioga County - Are there any patterns in where underserved counties are located? #The counties with the most underserved tracts based on the criteria I defined, have longer distances away from a hospital center

highest_perc <- totalvuln_projtest %>%
  arrange(desc(pct_underserved)) %>%
  slice(1:5)

#I want to get total population to comapre with the total vulnerable population 

pop_pa_counties <- get_acs(
  geography = "county",
  variables = "B01003_001",   
  state = "PA",               
  year = 2023                 
)

pop_pa_counties <- pop_pa_counties %>%
  mutate(NAME = str_remove(NAME, ", Pennsylvania"))

totalvuln_projtest<- totalvuln_projtest %>% 
  rename(NAME=NAMELSADCO)

totalvuln_projtest1 <- totalvuln_projtest %>%
  left_join(pop_pa_counties, by = "NAME") %>% 
  select(!"moe") %>% 
   select(-c(GEOID, variable))

totalvuln_projtest1<- totalvuln_projtest1 %>% 
  relocate(estimate, .after = total_vulnerable_population)

totalvuln_projtest1<- totalvuln_projtest1 %>%
  mutate(pct_total_vuln= total_vulnerable_population/estimate) 

totalvuln_projtest1<- totalvuln_projtest1 %>% 
  relocate(pct_total_vuln, .after= estimate)

totalvuln_projtest2<- totalvuln_projtest1 %>% 
  select(c(NAME, avg_dist_hosp, pct_total_vuln)) %>% 
  st_drop_geometry()

totalvuln_projtest3 <- totalvuln_projtest2 %>%
  filter(NAME %in% c(
    "Forest County",
    "Fulton County",
    "Clearfield County",
    "Northumberland County",
    "Snyder County",
    "Perry County",
    "Bedford County",
    "Montour County",
    "Crawford County",
    "Fayette County"
  )) %>%
 arrange(desc(pct_total_vuln)) %>% 
  mutate(avg_dist_hosp = round(avg_dist_hosp, 2)) %>% 
mutate(pct_total_vuln = round(pct_total_vuln * 100, 2))

Step 7: Create Summary Table

Create a professional table showing the top 10 priority counties for healthcare investment.

Your Task:

library(knitr)
# Create and format priority counties table
kable(totalvuln_projtest3,
      col.names = c("County", "Avg Miles to Nearest Hospital", "Percentage of Vulnerable Population"),
      caption = "Top 10 priority Counties",
      format.args = list(big.mark = ","))

Top 10 priority Counties
County	Avg Miles to Nearest Hospital	Percentage of Vulnerable Population
Forest County	18.34	100.00
Fulton County	7.21	100.00
Clearfield County	13.01	80.57
Montour County	1.14	78.90
Northumberland County	10.13	76.24
Fayette County	5.85	73.80
Snyder County	13.57	70.75
Bedford County	8.23	66.74
Crawford County	6.10	66.44
Perry County	10.60	66.17

Requirements: - Use knitr::kable() or similar for formatting - Include descriptive column names - Format numbers appropriately (commas for population, percentages, etc.) - Add an informative caption - Sort by priority (you decide the metric)

Part 2: Comprehensive Visualization

Using the skills from Week 3 (Data Visualization), create publication-quality maps and charts.

Map 1: County-Level Choropleth

Create a choropleth map showing healthcare access challenges at the county level.

Your Task:

# Create county-level access map

totalvulnjoin<- totalvuln_projtest2 %>% 
  left_join(totalvuln_projtest)

totalvulnjoin <- st_as_sf(totalvulnjoin, sf_column_name = "geometry")

ggplot() +
  # County fill
  geom_sf(data = totalvulnjoin,
          aes(fill = pct_total_vuln),
          color = NA) +
  
  # Color scale with formatted legend
  scale_fill_viridis_c(
    option = "plasma",
    name = "% Vulnerable\nTracts Underserved",
    labels = function(x) paste0(round(x * 100, 1), "%")
  ) +
  
  # Hospital points
  geom_sf(data = hospitals,
          color = "red",
          size = 1.5) +
  
  # County boundaries on top
  geom_sf(data = pa_counties,
          fill = NA,
          color = "black",
          linewidth = 0.5) +
  
  # Titles and caption
  labs(
    title = "Vulnerable Tracts that are Underserved in Pennsylvania by Population",
    subtitle = "Vulnerable Population is defined by high number of elderly population and low median income*\nRed points indicate hospital locations.",
    caption = "Source: ACS 2018–2023, TIGER/Line Shapefiles, Hospital Location Data.\nDistance calculated from tract centroids to nearest hospital."
  ) +
  
  # Clean theme
  theme_void() +
  theme(
    legend.position = "right",
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11),
    plot.caption = element_text(size = 8),
    panel.background = element_rect(fill = "white", color = NA),
    plot.background = element_rect(fill = "white", color = NA)
  )

nrow(totalvuln_projtest1)

[1] 67

pa_counties1<- pa_counties %>% 
  rename(NAME=COUNTY_NAM)

totalvuln_projtest1<- totalvuln_projtest1 %>% 
  mutate(NAME = str_remove(NAME, " COUNTY"))

pa_counties1$pct_total_vuln <- 
  totalvuln_projtest1$pct_total_vuln[
    match(pa_counties1$NAME, totalvuln_projtest1$NAME)
  ]
sum(is.na(pa_counties1$pct_total_vuln))

[1] 67

Requirements: - Fill counties by percentage of vulnerable tracts that are underserved - Include hospital locations as points - Use an appropriate color scheme - Include clear title, subtitle, and caption - Use theme_void() or similar clean theme - Add a legend with formatted labels

Map 2: Detailed Vulnerability Map

Create a map highlighting underserved vulnerable tracts.

Your Task:

# Create detailed tract-level map


ggplot() +
  
  # Base layer (all tracts)
  geom_sf(data = totalvuln_proj,
          aes(fill = "Not Underserved"),
          color = NA) +
  
  # Highlight underserved tracts
  geom_sf(data = totalvuln_proj %>% filter(underserved == TRUE),
          aes(fill = "Underserved Vulnerable Tract"),
          color = NA) +
  
  # Hospitals
  geom_sf(data = hospitals,
          aes(color = "Hospital"),
          size = 1.2) +
  
  # County boundaries
  geom_sf(data = pa_counties1,
          fill = NA,
          color = "black",
          linewidth = 0.5) +
  
  # Legend styling
  scale_fill_manual(
    values = c(
      "Not Underserved" = "gray85",
      "Underserved Vulnerable Tract" = "red"
    ),
    name = "Tract Status"
  ) +
  
  scale_color_manual(
    values = c("Hospital" = "red"),
    name = ""
  ) +
  
  labs(
    title = "Underserved Census Tracts in Pennsylvania",
    subtitle = "Red shaded tracts are vulnerable and underserved.\n White tracts represent tracts that are not vulnerable or underserved.\nRed points indicate hospital locations.",
    caption = "Source: ACS 2018–2023, TIGER/Line Shapefiles, Hospital Location Data.\nDistance calculated from tract centroids to nearest hospital."
  ) +
  
  theme_void() +
  theme(
    legend.position = "right",
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11),
    plot.caption = element_text(size = 8),
    panel.background = element_rect(fill = "white", color = NA),
    plot.background = element_rect(fill = "white", color = NA)
  )

nrow(totalvuln_proj)

[1] 1622

table(totalvuln_proj$vulnerable, useNA = "ifany")

< table of extent 0 >

Requirements: - Show underserved vulnerable tracts in a contrasting color - Include county boundaries for context - Show hospital locations - Use appropriate visual hierarchy (what should stand out?) - Include informative title and subtitle

Chart: Distribution Analysis

Create a visualization showing the distribution of distances to hospitals for vulnerable populations.

Your Task:

# Create distribution visualization

totalvuln_projtest1bar <- totalvuln_projtest1 %>%
  filter(underserved != 0)

ggplot(totalvuln_projtest1bar,
       aes(x = reorder(NAME, underserved),
           y = underserved)) +
  geom_col(fill = "green") +
  coord_flip() +
  labs(
    title = "Number of Underserved Tracts by County",
    caption = str_wrap(
      "Source: ACS 2018–2023, TIGER/Line Shapefiles. This chart represents the number of underserved census tracts in vulnerable counties in Pennsylvania. Vulnerability was determined by low median income and higher elderly population. The underserved variable was determined by calculating the number of vulnerable tracts more than 15 miles from the nearest hospital.",
      width = 90
    ),
    x = "County",
    y = "Count of Underserved Tracts"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11),
    plot.caption = element_text(size = 8),
    plot.caption.position = "plot"
  )

Suggested chart types: - Histogram or density plot of distances - Box plot comparing distances across regions - Bar chart of underserved tracts by county - Scatter plot of distance vs. vulnerable population size

Requirements: - Clear axes labels with units - Appropriate title - Professional formatting - Brief interpretation (1-2 sentences as a caption or in text)

Part 3: Bring Your Own Data Analysis

Choose your own additional spatial dataset and conduct a supplementary analysis.

Challenge Options

Choose ONE of the following challenge exercises, or propose your own research question using OpenDataPhilly data (https://opendataphilly.org/datasets/).

Note these are just loose suggestions to spark ideas - follow or make your own as the data permits and as your ideas evolve. This analysis should include bringing in your own dataset, ensuring the projection/CRS of your layers align and are appropriate for the analysis (not lat/long or geodetic coordinate systems). The analysis portion should include some combination of spatial and attribute operations to answer a relatively straightforward question

Education & Youth Services

Option A: Educational Desert Analysis - Data: Schools, Libraries, Recreation Centers, Census tracts (child population) - Question: “Which neighborhoods lack adequate educational infrastructure for children?” - Operations: Buffer schools/libraries (0.5 mile walking distance), identify coverage gaps, overlay with child population density - Policy relevance: School district planning, library placement, after-school program siting

Option B: School Safety Zones - Data: Schools, Crime Incidents, Bike Network - Question: “Are school zones safe for walking/biking, or are they crime hotspots?” - Operations: Buffer schools (1000ft safety zone), spatial join with crime incidents, assess bike infrastructure coverage - Policy relevance: Safe Routes to School programs, crossing guard placement

Environmental Justice

Option C: Green Space Equity - Data: Parks, Street Trees, Census tracts (race/income demographics) - Question: “Do low-income and minority neighborhoods have equitable access to green space?” - Operations: Buffer parks (10-minute walk = 0.5 mile), calculate tree canopy or park acreage per capita, compare by demographics - Policy relevance: Climate resilience, environmental justice, urban forestry investment —

Public Safety & Justice

Option D: Crime & Community Resources - Data: Crime Incidents, Recreation Centers, Libraries, Street Lights - Question: “Are high-crime areas underserved by community resources?” - Operations: Aggregate crime counts to census tracts or neighborhoods, count community resources per area, spatial correlation analysis - Policy relevance: Community investment, violence prevention strategies —

Infrastructure & Services

Option E: Polling Place Accessibility - Data: Polling Places, SEPTA stops, Census tracts (elderly population, disability rates) - Question: “Are polling places accessible for elderly and disabled voters?” - Operations: Buffer polling places and transit stops, identify vulnerable populations, find areas lacking access - Policy relevance: Voting rights, election infrastructure, ADA compliance

Health & Wellness

Option F: Recreation & Population Health - Data: Recreation Centers, Playgrounds, Parks, Census tracts (demographics) - Question: “Is lack of recreation access associated with vulnerable populations?” - Operations: Calculate recreation facilities per capita by neighborhood, buffer facilities for walking access, overlay with demographic indicators - Policy relevance: Public health investment, recreation programming, obesity prevention

Emergency Services

Option G: EMS Response Coverage - Data: Fire Stations, EMS stations, Population density, High-rise buildings - Question: “Are population-dense areas adequately covered by emergency services?” - Operations: Create service area buffers (5-minute drive = ~2 miles), assess population coverage, identify gaps in high-density areas - Policy relevance: Emergency preparedness, station siting decisions

Arts & Culture

Option H: Cultural Asset Distribution - Data: Public Art, Museums, Historic sites/markers, Neighborhoods - Question: “Do all neighborhoods have equitable access to cultural amenities?” - Operations: Count cultural assets per neighborhood, normalize by population, compare distribution across demographic groups - Policy relevance: Cultural equity, tourism, quality of life, neighborhood identity

Data Sources

OpenDataPhilly: https://opendataphilly.org/datasets/ - Most datasets available as GeoJSON, Shapefile, or CSV with coordinates - Always check the Metadata for a data dictionary of the fields.

Additional Sources: - Pennsylvania Open Data: https://data.pa.gov/ - Census Bureau (via tidycensus): Demographics, economic indicators, commute patterns - TIGER/Line (via tigris): Geographic boundaries

Recommended Starting Points

If you’re feeling confident: Choose an advanced challenge with multiple data layers. If you are a beginner, choose something more manageable that helps you understand the basics

If you have a different idea: Propose your own question! Just make sure: - You can access the spatial data - You can perform at least 2 spatial operations

Your Analysis

Your Task:

Find and load additional data
- Document your data source
- Check and standardize the CRS
- Provide basic summary statistics

Questions to answer: - What dataset did you choose and why? Open Data Philly: Bike Lanes Schools Crime I wanted to investigate whether crime clustered near schools with low-bike network coverage. This builds on an existing PennDOT’s Safe Routes to School (SRTS) Program that focuses on improving pedestrian and bicycle safety near schools.

What is the data source and date? Open Data Philly
How many features does it contain?

nrow(schools_proj)

[1] 490

nrow(bike_network_proj)

[1] 5225

nrow(crime_proj)

[1] 162032

What CRS is it in? Did you need to transform it? I transformed it to projected coordinate system 2272 which corresponds with the Pennsylvania South region where Phialdelphia is located.

Pose a research question

Which school buffer zones have the highest amount of crime exposure and and low-bike network coverage, signaling where the city should invest in more safety resources for children walking to school?

Examples: - “Do vulnerable tracts have adequate public transit access to hospitals?” - “Are EMS stations appropriately located near vulnerable populations?” - “Do areas with low vehicle access have worse hospital access?”

Conduct spatial analysis

Use at least TWO spatial operations to answer your research question.

Required operations (choose 2+): - Buffers - Spatial joins - Spatial filtering with predicates - Distance calculations - Intersections or unions - Point-in-polygon aggregation

Your Task:

# Your spatial analysis

#First I created a buffer of 1,000ft for my schools 
school_buffers <- st_buffer(schools_proj, dist = 1000)

#I plotted it to make sure it worked 
library(ggplot2)

ggplot() +
  geom_sf(data = school_buffers, fill = NA) +
  geom_sf(data = schools_proj, size = 3) +
  theme_void()

#Next I joined my crime data with the schools buffer, filtered for only the schools that were associated with the crime, grouped by the school name+ walk shed and then counted the total amount of crime per each school walk shed 
school_crime <- st_join(crime_proj, school_buffers, join = st_within) %>%
  filter(!is.na(school_num)) %>% 
  group_by(school_nam) %>% 
  summarise(total_crime=n())

ggplot() +
  geom_sf(data = school_buffers, fill = NA)+
  geom_sf(data = school_crime, fill = NA) +
  theme_void()

#Then I joined school buffers with the bike network 
schools_buffers_bike<- st_join(school_buffers, bike_network_proj, join= st_intersects)

# I created a new df and grouped the join by school name then counted the amount of bike lanes per school buffer 
bike_counts <- schools_buffers_bike %>%
  st_drop_geometry() %>% 
  group_by(school_nam) %>% 
 summarise(bike_lane_count = n()) %>% 
  ungroup()

# I then joined my two data frames: the one with the school_buffer and crime counts and the one with the school_buffer and bike lanes count by the column they had in common which was the school name
final<- school_crime %>% 
  left_join(bike_counts, by="school_nam")

#Finally, I had to create an index to score the top schools in need based on two criteria: low number of bike lanes and high crime. By scaling each value, I essentially standardized the values so that I can compare them in terms of standard deviation. By subtracting bike from crime, this is essentially weighting the better regardless of how much crime there is. I then sliced for the top 10 schools at risk.
  final<- final %>% 
  mutate(
  crime_z = scale(total_crime),
  bike_z  = scale(bike_lane_count),
  vulnerability = crime_z - bike_z
) %>% 
  arrange(desc(vulnerability)) %>%
  slice(1:10)

#I had to do some more data chopping to plot my final variables  
  
 #this gave me the school buffers I want to plot  
final1<- final %>% 
  st_drop_geometry() %>% 
  select(school_nam)

schools_boundary<- final1 %>% 
  left_join(school_buffers, by= "school_nam")

#I had to do this to brign back the geometry, when I did the join it went away 
schools_boundary <- st_as_sf(schools_boundary, sf_column_name = "geometry")

#now I am getting the shape of Philadelphia 
pa_counties_phl<- pa_counties %>% 
  filter(COUNTY_NAM== "PHILADELPHIA")

#school points I want to plot 
schools_plot<- final1 %>% 
  left_join(schools_proj, by="school_nam")

schools_plot<- st_as_sf(schools_plot, sf_column_name = "geometry")

#bike lanes I want to map 

#first I have to do the join differently to keep the line geometry 

schools_buffers_bike1<- schools_buffers_bike %>% 
  filter(school_nam %in% c("CONWELL", "RUSSELL MIDDLE SCHOOL", "YOUTHBUILD PHILADELPHIA CHARTER", "PEIRCE, THOMAS M. SCHOOL", "KENDERTON ELEMENTARY", "WILLARD, FRANCES E. SCHOOL",     
"DUCKREY, TANNER SCHOOL", "PHILADELPHIA MILITARY ACADEMY", "JOHN B STETSON MIDDLE SCHOOL", "DEBURGOS, J. ELEMENTARY",   
"ELKIN, LEWIS SCHOOL"))

#re join to get line geometry 
schools_buffer_bikes_plot<- schools_buffers_bike1 %>% 
  st_drop_geometry()

schools_buffer_bikes_plot<- schools_buffer_bikes_plot %>% 
  left_join(bike_network_proj, by= "STREETNAME")

schools_buffer_bikes_plot<- st_as_sf(
  schools_buffer_bikes_plot, 
  sf_column_name = "geometry")


#Final Map 

ggplot() +
  geom_sf(data = pa_counties_phl, fill = NA) +
  geom_sf(data = schools_boundary, fill = NA) +
  geom_sf(data = schools_plot, color = "red", size = 1) +
  geom_sf(data = schools_buffer_bikes_plot, color = "blue", size = 1)+
  coord_sf(
  xlim = c(-75.22, -75.10),
  ylim = c(39.92, 40.03)
) + 
  theme_void()

#This is for my interpretation below 

schools_exp<- schools_proj %>% 
filter(school_nam %in% c("CONWELL", "RUSSELL MIDDLE SCHOOL", "YOUTHBUILD PHILADELPHIA CHARTER", "PEIRCE, THOMAS M. SCHOOL", "KENDERTON ELEMENTARY", "WILLARD, FRANCES E. SCHOOL",   
"DUCKREY, TANNER SCHOOL", "PHILADELPHIA MILITARY ACADEMY", "JOHN B STETSON MIDDLE SCHOOL", "DEBURGOS, J. ELEMENTARY",   
"ELKIN, LEWIS SCHOOL"))

Analysis requirements: - Clear code comments explaining each step - Appropriate CRS transformations - Summary statistics or counts - At least one map showing your findings - Brief interpretation of results (3-5 sentences)

Your interpretation:

I was able to find the top 10 schools most in need for investment based on an index that evaluated the presence of bike lanes and crime incidents. This builds off of PennDot’s safe routes to school initiative that encourages more students to walk or bike to school. Improving mobility conditions near schools may also make the walkshed safer. The 10 schools happen to be clustered in the same area in North Philadelphia indicating a more systemic issue that the City needs to address. Most concerning is that 6/10 schools selected are elementary schools. The exposure to crime at a young age has multiple implications to their development and likeliness to participate in crime as they get older.

Finally - A few comments about your incorporation of feedback!

Last assignment I received feedback suggesting that I use echo=false to hide the R packages that I loaded- I incorporated that in this week’s lab. I also made sure to check that I didn’t have any explanatory text in my code block.

Submission Requirements

What to submit:

Rendered HTML document posted to your course portfolio with all code, outputs, maps, and text
- Use embed-resources: true in YAML so it’s a single file
- All code should run without errors
- All maps and charts should display correctly
Submit the correct and working links of your assignment on Canvas