## Introduction
Understanding the efficacy of various insecticides is crucial in agriculture, environmental science, and pest control. The effectiveness of a spray determines both economic and ecological outcomes: crops must be protected while chemical overuse is kept to a minimum.
In this post, we’ll analyze the InsectSprays dataset, a classic dataset in R, using generalized linear modeling with a Poisson regression framework. We’ll explore which spray kills the most insects on average and whether there’s a statistically significant difference among them.
## Loading the Data and Required Libraries
To begin, we load the necessary libraries:
pacman::p_load(tidyverse, tidymodels, poissonreg)
Next we access the dataset.
insect_spray <- InsectSprays
head(insect_spray)
count spray
1 10 A
2 7 A
3 20 A
4 14 A
5 14 A
6 12 A
This dataset consists of insect counts (count) across different spray types (spray), giving us a perfect use case for modeling count data.
## Summary Statistics
Let’s start by computing basic summaries to understand the distribution of effectiveness across sprays:
insect_spray |>
  summarise(
    .by = spray,
    average_count = mean(count),
    times_used = n()
  )
spray average_count times_used
1 A 14.500000 12
2 B 15.333333 12
3 C 2.083333 12
4 D 4.916667 12
5 E 3.500000 12
6 F 16.666667 12
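This output gives the average insect count per spray and the number of observations for each. Every spray was used 12 times; sprays A, B, and F average roughly 14.5 to 16.7 insects, while C, D, and E average between 2.1 and 4.9.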
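Because the post goes on to fit a Poisson model, whose variance equals its mean, it can help to put the per-spray variance next to the mean. The following is a minimal sketch in base R, not part of the original analysis; the object `spray_mean_var` and its column names are illustrative assumptions.

```r
# Sketch: per-spray mean and variance of the counts, base R only.
# `spray_mean_var` and its column names are illustrative, not from the post.
spray_means <- tapply(InsectSprays$count, InsectSprays$spray, mean)
spray_vars  <- tapply(InsectSprays$count, InsectSprays$spray, var)

spray_mean_var <- data.frame(
  spray       = names(spray_means),
  mean_count  = as.numeric(spray_means),
  var_count   = as.numeric(spray_vars),
  # A Poisson variable has variance equal to its mean,
  # so ratios near 1 are consistent with that assumption
  var_to_mean = as.numeric(spray_vars / spray_means)
)
spray_mean_var
```

Ratios well above 1 would hint at overdispersion, in which case the standard errors from a plain Poisson fit may be too optimistic.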
## Visualizing the Results
insect_spray |>
  # geom_col() stacks the rows into per-spray totals, so order the bars by the sum
  # (fct_reorder() would otherwise order by the median)
  ggplot(aes(fct_reorder(spray, count, .fun = sum), count)) +
  geom_col(fill = "dodgerblue4") +
  labs(
    x = "Spray",
    y = "Frequency",
    title = "Total Insects Killed per Spray"
  ) +
  coord_flip() +
  theme_light(
    base_family = "Inter"
  ) +
  theme(
    plot.title = element_text(
      hjust = .5,
      size = 14,
      face = "bold"
    )
  )
This bar chart gives a clear view of the total effectiveness of each insecticide. Some sprays (B, F, and A) clearly outperform the others.
insect_spray |>
  summarise(
    .by = spray,
    average_count = mean(count)
  ) |>
  ggplot(
    aes(fct_reorder(spray, average_count), average_count)
  ) +
  geom_col(fill = "coral3") +
  labs(
    x = "Spray",
    y = "Mean Insects Killed",
    title = "Mean Insects Killed by Spray"
  ) +
  theme_light(base_family = "Inter") +
  coord_flip()
This visualization highlights the average efficacy of each spray. The difference in means provides a strong foundation for statistical modeling.
## Modeling Insecticide Effectiveness with Poisson Regression
Since we are dealing with count data, Poisson regression is a natural choice. We’ll fit a Generalized Linear Model (GLM) with a Poisson distribution to examine the differences between spray types.
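Written out, a Poisson GLM with spray type as its only predictor can be sketched as follows. The notation (\mu_i, \beta_j, and the indicator function) is an assumption introduced here for illustration, and spray A is taken as the reference level, which is R's default for this factor.

```latex
% Requires amsmath for aligned and \operatorname
\[
\begin{aligned}
\text{count}_i &\sim \operatorname{Poisson}(\mu_i), \\
\log \mu_i &= \beta_0 + \sum_{j \in \{B, C, D, E, F\}} \beta_j \, \mathbf{1}[\text{spray}_i = j]
\end{aligned}
\]
```

Under this form, \beta_0 is the log of the expected count for spray A, and each \beta_j is the difference in log expected count between spray j and spray A.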
insect_mod <- poisson_reg() |>
  set_mode("regression") |>
  set_engine("glm") |>
  fit(
    count ~ spray,
    data = insect_spray
  )
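For readers who prefer to see the underlying call, the `glm` engine of `poisson_reg()` hands the formula to `stats::glm()` with a Poisson family. A minimal base R sketch of what should be the same fit is shown here; `insect_glm` is an illustrative name, not an object from the post.

```r
# Sketch: the same model fitted directly with stats::glm().
# `insect_glm` is an illustrative name, not from the post.
insect_glm <- glm(
  count ~ spray,
  family = poisson(link = "log"),
  data = InsectSprays
)

# Coefficients are on the log scale, with spray A as the reference level
round(coef(insect_glm), 3)
```

The parsnip version has the advantage of a uniform interface across engines; the base R call is shorter when only one model is needed.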
## Interpreting the Model
Let’s extract and interpret the coefficients:
insect_mod |>
  extract_fit_engine() |>
  tidy()
# A tibble: 6 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 2.67 0.0758 35.3 1.45e-272
2 sprayB 0.0559 0.106 0.528 5.97e- 1
3 sprayC -1.94 0.214 -9.07 1.18e- 19
4 sprayD -1.08 0.151 -7.18 7.03e- 13
5 sprayE -1.42 0.172 -8.27 1.37e- 16
6 sprayF 0.139 0.104 1.34 1.79e- 1
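The estimates are on the log scale and are expressed relative to the reference spray, which here is spray A, the first factor level. The intercept (2.67) is the log of spray A's mean count, and every other coefficient is the difference in log-mean count between that spray and A. A negative coefficient implies lower effectiveness compared to the baseline, while positive values indicate improved performance. Sprays C, D, and E differ significantly from A, whereas B and F do not (p = 0.60 and 0.18).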
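The coefficient table compares each spray with spray A only. To address the broader question from the introduction, whether spray type matters at all, one option is a likelihood-ratio test against an intercept-only model. This is a sketch using base R rather than the post's tidymodels objects; `full_fit`, `null_fit`, and `lrt` are illustrative names.

```r
# Sketch: overall test for a spray effect via a likelihood-ratio (deviance) test.
# Object names are illustrative, not from the post.
full_fit <- glm(count ~ spray, family = poisson, data = InsectSprays)
null_fit <- glm(count ~ 1,     family = poisson, data = InsectSprays)

# Compares the drop in deviance with a chi-squared distribution on 5 degrees of freedom
lrt <- anova(null_fit, full_fit, test = "Chisq")
lrt
```

A small p-value in the second row indicates that at least one spray has a different expected count from the others.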
Each coefficient can be exponentiated (exp(coef)) to give the rate relative to the baseline. For example, exp(-1.94) ≈ 0.14, so spray C's expected count is about 14% of spray A's.
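The exponentiation step can be sketched in base R as follows. The model is refitted with `stats::glm()` so the snippet stands on its own; `fit`, `rate_ratios`, and `group_means` are illustrative names rather than objects from the post.

```r
# Sketch: turning log-scale coefficients into rate ratios relative to spray A.
# Object names are illustrative, not from the post.
fit <- glm(count ~ spray, family = poisson, data = InsectSprays)

# Drop the intercept: exp(intercept) is spray A's mean count, not a ratio
rate_ratios <- exp(coef(fit)[-1])
round(rate_ratios, 3)

# With a single categorical predictor, each rate ratio equals
# the spray's mean count divided by spray A's mean count
group_means <- tapply(InsectSprays$count, InsectSprays$spray, mean)
round(group_means[-1] / group_means["A"], 3)
```

With the tidymodels workflow, broom's `tidy()` also accepts `exponentiate = TRUE` for `glm` fits, which returns the same quantities in the coefficient table.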
## Conclusion
Through this simple analysis, we’ve:
- Explored the insecticide effectiveness visually and numerically
- Modeled the count of insects using a Poisson GLM
- Identified which sprays perform significantly better than others
This kind of analysis not only helps in selecting the most effective spray but also supports data-driven decision-making in ecological management and agricultural planning.