A Replication of Karlan and List (2007)
Introduction
Dean Karlan at Yale and John List at the University of Chicago conducted a field experiment to test the effectiveness of different fundraising letters. They sent out 50,000 fundraising letters to potential donors, randomly assigning each letter to one of three treatments: a standard letter, a matching grant letter, or a challenge grant letter. They published the results of this experiment in the American Economic Review in 2007. The article and supporting data are available from the AEA website and from Innovations for Poverty Action as part of Harvard’s Dataverse.
The experimental design varied the matching ratio, the maximum size of the leadership matching gift, and the suggested donation amount. The results showed that the presence of a matching grant increased revenue per solicitation by 19% and the probability of an individual donating by 22%. The researchers concluded that a match likely serves as a quality or timing signal that can effectively increase donations, especially among demographics and regions more responsive to such cues. Their findings have practical implications for the design of fundraising campaigns and provide avenues for further theoretical research on charitable giving.
This project seeks to replicate their results.
Data
Description
The dataset contains 50,083 entries and 51 columns of varying data types.
Here are some key points:
- The `treatment` and `control` columns separate the groups for experimentation.
- The ratio variables (`ratio`, `ratio2`, `ratio3`), size variables (`size25`, `size50`, `size100`, `sizeno`), and ask variables (`ask`, `askd1`, `askd2`, `askd3`, `ask1`, `ask2`, `ask3`) refer to the different experimental conditions.
- The `amount` and `amountchange` columns measure impact.
- The `years`, `female`, `couple`, `nonlit`, `cases`, and geographic variables like `statecnt` and `stateresponse` provide demographic and location-based contextual data.
Balance Test
As an ad hoc test of the randomization mechanism, I provide a series of tests that compare aspects of the treatment and control groups to assess whether they are statistically significantly different from one another.
T-Test
Comparing the means of variables like mrm2 between the treatment and control groups shows whether the two groups differ significantly at the 95% confidence level.
t_test_treatment_control, t_test_control
(TtestResult(statistic=0.1194921058159193, pvalue=0.9048859731777738, df=50080.0),
TtestResult(statistic=0.0, pvalue=1.0, df=33372.0))
A quick pass at mrm2, the number of months since the last donation, shows no significant difference between the treatment and control groups (p ≈ 0.90). The two samples are similar enough on this variable.
Below I test the dormant variable, with similar results, which points to unbiased randomization.
t_test_dormant_treatment_control
TtestResult(statistic=0.17388504815227449, pvalue=0.8619565062750842, df=50081.0)
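The balance check above can be sketched as follows. This is a minimal illustration on synthetic data, not the code that produced the output: the arrays `mrm2_treatment` and `mrm2_control`, their sizes, and the Poisson distribution are all assumptions standing in for the real columns. It runs SciPy's pooled-variance two-sample t-test and recomputes the same statistic by hand.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data: months since last donation (mrm2),
# drawn from the same distribution in both groups, as randomization implies.
mrm2_treatment = rng.poisson(lam=13, size=2000).astype(float)
mrm2_control = rng.poisson(lam=13, size=1000).astype(float)

# Pooled-variance two-sample t-test (degrees of freedom = n1 + n2 - 2).
result = stats.ttest_ind(mrm2_treatment, mrm2_control, equal_var=True)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")

# The same statistic computed by hand from the group means and variances.
n1, n2 = len(mrm2_treatment), len(mrm2_control)
pooled_var = ((n1 - 1) * mrm2_treatment.var(ddof=1)
              + (n2 - 1) * mrm2_control.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (mrm2_treatment.mean() - mrm2_control.mean()) / np.sqrt(
    pooled_var * (1 / n1 + 1 / n2))
print(f"manual t = {t_manual:.3f}")
```

A p-value well above 0.05 on a pre-treatment variable is what successful randomization should produce most of the time.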
Regression Coefficients
Explanatory variables used:
female, couple, pwhite, pblack, page18_39, ave_hh_sz, median_hhincome, powner, psch_atlstba, pop_propurban
lin_reg_dormant.summary()
Linear regression (OLS)
Data : Treatment Group
Response variable : dormant
Explanatory variables: female, couple, pwhite, pblack, page18_39, ave_hh_sz, median_hhincome, powner, psch_atlstba, pop_propurban
Null hyp.: the effect of x on dormant is zero
Alt. hyp.: the effect of x on dormant is not zero
coefficient std.error t.value p.value
Intercept 0.552 0.053 10.411 < .001 ***
female 0.001 0.006 0.105 0.916
couple -0.051 0.010 -5.141 < .001 ***
pwhite -0.029 0.041 -0.711 0.477
pblack 0.049 0.040 1.225 0.221
page18_39 0.081 0.046 1.762 0.078 .
ave_hh_sz -0.002 0.013 -0.118 0.906
median_hhincome 0.000 0.000 1.939 0.052 .
powner -0.019 0.033 -0.572 0.568
psch_atlstba -0.045 0.033 -1.362 0.173
pop_propurban -0.032 0.013 -2.503 0.012 *
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-squared: 0.002, Adjusted R-squared: 0.001
F-statistic: 5.652 df(10, 31296), p.value < 0.001
Nr obs: 31,307
lin_reg_dormant.plot("vimp")
lin_reg_dormant_2.summary()
Linear regression (OLS)
Data : Control Group
Response variable : dormant
Explanatory variables: female, couple, pwhite, pblack, page18_39, ave_hh_sz, median_hhincome, powner, psch_atlstba, pop_propurban
Null hyp.: the effect of x on dormant is zero
Alt. hyp.: the effect of x on dormant is not zero
coefficient std.error t.value p.value
Intercept 0.453 0.075 6.073 < .001 ***
female -0.008 0.009 -0.862 0.389
couple -0.039 0.014 -2.743 0.006 **
pwhite 0.160 0.057 2.780 0.005 **
pblack 0.142 0.056 2.546 0.011 *
page18_39 -0.001 0.065 -0.011 0.991
ave_hh_sz 0.012 0.019 0.637 0.524
median_hhincome 0.000 0.000 1.432 0.152
powner -0.133 0.047 -2.857 0.004 **
psch_atlstba -0.070 0.047 -1.493 0.135
pop_propurban -0.016 0.018 -0.882 0.378
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-squared: 0.002, Adjusted R-squared: 0.001
F-statistic: 2.579 df(10, 15634), p.value 0.004
Nr obs: 15,645
lin_reg_dormant_2.plot("vimp")
The significance of the coefficients, and how they change between the treatment and control groups for the same variable, may carry implications for understanding how the treatment affects different populations. The variable means are largely consistent between the groups, but the fitted coefficients differ: couple is significant in both groups, pwhite, pblack, and powner are significant only in the control group, and pop_propurban is significant only in the treatment group.
Experimental Results
Charitable Contribution Made
First, I analyze whether matched donations lead to an increased response rate of making a donation.

(TtestResult(statistic=3.101361000543946, pvalue=0.0019274025949016988, df=50081.0),
<class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: gave R-squared: 0.000
Model: OLS Adj. R-squared: 0.000
Method: Least Squares F-statistic: 9.618
Date: Tue, 23 Apr 2024 Prob (F-statistic): 0.00193
Time: 10:30:46 Log-Likelihood: 26630.
No. Observations: 50083 AIC: -5.326e+04
Df Residuals: 50081 BIC: -5.324e+04
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.0179 0.001 16.225 0.000 0.016 0.020
treatment 0.0042 0.001 3.101 0.002 0.002 0.007
==============================================================================
Omnibus: 59814.280 Durbin-Watson: 2.005
Prob(Omnibus): 0.000 Jarque-Bera (JB): 4317152.727
Skew: 6.740 Prob(JB): 0.00
Kurtosis: 46.440 Cond. No. 3.23
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
""")
Statistical Findings:
T-test: The t-test results show a statistically significant difference with a p-value of approximately 0.0019. This indicates that the observed difference in donation rates between the treatment and control groups is unlikely to be due to chance.
Linear Regression: The regression analysis confirms this finding. The coefficient for the treatment variable is positive (approximately 0.0042) and significant (p = 0.002). Being in the treatment group is associated with a donation rate about 0.42 percentage points higher than the control group's baseline of 1.79% (the const coefficient). Because treatment is the only regressor, this is the same comparison as the t-test, which is why both report a statistic of 3.101.
Interpretation in Context of the Experiment: The statistical tests confirm that the treatment, here a fundraising letter that offers a matching donation, has a positive effect on the probability of making a donation. This finding suggests that interventions designed to make giving more appealing or rewarding can indeed increase charitable contributions.
Implications for Human Behavior: This outcome reveals insights into human behavior, especially in the context of philanthropy. The effectiveness of the treatment suggests that individuals are responsive to incentives or enhancements in the solicitation process. Essentially, when people perceive that their contributions will have a greater impact (such as through matching), they are more likely to contribute. This aligns with broader behavioral economics principles, which assert that people’s actions are often influenced by contextual cues and perceived benefits.
These results underline the importance of strategically designed fundraising campaigns that leverage psychological and economic incentives to boost charitable giving. By understanding and implementing what motivates people to give, nonprofits can more effectively mobilize resources to address various social issues.
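The t-test and the regression above report the same statistic (3.101), which is expected when the only regressor is a treatment dummy. The sketch below illustrates that equivalence on synthetic data; the group sizes, seed, and array names are assumptions, and the OLS fit is done with plain linear algebra rather than with the library call used for the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: binary `gave` outcome for a control and a treatment group.
gave_control = rng.binomial(1, 0.018, size=5000).astype(float)
gave_treatment = rng.binomial(1, 0.022, size=10000).astype(float)

# Two-sample t-test with pooled variance.
t_test = stats.ttest_ind(gave_treatment, gave_control, equal_var=True)

# OLS of gave on a constant and a treatment dummy.
y = np.concatenate([gave_control, gave_treatment])
treatment = np.concatenate([np.zeros(5000), np.ones(10000)])
X = np.column_stack([np.ones_like(treatment), treatment])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical (nonrobust) standard errors, then the t statistic on the dummy.
residuals = y - X @ beta
sigma2 = residuals @ residuals / (len(y) - X.shape[1])
cov_beta = sigma2 * np.linalg.inv(X.T @ X)
t_ols = beta[1] / np.sqrt(cov_beta[1, 1])

print(f"t-test statistic: {t_test.statistic:.4f}")
print(f"OLS t on dummy:   {t_ols:.4f}")
print(f"intercept = control mean: {beta[0]:.4f} vs {gave_control.mean():.4f}")
```

The intercept reproduces the control-group donation rate and the slope reproduces the difference in rates, so the regression adds a convenient framing rather than new information.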
Optimization terminated successfully.
Current function value: 0.301543
Iterations 7
Dep. Variable: gave No. Observations: 50083
Model: Probit Df Residuals: 50082
Method: MLE Df Model: 0
Date: Tue, 23 Apr 2024 Pseudo R-squ.: -1.999
Time: 10:30:46 Log-Likelihood: -15102.
converged: True LL-Null: -5035.4
Covariance Type: nonrobust LLR p-value: nan
coef std err z P>|z| [0.025 0.975]
treatment -2.0134 0.015 -131.734 0.000 -2.043 -1.983
Comparison to Table 3, Column 1:
The paper reports that the treatment has a statistically significant effect on the likelihood of donating.
The probit above is not directly comparable to that column because it was fitted without a constant (Df Model is 0 and the pseudo R-squared is negative). Without an intercept, the coefficient of -2.0134 does not measure the treatment effect. It is the probit index of the treatment group's donation rate, since Φ(-2.0134) ≈ 0.022, while every control observation is assigned a predicted probability of Φ(0) = 0.5. Replicating the column requires refitting the model with a constant.
The t-test and linear regression above do show that assignment to the treatment increases the likelihood of making a donation, in line with the paper's finding.
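The probit output above has no intercept (Df Model is 0), which changes what the treatment coefficient means. The sketch below is a back-of-the-envelope check rather than a refit: it uses the closed-form maximum-likelihood solution for a probit with a single dummy regressor, and it takes the donation rates from the rounded OLS coefficients reported earlier (0.0179 and 0.0042), so small rounding differences are expected.

```python
from scipy.stats import norm

# Donation rates implied by the OLS output (rounded coefficients).
p_control = 0.0179
p_treatment = 0.0179 + 0.0042

# Probit WITHOUT a constant: the index is 0 for every control row, so the
# treatment coefficient can only fit the treatment group's donation rate.
coef_no_const = norm.ppf(p_treatment)

# Probit WITH a constant and one dummy (a saturated model): the intercept fits
# the control rate and the treatment coefficient is the gap between indices.
intercept = norm.ppf(p_control)
coef_with_const = norm.ppf(p_treatment) - norm.ppf(p_control)

print(f"no constant:   treatment = {coef_no_const:.3f}")
print(f"with constant: intercept = {intercept:.3f}, treatment = {coef_with_const:.3f}")
```

The no-constant value lands close to the reported -2.0134, while the with-constant treatment coefficient is positive, which is the sign needed for a comparison with the paper. In statsmodels, adding the intercept column with `sm.add_constant` before fitting is the usual way to get the second specification.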
Differences between Match Rates
Next, I assess the effectiveness of different sizes of matched donations on the response rate.
(TtestResult(statistic=-0.96504713432247, pvalue=0.33453168549723933, df=22265.0),
TtestResult(statistic=-1.0150255853798622, pvalue=0.3101046637086672, df=22260.0),
TtestResult(statistic=-0.05011583793874515, pvalue=0.9600305283739325, df=22261.0))
- 1:1 vs. 2:1 Match Ratio:
- Statistic: -0.965
- P-value: 0.335
- Interpretation: There is no statistically significant difference in donation rates between the 1:1 and 2:1 match ratios.
- 1:1 vs. 3:1 Match Ratio:
- Statistic: -1.015
- P-value: 0.310
- Interpretation: There is no statistically significant difference in donation rates between the 1:1 and 3:1 match ratios.
- 2:1 vs. 3:1 Match Ratio:
- Statistic: -0.050
- P-value: 0.960
- Interpretation: There is no statistically significant difference in donation rates between the 2:1 and 3:1 match ratios.
Interpretation: The authors comment that the “figures suggest” that larger match ratios (2:1 and 3:1) have no additional impact relative to the 1:1 ratio. The t-test results agree: there is no statistically significant difference in the likelihood of donating between any of the match ratios tested (1:1, 2:1, 3:1).
This suggests that increasing the match ratio, within the ranges tested, does not significantly influence the decision to donate in this particular data set and experimental setup. This finding is important because it challenges the assumption that simply increasing the match ratio will lead to higher donation rates, and supports a more nuanced view of how incentives impact charitable giving. It suggests that other factors beyond the match ratio might play more significant roles in influencing donor behavior.
OLS Regression Results
==============================================================================
Dep. Variable: gave R-squared: 0.000
Model: OLS Adj. R-squared: 0.000
Method: Least Squares F-statistic: 3.665
Date: Tue, 23 Apr 2024 Prob (F-statistic): 0.0118
Time: 10:30:46 Log-Likelihood: 26630.
No. Observations: 50083 AIC: -5.325e+04
Df Residuals: 50079 BIC: -5.322e+04
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.0207 0.001 15.398 0.000 0.018 0.023
2 0.0019 0.002 0.989 0.323 -0.002 0.006
3 0.0020 0.002 1.041 0.298 -0.002 0.006
Control -0.0029 0.002 -1.661 0.097 -0.006 0.001
==============================================================================
Omnibus: 59812.754 Durbin-Watson: 2.005
Prob(Omnibus): 0.000 Jarque-Bera (JB): 4316693.217
Skew: 6.740 Prob(JB): 0.00
Kurtosis: 46.438 Cond. No. 5.09
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
- Effectiveness of Match Ratios:
- The coefficients for the 2:1 and 3:1 match ratios (labeled ‘2’ and ‘3’ in the regression) are not statistically significant. Because the control group has its own indicator, the baseline (const) is the 1:1 group, so within this sample increasing the match ratio above 1:1 did not significantly increase the likelihood of donating.
- This suggests that the psychological or motivational impact of these match ratios on donation behavior may be less pronounced than hypothesized or varies based on other unaccounted factors such as the demographic characteristics of donors, the context of the donation appeal, or the specific charitable cause.
- Statistical Precision and Significance:
- The p-values associated with the 2:1 and 3:1 match ratios being above conventional significance levels (0.05) imply that the results could be due to random chance rather than a true effect of the match ratios.
- The lack of significant findings here contrasts with common fundraising strategies that assume higher match ratios will universally boost donation rates. This could indicate that other elements of the campaign or external factors have more influence on the donation decision than the match ratio alone.
- Model Fit and Reliability:
- The very low R-squared value suggests that the model does not explain much of the variance in donation behavior, indicating that other variables not included in the model might be influencing whether individuals decide to donate.
- This points to a potential oversimplification in the model or the need to explore other factors that impact donation behavior, such as personal connection to the cause, previous donation behavior, economic conditions, or the manner in which the donation appeal is made.
- Implications for Future Research and Practice:
- These results suggest that fundraisers and researchers should consider a broader range of factors when designing donation strategies and studies. It may not be sufficient to adjust the match ratio; instead, understanding the donor audience and tailoring the message might be more effective.
- Further research could explore combinations of strategies, such as varying the communication style, the visibility of donation impacts, or combining match offers with other incentives.
From Data
Difference in donation rates between 1:1 and 2:1 ratios: 0.0018842510217149944
Difference in donation rates between 2:1 and 3:1 ratios: 0.00010002398025293902
From Fitted Coefficients
OLS Regression Results
==============================================================================
Dep. Variable: gave R-squared: 0.000
Model: OLS Adj. R-squared: 0.000
Method: Least Squares F-statistic: 4.117
Date: Tue, 23 Apr 2024 Prob (F-statistic): 0.0163
Time: 10:30:46 Log-Likelihood: 26629.
No. Observations: 50083 AIC: -5.325e+04
Df Residuals: 50080 BIC: -5.323e+04
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.0190 0.001 22.306 0.000 0.017 0.021
ratio2 0.0036 0.002 2.269 0.023 0.000 0.007
ratio3 0.0037 0.002 2.332 0.020 0.001 0.007
==============================================================================
Omnibus: 59815.856 Durbin-Watson: 2.005
Prob(Omnibus): 0.000 Jarque-Bera (JB): 4317637.927
Skew: 6.741 Prob(JB): 0.00
Kurtosis: 46.443 Cond. No. 3.16
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Difference in coefficients between 2:1 and 3:1 ratios: 0.00010002398025543442
Regression Results Overview
Dependent Variable (`gave`): Indicates whether a donation was made.
Coefficients:
- `const` (0.0190): Baseline probability of making a donation. Because only `ratio2` and `ratio3` enter the model, the baseline pools the control group with the 1:1 match group.
- `ratio2` (0.0036): Raises the likelihood of donating by 0.36 percentage points relative to that baseline (p = 0.023).
- `ratio3` (0.0037): Raises the likelihood of donating by 0.37 percentage points relative to that baseline (p = 0.020).
Statistical Significance
- `ratio2` and `ratio3` are statistically significant relative to the pooled baseline. Since that baseline includes letters with no match at all, the result reflects the effect of offering a match as much as the effect of a higher ratio.
Comparing Match Ratios
- The difference between the `ratio2` and `ratio3` coefficients is 0.0001, suggesting no meaningful benefit from increasing the match ratio from 2:1 to 3:1.
Implications for Fundraising Strategy
Both match ratios increase donation probabilities similarly; hence, moving from a 2:1 to a 3:1 match ratio does not yield significantly higher donations. This suggests a diminishing return on higher match ratios, guiding organizations to potentially optimize fundraising efforts without increasing the match offer.
Match incentives effectively increase donations, but overly generous matches may not provide additional benefits, supporting cost-effective fundraising strategies.
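To see how the choice of dummies sets the baseline in the regression on `ratio2` and `ratio3`, the following sketch builds a small deterministic dataset and fits the same specification with NumPy. The group sizes and donor counts are invented for illustration and are not the experiment's figures.

```python
import numpy as np

# Hypothetical group sizes and donor counts, chosen only for illustration.
groups = {"control": (2000, 36), "1:1": (1000, 21),
          "2:1": (1000, 23), "3:1": (1000, 23)}

gave, label = [], []
for name, (n, donors) in groups.items():
    gave.extend([1.0] * donors + [0.0] * (n - donors))
    label.extend([name] * n)
gave = np.array(gave)
label = np.array(label)

ratio2 = (label == "2:1").astype(float)
ratio3 = (label == "3:1").astype(float)

# OLS of gave on a constant, ratio2 and ratio3.
# Control and 1:1 rows both have ratio2 = ratio3 = 0, so they share the baseline.
X = np.column_stack([np.ones_like(gave), ratio2, ratio3])
const, b2, b3 = np.linalg.lstsq(X, gave, rcond=None)[0]

pooled_baseline = gave[(label == "control") | (label == "1:1")].mean()
mean_1to1 = gave[label == "1:1"].mean()
mean_2to1 = gave[label == "2:1"].mean()

print(f"const = {const:.4f} (pooled control + 1:1 rate = {pooled_baseline:.4f})")
print(f"ratio2 coefficient        = {b2:.4f}")
print(f"2:1 rate minus 1:1 rate   = {mean_2to1 - mean_1to1:.4f}")
```

In this toy setup the `ratio2` coefficient is larger than the plain 2:1 versus 1:1 gap because the baseline is pulled down by the control group. Restricting the sample to matched letters, or adding a control indicator, isolates the ratio comparison.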
Size of Charitable Contribution
In this subsection, I analyze the effect of the size of matched donation on the size of the charitable contribution.
T-Statistic: 1.9182618934467577
P-Value: 0.05508566528918335
Interpretation: The initial analysis shows a T-Statistic close to 2, suggesting a moderate difference in donation amounts between the treatment and control groups across all individuals, including non-donors. The p-value is slightly above the conventional threshold of 0.05, indicating that this difference is not statistically significant at the 5% level. This result suggests that while there is a trend towards higher donations in the treatment group, we cannot confidently assert that the treatment has a statistically significant impact on donation amounts when considering the entire sample.
T-Statistic: -0.5846089794983359
P-Value: 0.5590471865673547
Interpretation: When focusing only on individuals who made a donation, the T-Statistic is negative, indicating that the average donation amount in the treatment group might actually be lower than in the control group among donors, although the difference is small. The p-value is well above 0.05, suggesting that this observed difference is not statistically significant. This result indicates that among those who chose to donate, the treatment does not significantly influence the amount donated, implying that other factors may play a more critical role in determining donation size among this group.
Both analyses suggest that the treatment—while potentially affecting the decision to donate when considering the full dataset—does not have a significant impact on the amount donated, particularly among those who have already decided to donate. These findings underscore the complexity of donor behavior and suggest that the treatment may not be as effective in increasing donation amounts as it might be in influencing the decision to donate.
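The two comparisons above, all individuals versus donors only, can be sketched as below. The data are simulated under assumed parameters (lognormal gift sizes, the response rates used later in the simulation section), and `simulate` is a hypothetical helper, so the printed statistics will not match the results reported here. The last lines check the identity that links the two views: the unconditional mean equals the response rate times the mean gift among donors.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def simulate(n, p_give):
    """Hypothetical donors: give with probability p_give, lognormal gift size."""
    gave = rng.binomial(1, p_give, size=n)
    amount = gave * rng.lognormal(mean=3.5, sigma=0.6, size=n)
    return gave, amount

gave_t, amount_t = simulate(20000, 0.022)
gave_c, amount_c = simulate(10000, 0.018)

# Test 1: everyone, with non-donors counted as a zero donation.
all_test = stats.ttest_ind(amount_t, amount_c, equal_var=True)
# Test 2: donors only, i.e. conditional on having given.
donor_test = stats.ttest_ind(amount_t[gave_t == 1], amount_c[gave_c == 1],
                             equal_var=True)
print(f"all individuals: t = {all_test.statistic:.3f}, p = {all_test.pvalue:.3f}")
print(f"donors only:     t = {donor_test.statistic:.3f}, p = {donor_test.pvalue:.3f}")

# Identity: mean amount = response rate x mean gift among donors.
lhs = amount_t.mean()
rhs = gave_t.mean() * amount_t[gave_t == 1].mean()
print(f"{lhs:.4f} == {rhs:.4f}")
```

Since the simulated treatment changes only the response rate, any difference in the first test comes through the decision to give, which mirrors the interpretation in the text. Note that the donors-only comparison is no longer a comparison of randomized groups, because the treatment itself influences who becomes a donor.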

Simulation Experiment
As a reminder of how the t-statistic “works,” in this section I use simulation to demonstrate the Law of Large Numbers and the Central Limit Theorem. Suppose the true distribution of respondents who do not get a charitable donation match is Bernoulli with probability p=0.018 that a donation is made. Further suppose that the true distribution of respondents who do get a charitable donation match of any size is Bernoulli with probability p=0.022 that a donation is made.
Law of Large Numbers

- The plot indicates that, as the number of simulations increases, the cumulative average of the difference fluctuates around the true difference but does not consistently converge to the exact value of 0.004. This could be due to randomness inherent in the simulation of Bernoulli trials.
- Despite the fluctuations, the cumulative average does seem to stabilize as the number of simulations grows, which is consistent with the Law of Large Numbers. This law states that as the number of trials increases, the sample mean will get closer to the expected value.
- However, given that the cumulative average does not settle precisely on the true difference but hovers around it, this illustrates the role of variability when working with probabilities and random processes. The Central Limit Theorem would predict that the distribution of the sample means (if we repeated this entire process many times) would form a normal distribution centered around the true difference, with the variance of that distribution decreasing with more trials.
Cumulative averages approaching true difference in means
The output is expected: it reflects the Law of Large Numbers, under which the sample average tends toward the population mean as the number of trials grows. Overall, the plot shows the cumulative average of the differences between treatment and control draws trending towards the true difference of 0.004.
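A minimal sketch of the Law of Large Numbers simulation described above, assuming 100,000 paired draws and an arbitrary seed (the number of draws behind the plot is not stated, so both are assumptions). Each step draws one control and one treatment outcome, takes the difference, and tracks the running average.

```python
import numpy as np

rng = np.random.default_rng(3)
n_draws = 100_000  # assumed number of paired draws

# Bernoulli draws with the probabilities stated in the text.
control = rng.binomial(1, 0.018, size=n_draws)
treatment = rng.binomial(1, 0.022, size=n_draws)

# Running average of the treatment-minus-control differences.
differences = treatment - control
cumulative_avg = np.cumsum(differences) / np.arange(1, n_draws + 1)

for n in (100, 1_000, 10_000, 100_000):
    print(f"after {n:>7,} draws: {cumulative_avg[n - 1]:+.4f}")
```

Because each difference has a standard deviation of roughly 0.2, the running average needs tens of thousands of draws before it sits visibly close to 0.004, which is why the plotted line still wobbles.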
Central Limit Theorem

The histograms depict the distribution of average differences in donation probabilities between treatment and control groups across different sample sizes, demonstrating the Central Limit Theorem (CLT).
Interpretation within the Study Context:
Sample Size: 50
- The distribution is somewhat bell-shaped but shows considerable spread and variability around the true mean difference. The red dashed line, which represents the true mean difference of 0.004, does not appear to be in the exact center of the distribution, indicating the impact of higher variability at lower sample sizes.
Sample Size: 200
- The distribution becomes more bell-shaped and starts to center around the true mean difference. The variability is reduced compared to a sample size of 50, which is expected as per the CLT.
Sample Size: 500
- Further narrowing and centering around the true difference are observed, with the distribution taking a more definitive normal shape.
Sample Size: 1000
- The histogram for a sample size of 1000 shows the distribution closely centered around the true mean difference, with even less variability, demonstrating the CLT’s prediction of a normal distribution as sample size increases.
Central Limit Theorem Validation:
Across all histograms, zero is not in the exact center because the true mean difference is not zero; it’s 0.004. The green dashed line, which would represent zero, is clearly not in the center, especially for larger sample sizes. Instead, it falls within the left tail of the distributions, particularly for sample sizes of 500 and 1000, where the bell shape is more apparent.
As sample sizes increase, the distribution of the average differences increasingly conforms to a normal distribution centered around the true difference (0.004), not zero, validating the CLT.
The histograms support the CLT’s assertion that with larger sample sizes, the sampling distribution of the mean will approximate a normal distribution centered around the true population mean. Zero is not in the center but rather in the tail of the distribution, as the treatment and control groups’ donation probabilities differ by 0.004.
In the context of the study, these results imply that with sufficient sample sizes, any difference in means due to random chance (sampling variability) will average out, and we can expect the sample mean to reliably estimate the population mean difference. This is critical for replication analysis, as it underlines the importance of sample size in detecting true effects and ensuring that findings are not artifacts of random variation. The CLT allows researchers to make inferences about population parameters based on sample statistics, a foundational concept in hypothesis testing and confidence interval estimation.
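The histograms discussed above can be reproduced in outline with the sketch below. It assumes 1,000 repetitions per sample size and an arbitrary seed, neither of which is stated in the text, and it prints summary statistics instead of drawing plots. For each sample size it compares the simulated spread of the mean difference with the standard deviation implied by the Bernoulli variances.

```python
import numpy as np

rng = np.random.default_rng(4)
p_control, p_treatment = 0.018, 0.022
n_reps = 1000  # assumed number of repetitions per sample size

spread = {}
for n in (50, 200, 500, 1000):
    # The mean of n Bernoulli draws is a Binomial(n, p) count divided by n.
    mean_t = rng.binomial(n, p_treatment, size=n_reps) / n
    mean_c = rng.binomial(n, p_control, size=n_reps) / n
    diffs = mean_t - mean_c

    # Standard deviation of the difference in means implied by the variances.
    theory_sd = np.sqrt((p_treatment * (1 - p_treatment)
                         + p_control * (1 - p_control)) / n)
    spread[n] = (diffs.mean(), diffs.std(ddof=1), theory_sd)
    print(f"n={n:>4}: mean={diffs.mean():+.4f}, "
          f"sd={diffs.std(ddof=1):.4f}, theory sd={theory_sd:.4f}")
```

The spread shrinks in proportion to one over the square root of n, but even at n = 1000 it is several times larger than the 0.004 effect, which helps explain why an experiment of this kind needs tens of thousands of letters.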