SSRN Author: Leonid Pekelis
Leonid Pekelis SSRN Content
https://www.ssrn.com/author=3042073
https://www.ssrn.com/rss/en-us
Thu, 13 Dec 2018 01:36:00 GMT
editor@ssrn.com (Editor)
Thu, 13 Dec 2018 01:36:00 GMT
webmaster@ssrn.com (WebMaster)
SSRN RSS Generator 1.0
REVISION: p-Hacking and False Discovery in A/B Testing
We investigate to what extent online A/B experimenters “p-hack” by stopping their experiments based on the p-value of the treatment effect, and how such behavior impacts the value of the experimental results. Our data contains 2,101 commercial experiments in which experimenters can track the magnitude and significance level of the effect every day of the experiment. We use a regression discontinuity design to detect the causal effect of reaching a particular p-value on stopping behavior.

Experimenters indeed p-hack, at times. Specifically, about 73% of experimenters stop the experiment just when a positive effect reaches 90% confidence. Also, approximately 75% of the effects are truly null. Improper optional stopping increases the false discovery rate (FDR) from 33% to 40% among experiments p-hacked at 90% confidence. Assuming that false discoveries cause experimenters to stop exploring for more effective treatments, we estimate the expected cost of a false ...
https://www.ssrn.com/abstract=3204791
https://www.ssrn.com/1746411.html
Wed, 12 Dec 2018 09:16:49 GMT
REVISION: p-Hacking and False Discovery in A/B Testing
We investigate whether online A/B experimenters "p-hack" by stopping their experiments based on the p-value of the treatment effect. Our data contains 2,101 commercial experiments in which experimenters can track the magnitude and significance level of the effect every day of the experiment. We use a regression discontinuity design to detect p-hacking, i.e., the causal effect of reaching a particular p-value on stopping behavior.
Experimenters indeed p-hack, especially for positive effects. Specifically, about 57% of experimenters p-hack when the experiment reaches 90% confidence. Furthermore, approximately 70% of the effects are truly null, and p-hacking increases the false discovery rate (FDR) from 33% to 42% among experiments p-hacked at 90% confidence. Assuming that false discoveries cause experimenters to stop exploring for more effective treatments, we estimate the expected cost of a false discovery to be a loss of 1.95% in lift, which corresponds to the 76th percentile of ...
https://www.ssrn.com/abstract=3204791
https://www.ssrn.com/1707424.html
Tue, 17 Jul 2018 20:00:14 GMT
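The abstracts above relate the share of truly null effects to the false discovery rate (FDR). A minimal sketch of the textbook two-group relation may make that link concrete. It is not the authors' estimation method. The function name `false_discovery_rate`, the mapping of "90% confidence" to a false-positive rate of 0.10, the inflated rate of 0.15, and the power of 0.50 are all assumptions chosen for illustration, so the output is not expected to reproduce the 33% or 42% figures reported in the abstract.

```python
def false_discovery_rate(null_share, alpha, power):
    """Expected share of significant results that are false positives.

    Textbook two-group model (an illustration, not the paper's estimator):
      null_share : fraction of experiments whose true effect is null
      alpha      : probability that a null effect is declared significant
      power      : probability that a real effect is declared significant
    """
    false_positives = null_share * alpha
    true_positives = (1.0 - null_share) * power
    total = false_positives + true_positives
    if total == 0:
        raise ValueError("no discoveries are expected with these inputs")
    return false_positives / total


# The 70% null share is taken from the abstract; every other value is hypothetical.
NULL_SHARE = 0.70
POWER = 0.50            # hypothetical power against real effects
NOMINAL_ALPHA = 0.10    # assumed reading of "90% confidence"
INFLATED_ALPHA = 0.15   # hypothetical effective rate under optional stopping

nominal_fdr = false_discovery_rate(NULL_SHARE, NOMINAL_ALPHA, POWER)
inflated_fdr = false_discovery_rate(NULL_SHARE, INFLATED_ALPHA, POWER)

print(f"FDR at nominal alpha:  {nominal_fdr:.3f}")
print(f"FDR at inflated alpha: {inflated_fdr:.3f}")
```

Under this model, any practice that raises the effective false-positive rate above its nominal level, such as stopping an experiment as soon as it first crosses a significance threshold, raises the FDR even when the share of null effects and the power stay fixed.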