Academic Lecture
HaiYing Wang (王海鹰): Subsampling for Rare Events Data and maximum sampled conditional likelihood

 

Academy of Mathematics and Systems Science, CAS
Colloquia & Seminars

Speaker:

HaiYing Wang (王海鹰), University of Connecticut

Inviter:  
Title:
Subsampling for Rare Events Data and maximum sampled conditional likelihood
Time & Venue:
2022.11.11 09:00-10:30, Tencent Meeting ID: 890934838
Abstract:

In this talk, we show that the available information about the unknown parameters in rare events data is tied only to the relatively small number of positive cases, which justifies the use of negative sampling. However, subsampling the negative instances down to the same level as the positive cases loses information. To retain more information, we derive an optimal sampling probability for the inverse probability weighted (IPW) estimator. We then propose a likelihood-based estimator that further improves estimation efficiency, and show that it has the smallest asymptotic variance among a large class of estimators. It is also more robust to pilot misspecification. We validate our approach on simulated data, the MNIST data, and a real click-through rate dataset with more than 0.3 trillion instances.