Fast Converging Multi-armed Bandit Optimization using Probabilistic Graphical Model
This paper presents a model for optimizing click-through rates (CTR) in profitable recommendation systems. Approximating a function from samples is a vital step in data prediction when the ground truth is not directly accessible. While interpolation algorithms such as regression and non-kernel SVMs are prevalent in modern machine learning, they are in many cases not suitable for fitting arbitrary functions that have no closed-form expression. The major contribution of this paper is a semi-parametric graphical model, complying with the properties of the Gaussian Markov random field (GMRF), for approximating general, possibly multivariate, functions. Building on inference in this model, the paper further investigates several policies commonly used in Bayesian optimization to solve the multi-armed bandit (MAB) problem. The primary objective is to locate the global optimum of an unknown function. In the case of recommendation, the proposed algorithm maximizes user clicks under the rescheduled recommendation policy while maintaining the lowest possible cost. Comparative experiments are conducted across a set of policies, and the empirical evaluation suggests that Thompson sampling is the most suitable policy for the proposed algorithm.
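To make the Thompson sampling policy referenced above concrete, the following is a minimal, self-contained sketch of Bernoulli Thompson sampling over a set of arms (e.g. recommendation slots with unknown click-through rates). It is an illustrative baseline only, not the paper's GMRF-based algorithm; the arm CTRs and round count are hypothetical parameters chosen for the example.

```python
import random

def thompson_sampling(true_ctrs, n_rounds=10000, seed=0):
    """Bernoulli Thompson sampling over len(true_ctrs) arms.

    true_ctrs: hypothetical per-arm click-through rates, unknown to the policy
    and used here only to simulate user clicks.
    Returns the Beta posterior counts (alpha, beta) per arm and total clicks.
    """
    rng = random.Random(seed)
    k = len(true_ctrs)
    alpha = [1] * k  # Beta(1, 1) uniform prior: 1 + observed clicks
    beta = [1] * k   # 1 + observed non-clicks
    clicks = 0
    for _ in range(n_rounds):
        # Draw a plausible CTR for each arm from its Beta posterior ...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # ... and play the arm whose sampled CTR is highest
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate whether the user clicks the chosen recommendation
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        clicks += reward
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return alpha, beta, clicks
```

Because each arm is chosen with probability proportional to its posterior chance of being best, exploration decays naturally as evidence accumulates, which is one reason Thompson sampling pairs well with Bayesian surrogate models such as the GMRF used in this paper.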