Gittins index for simple family of Markov bandit processes with switching cost and no discounting

M. P. Savelov

Research output: Scientific publications in periodicals › Article


We consider the multiarmed bandit problem (the problem of Markov bandits) with switching penalties and no discounting in the case where the state spaces of all bandits are finite. An optimal strategy is one that attains the largest average reward per unit time over an infinite time horizon. For this problem, it is shown that an optimal strategy can be specified by a Gittins index under the natural assumption that the switching penalties are nonnegative.

Original language: English
Pages (from-to): 355-364
Number of pages: 10
Journal: Theory of Probability and its Applications
Issue number: 3
Status: Published - 1 Jan 2019