Gittins index for simple family of Markov bandit processes with switching cost and no discounting

M. P. Savelov

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

We consider the multiarmed bandit problem (the problem of Markov bandits) with switching penalties and no discounting in the case where the state spaces of all bandits are finite. An optimal strategy must attain the largest average reward per unit time over an infinite time horizon. For this problem, it is shown that an optimal strategy can be specified by a Gittins index under the natural assumption that the switching penalties are nonnegative.
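
For orientation, the average-reward criterion with switching penalties can be written in one common form as below. This is only a sketch in notation assumed here, not taken from the paper: \pi is a strategy, a_t is the bandit chosen at step t, x_t^{a} is the state of bandit a at step t, r_a is a hypothetical reward function of bandit a, and c(a, b) is a hypothetical penalty for switching from bandit a to bandit b. Whether the paper's penalties also depend on the bandits' states is not stated in the abstract.

```latex
\[
  V(\pi) \;=\; \liminf_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}^{\pi}\!\left[ \sum_{t=1}^{T}
    \Bigl( r_{a_t}\bigl(x_t^{a_t}\bigr) - c(a_{t-1}, a_t) \Bigr) \right],
  \qquad c(a, b) \ge 0, \quad c(a, a) = 0 .
\]
```

Under this reading, a strategy is optimal if it maximizes V(\pi) over all admissible strategies, and the nonnegativity condition on c corresponds to the assumption on switching penalties stated in the abstract.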

Original language: English
Pages (from-to): 355-364
Number of pages: 10
Journal: Theory of Probability and its Applications
Volume: 64
Issue number: 3
Publication status: Published - 1 Jan 2019

Keywords

  • Controlled Markov processes
  • Gittins index
  • Long run average return
  • Markov decision process
  • Multiarmed bandit problem
  • Multicomponent systems
  • No discounting
  • Optimal strategy
  • Simple family of alternative Markov bandit processes
  • Switching penalties
