
Online Markov Decision Processes under Bandit Feedback

We consider online learning in finite stochastic Markovian environments where, in each time step, a new reward function is chosen by an oblivious adversary. The goal of the learning agent is to compete with the best stationary policy in terms of the total reward received. In each time step the agent observes the current state and the reward associated with the last transition; it does not observe the rewards associated with other state-action pairs. The agent is assumed to know the transition probabilities. The state-of-the-art result for this setting is a no-regret algorithm. In this paper we propose a new learning algorithm and, assuming that stationary policies mix uniformly fast, we show that after T time steps the expected regret of the new algorithm is O(T^{2/3} (ln T)^{1/3}), giving the first rigorously proved convergence rate result for the problem.
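The interaction protocol described above can be sketched in a few lines of Python. This is only an illustration of the setting, not the algorithm proposed in the talk: the two-state transition kernel `P`, the random reward sequence, and all function names are hypothetical. For simplicity the comparator is restricted to deterministic stationary policies, and the agent's reward is taken from a single simulated run, so the final number is a rough single-run estimate of regret rather than the expected regret analysed in the paper.

```python
import itertools
import random

# Hypothetical toy instance (an assumption, not from the paper):
# known transition kernel, P[s][a] is a distribution over next states.
P = {
    0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
    1: {0: [0.7, 0.3], 1: [0.1, 0.9]},
}
STATES, ACTIONS = [0, 1], [0, 1]


def make_rewards(T, seed=0):
    """Oblivious adversary: the whole reward sequence is fixed before play starts."""
    rng = random.Random(seed)
    return [{(s, a): rng.random() for s in STATES for a in ACTIONS} for _ in range(T)]


def expected_total_reward(policy, rewards, start=0):
    """Expected total reward of a deterministic stationary policy (dict: state -> action)."""
    dist = [1.0 if s == start else 0.0 for s in STATES]
    total = 0.0
    for r_t in rewards:
        total += sum(dist[s] * r_t[(s, policy[s])] for s in STATES)
        # Push the state distribution one step forward under the known kernel.
        dist = [sum(dist[s] * P[s][policy[s]][s2] for s in STATES) for s2 in STATES]
    return total


def best_stationary_value(rewards, start=0):
    """Brute-force the best deterministic stationary policy in hindsight."""
    policies = [dict(zip(STATES, acts))
                for acts in itertools.product(ACTIONS, repeat=len(STATES))]
    return max(expected_total_reward(pi, rewards, start) for pi in policies)


def play(choose_action, rewards, start=0, seed=1):
    """Run the protocol: the agent sees only its state and the reward it just earned."""
    rng = random.Random(seed)
    state, total, last_reward = start, 0.0, None
    for r_t in rewards:
        action = choose_action(state, last_reward)
        last_reward = r_t[(state, action)]  # bandit feedback: one entry of r_t only
        total += last_reward
        state = rng.choices(STATES, weights=P[state][action])[0]
    return total


if __name__ == "__main__":
    rewards = make_rewards(T=1000)
    agent_rng = random.Random(2)
    # Placeholder agent that picks actions uniformly at random.
    earned = play(lambda s, r: agent_rng.choice(ACTIONS), rewards)
    print("single-run regret estimate:", best_stationary_value(rewards) - earned)
```

The placeholder agent ignores its feedback entirely; a learning algorithm would replace `choose_action` with a rule that uses the observed state and the single observed reward.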

Channel: VideoLectures

Video Length: 0

Date Found: March 28, 2011

Category: Educational

Date Produced: March 25, 2011


