Thanks for the great reading recommendations, sivaramanb!

Indeed, competing with constant rebalanced portfolios might not be so interesting. Over the past 5-10 years, there has been quite a bit of work on understanding how regret bounds can be extended to infinite sets of experts and strategies, and what is the right way to measure complexity of such a set. For instance, being able to compete with a set of all finite-lookback or finite-memory strategies seems quite interesting.

The ideas are the same. Data compression, universal portfolios, optimal prediction of individual sequences: they are all based on the same concept.

For a universal portfolio the set of “experts” is all possible “constantly rebalanced portfolios” and the best one (in hindsight) can be tracked sequentially (without hindsight). When I first learned about these portfolios I was more than a little surprised, considering that stock returns are usually not stationary and can do arbitrarily odd things. Then I realized that since the target expert is constant, I would only be charged with the task of tracking the best “idiot”. It is a beautiful theory though.
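To make the “tracking the best constantly rebalanced portfolio” idea concrete, here is a small numerical sketch (my own, not from the thread; the two-asset returns are simulated, and the uniform prior over portfolios is approximated on a grid). Cover’s universal portfolio wealth is the prior-average of the wealths of all constantly rebalanced portfolios, and for two assets it is guaranteed to be within a factor 1/(n+1) of the best one in hindsight:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # number of trading periods

# Hypothetical price relatives for two assets (simulated, for illustration only).
x = np.exp(rng.normal(0.0, 0.05, size=(n, 2)))

# Wealth of each constantly rebalanced portfolio b on a grid of mixes.
bs = np.linspace(0.0, 1.0, 201)
crp_wealth = np.array([np.prod(b * x[:, 0] + (1 - b) * x[:, 1]) for b in bs])

# Cover's universal portfolio: its final wealth equals the average CRP wealth
# under a uniform prior over b (here approximated by the grid average).
universal_wealth = crp_wealth.mean()
best_wealth = crp_wealth.max()  # best CRP in hindsight

# Cover's guarantee for two assets: universal wealth >= best wealth / (n + 1),
# for every sequence of returns, with no stochastic assumptions.
print(universal_wealth >= best_wealth / (n + 1))  # True
```

Note the individual-sequence flavor: the guarantee in the last line holds for any return sequence whatsoever, which is exactly why the comparison class being “constant” matters.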

I’m sure others out there will have better suggestions for applications, but in the machine learning literature there are applications where people want to make predictions in real time or have strong reasons to move away from i.i.d. assumptions. Also, online convex optimization algorithms are widely applied in “big data” problems. Some pointers from theorists who have worked a bit on applications:

http://eprints.pascal-network.org/archive/00004161/01/ShalevThesis07.pdf (section on applications)

ftp://ftp.cs.princeton.edu/reports/2006/766.pdf (for a description of applications in computational biology and portfolio optimization)

Computer scientists have been studying online algorithms from the viewpoint of competitive analysis (instead of regret) for many years. The Wikipedia page mentions a few applications (but there are many, many more algorithms for paging, queueing, server allocation, routing, etc. that have been analysed this way and can be found with a little Google searching):

http://en.wikipedia.org/wiki/Competitive_analysis_(online_algorithm)

Hi Christian, Larry,

Christian, your question indeed often comes up. I should add to Sacha’s reply that, even if you don’t add extra information, this stuff can be very useful for actual prediction.

*In fact, one of the best current data compression algorithms (the CTW method) is based on exactly the worst-case individual-sequence idea that Larry described.*

The reason is that the worst-case individual-sequence regret is in many cases hardly larger than the expected regret, where the expectation is over a Bayesian marginal distribution and the predictions are Bayes optimal.

*This means that the absolute worst-case regret is not so far from the absolute nicest-case regret (under Bayesian assumptions), which is quite a surprising fact.*

For example, if you predict a sequence of bounded random variables using the worst-case optimal sequential prediction strategy relative to a k-parameter exponential family with log loss, the regret will be (k/2) log n + O(1).

If you instead look at the expected regret you get when you assume a smooth (but otherwise arbitrary) prior with full support over the parameters, and you predict sequentially with the Bayesian predictive distribution under that same prior (which is the optimal prediction strategy you can use here), you get (k/2) log n plus a smaller constant; the difference is quite small for most priors people use in practice. For Jeffreys prior the difference even goes to 0 for large n.
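A small sanity check on these constants (a sketch I am adding, under stated assumptions, not part of the original comment): take the Bernoulli model, so k = 1, where the Bayes predictive distribution under Jeffreys prior is the classical Krichevsky–Trofimov predictor. Because the KT mixture probability of a binary sequence depends only on the counts of zeros and ones, the worst-case log-loss regret over all 2^n sequences can be computed exactly for moderate n and compared to (1/2) log n:

```python
import math

def kt_logprob(n0, n1):
    # Log KT (Jeffreys-prior mixture) probability of any binary sequence
    # with n0 zeros and n1 ones; by exchangeability it depends only on counts,
    # so we may feed the ones first and the zeros afterwards.
    lp, c0, c1 = 0.0, 0, 0
    for _ in range(n1):
        lp += math.log((c1 + 0.5) / (c0 + c1 + 1))  # KT rule: (count + 1/2)/(t + 1)
        c1 += 1
    for _ in range(n0):
        lp += math.log((c0 + 0.5) / (c0 + c1 + 1))
        c0 += 1
    return lp

def ml_logprob(n0, n1):
    # Log-likelihood of the best Bernoulli parameter in hindsight.
    n = n0 + n1
    lp = 0.0
    if n1:
        lp += n1 * math.log(n1 / n)
    if n0:
        lp += n0 * math.log(n0 / n)
    return lp

n = 1000
# Worst-case individual-sequence regret (in nats) over all count profiles.
worst = max(ml_logprob(n - n1, n1) - kt_logprob(n - n1, n1) for n1 in range(n + 1))
print(worst, 0.5 * math.log(n))
```

With k = 1 the worst-case regret indeed comes out as (1/2) log n plus a small constant (the maximum is attained at the all-zeros/all-ones sequences), matching the (k/2) log n + O(1) statement above.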

I don’t actually have good recommendations for examples. Perhaps someone else does?

–LW