How to develop an ETF trading system that works
Technically speaking, a multi-asset 1/N portfolio of uncorrelated assets with periodic re-balancing is “system trading”. It’s just a set of rules, or a methodology, that one follows mechanically to make investment adjustments, typically with the goal of increasing returns and decreasing their variability. But usually the term “system trading” refers to more complex, and typically computerized, trading algorithms. And that’s what I want to talk about: sophisticated systems that require some mathematical programming skill to implement.
Parameter estimation error
The biggest problem associated with system trading is called “curve fitting” or “over-fitting”. Trading systems typically consist of a model, or a set of rules, with a number of free parameters, and these parameters are usually estimated from historical data. If there are too many parameters and/or too little historical data, the model will read too much into the historical data and will be inaccurate on new data – sometimes extremely inaccurate. And this problem isn’t unique to highly sophisticated models with lots of parameters; it can occur with even the simplest models.
When you get the itch to optimize a trading system’s parameters, think about this example:
Pretend it’s January of the year 2000, and we want to build a simple asset allocation portfolio, which we will periodically re-balance, consisting of two low-correlation assets: stocks and bonds. We have only one parameter to estimate: the proportion of our portfolio invested in stocks. We pull up 20 years of historical data and, using some trial and error, we estimate that the optimal value of our parameter is around 0.9, or 90%. It’s easy to see why: from 1980 to 2000, stocks pretty much monotonically increased in value – greatly outperforming bonds. Confident that we have a robust portfolio based on twenty years of historical data, and likely to return 30 or 40% this year, we put nearly all our funds into stocks. Two and a half years later, our portfolio will have lost some 40% of its value in the dot-com crash.
What happened? Simple. We over-fit a model with only one parameter even though we used twenty years of data. Imagine what could happen if we used a sophisticated model with 5 parameters fit to 10 years of data. We could lose some real money.
The failure of the model on new data was caused by something called “parameter estimation error”. We estimated our parameter from limited historical data which turned out to have no predictive value for what was coming. We would have been better off assuming that we couldn’t predict the future and using the naive 1/N rule: 50% stocks and 50% bonds. Unfortunately, if we had back-tested such a system in January 2000, we would have been unimpressed by the results. The strategy of buy and hold, which is what most people compare their trading strategies to, would have greatly outperformed our naive strategy. Any investment advisor recommending such a 50/50 strategy back in 2000 probably would have been considered a loser who missed the bull market and wouldn’t admit it. Perhaps at this point you have more appreciation for advisors who focus on absolute returns rather than relative returns.
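To make this failure mode concrete, here’s a minimal R sketch of the one-parameter grid search described above. The returns are synthetic, drawn to mimic a long bull market followed by a crash, so the numbers are illustrative rather than historical:

```r
# Illustrative only: synthetic weekly returns stand in for the 1980-2000
# bull market and the 2000-2002 crash described above.
set.seed(1)
stocks_in  <- rnorm(20 * 52, mean =  0.0030, sd = 0.02)  # ~20 years, drifting up
bonds_in   <- rnorm(20 * 52, mean =  0.0012, sd = 0.01)
stocks_out <- rnorm(130,     mean = -0.0040, sd = 0.03)  # ~2.5 years, crashing
bonds_out  <- rnorm(130,     mean =  0.0012, sd = 0.01)

# Cumulative return of a fixed stock/bond mix, re-balanced weekly
cum_ret <- function(w, s, b) prod(1 + w * s + (1 - w) * b) - 1

# "Optimize" the single parameter on the in-sample (historical) data
w_grid    <- seq(0, 1, by = 0.05)
in_sample <- sapply(w_grid, cum_ret, s = stocks_in, b = bonds_in)
w_star    <- w_grid[which.max(in_sample)]  # near 1: stocks dominated in-sample

# Apply the fitted parameter to the unseen data
cum_ret(w_star, stocks_out, bonds_out)  # a deep loss
cum_ret(0.5,    stocks_out, bonds_out)  # the naive 1/N mix fares far better
```

Because the in-sample series drifts upward for twenty years, the grid search is pushed toward all-stocks, and that same weight is exactly wrong on the out-of-sample data.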
How to tell if a system is vulnerable to estimation errors
If you are contemplating buying, building or subscribing to a trading system, the only way to really know if it’s vulnerable is to understand the model it was built on, how sensitive the system is to its parameter values and how they were chosen, how much historical data the system was tested on and of what quality, and how it has performed on new, previously unseen data since it was built. The best case is a system that has very few, naively chosen parameters to which it is insensitive, tested over a wide variety of economic and profit cycles, crashes, manias, etc., both before and after it was developed.
The developers of solid, non-optimized (non-trivial) systems face a number of problems: 1) if they reveal enough information to assure their more sophisticated customers that the system is robust, they practically give the strategy away; 2) a non-optimized system’s back-tested performance will always look worse than a curve-fit system’s, putting them at a marketing disadvantage; 3) the amount of historical data available is limited; 4) it may take many years to demonstrate, with any certainty, that a system performs as well on unseen data as it did in back-testing; and 5) customers tend to ignore boring, non-optimized systems and chase shiny new systems that bolt upward right out of the gate.
Even if you do get access to all this information, you still need to have enough knowledge to properly assess the system and to ask intelligent questions.
Example of a system with a minimum of estimation error
As an illustrative example, I will introduce you to a multi-asset portfolio system that adjusts its allocations weekly. It is based on the four-asset 1/N portfolio (short-term treasuries, long-term government bonds, stocks, and gold) but uses a little machine learning to tweak the allocations slightly to follow trends.
But only a little. If stocks are in a strong upward trend, I want to be slightly over-exposed to them. If gold trends monotonically down in the future, I want my portfolio to be slightly underweight gold.
But to do this I’ll need to add another asset to the portfolio, one which I am pretty confident will tend to go up in value during a crisis. I want to do this because over-weighting cash will not offset losses in the other asset classes during a financial panic, as the system is essentially constrained to make only minor adjustments to the 1/N allocations. For this example the asset I’ll use is an inverse S&P 500 fund. This introduces a serious drag on the portfolio, essentially canceling out stocks, so I’ll simulate the use of leveraged (2x) long-bond and stock-index funds to overcome some of this friction. Second, I’ll need a specific technique to do the tweaking. For this I’ll use something called Direct Reinforcement Learning.
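To be clear about what “tweak the allocations slightly” means, here’s a toy R sketch of the constraint. This is not the Direct Reinforcement Learning system itself (that comes from the papers listed at the end); it only shows the idea of starting from 1/N and letting a trend signal nudge the weights a little. The function name, the momentum score, and the cap eps are my own illustrative choices:

```r
# Toy sketch: 1/N plus a small, capped trend tilt -- not the DRL system itself.
# `prices` is assumed to be a matrix of weekly prices, one column per asset:
# short treasuries, long bonds, stocks, gold, inverse S&P 500 fund.
tweaked_weights <- function(prices, eps = 0.10, lookback = 26) {
  n      <- ncol(prices)
  base   <- rep(1 / n, n)                              # the 1/N permanent portfolio
  recent <- tail(prices, lookback)
  trend  <- log(recent[nrow(recent), ] / recent[1, ])  # simple momentum score
  tilt   <- exp(2 * trend) / sum(exp(2 * trend))       # softmax tilt over the assets
  (1 - eps) * base + eps * tilt                        # eps caps the deviation from 1/N
}
```

In a strong stock rally the stock weight drifts above 1/N and the inverse fund’s weight drifts below it; in a panic the tilt reverses. Either way, eps keeps the portfolio close to the permanent portfolio.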
Though Direct Reinforcement Learning requires a lot of parameters, methods exist which can reduce the effective number of parameters for this type of system. Instead of optimizing the parameter values, I’ll use the average of the recommendations of the system over a wide range of parameter values. For example, if I have three parameters, a, b, and c, and we give each a range of 0 to 1 and an increment of 0.1, then I could average the recommendation of the system over all 11³ = 1,331 combinations: [0,0,0], [0.1,0,0], … [1,1,0.9], [1,1,1]. Hopefully the system will be relatively insensitive to some parameters, in which case I can use a representative value, or just a few widely spaced values, to approximate the full range and save a great deal of computational time.
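A minimal sketch of this averaging in R, with a placeholder recommend function standing in for whatever maps a parameter combination to an allocation:

```r
# Average the system's recommendation over a grid of parameter values
# instead of optimizing them. `recommend` is a stand-in for the real system.
grid <- expand.grid(a = seq(0, 1, by = 0.1),
                    b = seq(0, 1, by = 0.1),
                    c = seq(0, 1, by = 0.1))
nrow(grid)  # 11^3 = 1331 combinations

recommend <- function(a, b, c) {
  # placeholder: a real system would return its portfolio weights here
  c(a, b, c) / max(a + b + c, 1e-9)
}

recs     <- mapply(recommend, grid$a, grid$b, grid$c)  # one column per combination
averaged <- rowMeans(recs)                             # the ensemble recommendation
```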
Most systems will inevitably perform much better over a tighter range, say, for example, from 0.2 to 0.4. But when you observe this and are tempted to tighten the range, remember the curve-fit example at the beginning of this article. That example would also have performed much better in back-testing over a tighter range, say 0.8 to 1.0, than over the full range of 0 to 1. (You’ll note that in that particular case, averaging the recommendations over the full range from 0 to 1 is just the recommendation at the middle, 0.5 – the naive 1/N rule.)
The graphic below shows the back-testing results of this system (the black line). The foundation of the system, the 1/N base portfolio (aka the permanent portfolio), is the blue line. As you can see, except under exceptional circumstances, like the long bull market from 1997 to 2000 or the 2008 financial system crash, the system pretty much tracks the permanent portfolio, which is supporting evidence that we haven’t tweaked it too much.
[Figure: weekly back-testing results, September 1995 – April 2012 – the system (black line) vs. the 1/N permanent portfolio (blue line)]
This system has an average yearly ROI of about 11%, a maximum (end-of-week) drawdown of 9%, and a Sharpe ratio of about 1.3 over the period September 1995 through mid-April 2012, based on weekly data. By comparison, the permanent portfolio has an average yearly ROI of about 8%, a maximum drawdown of 18%, and a Sharpe ratio of about 1.0.
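These statistics are straightforward to compute from a vector of weekly returns. A sketch, assuming a risk-free rate of roughly zero in the Sharpe ratio:

```r
# Summary statistics from a vector `r` of weekly portfolio returns
perf_stats <- function(r, periods = 52) {
  equity <- cumprod(1 + r)                                    # growth of $1
  cagr   <- equity[length(equity)]^(periods / length(r)) - 1  # annualized ROI
  max_dd <- max(1 - equity / cummax(equity))                  # max end-of-week drawdown
  sharpe <- sqrt(periods) * mean(r) / sd(r)                   # risk-free rate ~ 0
  c(annual_roi = cagr, max_drawdown = max_dd, sharpe = sharpe)
}
```

If you’d like to try building this system yourself, here is a road map: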
- If you haven’t had much programming experience, I suggest you use the programming language R. It’s free, relatively easy to learn and there are lots of free resources available on the web. Search for “programming R”.
- Download the papers below and duplicate Fig. 2 in the second paper, which is a trend-following exercise on a single artificial asset.
- Duplicate Fig. 10 in the first paper, which is a portfolio with three artificial assets.
- Modify your portfolio system to average over a wide range of parameter values if you haven’t already.
- Replace the artificial assets with weekly data from the funds “VFINX”, “VUSTX”, “TWUSX”, “CEF”, and “RYURX”. These are easily downloaded from Yahoo Finance. You’ll need to write a function to double the returns of the first two funds to simulate 2x leverage (see the sketch after this list).
- I used training periods ranging from 50 to 75 weeks, 10 epochs for training, regularization factors ranging from 0.1 to 0.5 and I averaged the results over a rho vs. eta matrix with values ranging from 0.01 to 0.1. I used a softmax output with a=2 (see equation 5 in the first paper).
- Feed the system data up to the year 2000 and optimize the parameters to this data. Then test it on data from 2000 to the present.
- Determine how sensitive the system is to increasing transaction costs.
- Try using other mutual funds or ETFs.
- Try simulating the use of higher-leverage (3x) funds. How much leverage can the system handle?
- Check out other approaches, for example, this.
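A sketch of the data step and the leverage simulation, assuming the quantmod package (getSymbols, Ad, and weeklyReturn are real quantmod functions; the rest is illustrative). Note that doubling weekly returns ignores the fees and daily-reset compounding of real leveraged funds:

```r
library(quantmod)

# Weekly adjusted-close returns for the five funds, aligned by date
tickers <- c("VFINX", "VUSTX", "TWUSX", "CEF", "RYURX")
rets <- do.call(merge, lapply(tickers, function(t) {
  p <- getSymbols(t, src = "yahoo", auto.assign = FALSE, from = "1995-09-01")
  weeklyReturn(Ad(p))
}))
colnames(rets) <- tickers

# Simulate 2x leverage on the stock and long-bond funds (first two columns)
rets[, 1:2] <- 2 * rets[, 1:2]

# Train/test split for the optimization exercise above
in_samp  <- rets["/1999-12-31"]   # fit parameters here
out_samp <- rets["2000-01-01/"]   # then evaluate here
```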
J. Moody et al., “Performance Functions and Reinforcement Learning for Trading Systems and Portfolios”, Journal of Forecasting, Vol. 17, pp. 441–470, 1998 [pdf]
J. Moody and M. Saffell, “Learning to Trade via Direct Reinforcement”, IEEE Transactions on Neural Networks, Vol. 12, No. 4, July 2001 [pdf]