Issue 7 » Data Explosion

## Mathematics and Statistics in Finance

Prof. David Hand, Imperial College London

Finance has always been built on mathematics. For example, in his book Summa de Arithmetica, Luca Pacioli, born in 1445, described the double-entry bookkeeping system through which merchants have controlled their businesses over the past half-millennium. Incidentally, Pacioli was a personal friend of Leonardo da Vinci – they once shared a house in Florence and drawings by da Vinci appear in Pacioli’s book Divina Proportione.

Mathematical models for finance evolved radically from about 1900, when Bachelier applied Brownian motion as an underlying process to derive option prices. Since then the use of probability theory and partial differential equations has boomed, to the extent that Merton and Scholes were awarded the 1997 Nobel Memorial Prize in Economic Sciences for a new method of determining the value of derivatives.

Today banking systems, credit card operations, pension funds, and insurance companies amongst many others rely on such models to make decisions about all manner of financial transactions. As has been highlighted in recent years, the risk is that, while those developing these sophisticated models may understand them perfectly, those basing their decisions on them may not.

Nowadays, both finance and mathematics have grown into vast edifices, but they continue to be closely interwoven. Mathematics – or, more accurately, the mathematical disciplines – has a diverse and, it is no exaggeration to say, fundamental role to play in many quite different aspects of finance. This article singles out just three aspects: (i) so-called ‘mathematical finance’, (ii) hedge funds, and (iii) retail financial services.

### Rocket Science

The first of these, mathematical finance, is concerned with building financial instruments for investment, and controlling risks: designing such things as financial options and derivatives. Through such instruments merchants can protect themselves from risk (e.g. by agreeing to buy raw materials in the future at a specified price, so protecting themselves from unpredictable price changes), and traders can seek to make money (e.g. if the price of the raw material in fact falls over the intervening period, by buying at the new reduced price and simultaneously selling at the agreed price).
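The two sides of such an agreement can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a simple forward contract with no fees, margin, or discounting; the function name `forward_payoffs` and the prices used are illustrative and do not come from the article.

```python
def forward_payoffs(agreed_price: float, spot_at_delivery: float,
                    quantity: float = 1.0) -> dict:
    """Payoffs at delivery for a simple forward contract.

    Assumptions (for illustration only): no fees, no margin, no discounting.
    """
    # The merchant (buyer) pays the agreed price whatever the market has done,
    # so the cost of raw materials is known in advance.
    merchant_cost = agreed_price * quantity
    # Buyer's gain or loss relative to simply buying in the market at delivery.
    long_payoff = (spot_at_delivery - agreed_price) * quantity
    # The seller can buy at the market price and sell at the agreed price,
    # so the seller's payoff is the mirror image of the buyer's.
    short_payoff = -long_payoff
    return {
        "merchant_cost": merchant_cost,
        "long_payoff": long_payoff,
        "short_payoff": short_payoff,
    }


# Hypothetical numbers: agreed price 100, market price falls to 90, 10 units.
result = forward_payoffs(agreed_price=100.0, spot_at_delivery=90.0, quantity=10.0)
print(result)
```

With these numbers the merchant's cost is fixed at 1000 whatever happens, while the seller gains 100 because the price fell, which is the trader's profit described in the paragraph above.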

This area uses advanced mathematics such as stochastic differential calculus and measure theory. It has been described as one of the few areas where cutting-edge mathematical research has a direct and immediate impact in a practical area. The nature of the mathematics involved in this area explains why it attracts physicists (the same equations describe the evolution of some physical systems and some financial systems), and hence why the phrase ‘rocket science’ has sometimes been attached to it.
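To give a flavour of where that mathematics ends up, here is a minimal sketch of the closed-form Black-Scholes price of a European call option. The article does not spell out this formula; it is used here as the standard textbook example, under its usual assumptions (the stock price follows geometric Brownian motion with constant volatility, the interest rate is constant, and no dividends are paid). The function and parameter names are illustrative.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot: float, strike: float, maturity: float,
                       rate: float, vol: float) -> float:
    """Black-Scholes price of a European call on a non-dividend-paying stock.

    spot     : current stock price
    strike   : exercise price
    maturity : time to expiry in years
    rate     : continuously compounded risk-free rate
    vol      : annualised volatility of the stock
    """
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (
        vol * math.sqrt(maturity)
    )
    d2 = d1 - vol * math.sqrt(maturity)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


# Hypothetical inputs: at-the-money option, one year, 5% rate, 20% volatility.
price = black_scholes_call(spot=100.0, strike=100.0, maturity=1.0, rate=0.05, vol=0.2)
print(round(price, 2))  # approximately 10.45
```

The formula itself is short; the sophistication lies in the assumptions behind it, which is exactly where the risks discussed below arise.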

A characteristic of this area is that it has, in the main, been built on economic models of the financial system. Important among these is the efficient market hypothesis, which says that it is impossible to produce a trading strategy (a strategy for buying and selling financial products) which outperforms the market (except by chance), because any information indicating that a particular stock is over- or under-valued is already reflected in the stock’s price. Despite its importance, the efficient market hypothesis is controversial, and some people do seem able to outperform the market with a consistency beyond that suggested by chance. But it is clearly almost true – which is why outperforming the market is so hard. A classic book in this area (and there are a great many) is Hull's "Options, Futures, and Other Derivatives" [1].

The effectiveness of work in this area hinges on how accurate the models are, and how effectively they are used. In particular, if they are based on shaky premises then there are clearly unquantified risks. The well-known collapse of Long-Term Capital Management [2] arose from a failure to take into account the possibility that there would not always be buyers and sellers willing to take the other side of a trade. The sub-prime crisis was based on perfectly sound mathematical models, but if your model tells you that a mortgage applicant has a high risk of defaulting on the repayments and you go ahead and make the loan anyway, you cannot really blame the mathematics.

Furthermore, the models in this area are highly, and increasingly, sophisticated. There is a real risk that while those developing the models may understand them perfectly, higher management may not, with obvious dangers. By way of illustration, the Turner report [3], which describes the events leading up to the recent financial crisis, chooses its words carefully: it does not ask ‘was the mathematics wrong?’, but asks if there was ‘misplaced reliance on sophisticated mathematics?’ That is, did those who bore the ultimate responsibility for deciding to use the models really know what they were doing? These aspects are discussed in Hand (2009) [4], but to put things into context, the excellent and informed review by Reinhart and Rogoff (2009) [5] examines the history of financial crashes over several centuries (with some wonderful data!), from long before sophisticated mathematical models were available.

The second area listed above was that of hedge funds. In fact, ‘hedge fund’ is not a well-defined term, as it covers a very wide range of quite distinct kinds of trading activity. Some funds base their trading strategies on the perceived fundamental value of the stocks being traded, others use subjective views on how the market is behaving, others rely on objective statistical models of price time series, and so on. Who may invest in hedge funds is constrained by regulation: investors typically include pension funds, university endowments, and high net worth individuals, but not the general public.

Hedge fund managers usually have their own money invested in their funds, which aligns what they do with what their customers want. This need not be the case with banks, where the aims of customers and shareholders might not be aligned. Most hedge funds seek an absolute return: a positive return on investments, regardless of whether the market is going up or down. Hedge funds are a relatively recent development, beginning in about the 1960s. An entertaining history of hedge funds is given by Mallaby (2010) [6].

One class of hedge funds is based on ‘systematic trading’: the use of mathematical and statistical models to predict market behaviour and make trading decisions with little or no human intervention. In contrast to the models of the financial mathematicians, described above, the models used in these organisations are empirical models, based on intensive statistical analysis of past behaviour of the financial markets. Some commentators have expressed concerns that if many hedge funds adopt similar strategies, then the correlation between the way they behave will introduce instabilities into financial markets – and have attributed the so-called ‘quant quake’ of 2007 to this cause.

You may notice the use of the phrase ‘mathematical and statistical models’ above, and the distinction made between ‘mathematics’ and ‘the mathematical disciplines’. Although statistics is often taught in mathematics departments, the two disciplines are rather different, and few statisticians nowadays regard their discipline as a part of mathematics. Certainly statistics has mathematics at its base, as do physics and engineering, but statistics is no more a part of mathematics than are those disciplines. In particular, the discipline of statistics has been revolutionised over the past fifty years by the development of computing, so that one might, with equal justification, regard statistics as part of computer science.

The key point is that statistics starts with the question and the data, and seeks to apply formal methods of inference to find structures and relationships, and to extract understanding and meaning. Mathematics, in contrast, is fundamentally concerned with deduction about given abstract objects and their relationships. This, in fact, is a commonly made distinction between probabilists and statisticians: the former start with the mathematics, and try to deduce what the data would look like; the latter begin with the data, and try to work out what kind of structure would have generated it. This explains the opening comment of the preface of David Williams’ marvellous book Weighing the Odds [7]: ‘Probability and Statistics used to be married; then they separated; then they got divorced; now they hardly ever see each other.’

### Data Crunching

That small detour serves as an introduction to the third area mentioned at the beginning: the retail financial services sector. Retail banking or consumer banking refers to the sorts of transactions in which you and I engage every day. It covers such things as credit and debit cards, mortgages, car finance, personal insurance, store cards, personal loans, and so on. And it will be immediately obvious that one of the characteristics of this area is that it involves large, even massive, data sets. Many organisations in the sector carry out billions of transactions each year. So, in this area, we are really talking about statistics rather than mathematics.

The aim is to build models of behaviour to answer a variety of questions: will a loan applicant make the scheduled repayments on time; is a credit card owner running into financial difficulties; is that anomalous card transaction evidence of fraud; is that pattern of mortgage applications suggestive of something suspicious? Anderson [8] gives an introduction to the area.

Almost universally, the models in this area are empirical models: they are not based on any underlying theory (e.g. from the psychology of behavioural finance) but are entirely data driven. A characteristic will be included in a model if the data analysis shows that it leads to improved prediction, regardless of whether there are theoretical reasons which might lead one to expect it to be predictive.

That last sentence should be slightly qualified. Such a characteristic will be included only if legal restrictions allow it (e.g. the US Equal Credit Opportunity Act of 1974). Typically, anti-discrimination legislation precludes certain characteristics, such as sex, race, colour, and religion, from being included in these so-called credit scorecards.

The models in this area have increased in sophistication since they were first introduced in the 1960s, concurrently with the advent of the computer. Initial scepticism about whether statistical algorithms could be as accurate in their decision-making as humans soon gave way to a recognition that they could. Since then, decades of research and refinement, coupled with the growth of massive databases, mean that such systems now far outperform anything that a human could do: they make more accurate decisions.

The models can be very elaborate. They often take the form of logistic regression trees, in which the population is divided into subgroups, with a distinct logistic regression model being built in each segment. They may involve hundreds of variables.
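The idea of one logistic regression per segment can be sketched in a few lines. This is only an illustration under stated assumptions: the split into segments is fixed by hand rather than learned by a tree, there is a single predictor, the data are synthetic, and the names (`renter`, `owner`, `debt_ratio`, `fit_segmented`) are hypothetical. Real scorecards are built with specialist software and far more variables.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, steps=400, lr=2.0):
    """Fit P(default) = sigmoid(a + b * x) by batch gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(a + b * x) - y  # gradient of the log-loss
            grad_a += err
            grad_b += err * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def fit_segmented(records):
    """records: iterable of (segment, x, y). Fits one model per segment."""
    by_segment = {}
    for segment, x, y in records:
        xs, ys = by_segment.setdefault(segment, ([], []))
        xs.append(x)
        ys.append(y)
    return {seg: fit_logistic(xs, ys) for seg, (xs, ys) in by_segment.items()}


def predict(models, segment, x):
    """Predicted default probability from the model for the given segment."""
    a, b = models[segment]
    return sigmoid(a + b * x)


# Synthetic data: the effect of debt_ratio differs between the two segments.
rng = random.Random(0)
TRUE_COEFFS = {"renter": (-2.0, 4.0), "owner": (-3.0, 1.0)}  # made-up values
records = []
for segment, (a, b) in TRUE_COEFFS.items():
    for _ in range(800):
        debt_ratio = rng.random()
        defaulted = 1 if rng.random() < sigmoid(a + b * debt_ratio) else 0
        records.append((segment, debt_ratio, defaulted))

models = fit_segmented(records)
for segment, (a, b) in models.items():
    print(f"{segment}: intercept={a:.2f}, slope={b:.2f}")
```

Fitting the segments separately lets the same characteristic carry a different weight in each subgroup, which a single pooled model could only capture by adding interaction terms.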

An important strand of work in this area is the evaluation of scorecards. This is because the domain is characterised by so-called population drift: changes in the nature of the population of people applying for or using the financial products. Such changes arise from changing economic conditions (e.g. making people less willing to take out loans), a changing competitive environment (e.g. because globalisation encourages international competition or because non-finance players, such as supermarkets, enter the financial arena), or changing technology (e.g. the advent of internet banking and, more recently, mobile phone cash transfers).
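One simple way to watch for such drift is to compare the distribution of scores among recent applicants with the distribution seen when the scorecard was built. The sketch below uses the population stability index, a measure commonly used in credit scoring practice; the article does not name it, so it appears here only as one plausible illustration. The score bands and counts are invented.

```python
import math


def population_stability_index(expected_counts, actual_counts, floor=1e-6):
    """Population stability index between two binned score distributions.

    expected_counts : counts per score band when the scorecard was built
    actual_counts   : counts per score band in the recent population
    floor           : small share substituted for empty bands, so that the
                      logarithm is always defined (an illustrative choice)
    """
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        e_share = max(expected / expected_total, floor)
        a_share = max(actual / actual_total, floor)
        # Each term is non-negative, and zero only when the shares agree.
        psi += (a_share - e_share) * math.log(a_share / e_share)
    return psi


# Hypothetical counts of applicants in five score bands.
development_sample = [100, 200, 400, 200, 100]
recent_applicants = [50, 150, 350, 250, 200]

print(population_stability_index(development_sample, development_sample))
print(population_stability_index(development_sample, recent_applicants))
```

The index is zero when the two distributions match and grows as they diverge. It flags that the population has changed but not why, nor whether the scorecard's predictions have actually degraded.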

To illustrate just one of the difficulties, imagine deciding that a loan scorecard’s performance has degraded, so that one wishes to construct a new one. The available data will be the descriptive characteristics of those who sought a loan in the past, along with the outcome (e.g. whether they defaulted or not) of those who were previously given a loan. But this is not the population to which the new model will be applied: we wish to calculate a creditworthiness score for all applicants, not merely those that the old method would have accepted. Thus we have a distorted population from which to build a new model. Coping with such selection bias is a non-trivial problem – to the extent that James Heckman was awarded the 2000 Nobel Prize for Economics for his efforts to tackle it.
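The distortion is easy to see in a small simulation. The following sketch assumes a made-up relationship between an old scorecard's score and the true probability of default, and an old policy that accepted only applicants scoring above a cutoff. In real life the outcomes of rejected applicants are never observed; the simulation can show them only because the data are synthetic. All names and numbers are illustrative.

```python
import math
import random


def simulate_selection(n=20000, cutoff=0.5, seed=0):
    """Compare the default rate of all applicants with that of accepted ones.

    Returns (population_default_rate, accepted_default_rate).
    """
    rng = random.Random(seed)
    all_defaults = 0
    accepted = 0
    accepted_defaults = 0
    for _ in range(n):
        # Old scorecard's score in [0, 1]; higher means judged safer.
        score = rng.random()
        # Hypothetical truth: default risk falls as the score rises.
        p_default = 1.0 / (1.0 + math.exp(-(1.0 - 4.0 * score)))
        defaulted = rng.random() < p_default
        all_defaults += defaulted
        # Outcomes are only recorded for applicants the old policy accepted.
        if score >= cutoff:
            accepted += 1
            accepted_defaults += defaulted
    return all_defaults / n, accepted_defaults / accepted


population_rate, accepted_rate = simulate_selection()
print(f"default rate, all applicants:      {population_rate:.3f}")
print(f"default rate, accepted applicants: {accepted_rate:.3f}")
```

A new model trained only on the accepted applicants sees a population that is markedly safer than the one it will be asked to score, which is the selection bias described above.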

Another conceptual and methodological challenge in this area is posed by the legal constraints on allowed characteristics mentioned above. This is described in more detail in Hand (2012) [9], but the essence of the challenge is as follows, illustrated for the case of a loan. Sex cannot be included in scorecards because gender, per se, is irrelevant to propensity to repay, and one would not want to base decisions on prejudice. But, in general, women are lower risks than men, so that this exclusion means that women are being forced to pay a higher rate than men. Roughly speaking, the solution to the problem is to construct a scorecard using all the predictive characteristics one can find, so that one’s estimate of the probability of failing to repay is the most accurate estimate one can get, but then to base one’s decision on the model excluding the prohibited variables. This leads to a decision which includes that aspect of gender which is predictive of default probability, but excludes all other (irrelevant, prejudice-based) aspects.

As illustrated above, the range of interaction between the mathematical disciplines and finance is large, spanning a number of different kinds of application domains. What is common, however, is that as time progresses, all of these areas require an ever higher level of mathematical and statistical sophistication. It would be quite impossible to run a banking system, credit card operation, pension fund, insurance company, or other financial operation without mathematical and statistical tools right at the heart of the enterprise.

Prof. David Hand is Head of Statistics in the Department of Mathematics at Imperial College London, with an interest in applications in medicine, psychology, and finance. He is a Fellow of the British Academy and has won various prizes and awards for his research, including the Guy Medal of the Royal Statistical Society and a Royal Society Research Merit Award.

• Posted by Alexei S Ahmed on Jun 19, 2012 3:35 PM

Dear Professor Hand

I found your article very interesting. Have you read Emanuel Derman’s ‘My Life as a Quant’? He was a POW (Physicist on Wall Street) during the nineties.

There are also algorithms based on behavioural responses to market information: data mining of large volumes of internet text is used to guide trading decisions. In fact, Bloomberg have completed a method of financial reporting that software can ‘read’ and interpret, and then execute buy and sell orders.

I myself have one year's experience as a proprietary trader in a fund just prior to the start of the 2008 financial crisis, so had quite a privileged opportunity to watch it unfold. I am actually a medical doctor (grad 2006) and, with a group of my colleagues, have ventured into the world of ‘algo’ trading, designing our own systems based on supply and demand mechanics illustrated by intraday pricing (candlestick) graphs. The current pension issues have spurred an interest from colleagues!

A statistical approach to trading would be a very satisfying avenue to investigate.

Do you know of any students who would be interested in investigating statistical models with our investment group on live markets and real execution environments?

We have a working execution platform linked with multiple brokerages (commodities, equities, options and foreign exchange) programmable in C+ type language.

Thanks and hope to hear from you.

Yours Sincerely

Dr S A Ahmed