Differential Machine Learning
The objective of this internal research was to evaluate, and gain experience with, two methods used for pricing and sensitivity analysis of exotic financial derivatives: automatic adjoint differentiation (AAD) and deep learning.
The work was inspired by the publication of Danske Bank quantitative analysts Antoine Savine and Brian Huge (Differential Machine Learning, arXiv:2005.02347), in which the authors introduced a novel approach to building extremely efficient pricing and risk approximators for arbitrary financial derivative instruments.
Differential machine learning (ML), presented in that publication, combines AAD with deep learning to estimate the values and risk sensitivities of financial derivatives.
Differential ML is a form of supervised learning in which models are trained on datasets (inputs and labels) augmented with the differentials of the labels with respect to the inputs. In the context of financial derivatives and risk management, pathwise differentials are computed efficiently with AAD. The pathwise estimator is obtained by interchanging the order of differentiation and integration, whenever this interchange is valid.
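For example, consider the simplest situation, a European call option in the Black-Scholes framework. Although an explicit expression for the option delta is available, we can also estimate it via the pathwise method as follows. We first write the discounted option payoff as
$$Y = e^{-rT}\max(S_T - K, 0), \qquad S_T = S_0 \exp\left(\left(r - \tfrac{\sigma^2}{2}\right)T + \sigma\sqrt{T}\,Z\right), \quad Z \sim N(0,1). \tag{1}$$
By the chain rule, $\partial Y/\partial S_0 = (\partial Y/\partial S_T)(\partial S_T/\partial S_0)$, where
$$\frac{\partial Y}{\partial S_T} = e^{-rT}\,\mathbf{1}_{\{S_T > K\}} \tag{2}$$
and
$$\frac{\partial S_T}{\partial S_0} = \frac{S_T}{S_0}. \tag{3}$$
It follows from (2) and (3) that
$$\frac{\partial Y}{\partial S_0} = e^{-rT}\,\mathbf{1}_{\{S_T > K\}}\,\frac{S_T}{S_0}. \tag{4}$$
The estimator (4) is easily calculated via Monte Carlo simulation.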
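The pathwise delta estimator for the Black-Scholes case can be sketched in a few lines. This is a minimal illustration that uses only the Python standard library, not the implementation used in the study; the function names and the parameter values (spot 100, strike 100, rate 5%, volatility 20%, one year) are assumptions chosen for the example. It averages exp(-rT) * 1{S_T > K} * S_T / S_0 over simulated paths and compares the result with the closed-form delta N(d1).

```python
import math
import random


def pathwise_delta(s0, strike, rate, sigma, maturity, n_paths, seed=0):
    """Monte Carlo pathwise estimate of a European call delta under Black-Scholes."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    discount = math.exp(-rate * maturity)
    total = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        if s_t > strike:
            # Pathwise derivative of the discounted payoff:
            # exp(-rT) * 1{S_T > K} * S_T / S_0
            total += discount * s_t / s0
    return total / n_paths


def analytic_delta(s0, strike, rate, sigma, maturity):
    """Closed-form Black-Scholes call delta N(d1), used only as a check."""
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (
        sigma * math.sqrt(maturity)
    )
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))


# Illustrative parameters (assumptions, not taken from the study)
mc_delta = pathwise_delta(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=100_000)
bs_delta = analytic_delta(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"pathwise delta: {mc_delta:.4f}, analytic delta: {bs_delta:.4f}")
```

The same simulated paths deliver both the price and the delta, which is what makes pathwise differentials cheap to attach to a training set.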
Cases selected to evaluate the differential neural network approach and to estimate the performance of pure AAD:
- Valuation of Asian options with arithmetic averaging and their Greeks (delta and vega) for a model with constant volatility.
- Valuation of Asian options with arithmetic averaging and their Greeks (delta and vega) with an arbitrary volatility curve.
- Valuation of an option written on a basket of correlated stocks. We also estimated gamma, the second derivative of the option value with respect to the underlying price.
- Valuation of a callable bond and estimation of its duration.
- Differential PCA for large portfolios of correlated instruments (250, 500 and 1000).
- Libor Market Model: pricing options whose payoffs depend on LIBOR rates.
- Pricing of an option and its vega under a stochastic volatility model (SABR).
- Valuation of worst-of options and their Greeks for a basket of n correlated instruments.
- AAD and the LIBOR market model.
1. Valuation of Asian options and their Greeks (delta and vega). Asian options are of particular importance for commodity products with low trading volumes. The terminal payoff depends on some form of averaging of the underlying asset price over part or all of the option's life. Because the volatility of an average is lower than the volatility of the underlying asset itself, an Asian option tends to be less expensive than its European counterpart. The averaging feature also lessens the incentive for market manipulation. These properties explain the use of Asian options in risk management and make them well suited to hedging positions. We consider the case of a discrete arithmetic Asian call option with payoff
$$\max(\bar{S} - K, 0), \qquad \bar{S} = \frac{1}{m}\sum_{i=1}^{m} S_{t_i},$$
where $t_1 < \dots < t_m = T$ are the averaging dates and $K$ is the strike.
No closed-form analytical solution is known for arithmetic-average Asian options, so numerical methods such as Monte Carlo simulation, partial differential equations and moment matching are applied. In this experiment we assume that volatility is a constant parameter. It is rather simple to construct a pathwise estimator for the delta of this option:
$$e^{-rT}\,\mathbf{1}_{\{\bar{S} > K\}}\,\frac{\bar{S}}{S_0}.$$
Similarly, the pathwise estimator of the vega of the Asian option is
$$e^{-rT}\,\mathbf{1}_{\{\bar{S} > K\}}\,\frac{1}{m}\sum_{i=1}^{m}\frac{\partial S_{t_i}}{\partial\sigma}, \qquad \frac{\partial S_{t_i}}{\partial\sigma} = \frac{S_{t_i}}{\sigma}\left(\ln\frac{S_{t_i}}{S_0} - \left(r + \tfrac{\sigma^2}{2}\right)t_i\right).$$
The results show that the differential neural network is more accurate than the feed-forward network, especially for vega.
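The pathwise delta and vega of the arithmetic Asian call can be produced in the same simulation loop as the price. The sketch below is a minimal standard-library illustration under assumed parameters (12 equally spaced averaging dates, constant volatility), not the code used in the experiment. It relies on the fact that, for geometric Brownian motion, dS_t/dS_0 = S_t / S_0 and dS_t/dsigma = S_t (W_t - sigma t). A bump-and-revalue vega on the same random numbers is computed as a sanity check.

```python
import math
import random


def asian_call_greeks(s0, strike, rate, sigma, maturity, n_dates, n_paths, seed=0):
    """Return (price, delta, vega) of a discrete arithmetic Asian call by Monte Carlo."""
    rng = random.Random(seed)
    dt = maturity / n_dates
    discount = math.exp(-rate * maturity)
    price = delta = vega = 0.0
    for _ in range(n_paths):
        log_s = math.log(s0)
        w = 0.0               # Brownian motion W_t along the path
        sum_s = 0.0           # running sum of S_{t_i}
        sum_ds_dsigma = 0.0   # running sum of dS_{t_i}/dsigma
        for i in range(1, n_dates + 1):
            dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
            w += dw
            log_s += (rate - 0.5 * sigma ** 2) * dt + sigma * dw
            s = math.exp(log_s)
            sum_s += s
            # dS_t/dsigma = S_t * (W_t - sigma * t) for geometric Brownian motion
            sum_ds_dsigma += s * (w - sigma * i * dt)
        avg = sum_s / n_dates
        if avg > strike:
            price += discount * (avg - strike)
            delta += discount * avg / s0
            vega += discount * sum_ds_dsigma / n_dates
    return price / n_paths, delta / n_paths, vega / n_paths


# Illustrative parameters (assumptions, not taken from the study)
args = dict(s0=100.0, strike=100.0, rate=0.05, sigma=0.2,
            maturity=1.0, n_dates=12, n_paths=10_000)
price, delta, vega = asian_call_greeks(**args)

# Sanity check: central bump of sigma with the same seed (common random numbers)
h = 1e-4
up = asian_call_greeks(**{**args, "sigma": args["sigma"] + h})[0]
down = asian_call_greeks(**{**args, "sigma": args["sigma"] - h})[0]
fd_vega = (up - down) / (2 * h)
print(f"price {price:.4f}, delta {delta:.4f}, vega {vega:.4f}, bumped vega {fd_vega:.4f}")
```

Each path yields one payoff label together with its differentials, which is the kind of augmented sample a differential network is trained on.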
2. Valuation of Asian options and their Greeks with a volatility curve. In this experiment we modeled volatility by means of a volatility curve, which is very important for practitioners: we allow a different volatility for each averaging period. The volatility curve can be extracted from the option chain. In this setting the option value has a separate sensitivity to each volatility. We found that the greater the number of volatilities, the stronger the advantage of the differential neural network: it outperforms the feed-forward network in high-dimensional tasks.
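With one volatility per averaging period, each path produces a whole vector of vegas. The following sketch assumes a piecewise-constant volatility curve with one value per period and equally spaced dates; these modeling details, the function name and the sample curve are assumptions made for illustration. The volatility of period j multiplies every later price by the same factor, so dS_{t_i}/dsigma_j = S_{t_i} (sqrt(dt) Z_j - sigma_j dt) for i >= j and zero otherwise.

```python
import math
import random


def asian_call_bucket_vegas(s0, strike, rate, sigmas, maturity, n_paths, seed=0):
    """Price and per-period vegas of an arithmetic Asian call, one volatility per period."""
    rng = random.Random(seed)
    m = len(sigmas)
    dt = maturity / m
    sqrt_dt = math.sqrt(dt)
    discount = math.exp(-rate * maturity)
    price = 0.0
    vegas = [0.0] * m
    for _ in range(n_paths):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        log_s = math.log(s0)
        prices = []
        for j in range(m):
            log_s += (rate - 0.5 * sigmas[j] ** 2) * dt + sigmas[j] * sqrt_dt * z[j]
            prices.append(math.exp(log_s))
        avg = sum(prices) / m
        if avg > strike:
            price += discount * (avg - strike)
            tail = 0.0  # sum of S_{t_i} over i >= j, accumulated backwards
            for j in range(m - 1, -1, -1):
                tail += prices[j]
                # sigma_j scales every S_{t_i} with i >= j by the same factor
                vegas[j] += discount * tail / m * (sqrt_dt * z[j] - sigmas[j] * dt)
    return price / n_paths, [v / n_paths for v in vegas]


# Illustrative four-period volatility curve (an assumption for the example)
curve = [0.18, 0.20, 0.22, 0.25]
price, vegas = asian_call_bucket_vegas(100.0, 100.0, 0.05, curve, 1.0, n_paths=10_000)
print(f"price {price:.4f}")
print("vegas per period:", [round(v, 3) for v in vegas])
```

The cost of the backward accumulation grows only linearly with the number of volatilities, whereas bumping would need one extra simulation per volatility.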
3. Valuation of an option written on n correlated stocks. We use a multivariate log-normal distribution and generate its correlation matrix with the method suggested by Rebonato and Jäckel. This method produces a positive-semidefinite matrix, is fast to implement even for large matrices, and allows the determination of a feasible matrix that most closely approximates a target real symmetric (but not positive-semidefinite) matrix. To construct a valid correlation matrix $C = BB^T$, the method views the elements of the row vectors of the matrix $B$ as coordinates lying on a unit hypersphere. If we denote by $b_{ij}$ the elements of $B$, the key is to obtain the $n \times n$ coordinates $b_{ij}$ from the $n \times (n-1)$ angular coordinates $\theta_{ij}$ according to
$$b_{ij} = \cos\theta_{ij}\prod_{k=1}^{j-1}\sin\theta_{ik} \quad (j = 1, \dots, n-1), \qquad b_{in} = \prod_{k=1}^{n-1}\sin\theta_{ik}.$$
Thanks to the trigonometric identity $\sin^2 + \cos^2 = 1$ and to the requirement that the radius of the unit hypersphere equal one, the main diagonal elements of $C$ are guaranteed to be unity. We generate the price process by means of the following stochastic differential equation:
where $S_t^i$ denotes the price of asset $i$ at time $t$, and $(\alpha_1, \dots, \alpha_N)$ are obtained by taking the Cholesky decomposition $LL^T$ of the correlation matrix and applying $L$ to $N$ i.i.d. standard normal variables $(\varepsilon_1, \dots, \varepsilon_N)$.
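The construction of a correlation matrix from hypersphere angles, followed by the Cholesky step, can be sketched as below. The angle values are arbitrary and chosen only for illustration, and the hand-written Cholesky routine stands in for a library call so that the example needs only the Python standard library. The sketch checks that the diagonal of C = B B^T is one and that the shocks obtained by applying L to independent normals reproduce the target correlation.

```python
import math
import random


def correlation_from_angles(theta):
    """Build C = B B^T from an n x (n-1) table of angles (hypersphere parametrization)."""
    n = len(theta)
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sin_prod = 1.0
        for j in range(n - 1):
            b[i][j] = math.cos(theta[i][j]) * sin_prod
            sin_prod *= math.sin(theta[i][j])
        b[i][n - 1] = sin_prod
    return [[sum(b[i][k] * b[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cholesky(c):
    """Lower-triangular L with L L^T = c (c must be positive definite)."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def correlated_normals(low, rng):
    """Apply L to i.i.d. standard normals to obtain correlated shocks."""
    n = len(low)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(low[i][k] * eps[k] for k in range(i + 1)) for i in range(n)]


theta = [[0.3, 1.1], [1.2, 0.7], [2.0, 1.5]]  # arbitrary illustrative angles
corr = correlation_from_angles(theta)
low = cholesky(corr)
rng = random.Random(0)
shocks = [correlated_normals(low, rng) for _ in range(20_000)]
sample_01 = sum(s[0] * s[1] for s in shocks) / len(shocks)
print(f"target rho_01 = {corr[0][1]:.3f}, sample estimate = {sample_01:.3f}")
```

Because any set of angles yields a valid matrix, the angles can be optimized freely when fitting a target correlation matrix.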
We also estimated gamma, the second derivative of the option value with respect to the underlying price. Gamma is widely used in option trading, so both speed and accuracy matter for this Greek. For this purpose we combine the pathwise and likelihood ratio methods. In contrast to the pathwise method, the likelihood ratio method differentiates a probability density with respect to the parameter of interest, $\theta$. It provides a good alternative to the pathwise method when the payoff is not continuous in $\theta$. We rewrite the indicator function as $\mathbf{1}(x > K) = f_\varepsilon(x) + h_\varepsilon(x)$, where $f_\varepsilon(x)$ is a piecewise linear approximation to the function $\mathbf{1}(x > K)$ and $h_\varepsilon(x) = \mathbf{1}(x > K) - f_\varepsilon(x)$ corrects the approximation. We applied the pathwise estimator to $f_\varepsilon(x)$ and the likelihood ratio estimator to $h_\varepsilon(x)$. We report results for the 5-dimensional case; the differential neural network is more accurate than the feed-forward neural network.
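The mixed estimator can be illustrated on a single asset. The sketch below is a simplification of the 5-dimensional experiment: it treats a European call under Black-Scholes and assumes the smoothing f_eps(x) = min(1, max(0, x - K + eps) / (2 eps)), a common choice in the Monte Carlo literature that may differ from the one used in the study. Starting from the pathwise delta exp(-rT) 1{S_T > K} S_T / S_0, the indicator is split into f_eps + h_eps. The f_eps part is differentiated pathwise, the h_eps part with the likelihood ratio score Z / (S_0 sigma sqrt(T)), and the result is compared with the closed-form gamma.

```python
import math
import random


def mixed_gamma(s0, strike, rate, sigma, maturity, eps, n_paths, seed=0):
    """Gamma of a European call: pathwise on f_eps, likelihood ratio on h_eps."""
    rng = random.Random(seed)
    vol = sigma * math.sqrt(maturity)
    drift = (rate - 0.5 * sigma ** 2) * maturity
    discount = math.exp(-rate * maturity)
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp(drift + vol * z)
        # Assumed smoothing: f_eps(x) = min(1, max(0, x - K + eps) / (2 eps))
        f = min(1.0, max(0.0, s_t - strike + eps) / (2.0 * eps))
        f_prime = 1.0 / (2.0 * eps) if abs(s_t - strike) < eps else 0.0
        h = (1.0 if s_t > strike else 0.0) - f
        # Pathwise part: d/dS0 of f_eps(S_T) * S_T / S0 along a fixed path
        pathwise = f_prime * (s_t / s0) ** 2
        # Likelihood ratio part: h_eps(S_T) * S_T / S0 times (score - 1 / S0)
        likelihood = h * (s_t / s0 ** 2) * (z / vol - 1.0)
        total += discount * (pathwise + likelihood)
    return total / n_paths


def analytic_gamma(s0, strike, rate, sigma, maturity):
    """Closed-form Black-Scholes gamma, used only as a check."""
    vol = sigma * math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma ** 2) * maturity) / vol
    return math.exp(-0.5 * d1 ** 2) / math.sqrt(2.0 * math.pi) / (s0 * vol)


# Illustrative parameters (assumptions, not taken from the study)
mc_gamma = mixed_gamma(100.0, 100.0, 0.05, 0.2, 1.0, eps=10.0, n_paths=100_000)
bs_gamma = analytic_gamma(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"mixed estimator: {mc_gamma:.5f}, analytic gamma: {bs_gamma:.5f}")
```

The width eps controls how the work is shared between the two estimators: a larger eps shifts weight to the pathwise part, which usually has the lower variance.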
4. We valued a callable bond and estimated its duration. Fast estimation of duration is important for hedging. For the interest rate dynamics we assumed the Bachelier model.
The differential neural network captures the dynamics of duration more accurately.
5. We validated the approach on large portfolios of 250, 500 and 1000 instruments. Learning the Greeks of a large portfolio directly is computationally very expensive, so we tested the differential PCA methodology. Differential PCA removes irrelevant factors and considerably reduces the dimension. As a data preparation step it enables faster and more reliable training of neural networks. It is also a useful algorithm in its own right, providing a low-dimensional latent representation of the data on orthogonal axes of relevance. In our case it shrank the dimension dramatically, as is known to happen for the correlated Bachelier model, and we trained the differential neural network in the reduced space. The networks were trained on 100K paths for portfolios of 250, 500 and 1000 instruments.
In all cases the Greeks obtained with differential machine learning are more accurate than those of the feed-forward network.
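The idea behind differential PCA can be sketched on a toy basket. As we understand the method, the axes of relevance are the leading eigenvectors of the second-moment matrix of the pathwise gradients; the sketch below assumes this formulation and finds only the leading axis, by power iteration. The five-asset basket, its weights and the normally distributed inputs are hypothetical. Because the payoff depends on the inputs only through the basket value, a single axis should capture all of the gradient variation, which mirrors the strong reduction reported for the correlated Bachelier model.

```python
import math
import random


def gradient_second_moment(gradients):
    """Average outer product of the pathwise gradients, (1/m) * sum g g^T."""
    n = len(gradients[0])
    mat = [[0.0] * n for _ in range(n)]
    for g in gradients:
        for i in range(n):
            for j in range(n):
                mat[i][j] += g[i] * g[j] / len(gradients)
    return mat


def leading_axis(mat, n_iter=200):
    """Leading eigenvector and eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(mat)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(n_iter):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * mat[i][j] * v[j] for i in range(n) for j in range(n))
    return v, eigenvalue


rng = random.Random(0)
weights = [0.10, 0.20, 0.30, 0.25, 0.15]  # hypothetical basket weights
strike = 1.0
inputs, gradients = [], []
for _ in range(2_000):
    x = [rng.gauss(1.0, 0.2) for _ in weights]
    basket = sum(w * xi for w, xi in zip(weights, x))
    in_the_money = 1.0 if basket > strike else 0.0
    inputs.append(x)
    # Pathwise gradient of max(basket - strike, 0) with respect to each input
    gradients.append([in_the_money * w for w in weights])

moment = gradient_second_moment(gradients)
axis, eigenvalue = leading_axis(moment)
explained = eigenvalue / sum(moment[i][i] for i in range(len(weights)))
# Reduced one-dimensional inputs on which a network could be trained
latent = [sum(a * xi for a, xi in zip(axis, x)) for x in inputs]
print(f"share of gradient variation on the leading axis: {explained:.4f}")
```

Ordinary PCA on the inputs would not find this axis, since the five inputs have equal variance; the gradients are what reveal the direction that matters for the payoff.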
6. We considered the Libor Market Model as presented by Prof. Mike Giles (Mathematical Institute, University of Oxford). The Libor Market Model (LMM), also known as BGM, is a widely used interest rate term structure model. As an extension of the Heath, Jarrow and Morton (HJM) model on continuous forward rates, the LMM takes market observables as direct inputs. Whereas the HJM model describes the behavior of instantaneous forward rates expressed with continuous compounding, the LMM postulates the dynamics of the forward LIBOR rates, the floating rates that index the funding legs of interest rate swaps. The LMM thus uses forward LIBOR rates as its fundamental assets. Let $L_t^i$ denote the forward LIBOR rate over the time interval $[T_i, T_{i+1}]$, where $T_i$, $i = 0, 1, \dots, N$, are the LIBOR reset dates. Each forward LIBOR rate has the following dynamics:
where $W_t^i$ is the Brownian motion driving $L_t^i$, and the forward LIBOR rates are allowed to have factor correlations, $dW_t^i\,dW_t^j = \rho_{ij}\,dt$ for all $i, j$. We assume constant volatilities, $\sigma_t^i = \sigma$, and that all LIBOR rates share the same Brownian factor $W_t$. The discretized version of the forward LIBOR rate then reads:
With equation (14) we can simulate forward LIBOR rates at particular time points and price options whose payoffs depend on LIBOR rates. For each Monte Carlo scenario we build a TensorFlow computation graph that expresses the dependence of the payoff on the initial values of the forward LIBOR rates. Here we consider caplet payoffs, but the choice is not essential, since the payoff can easily be changed. Because the whole LIBOR model is expressed as a computation graph, we obtain its gradients and use them to construct the training set for the differential neural network. In this case the differential neural network outperforms the feed-forward network.
[Figure: estimated caplet values (payoffs).]
[Figure: derivatives of the caplet payoffs.]
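A one-factor simulation of forward LIBOR rates with a caplet payoff can be sketched as follows. The discretization used here is a common log-Euler scheme under the spot measure with one time step per accrual period; it is an assumption and may differ in detail from equation (14). The flat 5% initial curve, the quarterly accrual period and the function names are likewise illustrative. The drift sum is accumulated as a running total, and the Monte Carlo price is checked against Black's caplet formula.

```python
import math
import random


def simulate_caplet(l0, sigma, delta, strike, k, rng):
    """Discounted payoff of a caplet on forward rate k along one log-Euler path.

    l0[i] is the initial forward rate for [T_i, T_{i+1}], with T_i = i * delta.
    """
    rates = list(l0)
    numeraire = 1.0 + delta * rates[0]  # spot numeraire at T_1
    for n in range(k):                  # step from T_n to T_{n+1}
        z = rng.gauss(0.0, 1.0)         # single Brownian factor shared by all rates
        cum = 0.0                       # running drift sum, O(N) work per step
        for i in range(n + 1, k + 1):
            cum += delta * sigma * rates[i] / (1.0 + delta * rates[i])
            drift = sigma * cum
            rates[i] *= math.exp((drift - 0.5 * sigma ** 2) * delta
                                 + sigma * math.sqrt(delta) * z)
        numeraire *= 1.0 + delta * rates[n + 1]
    # The caplet pays delta * (L_k(T_k) - K)^+ at T_{k+1}
    return delta * max(rates[k] - strike, 0.0) / numeraire


def black_caplet(l0, sigma, delta, strike, k):
    """Black's formula for the same caplet, used only as a check."""
    std = sigma * math.sqrt(k * delta)
    d1 = (math.log(l0[k] / strike) + 0.5 * std ** 2) / std
    d2 = d1 - std
    bond = 1.0
    for j in range(k + 1):
        bond /= 1.0 + delta * l0[j]

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return delta * bond * (l0[k] * cdf(d1) - strike * cdf(d2))


rng = random.Random(0)
l0 = [0.05] * 8  # flat initial forward curve (illustrative)
sigma, delta, strike, k = 0.2, 0.25, 0.05, 4
n_paths = 20_000
mc_price = sum(simulate_caplet(l0, sigma, delta, strike, k, rng)
               for _ in range(n_paths)) / n_paths
bl_price = black_caplet(l0, sigma, delta, strike, k)
print(f"Monte Carlo caplet: {mc_price:.6f}, Black caplet: {bl_price:.6f}")
```

Written as a TensorFlow graph instead of plain Python, the same path function would also return the gradients with respect to the initial rates, which is how the differential labels are obtained.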
Let us compare the networks in terms of their loss functions. Both the standard and the differential neural network have a hidden size of 64 and 6 hidden layers. Caplet payoffs in the training dataset were generated with 10 Monte Carlo paths, and those in the test dataset with 10,000 Monte Carlo paths. Experiments were carried out for different strikes, different times to maturity (TTM) and different numbers of forward LIBOR rates. Some examples of learning curves with multiple restarts on test samples are shown in the following graphs.
General conclusion for all graphs: on small datasets the differential network has a large advantage in terms of the L2 loss on the Greeks and a slight disadvantage in terms of the L2 loss on the caplet payoffs compared with the standard network. On large datasets the differential network always has an advantage (significant or not) in the Greeks L2 loss and comparable quality in the caplet payoff L2 loss. Moreover, quite often the differential network also predicts the caplet payoffs more accurately. A plausible explanation is that the gradients give the approximator additional information about the behavior of the function in the neighborhood of each training point, which improves quality even though the weights of the network must encode both the caplet payoffs and their derivatives. We test this hypothesis by increasing the hidden size to 128 and the number of hidden layers to 10 for both networks.
In this case the differential network shows a significant advantage over the non-differential network, which supports the hypothesis that the derivatives carry additional information about the behavior of the function that can be exploited in training. We can also conclude that using derivatives differs from simply using additional data in two ways: the quality of the predicted derivatives is much higher, and a larger network may be needed to gain an advantage on the caplet payoff values. The reason is that the network must minimize two loss terms at once.
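The two loss terms can be written as one training objective. The sketch below is an assumption about the general shape of such an objective, a mean squared error on values plus a weighted mean squared error on gradients. The weight `lam` and the absence of any normalization of the gradient term are choices made for this illustration, not the settings of the experiments.

```python
def differential_loss(pred_values, true_values, pred_grads, true_grads, lam=1.0):
    """Mean squared error on values plus lam times mean squared error on gradients."""
    m = len(true_values)
    n = len(true_grads[0])
    value_term = sum((p - t) ** 2 for p, t in zip(pred_values, true_values)) / m
    grad_term = sum(
        (pg - tg) ** 2
        for pred_row, true_row in zip(pred_grads, true_grads)
        for pg, tg in zip(pred_row, true_row)
    ) / (m * n)
    return value_term + lam * grad_term


# Two samples with two inputs each (toy numbers)
pred_values = [1.0, 2.0]
true_values = [1.5, 2.0]
pred_grads = [[0.5, 0.0], [1.0, 1.0]]
true_grads = [[0.5, 0.5], [1.0, 0.0]]

standard = differential_loss(pred_values, true_values, pred_grads, true_grads, lam=0.0)
combined = differential_loss(pred_values, true_values, pred_grads, true_grads, lam=1.0)
print(f"value-only loss: {standard}, combined loss: {combined}")
```

With `lam=0.0` the objective reduces to that of the standard network, so the same function covers both training regimes being compared.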
7. We also dealt with the SABR model, a stochastic volatility model. The SABR model explains the volatility smile better and resolves the problem of unstable hedges.
[Figure: option values.]
[Figure: Greeks.]
The accuracy of the differential neural network is again higher.
8. We also considered worst-of options, which we valued for a basket of n correlated stocks. Prices are again generated by means of the stochastic differential equation (10). The results are for the case of 5 stocks. The differential neural network works better than the feed-forward network.
9. Our aim here is to introduce TensorFlow as a tool for sensitivity analysis; TensorFlow is also very useful for Monte Carlo simulation in quantitative finance. We demonstrated the efficiency of TensorFlow on a GPU for valuing caplet payoffs on LIBOR rates and computing their derivatives with respect to the initial forward LIBOR rates. The standard technique for estimating Greeks, that is, first-order derivatives, is bumping. For caplet payoffs on LIBOR rates there are hundreds of parameters, so bumping takes a lot of time. We instead use automatic adjoint differentiation (AAD) in TensorFlow, which provides a quick estimate of the Greeks. Adjoint differentiation applies the chain rule in reverse order to compute all the differentials of a scalar function of many variables in a time comparable to one evaluation of the function, independently of the number of inputs. We implemented pricing in TensorFlow version 1.15. In the experiment we simulated N correlated assets and calculated the prices of N options and their Greeks. Forward LIBOR rates were built by the methods suggested in . We show that even computationally intensive models such as the LIBOR market model can be evaluated efficiently with modest computing power: the experiments use a GeForce GTX 1050 Ti. As the true caplet payoffs we take the values generated with 1,000,000 Monte Carlo paths. The graphs, drawn with logarithmic scales on both axes, show good generation quality and an error that decreases as the number of Monte Carlo paths grows.
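The difference between bumping and adjoint differentiation can be seen on a toy example. The sketch below is a minimal tape-based reverse-mode differentiator written in plain Python; it is not TensorFlow and not the code used in the experiments, and the class `Var`, the helper `backward` and the test function `portfolio` are hypothetical names introduced for illustration. One reverse sweep returns all 20 sensitivities, whereas bumping needs one extra valuation per input.

```python
import math


class Var:
    """Scalar node on a computation tape for reverse-mode (adjoint) differentiation."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs (parent node, local partial derivative)
        self.adjoint = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __radd__ = __add__
    __rmul__ = __mul__


def exp(x):
    """Exponential of a tape node; the local derivative equals the value."""
    value = math.exp(x.value)
    return Var(value, ((x, value),))


def backward(output):
    """Propagate adjoints from the output to every input in one reverse sweep."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.adjoint = 1.0
    for node in reversed(order):
        for parent, partial in node.parents:
            parent.adjoint += node.adjoint * partial


def portfolio(xs):
    """Toy scalar function of many inputs (illustrative only)."""
    total = 0.0
    for i, x in enumerate(xs):
        total = total + exp(x * xs[(i + 1) % len(xs)]) * (i + 1.0)
    return total


point = [0.1 * (i + 1) for i in range(20)]
inputs = [Var(v) for v in point]
backward(portfolio(inputs))
aad_grad = [x.adjoint for x in inputs]  # all sensitivities from one reverse sweep

# Bumping: one additional valuation per input
h = 1e-6
base = portfolio([Var(v) for v in point]).value
bump_grad = []
for i in range(len(point)):
    bumped = list(point)
    bumped[i] += h
    bump_grad.append((portfolio([Var(v) for v in bumped]).value - base) / h)
print(f"first sensitivity: AAD {aad_grad[0]:.4f}, bumping {bump_grad[0]:.4f}")
```

The bumping loop scales linearly with the number of inputs, which is why it becomes impractical when there are hundreds of forward rates.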
Let $N$ be the number of LIBOR reset dates and $M$ the number of Monte Carlo paths. The algorithm performs $O(N^2 M)$ operations, provided that the sums in the exponent are computed as a cumulative sum in $O(N)$ rather than $O(N^2)$. The memory complexity is $O(NM + N^2)$.
The following graphs show the generation time for 1000 caplet values and for 1 caplet value as a function of the number of Monte Carlo paths.
The execution time per caplet payoff is much longer when a single payoff is generated on its own than when many payoffs are generated at once, because the GPU can calculate many caplet payoffs in parallel. The ratio speaks for itself.
In all the cases we selected to test the approach, the differential neural network provided much better accuracy and a faster convergence rate than the feed-forward network. The sensitivities of an Asian option under a volatility curve are estimated with good accuracy, and we found empirically that the larger the number of volatilities, the stronger the advantage of the differential neural network. We also estimated the second derivative of the payoff by means of the likelihood ratio method. A fast and accurate estimate of gamma is important for gamma trading. We also successfully applied the approach to portfolios of instruments, reducing the dimension of the input risk factors via differential PCA. We considered the Libor Market Model as presented by Prof. Mike Giles: we built a TensorFlow computation graph of one Monte Carlo path for the forward LIBOR rates, obtained the gradients and used them as supervision. This is a new example of how simply a very complicated task can be solved by means of TensorFlow and differential machine learning. The approach can be useful for any interest rate derivative. Finally, we considered worst-of options, which are very popular and for which accurate pricing and Greeks are very important.
- P. S. Hagan, D. Kumar, A. S. Lesniewski and D. E. Woodward. Managing smile risk. Wilmott Magazine, 1:84–108, 2002.
- www.nag.com/numeric/gpus/libor_example_nag.pdf
- M. Giles and P. Glasserman. Smoking adjoints: fast evaluation of Greeks in Monte Carlo calculations. Risk, 2006.
- B. Huge and A. Savine. Differential Machine Learning. arXiv:2005.02347.
- P. Glasserman. Monte Carlo Methods in Financial Engineering, 2004.
- Sensitivity Analysis in the Dupire Local Volatility Model with TensorFlow. arXiv:2002.02481.
- M. Haugh. Estimating the Greeks.
- R. Rebonato and P. Jäckel. The most general methodology to create a valid correlation matrix for risk management and option pricing purposes.