Differential Machine Learning
Our work lies at the junction of two areas. The first is artificial intelligence, in particular deep neural networks, for which Google has developed the TensorFlow framework. The second is the valuation of complex financial instruments traded in the derivatives market. A derivative is a financial contract that gives two parties the right or the obligation to transact in an underlying asset (for example, shares). Under the agreement, one party undertakes to sell, buy, exchange or deliver a certain product or package of securities on the special conditions stipulated in the contract. Such contracts protect against price changes, since the seller is obliged to fulfill the contract at a strictly specified price.
The simplest option is the European option. A European option is a contract that gives its buyer the right, but not the obligation, to buy or sell an underlying asset at a predetermined price at a specified moment in the future. The underlying asset can be, for example, a stock or a currency exchange rate. An option that gives the right to buy the underlying asset is called a call option; an option that gives the right to sell it is called a put option. The strike is the price at which the option gives the right to conclude the deal, and the time fixed in advance in the contract at which the option can be exercised is its expiration. The buyer of the contract must pay a premium to the seller, because the seller undertakes to deliver the underlying asset at a fixed price in the future: if the market price ends up above the strike, the seller loses money by delivering the asset at the lower strike price. Accordingly, the value of a call option at expiration, its payoff, is C = max(S(T) - K, 0), where S(T) is the value of the underlying asset at expiration T and K is the strike. The main challenge in pricing such derivatives is to determine the fair premium. A familiar contract that resembles an option is insurance: the policyholder pays a premium and, under certain conditions, gets the right to receive a payout, which the insurance company undertakes to pay. On exchanges, call and put options are traded with different expiry dates and strikes. Pricing models for them are built on advanced probability theory, stochastic differential equations, martingale measures and partial differential equations, as well as on computationally slow numerical methods such as Monte Carlo simulation.
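A minimal Python sketch of the payoff formula and of the Monte Carlo idea, assuming Black-Scholes dynamics (constant interest rate r and volatility σ, no dividends) and illustrative parameters: the fair premium of a European call is its discounted expected payoff under the risk-neutral measure, which here can be computed in closed form or estimated by averaging simulated discounted payoffs. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def bs_call(s0, k, r, sigma, t):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    n = NormalDist()
    return s0 * n.cdf(d1) - k * math.exp(-r * t) * n.cdf(d2)


def mc_call(s0, k, r, sigma, t, n_paths=100_000, seed=42):
    """Monte Carlo price: average of the discounted payoff max(S(T) - K, 0)."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))  # risk-neutral S(T)
        total += max(s_t - k, 0.0)
    return math.exp(-r * t) * total / n_paths


exact = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
approx = mc_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"closed form: {exact:.4f}, Monte Carlo: {approx:.4f}")
```

The Monte Carlo error only shrinks like 1/√n_paths, which is why accurate prices, and even more so accurate Greeks, are expensive to obtain by simulation.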
The option value is determined by many factors, such as the strike price, time to maturity, volatility, interest rates and dividends. Anyone who buys or sells options is exposed to the risk that the option value changes. Option pricing models make it possible to calculate the impact of a change in any of these factors; these sensitivities of the option value are called the Greeks. Calculating the market value of an option is very challenging. Monte Carlo and similar methods are slow and laborious: they require large computing clusters and expensive infrastructure, hundreds of millions of dollars have been invested in speeding them up, and many challenging tasks remain. Recently, special types of neural networks have been developed that approximate option values and their derivatives (the Greeks) at the same time. Such networks learn not only the values of a function but also its shape, which makes them very useful for calculating Greeks.
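The simplest way to compute Greeks with Monte Carlo is bump-and-revalue. The sketch below, assuming Black-Scholes dynamics, illustrative parameters and hypothetical function names, estimates delta and vega by central finite differences on the same random numbers (common random numbers). Each Greek costs two additional full pricings, so the cost grows with the number of risk factors.

```python
import math
import random


def price_on_normals(s0, k, r, sigma, t, normals):
    """Discounted Monte Carlo call price on a fixed list of standard normal draws."""
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total = sum(max(s0 * math.exp(drift + vol * z) - k, 0.0) for z in normals)
    return math.exp(-r * t) * total / len(normals)


def bump_greeks(s0, k, r, sigma, t, n_paths=100_000, seed=7, ds=1.0, dvol=0.01):
    """Delta and vega by central finite differences with common random numbers.

    Each Greek needs two extra full Monte Carlo pricings.
    """
    rng = random.Random(seed)
    normals = [rng.gauss(0.0, 1.0) for _ in range(n_paths)]
    price = price_on_normals(s0, k, r, sigma, t, normals)
    delta = (price_on_normals(s0 + ds, k, r, sigma, t, normals)
             - price_on_normals(s0 - ds, k, r, sigma, t, normals)) / (2 * ds)
    vega = (price_on_normals(s0, k, r, sigma + dvol, t, normals)
            - price_on_normals(s0, k, r, sigma - dvol, t, normals)) / (2 * dvol)
    return price, delta, vega


price, delta, vega = bump_greeks(100.0, 100.0, 0.05, 0.2, 1.0)  # five pricings in total
```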
This paper presents research on new types of neural networks that significantly speed up fair-price and risk evaluation for various option contracts. The methods described here allow traders to recognize arbitrage opportunities faster than competitors and thus generate profit.
The objective of this internal research was to evaluate and gain experience with the application of two methods used for pricing and sensitivity analysis of exotic financial derivative instruments, namely, adjoint pathwise Monte Carlo and deep learning.
The work was inspired by the publication "Differential Machine Learning" (arXiv:2005.02347) of Danske Bank quantitative analysts Antoine Savine and Brian Huge, in which the authors introduced a novel approach to building extremely efficient pricing and risk approximators for arbitrary financial derivative instruments.
The preliminary results of the use cases selected for evaluation of the approach were discussed with Antoine Savine.
Differential machine learning (ML), presented by Huge and Savine, combines automatic adjoint differentiation (AAD) with deep learning to estimate the value and risk sensitivities of financial derivatives.
Differential ML is a form of supervised learning in which models are trained on datasets (inputs and labels) augmented with the differentials of the labels with respect to the inputs. In the context of financial Derivatives and risk management, pathwise differentials are computed efficiently with automatic adjoint differentiation (AAD). The pathwise estimator is obtained by interchanging the order of differentiation and integration: the derivative of the expected payoff is estimated by the expected derivative of the payoff along each simulated path.
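AAD is reverse-mode automatic differentiation applied to the pricing code. The toy tape below is a hypothetical, minimal stand-in for what TensorFlow does automatically: it records one Black-Scholes path of a discounted call payoff (with an assumed normal draw z and illustrative parameters) and, in a single backward sweep, returns the pathwise derivatives with respect to all inputs, here S(0) and σ.

```python
import math


class Var:
    """A scalar node on the tape: value, parents with local partials, and adjoint."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, d(self)/d(parent))
        self.adjoint = 0.0

    @staticmethod
    def _lift(x):
        return x if isinstance(x, Var) else Var(float(x))

    def __add__(self, other):
        other = Var._lift(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        other = Var._lift(other)
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = Var._lift(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    __radd__ = __add__
    __rmul__ = __mul__


def exp(x):
    e = math.exp(x.value)
    return Var(e, ((x, e),))


def call_intrinsic(x):
    """max(x, 0); the derivative at the kink x == 0 is taken to be 0."""
    return Var(max(x.value, 0.0), ((x, 1.0 if x.value > 0.0 else 0.0),))


def backward(output):
    """Reverse sweep: visit nodes from the output to the inputs, accumulating adjoints."""
    order, seen = [], set()

    def visit(node):  # depth-first post-order gives a topological order
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.adjoint = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.adjoint += local * node.adjoint


def path_payoff(s0, sigma, r=0.05, t=1.0, k=100.0, z=0.3):
    """Discounted call payoff on one Black-Scholes path driven by the normal draw z."""
    log_growth = sigma * sigma * (-0.5 * t) + r * t + sigma * (math.sqrt(t) * z)
    s_t = s0 * exp(log_growth)
    return call_intrinsic(s_t - k) * math.exp(-r * t)


s0, sigma = Var(100.0), Var(0.2)
y = path_payoff(s0, sigma)
backward(y)  # one sweep yields dY/dS0 and dY/dsigma
print(y.value, s0.adjoint, sigma.adjoint)
```

The adjoints coincide with the analytic pathwise derivatives e^{-rT} 1{S(T)>K} S(T)/S(0) and e^{-rT} 1{S(T)>K} S(T)(√T z - σT); the cost of the backward sweep is a small multiple of the forward pass, whatever the number of inputs.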
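For example, consider the simplest situation: a European call in the Black-Scholes framework. While an explicit expression for the option delta is available, we can also estimate it with the pathwise method. We first write the discounted option payoff as

$$Y = e^{-rT}\max\big(S(T)-K,\,0\big), \qquad S(T) = S(0)\exp\Big(\big(r-\tfrac{1}{2}\sigma^2\big)T + \sigma\sqrt{T}\,Z\Big), \quad Z\sim N(0,1), \tag{1}$$

so that, by the chain rule,

$$\frac{\partial Y}{\partial S(0)} = \frac{\partial Y}{\partial S(T)}\,\frac{\partial S(T)}{\partial S(0)}, \qquad \frac{\partial Y}{\partial S(T)} = e^{-rT}\,\mathbf{1}\{S(T)>K\}, \tag{2}$$

$$\frac{\partial S(T)}{\partial S(0)} = \frac{S(T)}{S(0)}. \tag{3}$$

It follows from (2) and (3) that

$$\frac{\partial Y}{\partial S(0)} = e^{-rT}\,\frac{S(T)}{S(0)}\,\mathbf{1}\{S(T)>K\}. \tag{4}$$

The estimator (4) is easily calculated via a Monte Carlo simulation: its average over simulated paths is an unbiased estimate of the option delta.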
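Estimator (4) can be checked numerically. A minimal sketch, assuming the Black-Scholes setting above with illustrative parameters and hypothetical function names: averaging e^{-rT} 1{S(T)>K} S(T)/S(0) over simulated paths should reproduce the closed-form delta N(d1).

```python
import math
import random
from statistics import NormalDist


def pathwise_delta(s0, k, r, sigma, t, n_paths=100_000, seed=1):
    """Average of the pathwise estimator exp(-rT) * S(T)/S(0) * 1{S(T) > K}."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        if s_t > k:  # the indicator kills out-of-the-money paths
            total += s_t / s0
    return math.exp(-r * t) * total / n_paths


def bs_delta(s0, k, r, sigma, t):
    """Closed-form Black-Scholes call delta N(d1)."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    return NormalDist().cdf(d1)


estimate = pathwise_delta(100.0, 100.0, 0.05, 0.2, 1.0)
exact = bs_delta(100.0, 100.0, 0.05, 0.2, 1.0)
```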
The foundation of differential ML is the twin network. A twin network combines two networks into a single representation that computes a prediction (the approximate price) together with its differentials with respect to the inputs (the approximate risk sensitivities). The first half of the twin network predicts the value. The second half predicts the risk sensitivities: it is the mirror image of the first half, with shared connection weights, and back-propagates through it to differentiate the prediction with respect to the inputs.
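A minimal sketch of a twin network, assuming a single input, one softplus hidden layer and random (untrained) weights; the class and function names are hypothetical, and a practical network would be deeper and trained with an automatic differentiation framework such as TensorFlow. The backward half reuses the forward weights to output the derivative, which can be checked against a finite difference, and the differential loss penalizes errors in both values and differentials.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + e^x); a smooth activation keeps the network differentiable."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    """Derivative of softplus, written to avoid overflow for large |x|."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class TwinNet:
    """y(x) = sum_h w2[h] * softplus(w1[h] * x + b1[h]) + b2, plus its mirrored derivative."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.b1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.w2 = [rng.gauss(0.0, 1.0) / hidden for _ in range(hidden)]
        self.b2 = 0.0

    def predict(self, x):
        """Return (value, dvalue/dx)."""
        # First half: ordinary feed-forward pass (approximate price).
        z = [w * x + b for w, b in zip(self.w1, self.b1)]
        y = sum(v * softplus(zi) for v, zi in zip(self.w2, z)) + self.b2
        # Second half: the same weights traversed backwards (approximate delta):
        # dy/dx = sum_h w2[h] * softplus'(z[h]) * w1[h], with softplus' = sigmoid.
        dy_dx = sum(v * sigmoid(zi) * w for v, zi, w in zip(self.w2, z, self.w1))
        return y, dy_dx


def differential_loss(net, xs, ys, dys, lam=1.0):
    """Mean squared error on values plus lam times mean squared error on differentials."""
    value_err = diff_err = 0.0
    for x, y, dy in zip(xs, ys, dys):
        pred, dpred = net.predict(x)
        value_err += (pred - y) ** 2
        diff_err += (dpred - dy) ** 2
    return (value_err + lam * diff_err) / len(xs)


net = TwinNet()
value, delta = net.predict(0.5)
h = 1e-6
finite_diff = (net.predict(0.5 + h)[0] - net.predict(0.5 - h)[0]) / (2 * h)
```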
The strength of the approach is its fast learning. When learning Derivatives pricing and risk approximations, the main computational load is the simulation of the training set. For complex products, prices are computed numerically, generally by Monte Carlo, and labeling every training example with its own Monte Carlo price would have an unrealistic cost in practice. In this approach, the whole training set is produced for the computation cost of one Monte Carlo pricing: each example is a single sample of the payoff, simulated for the cost of one Monte Carlo path.
As a result, learning time is reduced dramatically and training sets can be simulated in real time.
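Producing a whole training set for the cost of one Monte Carlo pricing can be sketched as follows, assuming a European call under Black-Scholes and initial spots drawn uniformly from an illustrative range (the function name and parameters are hypothetical). Each example is one simulated path, labelled with its discounted payoff and its pathwise delta. Individual labels are noisy, but their conditional expectation given the input is the true price (respectively delta), so a regression on them learns the pricing function.

```python
import math
import random


def simulate_training_set(n, k=100.0, r=0.05, sigma=0.2, t=1.0,
                          spot_range=(50.0, 150.0), seed=3):
    """One Monte Carlo path per example: (S0 input, payoff label, pathwise delta label)."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    disc = math.exp(-r * t)
    xs, ys, dys = [], [], []
    for _ in range(n):
        s0 = rng.uniform(*spot_range)
        growth = math.exp(drift + vol * rng.gauss(0.0, 1.0))  # S(T) / S(0) on this path
        s_t = s0 * growth
        xs.append(s0)
        ys.append(disc * max(s_t - k, 0.0))
        dys.append(disc * growth if s_t > k else 0.0)  # d payoff / d S(0)
    return xs, ys, dys


# Training set for a differential network: 10,000 examples, one path each.
xs, ys, dys = simulate_training_set(10_000)

# Sanity check: at a fixed spot, averaging the noisy labels approaches the
# Black-Scholes price (about 10.45) and delta (about 0.637).
_, y_fixed, dy_fixed = simulate_training_set(100_000, spot_range=(100.0, 100.0))
mean_price = sum(y_fixed) / len(y_fixed)
mean_delta = sum(dy_fixed) / len(dy_fixed)
```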
Another strength is accurate real-time pricing of options and their Greeks: computing the Greeks costs little more than evaluating a closed-form solution. The methodology is applicable to arbitrary Derivatives instruments under arbitrary stochastic models of the underlying market variables.
Differential machine learning, combined with AAD, provides extremely effective pricing and risk approximations. We can produce fast pricing analytics in models too complex for closed-form solutions, extract the risk factors of complex transactions and trading books, and efficiently compute risk management metrics such as risk reports across a large number of scenarios, backtesting and hedge strategy simulations, or regulatory calculations such as XVA, CCR and FRTB. XVA calculations are very computationally intensive. The standard solution has been to perform them at a lower resolution (for example, by running fewer Monte Carlo scenarios) to reduce computation time, but XVA then suffers from noisy P&L and unstable sensitivities. Differential machine learning can significantly speed up in-house models.
This methodology can also be used in high-dimensional cases such as large portfolios. PCA helps to reduce the dimension, and differential PCA helps to train neural networks more effectively by minimizing the network size and the number of inputs. The approach can also be useful for execution algorithms, because Greeks and duration can easily be calculated for various complex products.
Cases selected to evaluate the approach
- Valuation of Asian options with arithmetic averaging and their Greeks (delta and vega) in a model with constant volatility.
- Valuation of Asian options with arithmetic averaging and their Greeks (delta and vega) with an arbitrary volatility curve.
- Valuation of an option written on a basket of correlated stocks, including gamma, the second derivative of the option value.
- Valuation of a callable bond and its duration.
- Differential PCA for large portfolios of correlated instruments (250, 500 and 1000 instruments).
- LIBOR Market Model: pricing options whose payoffs depend on LIBOR rates.
- Pricing of an option and its vega in the SABR stochastic volatility model.
- Valuation of worst-of options and their Greeks for a basket of n correlated instruments.
1. Valuation of Asian options and their Greeks (delta and vega). In this experiment, we treat volatility as a constant parameter. The differential neural network gives better accuracy than a feed-forward network, especially for vega.
2. Valuation of Asian options and their Greeks with a volatility curve. Here we model volatility with a volatility curve, which is very important for practitioners: we allow different volatilities for different averaging periods. The volatility curve can be extracted from the option chain. In this setting, the option value has a sensitivity to each volatility on the curve. We found that the larger the number of volatilities, the stronger the advantage of the differential neural network: it outperforms the feed-forward network in high-dimensional tasks.
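The pathwise Greeks in the volatility-curve case can be written down directly. A minimal sketch, assuming Black-Scholes dynamics with a piecewise-constant volatility (one value per averaging period, equally spaced averaging dates), illustrative parameters and hypothetical names: each path yields the payoff, the delta and one vega per volatility bucket, i.e. the kind of differential labels used to train the network. With a flat curve, the bucket vegas add up to the single vega of the constant-volatility case, by the chain rule.

```python
import math
import random


def asian_price_and_greeks(s0, k, r, vols, t, normals):
    """Arithmetic-average Asian call with one volatility per averaging period.

    Averaging dates t_1..t_m are equally spaced on (0, t]; vols[j] applies to the
    j-th period and normals[p][j] is the shock of path p in that period.
    Returns (price, delta, vegas) as averages of pathwise estimators.
    """
    m = len(vols)
    dt = t / m
    sq = math.sqrt(dt)
    disc = math.exp(-r * t)
    price = delta = 0.0
    vegas = [0.0] * m
    for z in normals:
        s, path = s0, []
        for j in range(m):
            s *= math.exp((r - 0.5 * vols[j] ** 2) * dt + vols[j] * sq * z[j])
            path.append(s)
        avg = sum(path) / m
        if avg <= k:
            continue  # out of the money: payoff and all pathwise Greeks are zero
        price += disc * (avg - k)
        delta += disc * avg / s0  # every S(t_i) is proportional to S(0)
        tail = 0.0  # running sum of S(t_i) over dates i >= j
        for j in range(m - 1, -1, -1):
            tail += path[j]
            # dS(t_i)/dvols[j] = S(t_i) * (-vols[j] * dt + sqrt(dt) * z[j]) for i >= j
            vegas[j] += disc * tail * (-vols[j] * dt + sq * z[j]) / m
    n = len(normals)
    return price / n, delta / n, [v / n for v in vegas]


rng = random.Random(11)
vols = [0.15, 0.20, 0.25, 0.30]  # illustrative volatility curve, one value per period
normals = [[rng.gauss(0.0, 1.0) for _ in vols] for _ in range(50_000)]
price, delta, vegas = asian_price_and_greeks(100.0, 100.0, 0.03, vols, 1.0, normals)
```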
3. Valuation of an option written on n correlated stocks. We estimated gamma, the second derivative of the option value with respect to the underlying. Gamma is used in options trading, so this Greek must be computed both quickly and accurately. The pathwise method alone cannot deliver it: the pathwise delta already contains the indicator 1(x>K), which is discontinuous and cannot be differentiated along the path a second time. We therefore combine the pathwise and likelihood ratio methods. In contrast to the pathwise method, the likelihood ratio method differentiates the probability density with respect to the parameter of interest, which makes it a good alternative to the pathwise method when the payoff is not continuous. We rewrite the indicator as

$$\mathbf{1}(x>K) = f_\varepsilon(x) + h_\varepsilon(x), \qquad h_\varepsilon(x) = \mathbf{1}(x>K) - f_\varepsilon(x).$$

Here $f_\varepsilon(x)$ is a piecewise linear approximation to the function 1(x>K), and $h_\varepsilon(x)$ corrects the approximation. We apply the pathwise estimator to $f_\varepsilon(x)$ and the likelihood ratio estimator to $h_\varepsilon(x)$. We report results for the 5-dimensional case: the differential neural network is more accurate than the feed-forward neural network.
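The decomposition can be illustrated in one dimension. A minimal sketch, assuming a single asset under Black-Scholes rather than the basket (extending it requires the score of the multivariate density), f_ε a linear ramp of half-width eps around K, illustrative parameters and hypothetical names. Starting from the pathwise delta e^{-rT} 1{S(T)>K} S(T)/S(0), the ramp part is differentiated pathwise, giving e^{-rT} f_ε'(S(T)) (S(T)/S(0))², and the correction part with the lognormal score z/(S(0)σ√T), giving e^{-rT} h_ε(S(T)) S(T)/S(0)² (z/(σ√T) - 1). The average is compared with the closed-form gamma φ(d1)/(S(0)σ√T).

```python
import math
import random
from statistics import NormalDist


def mixed_gamma(s0, k, r, sigma, t, eps=5.0, n_paths=100_000, seed=5):
    """Call gamma: pathwise estimator on f_eps, likelihood ratio estimator on h_eps."""
    rng = random.Random(seed)
    a = sigma * math.sqrt(t)
    drift = (r - 0.5 * sigma ** 2) * t
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp(drift + a * z)
        # f_eps: linear ramp from 0 at K - eps to 1 at K + eps
        f = min(1.0, max(0.0, (s_t - k + eps) / (2.0 * eps)))
        f_prime = 1.0 / (2.0 * eps) if abs(s_t - k) < eps else 0.0
        h = (1.0 if s_t > k else 0.0) - f  # correction term h_eps
        pathwise = f_prime * (s_t / s0) ** 2
        likelihood = h * s_t / s0 ** 2 * (z / a - 1.0)
        total += pathwise + likelihood
    return math.exp(-r * t) * total / n_paths


def bs_gamma(s0, k, r, sigma, t):
    """Closed-form Black-Scholes gamma phi(d1) / (S0 * sigma * sqrt(T))."""
    a = sigma * math.sqrt(t)
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / a
    return NormalDist().pdf(d1) / (s0 * a)


estimate = mixed_gamma(100.0, 100.0, 0.05, 0.2, 1.0)
exact = bs_gamma(100.0, 100.0, 0.05, 0.2, 1.0)
```

Both parts vanish outside the band (K - eps, K + eps), so the estimator only spends variance on paths that end near the strike.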
4. We valued a callable bond and estimated its duration. Fast estimation of duration is important for hedging. For the interest-rate dynamics, we assumed the Bachelier (normal) model.
The differential neural network captures the behavior of the duration more accurately.
5. We validated the approach on large portfolios of 250, 500 and 1000 instruments. Learning Greeks directly for large portfolios is computationally very expensive, so we validated the differential PCA methodology. As a data preparation step, differential PCA removes irrelevant factors and can considerably reduce the dimension, enabling faster and more reliable training of neural networks. It is also a useful algorithm in its own right, providing a low-dimensional latent representation of the data on orthogonal axes of relevance. In our case, it reduced the dimension dramatically. We then trained differential neural networks in the reduced space, on 100K paths, for portfolios of 250, 500 and 1000 instruments.
In all cases, the Greeks produced by differential machine learning are more accurate than those of the feed-forward network.
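A simplified sketch of the idea behind differential PCA, assuming ten independent lognormal assets, a basket call, zero rates and illustrative sizes (all names hypothetical; the input and output normalization a full implementation would need is omitted). The second moment of the differentials, C = E[∇y ∇y^T], measures how strongly the label varies along each direction of the inputs; directions with negligible eigenvalues can be dropped. Since the basket payoff depends on the spots almost only through their average, a single eigenvector, close to (1, ..., 1)/√d, carries most of C.

```python
import math
import random


def basket_differentials(n, d=10, k=100.0, sigma=0.2, t=1.0, seed=9):
    """Pathwise differentials of max(mean_i S_i(T) - K, 0) with respect to each S_i(0).

    Independent lognormal assets with zero rates; initial spots drawn at random.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        s0 = [rng.uniform(80.0, 120.0) for _ in range(d)]
        growth = [math.exp(-0.5 * sigma ** 2 * t + sigma * math.sqrt(t) * rng.gauss(0.0, 1.0))
                  for _ in range(d)]
        basket = sum(s * g for s, g in zip(s0, growth)) / d
        # d payoff / d S_i(0) = 1{basket > K} * growth_i / d
        diffs.append([g / d for g in growth] if basket > k else [0.0] * d)
    return diffs


def second_moment(diffs):
    """C = (1/n) * sum over examples of g g^T, where g is the vector of differentials."""
    d = len(diffs[0])
    c = [[0.0] * d for _ in range(d)]
    for g in diffs:
        for i in range(d):
            for j in range(d):
                c[i][j] += g[i] * g[j]
    return [[v / len(diffs) for v in row] for row in c]


def top_eigenpair(c, iters=200, seed=0):
    """Leading eigenvalue and eigenvector of a symmetric PSD matrix by power iteration."""
    d = len(c)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(c[i][j] * v[j] for j in range(d)) for i in range(d))
    return lam, v


diffs = basket_differentials(5_000)
c = second_moment(diffs)
lam, v = top_eigenpair(c)
explained = lam / sum(c[i][i] for i in range(len(c)))  # share of the trace in one direction
print(f"leading direction explains {explained:.1%} of the differential variation")
```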
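6. We considered the LIBOR Market Model, following the example of Prof. Mike Giles (Mathematical Institute, University of Oxford). Forward LIBOR rates $L_n$ with accrual periods $\delta_n$ and volatilities $\sigma_n$ are simulated with a discretized (log-Euler) Monte Carlo scheme with time step $h$ and standard normal shocks $Z_{i+1}$, for $i < n$:

$$L_n(t_{i+1}) = L_n(t_i)\exp\Big(\big(\mu_n(t_i) - \tfrac{1}{2}\sigma_n^2\big)h + \sigma_n\sqrt{h}\,Z_{i+1}\Big), \qquad \mu_n(t_i) = \sum_{j=i+1}^{n}\frac{\sigma_n\sigma_j\delta_j L_j(t_i)}{1+\delta_j L_j(t_i)}. \tag{6}$$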
With equation (6), we can simulate forward LIBOR rates at particular time points and price options whose payoffs depend on LIBOR rates. For each Monte Carlo scenario, we build a TensorFlow calculation graph expressing the payoff as a function of the initial forward LIBOR rates. Here we consider a caplet, but the payoff can easily be replaced by another one. Because the whole LIBOR model calculation is a graph, we obtain its gradients (the Greeks) with respect to the initial rates and use them to construct a training set for the differential neural network. In this case too, the differential neural network outperforms the feed-forward network.
Figure: values of the caplet.
Figure: Greeks of the caplet.
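A minimal sketch of such a simulation for a single caplet, assuming a one-factor log-Euler scheme under the spot LIBOR measure with the time step equal to the accrual period δ, constant volatilities, a flat initial curve and illustrative parameters (this is our reading of the scheme behind equation (6); all names are hypothetical). In the study, gradients came from the TensorFlow graph; here a small bump with common random numbers stands in for them. The Monte Carlo price can be checked against Black's caplet formula.

```python
import math
import random
from statistics import NormalDist


def caplet_path(l0, vol, delta, strike, m, z):
    """Discounted payoff of a caplet on L_m along one path of a one-factor log-Euler LMM.

    l0[n]: initial forward rate L_n(0), n = 0..m; vol[n]: its volatility;
    delta: accrual period, also used as the time step; z: m standard normals.
    """
    rates = list(l0)
    for i in range(m):  # step t_i -> t_{i+1}; rates with index <= i are already fixed
        shock = math.sqrt(delta) * z[i]
        drift_sum = 0.0
        for n in range(i + 1, m + 1):
            # spot-measure drift mu_n = vol_n * sum_{j=i+1..n} delta vol_j L_j / (1 + delta L_j),
            # accumulated with start-of-step rates
            drift_sum += delta * vol[n] * rates[n] / (1.0 + delta * rates[n])
            mu = vol[n] * drift_sum
            rates[n] *= math.exp((mu - 0.5 * vol[n] ** 2) * delta + vol[n] * shock)
    numeraire = 1.0  # rolled-over spot LIBOR account at the payment date t_{m+1}
    for n in range(m + 1):
        numeraire *= 1.0 + delta * rates[n]
    return delta * max(rates[m] - strike, 0.0) / numeraire


def caplet_mc(l0, vol, delta, strike, m, n_paths=40_000, seed=21):
    """Monte Carlo caplet price; a fixed seed gives common random numbers across calls."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        total += caplet_path(l0, vol, delta, strike, m, z)
    return total / n_paths


def caplet_black(l0, vol, delta, strike, m):
    """Black's caplet formula with constant volatility, for comparison."""
    bond = 1.0
    for n in range(m + 1):
        bond /= 1.0 + delta * l0[n]  # zero-coupon bond maturing at t_{m+1}
    s = vol[m] * math.sqrt(m * delta)
    d1 = (math.log(l0[m] / strike) + 0.5 * s * s) / s
    nd = NormalDist()
    return bond * delta * (l0[m] * nd.cdf(d1) - strike * nd.cdf(d1 - s))


m, delta, strike = 4, 0.25, 0.05
l0, vol = [0.05] * (m + 1), [0.2] * (m + 1)
mc = caplet_mc(l0, vol, delta, strike, m)
black = caplet_black(l0, vol, delta, strike, m)

# Sensitivity to L_m(0) by a small bump with the same random numbers
# (a stand-in for the automatic-differentiation gradients used in the study).
bump = 1e-5
up, down = list(l0), list(l0)
up[m] += bump
down[m] -= bump
sens_lm = (caplet_mc(up, vol, delta, strike, m) - caplet_mc(down, vol, delta, strike, m)) / (2 * bump)
```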
7. We also worked with the SABR model (Hagan et al.), a stochastic volatility model. The SABR model helps to explain the volatility smile and to resolve the problem of unstable hedges.
Here again, the accuracy of the differential neural network is higher.
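For reference, SABR models a forward F and its volatility α as dF = α F^β dW₁ and dα = ν α dW₂, with correlation ρ between W₁ and W₂. A minimal Monte Carlo sketch, assuming an Euler step for F with absorption at zero, an exact lognormal step for α, an undiscounted payoff on the forward and illustrative parameters (names hypothetical). With ν = 0 and β = 1 the model reduces to Black's model, which provides a sanity check.

```python
import math
import random
from statistics import NormalDist


def sabr_call_mc(f0, k, t, alpha, beta, nu, rho, n_steps=50, n_paths=10_000, seed=13):
    """Undiscounted call on the forward under SABR dynamics."""
    rng = random.Random(seed)
    dt = t / n_steps
    sq = math.sqrt(dt)
    rho_bar = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(n_paths):
        f, a = f0, alpha
        for _ in range(n_steps):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rho * z1 + rho_bar * rng.gauss(0.0, 1.0)  # shock correlated with z1
            f_new = f + a * f ** beta * sq * z1  # Euler step for the forward
            a *= math.exp(nu * sq * z2 - 0.5 * nu * nu * dt)  # exact step for alpha
            f = max(f_new, 0.0)  # absorb the forward at zero
            if f == 0.0:
                break
        total += max(f - k, 0.0)
    return total / n_paths


mc = sabr_call_mc(100.0, 100.0, 1.0, alpha=0.2, beta=1.0, nu=0.0, rho=0.0)
nd = NormalDist()
black = 100.0 * (nd.cdf(0.1) - nd.cdf(-0.1))  # Black price for F = K = 100, vol 0.2, T = 1
```

Setting nu > 0 and rho != 0 switches on the stochastic volatility and its correlation with the forward, which is what produces the smile.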
8. We also considered worst-of options on a basket of n correlated stocks, here with 5 stocks. The differential neural network works better than the feed-forward network.
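A minimal sketch of a worst-of call and its pathwise deltas, assuming the payoff max(min_i S_i(T) - K, 0) on terminal prices, correlated Black-Scholes assets generated through a Cholesky factor of the correlation matrix, and illustrative parameters (names hypothetical). On each path only the worst-performing asset carries delta, and the worst-of call can never be worth more than a vanilla call on any single asset of the basket, which the usage compares on the same paths.

```python
import math
import random


def cholesky(corr):
    """Lower-triangular L with L L^T = corr (corr must be positive definite)."""
    n = len(corr)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = corr[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def worst_of_call(s0, k, r, sigma, corr, t, n_paths=20_000, seed=17):
    """Price and pathwise deltas of max(min_i S_i(T) - K, 0) under correlated Black-Scholes.

    Also returns a vanilla call on asset 0 priced on the same paths, for comparison.
    """
    rng = random.Random(seed)
    low = cholesky(corr)
    n = len(s0)
    disc = math.exp(-r * t)
    price = vanilla = 0.0
    deltas = [0.0] * n
    for _ in range(n_paths):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [sum(low[i][p] * g[p] for p in range(i + 1)) for i in range(n)]  # correlated normals
        s_t = [s0[i] * math.exp((r - 0.5 * sigma[i] ** 2) * t + sigma[i] * math.sqrt(t) * z[i])
               for i in range(n)]
        worst = min(range(n), key=lambda i: s_t[i])
        vanilla += disc * max(s_t[0] - k, 0.0)
        if s_t[worst] > k:
            price += disc * (s_t[worst] - k)
            # only the worst asset moves the payoff: d payoff / d S_w(0) = disc * S_w(T) / S_w(0)
            deltas[worst] += disc * s_t[worst] / s0[worst]
    return price / n_paths, [d / n_paths for d in deltas], vanilla / n_paths


s0 = [100.0] * 5
sigma = [0.20, 0.25, 0.30, 0.20, 0.35]  # illustrative volatilities
corr = [[1.0 if i == j else 0.5 for j in range(5)] for i in range(5)]
price, deltas, vanilla = worst_of_call(s0, 95.0, 0.02, sigma, corr, 1.0)
```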
In all the cases we selected to test the approach, the differential neural network provided much better accuracy and a faster convergence rate than the feed-forward network. The sensitivities of Asian options with volatility curves are estimated with good accuracy, and we found empirically that the larger the number of volatilities, the stronger the advantage of the differential neural network. We also estimated the second derivative of the option value (gamma) by combining the pathwise and likelihood ratio methods; a fast and accurate estimate of gamma is important for gamma trading. We successfully applied the approach to portfolios of instruments, reducing the dimension of the input risk factors with differential PCA. For the LIBOR Market Model example of Prof. Mike Giles, we built a TensorFlow calculation graph of one Monte Carlo path of forward LIBOR rates, obtained its gradients and used them as training labels. This is a new example of how easily very complicated tasks can be solved with TensorFlow and differential machine learning, and the approach can be useful for any interest rate derivative. Finally, we considered worst-of options, which are very popular, so accurately pricing them and their Greeks is very important.
Planned next steps:
- To gain experience with TPUs (Tensor Processing Units).
- To build a use case for XVA calculations (portfolios of hundreds of thousands of financial derivatives).
- To move to the cloud (Google, AWS, Azure).
- To build a front-end prototype for the application.
- P. S. Hagan, D. Kumar, A. S. Lesniewski, and D. E. Woodward. Managing smile risk. Wilmott Magazine, 1:84–108, 2002.
- NAG. LIBOR example.
- M. Giles and P. Glasserman. Smoking adjoints: Fast evaluation of Greeks in Monte Carlo calculations. Risk, 2006.
- B. Huge and A. Savine. Differential machine learning. arXiv:2005.02347, 2020.
- P. Glasserman. Monte Carlo Methods in Financial Engineering. Springer, 2004.
- Sensitivity analysis in the Dupire local volatility model with TensorFlow. arXiv:2002.02481, 2020.