For a single tick, the computation time required for the main procedures is recorded in Table 8. In addition to the algorithmic calculations, we reserve time for some mechanical order-related activities, such as order submission and execution in exchanges. The Chinese A-share market can satisfy this tick-time condition with its update frequency of 3 s. Our empirical study shows that our deep LOB trading system is effective in the context of the Chinese market, which will encourage its use by other traders.


Sadighian develops a framework for cryptocurrency MM employing two advanced policy gradient-based RL algorithms and a state space comprising both LOB data and order flow arrival statistics. Gašperov and Kostanjčar present a framework underpinned by ideas from adversarial RL and neuroevolution, with experimental results demonstrating its superior reward-to-risk performance. Other notable approaches introduce additional features such as the presence of a dark pool, multi-asset MM in corporate bonds, and dealer markets. When the state space is too large for a tabular representation of the Q-values to be practical, a deep Q-network (DQN) approximates the Q(s, a) matrix using a deep neural network. The DQN computes an approximation of the Q-values as a function, Q(s, a, θ), of a parameter vector, θ, of tractable size.
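As a rough illustration of what approximating the Q-values with a function Q(s, a, θ) means, the sketch below uses a tiny one-hidden-layer network written in plain Python. The layer sizes, the initialisation and all function names are assumptions made for this example; they are not taken from any of the works cited above.

```python
import random


def init_theta(n_state, n_hidden, n_actions, seed=0):
    """Build a small parameter set theta (hypothetical layout: two dense layers)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": matrix(n_hidden, n_state),
        "b1": [0.0] * n_hidden,
        "W2": matrix(n_actions, n_hidden),
        "b2": [0.0] * n_actions,
    }


def q_values(state, theta):
    """Return the approximate Q(s, a, theta) for every action a, given state s."""
    # Hidden layer with ReLU activation.
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, state)) + b)
        for row, b in zip(theta["W1"], theta["b1"])
    ]
    # Linear output layer: one Q-value per action.
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(theta["W2"], theta["b2"])
    ]


def greedy_action(state, theta):
    """Pick the action with the largest approximate Q-value."""
    q = q_values(state, theta)
    return max(range(len(q)), key=q.__getitem__)


def n_parameters(theta):
    """Count the entries of theta, which stays fixed however many states exist."""
    weights = sum(len(row) for key in ("W1", "W2") for row in theta[key])
    return weights + len(theta["b1"]) + len(theta["b2"])
```

The point of the sketch is that the number of parameters depends only on the network shape, not on the number of distinct states, which is what keeps θ at a tractable size.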

This is not what the algorithm will use as its proposed bid and ask prices. Any advantage gained from observing the limit order book can dissipate quickly as execution latency increases. Such techniques might be used to further improve generalization and robustness under model uncertainty. Pulling all of that together was mathematically complicated because client flows are discrete while trading on liquidity pools is continuous.

  • Closing_time – Here, you set how long each “trading session” will take.
  • Combining a deep Q-network (see Section 4.1.7) with a convolutional neural network, Juchli achieved improved performance over previous benchmarks.
  • Cricket teams are ranked to indicate their supremacy over their counter peers in order to get precedence.
  • In comparison, both the mean and the standard deviation of the Max DD for the Alpha-AS models were very high.

Likert-type scales generate ordinal variables made up of a set of rank-ordered items. Since the distance between two consecutive items can be neither defined nor presumed equal, variables of this kind cannot be analysed with statistical methods defined on a metric space or with parametric tests. Therefore, Likert-type variables cannot be used as segmentation variables in a traditional cluster analysis unless they are transformed first. In this context, fuzzy numbers have been suggested as a way to recode Likert-type variables. Fuzzy numbers are defined by a membership function whose form is usually determined by an expert. In practice, researchers usually define one membership function for each Likert-type scale, without considering the particular characteristics of either the questions or the respondents.
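To make the idea of a membership function concrete, here is a minimal sketch that assumes a triangular shape, a common choice for fuzzy numbers. The text only says the form is usually set by an expert, so both the triangular shape and the item-to-triangle mapping in `likert_to_fuzzy` are hypothetical.

```python
def triangular_membership(x, a, b, c):
    """Membership degree of x in the triangular fuzzy number (a, b, c), a <= b <= c."""
    if x == b:
        return 1.0  # the peak always has full membership
    if x <= a or x >= c:
        return 0.0  # outside the support
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (c - x) / (c - b)  # falling edge


def likert_to_fuzzy(item):
    """Hypothetical recoding: Likert item k becomes the triangle (k - 1, k, k + 1)."""
    return (item - 1, item, item + 1)
```

For example, `triangular_membership(2.5, *likert_to_fuzzy(3))` evaluates to 0.5, meaning the value 2.5 belongs halfway to the fuzzy number that recodes item 3.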

Learning from imbalanced data

Together, a) and b) result in a set of 2 × 10^d contiguous buckets of width 10^(−d), ranging from −1 to 1, for each of the features defined in relative terms. Approximately 80% of their values lie in the interval [−0.1, 0.1], while roughly 10% lie outside the [−1, 1] interval. Very large values can have a disproportionately strong influence on the statistical normalisation of all values before they are fed to the neural networks. By trimming the values to the [−1, 1] interval we limit the influence of this minority of values. The price to pay is a loss of nuance when learning from very large values, in exchange for a higher sensitivity to the majority, which are much smaller.
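The trimming and bucketing described above can be sketched as follows. This is only an illustration: it assumes equal-width, half-open buckets indexed from 0 to 2 × 10^d − 1, and the function name and the default d = 1 are choices made for the example.

```python
import math


def clip_and_bucket(x, d=1):
    """Trim x to [-1, 1] and return (trimmed value, bucket index).

    Buckets have width 10**-d, so there are 2 * 10**d of them across [-1, 1].
    """
    clipped = max(-1.0, min(1.0, x))
    buckets_per_unit = 10 ** d
    index = math.floor((clipped + 1.0) * buckets_per_unit)
    # The right endpoint x = 1 is assigned to the last bucket.
    index = min(index, 2 * buckets_per_unit - 1)
    return clipped, index
```

With d = 1 there are 20 buckets, so a value of 0.05 lands in bucket 10 while any value above 1 is trimmed to 1 and lands in bucket 19.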

Nevertheless, it still assumes constant order arrival intensities, meaning that any effects of self- or mutual excitation and inhibition between various types of order arrivals remain unaccounted for. In Section 2, we introduce some basic concepts and describe the input LOB datasets. An innovative feature of the model is the segmentation of clients into tiers, which allows it to capture their response to price changes more accurately. “In this paper, we offer a way to separate clients into tiers in a purely quantitative way, by analysing their trading flow.”

In order to see the time evolution of the process for larger inventory bounds. This part intends to show the numerical experiments and the behaviour of the market maker under the results given in Sect. Similar to the proof of Proposition 2, the optimal spreads can be found from the first-order optimality conditions. For instance, even after comments about reference formatting, some references have missing publications, years, issues, or even author names. Also, a large number of arXiv or SSRN preprints are listed for references that have actually been published, either as working papers by some institutions or even in peer-reviewed journals. Some of these will most likely be handled by the editorial team, but the extent of the errors is too large, evidently because the revisions made by the authors were mostly superficial.

The genetic algorithm selects the best-performing values found for the Gen-AS parameters on the corresponding day of data. This procedure helps establish AS parameter values that fit initial market conditions. The same set of parameters obtained for the Gen-AS model is used to specify the initial Alpha-AS models. The goal of this approach is to offer a fair comparison of the former with the latter. Because the models are trained with full-day backtests on real data that respect real-time activity latencies, they are readily adaptable for use in a real market trading environment.
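A genetic parameter search of this general kind can be sketched in a few lines. The version below is a generic illustration, not the procedure used for the Gen-AS model: the selection, crossover and mutation rules, the default settings and the `fitness` callable are all assumptions. In the setting described above, `fitness` would be the result of a backtest on a day of data; here it is left to the caller.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_scale=0.1, seed=0):
    """Return the best chromosome (list of parameter values) found.

    fitness: callable scoring a chromosome, higher is better.
    bounds: list of (low, high) pairs, one per parameter.
    """
    rng = random.Random(seed)

    def random_chromosome():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def crossover(a, b):
        # Uniform crossover: each gene comes from one of the two parents.
        return [rng.choice(pair) for pair in zip(a, b)]

    def mutate(chromosome):
        # Gaussian perturbation, clamped back into the allowed range.
        return [
            min(hi, max(lo, gene + rng.gauss(0.0, mutation_scale * (hi - lo))))
            for gene, (lo, hi) in zip(chromosome, bounds)
        ]

    population = [random_chromosome() for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # keep the fitter half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
        best = max(population + [best], key=fitness)
    return best
```

Because the fitter half of each generation is carried over unchanged, the best score never decreases from one generation to the next.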

The Avellaneda & Stoikov model was created for traditional financial markets, where trading sessions have a start and an end. This parameter is used to calculate the difference between the current inventory position and the desired one. It is easy to see how the calculated reservation price differs from the market mid-price.
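The relationship between inventory, reservation price and mid-price can be illustrated with the closed-form expressions usually quoted for the Avellaneda-Stoikov model: a reservation price r = s − q·γ·σ²·(T − t) and a total spread γ·σ²·(T − t) + (2/γ)·ln(1 + γ/κ). The sketch below assumes these standard forms, treats q as the inventory measured relative to the desired position, and uses function and argument names chosen for this example.

```python
import math


def reservation_price(mid, q, gamma, sigma, time_left):
    """Mid-price shifted against the inventory q (relative to the desired position).

    gamma: risk aversion, sigma: volatility, time_left: T - t.
    """
    return mid - q * gamma * sigma ** 2 * time_left


def optimal_spread(gamma, sigma, time_left, kappa):
    """Total bid-ask spread; kappa is the order book liquidity parameter."""
    return gamma * sigma ** 2 * time_left + (2.0 / gamma) * math.log(1.0 + gamma / kappa)


def quotes(mid, q, gamma, sigma, time_left, kappa):
    """Return (bid, ask) placed symmetrically around the reservation price."""
    r = reservation_price(mid, q, gamma, sigma, time_left)
    half = optimal_spread(gamma, sigma, time_left, kappa) / 2.0
    return r - half, r + half
```

With zero inventory the reservation price equals the mid-price. A long position (q > 0) pushes it below the mid-price, so both quotes shift down and the market maker is more likely to sell than to buy.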

That is, these agents decide the bid and ask prices of their orderbook quotes at each execution step. The main contribution we present in this paper resides in delegating the quoting to the mathematically optimal Avellaneda-Stoikov procedure. What our RL algorithm determines are, as we shall see shortly, the values of the main parameters of the AS model.

2 Case 2: Exponential utility function

Finally, the best-performing model overall, with its corresponding parameter values contained in its chromosome, is retained for subsequent application to the problem at hand. In our case, it will be the AS model used as a baseline against which to compare the performance of our Alpha-AS model. Starting from the Avellaneda–Stoikov framework, we consider a market maker who wants to optimally set bid/ask quotes over a finite time horizon, to maximize her expected utility. The intensities of the orders she receives depend not only on the spreads she quotes, but also on unobservable factors modelled by a hidden Markov chain.

However, tree outputs may be unreliable in the presence of scarce data. The imprecise Dirichlet model provides a workaround by replacing point probability estimates with interval-valued ones. This paper investigates a new tree aggregation method based on the theory of belief functions to combine such probability intervals, resulting in a cautious random forest classifier. In particular, we propose a strategy for computing tree weights based on the minimization of a convex cost function, which takes both determinacy and accuracy into account and makes it possible to adjust the level of cautiousness of the model.
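As a small illustration of interval-valued estimates, the sketch below computes the usual imprecise Dirichlet model bounds for class probabilities from the class counts in a tree leaf: n_k / (N + s) for the lower bound and (n_k + s) / (N + s) for the upper bound. The function name and the default hyperparameter s = 1 are assumptions made for the example; the aggregation and weighting steps described above are not covered.

```python
def idm_intervals(counts, s=1.0):
    """Return a (lower, upper) probability interval for each class.

    counts: observed number of samples per class in a leaf.
    s: IDM hyperparameter controlling how cautious the intervals are.
    """
    total = sum(counts)
    return [(n / (total + s), (n + s) / (total + s)) for n in counts]
```

With few samples the intervals are wide, and with no samples at all every class gets the vacuous interval (0, 1), which is how the scarcity of data shows up in the estimate.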


We tackle this stochastic control problem under partial information with a model that unifies and generalizes many existing ones under full information, combining several risk metrics and constraints, and using general decreasing intensity functionals. We use stochastic filtering, control, and the theory of piecewise-deterministic Markov processes to reduce the dimensionality of the problem and characterize the reduced value function as the unique continuous viscosity solution of its dynamic programming equation. We then solve the analogous full information problem and compare the results numerically through a concrete example. We show that the optimal full information spreads are biased when the exact market regime is unknown, and that the market maker needs to adjust for additional regime uncertainty in terms of P&L sensitivity and observed order flow volatility. This effect grows with the waiting time between orders.


In 2008, Avellaneda and Stoikov published a procedure to obtain bid and ask quotes for high-frequency market-making trading. The successive orders generated by this procedure maximize the expected exponential utility of the trader’s profit and loss (P&L) profile at a future time, T, for a given level of agent inventory risk aversion. In this paper we present a limit order placement strategy based on a well-known reinforcement learning algorithm. We use the RL algorithm to modify the risk aversion parameter and to skew the AS quotes based on a characterization of the latest steps of market activity. Another distinctive feature of our work is the use of a genetic algorithm to determine the parameters of the AS formulas, which we use as a benchmark, to offer a fairer performance comparison to our RL algorithm.

There are various methods to achieve this, a particularly common one being gradient descent. Stock price prediction and modeling demonstrate high economic value in the financial market. Due to the non-linearity and volatility of stock prices and the unique nature of financial transactions, it is essential for the prediction method to ensure high prediction performance and interpretability.

The DQN then learns periodically, with batches of random samples drawn from the replay buffer, thus covering more of the state space, which accelerates the learning while diminishing the influence of single or of correlated experiences on the learning process. These successes with games have attracted attention from other areas, including finance and algorithmic trading. The large amount of data available in these fields makes it possible to run reliable environment simulations with which to train DRL algorithms. DRL is widely used in the algorithmic trading world, primarily to determine the best action to take in trading by candles, by predicting what the market is going to do.
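The replay mechanism described above can be sketched with a fixed-size buffer that is sampled uniformly at random. The class name, the tuple layout and the capacity handling are assumptions made for this example, not details of any specific DQN implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of past experiences, sampled uniformly at random."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest item when full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        """Record one transition observed while interacting with the environment."""
        self._buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a batch without replacement, breaking up temporal correlation."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)
```

Sampling at random rather than replaying the most recent steps in order is what reduces the weight of any single experience, or of a run of correlated ones, in each learning update.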

L of the algorithm, which affects the prices at which the asset is traded. A multivariate linear Hawkes process with exponential kernels is used to model dependencies, namely cross-excitation and self-excitation effects, between different LOB order arrivals (i.e. event types). The choice of exponential kernels is motivated both by their tractability in simulation and by their suitability for modeling the short-term memory processes that describe market microstructure, which is why they are traditionally used in financial applications. Furthermore, their use conveniently gives the process Markovian properties. To simulate the multivariate Hawkes process, we rely on Ogata’s modified thinning algorithm.
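The thinning idea can be shown on a deliberately simplified case. The sketch below simulates a univariate Hawkes process with an exponential kernel, with intensity λ(t) = μ + Σ α·exp(−β·(t − t_i)) over past events t_i, rather than the multivariate process described above. The parameter names and the function signature are assumptions made for this example.

```python
import math
import random


def intensity(t, events, mu, alpha, beta):
    """Conditional intensity at time t given the events that occurred up to t."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti <= t)


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) with Ogata-style thinning.

    mu: baseline intensity, alpha: jump per event, beta: decay rate.
    Stationarity requires alpha / beta < 1.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound until the next event is accepted.
        upper = intensity(t, events, mu, alpha, beta)
        t += rng.expovariate(upper)  # candidate from a Poisson process of rate upper
        if t >= horizon:
            break
        # Accept the candidate with probability intensity(t) / upper.
        if rng.random() * upper <= intensity(t, events, mu, alpha, beta):
            events.append(t)
    return events
```

Each accepted event raises the intensity by α, which makes further events more likely for a while; this is the self-excitation effect. A multivariate version would keep one intensity per event type and add cross-excitation terms between them.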