Limited experience in either of these fields does not stop thought or research. At the risk of being corrected, from which I will learn, I’ll share those thoughts.

Early AI in FM was broadly expert systems. Used to advise on hedging to minimise overnight risk etc or to identify certain trends based on historical information. Like early symbolic maths programs (1980s) that revolutionised the way in which theoretical problems can be solved (transformed) without error in a fraction of the time, early AI in FM put an expert with a probability of correctness on every desk. This is not the AI I am interested in. It it only artificial in the sense it artificially encapsulates the knowledge of an expert. The intelligence is not artificially generated or acquired.

Machine learning covers many techniques. Supervised learning takes a set of inputs and allows the system to perform actions based on a set of policies to produce an output. Reinforcement learning https://en.wikipedia.org/wiki/Reinforcement_learning favors the more successful policies by reinforcing the action. Good machine, bad machine. The assumption is, that the environment is stochastic. https://en.wikipedia.org/wiki/Stochastic or unpredictable due to the influence of randomness.

Inputs and outputs are simple. They are a replay of the historical prices. There is no guarantee that future prices will behave in the same way as historical, but that is in the nature of a stochastic system. Reward is simple. Profit or loss. What is not simple is the machine learning policies. AFAICT, machine learning, for a stochastic system with a large amount of randomness, can’t magic the policies out of thin air. Speech has rules, Image processing also and although there is randomness, policies can be defined. At the purests level, excluding wrappers, financial markets are driven by the millions of human brains attempting to make a profit out of buying and selling the same thing without adding any value to that same thing. They are driven by emotion, fear and every aspect of human nature rationalised by economics, risk, a desire to exploit every new opportunity, and a desire to be a part of the crowd. Dominating means trading on infinitesimal margins exploiting perfect arbitrage as it the randomness exposes differences. That doesn’t mean the smaller trader can’t make money, as the smaller trader does not need to dominate, but it does mean the larger the trader becomes, the more extreme the trades have to become maintain the level of expected profits. I said excluding wrappers because they do add value, they adjust the risk for which the buyer pays a premium over the core assets. That premium allows the inventor of the wrapper to make a service profit in the belief that they can mitigate the risk. It is, when carefully chosen, a fair trade.

The key to machine learning is to find a successful set of policies. A model for success, or a model for the game. The game of Go has a simple model, the rules of the game. Therefore it’s possible to have a policy of, do everything. Go is a very large but ultimately bounded Markov Decision Process (MDP). https://en.wikipedia.org/wiki/Markov_decision_process Try every move. With trying every move every theoretical policy can be tested. With feedback, and iteration, input patterns can be recognised and successful outcomes can be found. Although the number of combinations is large, the problem is very large but finite. So large that classical methods are not feasible, but not infinite so that reinforcement machine learning becomes viable.

The MDP governing financial markets may be near infinite in size. While attempts to formalise will appear to be successful the events of 2007 have shown us that if we believe we have found finite boundaries of a MDP representing trade, +1 means we have not. Just as finite+1 is no longer finite by the original definition, infinite+1 proves what we thought was infinite is not. The nasty surprise just over the horizon.