In this section I review the impact of my work and list the open questions and challenges for model discovery.
Despite the massive progress of the last few years, model discovery of (partial) differential equations is still a field in its infancy. At the start of this thesis in early 2019, both the sparse regression approach for PDEs and PINNs were novel; to our knowledge, no one had attempted to perform model discovery on noisy experimental data. Considering the experiments on synthetic data, that would have been a futile attempt: they all showed that model discovery required densely sampled datasets with very little noise (<10%, but more often <2%). With this in mind, we set out to construct methods able to handle the noisy and sparse datasets that originate from experiments. The line of work we have developed with DeepMoD makes large strides towards this goal. In each paper we show that our methods can handle >50% noise and work with an order of magnitude less data than classical methods on increasingly challenging (synthetic) datasets such as the Kuramoto-Sivashinsky equation. DeepMoD also recovered the underlying equation when applied to data generated by simple experiments, validating our approach.
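For reference, the classical sparse-regression pipeline that this line of work builds on fits in a few lines of numpy. The sketch below is a toy illustration on clean, densely sampled synthetic advection data (\(u_t = -u_x\)), not the DeepMoD implementation; the library terms and the threshold value are arbitrary choices made for the example.

```python
import numpy as np

# Synthetic data for the advection equation u_t = -u_x,
# with exact solution u(x, t) = sin(x - t).
x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
t = np.linspace(0, 1, 50)
dx, dt = x[1] - x[0], t[1] - t[0]
U = np.sin(x[None, :] - t[:, None])                 # shape (n_t, n_x)

# Derivatives via finite differences; DeepMoD replaces this step with
# automatic differentiation of a neural-network surrogate.
U_t = np.gradient(U, dt, axis=0)
U_x = np.gradient(U, dx, axis=1)

# Candidate library Theta, so that u_t = Theta @ xi.
theta = np.stack([U, U_x, U * U_x, U**2], axis=-1).reshape(-1, 4)
target = U_t.ravel()

def stlsq(theta, target, threshold=0.1, n_iters=10):
    """Sequentially thresholded least squares, as used in SINDy/PDE-FIND."""
    xi = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(n_iters):
        xi[np.abs(xi) < threshold] = 0.0
        active = xi != 0
        if active.any():
            xi[active] = np.linalg.lstsq(theta[:, active], target, rcond=None)[0]
    return xi

xi = stlsq(theta, target)   # only the u_x term survives, with coefficient ~ -1
```

This recipe works well on dense, nearly noise-free grids; the point of the work described above is precisely that replacing the finite-difference step with a constrained neural network extends it to sparse, noisy data.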
A second, more implicit goal was to create accessible methods. Complex methods are less likely to be adopted than simple ones, and if the goal is to make model discovery a valuable addition to scientists’ toolboxes, complexity should be strongly limited. Ideally, scientists without a background in numerical methods should be able to understand and apply our methods. Deep learning is an ideal vehicle for this, as many scientists these days have at least a basic understanding of it. Automatic differentiation, the key but complex mechanism at the core of DL, is abstracted away by frameworks such as PyTorch, making it possible to construct powerful approaches without in-depth numerical knowledge. This leads to the peculiar situation where a neural network-based approach is regarded as more accessible than splines. Indeed, our work shows that an integrated combination of simple components (a basic MLP, a simple Lasso and a constraint) can easily and strongly outperform much more complex traditional approaches - a testament to the power of differentiable programming.
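Schematically, this integrated combination minimizes a single objective of the following form (the notation here is illustrative, not verbatim from the papers), where \(\hat{u}_\theta\) is the MLP surrogate, \(\Theta_i\) the row of candidate library terms evaluated at sample \(i\) by automatic differentiation, and \(\xi\) the coefficient vector:

\[
\mathcal{L}(\theta, \xi) \;=\; \underbrace{\frac{1}{N}\sum_{i=1}^{N}\big(u_i - \hat{u}_\theta(x_i, t_i)\big)^2}_{\text{data fit (MLP)}} \;+\; \underbrace{\frac{1}{N}\sum_{i=1}^{N}\big(\partial_t \hat{u}_\theta(x_i, t_i) - \Theta_i\,\xi\big)^2}_{\text{constraint}} \;+\; \underbrace{\lambda\,\lVert\xi\rVert_1}_{\text{Lasso}}
\]

Each ingredient is elementary on its own; the strength lies in optimizing them jointly, so that the constraint regularizes the surrogate while the surrogate supplies accurate features to the regression.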
Our work also carries strong implications for future work and for model discovery as a field. First and foremost, it shows that the limiting factor is the accuracy of the features, implying that accurately modelling and denoising the data is just as important as the sparse regression itself. Perhaps paradoxically then, significant progress can be made by focussing on data-driven modelling. In all cases, neural networks (especially PINNs) should feature prominently; the benefits of automatic differentiation and excellent inter- and extrapolation only become more pronounced in high-dimensional data. That is not to say that non-DL-based approaches should be neglected. A main thread in our work has been to show how ‘classical’ methods such as sparse and Bayesian regression can be integrated into DL approaches, and much progress can be made by synthesizing the two. The Bayesian regression and model selection literature is especially rich, and combining it with DL-based modelling should prove fruitful.
Taken together, our work strongly establishes the argument for physics-constrained, neural network-based surrogates for model discovery of PDEs on experimental data.
Model discovery is a young and exciting field, and as with all young fields, many limitations, challenges and questions remain. I list these below in no particular order - I have mentioned some of them before, others might have occurred to you while reading this thesis, and a few are of a more philosophical nature.
Perhaps the single biggest barrier to applying model discovery to novel, experimental data is the lack of methods able to handle data with spatially and temporally varying coefficient fields. Initial work (Rudy et al. 2018; Chen and Lin 2021) has focussed on leveraging group sparsity, but so far this approach works only for a single varying dimension - sufficient for temporal dependence, but not for spatial dependence (at least 2D). Scaling this approach to spatio-temporal dependence yields a compressed sensing problem: every sample becomes a separate regression problem (\(n=1\)) with many features (\(p \gg n\)). An alternative approach would be to build on DeepMoD and use a neural network to model the coefficient field. This implicitly encodes a smoothness bias into the fields, and initial results have been promising. However, performing actual model discovery would require some criterion for deciding when a field can be considered inactive.
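To make the group-sparsity idea concrete, the toy numpy sketch below (entirely synthetic random features, not our implementation; \(\lambda\), the learning rate and the problem sizes are arbitrary) fits a separate coefficient vector per spatial location while grouping each candidate term across all locations, so that a term is either active everywhere (with a spatially varying coefficient) or dropped entirely:

```python
import numpy as np

rng = np.random.default_rng(1)
n_loc, n_t, p = 20, 50, 3        # spatial locations, time samples, candidate terms

# Ground truth: only term 1 is active, with a spatially varying coefficient.
xi_true = np.zeros((n_loc, p))
xi_true[:, 1] = 1.0 + 0.5 * np.sin(np.linspace(0, np.pi, n_loc))

thetas = rng.normal(size=(n_loc, n_t, p))         # per-location candidate libraries
targets = np.einsum('itp,ip->it', thetas, xi_true)
targets += 0.01 * rng.normal(size=targets.shape)  # small measurement noise

def group_lasso(thetas, targets, lam=0.5, lr=0.1, n_iters=2000):
    """Proximal gradient descent with one group per candidate term,
    each group spanning all spatial locations (block soft-thresholding)."""
    n_loc, n_t, p = thetas.shape
    xi = np.zeros((n_loc, p))
    for _ in range(n_iters):
        resid = np.einsum('itp,ip->it', thetas, xi) - targets
        xi -= lr * np.einsum('itp,it->ip', thetas, resid) / n_t
        # prox step: shrink each term's coefficient field as one block
        norms = np.linalg.norm(xi, axis=0)
        xi *= np.maximum(1.0 - lr * lam / np.maximum(norms, 1e-12), 0.0)
    return xi

xi = group_lasso(thetas, targets)
# np.linalg.norm(xi, axis=0): only term 1 should retain a nonzero field
```

The open question mentioned above is exactly the threshold implicit in \(\lambda\): deciding when a group norm is small enough to declare the corresponding coefficient field inactive.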
Conversely, perhaps one of the biggest opportunities for model discovery is the synthesis of data from multiple sources. Rarely does a single experiment tell the whole story: we need multiple experiments to capture all of a system’s dynamics. Initial work using group sparsity (Silva et al. 2019; Tod, Both, and Kusters 2021) only scratches the surface of what is possible. How can data from multiple experiments with different length and time scales be integrated? Initial work on combining timescales in ODEs (Champion, Brunton, and Kutz 2019) is promising, but has not been expanded to PDEs, nor to (multiple) lengthscales. Even more interesting would be to combine data from different experimental setups and instruments, also known as multimodal fusion.
Related to the data synthesis challenge is that of sampling, active learning and experimental design. Sampling has been studied in depth from the perspective of signal reconstruction (Brunton et al. 2013), but not from the viewpoint of model discovery - a subtly different problem. Is there an optimal sampling strategy for model discovery, and if so, what would it be? Having more knowledge about this could form the basis for an active learning approach, where the model itself suggests where to sample to optimize model discovery. The most ambitious form of this would be experiment design, with an agent suggesting novel experiments designed to aid model discovery.
All of the neural networks used in our work were simple MLPs - no normalization, batching, or attention mechanisms. How much do the implicit biases of the function approximator impact the discovered equation? Various works show that incorporating underlying symmetries and invariances in the network improves performance (Cranmer et al. 2020; Greydanus, Dzamba, and Yosinski 2019; Finzi, Welling, and Wilson 2021), but in the context of model discovery these would have to be discovered from data.
Model discovery is usually applied directly to the observable data: we observe some quantity \(u\) and also wish to find a model for \(u\). In some cases the system is only partially observed - the model is described by an \(n\)-dimensional state, of which we observe only some dimensions - or the observable is not what we wish to model at all - for example, we observe single particles but want to model a density. This significantly complicates model discovery, as it requires first constructing or inferring these ‘latent’ dynamics. Such constructions are likely to yield biased estimates and, in our experience, model discovery deals poorly with bias in the data. Making model discovery more robust to bias is a sorely understudied subject.
Finally, a reflection is needed on the settings in which model discovery can be useful. Most papers (including our own) apply model discovery to systems where the dynamics are already known and relatively simple. In what kind of data could model discovery be truly useful? Can the complex, high-dimensional data associated with the modern world be described by relatively simple, physics-inspired PDEs? Is model discovery only useful for discovering effective models, or could it discover new, more fundamental dynamics?