What is 'Differentiable Physics' & How Can 'Scientific Machine Learning' be Used to Solve Multi-Objective Engineering Problems?
Interview with Alexander Lavin of Pasteur Labs & ISI
Could you provide an overview of Pasteur Labs and the Institute for Simulation Intelligence, your business model and the type of work you focus on?
Pasteur Labs invents, validates, and scales new approaches integrating AI & simulation, providing in-silico playgrounds for human-machine teams tackling the most pressing challenges spanning industrial R&D and energy security arenas.
Commercialization is twofold: software applications providing AI-native digital engineering for dozens of industrial R&D uses and sectors, and the broader Pasteur platform for external teams, companies, and ecosystems to build upon proprietary, state-of-the-art Simulation Intelligence engines.
Pasteur Labs’ team is just over 25 full-time engineers, scientists, and leaders from diverse domains (experts from NASA to DeepMind, CERN to Meta, etc.), distributed from San Diego to Stockholm and many places in between. The company is venture-backed and a “public-benefit” corporation, the only one of its kind in pursuit of groundbreaking science and engineering at the intersections of physics and AI/ML.
The “sister” org, the Institute for Simulation Intelligence (ISI), is a 501(c)(3) non-profit that supports the SI ecosystem in broad, non-commercial ways that Pasteur cannot, but towards the shared mission: building Nobel-Turing technologies that advance science & society for all humankind.
“There’s a huge, unpredictable delta between what you model in engineering simulation vs what you need to engineer in reality… What if that sim-to-real delta could be optimized to zero by making digital engineering with differentiable software?”
Your upcoming presentation at CDFAM NYC discusses the role of ‘Scientific Machine Learning’ in digital engineering. How is SciML being applied in engineering today, and how do you see it being applied in the future?
SciML, that is the merger of computational sciences and data-driven machine learning*, is one of the more exciting, open-ended disciplines because of the multiple areas of human knowledge and real-world engineering it can unlock and expand.
Consider areas like nuclear fission and fusion energy, or manufacturing in aerospace or medical instruments, where well over 90% of experimentation and validation are done in the real world vs a simulated world.
What if we could flip that around, so 90% of your expensive, painfully slow, real-world testing is tackled in software? It turns out, SciML could be that catalyst. I say “could” rather than “will be”, let alone “is”, because the engineering is highly non-trivial to realize this potential, and dedication to scientific rigor (that can be lost on the AI/ML community at times) must be prioritized.
We are building this at Pasteur Labs, and, knowing that I’m slightly biased, can say confidently that legit use of SciML in engineering today is found nowhere else.
Well, one-off, bespoke solutions are popping up here and there with SciML elements, although very limited to the narrow problem they are designed (or overfit) to address—for instance to do tomographic reconstruction in medical imaging equipment, or modeling subsurface flows in reservoir engineering.**
At Pasteur we’ve built a new software stack on the computational principles of multiphysics modeling and simulation, so the SciML engines and dataflows are made available to non-ML experts, like Computational Design Engineers, and seamlessly interoperable with their existing digital engineering tools.
This is markedly different from taking the latest deep learning approaches into something like ANSYS—they cannot speak the same language, in terms of dataflows and differentiable physics. Only a SciML platform like ours, as the lingua franca for the CAE and ML ecosystems, can be the flywheel on which future engineering is built, transforming multiple $100B industries and creating new ones.
CDFAM Computational Design Symposium brings together leading experts in computational design from industry, academia and software development for two days of knowledge sharing and networking, the next event taking place in Brooklyn NYC, Oct. 2-3, 2024.
Over 30 speakers at the event include representatives from NASA, Nervous System, IDEO, Sandia National Labs, KPF, HP, Carbon, nTop, Siemens Energy, Ocado, Autodesk, Neural Concept and more.
Do your clients primarily use real-world data, synthetic data, or a combination of both in their work? How do your tools handle each type, and how do you integrate them into your machine learning algorithms?
That may be the #1 fault of existing technologies that we’re addressing: the inability to utilize real-world data. One cannot simply plug in data—this is a fundamental flaw with CAE software, and it’s muddied by incumbents trying to throw ‘AI this’ and ‘chatbot that’ into the legacy CAE software toolchains.
More acutely, if a potential customer is not interested in data generation, it’s more probable than not they simply haven’t tried working with engineering or physics data, let alone machine learning on it.
Industries are always naive about how hard it is to operationalize data in ways that build value — firehoses are expensive, and hard to tame in environments that matter. That is:
In the CAE realm, there is a lack of understanding of what “ML-ready data” means: data in multitudes is a necessary but insufficient condition of the new digital engineering era. Well before machine learning (let alone inverse design or digital-physical assimilation), it is incredibly non-trivial to transform CAE simulations into datasets usable for ML.*
As for AI/ML communities, there is generally naiveté on the part of AI/ML researchers (or often ignorance in startup and investor circles) to embrace the real challenges industrial R&D teams face — for example, overly simplistic problems that are published as “solved” with deep learning (see the 2024 analysis by McGreivy & Hakim: Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations).
You’ll see in the upcoming Pasteur Labs platform launch that we’ve built automated dataflow pipelines for interoperable CAE toolsets and ML-based problem solving — that is, the necessary parameterization and labeling for algorithms (learning, optimization, and data assimilation), data structures befitting differentiable physics, and interfaces for efficient, end-to-end modeling workflows.
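To make “ML-ready” concrete, here is a minimal, hypothetical sketch (illustrative function and field names only, not Pasteur’s actual pipeline) of turning a batch of converged CAE solver snapshots into a normalized, parameterized dataset that a learning or data-assimilation algorithm can consume:

```python
import numpy as np

def to_ml_ready(snapshots, design_params):
    """Turn raw CAE solver outputs into a labeled, normalized dataset.

    snapshots     : list of (num_nodes, num_fields) arrays, e.g. velocity and
                    pressure fields sampled on a fixed mesh at convergence
    design_params : list of dicts of the design variables that generated each
                    run, e.g. {"inlet_velocity": 2.0, "fin_pitch": 1.5e-3}
    """
    # Stack runs into a single tensor: (num_runs, num_nodes, num_fields)
    X = np.stack(snapshots)

    # Normalize each physical field so quantities with different units
    # (Pa, m/s, K) are on comparable scales for gradient-based optimizers
    mean = X.mean(axis=(0, 1), keepdims=True)
    std = X.std(axis=(0, 1), keepdims=True) + 1e-8
    X = (X - mean) / std

    # Parameterize: encode each run's design variables as its label/conditioning vector
    keys = sorted(design_params[0])
    y = np.array([[p[k] for k in keys] for p in design_params])

    return X, y, {"field_mean": mean, "field_std": std, "param_names": keys}
```

Even this toy version hints at the real work: consistent meshes, unit handling, and keeping the normalization metadata so results can be mapped back into the CAE tools.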
Differentiable programming in multi-physics settings is a key focus of your talk. Could you explain what this involves and how it might be used in engineering for advancing manufacturing?
First consider that multi-physics environments are everywhere—from fluid-structure interactions in everyday HVAC to the heat exchangers in nuclear fission systems, from thermal management in your local compute setup to remote sensing structures, and so on.
Humans describe such environments by sets of “governing equations”, which are used by computational engineering tools in the form of physics solvers — these are the dozens and dozens of fluid dynamics codes in the Ansys and Siemens catalogues. But they are largely incomplete in describing real-world physics situations, let alone coupling multiple physics that vary over spatial scales and over time; there’s a huge, unpredictable delta between what you model in engineering simulation vs what you need to engineer in reality.
Now consider that differentiable programming is arguably the most powerful concept in deep learning (Lavin22a): parameterized software modules that can be trained with some form of gradient-based optimization. What if that sim-to-real delta could be optimized to zero by making digital engineering with differentiable software? Well that’s what we’ve done with the Pasteur platform, because it simply cannot be done with existing digital engineering tools, period.
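As a minimal illustration of the mechanics (a toy sketch in JAX, not the Pasteur platform; the 1-D heat model and “measured” data are stand-ins for a real multi-physics solver and sensor data), an unknown physical parameter can be calibrated by backpropagating the sim-to-real mismatch straight through the solver:

```python
import jax
import jax.numpy as jnp

def heat_step(u, kappa, dx=0.01, dt=2e-5):
    # One explicit finite-difference step of the 1-D heat equation du/dt = kappa * d2u/dx2
    lap = (jnp.roll(u, -1) - 2.0 * u + jnp.roll(u, 1)) / dx**2
    return u + dt * kappa * lap

def simulate(kappa, u0, steps=200):
    # Roll the solver forward; every operation stays differentiable w.r.t. kappa
    u = u0
    for _ in range(steps):
        u = heat_step(u, kappa)
    return u

def loss(kappa, u0, u_measured):
    # Sim-to-real mismatch between simulated and observed temperature fields
    return jnp.mean((simulate(kappa, u0) - u_measured) ** 2)

# Toy setup: "measured" data generated with a hidden true diffusivity of 0.7
x = jnp.linspace(0.0, 1.0, 101)
u0 = jnp.exp(-100.0 * (x - 0.5) ** 2)        # initial temperature bump
u_measured = simulate(0.7, u0)               # stands in for real sensor data

# Gradient-based calibration of the unknown parameter, through the solver itself
kappa = 0.1
grad_fn = jax.jit(jax.grad(loss))
for _ in range(500):
    kappa = kappa - 2.0 * grad_fn(kappa, u0, u_measured)

print(f"calibrated kappa = {float(kappa):.3f}")   # should approach the hidden value 0.7
```

The same pattern, scaled to coupled multi-physics solvers and real sensor streams, is where the non-trivial engineering lives.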
Looking at manufacturing workflows, there is zero feedback or performance data making its way upstream to inform design phases. The problem is not the availability of data or the need for data-driven insights; rather, it is that digital engineering software cannot ingest data from downstream in the physical world, nor integrate with advanced data structures for model calibration, let alone machine learning.
And looking at additive and advanced manufacturing (AM) pipelines, it’s hard to imagine a future for this field that does not rely on digital engineering for fast iteration over designs and confidence in novel designs based on in-silico testing. This only happens with end-to-end differentiable simulation, shipped to AM teams as plug-n-play software products—you win with machine learning, but keep your expertise focused on where it’s most needed.
How do you differentiate between reliable SciML techniques and less robust AI methods in the context of digital engineering?
Verification & Validation (Lavin22b). This is in every engineer’s DNA and doesn’t require SciML expertise to suss out:
What is “deployed” to a GitHub repo to satisfy publication reviewers, vs deployed on-prem to solve live engineering problems?
Cool demo video, but can you see it running outside of that one use-case? Can they even tell you what use-cases it generalizes to and why?
What is self-verifiable by running alongside your battle-tested physics solvers and numerical codes? (A minimal version of that check is sketched after this list.)
What is validated on legit problems, and integrated with the diverse software-hardware stacks in the wild?
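As a sketch of what that side-by-side self-verification can look like in practice (hypothetical names, assuming a learned surrogate and a trusted reference solver exposed as callables), the check itself is simple:

```python
import jax.numpy as jnp

def verify_against_reference(surrogate_fn, reference_fn, test_cases, rtol=0.05):
    """Run an ML surrogate alongside a battle-tested solver on the same cases
    and report the worst relative error; all names here are placeholders."""
    worst = 0.0
    for case in test_cases:
        pred = surrogate_fn(case)        # fast learned model
        truth = reference_fn(case)       # trusted numerical solver
        rel_err = jnp.linalg.norm(pred - truth) / (jnp.linalg.norm(truth) + 1e-12)
        worst = max(worst, float(rel_err))
    return worst, worst <= rtol
```

The hard part is not the comparison itself but choosing test cases that actually probe the regimes where the surrogate will be trusted.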
For these V&V reasons and more, the visual demos you see from big-tech companies may provide a bump in stock price, but they do little to build domain-expert confidence.
Could you share some examples where your approaches have made a significant impact in fields like energy security, aerospace, or advanced manufacturing?
Without spoiling our CDFAM presentation, we see significant impacts with users building more and more confidence in the design phase by moving more and more in-situ tests in-silico.
For example, the verification of additively manufactured heat exchangers that previously required full-scale prototypes to test multiple fluid regimes and non-static conditions: with Pasteur Labs’ “AutoPhysics” capabilities, you could not only simulate over 100x more fluid scenarios (versus 2-4 simulations with standard CFD), but also automatically calibrate your design models (in CAD) with sensor and performance data from the live system you’re modeling.
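As a toy illustration of that kind of scenario sweep (hypothetical, reusing the same 1-D solver sketch as above rather than Pasteur’s AutoPhysics), a vectorizing transform like JAX’s vmap turns a single differentiable simulator into a batched sweep over hundreds of operating points:

```python
import jax
import jax.numpy as jnp

def simulate(kappa, u0, dx=0.01, dt=2e-5, steps=200):
    # Same toy 1-D heat solver as in the calibration sketch above
    u = u0
    for _ in range(steps):
        lap = (jnp.roll(u, -1) - 2.0 * u + jnp.roll(u, 1)) / dx**2
        u = u + dt * kappa * lap
    return u

# One design evaluated across 200 hypothetical operating points
# (diffusivities standing in for different fluid regimes / conditions)
x = jnp.linspace(0.0, 1.0, 101)
u0 = jnp.exp(-100.0 * (x - 0.5) ** 2)
kappas = jnp.linspace(0.1, 1.0, 200)

fields = jax.vmap(lambda k: simulate(k, u0))(kappas)   # shape: (200, 101)
```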
Those pain points are pervasive in really all engineering domains and any industrial R&D sector; accelerating the physics solvers is interesting, but closing the simulation-to-reality gaps is impactful.
What key insights or lessons do you hope the audience will take away from your presentation?
I expect the audience to be full of like-minded engineers, who care about scientific rigor and will appreciate the blood, sweat, and tears the Pasteur Labs team puts into the R&D software products we build. And I expect to see many machine learning practitioners who, like us, are excited about the industry-shaping opportunities, but will take away an appreciation for the complexities and nuances of real digital engineering.
Finally, what are you looking forward to gaining from attending CDFAM NYC?
Besides bragging about my team, I’m looking forward to learning from the many “boots on the ground” perspectives—there’s no shortage of intriguing projects and talented engineers in this field.
Limited discount registration is available for full-time students and educators to learn from, and network with, designers, engineers and architects pushing the Pareto front of computational engineering and design at all scales.