In our previous post on KPIs, we learnt how to “grade” an energy management algorithm. But a fundamental question remains: What is the absolute maximum score a system can achieve?
If you develop an Rule-based (RB) or AI control algorithm that saves 500g of Hydrogen on a journey, how do you know if that is the best possible result? Is it possible to save even more, say, reducing consumption to 400g?
To answer this, the control systems field relies on a “standard benchmark” to verify whether a system is truly optimised. One of the most prominent algorithms used for this purpose is Dynamic Programming (DP).
Today, we will not look at DP through dry mathematical formulas. Instead, we will explore it from a Computer Science perspective, a classic problem that every CS student has encountered: The Shortest Path Problem on a Graph. 🗺️
1. The Context: The Power Split Problem
Imagine our Fuel Cell Hybrid Electric Vehicle (FCHEV) possesses two energy sources:
- Fuel Cell (FC): Acts like a power plant. It operates efficiently under steady load but reacts poorly to sudden fluctuations.
- Battery: Acts as an energy buffer. It reacts instantly and can charge/discharge at will, but its capacity is limited.
When the driver steps on the pedal, demanding a specific power (e.g., 50kW to accelerate), the Energy Management System (EMS) must answer a crucial question:
“How much power should I draw from the Fuel Cell, and how much from the Battery, to minimise total Hydrogen consumption?”
The challenge lies in the fact that a decision made at second 1 affects the system at second 1000. If you overuse the Battery now, it will be depleted later (low ). You will then be forced to run the Fuel Cell at maximum power to recharge it, causing system efficiency to plummet.
2. Dynamic Programming: The One Who Knows the Future
Consequently, the EMS problem becomes complex because present decisions dictate future states. We need an intelligent control strategy to balance the usage of the Fuel Cell and Battery so that the aggregate Hydrogen consumption over the entire journey is minimised.
In Computer Science, Dynamic Programming (DP) is a technique for solving complex optimisation problems by breaking them down into simpler sub-problems.
In the context of EMS, DP possesses a “superpower”: It knows the future.
Let us assume we have full knowledge of the entire driving cycle beforehand—for instance, we know for certain the car will travel on a specific Highway Driving Cycle for 1800 seconds with a precise velocity profile. DP calculates backwards from the destination to the starting point to determine the optimal sequence of actions.
This is why DP is termed Global Optimisation. Theoretically, no other algorithm can save more fuel than DP.
3. The CS Perspective: DP as a “Shortest Path” Problem
From an Engineering perspective, DP is often represented by the Bellman equation. However, with a CS background, visualise it like this:
We transform the vehicle control problem into a massive Graph.
- X-axis: Time (from 0 to 1800s).
- Y-axis: Battery State of Charge (), for example, from 40% to 80%.
We discretise this space into a grid of Nodes.
- At each point , we can decide the power level that the Fuel Cell will provide (e.g., from 0kW to 50kW, divided into small steps).
- If the vehicle decides to use the Fuel Cell at power , the battery will transition to a new state at the next second .
- Cost Function: This corresponds to the amount of Hydrogen consumed during that 1 second.

The Mission of DP: To find a path from the start node () to the end node () such that the total weight (total Hydrogen consumption) along the path is MINIMISED.
Bellman’s Principle of Optimality: “If the optimal path from A to C passes through B, then the segment from B to C must also be the optimal path for that section.”
Thanks to this principle, instead of testing billions of cases (Brute force), DP computes backwards from the final second to the first, storing the cheapest cost at each point (Memoization).
4. Why is it the “Gold Standard” (Benchmark)?
If DP is so superior, why don’t major manufacturers apply it directly to their vehicles?
Although DP is theoretically the most optimal method, it suffers from critical drawbacks that make it inapplicable in real-world driving:
- Non-causal: DP requires full knowledge of the future driving cycle. In reality, when you drive, the car cannot predict whether you will hit a red light 10 minutes from now.
- Computational Cost: As discussed in the KPIs post, DP takes hours to calculate. The onboard chips in cars simply cannot handle this load.
So, why learn DP? To use it as a Benchmark.
When you develop a practical control algorithm, such as AI (Deep Reinforcement Learning) or a real-time Rule-based system, you need to compare it against DP.
- If your algorithm consumes only 2% more fuel than DP Your algorithm is excellent! 🥇
- Conversely, if it consumes 30% more than DP Your algorithm requires significant improvement.
Conclusion
Dynamic Programming (DP) is not the solution for real-time applications, but it is the lighthouse guiding the way. It reveals where the “physical limits” of the system lie. Successfully developing and coding DP for a Fuel Cell system is the first step in demonstrating a deep understanding of both the system and the optimisation problem.
In the next post, I will share how to implement DP code using MATLAB to solve this problem, from constructing the Cost Function to handling the State Grid.
See you then! 💻