Formulating Dynamic Programming for EMS: State-Space and Cost Function

In our previous post, “Dynamic Programming: The Gold Standard”, we established that DP acts as a “map” for finding the shortest path. For a computer to read this map, however, we cannot simply feed it a physical vehicle; we must provide it with mathematical equations.

This step is known as Mathematical Modelling. It serves as the backbone of any control system. If the model is flawed, any subsequent optimisation results are rendered meaningless.

Today, we will translate the physical problem of an FCHEV into the language of Mathematics: State-Space representation and the Cost Function. 📐

1. System Modelling

Before managing energy, we must determine how much energy the vehicle requires to move. This is the Longitudinal Dynamics problem.

The traction force at the wheels ( $F_{trac}$ ) at any time instant $t$ follows from Newton’s Second Law, and the power demand at the wheels is then $P_{req} = F_{trac} \cdot v$ :

$$F_{trac} = F_{aero} + F_{roll} + F_{grade} + m \cdot a$$

From the traction force, we can derive the electrical power required from the powertrain system (Fuel Cell + Battery), accounting for the efficiency of the electric motor and inverter:

$$P_{elec\_req}(t) = \frac{v(t)}{\eta_{motor}} \cdot \left( \frac{1}{2}\rho A C_d v(t)^2 + mgC_r \cos(\alpha) + mg \sin(\alpha) + m \frac{dv}{dt} \right)$$

Note: In the DP algorithm, since the driving cycle ( $v(t)$ and $\alpha(t)$ ) is known a priori, $P_{elec\_req}(t)$ acts as a disturbance input at each time step.
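As a rough illustration, the demand equation above can be evaluated in a few lines of Python. This is a minimal sketch rather than the implementation used in this series: the function name and every default vehicle parameter (mass, frontal area, drag and rolling coefficients, motor efficiency) are hypothetical placeholders chosen only so the example runs.

```python
import math


def electrical_power_demand(v, a, alpha=0.0, m=1500.0, rho=1.2, area=2.2,
                            c_d=0.30, c_r=0.010, eta_motor=0.90, g=9.81):
    """Electrical power demand in W.

    v: speed in m/s, a: acceleration in m/s^2, alpha: road grade in rad.
    All default parameters are assumed values, not data from a real vehicle.
    """
    f_aero = 0.5 * rho * area * c_d * v ** 2   # aerodynamic drag
    f_roll = m * g * c_r * math.cos(alpha)     # rolling resistance
    f_grade = m * g * math.sin(alpha)          # grade resistance
    f_inertia = m * a                          # inertial term m * dv/dt
    f_trac = f_aero + f_roll + f_grade + f_inertia
    # Same form as the equation above: wheel power divided by motor efficiency.
    return v * f_trac / eta_motor


# Cruising at 20 m/s on a flat road with the assumed parameters.
p_cruise = electrical_power_demand(v=20.0, a=0.0)
```

Evaluating this function at every sample of the driving cycle would give the disturbance sequence that DP treats as known in advance.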

2. Optimisation Problem Formulation

To apply Dynamic Programming, we must structure the system into a standard Discrete-time Optimal Control format. This structure typically comprises three elements: $x$ (State), $u$ (Control), and $w$ (Disturbance).

a. State Variable ( $x$ )

The state variable represents the system’s “memory.” In the EMS problem for hybrid vehicles, the most critical time-varying variable is the Battery State of Charge (SOC).

$$x_k = SOC_k$$

The State Transition Equation from step $k$ to $k+1$ is defined as:

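$$SOC_{k+1} = SOC_k - \frac{V_{oc} - \sqrt{V_{oc}^2 - 4 R_{int} P_{batt}(u_k)}}{2 R_{int} Q_{batt}} \cdot \Delta t$$

(Do not be alarmed; the fraction is simply the battery current $I$ divided by the capacity $Q_{batt}$ . Under the simplified Rint battery model, $I = P/V$ becomes $P_{batt} = V_{oc} I - R_{int} I^2$ , and solving this quadratic for $I$ gives the square-root term. For the units to work out, $Q_{batt}$ is expressed in ampere-seconds.)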
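The state transition can be sketched as a small Python function. The names are hypothetical, and the defaults are assumptions: $V_{oc} = 300$ V and a 5 kWh pack (about 60,000 A·s at 300 V) mirror the numerical example later in this post, while $R_{int} = 0.1 \ \Omega$ is an invented value for illustration only.

```python
import math


def soc_next(soc, p_batt, v_oc=300.0, r_int=0.1, q_batt=60_000.0, dt=1.0):
    """One-step SOC update with the Rint battery model.

    p_batt: battery power in W (positive = discharge).
    q_batt: capacity in A*s; r_int in ohm. Both defaults are assumed values.
    """
    disc = v_oc ** 2 - 4.0 * r_int * p_batt
    if disc < 0.0:
        # The quadratic has no real solution: the pack cannot deliver p_batt.
        raise ValueError("p_batt exceeds what the Rint model can deliver")
    current = (v_oc - math.sqrt(disc)) / (2.0 * r_int)  # battery current in A
    return soc - current * dt / q_batt


# Discharging at 30 kW for one second lowers the SOC slightly.
soc_after = soc_next(0.60, 30_000.0)
```

The check on the discriminant is a design choice of this sketch; it flags power requests that the simplified model cannot represent instead of returning a complex number.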

b. Control Variable ( $u$ )

This is the decision variable. We can choose to control either the battery current or the Fuel Cell power. Typically, I select the Fuel Cell Power as the control variable:

$$u_k = P_{fc,k}$$

c. Power Balance Constraint

At every instant, the power supplied must equal the power consumed:

$$P_{fc} + P_{batt} = P_{elec\_req}$$

Consequently, the battery power ( $P_{batt}$ ) becomes a dependent variable: $P_{batt} = P_{elec\_req} - P_{fc}$ .

3. The Cost Function ( $J$ )

The objective of DP is to find a control sequence $\pi = \{u_0, u_1, ..., u_{N-1}\}$ that minimizes a global cost function $J$ .

$$J = \sum_{k=0}^{N-1} L(x_k, u_k) + \Phi(x_N)$$

Where:

$L(x_k, u_k)$ (Instantaneous Cost): The cost incurred at each step. In this context, it represents the Hydrogen consumption (in grams) during that time step: $L(x_k, u_k) = \dot{m}_{H2}(P_{fc,k}) \cdot \Delta t$ , where the mass flow $\dot{m}_{H2}$ (in g/s) is obtained from the Fuel Cell efficiency map.
$\Phi(x_N)$ (Terminal Cost): The penalty cost at the final step.
- To ensure a fair comparison, we typically enforce a Charge-Sustaining condition, where the final SOC must equal the initial SOC ( $SOC_{end} = SOC_{start}$ ).
- If the vehicle finishes the cycle with a deviated SOC, a massive penalty (Infinity) is applied to $J$ , forcing the DP algorithm to find an alternative path that satisfies the condition.
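The cost function above can be written as a short Python sketch. The function name, the argument layout, and the small tolerance on the final SOC are assumptions of this example; the text only requires that the final SOC equal the initial one, and a tolerance is just one way to express that on a discretised grid.

```python
import math


def total_cost(soc_traj, u_seq, m_dot_h2, dt=1.0, tol=1e-3):
    """J = sum_k m_dot_h2(u_k) * dt + Phi(x_N).

    soc_traj: N + 1 SOC values x_0 .. x_N.
    u_seq:    N fuel cell power values u_0 .. u_{N-1}.
    m_dot_h2: callable mapping fuel cell power to hydrogen flow in g/s.
    tol:      assumed tolerance for the charge-sustaining condition.
    """
    if len(soc_traj) != len(u_seq) + 1:
        raise ValueError("soc_traj must have one more entry than u_seq")
    running = sum(m_dot_h2(u) * dt for u in u_seq)
    # Terminal cost: zero if charge-sustaining, infinite otherwise.
    deviation = abs(soc_traj[-1] - soc_traj[0])
    terminal = 0.0 if deviation <= tol else math.inf
    return running + terminal


# Hypothetical linear consumption map, used only to exercise the function.
toy_map = lambda p_fc_kw: 0.015 * p_fc_kw
j_ok = total_cost([0.60, 0.59, 0.60], [30.0, 45.0], toy_map)
j_bad = total_cost([0.60, 0.59, 0.58], [30.0, 45.0], toy_map)  # math.inf
```

Because the terminal cost is infinite for any trajectory that misses the target SOC, such trajectories can never be selected as the minimum.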

4. System Constraints

While mathematics allows $P_{fc}$ to be infinite, physics does not. We must impose strict Inequality Constraints:

Fuel Cell Constraints:
- $0 \le P_{fc} \le P_{fc}^{max}$
- $- \Delta P_{down} \le (P_{fc,k} - P_{fc,k-1}) \le \Delta P_{up}$ (ramp rate limits)
Battery Constraints:
- $SOC_{min} \le SOC_k \le SOC_{max}$ (e.g., 0.4 to 0.8)
- $P_{batt}^{min} \le P_{batt,k} \le P_{batt}^{max}$
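One common way to apply these limits inside DP is to test each candidate for feasibility and discard it (or give it infinite cost) when the test fails. The sketch below assumes that approach. The `Limits` container and all numeric values except the 0.4 to 0.8 SOC window are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Component limits in kW and SOC fractions; values are assumed."""
    p_fc_max: float = 50.0      # maximum fuel cell power
    ramp_up: float = 5.0        # maximum increase per time step
    ramp_down: float = 5.0      # maximum decrease per time step
    soc_min: float = 0.4        # SOC window from the example range above
    soc_max: float = 0.8
    p_batt_min: float = -30.0   # most negative battery power (charging)
    p_batt_max: float = 40.0    # maximum discharge power


def is_feasible(p_fc, p_fc_prev, p_batt, soc, lim=Limits()):
    """Return True only if every inequality constraint is satisfied."""
    return (
        0.0 <= p_fc <= lim.p_fc_max
        and -lim.ramp_down <= p_fc - p_fc_prev <= lim.ramp_up
        and lim.soc_min <= soc <= lim.soc_max
        and lim.p_batt_min <= p_batt <= lim.p_batt_max
    )


# A 2 kW increase within all limits passes; a 20 kW jump violates the ramp limit.
ok = is_feasible(p_fc=30.0, p_fc_prev=28.0, p_batt=0.0, soc=0.60)
too_fast = is_feasible(p_fc=30.0, p_fc_prev=10.0, p_batt=0.0, soc=0.60)
```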

5. Numerical Example (Case Study)

To visualize how DP operates at a single time step ( $t_k$ ), let’s walk through a simplified scenario with hypothetical data.

Assumptions at time step $k$ :

Demand: The driver steps on the pedal, requesting $P_{req} = 30 \text{ kW}$ (treated here as the electrical demand $P_{elec\_req}$ from Section 1).
Current State: The battery is at $SOC_k = 60\%$ .
Time Step: $\Delta t = 1 \text{ second}$ .
Battery Specs: Capacity $Q = 5 \text{ kWh}$ ( $18 \text{ MJ}$ ), Open Circuit Voltage $V_{oc} = 300 \text{ V}$ .

The DP algorithm will discretise and test 3 feasible control candidates ( $u$ ) for the Fuel Cell and compare them:

Step 1: Power Split Calculation

Using the balance equation $P_{batt} = P_{req} - P_{fc}$ :

Option A (EV Mode): Fuel Cell OFF ( $P_{fc} = 0$ ). Battery takes full load $\to P_{batt} = 30 \text{ kW}$ .
Option B (Load Following): Fuel Cell matches demand ( $P_{fc} = 30 \text{ kW}$ ). Battery idle $\to P_{batt} = 0 \text{ kW}$ .
Option C (Charging Mode): Fuel Cell boosts ( $P_{fc} = 45 \text{ kW}$ ). Excess 15kW charges battery $\to P_{batt} = -15 \text{ kW}$ .

Step 2: Instantaneous Cost Calculation ( $L$ )

Looking up the Fuel Cell consumption map:

Option A: $P_{fc} = 0 \to \dot{m}_{H2} = 0 \text{ g/s}$ .
Option B: $P_{fc} = 30 \text{ kW} \to \text{Efficiency } 55\% \to \dot{m}_{H2} \approx 0.45 \text{ g/s}$ .
Option C: $P_{fc} = 45 \text{ kW} \to \text{Efficiency } 50\% \to \dot{m}_{H2} \approx 0.75 \text{ g/s}$ .
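The mass flows above are consistent with dividing the hydrogen power $P_{fc}/\eta$ by the lower heating value of hydrogen, roughly 120 MJ/kg (120 kJ/g). That relation is an assumption about how the map values were obtained; in practice the figures would be read from the measured efficiency map. A minimal sketch with a hypothetical function name:

```python
LHV_H2_KJ_PER_G = 120.0  # approximate lower heating value of hydrogen


def h2_mass_flow(p_fc_kw, efficiency):
    """Hydrogen mass flow in g/s for a fuel cell output in kW."""
    if p_fc_kw == 0.0:
        return 0.0  # fuel cell off; efficiency is undefined at zero power
    return p_fc_kw / (efficiency * LHV_H2_KJ_PER_G)


flow_b = h2_mass_flow(30.0, 0.55)  # Option B, roughly 0.45 g/s
flow_c = h2_mass_flow(45.0, 0.50)  # Option C, 0.75 g/s
```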

Step 3: State Transition Update ( $SOC_{k+1}$ )

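Calculate the change in battery energy and the new $SOC$ . (Simplified formula: $\Delta SOC \approx - \frac{P_{batt} \cdot \Delta t}{Q_{batt}}$ , with $Q_{batt}$ taken as the energy capacity of $18 \text{ MJ}$ and the loss in the internal resistance neglected.)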

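Option A (Discharge 30kW): $\Delta SOC \approx -\frac{30 \text{ kJ}}{18000 \text{ kJ}} \approx -0.17\%$ . $SOC$ drops to $\approx 59.8\%$ .
Option B (Idle): No energy change. $SOC$ remains $60.0\%$ .
Option C (Charge 15kW): $\Delta SOC \approx +\frac{15 \text{ kJ}}{18000 \text{ kJ}} \approx +0.08\%$ . $SOC$ rises to $\approx 60.1\%$ .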
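The three updates can be reproduced with the simplified formula in a few lines. The function name is hypothetical, and the capacity is entered as energy (18 MJ = 18,000 kJ) so that kW multiplied by seconds gives kJ directly.

```python
def soc_next_simplified(soc, p_batt_kw, dt_s=1.0, q_batt_kj=18_000.0):
    """SOC update that ignores internal resistance losses.

    p_batt_kw is positive when discharging and negative when charging.
    """
    return soc - p_batt_kw * dt_s / q_batt_kj


soc_a = soc_next_simplified(0.60, 30.0)    # Option A, about 0.5983
soc_b = soc_next_simplified(0.60, 0.0)     # Option B, unchanged
soc_c = soc_next_simplified(0.60, -15.0)   # Option C, about 0.6008
```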

Step 4: Total Cost Evaluation (Cost-to-Go + Instantaneous Cost)

This is the decisive step. DP looks not only at the present but also at the future. The future cost ( $J_{next}$ ) is retrieved from the Cost-to-Go matrix (which was calculated backwards from the end of the cycle).

Assumption: The Cost-to-Go matrix indicates that having low SOC (59.8%) incurs a high future penalty (recharging needed later), while having high SOC (60.1%) reduces future costs.

Candidate ( $u$ )	Instantaneous Fuel ( $L$ )	Assumed Future Cost ( $J_{next}$ )	TOTAL COST ( $J$ )
A ( $P_{fc}=0$ )	0 g (Cheapest Now)	100 g (High penalty)	100 g
B ( $P_{fc}=30$ )	0.45 g	50 g (Medium)	50.45 g
C ( $P_{fc}=45$ )	0.75 g (Most Expensive)	49.8 g (Low penalty)	50.55 g

DP’s Verdict: At this specific second, Option B is the optimal choice (lowest total cost of 50.45 g), narrowly ahead of Option C (50.55 g). Although Option A consumes zero Hydrogen right now, DP “foresees” that depleting the battery will cost more in the long run, so it rejects the EV mode in this specific context.
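The comparison in the table is a single minimisation over the candidates. The sketch below hard-codes the assumed values from the table; in a full implementation the future cost would instead be looked up (typically by interpolation) in the Cost-to-Go matrix at each candidate's $SOC_{k+1}$ . The function name is hypothetical.

```python
def bellman_step(candidates):
    """Pick the candidate with the lowest total cost.

    candidates maps a label to (instantaneous cost L, future cost J_next),
    both in grams of hydrogen.
    """
    totals = {name: stage + future
              for name, (stage, future) in candidates.items()}
    best = min(totals, key=totals.get)
    return best, totals


# Values copied from the table above (hypothetical data).
candidates = {
    "A": (0.00, 100.0),
    "B": (0.45, 50.0),
    "C": (0.75, 49.8),
}
best, totals = bellman_step(candidates)  # best is "B"
```

DP repeats this minimisation for every SOC grid point at every time step, working backwards from the end of the cycle.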

Conclusion

We have successfully “translated” a physical vehicle into mathematical equations:

State ( $x$ ): Battery SOC.
Control ( $u$ ): Fuel Cell Power.
Objective ( $J$ ): Minimize Hydrogen consumption.
Rules (Constraints): Physical limits of the components.

With these components in place, the remaining task is to solve the Bellman equation. But how do we implement this on a computer? How do we handle state grid discretization?

In the next post, I will share the detailed MATLAB code to solve this problem. Get your MATLAB ready! 💻