%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Use dynamic programming to solve the shortest path problem given on
% slide 14 of the deterministic finite-state control lecture
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Clean up the workspace
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clear all; close all; clc;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Define the problem data
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% the number of states
n = 4;

% the time horizon
T = 3;

% the stage costs: gxu(x,u) is the cost of moving from state x to state u,
% with Inf marking a transition that is not allowed
%        a     b     c     d
gxu = [ Inf ,  2  ,  6  , Inf ;   % a
        Inf , Inf ,  3  ,  8  ;   % b
        Inf ,  1  , Inf ,  4  ;   % c
        Inf , Inf , Inf ,  0  ];  % d

% the terminal costs
gT = [ Inf ;   % a
       Inf ;   % b
       Inf ;   % c
        0  ];  % d

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Simple dynamic programming recursion
% This naive implementation should be easy to understand
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% it is faster to allocate space for the value function and policy in
% advance; we also fill in the final value function
V = [Inf*ones(n,T) , gT];
U = NaN*zeros(n,T);

% the recursion runs backwards in time
for t = T:(-1):1
    % we need to find the value of every state
    for x = 1:n
        % we need to try every input to find the best one
        for u = 1:n
            % compute the cost-to-go of choosing input u in state x
            Vux = gxu(x,u) + V(u,t+1);
            % update the value function and policy whenever we find an
            % input u that is better than any we've seen before
            if u == 1 || Vux < V(x,t)
                V(x,t) = Vux;
                U(x,t) = u;
            end
        end
    end
end

% store the value function and input so we can compare with the fast
% implementation
VV = V;
UU = U;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Fast dynamic programming recursion
% This implementation uses vectorized operations to speed things up
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% allocate space and fill in the final value function, as before
V = [Inf*ones(n,T) , gT];
U = NaN*zeros(n,T);

% the recursion runs backwards in time
for t = T:(-1):1
    % compute the minimum cost-to-go and a corresponding action
    % simultaneously for all states: row x of the matrix below holds
    % gxu(x,u) + V(u,t+1) for every input u, and min along dimension 2
    % returns the best value together with the first input attaining it
    [V(:,t),U(:,t)] = min(gxu + repmat(V(:,t+1)' , n , 1) , [] , 2);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% State trajectory simulation
% Having found the optimal policy, we can simulate the state trajectory
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% the trajectory has T+1 entries: the initial state plus one state per
% stage, so we preallocate T+1 elements
x = [1 ; NaN*ones(T,1)];
for t = 1:T
    % in this problem the input is the successor state, so the policy
    % lookup gives the next state directly
    x(t+1) = U(x(t),t);
end
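%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Consistency check (an illustrative sketch, not part of the original
% script): the two recursions should agree exactly, since both select the
% first input that attains the minimum cost-to-go, so the stored copies
% VV and UU from the naive implementation can be compared directly
% against V and U from the vectorized one
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
assert(isequal(V,VV), 'value functions of the two implementations differ');
assert(isequal(U,UU), 'policies of the two implementations differ');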