Benjamin Van Roy: Publications

Publications are listed below by year of publication or, if not yet published, by year of posting.

2024
Henrik Marklund and Benjamin Van Roy, Choice between Partial Trajectories.
Hong Jun Jeon and Benjamin Van Roy, Information-Theoretic Foundations for Machine Learning.
Hong Jun Jeon and Benjamin Van Roy, Information-Theoretic Foundations for Neural Scaling Laws.
Saurabh Kumar, Henrik Marklund, Ashish Rao, Yifan Zhu, Hong Jun Jeon, Yueyang Liu, and Benjamin Van Roy, Continual Learning as Computationally Constrained Reinforcement Learning.
Hong Jun Jeon, Jason D. Lee, Qi Lei, and Benjamin Van Roy, An Information-Theoretic Analysis of In-Context Learning, ICML, 2024.
Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, and Benjamin Van Roy, Efficient Exploration for LLMs, ICML, 2024.
Wanqiao Xu, Shi Dong, Xiuyuan Lu, Grace Lam, Zheng Wen, Benjamin Van Roy, RLHF and IIA: Perverse Incentives, ICML Workshop MHFAIA, 2024.
Saurabh Kumar, Henrik Marklund, and Benjamin Van Roy, Maintaining Plasticity in Continual Learning via Regenerative Regularization, CoLLAs, 2024.
Wanqiao Xu, Shi Dong, and Benjamin Van Roy, Posterior Sampling for Continuing Environments, RLC, 2024.
Yueyang Liu, Xu Kuang, and Benjamin Van Roy, Non-Stationary Bandit Learning via Predictive Sampling.
Anmol Kagrecha, Henrik Marklund, Benjamin Van Roy, Hong Jun Jeon, and Richard Zeckhauser, Adaptive Crowdsourcing Via Self-Supervised Learning.

2023
Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen, Reinforcement Learning, Bit by Bit, Foundations and Trends in Machine Learning, Vol. 16, No. 6, pp. 733-865, 2023.
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, and Benjamin Van Roy, Epistemic Neural Networks, NeurIPS, 2023.
David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh, A Definition of Continual Reinforcement Learning, NeurIPS, 2023.
Zheqing Zhu and Benjamin Van Roy, Deep Exploration for Recommendation Systems, ACM RecSys 2023.
Zheqing Zhu and Benjamin Van Roy, Scalable Neural Contextual Bandit for Recommender Systems, CIKM 2023.
Hong Jun Jeon, Yifan Zhu, and Benjamin Van Roy, An Information-Theoretic Framework for Supervised Learning.
Yueyang Liu, Benjamin Van Roy, and Kuang Xu, Non-Stationary Bandit Learning via Predictive Sampling, AISTATS, 2023.
Vikranth Dwaracherla, Zheng Wen, Ian Osband, Xiuyuan Lu, Seyed Mohammad Asghari, and Benjamin Van Roy, Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrapping, Transactions on Machine Learning Research, May 2023.
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, and Benjamin Van Roy, Approximate Thompson Sampling via Epistemic Neural Networks, UAI, 2023.
Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, and Zheng Wen, Leveraging Demonstrations to Improve Online Learning: Quality Matters, ICML, 2023.

2022
Xiuyuan Lu, Ian Osband, Seyed Mohammad Asghari, Sven Gowal, Vikranth Dwaracherla, Zheng Wen, and Benjamin Van Roy, Robustness of Epinets against Distributional Shifts.
Yifan Zhu, Hong Jun Jeon, and Benjamin Van Roy, Is Stochastic Gradient Descent Near Optimal?
Yueyang Liu, Adithya M. Devraj, Benjamin Van Roy, and Kuang Xu, Gaussian Imagination in Bandit Learning.
Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, and Benjamin Van Roy, From Predictions to Decisions: The Importance of Joint Predictive Distributions.
Dilip Arumugam, Mark K. Ho, Noah D. Goodman, Benjamin Van Roy, On Rate-Distortion Theory in Capacity-Limited Cognition & Reinforcement Learning, NeurIPS Workshop on Information-Theoretic Principles in Cognitive Systems, 2022.
Dilip Arumugam and Benjamin Van Roy, Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning, NeurIPS, 2022.
Hong Jun Jeon and Benjamin Van Roy, An Information-Theoretic Framework for Deep Learning, NeurIPS, 2022.
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue, and Benjamin Van Roy, The Neural Testbed: Evaluating Joint Predictions, NeurIPS, 2022.
Chao Qin, Zheng Wen, Xiuyuan Lu and Benjamin Van Roy, An Analysis of Ensemble Sampling, NeurIPS, 2022.
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Xiuyuan Lu, and Benjamin Van Roy, Evaluating High-Order Predictive Distributions in Deep Learning, UAI, 2022.
Shi Dong, Benjamin Van Roy, Zhengyuan Zhou, Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State, Journal of Machine Learning Research, Vol. 23, pp. 1-54, 2022.
Daniel Russo and Benjamin Van Roy, Satisficing in Time-Sensitive Bandit Learning, Mathematics of Operations Research, Vol. 47, No. 4, pp. 2815-2839, 2022.
Dilip Arumugam and Benjamin Van Roy, Between Rate-Distortion Theory & Value Equivalence in Model-Based Reinforcement Learning, RLDM 2022.

2021
Vikranth Dwaracherla and Benjamin Van Roy, Langevin DQN.
Adithya Devraj, Benjamin Van Roy, and Kuang Xu, A Bit Better? Quantifying Information for Bandit Learning.
Dilip Arumugam and Benjamin Van Roy, The Value of Information When Deciding What to Learn, NeurIPS, 2021.
Dilip Arumugam and Benjamin Van Roy, Deciding What to Learn: A Rate-Distortion Approach, ICML, 2021.

2020
Zheng Wen, Doina Precup, Morteza Ibrahimi, André Barreto, Benjamin Van Roy, Satinder Singh, On Efficiency in Hierarchical Reinforcement Learning, NeurIPS, 2020.
Dilip Arumugam and Benjamin Van Roy, Randomized Value Functions via Posterior State-Abstraction Sampling, NeurIPS Workshop on Biological and Artificial Reinforcement Learning, 2020.
Vikranth Dwaracherla, Xiuyuan Lu, Morteza Ibrahimi, Ian Osband, Benjamin Van Roy, Zheng Wen, Hypermodels for Exploration, ICLR, 2020.
Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvári, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado van Hasselt, Behavior Suite for Reinforcement Learning, ICLR, 2020.

2019
Benjamin Van Roy and Shi Dong, Comments on the Du-Kakade-Wang-Yang Lower Bounds.
Xiuyuan Lu and Benjamin Van Roy, Information-Theoretic Confidence Bounds for Reinforcement Learning, NeurIPS, 2019.
Shi Dong, Tengyu Ma, and Benjamin Van Roy, On the Performance of Thompson Sampling on Logistic Bandits, COLT, 2019.
Ian Osband, Daniel Russo, Benjamin Van Roy, Zheng Wen, Deep Exploration via Randomized Value Functions, Journal of Machine Learning Research, Vol. 20, No. 124, pp. 1-62, 2019.

2018
Shi Dong and Benjamin Van Roy, An Information-Theoretic Analysis for Thompson Sampling with Many Actions, NeurIPS, 2018.
Maria Dimakopoulou, Ian Osband, and Benjamin Van Roy, Scalable Coordinated Exploration in Concurrent Reinforcement Learning, NeurIPS, 2018. (demo)
Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband and Z. Wen, A Tutorial on Thompson Sampling, Foundations and Trends in Machine Learning, Vol. 11, No. 1, pp. 1-96, 2018. (code)
Maria Dimakopoulou and Benjamin Van Roy, Coordinated Exploration in Concurrent Reinforcement Learning, ICML, 2018. (demo)
Daniel Russo and Benjamin Van Roy, Learning to Optimize Via Information-Directed Sampling, Operations Research, Vol. 66, No. 1, pp. 230-252, 2018.

2017
Xiuyuan Lu and Benjamin Van Roy, Ensemble Sampling, NeurIPS, 2017.
Abbas Kazerouni, Mohammad Ghavamzadeh, Yasin Abbasi-Yadkori, Benjamin Van Roy, Conservative Contextual Linear Bandits, NeurIPS, 2017.
Ian Osband and Benjamin Van Roy, Why is Posterior Sampling Better than Optimism for Reinforcement Learning? ICML, 2017.
Ian Osband and Benjamin Van Roy, On Optimistic versus Randomized Exploration in Reinforcement Learning, RLDM, 2017.
Zheng Wen and Benjamin Van Roy, Efficient Exploration and Value Function Generalization in Deterministic Systems, Mathematics of Operations Research, Vol. 42, No. 3, pp. 762-782, 2017.

2016
Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy, Deep Exploration Via Bootstrapped DQN, NeurIPS, 2016.
Ian Osband, Benjamin Van Roy, and Zheng Wen, Generalization and Exploration Via Randomized Value Functions, ICML, 2016. (supplementary material)
Daniel Russo and Benjamin Van Roy, An Information-Theoretic Analysis of Thompson Sampling, Journal of Machine Learning Research, Vol. 17, pp. 1-30, 2016.

2015
Beomsoo Park and Benjamin Van Roy, Adaptive Execution: Exploration and Learning of Price Impact, Operations Research, Vol. 63, No. 5, pp. 1058-1076, 2015.

2014
Daniel Russo and Benjamin Van Roy, Learning to Optimize Via Information-Directed Sampling, NeurIPS, 2014.
Ian Osband and Benjamin Van Roy, Model-Based Reinforcement Learning and the Eluder Dimension, NeurIPS, 2014.
Ian Osband and Benjamin Van Roy, Near-Optimal Reinforcement Learning in Factored MDPs, NeurIPS, 2014.
Daniel Russo and Benjamin Van Roy, Learning to Optimize Via Posterior Sampling, Mathematics of Operations Research, Vol. 39, No. 4, pp. 1221-1243, 2014.
Yi-Hao Kao and Benjamin Van Roy, Directed Principal Component Analysis, Operations Research, Vol. 62, No. 4, pp. 957-972, 2014.

2013
Daniel Russo and Benjamin Van Roy, Eluder Dimension and the Sample Complexity of Optimistic Exploration, NeurIPS, 2013.
Ian Osband, Daniel Russo, and Benjamin Van Roy, (More) Efficient Reinforcement Learning Via Posterior Sampling, NeurIPS, 2013.
Zheng Wen and Benjamin Van Roy, Efficient Exploration and Value Function Generalization in Deterministic Systems, NeurIPS, 2013.
Yi-Hao Kao and Benjamin Van Roy, Learning a Factor Model Via Regularized PCA, Machine Learning, Vol. 91, No. 3, pp. 279-303, 2013.

2012
Zheng Wen, Lou Durlofsky, Benjamin Van Roy, and Khalid Aziz, Approximate Dynamic Programming for Optimizing Oil Production, Chapter 25 in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, edited by F. L. Lewis and D. Liu, Wiley-IEEE Press, 2012.
Morteza Ibrahimi, Adel Javanmard, and Benjamin Van Roy, Efficient Reinforcement Learning for High Dimensional Linear Systems, NeurIPS, 2012.
Michael Padilla and Benjamin Van Roy, Intermediated Blind Portfolio Auctions, Management Science, Vol. 58, No. 9, pp. 1747-1760, 2012.
Ciamac Moallemi, Beomsoo Park, and Benjamin Van Roy, Strategic Execution in the Presence of an Uninformed Arbitrageur, Journal of Financial Markets, Vol. 15, pp. 361-391, 2012.
Anant Chairawongse, Seksan Kiatsupaibul, Sunti Tirapat, and Benjamin Van Roy, Portfolio Selection with Qualitative Input, Journal of Banking and Finance, Vol. 36, No. 2, pp. 489-496, 2012.

2011
Gabriel Weintraub, Lanier Benkard, and Benjamin Van Roy, Industry Dynamics: Foundations for Models with an Infinite Number of Firms, Journal of Economic Theory, Vol. 146, No. 5, pp. 1965-1994, 2011.
Ciamac Moallemi and Benjamin Van Roy, Resource Allocation Via Message Passing, INFORMS Journal on Computing, Vol. 23, No. 2, pp. 205-219, 2011.
Jiarui Han and Benjamin Van Roy, Control of Diffusions Via Linear Programming, in Stochastic Programming: The State of the Art, in Honor of George Dantzig, edited by Gerd Infanger, pp. 329-354, Springer, 2011.
Zheng Wen, Lou Durlofsky, Benjamin Van Roy, and Khalid Aziz, Use of Approximate Dynamic Programming for Production Optimization, SPE Proceedings, 2011.

2010
Benjamin Van Roy and Xiang Yan, Manipulation Robustness of Collaborative Filtering, Management Science, Vol. 56, No. 11, pp. 1911-1929, 2010.
Benjamin Van Roy, On Regression-Based Stopping Times, Discrete Event Dynamic Systems, Vol. 20, No. 3, pp. 307-324, 2010.
Ramesh Johari, Gabriel Weintraub, and Benjamin Van Roy, Investment and Market Structure in Industries with Congestion, Operations Research, Vol. 58, No. 5, 2010, pp. 1303-1317.
Ciamac Moallemi and Benjamin Van Roy, Convergence of the Min-Sum Algorithm for Convex Optimization, IEEE Transactions on Information Theory, Vol. 56, No. 4, pp. 2041-2050, 2010.
Gabriel Weintraub, Lanier Benkard, and Benjamin Van Roy, Computational Methods for Oblivious Equilibrium, Operations Research, Vol. 58, No. 4, pp. 1247-1265, 2010. (Matlab code, updated July 2012)
Vivek Farias, Ciamac Moallemi, Benjamin Van Roy, and T. Weissman, Universal Reinforcement Learning, IEEE Transactions on Information Theory, Vol. 56, No. 5, pp. 2441-2454, 2010.
Vivek Farias and Benjamin Van Roy, Dynamic Pricing with a Prior on Market Response, Operations Research, Vol. 58, No. 1, pp. 16-29, 2010.

2009
Yi-Hao Kao, Benjamin Van Roy, and Xiang Yan, Directed Regression, NeurIPS, 2009.
Benjamin Van Roy and X. Yan, Manipulation-Resistant Collaborative Filtering Systems, Proceedings of the Third ACM Conference on Recommender Systems, 2009.
C. C. Moallemi and Benjamin Van Roy, Convergence of Min-Sum Message Passing for Quadratic Optimization, IEEE Transactions on Information Theory, Vol. 55, No. 5, pp. 2413-2423, 2009.

2008
Gabriel Weintraub, Lanier Benkard, and Benjamin Van Roy, Markov Perfect Industry Dynamics with Many Firms, Econometrica, Vol. 76, No. 6, 2008, pp. 1375-1411. (Technical Appendix)
Xiang Yan and Benjamin Van Roy, Reputation Markets, Proceedings of the ACM SIGCOMM 2008 Workshop on Economics of Networks, Systems, and Computation.
Haim Permuter, Paul Cuff, Benjamin Van Roy, and T. Weissman, Capacity of the Trapdoor Channel with Feedback, IEEE Transactions on Information Theory, Vol. 54, No. 7, pp. 3150-3165, 2008.
Ciamac Moallemi, Sunil Kumar, and Benjamin Van Roy, Approximate and Data-Driven Dynamic Programming for Queueing Networks, 2008.

2007
Vivek Farias and Benjamin Van Roy, An Approximate Dynamic Programming Approach to Network Revenue Management, 2007.
Nathaniel Keohane, Benjamin Van Roy, and Richard Zeckhauser, Managing the Quality of a Resource with Stock and Flow Controls, Journal of Public Economics, Vol. 91, 2007, pp. 541-569.
Benjamin Van Roy, Short Proof of Optimality for the MIN Cache Replacement Algorithm, Information Processing Letters, Vol. 102, No. 2, pp. 72-73, 2007.

2006
Gabriel Weintraub, Lanier Benkard, and Benjamin Van Roy, Oblivious Equilibrium: A Mean Field Approximation for Large Scale Dynamic Games, NeurIPS, 2006.
Ciamac Moallemi and Benjamin Van Roy, Consensus Propagation, IEEE Transactions on Information Theory, Vol. 52, No. 11, pp. 4753-4766, 2006.
Ciamac Moallemi and Benjamin Van Roy, Consensus Propagation, NeurIPS, 2006.
David Choi and Benjamin Van Roy, A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, Discrete Event Dynamic Systems, Vol. 16, No. 2, April 2006.
Paat Rusmevichientong, Benjamin Van Roy, and Peter Glynn, A Non-Parametric Approach to Multi-Product Pricing, Operations Research, Vol. 54, No. 1, 2006, pp. 82-98.
Paat Rusmevichientong, Joyce Salisbury, Lynn Truss, Benjamin Van Roy, and Peter Glynn, Opportunities and Challenges in Using Online Preference Data for Vehicle Pricing: A Case Study at General Motors, Journal of Revenue and Pricing Management, Vol. 5, No. 1, pp. 45-61, 2006.
Daniela de Farias and Benjamin Van Roy, A Cost-Shaping Linear Program for Average-Cost Approximate Dynamic Programming with Performance Guarantees, Mathematics of Operations Research, Vol. 31, No. 3, pp. 597-620, 2006.
Vivek Farias and Benjamin Van Roy, Approximation Algorithms for Dynamic Resource Allocation, Operations Research Letters, Vol. 34, No. 2, March 2006, pp. 180-190.
Randy Cogill, Michael Rotkowitz, Benjamin Van Roy, Sanjay Lall, An Approximate Dynamic Programming Approach to Decentralized Control of Stochastic Systems, Lecture Notes in Control and Information Sciences, Springer, Berlin, 2006, Vol. 329, pp. 243-256.
Benjamin Van Roy, Performance Loss Bounds for Approximate Value Iteration with State Aggregation, Mathematics of Operations Research, Vol. 31, No. 2, pp. 234-244, 2006.
Benjamin Van Roy, TD(0) Leads to Better Policies than Approximate Value Iteration, NeurIPS, 2006.
Vivek Farias and Benjamin Van Roy, Tetris: A Study of Randomized Constraint Sampling, in Probabilistic and Randomized Methods for Design Under Uncertainty, G. Calafiore and F. Dabbene, eds., Springer-Verlag, 2006.

2005
Daniela de Farias and Benjamin Van Roy, A Linear Program for Bellman Error Minimization with Performance Guarantees, NeurIPS, 2005.
Vivek Farias, Ciamac Moallemi, Benjamin Van Roy, and Tsachy Weissman, A Universal Scheme for Learning, IEEE ISIT, 2005.
Xiang Yan, Persi Diaconis, Paat Rusmevichientong, and Benjamin Van Roy, Solitaire: Man Versus Machine, NeurIPS, 2005.

2004
Randy Cogill, Michael Rotkowitz, Benjamin Van Roy, Sanjay Lall, An Approximate Dynamic Programming Approach to Decentralized Control of Stochastic Systems, Proceedings of the Allerton Conference on Communication, Control, and Computing, 2004.
Daniela de Farias and Benjamin Van Roy, On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming, Mathematics of Operations Research, Vol. 29, No. 3, August 2004, pp. 462-478.
Hui Zhang, Ashish Goel, Ramesh Govindan, Kahn Mason, and Benjamin Van Roy, Improving Eigenvector-Based Reputation Systems Against Collusion, Workshop on Algorithms and Models for the Web Graph, October 2004.
Warren Powell and Benjamin Van Roy, Approximate Dynamic Programming for High-Dimensional Dynamic Resource Allocation Problems, in Handbook of Learning and Approximate Dynamic Programming, edited by J. Si, A. G. Barto, W. B. Powell, and D. Wunsch, Wiley-IEEE Press, Hoboken, NJ, 2004, pp. 261-279.
Ciamac Moallemi and Benjamin Van Roy, Distributed Optimization in Adaptive Networks, NeurIPS, 2004. (appendix)

2003
Daniela de Farias and Benjamin Van Roy, The Linear Programming Approach to Approximate Dynamic Programming, Operations Research, Vol. 51, No. 6, November-December 2003, pp. 850-865.
Ciamac Moallemi and Benjamin Van Roy, Decentralized Protocols for Optimization of Sensor Networks, Proceedings of the Allerton Conference on Communication, Control, and Computing, 2003.
Daniela de Farias and Benjamin Van Roy, Approximate Linear Programming for Average-Cost Dynamic Programming, NeurIPS, 2003.
Benjamin Van Roy, Book Review: Self-Learning Control of Finite Markov Chains, by A. S. Poznyak, K. Najim, and E. Gomez-Ramirez, Automatica, Volume 39, Issue 2, February 2003, pp. 373-376.

2002
Nainesh Agarwal, Julien Basch, Paul Beckmann, Piyush Bharti, Scott Bloebaum, Stefano Casadei, Andrew Chou, Per Enge, Wungkum Fong, Neesha Hathi, Wallace Mann, Anant Sahai, Jesse Stone, John Tsitsiklis, and Benjamin Van Roy, Algorithms for GPS Operation Indoors and Downtown, GPS Solutions, Vol. 6, No. 3, December, 2002, pp. 149-160.
John Tsitsiklis and Benjamin Van Roy, On Average Versus Discounted Reward Temporal-Difference Learning, Machine Learning, Vol. 49, No. 2-3, 2002, pp. 179-191.

2001
David Choi and Benjamin Van Roy, A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, ICML, 2001.
Paat Rusmevichientong and Benjamin Van Roy, A Tractable POMDP for a Class of Sequencing Problems, Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2001.
Benjamin Van Roy, Neuro-Dynamic Programming: Overview and Recent Trends, in Handbook of Markov Decision Processes: Methods and Applications, edited by E. Feinberg and A. Shwartz, Kluwer, 2001.
John Tsitsiklis and Benjamin Van Roy, Regression Methods for Pricing Complex American-Style Options, IEEE Transactions on Neural Networks, Vol. 12, No. 4 (special issue on computational finance), July 2001, pp. 694-703.
P. Rusmevichientong and Benjamin Van Roy, An Analysis of Belief Propagation on the Turbo Decoding Graph with Gaussian Densities, IEEE Transactions on Information Theory, Vol. 47, No. 2, pp. 745-765, 2001.

2000
Paat Rusmevichientong and Benjamin Van Roy, An Analysis of Turbo Decoding with Gaussian Priors, NeurIPS, 2000.
Nathaniel Keohane, Benjamin Van Roy, and Richard Zeckhauser, The Optimal Harvesting of Environmental Bads, IEEE CDC, 2000.
Daniela de Farias and Benjamin Van Roy, On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning, Journal of Optimization Theory and Applications, Vol. 105, No. 3, June, 2000.
Daniela de Farias and Benjamin Van Roy, Approximate Value Iteration with Randomized Policies, IEEE CDC, 2000.
Daniela de Farias and Benjamin Van Roy, Approximate Value Iteration and Temporal-Difference Learning, IEEE Symposium 2000 on Adaptive Systems for Signal Processing, Communications and Control, 2000.
Daniela de Farias and Benjamin Van Roy, Fixed Points for Approximate Value Iteration and Temporal-Difference Learning, Proceedings of the International Conference on Machine Learning, 2000.

1999
John Tsitsiklis and Benjamin Van Roy, Average Cost Temporal-Difference Learning, Automatica, Vol. 35, No. 11, November 1999, pp. 1799-1808.
Benjamin Van Roy, Temporal-Difference Learning and Applications in Finance, Computational Finance (Proceedings of the Sixth International Conference on Computational Finance, Leonard N. Stern School of Business, January 6-8, 1999), edited by Y. S. Abu-Mostafa, B. LeBaron, A. W. Lo, and A. S. Weigend. Cambridge, MA: MIT Press, 1999.
John Tsitsiklis and Benjamin Van Roy, Optimal Stopping of Markov Processes: Hilbert Space Theory, Approximation Algorithms, and an Application to Pricing High-Dimensional Financial Derivatives, IEEE Transactions on Automatic Control, Vol. 44, No. 10, October 1999, pp. 1840-1851.

1998
Benjamin Van Roy, Learning and Value Function Approximation in Complex Decision Processes, PhD Thesis, Massachusetts Institute of Technology, May 1998.

1997
John Tsitsiklis and Benjamin Van Roy, Average Cost Temporal-Difference Learning, IEEE CDC, 1997.
John Tsitsiklis and Benjamin Van Roy, Overview of Neuro-Dynamic Programming and a Case Study in Optimal Stopping, IEEE CDC, 1997.
John Tsitsiklis and Benjamin Van Roy, Approximate Solutions to Optimal Stopping Problems, NeurIPS, 1997.
John Tsitsiklis and Benjamin Van Roy, An Analysis of Temporal-Difference Learning with Function Approximation, IEEE Transactions on Automatic Control, Vol. 42, No. 5, May 1997, pp. 674-690.
John Tsitsiklis and Benjamin Van Roy, Analysis of Temporal-Difference Learning with Function Approximation, NeurIPS, 1997.
Benjamin Van Roy, Dimitri Bertsekas, Yuchun Lee, and John Tsitsiklis, A Neuro-Dynamic Programming Approach to Retailer Inventory Management, IEEE CDC, 1997. (full length version)
Ruby Kennedy, Yuchun Lee, Benjamin Van Roy, Christopher Reed, and Richard Lippmann, Solving Data Mining Problems Through Pattern Recognition, Prentice-Hall, 1997.

1996
John Tsitsiklis and Benjamin Van Roy, Feature-Based Methods for Large Scale Dynamic Programming, Machine Learning, Vol. 22, 1996, pp. 59-94.
Benjamin Van Roy and John Tsitsiklis, Stable Linear Approximations to Dynamic Programming for Stochastic Control Problems with Local Transitions, NeurIPS, 1996.

1995
Benjamin Van Roy, Feature-Based Methods for Large Scale Dynamic Programming, Master's Thesis, Massachusetts Institute of Technology, January 1995.
Ruby Kennedy, Yuchun Lee, Christopher Reed, and Benjamin Van Roy, Solving Pattern Recognition Problems, Unica, 1995.

1993
Benjamin Van Roy, Differential Cost Functions for Training Neural Network Pattern Classifiers, Bachelor's Thesis, Massachusetts Institute of Technology, May 1993.