Introduction to Machine Learning for Materials Science
Citrine Informatics, May 2020
Context: Machine learning has become a powerful new tool within the field of materials science, with more professors interested in teaching their research tools and more students wanting to expand their skill sets than ever. As such, Professor Dane Morgan and his student Ben Afflerbach from the Computational Materials Group at UW-Madison reached out to Citrine about collaborating on a 1-week curriculum.
Challenge: By interviewing professors and surveying existing course offerings, the team identified two pain points. First, materials science professors interested in teaching machine learning often don't have time to create their own course, but they are also reluctant to reuse others' materials because of different teaching focuses. Second, the generic examples used in most introductory ML courses don't show materials science students how ML applies to materials science or to their own research.
Action: The overall goal of this project was to create a curriculum that is 1) modular and therefore easily reusable by faculty with different teaching focuses, and 2) grounded in materials science examples to make learning relevant for students. Over 12 weeks, I worked with Ben (the SME) to define learning objectives, write a course outline, and create a 1-week curriculum consisting of six 20-minute slide-based lectures and one 2-hour lab activity in a Jupyter Notebook.
Result: The six modules, spanning a high-level overview of ML in materials science, a full supervised workflow using a decision tree model and a bandgap dataset, challenges in materials science applications, and common tools, can be taught in any combination according to the instructor's needs. The lab activity is available on nanoHUB. The full curriculum is scheduled to be piloted in Fall 2020 at UW-Madison.
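To give a sense of what a supervised workflow with a decision tree looks like, here is a minimal sketch. It is not the lab notebook's code. The data is synthetic (two made-up features and a made-up "bandgap-like" target), and the helper names (make_synthetic_data, build_tree, predict, rmse) are hypothetical. The tree is written in plain Python so the example runs without dependencies; in practice a library implementation such as scikit-learn's DecisionTreeRegressor would be the more likely choice.

```python
import random


def make_synthetic_data(n=200, seed=0):
    """Synthetic stand-in for a bandgap dataset (assumption: NOT real data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.uniform(0.0, 3.0)  # placeholder feature 1
        x2 = rng.uniform(1.0, 6.0)  # placeholder feature 2
        # Made-up target: linear in x1, step in x2, plus small noise.
        y = 1.5 * x1 + (0.8 if x2 > 3.5 else 0.0) + rng.gauss(0.0, 0.1)
        rows.append(([x1, x2], y))
    return rows


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean (the split criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, depth=0, max_depth=4, min_samples=5):
    """Greedy regression tree: choose the split that minimizes total SSE."""
    targets = [y for _, y in rows]
    if depth >= max_depth or len(rows) < 2 * min_samples:
        return {"leaf": mean(targets)}
    best = None
    for j in range(len(rows[0][0])):
        for threshold in sorted({x[j] for x, _ in rows}):
            left = [y for x, y in rows if x[j] <= threshold]
            right = [y for x, y in rows if x[j] > threshold]
            if len(left) < min_samples or len(right) < min_samples:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, threshold)
    if best is None:
        return {"leaf": mean(targets)}
    _, j, threshold = best
    left_rows = [r for r in rows if r[0][j] <= threshold]
    right_rows = [r for r in rows if r[0][j] > threshold]
    return {
        "feature": j,
        "threshold": threshold,
        "left": build_tree(left_rows, depth + 1, max_depth, min_samples),
        "right": build_tree(right_rows, depth + 1, max_depth, min_samples),
    }


def predict(tree, x):
    """Walk from the root to a leaf and return the leaf's mean target."""
    while "leaf" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


def rmse(tree, rows):
    return (sum((predict(tree, x) - y) ** 2 for x, y in rows) / len(rows)) ** 0.5


# Workflow: load data, hold out a test set, fit, compare against a baseline.
data = make_synthetic_data()
random.Random(1).shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

tree = build_tree(train)
baseline = {"leaf": mean([y for _, y in train])}  # always predicts the training mean

print(f"baseline RMSE: {rmse(baseline, test):.3f}")
print(f"tree RMSE:     {rmse(tree, test):.3f}")
```

Comparing the tree against a predict-the-mean baseline on held-out data is one simple way to show students whether the model has learned anything; the specific split ratio, depth limit, and minimum leaf size here are arbitrary choices for illustration.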
Image: Screenshots of the Jupyter Notebook lab activity.
- Jupyter Notebook
- Instructional Design
- Project Management
Insights and Lessons
- Instructors value ease of grading.
- Real materials datasets are messy. There's a trade-off between authenticity and pedagogical appropriateness.