Dan Yamins
I have helped develop a variety of open-source software projects for use in data analysis, machine learning, and computer vision.

Large-scale Data Analysis:
  • Tabular.  Tabular is a package of Python modules for working with tabular data.  Its main object is the tabarray class, a data structure for holding and manipulating tabular data.  Tabarrays offer a representation of tabular data that is more flexible and powerful than native Python data structures.  https://github.com/yamins81/tabular
  • StarFlow.  StarFlow is a script-centric environment for data analysis.  StarFlow has four main features: (1) extraction of control and data-flow dependencies through a novel combination of static analysis, dynamic runtime analysis, and user annotations, (2) command-line tools for exploring and propagating changes through the resulting dependency network, (3) support for workflow abstractions enabling robust parallel execution of complex analysis pipelines, and (4) a seamless interface with the Python scripting language.  http://www.eecs.harvard.edu/~elaine/pubs/ipaw10.pdf
  • GovData.  A package of parsers for government data.  See the GovData GitHub repositories for the core data structures and a set of dataset-specific parsers.  In the winter of 2011, I used these tools to teach an MIT IAP course introducing computational data analysis and management.
  • APyMongo.   A Tornado-based asynchronous version of the PyMongo driver for the MongoDB database. https://github.com/govdata/APyMongo
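StarFlow's second feature, propagating changes through an extracted dependency network, comes down to a small graph computation. The sketch below is not StarFlow's implementation or interface: it assumes the dependencies have already been extracted into a hypothetical `deps` mapping from each output to the files and scripts it reads. It then finds every downstream output affected by a change and orders them so that each is rebuilt only after its inputs (a breadth-first search followed by Kahn's topological sort).

```python
from collections import defaultdict, deque


def downstream_order(deps, changed):
    """Return the outputs affected by `changed`, in a valid rebuild order.

    deps: hypothetical mapping {output: set of inputs it depends on}.
    changed: set of files or scripts that were modified.
    """
    # Invert the graph: for each input, which outputs consume it?
    consumers = defaultdict(set)
    for target, inputs in deps.items():
        for inp in inputs:
            consumers[inp].add(target)

    # Breadth-first search: collect everything reachable from a change.
    affected = set()
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for target in consumers[node]:
            if target not in affected:
                affected.add(target)
                queue.append(target)

    # Kahn's algorithm on the affected subgraph: count only affected inputs.
    indegree = {t: sum(1 for i in deps[t] if i in affected) for t in affected}
    ready = sorted(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for target in sorted(consumers[node]):
            if target in affected:
                indegree[target] -= 1
                if indegree[target] == 0:
                    ready.append(target)
    if len(order) != len(affected):
        raise ValueError("dependency cycle among affected outputs")
    return order


# Hypothetical pipeline: data files produced by analysis scripts.
deps = {
    "clean.csv": {"raw.csv", "clean.py"},
    "model.pkl": {"clean.csv", "fit.py"},
    "report.html": {"model.pkl", "clean.csv"},
    "summary.txt": {"raw.csv"},
}
print(downstream_order(deps, {"fit.py"}))   # ['model.pkl', 'report.html']
print(downstream_order(deps, {"raw.csv"}))
```

Editing `fit.py` touches only the model and the report, while editing `raw.csv` forces the whole pipeline to rebuild. Because the search starts from the change rather than from every output, unaffected work is never scheduled.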

Machine Learning & Computer Vision:
  • Skdata.  Skdata is a library of data sets for machine learning and statistics.  It provides standardized Python access to toy problems as well as popular computer vision and natural language processing data sets.  http://jaberg.github.io/skdata/
  • Hyperopt.   Hyperopt is a Python library for serial and parallel optimization over awkward search spaces, which may include real-valued, discrete, and conditional dimensions.    https://github.com/hyperopt/hyperopt
  • Genthor.  A package of Python interfaces for creating photorealistic 3D-rendered scenes, based on the Panda3D graphics engine.  https://github.com/dicarlolab/genthor
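To make "awkward search spaces" with conditional dimensions concrete, here is a minimal pure-Python random-search sketch. It does not use Hyperopt's API (Hyperopt describes such spaces with its own `hp` primitives and provides its own search algorithms). The space, the objective, and the function names below are hypothetical. The point is that some dimensions exist only after a discrete choice is made: `C` exists only for the SVM branch and `n_trees` only for the forest branch.

```python
import math
import random


def sample_config(rng):
    """Draw one point from a hypothetical conditional search space."""
    # Discrete dimension: which model family to use.
    kind = rng.choice(["svm", "forest"])
    if kind == "svm":
        # Real-valued dimension, sampled log-uniformly; exists only for "svm".
        log_c = rng.uniform(math.log(1e-3), math.log(1e3))
        return {"kind": "svm", "C": math.exp(log_c)}
    # Integer dimension; exists only for "forest".
    return {"kind": "forest", "n_trees": rng.randint(10, 500)}


def random_search(objective, n_trials, seed=0):
    """Serial random search: return (best_loss, best_config)."""
    rng = random.Random(seed)
    best_loss, best_config = math.inf, None
    for _ in range(n_trials):
        config = sample_config(rng)
        loss = objective(config)
        if loss < best_loss:
            best_loss, best_config = loss, config
    return best_loss, best_config


def toy_objective(config):
    """Hypothetical stand-in for a validation loss."""
    if config["kind"] == "svm":
        return abs(math.log10(config["C"]) - 1.0)  # best near C = 10
    return 0.5 + abs(config["n_trees"] - 200) / 1000.0  # never below 0.5


best_loss, best_config = random_search(toy_objective, n_trials=200)
print(best_loss, best_config)
```

Random search is used here only because it is the simplest strategy that handles conditional dimensions correctly: each trial samples the branch first and then only the parameters that branch defines.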

I have made a few minor contributions to:
  • Scikit-Learn.  http://scikit-learn.org/stable/
  • StarCluster.  http://star.mit.edu/cluster/