One of my tasks in the Winter Quarter is to co-teach a new MBA course with Guido Imbens entitled “Programming for Data Analysis” (OIT 537). Folks use a lot of useful tools for data analysis, but in this compressed course we wanted to stay focused on tools that are quick to install and provide a pleasant learning environment.
RStudio saves the day
RStudio is a better way for beginners to program in R than anything that currently exists for Python. It’s also cross-platform and open source, and it’s easy to install. Even though I’d much rather code in Python than R, when I’m teaching I can’t afford to spend time on Anaconda, the IPython Notebook, Emacs, or the other tools I use myself. They’re just too tough to set up and too esoteric for the average student. RStudio strikes a nice balance.
What about SQL?
I’m not very good at SQL, but I know it’s a very powerful way to do high-level manipulations of data. Unfortunately, I don’t know of a “SQL Studio” that provides the same easy, cross-platform learning environment, particularly for datasets large enough to motivate learning something other than Excel.
Right now, I’m leaning towards SQLite because it’s so quick to set up, but that’s like saying I’m using R instead of Python: it picks the engine, not the learning environment.
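To make the set-up point concrete, here is a minimal sketch using Python’s built-in sqlite3 module rather than R. The sales table, its columns, and the region_totals helper are hypothetical, invented for illustration. SQLite runs inside the host program with no server to install or configure, and a single GROUP BY query does the kind of high-level summarizing that would otherwise mean a pivot table in Excel.

```python
import sqlite3


def region_totals():
    """Build a throwaway in-memory SQLite database and summarize it with SQL.

    The 'sales' table and its data are hypothetical examples.
    """
    # ":memory:" creates a database that lives only in RAM: no server, no file.
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical example table: one row per sale.
        conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
        conn.executemany(
            "INSERT INTO sales (region, product, amount) VALUES (?, ?, ?)",
            [
                ("West", "widget", 120.0),
                ("West", "gadget", 80.0),
                ("East", "widget", 200.0),
                ("East", "widget", 50.0),
                ("North", "gadget", 30.0),
            ],
        )
        # One declarative query: group rows, aggregate, filter groups, sort.
        rows = conn.execute(
            """
            SELECT region, COUNT(*) AS n_sales, SUM(amount) AS total
            FROM sales
            GROUP BY region
            HAVING SUM(amount) > 50
            ORDER BY total DESC
            """
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    for region, n_sales, total in region_totals():
        print(f"{region}: {n_sales} sales, {total:.2f} total")
```

Running it prints East (250.00) and then West (200.00); North is dropped because the HAVING clause filters out groups whose total is 50 or less.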
What IDE will make it easy to learn SQL?
In the worst case, I can always run SQL from within R, but that won’t give students helpful features like syntax highlighting and auto-completion.
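Running SQL from inside a host language looks roughly like the sketch below. In R the usual route is the DBI package with an RSQLite connection (for example DBI::dbGetQuery()); this sketch swaps in Python’s standard sqlite3 module, and the students table and the students_at_or_above and demo helpers are hypothetical. The SQL travels as an ordinary string literal, which is why many editors treat it as plain text instead of highlighting or auto-completing it. Values are passed as ? parameters rather than pasted into the string.

```python
import sqlite3

# The query is just a Python string: many editors will not syntax-highlight
# or auto-complete the SQL inside it.
QUERY = """
    SELECT name, score
    FROM students
    WHERE score >= ?
    ORDER BY score DESC, name
"""


def students_at_or_above(conn, cutoff):
    """Run the parameterized query and return rows as a list of dicts."""
    cur = conn.execute(QUERY, (cutoff,))
    # cursor.description has one 7-tuple per result column; item 0 is the name.
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def demo():
    """Load a hypothetical 'students' table in memory and query it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE students (name TEXT, score INTEGER)")
        conn.executemany(
            "INSERT INTO students VALUES (?, ?)",
            [("Ana", 91), ("Ben", 78), ("Cho", 85), ("Dev", 85)],
        )
        return students_at_or_above(conn, 80)
    finally:
        conn.close()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Returning plain dicts keeps the query results in the host language’s own data structures, which is the main convenience of this approach; the cost is that the SQL itself gets no editor support.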
- DB Browser for SQLite looks like it might be good!
- SQLiteStudio looks decidedly less good
- Something else from this list might also work
What does Hadley say about R and SQL?
Hadley Wickham, the driving force behind ggplot, reshape2, plyr, shiny and even RStudio, has written about integrating R with databases. He recommends … and SQLite for starting out. He also makes some good suggestions about best practices for integrating the two.