The materials on this course website are only for the use of students enrolled in this course for purposes associated with this course and may not be retained or further disseminated. The materials on this course website may be protected by copyright; any further use of this material may be in violation of federal copyright law.
The goal of the course is to provide hands on instruction in
data management and analysis techniques.
Topics discussed include:
Professor |
Teaching Assistant |
Raymond R. Balise Balise at Stanford |
Mike Hurley |
Admission to Health Research and Policy and a comfortable knowledge of a Windows XP/Vista/7/8.
Monday and Wednesday 11:30 -1:00 in LK208 starting September 23th.
By appointment in Redwood Building T213D. Directions can be found here: www.stanford.edu/~balise/FindBalise.htm
There is a newsgroup form on the coursework site.
The Little SAS Book for Enterprise Guide 4.2.
Common Statistical Methods for Clinical Research with SAS Examples
Read about SQL vs. data step code here:
http://www2.sas.com/proceedings/sugi31/250-31.pdf
Read
about numeric precision here:
http://support.sas.com/techsup/technote/ts654.pdf.
Grades will be based on four homework problem sets. If you take the course for 2 units you must pass at least three of the four homework assignments and you must not violate the virus policy below. If you take the course for 3 units you must pass all four assignments. There will be many quick assignments that will not directly affect grades.
All assignments and homework will be submitted via the coursework website. Any student that sends me or the TA a virus (or any other malicious code) will fail the course. There will be no exceptions made. Therefore, you are strongly advised to download the latest version of the Sophos Anti-Virus software. If you need virus protection check here http://www.stanford.edu/services/ess/ and you can download the software for free. If you have any questions ask!
Each of the assignments will be due at the beginning of class on the day specified.
That said, there are unforeseen emergencies (illness, bike accidents, disk crashes, network troubles, childbirth, etc.). Instead of having to ask for special allowances on an individual basis, I give each of you the privilege of granting yourself a small extension in case of crisis. You will have two late days which you may use to extend the due dates of any assignments without penalty. To avoid any ambiguity, there are seven days in a week and each day ends at 5:00 PM. Thus, if your assignment was due on Wednesday but turned in the following Monday before 5:00, that assignment would be five days late. After the grace period is up each assignment is down weighted 20% per day. In all cases, assignments will not be accepted more than one calendar week after the original assignment due date.
The programs that you turn in must run on Windows SAS 9.4 and/or Enterprise Guide 6.1. I can provide good support for Windows or a Mac running parallels (http://www.parallels.com/).
Pictures
showing my SAS 9.4 TS1M1 and EG 6.1 install can be found here.
Notes
on configuring SAS Enterprise Guide 6.1 will be found here.
Notes
on using an IRT Server for SAS can be found here.
Be sure to set up you EG using the notes here.
You
can see the class in color by clicking here.
Note:
All dates are approximate and subject to change except HW 4.
Reading: The tutorials A, B, and C in TLSB for EG.
Software somebody should have told
you about a long time ago
Essential Stanford Software
(free stuff)
Tools of the trade
Other software I use (not
officially endorsed by anybody and ask your IT security person for help)
Using SAS Enterprise Guide as a
calculator
The PowerPoint slides are here.
How
What is a database?
Critical registry teaks for
Excel
How to organize data in Excel
Variable names
Dummy records
Using Excel
Making tables
Validation
Formulas
Quick counts
Collecting data in general
How to score and store answers
The value/danger of redundancy
Using REDCap
What is it?
How to set it up.
The PowerPoint slides are here.
The sample .htaccess file is here. (To save the file you may need to right click on the hyperlink and choose something like save Target as or save Link as …)
An Enterprise Guide project to easily load data from REDCap into SAS 9.3 is here or into 9.4 is here.
Reading: Chapters 1, 2 and 4 in TLSB for EG. Skim chapter 3.
How SAS works in Windows
Types of Files / Suffixes
Libraries vs Folders
Loading data from Excel
Simple subsets
Creating
new variables
Converting character variables to numeric with the input function
Basic numeric summaries
Making better summary tables
The PowerPoint slides are here.
An Excel 2007 spreadsheet showing mixed type issues is here and the EG project is here.
An Excel 2007 spreadsheet with fake drowning data is here and the EG project is here.
Homework 1 is here and is due Oct 16th.
Setting up Windows
File extensions
Folders
Office
DOS
Configuring SAS Enterprise Guide
Application options
Keyboard macros
Using SAS
Libraries
Subsets
Toy data sets
Loading real data sets
Simple descriptive statistics and graphics
The PowerPoint slides are here.
Reading: TLSB for EG 4.2 chapter 4.
Procedures vs Functions
Code for data step
Labels
Formats
Code for SQL
Labels
Formats
GUI for SQL
Process flow sheets and project organization
Autoexec flow sheets
Creating Formats with the GUI
Data step if-else
Data step select-when
SQL case-when-then
Showing combinations
The slides are here and the completed project is here. (Open the project after the class.)
Reading: TLSB for EG 4.2 chapters 3, 4 and 9.
Creating data
Recoding values to make data
Project organization
Fixing bad values
Using formats
The PowerPoint slides here.
An Excel spreadsheet is here and an EG project is here.
Reading: Tutorial D TLSB for EG 4.2 chapters 4 and 6.
SQL joins
Inner joins
Left joins
The coalesce()
and coalescec() functions
The slides are here.
An Enterprise Guide project that uses SQL to merge files is here.
Reading: Tutorial D TLSB for EG 4.2 and chapters 4 and 6.
Joins
Summary functions with SQL
Fixing bad values
By using functions like left
By recoding
Finding discrepancies when
matches fail
The tasks for today are here.
More SQL and querying voodoo is covered in a project here.
This assignment uses information in an Excel workbook that can be found here.
The partially complete project is here.
Reading: Tutorial D TLSB for EG 4.2 and chapter 4 and 6.
SQL to select subsets of records from multiple tables
Outer union and/or outer union corresponding
concatenate
Union unique rows from both
tables
Except rows that are part of
the first query
Intersect rows common to both
queries
The slides are here.
The Enterprise Guide 5.1 file for today is here and the 6.1 version is here.
EG prompts
macro variables
%macro %mend
Examples of macros
Dot/bar plot
Quantile plot
Calculating sensitivity, specificity, PPV, NPV
Binomial probabilities
Checking survival data
The slides are here.
An Enterprise Guide project showing macros is here.
proc format library = blah
options fmtsearch = (blah work);
proc cport memtype = all;
proc cimport
options nofmterr;
By processing
Retain statements
Finding
the labs closest to an event
The slides are here.
An Enterprise Guide project showing working with grouped data is here.
Queries
and corrections when data is not normalized
Normalizing data with arrays and loops in a data step
The slides are here.
An Enterprise Guide project is here for EG 5.1 or here for EG 6.1.
A dataset is here.
Limiting out output from SAS
How SAS creates different types of output
Writing output to different file types
SAS output table names
The slides are here.
An Enterprise Guide project is here.
My SAS header file is here.
Reading: TLSB for EG chapter 10
What makes a bad graphic bad
The gurus of data visualization
A good graphic
Sgplot vbox inset
styleattrs datacolors=(black)
datacontrastcolors=(grey grey grey
red) datasymbols=(circle circle circle
circlefilled);
Visualizing categorical data
Bar charts
Dot plot
Visualizing continuous data
Histograms
Box plos
Violin plot
How to get image formats for publication
Working with the SAS ODS
ODS Statistical Graphics
SAS Graphics Editor
ods listing sge
= on;
Setting up R
Tinn-R
PERL
High quality SAS graphics
Sgplot hbox / category =
Sgplot scatter
Sgplot reg / group =
Sgplot histogram
Sgpanel panelby histogram
Sgpanel panelby scatter
Lattice plots in SAS
Ods graphics on
Selecting a subset of ODS graphics
ODS statistical graphics with many elements
Truly custom ODS statistical graphics 9.3
proc sgplot data = plot tmplout="c:\blah\template.sas";
ods path (prepend) work.template (update);
Custom appearance/style template:
proc template;
define style sexE;
parent = styles.Statistical;
style graphdata1 / contrastColor=pink markersymbol = "star";
style graphdata2 / contrastColor=blue markersymbol = "plus";
end;
run;
ods listing style = sexE;
Making big pictures:
goptions hsize=11in vsize=8.5in;
ods graphics / width=11in height=8.5in;
The
slides are here. (Do not print the entire
set)
Enterprise Guide project is here.
The
slides are here for the 20th (Do
not print the entire set)
Enterprise Guide project is here for the 20th.
The slides
are here for the 2nd.
Enterprise Guide project is here for the 2nd.
SAS code showing ods graphics can be found here.
SAS code showing setting colors in graphics with style templates is here.
R code for common R graphics is here along with data sets here and here.
R code for categorical data is here.
The slides are here.
The example program that shows the getLabel macro is here.
SAS 2013 keyboard macros can be found here.
A
set of useful links can be found here.
(You
can get many of these from the Stanford library but preview them on Amazon or
books.google.com)
A
few of my old favorite books are listed here
My
favorite scientific visualization books are here.