Unsupervised learning

In unsupervised learning we start with a data matrix:

http://www.stanford.edu/class/stats202/figs/tikz/unsupervised.png

Fig. 2 Unsupervised learning setup

The variables recorded for each unit can be:

  • Quantitative, e.g. weight, height, number of children, …;

  • Qualitative, e.g. college major, profession, gender, …
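
As a small illustration, the sketch below stores a data matrix with one row per unit and one column per variable (a common layout, assumed here) and labels each variable as quantitative or qualitative. The column names, the values, and the helper `variable_type` are all hypothetical.

```python
# Hypothetical toy data matrix: one row per unit, one column per variable.
columns = ["weight_kg", "height_cm", "n_children", "major"]
rows = [
    [70.5, 175.0, 0, "statistics"],
    [58.2, 162.0, 2, "biology"],
    [81.0, 180.5, 1, "statistics"],
]

def variable_type(values):
    """Label a column quantitative if every entry is a number, else qualitative."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "qualitative"

# zip(*rows) transposes the rows so that we iterate over variables (columns).
types = {name: variable_type(col) for name, col in zip(columns, zip(*rows))}
```

This type check is only a heuristic: a qualitative variable that happens to be coded with numbers (a postal code, say) would be mislabeled, so in practice the type of each variable is usually declared by hand.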


Goals of unsupervised learning

Our goals are to:

  1. Find meaningful relationships between the variables or units, e.g. via correlation analysis.

  2. Find interpretable low-dimensional representations of the data that make it easy to visualize the variables and units, e.g. via PCA, ICA, Isomap, or locally linear embedding.

  3. Find meaningful groupings of the data, i.e. clustering.
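
The three goals above can be sketched on a toy data matrix. This is a minimal illustration with NumPy, not a recommended pipeline: the data are synthetic (two well-separated groups, generated here purely as an assumption), PCA is computed from the SVD of the centered matrix, and the clustering step is a bare-bones k-means with a simple farthest-point initialization chosen for determinism. The helper name `kmeans` is hypothetical.

```python
import numpy as np

# Synthetic data matrix (an assumption for illustration): 60 units, 3 variables,
# drawn from two well-separated groups.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=(30, 3))
group_b = rng.normal(loc=10.0, scale=1.0, size=(30, 3))
X = np.vstack([group_a, group_b])

# 1. Relationships between variables: the sample correlation matrix.
corr = np.corrcoef(X, rowvar=False)      # shape (3, 3)

# 2. Low-dimensional representation: PCA via the SVD of the centered data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                   # each unit projected onto the first 2 PCs
explained = s**2 / np.sum(s**2)          # proportion of variance per component

# 3. Groupings: a bare-bones k-means (Lloyd's algorithm).
def kmeans(data, k, n_iter=20):
    # Deterministic farthest-point initialization (a simple choice for this sketch).
    centers = [data[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(data - c, axis=1) for c in centers], axis=0)
        centers.append(data[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign each unit to its nearest center.
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the units assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return labels

labels = kmeans(X, k=2)
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by `labels`, gives a two-dimensional view of the units in which the groups found by the clustering step can be inspected by eye.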

Unsupervised learning is sometimes referred to in Statistics as exploratory data analysis.


Striking example

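  • 1387 European subjects were genotyped, and their genetic differences (SNPs) were measured.

  • These differences can be used to define a distance between subjects.

  • When the subjects are plotted in two dimensions according to this distance, the picture looks surprisingly close to a map of Europe.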
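
To make the idea of "forming a distance between subjects" concrete, here is a minimal sketch. The notes do not say which distance or which embedding method was used in the study, so everything here is an assumption: the toy genotype matrix `G` is invented (it is not the study's data), the distance is plain Euclidean distance between rows, and the two-dimensional picture is produced with classical multidimensional scaling (MDS). The function names are hypothetical.

```python
import numpy as np

# Toy genotype matrix (invented for illustration): 6 subjects x 8 SNPs,
# each entry the number of copies (0, 1, or 2) of one allele.
G = np.array([
    [0, 0, 1, 0, 2, 2, 1, 0],
    [0, 1, 1, 0, 2, 1, 1, 0],
    [0, 0, 0, 0, 2, 2, 2, 0],
    [2, 2, 0, 1, 0, 0, 1, 2],
    [2, 1, 0, 1, 0, 1, 1, 2],
    [1, 2, 0, 2, 0, 0, 0, 2],
], dtype=float)

def pairwise_distance(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def classical_mds(D, dim=2):
    """Embed n points in `dim` dimensions from an n x n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dim]    # indices of the `dim` largest
    lam = np.clip(eigvals[order], 0.0, None)   # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(lam)

D = pairwise_distance(G)           # 6 x 6 matrix of subject-to-subject distances
coords = classical_mds(D, dim=2)   # one 2-D point per subject
```

A scatter plot of `coords` is the kind of picture the example refers to: subjects with similar genotypes land near each other, and with real data the resulting layout can be compared with the subjects' geographic origins.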