The PyramidSnapshot Challenge

Our challenge is as follows: autonomously characterize student progress on a graphics-based programming task by looking at intermediate image output.

Metrics

Classification accuracy by knowledge state

The original paper provides benchmark accuracies against which to compare your work. Figure 5a in the original paper reports milestone and knowledge state accuracies on the validation set, corresponding to the 11,000 most popular images in the PyramidSnapshot dataset. The most interesting statistic to compare against is the overall knowledge state accuracy benchmark, shown below for each of the three models discussed in the paper:

Model                        Overall accuracy by knowledge state
Unit test (N = 15 images)    27.5%
KNN (K = 100, N = 11000)     55.2%
Neural network (N = 11000)   64.9%

Here N is the number of images, taken in order of popularity, used to train each model. The validation set is the top 11,000 images in the dataset, with each image counted once per occurrence (71,489 images in total). This means that the top-performing neural network saw every image in the validation set during training. We believe this is reasonable, as popular images appear many times across different students. Please use these statistics as the baselines in your work.
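One way to read "scaled by popularity" is that each unique validation image counts once per occurrence, so a frequently seen image weighs more in the overall accuracy. The sketch below computes accuracy under that reading. The function name `weighted_accuracy`, the three dictionaries, and the toy file names are illustrative assumptions, not part of the released dataset or the paper's evaluation code.

```python
from typing import Mapping


def weighted_accuracy(
    true_labels: Mapping[str, int],
    predicted_labels: Mapping[str, int],
    popularity: Mapping[str, int],
) -> float:
    """Accuracy in which each unique image counts once per occurrence."""
    total = sum(popularity[image] for image in true_labels)
    if total == 0:
        raise ValueError("validation set has no occurrences")
    correct = sum(
        popularity[image]
        for image, label in true_labels.items()
        if predicted_labels.get(image) == label
    )
    return correct / total


# Toy data (hypothetical): three unique images seen 5, 3, and 2 times.
true_labels = {"a.png": 1, "b.png": 2, "c.png": 5}
predicted_labels = {"a.png": 1, "b.png": 3, "c.png": 5}
popularity = {"a.png": 5, "b.png": 3, "c.png": 2}

score = weighted_accuracy(true_labels, predicted_labels, popularity)
print(f"{score:.2f}")  # 0.70: 7 of the 10 occurrences are labeled correctly
```

Weighting by occurrence gives the same number as expanding every image into its repeated copies and computing plain accuracy, without materializing the copies.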

Effort as a metric

Because the validation set is fixed to the top 11,000 images in the dataset, we are also very interested in how to maximize overall accuracy while minimizing the effort required to train a classifier. We define effort as the train set size, N. In our work, a train set of size N consists of the N most popular images in our dataset. The accuracies of our three baselines when trained on the top 100 images are:

Model                        Overall accuracy by knowledge state
Unit test (N = 15 images)    27.5%
KNN (K = 100, N = 100)       21.2%
Neural network (N = 100)     60.7%

Note that you are free to choose any N images in the dataset to use as your train set when you perform your evaluation; when doing so in your work, please be clear as to how you composed your train set.
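As a minimal sketch of the "top N most popular images" composition described above, the snippet below ranks images by how often they occur and keeps the first N. It assumes you have a flat sequence of image identifiers, one entry per student snapshot; that input format, the function name `top_n_images`, and the toy file names are assumptions made for illustration, not a description of the released files.

```python
from collections import Counter
from typing import Iterable, List


def top_n_images(snapshots: Iterable[str], n: int) -> List[str]:
    """Return the n most frequent image identifiers, most popular first."""
    counts = Counter(snapshots)
    # Sort by descending count, then by name so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [image for image, _ in ranked[:n]]


# Toy data (hypothetical): a.png occurs 3 times, b.png twice, c.png once.
snapshots = ["a.png", "b.png", "a.png", "c.png", "b.png", "a.png"]
train_set = top_n_images(snapshots, 2)
print(train_set)  # ['a.png', 'b.png']
```

If you compose your train set differently (for example, by random sampling), reporting the selection rule alongside N keeps results comparable.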

Dataset Download

The dataset can be downloaded here: pyramidsnapshot.zip

File Descriptions

The released dataset is slightly smaller than the one described in the published work: we have elected to retract snapshot images that expose identifiable student information, which slightly changes the dataset count. However, this should not detract from training accuracies. The released dataset size is 72,472 images, of which 94 were

Milestone Label

Milestone  Short name                         Description
1          Hello world                        Exactly the right brick size (text, centering assist lines are okay).
2          Single row                         A single row of at least 3 bricks in length; might contain a single rogue brick.
3          Diagonal                           A diagonal or column of at least 3 bricks in length.
4          Two row                            Two rows with different vertical offsets, each with at least 3 bricks.
5          Rectangle                          A rectangle structure with at least 3 bricks in each row; might contain a rogue row/diagonal.
6          Parallelogram                      A structure with horizontal offset, with a constant number of bricks (±1) per row.
7          Right triangle                     Right triangle or trapezoid, with a varying number of bricks per row. No horizontal offsets.
8          Column structure                   Column or diagonal structure, with either constant or varying number of bricks per column/diagonal.
9          Scalene triangle                   Milestone 7 with horizontal offsets.
10         Pyramid-like                       Roughly symmetric in horizontal offset, but definitely not a pyramid. Might be a pyramid with random holes.
11         Offset pyramid                     Definitely a pyramid, but vertically or horizontally offset from the bottom-center of the screen. May contain centering assist lines.
12         Offset extra credit                Milestone 11 with color or extra items that seem intentional.
13         Perfect pyramid                    A pyramid properly centered horizontally and aligned with the bottom of the visible screen.
14         Perfect pyramid with extra credit  Milestone 13 with color or extra items that seem intentional.
15         Off-track                          Anything that doesn't fall into the other milestone categories. In particular, includes images with multiple milestones (e.g., 2 pyramids).
16         Brick wall                         Basically the entire visible screen is covered with offset bricks.

Knowledge states

We grouped the milestones into knowledge states, which correspond to conceptual dimensions of student thinking.

Knowledge state  Milestones     Description
1                2, 3, 4        Single row (e.g., a single loop)
2                5, 7           Nested loop
3                6, 9, 10, 11   Adjusting nested offset
4                12, 13, 14     Adding final details
5                1, 8, 15, 16   Other/off-track
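The grouping above can be written as a lookup table, which is handy when converting milestone predictions into knowledge state predictions before scoring. This is a small sketch: the milestone and knowledge state numbers come from the table, while the names `KNOWLEDGE_STATE_MILESTONES`, `MILESTONE_TO_STATE`, and `knowledge_state` are our own and do not appear in the released files.

```python
# Knowledge state -> milestones, copied from the table above.
KNOWLEDGE_STATE_MILESTONES = {
    1: (2, 3, 4),        # Single row (e.g., a single loop)
    2: (5, 7),           # Nested loop
    3: (6, 9, 10, 11),   # Adjusting nested offset
    4: (12, 13, 14),     # Adding final details
    5: (1, 8, 15, 16),   # Other/off-track
}

# Inverted view: milestone -> knowledge state.
MILESTONE_TO_STATE = {
    milestone: state
    for state, milestones in KNOWLEDGE_STATE_MILESTONES.items()
    for milestone in milestones
}


def knowledge_state(milestone: int) -> int:
    """Map a milestone label (1-16) to its knowledge state (1-5)."""
    try:
        return MILESTONE_TO_STATE[milestone]
    except KeyError:
        raise ValueError(f"unknown milestone: {milestone}") from None


# Every milestone from 1 to 16 belongs to exactly one knowledge state.
assert sorted(MILESTONE_TO_STATE) == list(range(1, 17))

print(knowledge_state(9))  # 3
print(knowledge_state(8))  # 5
```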

Notes

The notes file has three columns: Image (the image filename), Identifiable, and Notes.

Note  Example image         Applicable milestones  Description
1-16  -                     15                     In the case of multiple milestones in an off-track image (Milestone 15, Note 43), identifies which milestones are in this image. Special cases are denoted below.
5     01438.png             6                      Could also be classified as a rectangle. Looks like a rectangle with correct offsets.
7     00237.png             9                      Could also be classified as a right triangle. Looks like a right triangle with correct offsets.
17    00826.png             1,11,13                Text or centering assist lines that do not detract from the milestone classification.
18    00117.png             2,3                    One single brick floating around in the Single row knowledge state (Milestones 2, 3).
19    00342.png             2,3,4,5,8,10           Three bricks making one row, column, or diagonal.
20    00048.png             3,8                    Column structure (as opposed to a diagonal one).
21    00484.png             2                      Row gap (horizontally), so could also be classified as Milestone 4.
22    00672.png             4                      Overlapped row (horizontally).
23    01471.png, 00494.png  8                      Two columns or two diagonals.
24    00399.png             5,6,7,9,10,15          Extra row, column, or diagonal but does not detract too much from milestone.
25    00867.png             5                      Row gaps (vertically).
26    03218.png             7                      Row overlaps (vertically).
27    00615.png             8                      Diagonal structure (as opposed to a column structure).
28    00042.png, 00435.png  8                      Adjust number of bricks in column or diagonal structure (i.e., Knowledge state 3 for Milestone 8).
29    00259.png             9,10                   Trapezoid (more than 1 brick in the row with the smallest number of bricks).
30    00687.png             9                      Asymmetric pyramid (i.e., an acute scalene, versus an obtuse scalene).
31    00892.png             9,11,12,13             Off-screen so could be classified as Milestone 10 or Milestone 11.
32    00249.png             9,10                   Top brick is huge or gap in pyramid (probably from diagonal approach).
33    00057.png             10                     Upside-down pyramid.
34    00020.png             10                     Every other row is somehow aligned (kinda symmetric).
35    00496.png             10                     Extra correctly offset bricks at bottom or top.
36    01850.png             10                     Slight column gaps here and there.
37    00506.png             15                     Brick but wrong shape/extra items that are not text or centering assist lines.
38    00082.png             15                     One row, two bricks.
39    00029.png             15                     Three-brick pyramid.
40    00940.png             15                     Attempt at row, but used an offset of a single pixel.
41    01116.png             15                     Seemingly unrelated rows with some vertical or horizontal offset.
42    00418.png             15                     Pyramid but missing a significant portion of bricks somewhere.
43    00944.png             15                     Multiple milestones in a single image. Generally comes with additional milestone labels.
44    01965.png             15                     Working on other assignments (e.g., Checkerboard or Target).

Additional information

The public-facing dataset has the files listed above. If you would like to study this dataset with student tags, we request that you contact the authors with an approved IRB request for your research. The following is available with an IRB:

Citing this work

If you use the PyramidSnapshot dataset in your research, we ask that you please cite the following:

Lisa Yan, Nick McKeown, and Chris Piech. "The PyramidSnapshot Challenge: Understanding student process from visual output of programs," in Proceedings of the 50th ACM Technical Symposium on Computer Science Education (SIGCSE), February 2019.

ACM library link

Our paper is also available at this link.