Psych 45: Stroop stats

In [1]:
%matplotlib inline

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style='ticks', context='poster', font_scale=1.5)

Import data file

In [2]:
data = pd.read_csv('./psych45_stroop_stats.csv')
data.drop('when', axis=1, inplace=True)

# Strip the ' ms' unit and thousands separators, then convert to float
data['time_normal'] = data.time_normal.str.strip(' ms').str.replace(',', '').astype(float)
data['time_interfere'] = data.time_interfere.str.strip(' ms').str.replace(',', '').astype(float)

# Interference cost per row: incongruent (interfere) minus congruent (normal) RT
data['time_diff'] = data.time_interfere - data.time_normal
In [3]:
data.head()
Out[3]:
   percent_correct  time_normal  time_interfere  time collected  time_diff
0           100.00       669.67         1110.82   4/20/20 10:03     441.15
1            81.82       880.27         1791.45   4/20/20 10:03     911.18
2            98.00       796.71         1002.11   4/20/20 10:03     205.40
3           100.00      1057.00         1407.25   4/20/20 10:03     350.25
4           100.00       977.12         1177.64   4/20/20 10:03     200.52
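
The cleaning in In [2] relies on `str.strip(' ms')`, which removes any leading or trailing space, `m`, or `s` characters rather than the literal suffix. That is safe here because the numbers contain none of those characters. A minimal sketch of a more explicit version, assuming the raw values look like `1,110.82 ms` (the helper name `parse_ms` and the toy strings are hypothetical, not taken from the CSV):

```python
import pandas as pd


def parse_ms(series: pd.Series) -> pd.Series:
    """Convert strings such as '1,110.82 ms' to floats (hypothetical helper)."""
    cleaned = (
        series.str.replace(',', '', regex=False)          # drop thousands separators
              .str.replace(r'\s*ms\s*$', '', regex=True)  # drop a trailing 'ms' unit
              .str.strip()                                # drop leftover whitespace
    )
    return cleaned.astype(float)


# Toy values in the assumed raw format, not rows from the actual CSV.
raw = pd.Series(['669.67 ms', '1,110.82 ms', ' 880.27 ms'])
parsed = parse_ms(raw)
```

The same helper could then be applied to both `time_normal` and `time_interfere`.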

Remove outliers

In [4]:
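# Drop rows whose incongruent RT is more than 2 SD above the mean
max_rt = data.time_interfere.mean() + 2*data.time_interfere.std()
data = data.loc[data.time_interfere < max_rt]
# Repeat for congruent RT, using the mean and SD of the remaining rows
max_rt = data.time_normal.mean() + 2*data.time_normal.std()
data = data.loc[data.time_normal < max_rt]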
In [5]:
data.head()
Out[5]:
   percent_correct  time_normal  time_interfere  time collected  time_diff
0           100.00       669.67         1110.82   4/20/20 10:03     441.15
1            81.82       880.27         1791.45   4/20/20 10:03     911.18
2            98.00       796.71         1002.11   4/20/20 10:03     205.40
3           100.00      1057.00         1407.25   4/20/20 10:03     350.25
4           100.00       977.12         1177.64   4/20/20 10:03     200.52
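
The filter in In [4] is sequential: the cutoff for `time_normal` is computed only after rows with slow `time_interfere` values have been dropped. A small sketch of the same rule as a reusable function, run on made-up data (the helper name `drop_slow` and the toy values are hypothetical):

```python
import pandas as pd


def drop_slow(df: pd.DataFrame, columns, n_sd: float = 2.0) -> pd.DataFrame:
    """Drop rows above mean + n_sd * SD, one column at a time (hypothetical helper).

    Each cutoff is computed on the rows that survived the previous column's
    filter, which mirrors the sequential filtering in In [4].
    """
    for col in columns:
        cutoff = df[col].mean() + n_sd * df[col].std()
        df = df.loc[df[col] < cutoff]
    return df


# Toy data: nine typical rows plus one very slow incongruent RT.
toy = pd.DataFrame({
    'time_normal': [700.0, 720, 740, 760, 780, 800, 820, 840, 860, 880],
    'time_interfere': [1000.0, 1020, 1040, 1060, 1080, 1100, 1120, 1140, 1160, 5000],
})
trimmed = drop_slow(toy, ['time_interfere', 'time_normal'])
```

In this toy example only the row with the 5000 ms incongruent RT exceeds its cutoff, so nine rows remain.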

Analyses

Summary stats

In [6]:
data.describe()
Out[6]:
       percent_correct  time_normal  time_interfere    time_diff
count       120.000000   120.000000      120.000000   120.000000
mean         94.866083   921.416583     1196.477417   275.060833
std           7.743249   185.300229      269.690070   216.812020
min          54.050000   527.920000      548.250000  -165.460000
25%          92.657500   796.182500     1027.457500   137.960000
50%          96.225000   891.695000     1158.720000   254.930000
75%         100.000000  1028.357500     1341.720000   361.007500
max         100.000000  1449.750000     1984.260000  1035.830000

What is the distribution of overall accuracy?

In [7]:
g = sns.distplot(data.percent_correct, rug=True,
                 color='dodgerblue')
g.set_xlabel('% correct')
sns.despine(trim=True)

How does condition affect response time?

In [8]:
# Reshape to long format, keeping only the two RT columns
data_long = pd.melt(data, id_vars=['percent_correct'],
                    value_vars=['time_normal', 'time_interfere'])
# Relabel the RT columns as Stroop conditions
data_long['variable'] = data_long.variable.replace({'time_normal': 'congruent',
                                                    'time_interfere': 'incongruent'})
In [9]:
# Note: in seaborn >= 0.12, ci=95 is spelled errorbar=('ci', 95)
g = sns.catplot(x='variable', y='value',
                aspect=1.5, ci=95,
                kind='point',
                data=data_long, palette='Set2')
g.set_ylabels('RT (ms)')
g.set_xlabels('condition')
plt.locator_params(nbins=5)
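
The point plot shows the mean RT per condition with a 95% confidence interval, which seaborn estimates by bootstrapping. Because the long-format table no longer identifies participants, the interval for each condition is computed as if the two conditions were independent groups. A rough sketch of a percentile bootstrap for one condition is below; `bootstrap_ci` is a hypothetical helper, the input is just the five `time_interfere` values shown by `data.head()`, and seaborn's own implementation may differ in detail.

```python
import numpy as np


def bootstrap_ci(values, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap CI for the mean (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and record the mean of each resample.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    half = (100 - ci) / 2
    low, high = np.percentile(boot_means, [half, 100 - half])
    return values.mean(), low, high


# The five time_interfere values shown by data.head(), used as toy input.
incongruent = [1110.82, 1791.45, 1002.11, 1407.25, 1177.64]
mean_rt, ci_low, ci_high = bootstrap_ci(incongruent)
```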

What is the distribution of RTs for incongruent vs. congruent trials?

How much longer does it take to respond to an incongruent vs. a congruent trial?

In [10]:
g = sns.distplot(data.time_diff, rug=True, 
                 color='mediumpurple', vertical=True)
g.set_ylabel('RT for incongruent > congruent trials (ms)')
g.hlines(0, 0, .003, linestyles='dashed')
sns.despine(trim=True)
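
A few numbers can accompany the plot of `time_diff`. The sketch below is not a step the notebook itself runs; it uses only the five `time_diff` values shown by `data.head()` as toy input, and the variable names are hypothetical.

```python
import math

import pandas as pd

# The five time_diff values shown by data.head(), used as toy input.
diff = pd.Series([441.15, 911.18, 205.40, 350.25, 200.52], name='time_diff')

mean_diff = diff.mean()                       # average interference cost in ms
sem_diff = diff.std() / math.sqrt(len(diff))  # standard error of the mean difference
share_slower = (diff > 0).mean()              # fraction of rows slower on incongruent trials
```

On the full data set the same three lines would be applied to `data.time_diff`.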