
Data Analysis
Over the years, I've been involved in projects that span various aspects of cognitive psychology, from working memory training to the effects of neuromodulation (via taVNS) on second language acquisition. Along the way, I've had the pleasure of picking up programming, data management, project management, and data analysis skills. Below are a few examples of the statistical analyses I typically use, mostly in R. For more complex datasets, though, I'd suggest NumPy in Python, which is very useful for multidimensional arrays (similar to MATLAB).

The Basics: ANOVAs
How do we assess whether certain groups or conditions are different from one another on some metric?

Linear Regression
How do we model linear effects with continuous data (so we don't need to create arbitrary groups)?

Generalized Additive Mixed Effects Models (GAMMs)
How do we model non-linear trends without having a priori assumptions about the shape of the function?
ANOVAs
How do we assess whether certain groups or conditions are different from one another on some metric?
In my research investigating the neural mechanisms that underlie reading comprehension, I tested whether the N400 ERP component is a better indicator of word-to-text integration (linking a currently read word to previously read text) or prediction (predicting what event will occur next). The N400 is a negative-going component usually spanning 300-500 ms post-stimulus, where greater negativity is linked to more difficult semantic processing. I led a team of research assistants in developing stimuli, collecting neural data, and analyzing results.
The design had four conditions. In two of them, words across a sentence boundary could be integrated with the prior sentence: 1) high integrability and high predictability and 2) high integrability and low predictability. Each had its own baseline (conditions 3 and 4) that was low on both integrability and predictability (i.e., should be difficult to integrate). ANOVAs and t-tests revealed that the N400 was sensitive to integrability (low predictability condition), but high predictability did not add any further processing benefit.

ANOVAs were run with the ez package in R, and t-tests with R's t.test() function.


ERP waveforms at a central electrode cluster (Cz) from the EGI 128-channel system. ANOVAs were conducted at midline sites (Fz, Cz, and Pz) in a 3 (Electrode) x 4 (Condition) analysis over a 300-500 ms time window, with main effects of Electrode cluster and Condition. Follow-up Bonferroni-corrected t-tests compared the two predictability conditions to one another, and each to its respective baseline.
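The follow-up comparisons can be sketched roughly as below. The original tests were run with t.test() in R; this is a dependency-free Python analogue meant only to show the logic of paired t-tests with a Bonferroni correction. The function names are my own, and every amplitude value is invented for illustration (these are not the study's data).

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired-samples t-test on equal-length sequences."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the t density over [0, |t|] with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical mean N400 amplitudes (microvolts), one value per participant.
high_pred = [-1.2, -0.8, -1.5, -0.9, -1.1, -1.4]
low_pred = [-1.0, -1.1, -1.3, -1.0, -0.9, -1.6]
baseline_high = [-2.5, -2.0, -2.9, -2.2, -2.4, -2.6]
baseline_low = [-2.3, -2.4, -2.7, -2.1, -2.0, -3.0]

comparisons = {
    "high vs. low predictability": (high_pred, low_pred),
    "high predictability vs. baseline": (high_pred, baseline_high),
    "low predictability vs. baseline": (low_pred, baseline_low),
}

results = {name: paired_t(a, b) for name, (a, b) in comparisons.items()}
raw_p = [two_sided_p(t, df) for t, df in results.values()]

for (name, (t, df)), p, p_adj in zip(results.items(), raw_p, bonferroni(raw_p)):
    print(f"{name}: t({df}) = {t:.2f}, p = {p:.4f}, Bonferroni p = {p_adj:.4f}")
```

In R, the same correction is available through p.adjust() with method = "bonferroni".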
Typical regression setup (in R using lm())

ReadingComp = β0 + β1·Gender + β2·Age + β3·Vocabulary + β4·PhonologicalAbility + β5·Spelling + ε

Example lm() output is shown below, followed by graphs of the significant effects (drawn with the `effects` package).

Sample size = 178. Data are a subset of a larger dataset from my dissertation work.
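As a rough illustration of what a model like this estimates, here is a small, dependency-free Python sketch of ordinary least squares. The analyses on this page were run with lm() in R; the names fit_ols and predict, the choice of just two predictors, and all of the scores below are made up for illustration and are not the dissertation data.

```python
def fit_ols(predictors, outcome):
    """Return [intercept, b1, b2, ...] minimizing the sum of squared errors.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    Fine for a small illustration; a real analysis would use lm() in R
    or a numerical library such as NumPy.
    """
    X = [[1.0, *row] for row in predictors]  # prepend the intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(X, outcome))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry into the pivot row
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(k):
            if row != col:
                factor = aug[row][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][k] for i in range(k)]


def predict(coefs, row):
    """Predicted outcome for one observation given fitted coefficients."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Hypothetical (vocabulary, phonological ability) scores and reading
# comprehension outcomes for six readers.
scores = [(10, 4), (12, 6), (15, 5), (18, 9), (20, 7), (23, 10)]
reading_comp = [31, 36, 38, 47, 46, 54]

coefs = fit_ols(scores, reading_comp)
print("intercept, vocabulary, phonological:", [round(c, 3) for c in coefs])
print("prediction for (16, 8):", round(predict(coefs, (16, 8)), 2))
```

Each slope is the estimated change in the outcome for a one-unit change in that predictor with the other predictors held constant, which is also what the effect plots display.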



Linear Regression and Linear Mixed Effects Regressions
How do we model linear effects with continuous data (so we don't need to create arbitrary groups)?
Although ANCOVAs are good for investigating group differences in response to treatments while accounting for covariates (additional variables, other than our main manipulation, that may influence outcomes), linear regression is better suited for modeling continuous variables that are themselves of interest. So instead of showing differences between groups while accounting for something like age, we model age directly.
The setup above is an example of a linear regression with reading comprehension ability predicted by vocabulary, phonological ability, spelling, age, and gender. Only vocabulary and phonological ability were significant predictors: increases in each were associated with better reading comprehension. Using the `effects` package in R, one can plot the estimated linear effect of a predictor while accounting for all other variables in the model (as opposed to `method = "lm"` in ggplot2's geom_smooth(), which fits a line to only the two plotted variables).
However, even when using linear regression, one cannot think of everything that may relate to outcomes of interest. For instance, in researching reading comprehension ability, what if some readers respond to certain words or stories differently than others due to differing background experiences? In a large multi-site intervention, what if how readers respond to texts depends on the school the student attends? Linear mixed effects models (or lmers for short, after the lmer() function in R's lme4 package) allow analysts to account for variability across individuals, items, or sites, and are especially well suited for estimating linear trends in continuous data.
Coming Soon
I'm hoping to run a GAMM on some old ERP data and compare the results with those of traditional ANOVAs. It'll be interesting to see how GAMMs match up against pre-defined time windows (the traditional approach).
