MIT and Harvard release de-identified learning data from open online courses

Dataset contains the original learning data from the 16 HarvardX and MITx courses offered in 2012-13.


Press Contact

Kimberly Allen
Email: allenkc@mit.edu
Phone: 617-253-2702
MIT News Office

Michael Rutter
Email: michael_rutter@harvard.edu
Phone: 617-495-9028
HarvardX

A research team from Harvard University and MIT has released its third and final promised deliverable — the de-identified learning data — relating to an initial study of online learning based on each institution’s first-year courses on the edX platform.

Specifically, the dataset contains the original learning data from the 16 HarvardX and MITx courses offered in 2012-13 that formed the basis of the first HarvardX and MITx working papers (released in January) and underpin a suite of powerful open-source interactive visualization tools (released in February).

The dataset was subjected to a careful process of de-identification: removing personally identifiable information, using best practices including aggregation, anonymization via random identifiers, and blurring to reduce individuality of sensitive data fields, among other techniques.

“We are excited to be able to present the data behind the reports we released in January. This step opens the door to more sophisticated analyses that build on what we have already done,” says co-lead researcher Isaac Chuang, a professor in MIT’s electrical engineering and computer science and physics departments. “MITx and HarvardX are committed to upholding learner privacy as well as advancing learning research. These data are a public good.”

Harvard’s Andrew Ho, Chuang’s co-lead, adds that the release of the data fulfills an intention — namely, to share best practices to improve teaching and learning both on campus and online — that was made with the launch of edX by Harvard and MIT in May 2012.

Ho and Chuang anticipate that the data will offer insight to other educational researchers. Moreover, the methods used to protect learner privacy comply with FERPA (Federal Education Rights and Privacy Act) regulations, which govern the release of such data. The practice should inform the release of future datasets from edX and offer lessons more broadly.

“Learning data from open online courses hold great promise for research, but good research must be replicable by others,” says Ho, an associate professor at the Harvard Graduate School of Education and co-chair of the HarvardX Research Committee. “By sharing these de-identified data, we hope to show that we can protect information about individuals while still enabling replicable research about what works in online learning.”

The MIT Office of Digital Learning, HarvardX, and MIT’s Institutional Research group in the Office of the Provost contributed to the release of the dataset.

To learn more:


Topics: Education, teaching, academics, Massive open online courses (MOOCs), Educational Innovation and Technology, EdX, Learning, MITx, Office of Digital Learning

Comments

Back to the top