Notes From The Field
Research Center Notes on letter of Sara Goldrick-Rab and Douglas N. Harris, University of Wisconsin-Madison
1. Cleanness and completeness of Clearinghouse data
Postsecondary institutions voluntarily submit their enrollment data to the Clearinghouse. The quality of Clearinghouse data is thus dependent on the quality of data maintained by each postsecondary institution, and also on the quality of the processes used to extract that data for submission to the Clearinghouse. The accuracy and completeness of the data can realistically be measured only by the institutions themselves.
Historically, Clearinghouse data were collected for the purpose of verifying whether students were eligible for loan deferments. This provides a practical validity check: the colleges send data to the Clearinghouse as early as possible and the Clearinghouse processes enrollment verifications on their behalf. System edits and checks are in place to allow institutions to maintain compliance with federal reporting regulations; nonetheless, these checks may not always be sufficient for the purposes of research. Moreover, because participation is voluntary, the Clearinghouse has no mechanism to require colleges to send complete data or to do so in a timely fashion.
The Clearinghouse is always working on ways to enhance data collection, and is currently working to provide better transparency about data coverage rates and validity measures. Watch this space for updates in these areas.
2. Matching logic
Due to FERPA requirements, the Clearinghouse rarely uses SSNs for StudentTracker® matching, and then only in limited circumstances. In any case, the matching algorithm employs only exact logic on SSNs (it is clear upon reflection that matching only 3-4 digits of an SSN would result in false positives the majority of the time). Nonetheless, the Clearinghouse does need to provide more realistic guidance on expected match rates between students in the query and the Clearinghouse enrollment records, beyond the fact that the match rate will always depend upon the type of query performed, the quality of the data in the query file, and the quality of data submitted by the member institutions.
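The point about exact versus partial SSN matching can be illustrated with a minimal sketch. This is not the StudentTracker algorithm; the function names are hypothetical, and the collision estimate assumes that the compared digits are uniformly and independently distributed across unrelated records, which real SSNs only approximate.

```python
import re


def normalize_ssn(raw):
    """Strip separators; return nine digits, or None if the value is malformed."""
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) == 9 else None


def exact_ssn_match(query_ssn, record_ssn):
    """Match only when both values are well formed and all nine digits agree."""
    q = normalize_ssn(query_ssn)
    r = normalize_ssn(record_ssn)
    return q is not None and q == r


def chance_collision_probability(digits_compared, n_records):
    """Probability that at least one of n unrelated records agrees with the
    query on the compared digits purely by chance (uniform-digit assumption)."""
    p_single = 10.0 ** (-digits_compared)
    return 1.0 - (1.0 - p_single) ** n_records


if __name__ == "__main__":
    print(exact_ssn_match("123-45-6789", "123456789"))
    print(exact_ssn_match("123-45-6789", "000-00-6789"))
    print(chance_collision_probability(4, 10_000))
    print(chance_collision_probability(9, 10_000))
```

Under these assumptions, comparing only four digits against ten thousand unrelated records produces a chance agreement more often than not, whereas a full nine-digit comparison almost never does.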
The Clearinghouse takes great care to minimize both false positives and false negatives in order to produce conservative match results. Every file processed is reviewed by a Clearinghouse data analyst who manually examines incomplete matches. The StudentTracker matching logic is continually being enhanced, and the Clearinghouse welcomes feedback about matching results as we strive to learn from the experience of researchers in the field. The Clearinghouse has never, however, received an inquiry alerting us to the type of SSN mismatching noted in the letter.
3. Variation in the matching process according to the type of search the Clearinghouse processes
The Longitudinal Cohort (CO) query was designed to be used by an individual college to track students who attended the college at a known point in time. The same is true for a Subsequent Enrollment (SE) query when it is used by an individual college. That is, there is no difference in matching logic between the CO query and the SE query when either is requested by a college or university: both will return a student as “unfound” if the student was not enrolled at the requesting college. This logic does not apply, however, when an SE query is requested by a researcher from a system or other organization. The StudentTracker contract that each college and university signs is limited to tracking the former applicants, students, and/or graduates of that school. It does not permit the account to track people who never had a relationship with the college.
The Clearinghouse will work to better clarify differences among the query types.
4. Definition of enrollment terms
The Clearinghouse does not define enrollment terms. StudentTracker reports back the term begin dates and term end dates that we receive from each member institution. Each institution is thus responsible for defining the enrollment terms at its campus. The institution is also responsible for reporting student information in a timely manner and as frequently as it chooses, often several times within each of its terms. Every submission that an institution sends to the Clearinghouse is a snapshot of enrollments on a particular date. StudentTracker returns the most up-to-date enrollment record that has been submitted for each unique combination of school code, term begin date, and term end date within a student’s career. Thus, a student who withdraws midway through a term will be reported by StudentTracker as withdrawn only if the college has submitted a file to the Clearinghouse with the updated enrollment status before the date that the query is processed.
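The "most up-to-date record per term" rule described above can be sketched as follows. This is an illustration under stated assumptions, not the Clearinghouse's implementation: the `EnrollmentSnapshot` type, its field names, the status strings, and the school code are all hypothetical, and "before the date that the query is processed" is read here as a strict inequality.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EnrollmentSnapshot:
    # Hypothetical record layout for one student's reported enrollment.
    school_code: str
    term_begin: date
    term_end: date
    status: str
    submitted_on: date


def latest_records(snapshots, query_date):
    """Keep, for each (school code, term begin, term end) combination, the most
    recently submitted snapshot that was received before the query date."""
    latest = {}
    for snap in snapshots:
        if snap.submitted_on >= query_date:
            continue  # not yet on file when the query is processed
        key = (snap.school_code, snap.term_begin, snap.term_end)
        current = latest.get(key)
        if current is None or snap.submitted_on > current.submitted_on:
            latest[key] = snap
    return latest


if __name__ == "__main__":
    fall = (date(2009, 9, 1), date(2009, 12, 18))
    history = [
        EnrollmentSnapshot("000000", *fall, "enrolled", date(2009, 9, 15)),
        EnrollmentSnapshot("000000", *fall, "withdrawn", date(2009, 10, 20)),
    ]
    key = ("000000", *fall)
    print(latest_records(history, date(2009, 10, 1))[key].status)
    print(latest_records(history, date(2009, 11, 1))[key].status)
```

In this example the same student is returned as enrolled by a query processed before the college's updated file arrives and as withdrawn by a query processed afterward, which is why the timing of institutional submissions matters for research use.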
Source: Wisconsin Scholars Longitudinal Study (2010), Letter to Colleagues: Observations on the use of Clearinghouse data for research purposes.