Chapter 11 Data security

Data security refers to the principles followed to protect our data. There are two sides to this: ensuring that our data are not lost and ensuring that aspects of our data that are personal are kept confidential.

All lab personnel should complete the web-based Data Security training module offered by Cardiff University.

The data we post publicly online should not afford discovery of participants’ identities. There nonetheless remains some information in the data file that interested parties could use in conjunction with other known information to identify a participant. Suppose you are a private investigator, and your client wants you to spy on her child at university. You know your mark is enrolled on the Psychology degree, and you find out that completing this degree involves participating in laboratory studies. You think these studies might be a source of key information about how well your mark can remember abstract visual images, so you follow him to the laboratory, noting the date and time he participated. Once the data are posted online, you can download them and find the participant whose data were recorded on that date and time, even if no other information in the data uniquely identifies your mark. Knowing what we know about the distortibility of data, we can take those visual recognition task data and dubiously infer all sorts of maliciousness.

This is obviously ridiculous, but technically possible. It is referred to as the motivated intruder test, and we cannot consider our data truly anonymous if a motivated intruder could easily uncover and exploit the indirect links between anonymous data and identifiable information we may have access to. To comply with EU privacy regulations, we need to take steps to protect participant identities even when we think our data are harmless. This holds regardless of whether we plan to make the anonymous data set publicly available. Unless you are storing the data only on secured Cardiff University resources, it must be properly anonymized. This means that before we anonymize it, data cannot be held even temporarily on personal disks, on DropBox, or Google Docs, and cannot be emailed to non-university email addresses.

11.1 Backing up digital data

11.1.1 During data collection

All lab personnel will be given access to a OneDrive folder “labBackUp_cmorey”. While data collection is ongoing, the researcher responsible for data collection should add newly acquired data to the OneDrive lab back-up every day new data are acquired. There are sub-directories in labBackUp_cmorey for all the physical spaces we use to collect data. Simply copy the your experiment directory on the physical machine to the corresponding sub-directory in labBackUp_cmorey.

Note that here it will be necessary to make sure that your experimental directory has a unique name. The short-hand name you use for your project can be used for this. Do not choose names like “project”, “dissertation”, “thesis”, “experiment 1”.

These data are confidential because they include timestamps that, if linked to the participant’s EMS schedule or to information the participant discloses publicly about their participation in experiments, their identity could be reconstructed. You should not save these data onto portable drives - they are not secure enough. They should be saved only on the labBackUp_cmorey OneDrive, to the lab computers which sit in a secured location, or to password-protected computers associated with Cardiff University.

11.1.2 When data collection is closed

Once we have decided not to collect any more data on a project, it is time to compile and anonymize that data set. This should always occur before any analyses are attempted, and must always be done in a scripted fashion. For lab members not fluent in scripting, this should be done in a face-to-face meeting with the PI or another senior lab member. This is the Data Round-up Meeting.

11.1.3 Cleaning up lab directories

After we have compiled the data, created anonymized versions, and backed up the original data to Dr. Morey’s H drive, it should not matter whether the original data files stored on labBackUp_cmorey or on local machines are lost. These sources are sufficiently secure for our data, which are at most confidential, never highly confidential. In the event that we collect highly confidential data, such data should be deleted from local machines and OneDrive as soon as we have the compiled data backed up to the more secure H drive. For data that is confidential we need not rush to delete it. We shall allow most data to exist on labBackUp_cmorey until the space is required for newer data, and then delete the oldest files that have been compiled and backed up first. The researcher responsible for the data should delete it from the local machine where it was collected after data round-up for that project is finished.

11.1.4 Summary

Individual data files may be found on the local machine on which they were collected, unless there has been a crash or overwriting error. Assuming our lab back up procedure was followed, individual data files should be redundantly available on our shared OneDrive back-up.

Once we have compiled and anonymized the data, it will not be necessary to access the individual data files. Lab personnel associated with a project shall all have access to the anonymized data via our shared OSF project page. This anonymized data file may be saved redundantly anywhere. It will not contain sensitive or confidenital information.

If you somehow lose the anonymized data file and it is somehow gone from OSF, we can recover it by re-running the anonymization script we created on the original compiled data. The PI controls access to the original data, so requests for recovery from the original data must go through her. If we need the original data to prove something about the time period of data collection, the PI must also arrange access.

11.2 Processing paperwork

11.2.2 Paper data

If your study involves data collection on paper forms, you must back these up digitally by scanning them to .pdf as soon as your data collection closes. While collecting data, you should leave the paper forms in an organized space within the lab.

Paper data should also exclude personally identifiable information. It will also be considered confidential because some combination of lab personnel could work out the identity of the participant by linking the participant id on the paperwork with corresponding digial data with a timestamp. This is why paper data should remain in the secure lab. Once scanned, the digital file shall be saved to the appropriate OneDrive directory and to Dr. Morey’s H drive.

Data requiring transcription or coding must be duplicated by two independent coders, with any disgreements resolved by a third person. For projects involving coding or transcription, once we have multiple ratings the raters and PI (or a senior researcher) shall meet to compare the ratings and arbitrate any differences. In this meeting, the two rater’s transcriptions shall be merged, a resulting measure evaluated, and ultimately the resulting agreed measure shall be merged with the other project data. All steps will be documented via script. Once this has been completed, and the data have been uploaded to OSF (for anonymized data) or Dr. Morey’s H-drive (for non-anoymous data), the paper forms shall be destroyed via Confidential Waste collection.

11.3 Confidentiality

We have procedures built into our back-up process to prevent potentially-identifiable research data from leaving the secure university environment. However, our biggest efforts for solving this problem are taken during data collection and study design. Whenever possible (which is nearly always), we do not record any identifiable information about our participants. Unless you need precise information about their ages at test, we do not record birth dates. We never record proper names or any stable identification number with experimental data. When we need to link a participant’s data across multiple sessions, we will do this with a key of real-life identity and contact information to participant numbers, which is destroyed once we have finished collecting data. When we need to use such a key, it is maintained by one researcher, and stored within the secured university environment.

Our ethics approval states that we shall destroy this key when data collection is finished. For projects involving a key, key destruction will be the last item on the agenda of the Data Round-up Meeting. Once we have compiled and written the anonymous data files, we shall run tests to be sure that we have correctly matched up participant ids across sessions. Once satisfied of this, we shall delete all copies of the key.