Research Data Management

Terms and Definitions

Anonymisation refers to the process of removing or modifying personal identifiers, both direct and indirect.

Anonymisation results in anonymised data that cannot be associated with any one individual.

Direct Identifier
A data attribute which, on its own, identifies an individual (e.g. fingerprint) or has been assigned to an individual (e.g. NRIC).

Indirect Identifier
A data attribute which, by itself, does not identify an individual, but when combined with other information, may identify an individual.

De-identification
Removal of identifying information from a dataset. This data could potentially be re-identified.

Re-identification
Identifying a person by recombining a de-identified dataset with identifying information.

When to anonymise?

  1. Purpose and utility
    Anonymisation should be tailored to the purpose at hand.
    Anonymisation reduces the original information in the dataset to some extent, and therefore reduces its utility (e.g. clarity, precision). You need to decide on the trade-off between acceptable utility and a reduced risk of re-identification.

  2. Nature and type of data
    Different anonymisation techniques are suitable for different types of data.

  3. Anonymisation techniques
    Certain techniques may be more suitable for a situation than others.
    For example, character masking is usually applied to direct identifiers, and aggregation to indirect identifiers.

    The various anonymisation techniques also modify data in significantly different ways.
    For example, character masking modifies only parts of an attribute, pseudonymisation replaces the entire attribute with unrelated, but consistent information, and attribute suppression removes the attribute entirely.

     
  4. Inferred information
    It may be possible for certain information to be inferred from anonymised data.
    For example, masking may hide personal data, but it does not hide the length of the original data in terms of the number of characters.
    The anonymisation process must therefore consider what could still be inferred, both before deciding on the techniques and after applying them.

  5. Expertise with the subject matter
    An “identifiability” assessment should be performed before and after anonymisation techniques are applied, and this requires a good understanding of the subject matter to which the data pertains. For example, if the dataset is healthcare data, it likely requires someone with sufficient healthcare knowledge to assess how unique (i.e. how identifiable) a record is.

  6. Competency in anonymisation process and techniques
    Anonymisation is complex. Seek out people who are well versed in anonymisation techniques and principles.

  7. The recipient
    Factors such as the recipients’ expertise with the subject matter play an important role in the choice of anonymisation techniques.
    Data released to the public requires a much stronger form of anonymisation than data shared under a contractual arrangement.

  8. Tools
    Software tools can be very useful in carrying out anonymisation. Note that even the best tools need adequate inputs and may have limitations.

(Source: PDPC Guide to basic data anonymisation techniques)
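
The three techniques contrasted in point 3 (character masking, pseudonymisation and attribute suppression) can be sketched in Python as follows. This is only an illustrative sketch, not the PDPC's method: the function names, the sample record and the secret key are all hypothetical, and the keyed hash is just one possible way to produce consistent pseudonyms. The fixed-width mask relates to point 4, since an ordinary mask still reveals the length of the original value.

```python
import hashlib
import hmac

# Hypothetical secret key; it would need to be stored separately from the data.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def mask_characters(value, visible=4, mask_char="*"):
    """Character masking: hide all but the last `visible` characters."""
    visible = max(visible, 0)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]


def mask_fixed_length(value, width=8, mask_char="*"):
    """Mask the whole value with a fixed width so its length is not revealed."""
    return mask_char * width


def pseudonymise(value, key=SECRET_KEY, length=12):
    """Pseudonymisation: replace a value with a consistent keyed-hash token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def suppress_attributes(record, attributes):
    """Attribute suppression: drop the named attributes entirely."""
    return {k: v for k, v in record.items() if k not in attributes}


# Made-up record for illustration only.
record = {
    "name": "Tan Ah Kow",
    "id_number": "S1234567A",
    "email": "ahkow@example.com",
    "score": 72,
}

anonymised = suppress_attributes(record, {"email"})
anonymised["id_number"] = mask_characters(anonymised["id_number"], visible=4)
anonymised["name"] = pseudonymise(anonymised["name"])
print(anonymised)
```

The same name always maps to the same token, so records belonging to one person can still be linked, while the email attribute is gone altogether. Anyone holding the key can regenerate the tokens, so the key has to be protected as carefully as the original identifiers.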

Basic Data Anonymisation Techniques

1. Attribute Suppression

2. Character Masking

3. Pseudonymisation

4. Generalisation

5. Swapping

6. Data Perturbation

7. Synthetic Data

8. Data Aggregation

For details, refer to the PDPC Guide to basic data anonymisation techniques.
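
As a rough illustration of how generalisation and data aggregation act on indirect identifiers, the Python sketch below bands ages, truncates postal codes, and then counts how many records share each combination of indirect identifiers. The records, field names and band sizes are made up for this example, and the group-size count is only one simple way to gauge how unique a record is; it does not replace a proper identifiability assessment.

```python
from collections import Counter


def generalise_age(age, band=10):
    """Generalisation: replace an exact age with a range such as '30-39'."""
    lower = (age // band) * band
    return f"{lower}-{lower + band - 1}"


def generalise_postcode(postcode, keep=2):
    """Generalisation: keep only the leading digits of a postal code."""
    return postcode[:keep] + "X" * (len(postcode) - keep)


def smallest_group(records, indirect_identifiers):
    """Size of the smallest group of records sharing the same indirect identifiers."""
    groups = Counter(tuple(r[q] for q in indirect_identifiers) for r in records)
    return min(groups.values())


# Made-up records for illustration only.
records = [
    {"age": 31, "postcode": "119077", "condition": "asthma"},
    {"age": 34, "postcode": "119260", "condition": "diabetes"},
    {"age": 38, "postcode": "117543", "condition": "asthma"},
    {"age": 52, "postcode": "569830", "condition": "hypertension"},
    {"age": 57, "postcode": "567714", "condition": "asthma"},
]

generalised = [
    {**r, "age": generalise_age(r["age"]), "postcode": generalise_postcode(r["postcode"])}
    for r in records
]

# Before generalisation every record is unique; afterwards each shares its
# age band and postcode prefix with at least one other record.
before = smallest_group(records, ["age", "postcode"])
after = smallest_group(generalised, ["age", "postcode"])
print(before, after)

# Data aggregation: report counts per condition instead of individual rows.
condition_counts = Counter(r["condition"] for r in records)
print(dict(condition_counts))
```

A smallest group of size 1 means at least one record is unique on those attributes and is therefore easier to re-identify. Wider bands raise the group size but lower the precision of the data, which is the utility trade-off described earlier.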

Images and Recordings

Images/Photos

There are many mobile apps and websites that can blur or obscure faces in images/photos (check the list on WikiHow). Built-in image editor tools such as MS Paint (for Windows) and Paintbrush (for Mac) can be used for simple editing.

Video Recordings

There are various mobile apps and video editing software that offer blurring/obscuring functions. If you host your videos on YouTube, you can use YouTube Studio to blur them.

Audio Recordings

Participants should be advised not to identify themselves. Likewise, transcripts should identify participants by code rather than by name. Consider audio editing software (e.g. Audacity) or voice changer software to anonymise the audio.
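
Replacing names with codes in a transcript can be partly automated. The Python sketch below is a minimal example under stated assumptions: the participant names, the codes and the function name are hypothetical, and it only catches names that are listed explicitly. Nicknames, misspellings and other identifying details would still need a manual read-through.

```python
import re


def code_transcript(text, participants):
    """Replace each listed participant name with its code (whole words, case-insensitive)."""
    # Handle longer names first so a full name is replaced before any shorter
    # name that happens to be contained in it.
    for name in sorted(participants, key=len, reverse=True):
        code = participants[name]
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", flags=re.IGNORECASE)
        text = pattern.sub(lambda match, code=code: code, text)
    return text


# Hypothetical mapping of names to participant codes; keep it separate from the transcripts.
participants = {"Mei Ling": "P01", "Rajesh": "P02"}
transcript = "Interviewer: Thanks, Mei Ling. Rajesh, do you agree with Mei Ling?"

coded = code_transcript(transcript, participants)
print(coded)
```

The name-to-code mapping is itself identifying information, so it should be stored securely and apart from the coded transcripts.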

A recording can never be completely anonymous. Blurring out faces does not guarantee protection.

A person can be identified from visual details (e.g. scars, tattoos), distinctive clothing, a name shown on screen, or a landmark in the background, as well as from audio details such as a distinctive voice or accent.

Tips for recording:

  • Ask participants not to mention specific names or places
  • Ask participants not to wear distinctive clothes for filming
  • Film participants' hands, or film them from behind, in profile, or from a distance
  • Film in unidentifiable locations