The identities of thousands, possibly millions, of public transport users in Victoria are discoverable in a vast set of data released by Public Transport Victoria (PTV), University of Melbourne researchers have found.
The research team from the School of Computing and Information Systems at Melbourne School of Engineering re-identified themselves, a co-traveller, and a member of the Victorian Parliament along with details of daily routines from data taken from more than 15 million cards from Melbourne’s contactless smartcard ticketing system known as Myki.
The de-identified information was released in mid-2018 and made available publicly online as part of a data science competition.
The Office of the Victorian Information Commissioner (OVIC) has today released its report on the researchers’ re-identification of the data.
Following analysis of the data, lead researcher Chris Culnane from the School of Computing and Information Systems said that, worryingly, most Myki users in the dataset could be identified from just a few touch-on or touch-off events.
“With just a handful of pieces of information about where someone boards or exits public transport, it’s possible to get an indication of where they live or work, their regular travel patterns, who they travel with, or if they travel alone – for example, children heading home from school alone,” Dr Culnane said.
“Our analysis raises serious privacy, safety and security issues. It’s easy to imagine how information like this could be used by people who might want to cause harm.”
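To illustrate why a handful of events can be enough, the following is a minimal sketch, not the researchers’ actual method. It assumes a simplified record layout (a pseudonymous card key, a stop name and an hour-level timestamp), and all of the records and the `candidate_cards` helper are hypothetical. It shows how each additional known event narrows the set of cards that could belong to a person.

```python
from collections import defaultdict

# Hypothetical records: (pseudonymous card key, stop, hour-level timestamp).
# These values are invented for illustration and are not from the PTV release.
trips = [
    ("card_a", "Flinders Street", "2018-07-02T08"),
    ("card_a", "Parkville", "2018-07-02T17"),
    ("card_b", "Flinders Street", "2018-07-02T08"),
    ("card_b", "Richmond", "2018-07-02T18"),
    ("card_c", "Parkville", "2018-07-02T17"),
]


def candidate_cards(trips, known_events):
    """Return the cards whose history contains every known (stop, time) event."""
    events_by_card = defaultdict(set)
    for card, stop, when in trips:
        events_by_card[card].add((stop, when))
    wanted = set(known_events)
    # A card stays a candidate only if it matches all of the known events.
    return sorted(card for card, events in events_by_card.items() if wanted <= events)


# One known event leaves two candidate cards.
one_event = candidate_cards(trips, [("Flinders Street", "2018-07-02T08")])

# A second known event singles out one card.
two_events = candidate_cards(
    trips,
    [("Flinders Street", "2018-07-02T08"), ("Parkville", "2018-07-02T17")],
)

print(one_event)
print(two_events)
```

In this toy example, once a single card remains, every other trip recorded against it becomes attributable to the same person.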
When it released the nearly two billion lines of data, the only information PTV withheld was the card IDs, which can be tied to a person’s name if they have registered their Myki card. Despite this redaction, the researchers’ data analysis was able to link all trips on the same card.
Dr Culnane said that even if more data were withheld, it would be near impossible to de-identify data like this. He instead recommended frameworks such as differential privacy, which make it possible to collect and share aggregate information about user habits while maintaining the privacy of individual users.
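As a rough illustration of the idea, one common building block of differential privacy is the Laplace mechanism, which adds calibrated random noise to an aggregate before it is published. The sketch below is an assumption-laden example rather than anything PTV or the researchers have proposed: the `noisy_count` function, the boarding figure and the choice of `epsilon` are all hypothetical, and it assumes each card contributes at most one to the count.

```python
import random


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return true_count plus Laplace noise with scale sensitivity / epsilon.

    Assumes each card changes the count by at most `sensitivity`.
    A smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on zero with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(2018)  # fixed seed so the sketch is repeatable

# Hypothetical aggregate: distinct cards touching on at one station in an hour.
true_boardings = 1200
published = noisy_count(true_boardings, epsilon=1.0, rng=rng)
print(round(published))
```

The published figure stays close to the true count, so it remains useful for planning, while the noise limits what can be inferred about whether any single card was present. A real release would also need to account for the total privacy budget spent across every statistic published.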
“This isn’t the first time a government agency has released data about the public and claimed it was de-identified. Above and beyond the desire to release and share data, privacy needs to be the number one priority,” he said.
The researchers were also able to re-identify a state member of parliament by analysing a handful of their tweets about travelling on public transport, demonstrating that publicly available social media data can be used to re-identify people’s travel histories.
Victorian Information Commissioner Sven Bluemmel said: “Your public transport history can contain a wealth of information about your private life. It reveals your patterns of movement or behaviour, where you go and who you associate with. This is information that I believe Victorians expect to be well-protected.”
Read the University of Melbourne report on the data breach.