New technique developed to identify authors of anonymous emails

Concordia University professor Benjamin Fung has developed an effective new technique to determine the authorship of anonymous emails (Image: Concordia University)

There might be many harmless reasons for sending anonymous emails – confessing your undying love for someone, seeking anonymous advice, or simply playing a joke on a friend – but there are also plenty of harmful reasons – making threats against someone, distributing child pornography or sending viruses, just to name a few. While police can often use the IP address to locate where an email originated, it may be harder to nail down exactly who sent it. A team of researchers claims to have developed an effective new technique to determine the authorship of anonymous emails that can provide presentable evidence in courts of law.

In an attempt to combat the increase in cybercrimes involving anonymous emails, Benjamin Fung, a professor of Information Systems Engineering at Quebec's Concordia University and an expert in data mining, and his colleagues set about developing a novel method of authorship attribution. The method is based on techniques used in speech recognition and in data mining, the practice of extracting useful, previously unknown knowledge from a large volume of raw data. Their approach relies on identifying frequent patterns and unique combinations of features that recur in a suspect's emails.

The technique works by first identifying the frequent patterns found in emails written by a suspect. Any of these patterns that are also found in the emails of other suspects are then filtered out, leaving only patterns that are unique to the author of the emails being analyzed. These remaining frequent patterns then constitute what the researchers call the suspect's 'write-print' – a distinctive identifier akin to a fingerprint.
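The filtering step described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the stylistic features, the support threshold, the pattern size limit, the scoring rule and the sample emails are all hypothetical, chosen only to show how frequent patterns shared between suspects drop out while unique ones remain.

```python
from collections import Counter
from itertools import combinations


def extract_features(email):
    """Map one email to a set of coarse style features (hypothetical feature set)."""
    words = email.split()
    letters = [c for c in email if c.isalpha()]
    features = set()
    if letters and all(c.islower() for c in letters):
        features.add("all_lowercase")
    # Word length includes attached punctuation; good enough for a sketch.
    if words and sum(len(w) for w in words) / len(words) < 4.5:
        features.add("short_words")
    if "!!" in email:
        features.add("repeated_exclamation")
    if not email.rstrip().endswith((".", "!", "?")):
        features.add("no_final_punctuation")
    return frozenset(features)


def frequent_patterns(emails, min_support=0.6, max_size=2):
    """Feature combinations that occur in at least min_support of the emails."""
    counts = Counter()
    for email in emails:
        feats = sorted(extract_features(email))
        for size in range(1, max_size + 1):
            for combo in combinations(feats, size):
                counts[frozenset(combo)] += 1
    threshold = min_support * len(emails)
    return {pattern for pattern, n in counts.items() if n >= threshold}


def write_prints(emails_by_suspect):
    """Keep only the frequent patterns that no other suspect shares."""
    patterns = {s: frequent_patterns(e) for s, e in emails_by_suspect.items()}
    prints = {}
    for suspect, own in patterns.items():
        others = set().union(*(p for s, p in patterns.items() if s != suspect))
        prints[suspect] = own - others
    return prints


def attribute(anonymous_email, prints):
    """Score each suspect by how many write-print patterns the email contains."""
    feats = extract_features(anonymous_email)
    scores = {s: sum(1 for p in wp if p <= feats) for s, wp in prints.items()}
    return max(scores, key=scores.get), scores


# Invented sample data, for illustration only.
samples = {
    "alice": [
        "hey can u send me the report",
        "i will call u later ok",
        "thx for the help today",
    ],
    "bob": [
        "Great news!! The deal is done.",
        "Call me now!! We need to talk.",
        "See you at noon.",
    ],
}

prints = write_prints(samples)
guess, scores = attribute("send me the file when u can", prints)
print(guess, scores)
```

In this toy data both suspects tend to use short words, so that pattern on its own is filtered out of both write-prints, while combinations that only one suspect exhibits survive. Because every retained pattern is a named, countable feature combination, the match can be explained step by step, which is the property Fung highlights for courtroom use.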

"Let's say the anonymous email contains typos or grammatical mistakes, or is written entirely in lowercase letters," says Fung. "We use those special characteristics to create a write-print. Using this method, we can even determine with a high degree of accuracy who wrote a given email, and infer the gender, nationality and education level of the author."

Fung and his colleagues tested their technique by examining the Enron Email Dataset – a collection containing over 200,000 real-life emails from 158 employees of the Enron Corporation. Using a sample of 10 emails written by each of 10 subjects – 100 emails in all – they were able to identify authorship with an accuracy of 80 to 90 percent.

"Our technique was designed to provide credible evidence that can be presented in a court of law," says Fung. "For evidence to be admissible, investigators need to explain how they have reached their conclusions. Our method allows them to do this."

11 comments
teeduke
All lower case. Grammatical or spelling errors. Oh, boy. Whoopee. This sounds like what anyone with half a brain would spot in about 30 seconds. To call this a technique is like putting lipstick on a pig. I certainly wouldn't want some prosecutor to use this technique to "provide credible evidence" against me or anyone else. And any jurist who admits this as credible evidence should have their head examined.
Venril
"There might be many harmless reasons for sending anonymous emails - confessing your undying love for someone, seeking anonymous advice, or simply playing a joke on a friend - but there are also plenty of harmful reasons - making threats against someone, distributing child pornography or sending viruses, just to name a few."... The author of the article sort of forgot to explore the principal reason a society needs anonymous speech: the expression of ideas unpopular with those in power, who might just take action against folks questioning the way things are. It's not about love notes or sending viruses or "the children!". Even if this were to prove effective on plain text, I doubt the author of a virus will be sending any amount of text to be analyzed. Who funded this research, the PRC or Iran? Seriously, the Federalist Papers were published anonymously. Much of political speech in repressive nations is done anonymously, to avoid being 'disappeared'. Oi. Vey.
mred
I agree with Venril. Also, someone could easily mimic the writing style of someone they wanted to frame if they knew that these analysis techniques were being used. And WHO is going to make a database of e-mails, presumably purloined by ISPs? That itself is scary. Privacy is a major issue, especially with governments acting the way they have been, with excuses of "terrorism" or "protecting the children". Society needs anonymous speech.
David Donovan
Any judge that would prosecute based on something like this should be fired. This is bullshit! -Anonymous
wsa999
This is nothing new.
alcalde
It's not BS, anonymous, it's the near magical power of data mining and machine learning. Those rubbishing the effectiveness are ignoring the fact presented that out of 158 authors of the Enron e-mail collection, the algorithm identified either 8 or 9 out of ten authors correctly from just 10 e-mail samples. The all lowercase, typos, etc. are just easy to understand examples, but given the claim also made in the article that the algorithm can infer nationality, gender, education level, etc. it's most likely looking at a far wider range of factors... specific words, average syllables per word, words per sentence, specific grammar rules followed or disobeyed like starting a sentence with a preposition, etc. Is anyone going to suggest that if I gave them ten writing/speech samples from Sarah Palin, Barack Obama and Charlie Sheen they wouldn't be able to pick out which came from which? People have unique speaking styles and data mining can pick those out the same way our own brains can, except the algorithms can explain their decisions better.
christopher
Here's a better idea: www.SelfDestructingEmail.com You can't get much more anonymous than completely invisible :-)
seekertom
Here's the thing... these researchers had a list in hand of em authors, and they matched the authors to their list, as expected. But the key is, the list was in-hand. In their examples of 'need' for this technology, there is no list of authors from which to choose a culprit. There is only a wide world full of people sending ems to each other. Now, if the spookies have already collected all of everyone's emails, then they DO have a list of authors and samples of their emails... now THAT is the scary part, isn't it?
Hmm_OK
Hopefully they can use it to track all the automated/robot spam e-mails back to their owners and free up the web a little! Although I believe very strongly in the principles of free speech, there are always those who will abuse the rights. Our world is not perfect and until we can find effective ways to prevent the abuse, we either have to accept it or accept certain limits to freedom. I guess it all depends on what you think is the greater evil.
Facebook User
Any defense lawyer could knock this out of court by presenting a sampling of posts from Craigslist, picked out using the same techniques. Any sort of horrendous writing in any language can be found on Craigslist, surely with at least hundreds of "matches" to any given person's writing style.