One of the great things about the internet is that everyday people can share what they know with the entire world: if they've had a particularly good or bad experience with a business or product, they can tell everyone via customer review websites. The flip side, however, is that business owners can plant fake reviews on those same sites that either praise their own business or slam the competition. Confused consumers can now take heart, though: researchers from Cornell University have developed software that is able to identify phony reviews with close to 90 percent accuracy.
The Cornell team asked a group of people to deliberately write a total of 400 fraudulent positive reviews of 20 Chicago hotels. These were combined with 400 genuine positive reviews, and the full set was submitted to a panel of three human judges. When asked to identify which reviews were fake, the judges scored no better than if they had guessed at random.
According to Myle Ott, a Cornell doctoral candidate in computer science, humans are affected by a "truth bias," in which they assume that everything they read is true unless presented with evidence to the contrary. Once they are presented with such evidence, they overcompensate and assume that more of what they read is untrue than is actually the case.
After the human trials, the researchers applied statistical machine learning algorithms to the reviews to see what distinguished the genuine examples from the fraudulent ones. It turns out that the fake ones used a lot of scene-setting language, such as "vacation," "business" or "my husband." The genuine ones, on the other hand, tended to use more concrete words relating to the hotel itself, such as "bathroom," "check-in" and "price."
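To give a feel for how such telltale words can surface, here is a minimal Python sketch that ranks words by how much more often they appear in one group of reviews than in the other. It is not the Cornell team's code: the function names, the smoothed log-ratio score and the four toy reviews are all assumptions made up for illustration.

```python
from collections import Counter
import math
import re

def tokenize(text):
    """Lowercase the text and split it into words (hyphens and apostrophes kept)."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())

def distinctive_words(fake_reviews, genuine_reviews, top_n=3):
    """Return (fake-leaning words, genuine-leaning words), ranked by a
    smoothed log-ratio of each word's relative frequency in the two groups."""
    fake = Counter(t for r in fake_reviews for t in tokenize(r))
    real = Counter(t for r in genuine_reviews for t in tokenize(r))
    vocab = set(fake) | set(real)
    # Add-one smoothing keeps words unseen in one group from dividing by zero.
    fake_total = sum(fake.values()) + len(vocab)
    real_total = sum(real.values()) + len(vocab)
    score = {
        w: math.log((fake[w] + 1) / fake_total) - math.log((real[w] + 1) / real_total)
        for w in vocab
    }
    ranked = sorted(vocab, key=lambda w: (score[w], w))
    return ranked[-top_n:][::-1], ranked[:top_n]

# Toy reviews invented for this example; they are not from the study.
toy_fake = [
    "My husband and I loved our vacation here",
    "Perfect for a business trip or a vacation with my husband",
]
toy_genuine = [
    "The bathroom was small but the price was fair",
    "Check-in was slow and the bathroom was clean, good price",
]

fake_leaning, genuine_leaning = distinctive_words(toy_fake, toy_genuine)
print("fake-leaning:", fake_leaning)
print("genuine-leaning:", genuine_leaning)
```

On a sample this small, common words such as "the" and "was" also rank highly, which is one reason a real study needs hundreds of reviews.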
The two groups of writers also differed in their use of specific keywords and punctuation, and in how much they referred to themselves. In line with earlier studies of imaginative versus informative writing, the fake reviews also contained more verbs, while the honest ones contained more nouns.
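Two of those signals, self-reference and punctuation, are simple enough to measure with a few lines of Python. The sketch below is only an illustration: the pronoun list, the function name and the per-word normalization are assumptions, not details from the study. Counting nouns and verbs would require a part-of-speech tagger and is left out here.

```python
import re
import string

# Hypothetical list of first-person words used to measure self-reference.
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours", "ourselves"}

def style_features(review):
    """Return the share of first-person words and punctuation marks per word."""
    words = re.findall(r"[a-z']+", review.lower())
    n = max(len(words), 1)  # avoid dividing by zero on an empty review
    return {
        "self_reference_rate": sum(w in FIRST_PERSON for w in words) / n,
        "punctuation_per_word": sum(ch in string.punctuation for ch in review) / n,
    }

# Toy sentence invented for this example.
print(style_features("My husband and I loved it!"))
```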
Using a subset of the 800 reviews, the team created a fake-review-detecting algorithm. When it combined the analysis of keywords with the analysis of word combinations, the algorithm identified deceptive reviews in the entire database with 89.8 percent accuracy.
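The article does not say which learning algorithm the team used, so the Python sketch below swaps in a simple stand-in: a multinomial naive Bayes classifier with add-one smoothing. It treats "keywords" as single words and "word combinations" as pairs of adjacent words, trains on a few labeled reviews and is then scored on reviews it has not seen. The class, the function names and the toy reviews are all assumptions for illustration.

```python
from collections import Counter
import math
import re

def features(text):
    """Return single words plus adjacent word pairs for one review."""
    words = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in model)."""

    def fit(self, texts, labels):
        self.counts = {label: Counter() for label in set(labels)}
        self.docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(features(text))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        def log_prob(label):
            total = sum(self.counts[label].values()) + len(self.vocab)
            prior = math.log(self.docs[label] / sum(self.docs.values()))
            # Features never seen in training are skipped.
            return prior + sum(
                math.log((self.counts[label][f] + 1) / total)
                for f in features(text) if f in self.vocab
            )
        return max(sorted(self.counts), key=log_prob)

def accuracy(model, texts, labels):
    """Fraction of reviews whose predicted label matches the true label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)

# Toy reviews invented for this example; they are not from the study.
train_texts = [
    "My husband and I loved our vacation at this hotel",
    "A wonderful business trip, my husband wants to return",
    "The bathroom was clean and the price was fair",
    "Check-in was slow but the bathroom was large",
]
train_labels = ["fake", "fake", "genuine", "genuine"]
held_out_texts = [
    "Our vacation with my husband was wonderful",
    "The price was fair and check-in was quick",
]
held_out_labels = ["fake", "genuine"]

model = NaiveBayes().fit(train_texts, train_labels)
print(accuracy(model, held_out_texts, held_out_labels))
```

A perfect score on six hand-picked sentences means nothing in itself; the point is only the workflow of training on one portion of the labeled reviews and measuring accuracy on the rest.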
So far, the software is only useful for processing hotel reviews, and Chicago hotel reviews at that. The Cornell team is hoping, however, that similar algorithms could be developed for reviews of a wider range of goods and services.
"Ultimately, cutting down on deception helps everyone," said Ott. "Customers need to be able to trust the reviews they read, and sellers need feedback on how best to improve their services."