Early-Stage Software Can Sniff Out Lies in Fake User Reviews

If you've ever used online reviews to whittle down options for dinner, you've probably encountered write-ups that linger a little too long on the outstanding service of one particular waiter or perhaps sling some mud at the rival pizza shop down the street.

It's no secret that business owners and jaded customers plant false reviews on crowd-sourced review sites such as Yelp, but a team of professors and doctoral students at Cornell University is developing software that can vet those slanderous reviews, identify deceptive language, and sift out the fake critiques from the honest ones. In the process, the software could someday help site users cut through misleading information they would otherwise accept as truth.

That consideration is especially important given web surfers' inability to pick out deception. During the Cornell study, which was recently presented by doctoral candidate Myle Ott at the annual gathering of the Association for Computational Linguistics, Ott and a team of researchers asked 400 volunteers to write intentionally false reviews of 20 Chicago-area hotels. When a selection of those fictional reviews was given to three human judges, the panel failed to identify false entries with any accuracy beyond simple luck or guesswork.

Next, the researchers applied a set of homegrown algorithms to a collection of true and false write-ups and set the code to search for deception-identifying cues within the text of each review.

The result? Nine times out of ten, the software accurately identified deceptive hotel reviews. The findings showed that falsified reviews were more likely to contain language detailing a made-up back story—for instance, words such as "business," "vacation," or "my husband," that explain a person's fictional motivation to visit a hotel or detail who accompanied them on the trip.

On the other hand, truthful evaluations use more substantive words that reflect a live, in-person experience. The study cites words such as "bathroom," "check-in," or "price" as the type of concrete keywords used by a reviewer who actually visited a place of business. The study also concluded that misleading reviews included more verbs, while truthful ones contained more nouns, likely a sign that a fake user review focuses on actions but lacks the knowledge to describe surroundings, people, and experiences.

When the algorithm was altered to pair the keyword-search method with other lie-detecting strategies involving punctuation and the number of first-person references, the software identified fraudulent reviews with an accuracy of 89.8 percent.
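
To make the idea concrete, here is a minimal Python sketch of how the kinds of cues described above could be turned into numbers that a classifier might weigh. It is not the Cornell team's code: the function names, the list of first-person words, the choice of punctuation marks, and the grouping of the quoted keywords into two sets are all assumptions made for illustration.

```python
import re

# Keywords quoted in the article. Treating them as two fixed lists is an
# illustrative assumption, not the study's actual feature set.
DECEPTIVE_CUES = ("business", "vacation", "my husband")
TRUTHFUL_CUES = ("bathroom", "check-in", "price")

# Hypothetical list of first-person words (assumption).
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours"}


def count_phrase(text, phrase):
    """Count whole-word occurrences of a word or phrase in lowercased text."""
    pattern = r"(?<![\w-])" + re.escape(phrase) + r"(?![\w-])"
    return len(re.findall(pattern, text))


def extract_features(review):
    """Turn one review into a small dictionary of numeric cues."""
    text = review.lower()
    words = re.findall(r"[a-z']+", text)
    return {
        "deceptive_cues": sum(count_phrase(text, c) for c in DECEPTIVE_CUES),
        "truthful_cues": sum(count_phrase(text, c) for c in TRUTHFUL_CUES),
        "first_person": sum(1 for w in words if w in FIRST_PERSON),
        "punctuation": sum(1 for ch in review if ch in "!?.,;:"),
        "n_words": len(words),
    }


if __name__ == "__main__":
    print(extract_features("My husband and I loved our vacation! We will return."))
```

A real system would feed counts like these, computed over many labeled reviews, into a trained statistical model rather than reading them off by hand; the sketch only shows the feature-extraction step.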

That rate of success is impressive, though the researchers were quick to point out that testing is far from conclusive. Still, the study's aim, to give fans of crowd-sourced sites a way to trust online reviews, might just make the Internet's trust circle a little bit wider.