Search Rank Fraud Detection in Online Services
Which Thai restaurant has good food? Which vacuum cleaner to buy on Amazon? Which alarm clock app to install on your phone? Online services that assemble and leverage public opinion on hosted products are central to numerous aspects of people's online and physical activities. Every day, people rely on online information to make decisions on purchases, services, software and opinions. People often assume the popularity of featured products is generated by purchases, downloads and reviews of real patrons, who are sharing their honest opinions about what they have experienced. But is that true? Unfortunately, no. Reviews, opinions, and software are sometimes fake, produced and controlled by fraudsters. Some of them collude to artificially boost the reputation of mediocre services, products, and venues, some game the system to improve rankings in search results, and some entice unsuspecting users to download malicious software.
Most fraudsters are ultimately after profit; their deceptive and malicious activities are means to this end. A recent study showed that an extra half-star rating on Yelp (a popular business review site) causes restaurants to sell out 19% more frequently. Fake reviews and app installs alter the popularity and profitability of software, including malware.
Strongly motivated, fraudsters have become increasingly inventive and hard to detect. They exploit crowdsourcing sites (e.g., Freelancer, Fiverr), proxies and anonymizers to hire teams of willing, often experienced workers who commit fraud collectively, emulating realistic, spontaneous activities from unrelated people (i.e., "crowdturfing"). In addition, they often change their strategies to bypass defenses.
FairPlay: Fraud and Malware Detection in Google Play. We introduce FairPlay, a system to efficiently detect Google Play apps that engage in fraud, including malware. We focus on search rank fraud, where app developers fraudulently boost the rating and install count of their apps. Search rank fraud can be efficiently performed through crowdsourcing, and thus requires no app search optimization (ASO) expertise from developers.
FairPlay leverages the observation that fraudulent and malicious behaviors leave behind telltale signs on the app market. The combination of fraud and malware is not arbitrary: we posit and find evidence that malicious developers resort to search rank fraud to boost the impact of their malware.
FairPlay organizes the analysis of longitudinal app data into the following 4 modules, illustrated in the adjacent figure. The Co-Review Graph (CoReG) module identifies apps reviewed in a contiguous time window by groups of users with significantly overlapping review histories. The Review Feedback (RF) module exploits feedback left by genuine reviewers, while the Inter Review Relation (IRR) module leverages relations between reviews, ratings and install counts. The Jekyll-Hyde (JH) module monitors app permissions, with a focus on dangerous ones, to identify apps that convert from benign to malware. Each module produces several features that are used to train an app classifier.
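As a rough illustration of the co-review idea behind CoReG, the Python sketch below links users who reviewed several of the same apps and then groups linked users together. It is a simplified stand-in, not FairPlay's implementation: the function names, the `min_shared` threshold, and the use of connected components to form groups are assumptions made for this example, and the time-window constraint is omitted for brevity.

```python
from collections import defaultdict
from itertools import combinations


def co_review_graph(reviews, min_shared=2):
    """Build a weighted co-review graph from (user, app) review pairs.

    Nodes are users. An edge links two users who reviewed at least
    `min_shared` apps in common (hypothetical threshold), weighted by
    the number of shared apps.
    """
    apps_by_user = defaultdict(set)
    for user, app in reviews:
        apps_by_user[user].add(app)

    edges = {}
    for u, v in combinations(sorted(apps_by_user), 2):
        shared = apps_by_user[u] & apps_by_user[v]
        if len(shared) >= min_shared:
            edges[(u, v)] = len(shared)
    return edges


def connected_groups(edges):
    """Return the connected components of the graph as sorted user lists.

    Connected components are a simple substitute for a dense-subgraph
    search; they are used here only to keep the example short.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


# Made-up review log: u1, u2 and u3 all reviewed apps "a" and "b".
sample_reviews = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "b"),
    ("u3", "a"), ("u3", "b"),
    ("u4", "c"),
]
sample_edges = co_review_graph(sample_reviews)
sample_groups = connected_groups(sample_edges)
```

Apps reviewed mostly by members of one such group would then be candidates for closer inspection.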
We have evaluated FairPlay using gold standard datasets of hundreds of fraudulent, malware and benign apps, which we collected by leveraging search rank fraud expert contacts on Freelancer, anti-virus tools and manual verification. FairPlay achieves over 97% accuracy in classifying fraudulent and benign apps, and over 95% accuracy in classifying malware and benign apps. We confirm that malware often engages in search rank fraud as well: when trained on fraudulent and benign apps, FairPlay flagged as fraudulent more than 75% of the gold standard malware apps.
FairPlay discovered hundreds of fraudulent apps that currently evade Google Bouncer's detection technology. FairPlay also enabled us to discover a novel "coercive campaign" attack type, in which app users are harassed into writing a positive review for the app, and into installing and reviewing other apps.
Marco: Detecting Fraudulent Review Behaviors in Yelp. In this project, we introduced Marco (MAlicious Review Campaign Observer), a novel system that leverages the wealth of spatial, temporal and social information provided by Yelp to detect venues that are targets of deceptive behaviors. Marco (see figure above) exploits fundamental fraudster limitations to identify suspicious venues. First, Marco identifies venues whose positive review timeline exhibits abnormal review spikes (see the adjacent figure). We prove that if a venue has more than 49 genuine reviews, a successful review campaign for that venue will exceed, during the attack interval, the maximum number of reviews of a uniform review distribution.
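A minimal sketch of how abnormal spikes in a positive-review timeline could be flagged. It assumes a box-plot style rule, where a day counts as a spike if its number of positive reviews exceeds the third quartile plus 1.5 times the interquartile range. This rule, the function name `spike_days`, and the sample counts are illustrative assumptions, not necessarily the test Marco applies.

```python
from statistics import quantiles


def spike_days(daily_counts):
    """Return indices of days whose positive-review count is an upper outlier.

    `daily_counts[i]` is the number of positive reviews posted on day i.
    The outlier fence (Q3 + 1.5 * IQR) is an assumed rule for this sketch.
    """
    q1, _, q3 = quantiles(daily_counts, n=4)
    fence = q3 + 1.5 * (q3 - q1)
    return [day for day, count in enumerate(daily_counts) if count > fence]


# Made-up timeline: 20 ordinary days followed by one day with 19 reviews.
ordinary = [1, 2, 1, 3, 2, 1, 2, 1, 2, 3]
timeline = ordinary + ordinary + [19]
flagged = spike_days(timeline)
```

With this made-up timeline only the last day is flagged; the ordinary days with three reviews stay below the fence. A rule of this kind degenerates when most days have no reviews at all, so a sparse timeline would need a different baseline.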
Second, Marco exploits the observation that a venue that is the target of a review campaign is likely to receive reviews that do not agree with its genuine reviews. In addition, following a successful review campaign, the venue is likely to receive reviews from genuine users that do not agree with the venue's newly engineered rating. Marco then defines the disparity of a review for a venue as the divergence of the review's rating from the average rating of the venue at the time of its posting. The aggregate rating disparity score of a venue is the average rating disparity of all its reviews. This is illustrated in the adjacent figure, which plots the evolution in time of the average rating against the ratings of individual reviews received by the "Azure Nail & Waxing Studio" (Chicago, IL). The positive reviews (including a one-day spike of 19 five-star reviews, shown in red in the upper right corner) disagree with the low rated reviews, generating a high average rating disparity.
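The disparity definition can be made concrete with a short Python sketch. Two details are assumptions made here, since the text does not pin them down: the divergence is taken as an absolute difference, and the average "at the time of posting" is computed over the reviews posted earlier, so the first review gets a disparity of 0. The function names are hypothetical.

```python
def rating_disparities(ratings):
    """Per-review disparity: |rating - average of all earlier ratings|.

    `ratings` must be in posting order. The first review has no earlier
    average to compare against, so its disparity is taken to be 0
    (an assumption of this sketch).
    """
    disparities, total = [], 0.0
    for i, rating in enumerate(ratings):
        if i == 0:
            disparities.append(0.0)
        else:
            disparities.append(abs(rating - total / i))
        total += rating
    return disparities


def aggregate_disparity(ratings):
    """Average of the per-review disparities of a venue (0 if no reviews)."""
    disparities = rating_disparities(ratings)
    return sum(disparities) / len(disparities) if disparities else 0.0


# Made-up venue histories: one consistent, one with a late burst of 5-star reviews.
consistent = aggregate_disparity([3, 3, 3])
boosted = aggregate_disparity([1, 1, 5])
```

A venue whose ratings stay consistent scores 0, while a low rated venue that suddenly receives 5-star reviews scores much higher.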
Third, Marco detects both venues that receive large numbers of fraudulent reviews, and venues that have insufficient genuine reviews to neutralize the effects of even small scale campaigns.
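To see why venues with few genuine reviews are exposed to even small campaigns, consider a back-of-the-envelope calculation (an illustration added here, not Marco's detection rule). If a venue has n genuine reviews with average rating r, then q fake reviews of rating f raise the average to at least r + b when (n*r + f*q) / (n + q) >= r + b, that is, when q >= n*b / (f - r - b). The function name and default values below are assumptions for this example.

```python
import math


def fake_reviews_needed(n_genuine, avg_rating, boost=0.5, fake_rating=5.0):
    """Smallest number of fake reviews that lifts the average by `boost`.

    Derived from (n*r + f*q) / (n + q) >= r + b, which rearranges to
    q >= n*b / (f - r - b). Raises ValueError if the target average
    cannot be reached with reviews of rating `fake_rating`.
    """
    target = avg_rating + boost
    if target >= fake_rating:
        raise ValueError("target average is not reachable")
    return math.ceil(boost * n_genuine / (fake_rating - target))


# Made-up venues, both rated 3.0 on average, aiming for a half-star boost.
small_venue = fake_reviews_needed(10, 3.0)
large_venue = fake_reviews_needed(200, 3.0)
```

Under these assumptions a venue with 10 genuine reviews needs only 4 fake 5-star reviews to gain half a star, while a venue with 200 genuine reviews needs 67.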
In preliminary work we have used the features we extract to classify 7,435 venues we collected from Miami, San Francisco and New York City. The table on the right shows our results. The black numbers represent the number of venues of a certain type we have collected from each city. The red numbers between parentheses represent the number of such venues that Marco has detected as suspicious. We observe that San Francisco has the highest concentration of deceptive venues: Marco flags almost 10% of its car repair and moving companies as suspicious.
Mahmudur Rahman, Mizanur Rahman, Bogdan Carbunar, Duen Horng Chau.
In Proceedings of the SIAM International Conference on Data Mining (SDM), May 2016. [pdf]
Mahmudur Rahman, Bogdan Carbunar, Jaime Ballesteros, Duen Horng (Polo) Chau.
Statistical Analysis and Data Mining, Wiley, Pang-Ning Tan and Arindam Banerjee, editors (invited), 2015. [Preliminary version]
Turning the Tide: Curbing Deceptive Yelp Behaviors
Mahmudur Rahman, Bogdan Carbunar, Jaime Ballesteros, George Burri, Duen Horng (Polo) Chau.
In Proceedings of the SIAM International Conference on Data Mining (SDM), Philadelphia, April 2014. [pdf]
"Yelp Events: Building Bricks Without Clay?"
Jaime Ballesteros, Bogdan Carbunar, Mahmudur Rahman, Naphtali Rishe.
In Proceedings of the 5th International Workshop on Hot Topics in Peer-to-peer Computing and Online Social Networks (HotPOST), July 2013. [pdf]