Sam Feaster: How Line’s multi-filter system stops spam in its tracks

Image credit: Tech in Asia

A flash of bright green pops up on Randy Lim’s screen, indicating that the Singapore-based photographer just received a Line message alert. Lim checks the message and quickly sees this isn’t an old friend, but rather one of the oldest tricks in the spammer book — a wealthy prince who is offering to send him US$20 million.

“He wanted me to open a bank account with a US$5,000 minimum deposit, give him the password, and [then] he’ll transfer the US$20 million to me,” says Lim.

“Sounds like a fair deal to me,” he adds, rolling his eyes.

Tech in Asia spoke to a few Line users from Singapore, Malaysia, and Japan, including 28-year-old Lim. He started using Line two years ago to get in touch with a friend who had moved to Thailand, but now he uses the app every day. “I’m so used to [spam messages] now. I just delete them now without even opening the chats,” he says.

Of course, most spammers don’t use such tired old scams to suck in their victims. There is now an endless onslaught of tricks, hustles, and phishing techniques. And the rise of messaging apps like Line, which make global communications more accessible and affordable, is providing new opportunities for these digital deceivers.

Over 217 million people use Line daily across the globe. Unsurprisingly, the largest number of reported spam messages come from Taiwan and Japan, where the biggest clusters of users reside. This is why Line’s cybersecurity team created a three-level spam filtering system designed to root out the vast majority of fraudulent messages.

The challenges of managing spam

“Spam messages 10 years ago were very different from the ones we have today,” explains Kenji Aiko, an engineer in the cybersecurity team for Line’s app.

A 2014 study from the Georgia Institute of Technology in the US found spam messages containing URL links dipped after 2009 but spiked significantly after 2011.

“One possible reason is that more URL camouflage techniques, which are quite efficient in avoiding spam filters, appeared such as shortened or hidden URLs in recent years,” according to the paper.

In Japan, spam typically solicits customers for dating sites. In Taiwan, most are advertising counterfeit goods and financial services, according to Line data. However, the company is also getting an increasing number of user reports from other countries such as Singapore.

Aiko says that spam accounts pop up ferociously: three spam accounts spring up to take the place of the one you block. It’s also easy for them to add you as a friend.

Line recognized spam as a problem early in 2013 and began to develop measures to counter it. “[Our anti-spam measures] already started before the security team took over in 2013. The number of spam messages went down after that,” Aiko says.

And this has remained stable until today, Aiko said at Line Developer Day 2017.

Preventing unsolicited messages helps create a better user experience / Image credit: Pexels

Their approach? Using users’ spam reports to create three layers of filters, each taking out spam messages at a different level. Line analyzes spam data using the search engine Elasticsearch and the open-source data visualization plugin Kibana. Based on those results, the company crafted a plan to weed the spam out of its 25 billion daily messages:

1. Rules-based filter

The first filter rests entirely on automation. Software decides whether an account is registered as legitimate or spam without the need for user reports.

Spam accounts are filtered out based on metadata (e.g. the number of messages sent in succession) collected by the software. Once one or more suspicious conditions are met, the filter alerts the system.

For example, the metadata of a newly created account reads like this: 1,000 friends were added and then 1,000 messages were immediately sent at the same time. Sounds fishy, don’t you think?

Suspicious user behavior such as this sends red flags to the filtering software, which proceeds to register the user as a spam account and then suspends it.
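To make the idea concrete, here is a minimal Python sketch of a rules-based check. Line has not published its rules, so the field names, the rule names, and every threshold below are assumptions chosen to mirror the 1,000-friends, 1,000-messages example above.

```python
from dataclasses import dataclass


@dataclass
class AccountMetadata:
    # Hypothetical metadata fields; Line's real schema is not public.
    account_age_hours: float        # time since the account was created
    friends_added: int              # friends added since creation
    messages_sent_last_minute: int  # messages sent in the most recent minute


# Each rule is a (name, predicate) pair. All thresholds are illustrative guesses.
RULES = [
    ("mass_friend_add",
     lambda m: m.account_age_hours < 24 and m.friends_added >= 1000),
    ("message_burst",
     lambda m: m.messages_sent_last_minute >= 1000),
]


def triggered_rules(meta):
    """Return the names of every rule the account trips."""
    return [name for name, predicate in RULES if predicate(meta)]


def should_suspend(meta, min_hits=1):
    """Flag the account once at least `min_hits` rules fire."""
    return len(triggered_rules(meta)) >= min_hits


new_account = AccountMetadata(
    account_age_hours=0.5, friends_added=1000, messages_sent_last_minute=1000)
regular_user = AccountMetadata(
    account_age_hours=17520, friends_added=150, messages_sent_last_minute=3)

print(triggered_rules(new_account))  # ['mass_friend_add', 'message_burst']
print(should_suspend(regular_user))  # False
```

Keeping the rules in a plain list means a new suspicious condition can be added without touching the decision logic, and no user report is needed to trigger a suspension.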

Serving as the first gatekeeper, this filter is useful in pinpointing and eliminating general spam behaviors. The next two filters, however, rely on users hitting the “Report Spam” button in the app.

Examples of spam on Line. But not all spam looks like this anymore / Image credit: Tech in Asia

2. Machine-learning filter

While the rules-based filter can spot general spamming behavior, the machine-learning filter can zero in on more complicated instances.

The second filter is automated but requires data from users’ reports. The data collected helps the system to make informed decisions such as blocking a spam account, or determining whether an account is legitimate.

“Sometimes, friends get into fights, and one of them reports the other as a spam account,” says Aiko. This is what the Line team refers to as false positives.

The second filter can also suspend or block accounts that meet certain conditions. For example, if an account is the subject of more than a few user reports, and if the account information is almost identical to other spam accounts, the system registers it as unusual activity.

At this point, the spam filter takes one last verification step: if the account’s data is similar to a past false positive, the software does not register it as spam.
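The decision flow described above can be sketched in a few lines of Python. This is not Line's model: the company does not disclose it, so the sketch swaps in a simple Jaccard similarity over sets of account features as a stand-in for the learned comparison. The feature names, the report minimum, and both thresholds are assumptions.

```python
def jaccard(a, b):
    """Overlap between two feature sets: 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_reported_account(features, report_count, known_spam,
                              past_false_positives, min_reports=3,
                              spam_threshold=0.7,
                              false_positive_threshold=0.9):
    # Step 1: ignore accounts that only a couple of users have reported.
    if report_count < min_reports:
        return "not_spam"
    # Step 2: the account must closely resemble a known spam account.
    if not any(jaccard(features, s) >= spam_threshold for s in known_spam):
        return "not_spam"
    # Step 3: last check, back off if it matches a past false positive.
    if any(jaccard(features, fp) >= false_positive_threshold
           for fp in past_false_positives):
        return "not_spam"
    return "spam"


# Hypothetical feature sets, for illustration only.
known_spam = [{"new_account", "shortened_url", "bulk_send"}]
past_false_positives = [
    {"new_account", "shortened_url", "bulk_send", "verified_business"},
]
suspect = {"new_account", "shortened_url", "bulk_send"}
business = {"new_account", "shortened_url", "bulk_send", "verified_business"}

print(classify_reported_account(suspect, 5, known_spam, past_false_positives))
print(classify_reported_account(business, 5, known_spam, past_false_positives))
print(classify_reported_account(suspect, 1, known_spam, past_false_positives))
```

The three calls print spam, not_spam, and not_spam. The second account resembles spam but matches an earlier false positive, and the third has too few reports to act on.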

On top of checking and suspending spam accounts, this filter keeps track of reported spam messages. After every identification of a spam account, the system updates its “black book.” This is where it logs all the unusual and common characteristics of spam accounts. These updates live in what the Line team calls a dataset.

Because spam messages and behavior evolve over time, the learning and updates are done in real time, resulting in multiple datasets. As such, the dataset from a few years ago is not the same as the one today. Line calls this “concept drift.” Until a few years ago, spam was text-based and could be blocked by detectors. Spam has since evolved to include images and videos, and there are now even auto-generated text spam messages.
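One common way to cope with this kind of drift is to learn only from a sliding window of recent reports, so that stale patterns age out on their own. The Python sketch below illustrates that general technique and is not a description of Line's system. The class name, the window size, and the feature labels are all hypothetical.

```python
from collections import Counter, deque


class DriftingSpamProfile:
    """Keeps only the most recent spam reports so old patterns age out."""

    def __init__(self, window_size=1000):
        # A deque with maxlen silently drops the oldest entry when full.
        self.recent = deque(maxlen=window_size)

    def record_spam(self, features):
        """Log the characteristics of a newly confirmed spam account."""
        self.recent.append(frozenset(features))

    def common_features(self, min_share=0.5):
        """Features seen in at least `min_share` of the recent reports."""
        if not self.recent:
            return set()
        counts = Counter(f for report in self.recent for f in report)
        total = len(self.recent)
        return {f for f, n in counts.items() if n / total >= min_share}


profile = DriftingSpamProfile(window_size=4)

# Older spam: text messages with shortened links.
for _ in range(4):
    profile.record_spam({"text", "shortened_url"})
print(sorted(profile.common_features()))  # ['shortened_url', 'text']

# Newer spam switches to images; the text-only pattern ages out.
for _ in range(4):
    profile.record_spam({"image", "shortened_url"})
print(sorted(profile.common_features()))  # ['image', 'shortened_url']
```

A small window adapts quickly but forgets rare patterns, while a large one remembers more but reacts slowly, so the size would have to be tuned against real report volumes.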

3. Monitoring filter

The last gatekeepers are humans. Line declines to disclose information about these human moderators, but they’re essential to combating spam.

“The number of false positives could be an issue, and we need to reduce such cases. It is almost impossible to reduce it by 100 percent,” points out Aiko. According to the security engineer, about 20 percent of reported cases are false positives.

“The machines can reduce 80 to 90 percent of the spam messages. The remaining 10 to 20 percent are done with human judgment,” he adds.
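A split like this is often implemented by routing each case on how confident the automated filters are. The Python sketch below assumes a hypothetical spam score between 0 and 1 and two made-up thresholds. Line has not said how it decides which cases go to moderators.

```python
def route(spam_score, suspend_at=0.9, review_at=0.5):
    """Decide who handles a reported account based on an assumed score."""
    if not 0.0 <= spam_score <= 1.0:
        raise ValueError("spam_score must be between 0 and 1")
    if spam_score >= suspend_at:
        return "auto_suspend"   # clear-cut case, handled by the machine
    if spam_score >= review_at:
        return "human_review"   # ambiguous case, sent to a moderator
    return "no_action"          # likely a false positive


for score in (0.97, 0.62, 0.10):
    print(score, route(score))
```

Raising `suspend_at` sends more cases to moderators and lowers the risk of suspending a legitimate user, at the cost of more manual work.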

Based on the characteristics of spam messages, Line has identified several spam operators that are run like actual companies. Some have been around since as early as 2014, and new ones are starting up.

“They are doing this spamming as a business,” says Aiko.

Whether spam is an effective way of advertising remains a mystery. However, the relentless onslaught of spammers seems to suggest that it is at least lucrative.

“Spam is hard to eliminate. Since it [won’t] go away, I imagine there’s some sort of profit,” shares Aiko.

This is part of the coverage of Line Developer Day 2017, a technical conference held in Shibuya, Tokyo on September 28.

This post How Line’s multi-filter system stops spam in its tracks appeared first on Tech in Asia.

from Tech in Asia https://www.techinasia.com/lines-multifilter-system-stops-spam-tracks

Sam Feaster

Monday, January 15, 2018
