Thursday, December 1, 2016

Data scientists caught Singapore’s ‘rogue train’. Here’s what else they can do.

The team comprising the Data Science Division. Photo credit: GovTech.

In the early hours of Saturday, November 5, Feng-Yuan Liu received a call from Singapore’s Ministry of Communications. A particularly nasty and persistent problem in the city’s Mass Rapid Transit (MRT) system had resurfaced. Stumped authorities were out of ideas.

Feng-Yuan, director of the Data Science Division of the Government Technology Agency (or GovTech), fired up his team’s WhatsApp group chat. Someone was going to have to come to work that Saturday.

“I was actually in Japan at the time,” he tells Tech in Asia. “So I pinged the team and asked if there were any volunteers who could help. These three guys stepped up.” The three guys are data scientists Daniel Sim, Shangqian Lee, and Clarence Ng.

The trio have documented how they managed to solve the mystery in a blog post that has gone viral, shared on Facebook even by Singapore’s Prime Minister.

In the post, they explain the movie plot-worthy “trail of destruction” left by the rogue train designated as PV046 and how they helped stop it. It’s basically the computer geek version of 2010’s Denzel Washington-starring Unstoppable, thankfully with much lower stakes.

The post is a must-read, not only because of the complex problem-solving involved, but also as a look into how much data is generated around us daily and how it can be put to use.

Runaway train

The crisis the team was called to tackle has been well documented in Singapore, but here’s the gist if you haven’t heard: the MRT, Singapore’s subway system, is largely automated, with trains riding the tracks by bouncing signals off each other and the stations.

In late August, several trains on the system’s Circle Line started braking suddenly for no obvious reason. The culprit was a faulty train sending errant signals, which confused its fellow vehicles and disrupted the otherwise clockwork system.

Singapore’s automated MRT trains. Photo credit: tang90246 / 123RF.

The problem showed up intermittently over the course of several months, resulting in delays and angry commuters on social media. The Land Transport Authority (LTA), transportation company SMRT, and the Defence Science and Technology Agency (DSTA) all tried to look into the problem before calling in the Data Science Division.

Working over the weekend, the team applied visualization methods to the datasets SMRT gave them, creating charts that might reveal a helpful pattern. They expanded the visualization beyond two dimensions with a technique called a Marey chart, a way of visualizing public transport data that depicts vehicle movement in both time and geography. Adding some algorithm magic of their own, they set off down the path to a breakthrough.

“Good visualization is important – the one we were initially shown was a two-dimensional plot, showing the geography of the line,” Daniel, a computer science graduate from Cambridge, says. “But it did not show how the incidents occurred with respect to time. So these later visualizations helped a lot.”
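To make the idea of a Marey chart concrete, here is a minimal sketch in Python. It is not the team’s code: the station names, train IDs, and timings are invented for illustration, and the helper names `marey_lines` and `render` are hypothetical. It only shows how position records become lines plotted against time on one axis and position along the route on the other.

```python
from collections import defaultdict

# Hypothetical station order along one line; the list index is the
# position on the chart's station axis.
STATIONS = ["A", "B", "C", "D"]


def marey_lines(records):
    """Group (train_id, station, minute) records into per-train polylines.

    Each polyline is a time-sorted list of (minute, station_index) points,
    which is what a Marey chart draws: time on one axis and position along
    the line on the other.
    """
    index = {name: i for i, name in enumerate(STATIONS)}
    lines = defaultdict(list)
    for train_id, station, minute in records:
        lines[train_id].append((minute, index[station]))
    return {train: sorted(points) for train, points in lines.items()}


def render(lines, width):
    """Draw a crude text chart: rows are stations, columns are minutes."""
    grid = [["."] * width for _ in STATIONS]
    for train, points in lines.items():
        for minute, row in points:
            grid[row][minute] = train[-1]  # mark with the last character of the ID
    return "\n".join(
        f"{name} " + "".join(cells) for name, cells in zip(STATIONS, grid)
    )


# Invented sample data: T1 runs from A to D while T2 runs from D to A.
records = [
    ("T1", "A", 0), ("T1", "B", 2), ("T1", "C", 4), ("T1", "D", 6),
    ("T2", "D", 1), ("T2", "C", 3), ("T2", "B", 5), ("T2", "A", 7),
]
lines = marey_lines(records)
print(render(lines, width=8))
```

In a chart like this, each train traces a diagonal, and trains heading in opposite directions cross. Overlaying incident times and locations on the same axes is what lets a pattern in both time and place show up at once.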

Looking at the data in more imaginative ways, along with reviewing video logs offline well into the night (a duty that Clarence, a systems engineer and economics graduate, took on with aplomb), allowed the team to put everything together.

Their expertise at teasing information out of data like this played a big part in grasping the big picture. With modesty, they say there was a healthy dose of luck involved in zooming in on the segment of the chart that gave them what they were looking for.

The team found this pattern when they zoomed in on the visualization. It helped them put together the most important pieces of the puzzle and determine that a particular train or trains could be causing the problem. Image credit: GovTech.

But it’s not just luck, of course – instinct is key. “What makes one choose this plot over another is not something you can learn in a coding class, I think,” says Shangqian, an Oxford-educated engineer. “You gain [that] through regular practice and solving problems from day to day.”

“What makes a good data scientist is a healthy dose of common sense sometimes,” he adds with a laugh.

Once train PV046 was confirmed as the culprit, the LTA and DSTA looked into the mischievous vehicle to figure out what was wrong with it. The jury is still out on that, but in the meantime the train is cooling its wheels, safely out of circulation.

Making sense of Singapore through text

As a way to highlight the work that GovTech’s data science division does, the “rogue train” case is a particularly spectacular one, complete with a mystery, clues, heroes, and its own robotic villain.

But GovTech’s data scientists work on many other projects meant to improve the lives of Singaporeans through the troves of data available in the country. Much of this information, in fact, is published openly and available to third parties for development and analysis through GovTech’s own data portal.

For example, there’s a text analytics team working on extracting data from written material to produce actionable insights, using a machine learning technique called topic modeling. To demonstrate, the team worked with material from Singapore’s Parliament, comparing speeches from different times in history to find out what topics were discussed. In easy-to-parse word clouds and charts, the project shows how a growing country’s concerns and priorities evolve.

This is important because it helps government agencies quantify information that’s hidden in written form, from parliamentary speeches to user feedback posts. For example, the country’s Housing & Development Board, which is in charge of assigning state housing to Singaporeans, used feedback-based data to figure out that a significant percentage of people asked for flexible hours to collect their new house keys, instead of the set times it was using. This was an insight no one would have paid much attention to without these stats.

“In the past, they could look at some of the feedback data, but there was no way to systematically measure the percentage of topics,” Feng-Yuan explains. “With data, it was easier to make a case for changing their policies.”
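As a rough illustration of what “measuring the percentage of topics” can look like, here is a minimal Python sketch. It deliberately swaps in simple keyword matching for the topic modeling the team describes, and the topics, keywords, and feedback lines are all invented for the example, not taken from any agency’s data. The function name `topic_shares` is hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword sets standing in for topics that a real topic
# model would learn from the text itself.
TOPIC_KEYWORDS = {
    "collection hours": {"hours", "evening", "weekend", "timing"},
    "documents": {"form", "forms", "documents", "paperwork"},
}


def topic_shares(feedback):
    """Return the percentage of feedback items assigned to each topic.

    Each item gets the first topic whose keywords it mentions, or "other"
    if none match. This is a keyword stand-in, not topic modeling.
    """
    if not feedback:
        return {}
    counts = Counter()
    for text in feedback:
        words = set(re.findall(r"[a-z]+", text.lower()))
        matched = [t for t, kw in TOPIC_KEYWORDS.items() if words & kw]
        counts[matched[0] if matched else "other"] += 1
    return {topic: 100 * n / len(feedback) for topic, n in counts.items()}


# Invented sample feedback, for illustration only.
feedback = [
    "Can I collect my keys on a weekend?",
    "The evening hours would suit me better.",
    "Too many forms to fill in.",
    "Staff were friendly.",
]
shares = topic_shares(feedback)
for topic, pct in sorted(shares.items()):
    print(f"{topic}: {pct:.0f}%")
```

Even a crude tally like this turns a pile of free-text comments into a number that can be compared across months or offices, which is the kind of evidence Feng-Yuan describes as making a case for policy change.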

Feeling the pulse

The Pulse of the Economy project tracks data from various sources to provide insights about a region’s economic activity. It was announced during GovTech’s official launch in October.

For now, the project is tracking transportation and electricity consumption data, but it plans to keep adding datasets as they become available. For example, information from jobs portals could bring employment data into the mix. The team prefers to use this type of “non-traditional” data, rather than survey-based information that’s the government norm at the moment.

Pulse of the Economy currently tracks electricity consumption and transportation usage to measure economic activity in an area. Image credit: GovTech.

Port data that tracks ships coming and going in Singapore could also provide useful insights on commerce – or even the state of global trade, for example by tracking the number of idle ships parked off Singapore’s east coast when there are no jobs for them.

“What we try to do is find simple solutions to pressing problems,” Feng-Yuan says. Thankfully, the team can work on these projects long-term, without having to come up with solutions over a weekend.

It doesn’t capture the imagination quite like the hunt for a rogue train, but at least it doesn’t involve frantic phone calls and WhatsApp messages on a Saturday morning.

This post https://www.techinasia.com/govtech-data-scientists-projects appeared first on Tech in Asia.


