Google may be best known for being the internet guy who knows every other internet guy, but there’s much more to this technology giant than just that. The company is also the leader in the browser market, has one of the best email clients, and a great online storage service.
However, there is only one other area besides search where Google truly dominates the whole field: Maps. Google Maps is simply the best mapping service available today. This unrivaled excellence hasn’t come easily for Google and, despite running a near-monopoly, the company still strives to keep improving Google Maps as much as possible and sets the standards for innovation in the field.
One thing that has always intrigued me is how Google really finds out all the addresses in its Maps app. How does it learn the street names? Sure, the images could come from a satellite, but how can it really determine the addresses unless somebody filled them in? A lot of people do contribute to Google Maps by filling in general information, but there is no way they can account for all the information available on it.
City planners around the world don’t really follow a particular addressing scheme either, and hiring people to go around and note down the addresses simply isn’t practical. Although Google does employ some people (contractors) for this job, that isn’t how most of the work is done. Addresses are ever-changing and new addresses are added around the world every day.
So what does Google do? It uses its prowess in deep learning and its immense database of imagery collected through Street View to add new addresses in Google Maps as well as update the old ones.
Using Deep Learning and Street View Imagery to Learn Street Names
Google has a huge collection of images taken for its Street View program through cars, bikes, trikes, backpack-carrying dudes and whatnot. Now, the company wants to put all this information to use. For example, it wants to find the images of street signs and businesses, extract the text, and then assign that text to the respective streets or businesses.
A piece of cake, right? All you have to do is look at an image of a street sign and assign it to the location it was taken at.
The problem? Google has over 80 billion images. That’s a lot, and analyzing all that data by hand is humanly impossible. So Google decided to use deep learning to go through the images, find the relevant ones, extract the text, and automatically assign it to the right location.
What is Deep Learning?
Deep Learning is a type of machine learning that, in some ways, tries to mimic the workings of a human brain. Deep learning algorithms allow a computer to recognize objects or patterns in data. Basically, they try to make sense of data on their own, without any human intervention. They also keep improving themselves based on previous results.
Google has been one of the leading contributors to the field of deep learning. It even released some consumer-focused tools like “Quick, Draw” and “Autodraw” to give a glimpse of the wonders that can be possible with deep learning.
Using Deep Learning to extract text
Text recognition has always been a challenging computer vision and machine learning problem. Computer algorithms are just not good enough when it comes to reading text from images, especially when the images aren’t always ideal. Some may be tilted or have a different viewpoint, some may be blurry or distorted, while some may be simply too dark. There are also a lot of other problems that arise due to ground truth errors — writing “Avenue” as “Av” on the street-name board, etc.
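To make the “Avenue” versus “Av” problem concrete, here is a minimal Python sketch of how a sign reading and a map label could be compared after normalization. It is only an illustration, not Google’s method: the abbreviation table and the function names normalize_street_name and same_street are made up for this example, and a real system would need a far larger, locale-specific list.

```python
import re

# Hypothetical abbreviation table, for illustration only. Real signs need a
# much larger, language-specific list ("st" can also mean "saint", for example).
ABBREVIATIONS = {
    "av": "avenue",
    "ave": "avenue",
    "st": "street",
    "rd": "road",
    "blvd": "boulevard",
}


def normalize_street_name(raw: str) -> str:
    """Lowercase the text, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"\w+", raw.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def same_street(sign_text: str, label: str) -> bool:
    """Return True when two spellings normalize to the same street name."""
    return normalize_street_name(sign_text) == normalize_street_name(label)


if __name__ == "__main__":
    print(normalize_street_name("Park Av."))
    print(same_street("Park Av.", "Park Avenue"))
```

With this table, “Park Av.” and “Park Avenue” normalize to the same string, so the mismatch stops counting as an error.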
The company has been working on a number of models to solve this problem. In 2014, it unveiled a new method for reading street numbers on the Street View House Numbers (SVHN) dataset.
“Today, over one-third of addresses globally have had their location improved thanks to this system”
Since then, they have been working on developing a system that could accurately read street names. Just last month, they published a paper called “Attention-based Extraction of Structured Information from Street View Imagery” on their latest model.
How does it work?
The paper details how their network model combines Convolutional Neural Nets, Recurrent Neural Nets, and a novel attention mechanism to achieve an accuracy of 84.2% on the French Street Name Signs (FSNS) dataset, a large and challenging training dataset of more than 1 million street names.
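For readers who want a feel for what an attention mechanism does, here is a minimal Python sketch of generic soft (dot-product) attention. It is not the paper’s mechanism, which the article only calls “novel”; it is the textbook idea that such mechanisms build on. The assumption is that a convolutional network has already turned image regions into feature vectors and that a recurrent decoder supplies a query vector for the character it is about to emit. The names softmax and attend are illustrative.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Generic dot-product attention over a list of feature vectors.

    query    -- decoder state for the current character (list of floats)
    features -- one vector per image region (list of lists of floats)
    Returns the attention weights and the weighted-average context vector.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Three toy "image regions" described by 2-dimensional features.
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    weights, context = attend([2.0, 0.0], regions)
    print(weights, context)
```

The region whose features line up best with the query gets the largest weight, so the decoder “looks” mostly at that part of the image when it predicts the next character.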
Their new model, publicly available now, handily beat the previous state-of-the-art model, Smith’16, which achieved an accuracy of 72.46%. It is also said to be much simpler and more general than the previous approach and can be used to extract other types of information out of Street View images as well, like business names from store fronts.
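The article does not say how accuracy is measured. Assuming a prediction counts as correct only when the whole street name matches exactly, the metric can be sketched in Python as below; sequence_accuracy is a name chosen for this example.

```python
def sequence_accuracy(predictions, ground_truth):
    """Fraction of predictions that match the reference string exactly.

    Assumes an all-or-nothing metric: one wrong character makes the
    whole street name count as a miss.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not predictions:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(1 for p, t in zip(predictions, ground_truth) if p == t)
    return correct / len(predictions)


if __name__ == "__main__":
    predicted = ["rue de la paix", "avenue roch"]
    actual = ["rue de la paix", "avenue foch"]
    print(sequence_accuracy(predicted, actual))
```

Under that assumption, the gap between the two reported figures is 84.2 - 72.46 = 11.74 percentage points.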
“Surprisingly, we find that deeper is not always better (in terms of accuracy, as well as speed). Our resulting model is simple, accurate and fast, allowing it to be used at scale on a variety of challenging real-world text extraction problems,” it reads.
Google has made some immense strides in artificial intelligence and machine learning recently, and while tools that could fill in the color in your doodles may sound gimmicky, the trickle-down effects of these advancements can be felt in almost every one of their products. Be it Google search’s amazing knowledge graph, or the incredibly contextual Google Assistant, or be it the ability to know every street name in your neighborhood — the company is so good at some things, it’s hard to imagine a life without it.