Can Google Merge AR and AI?
In all the excitement around generative and conversational AI, Google has gotten lots of grief for missing the party. The common refrain is that these technologies are Google killers. However, there’s just one issue with that take: Google is better positioned than anyone for AI.
Though it rushed and fumbled its Bard launch, Google is sitting on a knowledge graph that it’s been assembling for 20+ years. That data repository is one of the best AI training sets you could ask for. The AI engine that runs on top of it can be built or bought, and Google can do either.
And it’s already gotten started, given its machine learning research and projects like Tensor and Transformer. More recently, it launched AI-powered Multisearch. This lets users search with a combination of images (via Google Lens) and text (“show me the same shirt in green”).
Multisearch is powered by a flavour of AI called Multitask Unified Model (MUM), which processes data across varied formats to infer connections, meaning, and relevance. In Google’s case, these formats include text, photos, and videos, which it, again, has spent the past 20+ years indexing.
Search What You See
Earlier this month, Google pushed the ball forward by announcing that Multisearch is now available on any mobile device where Google Lens is already available. For those unfamiliar, Lens is Google’s AR feature that provides informational overlays on items you point your phone at.
Lens relies on computer vision, which is itself AI-driven. And with Multisearch, visuals join text to offer optionality for users more inclined towards one or the other. For example, sometimes it’s easier to point your phone at items you encounter IRL than to describe them with text.
But starting a search in that visual modality only gets you so far. Being able to then refine or filter those results with text (per the above green shirt example) is where the magic happens. And the use cases will begin to expand beyond fashion to reach across all of web search.
For example, Google noted that it’s working on a flavour of Multisearch that will launch image searches from wherever you are on your phone. Known as “search your screen,” it extends Google Lens from your outward-facing smartphone camera to anything that shows up on your screen.
Multisearch also comes in local flavours. With Multisearch Near Me, Google applies all of the above to local search. So when searching for that same green shirt, users can query an additional layer of attributes related to proximity. In other words, where can I buy it locally?
Beyond fashion, Google has a food-lust use case, tied to monetisable searches for local restaurants. For example, see a dish on Instagram that you like, then use that image to identify the dish with Google Lens… then use Multisearch Near Me to find similar fare locally.
Once again, Google is uniquely positioned to pull this off. Though upstarts like OpenAI have impressive AI engines, do they have all that local business and product data? Google is one of the few entities with such data, thanks to Google Business Profiles and Google Shopping.
Google has also applied computer vision and machine learning to localise devices for AR wayfinding in its Live View product. This and all of the above flow from Google’s Space Race ambitions to build a knowledge graph for the physical world… just like it did for the web.