Daily Guardian UAE
Technology

Gemini AI is making robots in the office far more useful

By dailyguardian.ae | July 11, 2024 | 3 Mins Read

Lost in an unfamiliar office building, big box store, or warehouse? Just ask the nearest robot for directions.

A team of Google researchers combined the powers of natural language processing and computer vision to develop a novel means of robotic navigation as part of a new study published Wednesday.

Essentially, the team set out to teach a robot (in this case an Everyday Robot) how to navigate through an indoor space using natural language prompts and visual inputs. Robotic navigation used to require researchers not only to map out the environment ahead of time but also to provide specific physical coordinates within the space to guide the machine. Recent advances in what’s known as Vision-Language Navigation have enabled users to simply give robots natural language commands, like “go to the workbench.” Google’s researchers are taking that concept a step further by incorporating multimodal capabilities, so that the robot can accept natural language and image instructions at the same time.

For example, a user in a warehouse would be able to show the robot an item and ask, “what shelf does this go on?” Leveraging the power of Gemini 1.5 Pro, the AI interprets both the spoken question and the visual information to formulate not just a response but also a navigation path to lead the user to the correct spot on the warehouse floor. The robots were also tested with commands like, “Take me to the conference room with the double doors,” “Where can I borrow some hand sanitizer,” and “I want to store something out of sight from public eyes. Where should I go?”

In an Instagram Reel demonstrating the system, a researcher activates it with an “OK robot” before asking to be led somewhere “he can draw.” The robot responds with “give me a minute. Thinking with Gemini …” before setting off briskly through the 9,000-square-foot DeepMind office in search of a large wall-mounted whiteboard.

To be fair, these trailblazing robots were already familiar with the office space’s layout. The team utilized a technique known as “Multimodal Instruction Navigation with demonstration Tours (MINT).” This involved the team first manually guiding the robot around the office, pointing out specific areas and features using natural language, though the same effect can be achieved by simply recording a video of the space using a smartphone. From there the AI generates a topological graph where it works to match what its cameras are seeing with the “goal frame” from the demonstration video.
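The graph-and-matching idea described above can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not the researchers' implementation: it assumes each tour frame comes with an estimated 2D position and a numeric feature vector, and the function names, the distance threshold, and the nearest-vector matching are all hypothetical stand-ins chosen for illustration.

```python
import math

def build_topological_graph(positions, max_edge_dist=2.0):
    """Link tour frames whose (assumed) 2D positions lie within max_edge_dist.

    Returns a dict mapping frame index -> {neighbour index: distance}.
    """
    graph = {i: {} for i in range(len(positions))}
    for i, (xi, yi) in enumerate(positions):
        for j in range(i + 1, len(positions)):
            xj, yj = positions[j]
            d = math.hypot(xi - xj, yi - yj)
            if d <= max_edge_dist:
                graph[i][j] = d
                graph[j][i] = d
    return graph

def localize(observation, frame_features):
    """Return the index of the tour frame whose feature vector is closest
    to the current camera observation (hypothetical stand-in for real
    image matching)."""
    return min(
        range(len(frame_features)),
        key=lambda k: math.dist(observation, frame_features[k]),
    )

# Hypothetical tour: four frames along a corridor, the last one far away.
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
graph = build_topological_graph(positions, max_edge_dist=1.5)

# Toy feature vectors, one per frame; the robot's current view is a fifth vector.
features = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
current_frame = localize((0.9, 0.1), features)
```

In this toy setup the frame at position 5.0 ends up with no neighbours, which shows why a demonstration tour needs to cover the space densely enough for the graph to stay connected.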

Then, the team employs a hierarchical Vision-Language-Action (VLA) navigation policy, “combining the environment understanding and common sense reasoning,” to instruct the AI on how to translate user requests into navigational action.
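One way to picture a two-level policy like this is a high-level step that picks a goal frame from the tour and a low-level step that plans a route to it over the graph. The sketch below is an assumption-laden illustration, not the team's code: `pick_goal_frame` uses plain word overlap where a real system would query a vision-language model such as Gemini, and the frame notes, graph, and all function names are hypothetical.

```python
import heapq

def pick_goal_frame(instruction, frame_notes):
    """High level (stand-in): choose the tour frame whose narration shares
    the most words with the instruction. A real system would ask a
    vision-language model instead of counting words."""
    words = set(instruction.lower().split())
    return max(
        range(len(frame_notes)),
        key=lambda i: len(words & set(frame_notes[i].lower().split())),
    )

def shortest_path(graph, start, goal):
    """Low level: Dijkstra's algorithm over the topological graph.
    Returns the list of frame indices from start to goal, or None."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

def navigate(instruction, frame_notes, graph, current_frame):
    """Combine both levels: pick a goal frame, then plan waypoints to it."""
    goal = pick_goal_frame(instruction, frame_notes)
    return shortest_path(graph, current_frame, goal)

# Hypothetical narrated tour and corridor-shaped graph (edge weights in metres).
frame_notes = [
    "lobby entrance",
    "kitchen with hand sanitizer",
    "conference room with double doors",
    "whiteboard wall",
]
graph = {
    0: {1: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {1: 1.0, 3: 1.0},
    3: {2: 1.0},
}
waypoints = navigate("take me to the whiteboard", frame_notes, graph, 0)
```

Keeping the two levels separate means the slow reasoning step runs once per request, while the route planning over the graph stays cheap and can be repeated as the robot moves.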

The approach proved effective, with the robots achieving “86 percent and 90 percent end-to-end success rates on previously infeasible navigation tasks involving complex reasoning and multimodal user instructions in a large real world environment,” the researchers wrote.

However, they recognize that there is still room for improvement, pointing out that the robot cannot (yet) autonomously perform its own demonstration tour and noting that the AI’s ungainly inference time (how long it takes to formulate a response) of 10 to 30 seconds turns interacting with the system into a study in patience.

© 2026 Daily Guardian UAE. All Rights Reserved.