
AInsights: MultiOn and Large Action Models (LAMs), Introducing Google Gemini and Its Version of Copilot, AI-Powered Frames, Disney’s HoloTile for VR

A whole new AI world: Created with Google Gemini

AInsights: Executive-level insights on the latest in generative AI…

MultiOn AI represents a shift from generative AI that passively responds to queries to AI that actively participates in accomplishing tasks.

The idea is this: tools such as ChatGPT are built on large language models (LLMs). A new class of integrated, end-to-end AI solutions, known as large action models (LAMs), is emerging to autonomously connect information, processes, digital experiences, and outcomes.

MultiOn AI is a new class of tool that makes generative AI actionable. It leverages generative AI to autonomously execute digital processes and experiences, operating in the background of any digital platform to handle tasks that don’t require user attention. Its aim is to reduce hands-on, step-by-step work, helping users focus on activities and interactions where their time and attention are more valuable.

MultiOn is a software example of what Rabbit’s R1 aims to execute through a handheld, AI-powered device.

AInsights

Beyond the automation of repetitive tasks, LAMs such as MultiOn AI can interact with various platforms and services to execute disparate tasks across them. This opens the door to all kinds of cross-platform applications that will only mature and accelerate exponentially. (A rough code sketch of what this could look like follows the examples below.)

For example:

Ordering Food and Making Reservations: Users can instruct MultiOn AI to find restaurants and make reservations.

Organizing Meetings: MultiOn AI can send out meeting invitations automatically.

Entertainment Without Interruptions: MultiOn AI can play videos and music from any platform, skipping over the ads for an uninterrupted experience.

Online Interactions: MultiOn AI can post and interact with others online.

Web Automation and Navigation: MultiOn AI can interact with the web to perform tasks like finding information online, filling out forms, booking flights and accommodations, and populating your online calendar.
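To make the LAM concept concrete, here’s a minimal, hypothetical Python sketch of how such an agent might decompose a single natural-language goal into actions across services. To be clear, the class, service, and method names below are my own illustrative assumptions, not MultiOn’s actual SDK; a real LAM would use an LLM to generate the plan.

```python
# Hypothetical LAM-style agent; names are illustrative, not MultiOn's SDK.
from dataclasses import dataclass, field

@dataclass
class ActionStep:
    service: str  # e.g., "opentable", "calendar"
    action: str   # e.g., "search", "book", "create_event"
    params: dict

@dataclass
class LamAgent:
    """Toy large action model: turns one goal into executable steps."""
    steps: list[ActionStep] = field(default_factory=list)

    def plan(self, goal: str) -> None:
        # A real LAM would prompt an LLM to decompose the goal; here we
        # hard-code a plausible plan for a dinner-reservation request.
        self.steps = [
            ActionStep("opentable", "search", {"cuisine": "sushi", "party_size": 2}),
            ActionStep("opentable", "book", {"time": "19:30"}),
            ActionStep("calendar", "create_event", {"title": "Dinner", "time": "19:30"}),
        ]

    def execute(self) -> None:
        for step in self.steps:
            # In production, each step would call the service's API or drive
            # its web UI autonomously; here we just log the intended action.
            print(f"[{step.service}] {step.action} -> {step.params}")

agent = LamAgent()
agent.plan("Book sushi for two at 7:30pm and add it to my calendar")
agent.execute()
```

The key idea is the separation between planning (goal in, steps out) and execution (steps out, actions across services); that is what distinguishes a LAM from a chatbot that stops at generating text.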

Google Bard rebranded as Gemini to officially (finally) take on ChatGPT and other generative AI platforms

Google’s Bard was a code-red response to OpenAI’s popular ChatGPT platform. Bard is now Gemini, and it’s officially open for business. While it’s not quite up to ChatGPT or Claude levels, it will compete. It has to.

Google also introduced Gemini Advanced to compete against OpenAI’s pro-level ChatGPT service. It’s also going up against Microsoft and its Copilot initiatives.

The new app is designed to do an array of tasks, including serving as a personal tutor, helping computer programmers with coding tasks and even preparing job hunters for interviews, Google said.

“It can help you role-play in a variety of scenarios,” said Sissie Hsiao, a Google vice president in charge of the company’s Google Assistant unit, during a briefing with reporters.

Gemini is a “multimodal” system, meaning it can respond to both images and sounds. After analyzing a math problem that included graphs, shapes and other images, it could answer the question much the way a high school student would, according to the New York Times.

After the awkward stage of Google Bard being, well, Bard (let’s not forget that a “bard” is, by definition, a professional storyteller), Google recognizes it has to compete for day-to-day activities, beyond novelty.

There are now two flavors of Google AI: 1) Gemini, powered by Pro 1.0, and 2) Gemini Advanced, powered by Ultra 1.0. The latter costs $19.99 per month via a subscription to Google One.

Similar to ChatGPT with GPT-4, Gemini is multimodal, which means you can input more than text.

I hear this question a lot, usually in private. So, I’ll just put it here and you can skip over it if you don’t need it. Multimodal refers to the genAI model’s ability to understand, process, and generate content across multiple types of media or ‘modalities’, including text, code, audio, images, and video. This capability allows Gemini or other models to perform tasks that involve more than just text-based inputs and outputs, making it significantly more versatile than traditional large language models (LLMs) that primarily focus on text. For example, Gemini can analyze a picture and generate a text-based description, or it can take a text prompt and produce relevant audio or visual content.
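For those who want to see what multimodal prompting actually looks like, here’s a minimal sketch using Google’s google-generativeai Python SDK as it existed around the time of this writing. The model name and API surface are assumptions to verify against the current docs, and the image file is hypothetical.

```python
# Minimal multimodal sketch using Google's Python SDK
# (pip install google-generativeai pillow). Model name and API surface
# reflect early 2024 and may have changed; verify against current docs.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder, not a real key

# The vision-capable Gemini model accepts a mixed list of text and images.
model = genai.GenerativeModel("gemini-pro-vision")
image = Image.open("math_problem.png")  # hypothetical local file

response = model.generate_content(
    ["Explain, step by step, how to solve the problem in this image.", image]
)
print(response.text)
```

Note that the prompt is simply a list mixing text and an image; that’s the practical meaning of “multimodal” from a developer’s point of view.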

Gemini can visit links on the web, and it can also generate images using Google’s Imagen 2 model (a feature first introduced in February 2024). And like ChatGPT, Gemini keeps track of your conversation history so you can revisit previous conversations, as observed by Ars Technica.

Gemini Advanced is ideal for more ‘advanced’ capabilities such as coding, logical reasoning, and collaborating on creative projects. It also allows for longer, more detailed conversations and better understands context from previous prompts. Gemini Advanced is more powerful and suitable for businesses, developers, and researchers.

Gemini Pro is available in over 40 languages and provides text generation, translation, question answering, and code generation capabilities. Gemini Pro is designed for general users and businesses.

AInsights

Similar to what we’re seeing with Microsoft Copilot, Google Gemini Advanced will be integrated into Google Workspace and Cloud services through the Google One AI Premium plan. This will boost real-time productivity and up-level output for those who learn how to get the most out of each application via multimodal prompting.

AI-powered AR glasses and other devices are on the horizon

2024 will be the year of AI-powered consumer devices. Humane debuted its AI Pin, which will start shipping in March. The Rabbit R1 is also set to start shipping in March and is already back-ordered several months.

Singapore-based Brilliant Labs just entered the fray with its new Frame AI-powered AR glasses (Designed with ❤️ in Stockholm and San Francisco) powered by a multimodal AI assistant named Noa. The design pays homage to Steve Jobs, John Lennon, and Gandhi.

Not to be confused with Apple’s Vision Pro or Meta’s AR devices, Frame is meant to be worn frequently in the way that you might use Humane’s AI Pin. Both make AI wearable.

Priced at $349, Frame puts AI in front of your eyes using an open-source AR lens. It uses voice commands for prompts and is also capable of visual processing. For example, you can hold a plate of fruit in front of you and ask it about sugar or carb levels, or for more about the fruit itself.

As your virtual AI-powered agent, Noa performs real-world visual processing, image generation, and real-time speech recognition and translation. This multimodality unlocks the world around wearers in new, fun, and productive ways.

Frame also features integrations with AI answer engine Perplexity, Stability AI’s text-to-image model Stable Diffusion, OpenAI’s GPT-4 text-generation model, and OpenAI’s speech recognition system, Whisper.
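To illustrate how such a pipeline could hang together, here’s a hedged Python sketch wiring Whisper speech-to-text into a GPT-4 chat completion using OpenAI’s public SDK (v1.x). This is my own assumption of the flow, not Brilliant Labs’ actual Noa implementation, and the audio file is hypothetical.

```python
# Sketch of a Frame/Noa-style voice pipeline using OpenAI's Python SDK
# (pip install openai, v1.x). Illustrative flow only, not Brilliant Labs'
# actual implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1) Transcribe the wearer's spoken question with Whisper.
with open("question.wav", "rb") as audio:  # hypothetical recording
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio
    )

# 2) Answer with a GPT-4 chat completion, keeping the reply short
#    enough to render on a heads-up display.
answer = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Answer in one or two short sentences."},
        {"role": "user", "content": transcript.text},
    ],
)
print(answer.choices[0].message.content)
```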

“The future of human/AI interaction will come to life in innovative wearables and new devices, and I’m so excited to be bringing Perplexity’s real-time answer engine to Brilliant Labs’ Frame,” Aravind Srinivas, CEO and founder of Perplexity, said in a statement.

Imagine looking at a glass of wine and asking Noa the number of calories in the glass (I wouldn’t want to know!) or the story behind the wine. Or let’s say you see a jacket that someone else is wearing and you’d like to know more about it. You could also prompt Frame to help you find the best pricing or summarize reviews as you shop in-store. The results are presented on the lenses.

AInsights

Frame looks like Steve Jobs’ glasses and perhaps with AI, you can be as smart as him!

The company secured funding from John Hanke, CEO of Niantic, the AR platform behind Pokémon GO. This tells me that it’s at least going to be around for a couple of iterations, which is good news. At $349, and even though I may look like Steve Jobs’ twin, I’ll probably give Frames a shot.

I still haven’t purchased Rabbit’s R1, simply because shipping is too far out for me to have any meaningful reaction that would help you understand its potential in your life. And at $699 (starting) plus a monthly service fee, I just can’t justify the investment in Humane’s AI Pin, though I’d love to experiment with it. To me, Humane is pursuing a post-screen, post-smartphone world, and I find that fascinating!

Frame’s appeal to me is its open-source approach to AI wearables. This could lead to some very interesting innovations post-purchase. I see a competitor in Meta’s Ray-Ban smart glasses, too, and I wouldn’t be surprised if that pair gained comparable AI capabilities in the next year.

But to have the creator of Pokémon GO on your side is notable and could help Brilliant Labs conjure up some magic in design and use cases.

Disney introduces HoloTile concept to help you move through virtual reality safely

No one wants to end up like this…

https://twitter.com/briansolis/status/1756793933773070716

Disney Research Imagineered one potential (and incredibly innovative) solution.

Designed by Imagineer Lanny Smoot, the HoloTile is the world’s first multi-person, omnidirectional, modular treadmill floor for augmented and virtual reality applications.

The Disney HoloTile isn’t premised on a traditional treadmill; instead, it’s designed for today’s and tomorrow’s AR, VR, and spatial computing applications. I can only imagine the applications we’ll see on Apple’s Vision Pro and others in the near future.

AInsights

When AR and VR started to spark conversations about a metaverse, new omnidirectional treadmills emerged that were promising, but in more traditional ways.

It reminded me of the original smartphones: they were based on phones. When the iPhone was in development, the idea of a phone was completely reimagined as a device “combining three products—a revolutionary mobile phone, a widescreen iPod with touch controls, and a breakthrough Internet communications device with desktop-class email, web browsing, searching and maps—into one small and lightweight handheld device.” In fact, after launch, the number of phone calls made on iPhones remained relatively flat while data usage only continued to spike year after year.

Please subscribe to AInsights.

Please subscribe to my master newsletter, a Quantum of Solis.
