GPT-4V has new rivals: LLaVA 1.5 & Fuyu-8B
PLUS: Text to Gif, AI app builder by Google, Robots that can self-reward to do complex tasks
Hi folks!👋🏻 This is The Prompt!
Here’s what we have today:
2 new multi-modal models
no-code visual AI app builder by Google
NVIDIA’s robots can self-reward and do tasks at human speed
New text-to-gif model
Let's get it
OpenAI's GPT-4V has new competitors
ChatGPT Plus users already have access to GPT-4V in chat, and first impressions have been interesting and divided: people are impressed by certain abilities, but there are also some concerning problems.
This week, we also got two new multimodal models. Both are open-source, and neither is licensed for commercial use (yet?).
Here’s a quick breakdown of their specs/uses👇🏻
LLaVA 1.5 was released by a team of researchers and, like GPT-4V, can answer questions about images.
What’s interesting about this model is that it’s easy to get running on consumer-level hardware (a GPU with less than 8 GB of VRAM).
can easily locate an object in a photo;
can explain memes;
can’t reliably recognize text.
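For a rough sense of why a model like this can fit on a consumer GPU, here is a back-of-the-envelope sketch in Python. It assumes the 7B-parameter variant of LLaVA 1.5 and 4-bit quantized weights. Neither detail comes from this newsletter, and the function name is made up for illustration. It counts weights only, so real usage will be higher once the vision encoder, activations and context cache are included.

```python
def weights_vram_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: a 7B-parameter language model backbone.
N_PARAMS = 7e9

# Compare common precisions: half precision, 8-bit and 4-bit quantization.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weights_vram_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions, only the 4-bit weights leave comfortable headroom on an 8 GB card, which is one plausible reading of the "less than 8 GB" figure above.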
Fuyu-8B by Adept
Fuyu-8B is an open-source multimodal model by Adept. It understands “knowledge worker” data such as charts, graphs and screens, enabling it to manipulate — and reason over — this data.
can locate very specific elements on a screen;
can extract details from software’s UI;
can answer questions about charts/diagrams;
has no moderation mechanisms or prompt injection guardrails.
🚨 What else is going on
Google is working on a secret project named Stubb, a no-code visual builder for AI prototypes that will potentially include multimodal support with Gemini.
DeepMind released a paper proposing a framework for evaluating the societal and ethical risks of AI systems.
NVIDIA has unveiled Eureka, an AI agent built on GPT-4 that autonomously generates rewards, which robots can then use to acquire complex skills via reinforcement learning, like the “pen spinning” skill below, at human speed! 🤯
[interesting] AI models explained with simple animations
[tutorial] How to build your own AI-generated image with ControlNet and Stable Diffusion
[API] Turn text to gif with this latest model that works alongside Stable Diffusion
[opportunity] The Rundown is hiring an “AI tool tester”
[online event] Beyond the hype: Preparing for AI in 2024 (speakers from OpenAI, Meta)
go-pro video of a polar bear diving in the ocean, 8k, HD, dslr, nature footage
✍🏼 Prompt of the Day
a flat panda head origami logo, white background
What'd you think of today's edition?