Package Exports
- @mobileai/react-native
- @mobileai/react-native/generate-map
- @mobileai/react-native/package.json
Readme
Agentic AI for React Native
Add an autonomous AI agent to any React Native app — no rewrite needed. Wrap your app with `<AIAgent>` and get: natural language UI control, real-time voice conversations, and a built-in knowledge base. Fully customizable, production-grade security, performant, and lightweight. Plus: an MCP bridge that lets any AI connect to and test your app.
Two names, one package — pick whichever you prefer:
```bash
npm install @mobileai/react-native
# — or —
npm install react-native-agentic-ai
```

🤖 AI Agent — Autonomous UI Control
🧪 AI-Powered Testing — Test Your App in English, Not Code
Google Antigravity running 5 checks on the emulator and finding 5 real bugs — zero test code, zero selectors, just English.
Two names, one package — install either: @mobileai/react-native or react-native-agentic-ai
⭐ If this helped you, star this repo — it helps others find it!
🧠 How It Works — Structure-First Agentic AI
What if your AI could understand your app the way a real user does — not by looking at pixels, but by reading the actual UI structure?
That's what this SDK does. It reads your app's live UI natively — every button, label, input, and screen — in real time. The AI understands your app's structure, not a screenshot of it.
No OCR. No image pipelines. No selectors. No annotations. No view wrappers.
The result: an AI that truly understands your app — and can act on it autonomously.
| | This SDK | Screenshot-based AI | Build It Yourself |
|---|---|---|---|
| Setup | `<AIAgent>` — one wrapper | Vision model + custom pipeline | Months of custom code |
| How it reads UI | Native structure — real time | Screenshot → OCR | Custom integration |
| AI agent loop | ✅ Built-in multi-step | ❌ Build from scratch | ❌ Build from scratch |
| Voice mode | ✅ Real-time bidirectional | ❌ | ❌ |
| Custom business logic | ✅ `useAction` hook | Custom code | Custom code |
| MCP bridge (any AI connects) | ✅ One command | ❌ | ❌ |
| Knowledge base | ✅ Built-in retrieval | ❌ | ❌ |
✨ What's Inside
Ship to Production
🤖 Autonomous AI Agent — Natural Language UI Automation
Your users describe what they want in natural language. The SDK reads the live screen, plans a sequence of actions, and executes them end-to-end — tapping buttons, filling forms, navigating screens — all autonomously. Powered by Gemini. OpenAI is also supported as a text mode alternative.
- Zero-config — wrap your app with `<AIAgent>`, done. No annotations, no selectors
- Multi-step reasoning — navigates across screens to complete complex tasks
- Custom actions — expose any business logic (checkout, API calls, mutations) via `useAction`
- Knowledge base — AI queries your FAQs, policies, product data on demand
- Human-in-the-loop — native `Alert.alert` confirmation before critical actions
🎤 Real-time Voice AI Agent — Bidirectional Audio with Gemini Live API
Full bidirectional voice AI powered by the Gemini Live API (Gemini only). Users speak naturally; the agent responds with voice AND controls your app simultaneously.
- Sub-second latency — real-time audio via WebSockets, not turn-based
- Full UI control — same tap, type, navigate, custom actions as text mode — all by voice
- Screen-aware — auto-detects screen changes and updates its context instantly
💡 Speech-to-text in text mode: Install `expo-speech-recognition` and a mic button appears in the chat bar — letting users dictate messages instead of typing. This is separate from voice mode.
Supercharge Your Dev Workflow
🔌 MCP Bridge — Connect Any AI to Your App
Your app becomes MCP-compatible with one prop. Any AI that speaks the Model Context Protocol — editors, autonomous agents, CI/CD pipelines, custom scripts — can remotely read and control your app.
The MCP bridge uses the same AgentRuntime that powers the in-app AI agent. If the agent can do it via chat, an external AI can do it via MCP.
MCP-only mode — just want testing? No chat popup needed:
```jsx
<AIAgent
  showChatBar={false}
  mcpServerUrl="ws://localhost:3101"
  apiKey="YOUR_KEY"
  navRef={navRef}
>
  <App />
</AIAgent>
```

🧪 AI-Powered Testing via MCP
The most powerful use case: test your app without writing test code. Connect your AI (Antigravity, Claude Desktop, or any MCP client) to the emulator and describe what to check — in English. No selectors to maintain, no flaky tests, self-healing by design.
Skip the test framework. Just ask:
Ad-hoc — ask your AI anything about the running app:
"Is the Laptop Stand price consistent between the home screen and the product detail page?"
YAML Test Plans — commit reusable checks to your repo:
```yaml
# tests/smoke.yaml
checks:
  - id: price-sync
    check: "Read the Laptop Stand price on home, tap it, compare with detail page"
  - id: profile-email
    check: "Go to Profile tab. Is the email displayed under the user's name?"
```

Then tell your AI: "Read `tests/smoke.yaml` and run each check on the emulator"
Real Results — 5 bugs found autonomously:
| # | What was checked | Bug found | AI steps |
|---|---|---|---|
| 1 | Price consistency (list → detail) | Laptop Stand: $45.99 vs $49.99 | 2 |
| 2 | Profile completeness | Email missing — only name shown | 2 |
| 3 | Settings navigation | Help Center missing from Support section | 2 |
| 4 | Description vs specifications | "breathable mesh" vs "Leather Upper" | 3 |
| 5 | Cross-screen price sync | Yoga Mat: $39.99 vs $34.99 | 4 |
📦 Installation
Two names, one package — pick whichever you prefer:
```bash
npm install @mobileai/react-native
# — or —
npm install react-native-agentic-ai
```

No native modules required by default. Works with Expo managed workflow out of the box — no eject needed.
Optional Dependencies
📸 Screenshots — for image/video content understanding
```bash
npx expo install react-native-view-shot
```

🎙️ Speech-to-Text in Text Mode — dictate messages instead of typing
```bash
npx expo install expo-speech-recognition
```

Automatically detected. No extra config needed — a mic icon appears in the text chat bar, letting users speak their message instead of typing. This is separate from voice mode.
🎤 Voice Mode — real-time bidirectional voice agent
```bash
npm install react-native-audio-api
```

Expo Managed — add to `app.json`:

```json
{
  "expo": {
    "android": { "permissions": ["RECORD_AUDIO", "MODIFY_AUDIO_SETTINGS"] },
    "ios": { "infoPlist": { "NSMicrophoneUsageDescription": "Required for voice chat with AI assistant" } }
  }
}
```

Then rebuild: `npx expo prebuild && npx expo run:android` (or `run:ios`)
Expo Bare / React Native CLI — add RECORD_AUDIO + MODIFY_AUDIO_SETTINGS to AndroidManifest.xml and NSMicrophoneUsageDescription to Info.plist, then rebuild.
Hardware echo cancellation (AEC) is automatically enabled — no extra setup.
🚀 Quick Start
1. Enable Screen Mapping (optional, recommended)
Add one line to your metro.config.js — the AI gets a map of every screen in your app, auto-generated on each dev start:
```js
// metro.config.js
require('@mobileai/react-native/generate-map').autoGenerate(__dirname);
```

Or generate it manually anytime:

```bash
npx @mobileai/react-native generate-map
```

Without this, the AI can only see the currently mounted screen — it has no idea what other screens exist or how to reach them. Example: "Write a review for the Laptop Stand" — the AI sees the Home screen but doesn't know a `WriteReview` screen exists 3 levels deep. With a map, it sees every screen in your app and knows exactly how to get there: `Home → Products → Detail → Reviews → WriteReview`.
2. Wrap Your App
React Navigation
```tsx
import { AIAgent } from '@mobileai/react-native'; // or 'react-native-agentic-ai'
import { NavigationContainer, useNavigationContainerRef } from '@react-navigation/native';
import screenMap from './ai-screen-map.json'; // auto-generated by step 1

export default function App() {
  const navRef = useNavigationContainerRef();
  return (
    <AIAgent
      // ⚠️ Prototyping ONLY — don't ship API keys in production
      apiKey="YOUR_API_KEY"
      // ✅ Production: route through your secure backend proxy
      // proxyUrl="https://api.yourdomain.com/ai-proxy"
      // proxyHeaders={{ Authorization: `Bearer ${userToken}` }}
      navRef={navRef}
      screenMap={screenMap} // optional but recommended
    >
      <NavigationContainer ref={navRef}>
        {/* Your existing screens — zero changes needed */}
      </NavigationContainer>
    </AIAgent>
  );
}
```

Expo Router
In your root layout (app/_layout.tsx):
```tsx
import { AIAgent } from '@mobileai/react-native'; // or 'react-native-agentic-ai'
import { Slot, useNavigationContainerRef } from 'expo-router';
import screenMap from './ai-screen-map.json'; // auto-generated by step 1

export default function RootLayout() {
  const navRef = useNavigationContainerRef();
  return (
    <AIAgent
      apiKey={process.env.AI_API_KEY!}
      navRef={navRef}
      screenMap={screenMap}
    >
      <Slot />
    </AIAgent>
  );
}
```

Choose Your Provider
The examples above use Gemini (default). To use OpenAI for text mode, add the provider prop. Voice mode is not supported with OpenAI.
```jsx
<AIAgent
  provider="openai"
  apiKey="YOUR_OPENAI_API_KEY"
  // model="gpt-4.1-mini" ← default, or use any OpenAI model
  navRef={navRef}
>
  {/* Same app, different brain */}
</AIAgent>
```

A floating chat bar appears automatically. Ask the AI to navigate, tap buttons, fill forms, answer questions.
Knowledge-Only Mode — AI Assistant Without UI Automation
Set enableUIControl={false} for a lightweight FAQ / support assistant. Single LLM call, ~70% fewer tokens:
```jsx
<AIAgent enableUIControl={false} knowledgeBase={KNOWLEDGE} />
```

| | Full Agent (default) | Knowledge-Only |
|---|---|---|
| UI analysis | ✅ Full structure read | ❌ Skipped |
| Tokens per request | ~500-2000 | ~200 |
| Agent loop | Up to 25 steps | Single call |
| Tools available | 7 | 2 (`done`, `query_knowledge`) |
🗺️ Screen Mapping — Navigation Intelligence
By default, the AI navigates by reading what's on screen and tapping visible elements. Screen mapping gives the AI a complete map of every screen and how they connect — via static analysis of your source code (AST). No API key needed, runs in ~2 seconds.
Setup (one line)
Add to your metro.config.js — the screen map auto-generates every time Metro starts:
```js
// metro.config.js
require('@mobileai/react-native/generate-map').autoGenerate(__dirname);
// ... rest of your Metro config
```

Then pass the generated map to `<AIAgent>`:

```jsx
import screenMap from './ai-screen-map.json';

<AIAgent screenMap={screenMap} navRef={navRef}>
  <App />
</AIAgent>
```

That's it. Works with both Expo Router and React Navigation — auto-detected.
What It Gives the AI
| Without Screen Map | With Screen Map |
|---|---|
| AI sees only the current screen | AI knows every screen in your app |
| Must explore to find features | Plans the full navigation path upfront |
| Deep screens may be unreachable | Knows each screen's `navigatesTo` links |
| No knowledge of dynamic routes | Understands `item/[id]`, `category/[id]` patterns |
Disable Without Removing
```jsx
<AIAgent screenMap={screenMap} useScreenMap={false} />
```

Advanced: Watch mode, CLI options, and npm scripts
Manual generation:

```bash
npx @mobileai/react-native generate-map
```

Watch mode — auto-regenerates on file changes:

```bash
npx @mobileai/react-native generate-map --watch
```

npm scripts — auto-run before start/build:

```json
{
  "scripts": {
    "generate-map": "npx @mobileai/react-native generate-map",
    "prestart": "npm run generate-map",
    "prebuild": "npm run generate-map"
  }
}
```

| Flag | Description |
|---|---|
| `--watch`, `-w` | Watch for file changes and auto-regenerate |
| `--dir=./path` | Custom project directory |
💡 The generated `ai-screen-map.json` is committed to your repo — no runtime cost.
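The map's exact schema isn't documented in this README, so treat the following as a purely hypothetical illustration of the idea: a list of screens with their outgoing links. Only the `navigatesTo` field name appears in the comparison table above; every other key here is an assumption.

```json
{
  "screens": {
    "Home": { "navigatesTo": ["Products", "Profile"] },
    "Products": { "navigatesTo": ["product/[id]"] },
    "product/[id]": { "navigatesTo": ["Reviews"] }
  }
}
```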
🧠 Knowledge Base
Give the AI domain knowledge it can query on demand — policies, FAQs, product details. Uses a `query_knowledge` tool to fetch only relevant entries (no token waste).
Static Array
```tsx
import type { KnowledgeEntry } from '@mobileai/react-native'; // or 'react-native-agentic-ai'

const KNOWLEDGE: KnowledgeEntry[] = [
  {
    id: 'shipping',
    title: 'Shipping Policy',
    content: 'Free shipping on orders over $75. Standard: 5-7 days. Express: 2-3 days.',
    tags: ['shipping', 'delivery'],
  },
  {
    id: 'returns',
    title: 'Return Policy',
    content: '30-day returns on all items. Refunds in 5-7 business days.',
    tags: ['return', 'refund'],
    screens: ['product/[id]', 'order-history'], // only surface on these screens
  },
];

<AIAgent knowledgeBase={KNOWLEDGE} />
```

Custom Retriever — Bring Your Own Search
```tsx
<AIAgent
  knowledgeBase={{
    retrieve: async (query: string, screenName?: string) => {
      const results = await fetch(`/api/knowledge?q=${query}&screen=${screenName}`);
      return results.json();
    },
  }}
/>
```

🔌 MCP Bridge Setup — Connect AI Editors to Your App
Architecture
```
┌──────────────────┐                   ┌──────────────────┐    WebSocket    ┌──────────────────┐
│   Antigravity    │  Streamable HTTP  │                  │                 │                  │
│  Claude Desktop  │ ◄───────────────► │    @mobileai/    │ ◄─────────────► │    Your React    │
│   or any MCP     │    (port 3100)    │    mcp-server    │   (port 3101)   │    Native App    │
│  compatible AI   │   + Legacy SSE    │                  │                 │                  │
└──────────────────┘                   └──────────────────┘                 └──────────────────┘
```

Setup in 3 Steps
1. Start the MCP bridge — no install needed:
```bash
npx @mobileai/mcp-server
```

2. Connect your React Native app:
```jsx
<AIAgent
  apiKey="YOUR_API_KEY"
  mcpServerUrl="ws://localhost:3101"
/>
```

3. Connect your AI:
Google Antigravity
Add to `~/.gemini/antigravity/mcp_config.json`:

```json
{
  "mcpServers": {
    "mobile-app": {
      "command": "npx",
      "args": ["@mobileai/mcp-server"]
    }
  }
}
```

Click Refresh in MCP Store. You'll see `mobile-app` with 2 tools: `execute_task` and `get_app_status`.
Claude Desktop
Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "mobile-app": {
      "url": "http://localhost:3100/mcp/sse"
    }
  }
}
```

Other MCP Clients
- Streamable HTTP: `http://localhost:3100/mcp`
- Legacy SSE: `http://localhost:3100/mcp/sse`
MCP Tools
| Tool | Description |
|---|---|
| `execute_task(command)` | Send a natural language command to the app |
| `get_app_status()` | Check if the React Native app is connected |
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `MCP_PORT` | `3100` | HTTP port for MCP clients |
| `WS_PORT` | `3101` | WebSocket port for the React Native app |
🔌 API Reference
`<AIAgent>` Props

| Prop | Type | Default | Description |
|---|---|---|---|
| `apiKey` | `string` | — | API key for your provider (prototyping only). |
| `provider` | `'gemini' \| 'openai'` | `'gemini'` | LLM provider for text mode. |
| `proxyUrl` | `string` | — | Backend proxy URL (production). |
| `proxyHeaders` | `Record<string, string>` | — | Auth headers for proxy. |
| `voiceProxyUrl` | `string` | — | Dedicated proxy for Voice Mode WebSockets. |
| `voiceProxyHeaders` | `Record<string, string>` | — | Auth headers for voice proxy. |
| `model` | `string` | Provider default | Model name (e.g. `gemini-2.5-flash`, `gpt-4.1-mini`). |
| `navRef` | `NavigationContainerRef` | — | Navigation ref for auto-navigation. |
| `maxSteps` | `number` | `25` | Max agent steps per task. |
| `maxTokenBudget` | `number` | — | Max total tokens before auto-stopping the agent loop. |
| `maxCostUSD` | `number` | — | Max estimated cost (USD) before auto-stopping. |
| `showChatBar` | `boolean` | `true` | Show the floating chat bar. |
| `enableVoice` | `boolean` | `true` | Enable voice mode tab. |
| `enableUIControl` | `boolean` | `true` | When `false`, AI becomes knowledge-only. |
| `screenMap` | `ScreenMap` | — | Pre-generated screen map from the `generate-map` CLI. |
| `useScreenMap` | `boolean` | `true` | Set `false` to disable the screen map without removing the prop. |
| `instructions` | `{ system?, getScreenInstructions? }` | — | Custom system prompt + per-screen instructions. |
| `customTools` | `Record<string, ToolDefinition \| null>` | — | Override or remove built-in tools. |
| `knowledgeBase` | `KnowledgeEntry[] \| KnowledgeRetriever` | — | Domain knowledge the AI can query. |
| `knowledgeMaxTokens` | `number` | `2000` | Max tokens for knowledge results. |
| `mcpServerUrl` | `string` | — | WebSocket URL for the MCP bridge. |
| `accentColor` | `string` | — | Accent color for the chat bar. |
| `theme` | `ChatBarTheme` | — | Full chat bar color customization. |
| `onResult` | `(result) => void` | — | Called when the agent finishes. |
| `onBeforeStep` | `(stepCount) => void` | — | Called before each step. |
| `onAfterStep` | `(history) => void` | — | Called after each step. |
| `onTokenUsage` | `(usage) => void` | — | Token usage per step. |
| `onAskUser` | `(question) => Promise<string>` | — | Handle `ask_user` inline — agent waits for your response. |
| `stepDelay` | `number` | — | Delay between steps (ms). |
| `router` | `{ push, replace, back }` | — | Expo Router instance. |
| `pathname` | `string` | — | Current pathname (Expo Router). |
| `debug` | `boolean` | `false` | Enable SDK debug logging. |
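To illustrate how the budget and interaction props fit together, here is a minimal sketch of handlers you might pass to `<AIAgent>`. Prop names and signatures come from the table above; the field names of the usage object and the canned answer are illustrative assumptions. A real `onAskUser` would surface the question (e.g. via a modal or `Alert.prompt`) and resolve with the user's reply.

```typescript
// Sketch only: the usage-object field names are assumed for illustration.
type TokenUsage = { promptTokens: number; completionTokens: number };

let totalTokens = 0;

// onTokenUsage: invoked with each step's token usage; here we keep a running total.
const onTokenUsage = (usage: TokenUsage): void => {
  totalTokens += usage.promptTokens + usage.completionTokens;
};

// onAskUser: the agent pauses until this promise resolves with the user's answer.
// Auto-answering here; a real app would show the question in the UI.
const onAskUser = (question: string): Promise<string> =>
  Promise.resolve(`(auto-reply) ${question}`);
```

These pair naturally with the hard limits, e.g. `<AIAgent onTokenUsage={onTokenUsage} onAskUser={onAskUser} maxTokenBudget={50000} maxCostUSD={0.5} ...>`: the budgets auto-stop the loop, while the callbacks give you per-step visibility.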
🎨 Customization
```jsx
// Quick — one color:
<AIAgent accentColor="#6C5CE7" />

// Full theme:
<AIAgent
  accentColor="#6C5CE7"
  theme={{
    backgroundColor: 'rgba(44, 30, 104, 0.95)',
    inputBackgroundColor: 'rgba(255, 255, 255, 0.12)',
    textColor: '#ffffff',
    successColor: 'rgba(40, 167, 69, 0.3)',
    errorColor: 'rgba(220, 53, 69, 0.3)',
  }}
/>
```

useAction — Custom AI-Callable Business Logic
```tsx
import { useAction } from '@mobileai/react-native'; // or 'react-native-agentic-ai'

function CartScreen() {
  const { cart, clearCart, getTotal } = useCart();

  useAction('checkout', 'Place the order and checkout', {}, async () => {
    if (cart.length === 0) return { success: false, message: 'Cart is empty' };
    // Human-in-the-loop: AI pauses until user taps Confirm
    return new Promise((resolve) => {
      Alert.alert('Confirm Order', `Place order for $${getTotal()}?`, [
        { text: 'Cancel', onPress: () => resolve({ success: false, message: 'User denied.' }) },
        { text: 'Confirm', onPress: () => { clearCart(); resolve({ success: true, message: `Order placed!` }); } },
      ]);
    });
  });
}
```

useAI — Headless / Custom Chat UI
```tsx
import { useAI } from '@mobileai/react-native'; // or 'react-native-agentic-ai'

function CustomChat() {
  const { send, isLoading, status, messages } = useAI();
  return (
    <View style={{ flex: 1 }}>
      <FlatList data={messages} renderItem={({ item }) => <Text>{item.content}</Text>} />
      {isLoading && <Text>{status}</Text>}
      <TextInput onSubmitEditing={(e) => send(e.nativeEvent.text)} placeholder="Ask the AI..." />
    </View>
  );
}
```

Chat history persists across navigation. Override settings per-screen:
```tsx
const { send } = useAI({
  enableUIControl: false,
  onResult: (result) => router.push('/(tabs)/chat'),
});
```

🔒 Security & Production
Backend Proxy — Keep API Keys Secure
```jsx
<AIAgent
  proxyUrl="https://myapp.vercel.app/api/gemini"
  proxyHeaders={{ Authorization: `Bearer ${userToken}` }}
  voiceProxyUrl="https://voice-server.render.com" // only if text proxy is serverless
  navRef={navRef}
>
  <App />
</AIAgent>
```

`voiceProxyUrl` falls back to `proxyUrl` if not set. Only needed when your text API is on a serverless platform that can't hold WebSocket connections.
Next.js Text Proxy Example
```ts
import { NextResponse } from 'next/server';

export async function POST(req: Request) {
  const body = await req.json();
  const response = await fetch('https://generativelanguage.googleapis.com/...', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'x-goog-api-key': process.env.GEMINI_API_KEY! },
    body: JSON.stringify(body),
  });
  return NextResponse.json(await response.json());
}
```

Express WebSocket Proxy (Voice Mode)
```js
const express = require('express');
const { createProxyMiddleware } = require('http-proxy-middleware');

const app = express();
const geminiProxy = createProxyMiddleware({
  target: 'https://generativelanguage.googleapis.com',
  changeOrigin: true,
  ws: true,
  pathRewrite: (path) => `${path}${path.includes('?') ? '&' : '?'}key=${process.env.GEMINI_API_KEY}`,
});

app.use('/v1beta/models', geminiProxy);
const server = app.listen(3000);
server.on('upgrade', geminiProxy.upgrade);
```

Element Gating — Hide Elements from AI
```jsx
<Pressable aiIgnore={true}><Text>Admin Panel</Text></Pressable>
```

Content Masking — Sanitize Before LLM Sees It
```jsx
<AIAgent transformScreenContent={(c) => c.replace(/\b\d{13,16}\b/g, '****-****-****-****')} />
```

Screen-Specific Instructions
```jsx
<AIAgent instructions={{
  system: 'You are a food delivery assistant.',
  getScreenInstructions: (screen) => screen === 'Cart' ? 'Confirm total before checkout.' : undefined,
}} />
```

Lifecycle Hooks
| Hook | When |
|---|---|
| `onBeforeStep` | Before each agent step |
| `onAfterStep` | After each step (with full history) |
| `onBeforeTask` | Before task execution |
| `onAfterTask` | After task completes |
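As a sketch of what these hooks enable, the snippet below collects a simple timeline for logging or analytics. Hook names match the table; the argument shapes (a step count for `onBeforeStep`, a history array for `onAfterStep`) follow the API reference above, and the no-argument task hooks are an assumption for illustration.

```typescript
// Sketch only: argument types for the task-level hooks are assumed.
const timeline: string[] = [];

const onBeforeTask = (): void => { timeline.push('task:start'); };
const onBeforeStep = (stepCount: number): void => { timeline.push(`step:${stepCount}`); };
const onAfterStep = (history: unknown[]): void => { timeline.push(`history:${history.length}`); };
const onAfterTask = (): void => { timeline.push('task:end'); };
```

Wired into `<AIAgent>` (the `onBeforeStep`/`onAfterStep` props appear in the API reference above), this gives you a cheap audit trail of every agent run.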
🧩 AIZone — Declarative AI Modification Boundaries
<AIZone> lets you explicitly tell the AI where it is and isn't allowed to modify the UI. Everything outside a zone is read-only forever. Everything inside a zone gets only the permissions you grant.
Zero visual impact — it's a transparent boundary, not a wrapper component. No extra styles, no layout changes.
Why Use Zones?
Without zones, the AI reads the UI but can only interact via standard taps and navigation. With zones, you unlock proactive AI-native behaviors: the AI can simplify cluttered UIs, inject contextual cards, and highlight elements to guide users — all within your declared boundaries.
Setup
```jsx
import { AIZone } from '@mobileai/react-native'; // or 'react-native-agentic-ai'
```

Permissions
Wrap any section with <AIZone> and grant only the permissions you want:
```jsx
<AIZone
  id="product-detail"
  allowHighlight   // AI can draw an animated ring to draw user attention
  allowInjectHint  // AI can add a tooltip above any element in this zone
  allowSimplify    // AI can hide low-priority children to reduce clutter
  allowInjectCard  // AI can inject a built-in card template into this zone
  templates={[InfoCard, ReviewSummary]} // required when allowInjectCard
>
  <ProductImage />
  <ProductTitle />
  <ProductDescription aiPriority="low" /> {/* hidden when AI simplifies */}
  <SizeSelector />
  <AddToCartButton />
  <AdvancedOptions aiPriority="low" /> {/* hidden when AI simplifies */}
</AIZone>

{/* These are never touched by the AI */}
<BrandHeader />
<PaymentSection />
```

aiPriority Prop
Mark elements as low-priority within an `allowSimplify` zone — the AI hides them to reduce cognitive load:
```jsx
<AIZone id="settings" allowSimplify>
  <PrimaryOption aiPriority="high" />  {/* always visible */}
  <AdvancedOption aiPriority="low" />  {/* hidden when AI simplifies */}
</AIZone>
```

No extra imports needed — `aiPriority` is a typed prop on all standard React Native elements (`View`, `Pressable`, etc.).
Built-in Card Templates
When allowInjectCard is set, pass an array of your pre-approved templates. The AI picks the right one and injects it with contextually relevant props:
```jsx
import { AIZone, InfoCard, ReviewSummary } from '@mobileai/react-native';

<AIZone
  id="hero-banner"
  allowInjectCard
  templates={[InfoCard, ReviewSummary]}
>
  <HeroBanner />
</AIZone>
```

| Template | Best for |
|---|---|
| `InfoCard` | Policy snippets, tips, FAQs |
| `ReviewSummary` | Average rating, review count on product screens |
You can also pass your own custom card components — just set `MyCard.displayName` explicitly (required for minification safety):

```jsx
MyCard.displayName = 'MyCard';

<AIZone allowInjectCard templates={[MyCard]} id="my-zone">
  {/* ... */}
</AIZone>
```

User Dismissal — Always in Control
Every AI modification is reversible by the user:
| AI Action | User escape hatch |
|---|---|
| `allowHighlight` | Tap anywhere outside the ring (auto-removes after 5s) |
| `allowInjectHint` | Tap × on the tooltip (auto-removes after 8s) |
| `allowSimplify` | "Show all options" button always appears at zone bottom |
| `allowInjectCard` | × close button always rendered on the injected card |
The agent can also call `restore_zone(zoneId)` itself — if the user says "show everything", the agent reverses all simplification.
What the Agent Sees
Inside the system context, the agent receives a precise summary of each zone — permissions and element list — so it can never hallucinate permissions it wasn't given:
```
[ZONES ON SCREEN]
zone "product-detail":
  permissions: highlight, simplify, card(InfoCard, ReviewSummary)
  elements: [0]ProductTitle, [1]ProductDescription(priority=low),
            [2]SizeSelector, [3]AddToCartButton, [4]AdvancedOptions(priority=low)
```

AIZone Props
| Prop | Type | Description |
|---|---|---|
| `id` | `string` | Unique zone identifier on this screen |
| `allowHighlight` | `boolean` | AI can draw an animated ring around elements |
| `allowInjectHint` | `boolean` | AI can add a tooltip above elements |
| `allowSimplify` | `boolean` | AI can hide `aiPriority="low"` children |
| `allowInjectCard` | `boolean` | AI can inject a card template |
| `templates` | `ComponentType[]` | Pre-approved card components (required with `allowInjectCard`) |
🛠️ Built-in Tools
| Tool | What it does |
|---|---|
| `tap(index)` | Tap any interactive element — buttons, switches, checkboxes, custom components |
| `long_press(index)` | Long-press an element to trigger context menus |
| `type(index, text)` | Type into a text input |
| `scroll(direction, amount?)` | Scroll content — auto-detects edge, rejects PagerView |
| `slider(index, value)` | Drag a slider to a specific value |
| `picker(index, value)` | Select a value from a dropdown/picker |
| `date_picker(index, date)` | Set a date on a date picker |
| `navigate(screen)` | Navigate to any screen |
| `wait(seconds)` | Wait for loading states before acting |
| `capture_screenshot(reason)` | Capture the screen as an image (requires `react-native-view-shot`) |
| `done(text)` | Finish the task with a response |
| `ask_user(question)` | Ask the user for clarification |
| `query_knowledge(question)` | Search the knowledge base |
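Any of these can be overridden or removed through the `customTools` prop, typed `Record<string, ToolDefinition | null>` in the API reference. A minimal sketch: the removal-via-`null` behavior is taken from that prop's description, while the full `ToolDefinition` shape for overrides isn't shown in this README.

```typescript
// Sketch only: disable built-in tools by mapping their names to null.
// Tool names mirror the table above.
const customTools: Record<string, null> = {
  capture_screenshot: null, // never let the AI capture the screen
  long_press: null,         // no long-press context menus
};
```

Then pass it along: `<AIAgent customTools={customTools} ...>`.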
📋 Requirements
- React Native 0.72+
- Expo SDK 49+ (or bare React Native)
- Gemini API key (free tier available), or
- OpenAI API key
Gemini is the default provider and powers all modes (text + voice). OpenAI is available as a text mode alternative via `provider="openai"`. Voice mode uses `gemini-2.5-flash-native-audio-preview` (Gemini only).
📄 License
MIT © Mohamed Salah
👋 Let's connect — LinkedIn