AI assistant on the glasses
The AI assistant is what makes the Core App useful in the field — a real-time voice partner the engineer talks to naturally, eyes on the job.
The model
Under the hood it's OpenAI's Realtime API:
- Trainer mode: gpt-4o-realtime-preview-mini. Lower-cost, focused on listening and transcription, no tool calls.
- Trainee mode: gpt-4o-realtime-preview (full). Voice, tool calls, and vision.
Both run server-side (in TrainAR's cloud), not on the glasses. The glasses stream audio + video over WebRTC to a TrainAR-managed room; the AI is a participant in that room.
How voice works
There's no hotword. The session is started and stopped explicitly with two voice commands:
| Phrase | What it does |
|---|---|
| "Start training session" | Opens the voice channel, runs pre-flight, joins the AI participant, begins recording |
| "Stop training session" | Closes the session gracefully — recording is sealed, post-session pipeline kicks off |
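The explicit start/stop lifecycle in the table above can be pictured as a two-state machine. This is a minimal sketch, not TrainAR's implementation: the `SessionState` enum and `next_state` function are hypothetical names, and how the glasses actually recognise the two phrases is not described here.

```python
from enum import Enum


class SessionState(Enum):
    IDLE = "idle"      # no voice channel open
    ACTIVE = "active"  # session open, AI participant joined


# The two lifecycle phrases from the table, normalised to lowercase.
START_PHRASE = "start training session"
STOP_PHRASE = "stop training session"


def next_state(state: SessionState, transcript: str) -> SessionState:
    """Return the session state after hearing `transcript`.

    Only the two explicit phrases change state; anything else is
    ordinary conversation and leaves the state untouched.
    """
    phrase = transcript.strip().lower().rstrip(".!?")
    if state is SessionState.IDLE and phrase == START_PHRASE:
        return SessionState.ACTIVE
    if state is SessionState.ACTIVE and phrase == STOP_PHRASE:
        return SessionState.IDLE
    return state
```

Note that a conversational cue such as "I think we're done here" leaves the state at `ACTIVE`: only the explicit stop phrase ends the session.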
Once the session is open, the AI uses server-side voice activity detection (VAD). The engineer talks naturally; the AI replies when they finish a phrase. No "Hey TrainAR," no push-to-talk button, no fixed turn-taking — it feels like talking to a colleague.
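For readers curious what server-side VAD looks like at the API level, the OpenAI Realtime API exposes it through the `turn_detection` field of a `session.update` event. The sketch below builds such an event. The helper name `server_vad_session_update` is hypothetical, and the threshold and timing values are illustrative only, since TrainAR's actual settings are not documented here.

```python
import json


def server_vad_session_update(silence_ms: int = 500) -> str:
    """Build a Realtime API `session.update` event enabling server-side VAD.

    All numeric values are illustrative assumptions, not TrainAR's settings.
    """
    event = {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                # Higher threshold = less sensitive to background noise.
                "threshold": 0.5,
                # Audio kept from just before speech was detected.
                "prefix_padding_ms": 300,
                # Silence needed before the turn is considered finished.
                "silence_duration_ms": silence_ms,
            }
        },
    }
    return json.dumps(event)
```

A longer `silence_duration_ms` makes the AI wait longer before replying, and a shorter one makes it quicker but more likely to cut in during a pause.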
What the AI can do (trainee mode)
The AI can call tools mid-conversation. The current tool set:
search_knowledge — your tenant's knowledge base
Semantic search over every document your tenant has uploaded (manuals, procedures, wiring diagrams, internal specs). The AI calls this whenever the engineer asks a question whose answer is likely to be in your knowledge base.
The tool's description includes a manifest of your specific document names — so the AI knows what's actually in there before it tries to search. If you upload "ABC Boiler Service Manual v3" it'll know to search there for ABC Boiler questions.
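One way to picture the manifest idea is a tool definition whose description is generated from the tenant's document names. This is a sketch under assumptions: the function name `build_search_knowledge_tool`, the description wording, and the single `query` parameter are all hypothetical, and only the general function-tool shape (name, description, JSON Schema parameters) follows the Realtime API.

```python
def build_search_knowledge_tool(document_names: list[str]) -> dict:
    """Build a function-tool definition with a per-tenant document manifest."""
    # One manifest line per uploaded document, sorted for stable output.
    manifest = "\n".join(f"- {name}" for name in sorted(document_names))
    description = (
        "Semantic search over the tenant's uploaded documents. "
        "Use this when the engineer asks something that one of these "
        "documents is likely to answer:\n" + manifest
    )
    return {
        "type": "function",
        "name": "search_knowledge",
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {
                # Hypothetical parameter: free-text search query.
                "query": {"type": "string", "description": "What to search for."}
            },
            "required": ["query"],
        },
    }
```

Because the document names sit in the description itself, the model can tell that an "ABC Boiler" question is worth a search before it makes the call.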
capture_camera — see what the engineer sees
Grabs a single frame from the glasses camera and passes it to the AI for visual analysis. Triggered by phrases like:
- "Look at this."
- "What does this say?"
- "Can you see the model plate?"
- "Is this the right valve?"
The frame is taken on demand only — TrainAR doesn't continuously feed frames to the AI (that would cost a fortune and burn through your minute pool). One frame per "look" request.
Session end — via the explicit voice command
The session is ended cleanly when the engineer says "Stop training session" — the same explicit lifecycle command that started it. The voice channel tears down and the post-session pipeline kicks off. The AI doesn't try to infer end-of-session from conversational cues.
Platform skills (if entitled)
When your tenant subscribes to bundles that include platform skills — for example the Parts & Spares bundle (Parts Arena) — those skills become available to the AI automatically. The Parts Arena bundle adds five skills:
- pa_identify_model: boiler ID by GC number.
- pa_search_parts: free-text parts search.
- pa_get_parts_list_for_model: full parts list for an identified model.
- pa_show_manual_page: fetch + render a manual page in the engineer's eyeline.
- pa_show_exploded_view: fetch + render an exploded-parts diagram.
The engineer doesn't need to remember the skill names — they just ask: "what's the GC number for this Worcester?" or "show me the exploded view of the burner." The AI picks the right tool.
Your custom skills
Skills you've built in Dashboard → Skills & Knowledge become available to the AI for any of your engineers' sessions. Each skill's description field is what the AI uses to decide when to call it — write the descriptions so they describe when to use the skill, not just what it does, and the AI will route correctly.
See Creating a custom skill for the authoring side.
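To illustrate the advice on descriptions, here are two versions of the same skill. Everything in this example is hypothetical: the `lookup_warranty` skill, the field names, and the dictionary layout are assumptions for illustration and are not the Dashboard's actual schema.

```python
# Hypothetical skill entries; field names are assumptions, not the Dashboard schema.

# Says what the skill does, but gives the AI no cue for when to call it.
vague_skill = {
    "name": "lookup_warranty",
    "description": "Looks up warranty records.",
}

# Says when to use it, what it needs, and when not to use it.
routable_skill = {
    "name": "lookup_warranty",
    "description": (
        "Use when the engineer asks whether an appliance is still under "
        "warranty or when its cover expires. Needs the appliance serial "
        "number. Do not use for parts pricing."
    ),
}
```

The second description names the trigger, the required input, and a case to avoid, which gives the AI far more to route on.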
What it can see
| Source | When |
|---|---|
| Glasses camera (single frame) | On-demand, when the engineer asks ("look at this") |
| Knowledge base text + images | Whenever search_knowledge is called |
| Loaded procedure (for the current task) | Always available as session context |
| Manufacturer manual pages + exploded diagrams (via Parts Arena) | On-demand, via the pa_* skills |
| Engineer's voice | Real-time |
What it doesn't see:
- Continuous live video. The recording is captured and saved, but not fed to the real-time AI — that's a deliberate cost + latency choice.
- Anything from your CRM, FSM, or accounting tools — those are handled by integrations which produce tasks, not by the on-glasses AI directly.
- Other tenants' data. Sessions are strictly tenant-scoped.
What it won't do
We deliberately tune the AI to:
- Defer rather than hallucinate. On any safety-critical step (gas, electrical, working at height), if the AI doesn't have a definitive source in your knowledge base or a loaded procedure, it'll say so and recommend the engineer check the spec sheet or call a senior.
- Not interpret regulations. It'll quote and surface relevant regulatory content from documents you've uploaded; it won't pretend to be the regulator.
- Not fabricate part numbers, model numbers, or fault-code meanings. If it doesn't know, it says it doesn't know.
Minute consumption
AI session time consumes minutes from your tenant's minute pool. One minute of active session ≈ one minute consumed. See Minute pools & top-ups for the pricing model.
The minute counter is per-tenant, not per-engineer — engineers don't "run out" individually; the tenant's pool depletes. The Dashboard shows the current balance and burn rate on Reports → Minutes.
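The per-tenant accounting can be sketched in a few lines. This assumes the simple one-minute-in, one-minute-out rule stated above. The `MinutePool` class and `days_remaining` helper are hypothetical names, and the figures in the usage note are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MinutePool:
    """A single shared balance for the whole tenant, not per engineer."""

    balance: float  # minutes remaining

    def consume(self, session_minutes: float) -> float:
        """Deduct one session's active minutes and return the new balance."""
        self.balance = max(0.0, self.balance - session_minutes)
        return self.balance


def days_remaining(balance: float, minutes_per_day: float) -> float:
    """Estimate how long the pool lasts at a steady daily burn rate."""
    if minutes_per_day <= 0:
        return float("inf")
    return balance / minutes_per_day
```

For example, a pool of 1000 minutes after a 45-minute session by one engineer and a 30-minute session by another holds 925 minutes. At a burn rate of 75 minutes a day, that is a little over 12 days.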
Privacy
- The voice stream travels only between the glasses and TrainAR's cloud.
- Frames captured by capture_camera are sent to the AI as part of the in-flight message; they're not separately stored.
- The full session recording is saved (so admins can review). It can be deleted on request; see Sessions → Privacy & GDPR.
- Knowledge base content stays within your tenant — the AI never references another tenant's documents.
When something goes wrong
The AI is hearing me but not responding. This is likely a network issue mid-session; the voice channel may be reconnecting. Wait 3-5 seconds; if there's still no response, stop and restart the session.
The AI keeps interrupting me. VAD is too sensitive — usually caused by background noise being misread as speech. Move to a quieter spot, or report the session via the Dashboard so we can investigate.
The AI gave me a wrong answer. Please flag it in the Dashboard session review. We use those reports to tune the agents.
The session ended unexpectedly. Check the session detail page in the Dashboard for a status reason — usually a network drop, occasionally a quota issue (out of minutes).