QORA AI Solutions · June 18, 2026 · 8 min read

Edge AI for security cameras: when video stays on-premises and when it goes to the cloud

Most “edge AI for cameras” pitches frame the question wrong. They make it a binary choice — edge or cloud, pick a side. In practice, no real security operation runs all on one side. The right question is harder and more useful: which video, for which inference, for which decision, lives where? This is the decision matrix we’d hand any security team facing the question.

What “edge AI” actually means for a security camera

Three places inference can live, in order of how close to the lens it is:

On-camera — the camera itself runs a model. A handful of detection models (person, vehicle, basic line-crossing) ship with most modern IP cameras. Strict power and compute budget; usually one model at a time.
On a local edge box — a small server on the same LAN as the cameras, often a fanless mini-PC or an industrial NVR variant with a GPU or NPU. Runs heavier models (full multi-class detection, re-identification, anomaly scoring). One box can serve 8–64 cameras, depending on resolution and inference rate.
In the cloud — frames or events are streamed to a cloud GPU pool. Effectively unlimited model size, easy to keep current, expensive in bandwidth and latency.

When industry people say “edge AI,” they usually mean the middle one — the local edge box. That’s the layer with the most interesting trade-offs.

Three real reasons to keep video on the edge

1. Privacy and data sovereignty. Some footage simply can’t leave the building. Hospitals, schools, certain government facilities, jurisdictions with GDPR-style data-residency rules, organizations with internal privacy policies that predate the cloud. If the camera is pointed at a classroom, the safest answer is “the bytes never reach a server we don’t own.” Edge inference makes that promise structural rather than a policy line in a contract.

2. Latency to alert. A door-held-open detection that takes 3–4 seconds round-trip to a cloud GPU is not useful for the operator who needs to redirect a guard now. Edge inference gives you a sub-200ms detection-to-alert path, because the inference and the alert dispatcher are on the same LAN. For incidents where seconds matter — tailgating, fence climbs, vehicle approaches to a restricted area — that latency floor is the whole game.

3. Bandwidth cost and reliability. A single 4K camera at 30 fps produces a 12–25 Mbps stream. Twenty cameras at one site therefore need 240–500 Mbps of upstream. Most sites don’t have that budget continuously available; many don’t have a symmetric link at all. Edge inference flips the math: you stream events to the cloud (a few KB of JSON plus an optional thumbnail), not raw video. A site that would have needed a 500 Mbps uplink runs comfortably on a residential-grade connection.
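
The arithmetic behind those figures can be checked with a short sketch. The camera count and per-camera bitrate come from the paragraph above; the event rate, JSON size, and thumbnail size are illustrative assumptions, not measurements.

```python
def raw_video_mbps(cameras: int, mbps_per_camera: float) -> float:
    """Upstream needed to stream every camera's raw video off-site."""
    return cameras * mbps_per_camera


def event_stream_mbps(cameras: int, events_per_camera_per_min: float,
                      bytes_per_event: int) -> float:
    """Average upstream needed when only events leave the site."""
    bytes_per_second = cameras * events_per_camera_per_min * bytes_per_event / 60
    return bytes_per_second * 8 / 1_000_000


CAMERAS = 20
low = raw_video_mbps(CAMERAS, 12)   # low end of the 12-25 Mbps range
high = raw_video_mbps(CAMERAS, 25)  # high end of the range

# Assumed for illustration: 6 events per camera per minute,
# each about 4 KB of JSON plus a 50 KB thumbnail.
events = event_stream_mbps(CAMERAS, 6, 4_000 + 50_000)

print(f"raw video:   {low:.0f}-{high:.0f} Mbps")
print(f"events only: {events:.2f} Mbps")
```

Even with a thumbnail attached to every event, the event stream under these assumptions stays below 1 Mbps, which is why the uplink stops being the constraint.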

Three real reasons to send video to the cloud anyway

The case for cloud isn’t about being cheaper or simpler. It’s about three specific things edge boxes can’t match.

1. Model freshness. Computer-vision models for security have measurably improved every six months for years. A cloud GPU pool gets the new model the day it’s validated. An edge box at a customer site gets it whenever the integrator can drive out, or whenever the auto-update finally fires. If your operation depends on catching novel threats — new vehicle types, new uniform patterns, evolving attack tactics — the model-update path is a first-class concern. Cloud wins on this dimension without much argument.

2. Multi-site analytics. The edge box knows what’s happening at its site. It can’t tell you whether the same person tried four gates across three buildings tonight. Cross-site recognition, federated incident search, executive dashboards that span a portfolio — these need the events landed somewhere shared. The site’s edge box does the inference; the cloud does the cross-site reasoning on top of the events.
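
A minimal sketch of that cross-site step, assuming each edge box emits events that carry a re-identification ID comparable across sites. The field names (`person_id`, `site`, `gate`) are hypothetical, and real re-identification is a probabilistic match rather than an exact ID comparison.

```python
from collections import defaultdict


def multi_site_visitors(events, min_sites=2):
    """Return {person_id: sorted sites} for people seen at min_sites or more sites."""
    sites_by_person = defaultdict(set)
    for event in events:
        sites_by_person[event["person_id"]].add(event["site"])
    return {
        person: sorted(sites)
        for person, sites in sites_by_person.items()
        if len(sites) >= min_sites
    }


# Events as they might arrive from three separate edge boxes (made-up data).
events = [
    {"person_id": "p-17", "site": "building-a", "gate": "north"},
    {"person_id": "p-17", "site": "building-b", "gate": "east"},
    {"person_id": "p-17", "site": "building-c", "gate": "dock"},
    {"person_id": "p-42", "site": "building-a", "gate": "north"},
]

print(multi_site_visitors(events))
```

No single edge box could produce this answer, because each one only ever sees its own site's events.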

3. Forensic search and retraining. When something happened last Tuesday at 3am, the operator wants to scrub through five hours of footage and ask “show me every person who passed through this gate.” That’s a vector-search problem against a re-identification index, and it’s much easier to do against a cloud-side archive than against twenty edge boxes none of which were designed for that workload. The same archive feeds the next model-improvement cycle.
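
As a rough illustration of what a vector search against a re-identification index involves, the sketch below ranks archived sightings by cosine similarity to a query embedding. The three-dimensional embeddings and clip names are made up for readability; a production index would hold high-dimensional vectors and would likely use an approximate nearest-neighbor structure rather than this linear scan.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query, top_k=2):
    """Return the top_k (clip_id, score) pairs most similar to the query."""
    scored = [(clip_id, cosine(vec, query)) for clip_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical archive: clip ID -> appearance embedding of the person in it.
index = {
    "gate3-0301": [0.9, 0.1, 0.0],
    "gate3-0312": [0.1, 0.9, 0.1],
    "gate3-0340": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # embedding of the person the operator is looking for

for clip_id, score in search(index, query):
    print(f"{clip_id}: {score:.3f}")
```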

What a hybrid pattern looks like in practice

Hybrid isn’t a compromise — it’s a deliberate split that puts each kind of work in the place that suits it. The shape that works for most security operations:

Live inference stays local. Real-time detection, tracking, and alerting happen on the same network as the cameras, so the latency floor isn’t set by the WAN.
Events — not video — travel to the cloud. What the cloud sees is structured data about what happened, not the raw pixels. That keeps the bandwidth bill, and the privacy surface, in check.
Forensic review is on-demand. When the operator does need to look at footage, they pull the specific clip they need — not an always-on stream.
Per-site policy beats a single global mode. Different sites have different bandwidth, privacy, and operator-trust constraints. A good architecture lets each site sit where it should without re-architecting.
The cloud is the model’s home; the edge is the model’s workplace. Updates originate centrally, get rolled out on schedules customers control, and earn their place in production before they replace what’s already running.
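
To make "events, not video" concrete, here is a sketch of what one event might look like on the wire. The `DetectionEvent` type and its field names are hypothetical, not a QORA schema; the point is that the payload is a small structured record that references a clip instead of embedding pixels.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DetectionEvent:
    site_id: str
    camera_id: str
    label: str                      # e.g. "person", "vehicle"
    confidence: float
    timestamp: str                  # ISO 8601, UTC
    clip_ref: Optional[str] = None  # opaque handle for on-demand clip retrieval

    def to_json(self) -> str:
        """Serialize compactly for transport to the cloud event pipeline."""
        return json.dumps(asdict(self), separators=(",", ":"))


event = DetectionEvent(
    site_id="site-01",
    camera_id="cam-07",
    label="person",
    confidence=0.93,
    timestamp="2026-06-18T03:00:00Z",
    clip_ref="clip-000123",
)
payload = event.to_json()
print(payload)
print(len(payload.encode("utf-8")), "bytes")
```

The `clip_ref` handle is what makes on-demand forensic review possible: the cloud stores the reference, and the footage itself stays on site until an operator asks for that specific clip.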

One design choice worth being deliberate about: what happens to a site when its edge layer is offline. There are two defensible answers — degrade visibly, or fail over to cloud — and they have different security implications. The wrong answer is not deciding, and discovering during an incident that the system silently chose for you.
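
One way to force that decision is to make offline behavior an explicit per-site setting with no default. The sketch below uses hypothetical names (`OfflineMode`, `SitePolicy`, `on_edge_offline`) and assumes a site that forbids video from leaving the building cannot choose cloud failover.

```python
from dataclasses import dataclass
from enum import Enum


class OfflineMode(Enum):
    DEGRADE_VISIBLY = "degrade_visibly"
    FAILOVER_TO_CLOUD = "failover_to_cloud"


@dataclass(frozen=True)
class SitePolicy:
    site_id: str
    video_may_leave_site: bool
    offline_mode: OfflineMode  # deliberately no default: every site must choose

    def __post_init__(self):
        # Cloud failover means shipping video off-site, so reject the combination.
        if (self.offline_mode is OfflineMode.FAILOVER_TO_CLOUD
                and not self.video_may_leave_site):
            raise ValueError(f"{self.site_id}: cloud failover conflicts with video policy")


def on_edge_offline(policy: SitePolicy) -> str:
    """Message the operator console should show when the site's edge box is down."""
    if policy.offline_mode is OfflineMode.FAILOVER_TO_CLOUD:
        return f"{policy.site_id}: edge offline, inference failed over to cloud"
    return f"{policy.site_id}: edge offline, live detection unavailable"


print(on_edge_offline(SitePolicy("warehouse", True, OfflineMode.FAILOVER_TO_CLOUD)))
print(on_edge_offline(SitePolicy("school", False, OfflineMode.DEGRADE_VISIBLY)))
```

Both branches return a message for the operator, so neither mode can take effect silently.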

A decision matrix for security teams

If you’re standing up video AI for the first time — or deciding whether to migrate — the questions worth answering, in order:

How many cameras per site? Up to 4, the on-camera AI bundled with the camera is often enough; a dedicated edge box is overkill. 5–64, a single edge box covers the site cleanly. Above 64, you’re into a multi-box per-site cluster or a cloud-anchored architecture.
What’s the upstream bandwidth? Symmetric gig fiber — cloud-anchored is fine. Asymmetric cable or 4G — edge inference is the only credible answer. Mixed fleet — per-site policy. Don’t average across sites; the worst site sets the floor.
What’s the latency tolerance for the use case? Forensic and analytics — cloud is fine. Live alerting on safety-critical detections — edge is the only credible answer. If you’re unsure, default to edge and re-evaluate after the first quarter of operator data.
What’s the privacy regime? If anything a camera is pointed at is covered by GDPR, FERPA, HIPAA, or an internal policy whose breach you couldn’t defend in a deposition: edge, and never let video leave the premises. Bake that into the architecture; don’t put it on an “encrypted in transit” bullet in your privacy policy and hope.
How often does the model change? Quarterly or slower — edge is manageable. Monthly — you need a robust model-update pipeline whether you go edge or cloud. Weekly — cloud is much easier; if you must go edge, accept that you’re building serious update plumbing.
Do you need cross-site analytics? Yes — you need cloud for the analytics, but edge can still do the heavy inference. No — pure edge is viable.
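
The questions above can be sketched as a small function. The camera-count thresholds come from the list, treating a 4-camera site as the small-site case (an assumption). The `SiteProfile` fields and the returned strings are hypothetical, the model-change question is left out for brevity, and a real assessment would weigh more than these inputs.

```python
from dataclasses import dataclass


@dataclass
class SiteProfile:
    cameras: int
    symmetric_gig_uplink: bool
    live_safety_alerting: bool
    regulated_footage: bool
    needs_cross_site_analytics: bool


def recommend(site: SiteProfile) -> tuple:
    """Return (where inference runs, what the cloud is used for)."""
    # Any one of these constraints is enough to keep inference on site.
    must_be_edge = (site.regulated_footage
                    or site.live_safety_alerting
                    or not site.symmetric_gig_uplink)

    if not must_be_edge:
        inference = "cloud"
    elif site.cameras <= 4:
        inference = "on-camera"
    elif site.cameras <= 64:
        inference = "single edge box"
    else:
        inference = "multi-box edge cluster"

    if inference == "cloud":
        cloud_role = "inference and analytics"
    elif site.needs_cross_site_analytics:
        cloud_role = "events only, for cross-site analytics"
    else:
        cloud_role = "not required"
    return inference, cloud_role


# A 20-camera site on asymmetric cable with live alerting and portfolio dashboards.
print(recommend(SiteProfile(20, False, True, False, True)))
```

Running it per site, not once for the fleet, matches the advice not to average across sites.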

Common pitfalls

Treating edge as a cost cut. Edge boxes are real hardware with real CapEx, real installation, real failure modes. The win isn’t saving money on cloud GPUs; it’s the latency, privacy, and bandwidth properties. If your vendor pitches edge as “cheaper than cloud,” ask them to show you the five-year TCO including refresh cycles and on-site failure response. The math is rarely as clean as the slide.

Ignoring the model-update path. An edge box running a model from 2024 in 2026 isn’t edge AI; it’s a paperweight. Before you commit to edge, get explicit answers about: how new models reach the box, who pays for the rollout, whether updates are gated by customer maintenance windows, what happens if a customer skips an update for 18 months. If your vendor doesn’t have a clean answer, that’s the answer.

Forgetting the failover story. Edge boxes fail. Power supplies, fans, SSDs, the random kernel panic. Before you go live, decide explicitly: when this box is down, what happens? Does the cloud silently take over (and how would you know)? Does the site go dark (and is your operator console honest about it)? Both answers are defensible; not having decided is the only answer that isn’t.

Lumping privacy and bandwidth together. They’re separate problems with separate answers. A site with a slow uplink but no privacy issue is fine streaming thumbnails to the cloud. A site with great bandwidth but a hard privacy line should still be edge-only. Don’t let the easy reason for edge (bandwidth) hide a hard requirement (privacy) that needs to be designed for separately.

How to pick

If you’re starting from scratch and your sites are reasonably homogeneous: design for hybrid by default. Build the edge inference layer first — that’s the load-bearing one for latency and bandwidth. Wire the cloud event pipeline second. Keep the per-site policy something you can adjust without a redeploy, because the constraints on different sites will surprise you.

If you’re already running pure cloud and your bandwidth bill is climbing as you add sites — that’s the signal to introduce edge inference for the worst-offender sites first. Don’t boil the ocean. Pick the few sites that hurt the most and start there.

If you’re already running pure edge and your operators are starting to ask “has this person tried other gates this week?”, that’s the signal that you need a cloud event layer. Ship the events, leave the video on the premises. Most of the operational value of cloud video AI is in the events, not in the pixels.

Most security teams shouldn’t have to build any of this themselves — this is exactly what QORA is for. If any of the trade-offs above are live questions for your operation, book a demo and we’ll walk you through how it shows up in practice.
