Metaheuristic - Applied AI agency

Anyone can demo an LLM. We ship the system that survives production.

[About]

Metaheuristic is an applied AI agency. We take teams from "we tried ChatGPT" to agents that take real actions across your CRM, tickets, docs, and APIs - with permission-aware RAG, evals, guardrails, and the LLMOps to keep them reliable, secure, and on budget.

Agentic workflows

Plan · act · tool calls · human approvals

→

contracts/2026-q2.pdf allow ✓

finance/payroll.xlsx deny ✕

wiki/runbooks allow ✓

hr/reviews deny ✕

Cites

what you can see.

ACL-filtered retrieval

Permission-aware RAG

Citations · ACLs · freshness · evals

→
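As a rough illustration of ACL-filtered retrieval, the sketch below drops every chunk the user cannot read before anything reaches the prompt, so a denied document cannot be cited. The `Chunk` and `User` shapes, the group names, and the scores are assumptions made for this example, not a specific vector store's API.

```typescript
// Hypothetical types for illustration; not tied to any particular framework.
interface Chunk {
  path: string;            // source path, reused as the citation
  text: string;
  allowedGroups: string[]; // ACL copied from the source system at index time
  score: number;           // similarity score from the retriever
}

interface User {
  id: string;
  groups: string[];
}

// Keep only chunks the user may read, then return the top k by score.
// Denied chunks are removed here, so they never reach the model.
function retrieveForUser(user: User, candidates: Chunk[], k: number): Chunk[] {
  const groups = new Set(user.groups);
  return candidates
    .filter((c) => c.allowedGroups.some((g) => groups.has(g)))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Example data: paths come from the card above; groups and scores are made up.
const candidates: Chunk[] = [
  { path: "contracts/2026-q2.pdf", text: "(chunk text)", allowedGroups: ["sales", "legal"], score: 0.91 },
  { path: "finance/payroll.xlsx", text: "(chunk text)", allowedGroups: ["finance"], score: 0.88 },
  { path: "wiki/runbooks", text: "(chunk text)", allowedGroups: ["everyone"], score: 0.75 },
  { path: "hr/reviews", text: "(chunk text)", allowedGroups: ["hr"], score: 0.7 },
];

const user: User = { id: "u1", groups: ["sales", "everyone"] };
const visible = retrieveForUser(user, candidates, 3);
const citations = visible.map((c) => c.path);
// citations: ["contracts/2026-q2.pdf", "wiki/runbooks"]
```

Filtering before the prompt is assembled, rather than asking the model to withhold content, is the design choice that keeps the answer limited to what the user can see.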

traces evals prompts versions routing cost latency drift alerts

Run it

like production.

LLMOps / AgentOps

Monitoring · evals · versions · routing

→

LLM01 · prompt injection guarded

LLM02 · insecure output guarded

LLM03 · data poisoning guarded

LLM05 · supply chain review

OWASP LLM Top 10

red-teamed

LLM security review

Injection · output handling · supply chain

→

✓ Answer grounded in sources

✓ No hallucinated citations

✓ Tool call args validated

Regression vs golden set

offline online CI gate

Evals & quality gates

Golden sets · graders · CI gates

→

cost · this month

$4,120 → $1,180

−71% · caching + routing

Caching · routing · context budgets

→

route(task) {
  if (cached)   → return hit
  if (simple)   → haiku
  if (reason)   → sonnet
  if (critical) → opus
}

Right model.

Right price.

Model routing

Per task · per cost · with fallback

→
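One way the routing card could look in code is sketched below. The tier names come from the card; the `Task` fields, the in-memory cache, and the fallback choices are assumptions for illustration. The cache is checked first so a hit never costs a model call.

```typescript
// Tier names mirror the card; everything else here is hypothetical.
type Model = "haiku" | "sonnet" | "opus";

interface Task {
  prompt: string;
  needsReasoning: boolean;
  critical: boolean;
}

type Routed =
  | { kind: "cache"; answer: string }
  | { kind: "model"; model: Model; fallbacks: Model[] };

// Simple exact-match cache keyed by prompt (an assumption; real caches vary).
const cache = new Map<string, string>();

function route(task: Task): Routed {
  // Serve from cache before choosing any model.
  const hit = cache.get(task.prompt);
  if (hit !== undefined) return { kind: "cache", answer: hit };

  // Pick the cheapest tier that fits the task; fallbacks are illustrative.
  if (task.critical) return { kind: "model", model: "opus", fallbacks: ["sonnet"] };
  if (task.needsReasoning) return { kind: "model", model: "sonnet", fallbacks: ["opus"] };
  return { kind: "model", model: "haiku", fallbacks: ["sonnet"] };
}

const first = route({ prompt: "Summarize ticket 17", needsReasoning: false, critical: false });
// first.kind is "model" and the chosen tier is "haiku"
```

Returning the fallback list alongside the chosen model lets the caller retry on another tier if the first call fails, without re-running the routing decision.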

retrieve · reason · act

span-level traces

tokens · latency · cost

Observability & tracing

Spans · tokens · cost · per step

→

Pending Approved Blocked

act · refund needs approval

issue $480 refund · order #2841

tool: stripe.refunds.create

act · email approved

send follow-up to lead

tool: crm.send_email

act · delete blocked

drop records · out of policy

guardrail: destructive

Human-in-the-loop

Approvals · safety limits · logging

→
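The three states on this card (pending, approved, blocked) can be sketched as a small policy check that runs before any tool call. The `Action` shape and the `db.delete_records` tool name are hypothetical; the $200 threshold is borrowed from the trace further down this page.

```typescript
type Decision = "approved" | "pending" | "blocked";

// Hypothetical description of a proposed tool call.
interface Action {
  tool: string;
  destructive: boolean;
  amountCents?: number; // only set for money-moving actions
}

// Threshold taken from the trace on this page: refunds over $200 need a human.
const APPROVAL_THRESHOLD_CENTS = 20000;

function decide(action: Action): Decision {
  if (action.destructive) return "blocked";
  if ((action.amountCents ?? 0) > APPROVAL_THRESHOLD_CENTS) return "pending";
  return "approved";
}

// Every decision is logged, whether or not the action runs.
const decisionLog: { tool: string; decision: Decision }[] = [];

function submit(action: Action): Decision {
  const decision = decide(action);
  decisionLog.push({ tool: action.tool, decision });
  return decision;
}

const refund = submit({ tool: "stripe.refunds.create", destructive: false, amountCents: 48000 }); // "pending"
const email = submit({ tool: "crm.send_email", destructive: false });                            // "approved"
const purge = submit({ tool: "db.delete_records", destructive: true });                          // "blocked"
```

A pending action would wait in a queue until a person approves it; that queue is left out here to keep the sketch short.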

Aug 2024 · entered into force

Feb 2025 · prohibited practices

Aug 2025 · GPAI obligations

Aug 2026 · broad applicability

Compliant

by design.

EU AI Act readiness

Risk class · docs · governance

→

Across

your stack.

Tool & system integrations

CRM · ERP · tickets · docs · APIs

→

Audit logs & lifecycle

Every step. Every action. Replayable.

→

[Process]

Audit in. System out.

01 · Audit

Map & prioritize

We map workflows, assess data readiness, and rank opportunities by ROI and risk.

02 · Blueprint

Architecture + prototype

A clear technical plan and a working prototype - before you commit to a production build.

03 · Build

RAG & agents

Permission-aware RAG or agentic workflows with guardrails, memory, and integrations.

04 · Evaluate

Evals + security

Golden-set evals, retrieval quality, and an OWASP LLM red-team before anything ships.

05 · Deploy

Approvals + logging

Ship with human approvals, safety limits, and full audit logs. Code ownership stays with you.

06 · Operate

LLMOps retainer

Monitoring, evals, prompt/version management, model routing, and cost control.

Agentic workflows · Permission-aware RAG · LLMOps / AgentOps · EU AI Act ready · OWASP LLM Top 10 · Cost optimization · Evals + guardrails

[Engagements]

Start small. Scale to production.

AI Workflow Audit

from $9,500

1–2 weeks · diagnostic

· Map workflows + data readiness
· High-ROI opportunity ranking
· Prioritized implementation plan

Book an audit

AI System Blueprint Sprint

from $22,500

2–4 weeks · architecture

· Target architecture + risks
· Working prototype
· Success metrics + roadmap

Scope a sprint

Most requested

Production RAG / Copilot

from $45,000

secure knowledge system

· Permissions + citations
· Retrieval evals + freshness
· Monitoring + audit logs

Talk to us

Agentic Workflow MVP

from $75,000

agents that take action

· Controlled actions across tools
· Human approvals + safety limits
· Logging + observability

Talk to us

LLMOps / AgentOps Retainer

from $7,500 / mo

recurring · on-call

· Monitoring + evals + routing
· Prompt / version management
· Cost control + governance

Start a retainer

Not sure where to start?

A 30-minute call and a fixed-scope audit usually answer it.

Clear scope, milestones, code ownership with you, and measurable success metrics on every engagement.

Book a call →

Fixed scope and milestones. You own the code. Volume and enterprise terms (compliance, multi-agent, production hardening) scoped on request.

[Landscape]

Most shops ship a demo. We ship a system you can run.

At a glance	Metaheuristic	In-house hire	Generic dev shop	No-code (Zapier/Make)	Off-the-shelf copilot	DIY ChatGPT
What you get	Production system	Capacity over time	A build, maybe	Brittle automations	Generic chat	A prototype
Time to production	2–10 weeks	Months to hire	Months	Days, then stuck	Instant	Never quite
Guardrails & approvals	Built in	Depends	-	-	Vendor-set	-
Evals & security	Evals + OWASP LLM	Depends	Rarely	-	Black box	-
Cost control	Routing + caching	Ad hoc	Ad hoc	Per-task fees	$ / seat / mo	Unmanaged
Ownership	Your code + IP	Yours	Negotiated	Locked-in	Vendor	Yours
Ongoing ops	LLMOps retainer	On the team	Hand-off	You maintain	Vendor SLA	You maintain

A demo proves the model can do it once. We build the guardrails, evals, security, and ops that make it hold up every time - and we stay on to run it.

[How it runs]

An agent that asks before it acts.

support-agent · trace

user› "Order #2841 was double-charged. Fix it."

retrieve› permission-aware RAG · 3 sources
  - orders/2841.json        allow ✓
  - policy/refunds.md       allow ✓
  - finance/ledger.xlsx     deny ✕ (out of scope)

reason› duplicate charge confirmed · within policy
tool› stripe.refunds.create({ amount: 480_00 })

⏸ guardrail: refund > $200 needs human approval
✓ approved by jordan@acme · 41s

› refund issued · customer notified · logged

[Audit log]

{ "action": "refund",
  "amount": 48000,
  "approved_by": "jordan",
  "sources": 2,
  "status": "completed" }

signed · immutable · replayable
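One common way to make a log tamper-evident is to chain entries by hash, so editing any earlier entry breaks every check after it. The sketch below assumes that approach; it is not a description of a specific product. The `fnv1a` helper is a dependency-free stand-in and is not cryptographic; a real log would use SHA-256 plus an HMAC or a signature.

```typescript
// Stand-in hash (FNV-1a style, 32-bit). NOT cryptographic; illustration only.
function fnv1a(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

interface Entry {
  payload: string;  // JSON of the logged action
  prevHash: string; // hash of the previous entry, which chains the log
  hash: string;     // hash over prevHash + payload
}

// Append returns a new array, so existing entries are never mutated.
function append(log: Entry[], action: object): Entry[] {
  const last = log[log.length - 1];
  const prevHash = last ? last.hash : "genesis";
  const payload = JSON.stringify(action);
  return [...log, { payload, prevHash, hash: fnv1a(prevHash + payload) }];
}

// Recompute every hash in order; any edited entry makes this return false.
function verify(log: Entry[]): boolean {
  let prev = "genesis";
  for (const e of log) {
    if (e.prevHash !== prev || e.hash !== fnv1a(e.prevHash + e.payload)) return false;
    prev = e.hash;
  }
  return true;
}

let auditLog: Entry[] = [];
auditLog = append(auditLog, { action: "refund", amount: 48000, approved_by: "jordan", status: "completed" });
auditLog = append(auditLog, { action: "email", status: "completed" });
const intact = verify(auditLog);
```

Because each entry stores its full payload, the same structure can also be read back in order to replay what the agent did.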

[Eval scorecard]

groundedness 0.98 · tool accuracy 0.96 · refusals ok

gated in CI before deploy
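A CI gate over a scorecard like this one can be as small as a threshold check. In the sketch below the metric names follow the scorecard, while the threshold values and the `Scorecard` shape are assumptions chosen for illustration.

```typescript
interface Scorecard {
  groundedness: number;
  toolAccuracy: number;
  refusalsOk: boolean;
}

// Hypothetical thresholds; each team would set its own.
const THRESHOLDS = { groundedness: 0.95, toolAccuracy: 0.95 };

// Returns the list of failed checks; an empty list means the gate passes.
function gate(s: Scorecard): string[] {
  const failures: string[] = [];
  if (s.groundedness < THRESHOLDS.groundedness) {
    failures.push(`groundedness ${s.groundedness} < ${THRESHOLDS.groundedness}`);
  }
  if (s.toolAccuracy < THRESHOLDS.toolAccuracy) {
    failures.push(`tool accuracy ${s.toolAccuracy} < ${THRESHOLDS.toolAccuracy}`);
  }
  if (!s.refusalsOk) failures.push("refusal checks failed");
  return failures;
}

// Scores from the scorecard above pass under these assumed thresholds.
const failures = gate({ groundedness: 0.98, toolAccuracy: 0.96, refusalsOk: true });
// In CI, a non-empty list would fail the job, for example via a non-zero exit code.
```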

Book an AI Workflow Audit →

Talk to us

applied AI agency · metaheuristic.co