Financial institutions face two separate AI security problems. The first is employees using ChatGPT, Claude, and Gemini to draft emails and summarize documents. The second is building an AI agent that can read Jira tickets, query AWS, post to Slack, and trigger incident response workflows. Those problems require different architectures.
This article walks through both: governance controls for daily employee AI usage and the production security harness needed when an agent gets access to sensitive systems.
The Two Security Models
Most banks conflate these problems. They are not the same.
| Scenario | Primary Risk | Control Mechanism |
|---|---|---|
| Employee asks ChatGPT to rewrite email | Data leakage to third-party LLM | Acceptable use policy, workspace admin controls |
| Engineer asks Claude to explain code | Source code exposure, incorrect output | Data handling rules, human review |
| Analyst asks Gemini to summarize docs | Oversharing through document permissions | Google Workspace access governance |
| AI agent reads Jira, GitHub, AWS, Slack | Unauthorized API calls, privilege escalation | Identity federation, tool-level permissions, approval workflows |
The first three are AI usage governance problems. The last one is a secure harness architecture problem.
AI Usage Governance: Controlling Employee LLM Access
When employees use public LLMs, the bank needs to prevent sensitive data from leaving the perimeter while still allowing productivity gains.
Workspace Admin Controls
Google Workspace, Microsoft 365, and similar platforms offer admin-level toggles for third-party AI integrations. These controls block or allow access to ChatGPT, Claude, Gemini, and similar services at the organizational level.
Key controls:
- OAuth app allowlisting: Only approved AI services can authenticate via SSO
- Data loss prevention (DLP) rules: Block uploads of files tagged as confidential
- Context-aware access: Restrict AI service access to managed devices only
- Audit logs: Track which employees accessed which AI services and when
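As a rough illustration of how the DLP and context-aware access rules combine, the sketch below refuses an upload to an external AI service when the file carries a blocked label or the request comes from an unmanaged device. The label names, the `FileMeta` type, and `may_upload` are hypothetical; real DLP products evaluate labels and content inside the platform, not in application code.

```python
from dataclasses import dataclass

# Hypothetical sensitivity labels; substitute the bank's own taxonomy.
BLOCKED_LABELS = {"confidential", "restricted"}


@dataclass(frozen=True)
class FileMeta:
    name: str
    label: str            # classification label applied by the document platform
    managed_device: bool  # True when the request comes from a managed device


def may_upload(file: FileMeta) -> bool:
    """Allow the upload only for non-sensitive files sent from managed devices."""
    if not file.managed_device:
        return False
    return file.label.lower() not in BLOCKED_LABELS
```

The check is default-deny on device posture first, so an unmanaged device is rejected regardless of the file's label.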
Acceptable Use Policy
Technical controls alone are not sufficient. The policy must define:
- Which AI services are approved for which use cases
- What data can and cannot be shared with external LLMs
- Whether employees must use enterprise AI subscriptions (ChatGPT Team, Claude for Work) or can use free consumer accounts
- How to handle AI-generated code, contracts, or financial analysis (human review requirements)
Shadow IT Detection
Employees will route around controls. Monitor for:
- Browser extensions that inject AI features into Gmail, Slack, or Jira
- Personal devices accessing corporate Google Drive or email
- Copy-paste patterns that suggest data exfiltration to external AI services
Production AI Agent Security: The Harness Architecture
When the bank builds an AI agent that can take actions in production systems, the security model changes. The agent needs identity, permissions, approval workflows, audit trails, and incident response hooks.
Identity and Authentication
The agent must authenticate as a service principal, not a human user. This allows fine-grained permission boundaries.
Option 1: Service Account with Federated Identity
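The agent runs in AWS ECS or Kubernetes. On EKS, it uses IAM Roles for Service Accounts (OIDC federation) to assume an IAM role; on ECS, a task role serves the same purpose. That role grants access to specific AWS APIs and to the secrets that hold the agent's Jira, GitHub, and Slack credentials. IAM does not govern the SaaS tools themselves.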
AWS IAM role trust policy for the agent workload:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:sub": "system:serviceaccount:ai-agents:incident-responder"
        }
      }
    }
  ]
}
The agent never holds long-lived credentials. It exchanges a Kubernetes service account token for temporary AWS credentials.
Option 2: OAuth Machine-to-Machine Flow
For SaaS tools like Jira, GitHub, and Slack, the agent uses a machine-to-machine OAuth flow such as client credentials, where the tool supports one. Each tool gets a separate OAuth client ID with scoped permissions.
Jira example:
- Client ID: ai-agent-incident-responder
- Scopes: read:jira-work, write:jira-work (but not manage:jira-configuration)
- Client secret stored in AWS Secrets Manager, rotated every 30 days
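For reference, the client credentials grant (RFC 6749, section 4.4) is a single POST to the token endpoint with `grant_type=client_credentials` and client authentication. The sketch below only builds that request with the standard library and does not send it. The function name `build_token_request` and the token URL in the usage note are placeholders, and individual SaaS vendors may require a different flow or extra parameters.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret, scopes):
    """Build (but do not send) an RFC 6749 client credentials token request."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "scope": " ".join(scopes),
    }).encode("ascii")
    # HTTP Basic client authentication (RFC 6749 section 2.3.1): the ID and
    # secret are form-urlencoded before being joined and base64-encoded.
    pair = f"{urllib.parse.quote_plus(client_id)}:{urllib.parse.quote_plus(client_secret)}"
    basic = base64.b64encode(pair.encode("ascii")).decode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

A call such as `build_token_request("https://auth.example.invalid/oauth/token", client_id, client_secret, ["read:jira-work", "write:jira-work"])` yields a request object that `urllib.request.urlopen` could send. The client secret would be read from the secrets store at call time, not hard-coded.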
Tool-Level Permission Boundaries
The agent can call multiple APIs. Each tool must enforce its own permission boundary.
Approach 1: Wrapper Functions with Policy Enforcement
Every tool the agent can call goes through a wrapper that checks permissions before execution.
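import logging

logger = logging.getLogger("agent.tools")

class ToolNotFoundError(Exception):
    """Raised when the agent requests a tool that is not registered."""

class PermissionDeniedError(Exception):
    """Raised when the policy engine rejects a tool call."""

class ToolRegistry:
    def __init__(self, policy_engine):
        self.policy_engine = policy_engine
        self.tools = {}

    def register_tool(self, name, func, required_permissions):
        self.tools[name] = {
            "func": func,
            "permissions": required_permissions,
        }

    def log_unauthorized_attempt(self, agent_id, tool_name, params):
        # Log parameter names only, so secret values never reach the log
        logger.warning(
            "Denied tool call: agent=%s tool=%s params=%s",
            agent_id, tool_name, sorted(params),
        )

    def execute_tool(self, agent_id, tool_name, params):
        tool = self.tools.get(tool_name)
        if not tool:
            raise ToolNotFoundError(tool_name)

        # Check permissions before execution
        allowed = self.policy_engine.check(
            agent_id=agent_id,
            tool=tool_name,
            action="execute",
            resource=params.get("resource_id"),
        )
        if not allowed:
            self.log_unauthorized_attempt(agent_id, tool_name, params)
            raise PermissionDeniedError(f"Agent {agent_id} cannot execute {tool_name}")

        return tool["func"](params)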
The policy engine can be Open Policy Agent (OPA), AWS IAM policy evaluation, or a custom rules engine.
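To make the policy engine interface concrete, here is a minimal in-memory rules engine that exposes the same `check(agent_id, tool, action, resource)` call the wrapper uses. It is a stand-in for illustration, not a substitute for OPA or IAM. The class name `StaticPolicyEngine`, the tuple-based rule format, and the glob-style resource patterns are all assumptions.

```python
from fnmatch import fnmatchcase


class StaticPolicyEngine:
    """Default-deny rules engine; a stand-in for OPA or a similar service."""

    def __init__(self, rules):
        # rules: iterable of (agent_id, tool, action, resource_pattern) tuples
        self.rules = list(rules)

    def check(self, agent_id, tool, action, resource):
        for rule_agent, rule_tool, rule_action, pattern in self.rules:
            if (rule_agent, rule_tool, rule_action) != (agent_id, tool, action):
                continue
            # A missing resource is treated as "", which only "*" style
            # patterns match, so scoped rules never allow it.
            if fnmatchcase(resource or "", pattern):
                return True
        return False  # nothing matched: deny
```

Because the engine denies by default, a tool that was registered without a matching rule cannot be executed by any agent.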
Approach 2: API Gateway with Runtime Policy Evaluation
All agent tool calls go through an API gateway. The gateway evaluates policies in real time.
Flow:
- Agent decides to call jira.create_ticket
- Agent sends request to internal API gateway
- Gateway extracts agent identity from JWT
- Gateway queries policy engine: “Can agent X create Jira tickets in project Y?”
- If allowed, gateway forwards request to Jira API
- Gateway logs request, response, and decision
This centralizes policy enforcement but adds latency (typically 10-50ms per tool call).
Approval Workflows for High-Risk Actions
Some agent actions require human approval. Examples:
- Deploying code to production
- Modifying AWS security groups
- Posting to public Slack channels
- Creating or closing Jira tickets in customer-facing projects
Synchronous Approval (Human-in-the-Loop)
The agent pauses execution and sends an approval request to Slack. A human reviews and approves or denies.
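from datetime import datetime, timezone

class ApprovalDeniedError(Exception):
    """Raised when a deployment is rejected or the approval request times out."""

# slack_client, wait_for_approval, execute_deployment, and log_rejected_action
# are placeholders for the bank's own integrations.
def deploy_to_production(params):
    approval_request = {
        "agent_id": "incident-responder",
        "action": "deploy_to_production",
        "params": params,
        "risk_score": 8,
        # Timezone-aware UTC timestamp, serialized so the request can be sent as JSON
        "requested_at": datetime.now(timezone.utc).isoformat(),
    }

    # Send to Slack approval channel
    slack_response = slack_client.send_approval_request(
        channel="#agent-approvals",
        request=approval_request,
    )

    # Wait for human decision (timeout after 5 minutes)
    decision = wait_for_approval(slack_response.thread_ts, timeout=300)

    if decision == "approved":
        execute_deployment(params)
    else:
        # Covers both an explicit denial and a timeout
        log_rejected_action(approval_request, decision)
        raise ApprovalDeniedError(f"Deployment not approved (decision: {decision})")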
Asynchronous Approval (Post-Hoc Review)
The agent executes the action immediately but flags it for review. If a reviewer later rejects it, the system triggers a rollback or alert.
This works for low-risk actions where speed matters more than pre-approval.
Risk-Scored Auto-Approval
The harness calculates a risk score for each action the agent proposes, so the agent cannot lower its own score. Low-risk actions (score < 3) auto-approve. Medium-risk actions (3-7) require one approver. High-risk actions (8-10) require two approvers.
Risk factors:
- Which system is being modified (production vs. dev)
- What data is being accessed (customer PII vs. internal logs)
- Time of day (business hours vs. 3 AM)
- Recent agent error rate (if agent failed 3 times in the last hour, increase risk score)
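One way to turn these factors into the 0-10 scale is a simple additive score. The weights below are illustrative assumptions, not values from any standard, and `ActionContext`, `risk_score`, and `approvers_required` are hypothetical names. Only the thresholds (below 3, 3-7, 8-10) come from the tiers described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionContext:
    production: bool       # target system is production rather than dev
    touches_pii: bool      # action reads or writes customer PII
    business_hours: bool   # request arrives during business hours
    recent_failures: int   # agent errors in the last hour


def risk_score(ctx: ActionContext) -> int:
    """Combine the risk factors into a 0-10 score (weights are illustrative)."""
    score = 0
    score += 4 if ctx.production else 0
    score += 3 if ctx.touches_pii else 0
    score += 0 if ctx.business_hours else 1
    score += 2 if ctx.recent_failures >= 3 else 0
    return min(score, 10)


def approvers_required(score: int) -> int:
    """Map a score to the number of human approvers needed."""
    if score < 3:
        return 0   # auto-approve
    if score <= 7:
        return 1
    return 2
```

Under these weights, a production change that touches PII at 3 AM after three recent failures scores 10 and needs two approvers, while a dev-only change during business hours scores 0 and auto-approves.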
Audit Logging and Observability
Every agent action must be logged with enough detail to reconstruct what happened and why.
Minimum Log Fields
- timestamp: When the action occurred
- agent_id: Which agent took the action
- tool_name: Which tool was called
- params: Input parameters (sanitized to remove secrets)
- result: Success or failure
- decision_trace: Why the agent chose this action (LLM reasoning trace)
- approval_status: Auto-approved, human-approved, or rejected
- risk_score: Calculated risk score for the action
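The minimum log fields map naturally onto a single record type. The sketch below is one possible shape, assuming JSON lines as the output format and a key-name heuristic for redaction. `AuditRecord`, `make_record`, `sanitize`, and the list of sensitive key fragments are hypothetical, and the redaction only inspects top-level keys.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Key-name fragments treated as credentials (an assumption; extend as needed).
SENSITIVE_KEY_PARTS = ("secret", "token", "password", "authorization")


def sanitize(params: dict) -> dict:
    """Redact values whose key name suggests a credential (top level only)."""
    return {
        key: "[REDACTED]" if any(part in key.lower() for part in SENSITIVE_KEY_PARTS) else value
        for key, value in params.items()
    }


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str
    agent_id: str
    tool_name: str
    params: dict
    result: str
    decision_trace: str
    approval_status: str
    risk_score: int

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def make_record(agent_id, tool_name, params, result,
                decision_trace, approval_status, risk_score) -> AuditRecord:
    """Build a record with a UTC timestamp and sanitized parameters."""
    return AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        agent_id=agent_id,
        tool_name=tool_name,
        params=sanitize(params),
        result=result,
        decision_trace=decision_trace,
        approval_status=approval_status,
        risk_score=risk_score,
    )
```

Sanitizing inside `make_record` means no caller can write a record with raw parameters by accident.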
Log Storage
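Logs must be immutable and tamper-evident. Options:
- AWS CloudWatch Logs with log group retention policies and IAM policies that deny log deletion
- Splunk or Datadog with role-based access control
- Append-only S3 bucket with Object Lock enabled (compliance mode prevents deletion until the retention period ends)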
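Storage-level immutability can be complemented by making the log itself tamper-evident. A common technique, shown here as a minimal sketch and not as part of any of the products listed, is a hash chain: each entry stores the hash of the previous entry, so editing or removing an earlier record breaks verification. `append_entry`, `verify_chain`, and the entry layout are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _entry_hash(prev_hash: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list, record: dict) -> dict:
    """Append a record, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"record": record, "prev_hash": prev_hash,
             "hash": _entry_hash(prev_hash, record)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited, reordered, or removed entry fails."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

The chain only detects tampering. Publishing the latest hash to a separate system is what prevents an attacker from rewriting the whole chain.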
Alerting on Anomalies
Set up alerts for:
- Agent attempts unauthorized tool calls (permission denied errors)
- Agent makes more than N API calls per minute (possible runaway loop)
- Agent accesses resources outside its normal scope (e.g., suddenly querying HR data)
- Agent approval requests spike (possible attack or misconfiguration)
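The "more than N API calls per minute" alert can be approximated with a sliding window over call timestamps. The sketch below is an assumption about how such a detector might look. `CallRateMonitor` is a hypothetical name, and timestamps are passed in explicitly so the behavior is easy to test.

```python
from collections import deque


class CallRateMonitor:
    """Flag an agent that exceeds max_calls within a sliding time window."""

    def __init__(self, max_calls: int, window_seconds: float = 60.0):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = deque()  # timestamps of calls still inside the window

    def record_call(self, now: float) -> bool:
        """Record one call at time `now`; return True when the limit is exceeded."""
        self._calls.append(now)
        # Drop timestamps that have aged out of the window
        while self._calls and self._calls[0] <= now - self.window_seconds:
            self._calls.popleft()
        return len(self._calls) > self.max_calls
```

In a running system, `now` would come from `time.monotonic()`, and a `True` result would raise an alert or trip a circuit breaker.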
Incident Response Hooks
When an agent takes an unauthorized or harmful action, the system must respond quickly.
Rollback Mechanisms
For state-changing actions, implement rollback:
- Jira ticket creation: Store ticket ID and provide delete_ticket function
- AWS security group modification: Store previous rule set and provide restore_security_group function
- Slack message posting: Store message ID and provide delete_message function
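A generic way to implement these rollbacks is to record an undo callable next to every state-changing action and replay the callables in reverse order. The sketch below assumes each tool wrapper supplies its own undo function (for example, a closure that calls delete_ticket with the stored ticket ID). `RollbackJournal` is a hypothetical name.

```python
class RollbackJournal:
    """Record an undo callable per action and replay them newest first."""

    def __init__(self):
        self._entries = []  # (description, undo callable), oldest first

    def record(self, description, undo):
        self._entries.append((description, undo))

    def rollback_all(self):
        """Undo every recorded action; return descriptions of undos that failed."""
        failed = []
        while self._entries:
            description, undo = self._entries.pop()
            try:
                undo()
            except Exception:
                # Keep going so one failed undo does not block the others
                failed.append(description)
        return failed
```

Failed undos are returned to the caller, not swallowed, because a rollback that silently half-completes needs human follow-up.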
Kill Switch
A single API call or Slack command should disable the agent immediately:
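def emergency_shutdown(agent_id, reason):
    # oauth_client and pagerduty_client are placeholder wrappers;
    # iam_client and ecs_client are boto3 clients.

    # Revoke all OAuth tokens
    oauth_client.revoke_tokens(agent_id)

    # Attach the AWS managed AWSDenyAll policy; the explicit deny takes effect
    # on the next request, including for credentials already issued
    iam_client.attach_role_policy(
        RoleName=f"agent-{agent_id}",
        PolicyArn="arn:aws:iam::aws:policy/AWSDenyAll",
    )

    # Stop running tasks (assumes the ECS task family is named after the agent)
    running = ecs_client.list_tasks(cluster="ai-agents", family=agent_id)
    for task_arn in running["taskArns"]:
        ecs_client.stop_task(cluster="ai-agents", task=task_arn, reason=reason[:255])

    # Log shutdown
    logger.critical(f"Agent {agent_id} emergency shutdown: {reason}")

    # Alert security team
    pagerduty_client.trigger_incident(
        title=f"Agent {agent_id} emergency shutdown",
        severity="critical",
        details=reason,
    )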
Forensic Log Replay
After an incident, security teams need to replay what the agent did. Store:
- Full LLM prompt and response for each decision
- Tool call parameters and responses
- Intermediate reasoning steps
- External API responses
This allows post-mortem analysis: “Why did the agent decide to delete that S3 bucket?”
Deployment Architecture
The agent runs in a controlled environment with network and compute boundaries.
Compute Isolation
- Agent runs in dedicated ECS tasks or Kubernetes pods
- No SSH access to agent runtime
- Secrets injected via AWS Secrets Manager or Kubernetes secrets
- Outbound network traffic restricted to approved API endpoints
Network Boundaries
- Agent cannot access the internet directly
- All tool calls go through internal API gateway
- API gateway enforces rate limits and permission checks
- Egress traffic logged and monitored
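Restricting outbound traffic to approved API endpoints usually happens in a proxy or firewall, but the rule itself is simple enough to sketch. The check below assumes an exact-match host allowlist and HTTPS only. The host names in `ALLOWED_HOSTS` and the function `egress_allowed` are placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; in practice this lives in proxy or firewall config.
ALLOWED_HOSTS = frozenset({"api.github.com", "slack.com", "gateway.internal.example"})


def egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Permit only HTTPS requests to exactly matching hosts (no wildcards)."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # unparseable URLs are rejected
    return parts.scheme == "https" and parts.hostname in allowed_hosts
```

Exact matching on the parsed hostname avoids the classic substring mistake, where a host such as api.github.com.evil.test would pass a naive "contains" check.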
Secrets Management
- OAuth tokens stored in AWS Secrets Manager
- Secrets rotated every 30 days
- Agent retrieves secrets at runtime, never stores them on disk
- Secrets access logged to CloudTrail
Failure Modes and Mitigations
| Failure Mode | Impact | Mitigation |
|---|---|---|
| Agent calls unauthorized API | Permission denied error, logged attempt | Policy engine blocks call, alert triggered |
| Agent enters infinite loop | API rate limit exhaustion | Circuit breaker stops agent after N calls per minute or N consecutive failed calls |
| LLM hallucinates tool parameters | Invalid API call, possible data corruption | Schema validation on all tool inputs |
| OAuth token compromised | Attacker gains agent permissions | Short-lived tokens, rotation, anomaly detection |
| Agent approver is unavailable | High-risk action blocked indefinitely | Timeout with fallback to secondary approver |
| Audit logs deleted | Loss of forensic evidence | Immutable log storage with object lock |
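
The schema validation mitigation can be as small as a type-and-presence check that runs before any tool executes. The sketch below uses a hand-rolled schema format (field name mapped to expected type and a required flag) purely for illustration. A JSON Schema validator would be the more usual choice in practice. `validate_params` and the schema layout are hypothetical.

```python
def validate_params(params: dict, schema: dict) -> list:
    """Return a list of validation errors; an empty list means the input is valid.

    schema maps each field name to an (expected_type, required) tuple.
    """
    errors = []
    for field, (expected_type, required) in schema.items():
        if field not in params:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        value = params[field]
        # bool is a subclass of int, so reject it explicitly for non-bool fields
        wrong_bool = isinstance(value, bool) and expected_type is not bool
        if wrong_bool or not isinstance(value, expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    # Unknown fields are often a sign of a hallucinated parameter
    for field in params:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors
```

Rejecting unknown fields, not just wrong types, matters here: a model that invents a plausible-looking parameter should fail validation, not have the parameter silently ignored.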
Technical Verdict
Use this architecture when:
- You are deploying AI agents in regulated industries (finance, healthcare, government)
- Agents need access to production systems with sensitive data
- Compliance requires audit trails and approval workflows
- You need to demonstrate to auditors that agents operate within defined boundaries
Avoid this architecture when:
- Agents only read public data or operate in sandboxed environments
- You are prototyping and need to move fast (start simple, add controls later)
- Your organization lacks the infrastructure to run policy engines, approval workflows, and centralized logging
The key insight is that employee AI usage and production AI agents require different security models. Conflating them leads to either over-restrictive policies that block productivity or under-restrictive policies that create risk. Build the right harness for the right problem.