Banking AI Agent Security: From ChatGPT Shadow IT to Production-Grade Permission Boundaries

Financial institutions face two separate AI security problems. The first is employees using ChatGPT, Claude, and Gemini to draft emails and summarize documents. The second is building an AI agent that can read Jira tickets, query AWS, post to Slack, and trigger incident response workflows. Those problems require different architectures.

This article walks through both: governance controls for daily employee AI usage and the production security harness needed when an agent gets access to sensitive systems.

The Two Security Models

Many banks treat these as a single problem. They are not the same.

Scenario	Primary Risk	Control Mechanism
Employee asks ChatGPT to rewrite email	Data leakage to third-party LLM	Acceptable use policy, workspace admin controls
Engineer asks Claude to explain code	Source code exposure, incorrect output	Data handling rules, human review
Analyst asks Gemini to summarize docs	Oversharing through document permissions	Google Workspace access governance
AI agent reads Jira, GitHub, AWS, Slack	Unauthorized API calls, privilege escalation	Identity federation, tool-level permissions, approval workflows

The first three are AI usage governance problems. The last one is a secure harness architecture problem.

AI Usage Governance: Controlling Employee LLM Access

When employees use public LLMs, the bank needs to prevent sensitive data from leaving the perimeter while still allowing productivity gains.

Workspace Admin Controls

Google Workspace, Microsoft 365, and similar platforms offer admin-level toggles for third-party AI integrations. These controls block or allow access to ChatGPT, Claude, Gemini, and similar services at the organizational level.

Key controls:

OAuth app allowlisting: Only approved AI services can authenticate via SSO
Data loss prevention (DLP) rules: Block uploads of files tagged as confidential
Context-aware access: Restrict AI service access to managed devices only
Audit logs: Track which employees accessed which AI services and when

Acceptable Use Policy

Technical controls alone are not enough. The policy must define:

Which AI services are approved for which use cases
What data can and cannot be shared with external LLMs
Whether employees must use enterprise AI subscriptions (ChatGPT Team, Claude for Work) or can use free consumer accounts
How to handle AI-generated code, contracts, or financial analysis (human review requirements)

Shadow IT Detection

Employees will route around controls. Monitor for:

Browser extensions that inject AI features into Gmail, Slack, or Jira
Personal devices accessing corporate Google Drive or email
Copy-paste patterns that suggest data exfiltration to external AI services

Production AI Agent Security: The Harness Architecture

When the bank builds an AI agent that can take actions in production systems, the security model changes. The agent needs identity, permissions, approval workflows, audit trails, and incident response hooks.

Identity and Authentication

The agent must authenticate as a service principal, not a human user. This allows fine-grained permission boundaries and makes every action attributable to the agent rather than to an employee.

Option 1: Service Account with Federated Identity

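The agent runs in Amazon EKS (on ECS, the equivalent mechanism is an ECS task role). It uses workload identity federation, here IAM Roles for Service Accounts (IRSA), to assume an IAM role. That role grants only the specific AWS API permissions the agent needs. IAM cannot authorize calls to Jira, GitHub, or Slack; those use the credentials described in Option 2.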

AWS IAM role trust policy for the agent workload:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:sub": "system:serviceaccount:ai-agents:incident-responder",
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}

The agent never holds long-lived credentials. It exchanges a Kubernetes service account token for temporary AWS credentials.
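The following sketch makes the exchange explicit. It assumes the EKS pod identity webhook has injected the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables, which IRSA does. It accepts any STS client with the boto3 interface, such as boto3.client("sts"). In practice, boto3's default credential chain makes this same AssumeRoleWithWebIdentity call automatically when those variables are set. The session name is a hypothetical convention.

```python
import os
from pathlib import Path


def get_temporary_credentials(sts_client, env=os.environ, duration_seconds=3600):
    """Exchange the projected service account token for temporary AWS credentials.

    sts_client is expected to behave like boto3.client("sts").
    """
    role_arn = env["AWS_ROLE_ARN"]
    # The projected token is a short-lived JWT that Kubernetes rotates automatically
    token = Path(env["AWS_WEB_IDENTITY_TOKEN_FILE"]).read_text().strip()

    response = sts_client.assume_role_with_web_identity(
        RoleArn=role_arn,
        RoleSessionName="incident-responder",  # hypothetical naming convention
        WebIdentityToken=token,
        DurationSeconds=duration_seconds,
    )
    # Contains AccessKeyId, SecretAccessKey, SessionToken, and Expiration
    return response["Credentials"]
```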

Option 2: OAuth Machine-to-Machine Flow

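For SaaS tools like Jira, GitHub, and Slack, the agent authenticates with machine-to-machine credentials rather than an employee's login. It uses the OAuth client credentials grant where the vendor supports it, or the vendor's equivalent (a GitHub App installation token, a Slack bot token). Each tool gets a separate client identity with scoped permissions.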

Jira example:

Client ID: ai-agent-incident-responder
Scopes: read:jira-work, write:jira-work (but not manage:jira-configuration)
Client secret stored in AWS Secrets Manager, rotated every 30 days (the client credentials grant issues short-lived access tokens and normally no refresh token)
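Below is a minimal sketch of the client credentials grant (RFC 6749, section 4.4) with an in-memory access token cache. The token URL in the usage comment is a placeholder; each vendor documents its own token endpoint and which grants it supports. The client secret is assumed to be read from Secrets Manager at startup. The fetch function is injectable so the caching logic can be tested without a network call.

```python
import base64
import json
import time
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret, scopes):
    """Build a client credentials token request (RFC 6749 section 4.4)."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "scope": " ".join(scopes),  # space-delimited scope list
    }).encode()
    # HTTP Basic client authentication; credentials are form-encoded first (section 2.3.1)
    raw = f"{urllib.parse.quote_plus(client_id)}:{urllib.parse.quote_plus(client_secret)}"
    basic = base64.b64encode(raw.encode()).decode()
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def http_fetch(request):
    """Send the token request and parse the JSON response."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


class TokenCache:
    """Caches one access token and requests a new one shortly before it expires."""

    def __init__(self, request, fetch=http_fetch, skew_seconds=60, clock=time.monotonic):
        self.request = request
        self.fetch = fetch
        self.skew = skew_seconds
        self.clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        if self._token is None or self.clock() >= self._expires_at - self.skew:
            payload = self.fetch(self.request)
            self._token = payload["access_token"]
            self._expires_at = self.clock() + payload.get("expires_in", 3600)
        return self._token


# Usage (hypothetical endpoint):
# request = build_token_request("https://auth.example.com/oauth/token",
#                               "ai-agent-incident-responder", client_secret,
#                               ["read:jira-work", "write:jira-work"])
# jira_token = TokenCache(request).get()
```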

Tool-Level Permission Boundaries

The agent can call multiple APIs. Every tool call must pass a permission check scoped to that specific tool and resource.

Approach 1: Wrapper Functions with Policy Enforcement

Every tool the agent can call goes through a wrapper that checks permissions before execution.

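import logging

logger = logging.getLogger("agent-tools")


class ToolNotFoundError(Exception):
    pass


class PermissionDeniedError(Exception):
    pass


class ToolRegistry:
    def __init__(self, policy_engine):
        # policy_engine exposes check(agent_id, tool, action, resource) -> bool
        self.policy_engine = policy_engine
        self.tools = {}

    def register_tool(self, name, func, required_permissions):
        # required_permissions is kept as metadata for policy authoring and review
        self.tools[name] = {
            "func": func,
            "permissions": required_permissions
        }

    def log_unauthorized_attempt(self, agent_id, tool_name, params):
        # Log who tried what; params are omitted because they may contain secrets
        logger.warning("Denied tool call: agent=%s tool=%s", agent_id, tool_name)

    def execute_tool(self, agent_id, tool_name, params):
        tool = self.tools.get(tool_name)
        if not tool:
            raise ToolNotFoundError(tool_name)

        # Check permissions before execution
        allowed = self.policy_engine.check(
            agent_id=agent_id,
            tool=tool_name,
            action="execute",
            resource=params.get("resource_id")
        )

        if not allowed:
            self.log_unauthorized_attempt(agent_id, tool_name, params)
            raise PermissionDeniedError(f"Agent {agent_id} cannot execute {tool_name}")

        return tool["func"](params)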

The policy engine can be Open Policy Agent (OPA), AWS IAM (for tools that wrap AWS APIs), or a custom rules engine.
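As an illustration of the custom option, here is a minimal deny-by-default rules engine. It fits the policy_engine.check(agent_id=..., tool=..., action=..., resource=...) call in ToolRegistry. The Rule fields, tool names, and glob-style resource patterns are assumptions made for the sketch, not a prescribed policy format.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    agent_id: str
    tool: str
    action: str
    resource_pattern: str  # shell-style glob, e.g. "INC-*"


class AllowlistPolicyEngine:
    """Deny by default: a call is allowed only if at least one rule matches it."""

    def __init__(self, rules):
        self.rules = list(rules)

    def check(self, agent_id, tool, action, resource):
        # Calls without a resource are treated as "" and only match a "*" rule
        target = resource or ""
        return any(
            rule.agent_id == agent_id
            and rule.tool == tool
            and rule.action == action
            and fnmatchcase(target, rule.resource_pattern)
            for rule in self.rules
        )
```

With OPA, the same decision would move into a Rego policy and check() would become a query against it. The deny-by-default shape stays the same.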

Approach 2: API Gateway with Runtime Policy Evaluation

All agent tool calls go through an API gateway. The gateway evaluates policies in real time.

Flow:

Agent decides to call jira.create_ticket
Agent sends request to internal API gateway
Gateway validates the agent's JWT (signature, issuer, expiry) and extracts the agent identity from its claims
Gateway queries policy engine: “Can agent X create Jira tickets in project Y?”
If allowed, gateway forwards request to Jira API
Gateway logs request, response, and decision

This centralizes policy enforcement but adds a network hop and a policy evaluation to every tool call (typically 10-50ms, depending on where the gateway and policy engine run).
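The gateway flow above can be sketched as a single handler: authenticate, authorize, forward, log. To stay within the standard library, the sketch assumes HS256 tokens signed with a shared secret. A production gateway would more likely verify RS256 or ES256 tokens against the identity provider's published keys using a maintained JWT library. The request shape and the policy_check and forward callables are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment):
    # JWT segments are unpadded base64url; restore the padding before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_hs256_jwt(token, secret, now=None):
    """Return the claims of a valid HS256 JWT; raise ValueError otherwise."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected alg")  # rejects alg=none and algorithm swaps
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    # Tokens without an exp claim are rejected along with expired ones
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise ValueError("expired or missing exp")
    return claims


def handle_tool_call(request, secret, policy_check, forward, audit_log):
    """Authenticate, authorize, forward, and log one agent tool call."""
    try:
        claims = verify_hs256_jwt(request["token"], secret)
    except ValueError:
        return {"status": 401}
    agent_id = claims["sub"]
    allowed = policy_check(agent_id, request["tool"], request["resource"])
    response = forward(request) if allowed else {"status": 403}
    audit_log.append({
        "agent_id": agent_id,
        "tool": request["tool"],
        "resource": request["resource"],
        "decision": "allow" if allowed else "deny",
        "status": response["status"],
    })
    return response
```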

Approval Workflows for High-Risk Actions

Some agent actions require human approval. Examples:

Deploying code to production
Modifying AWS security groups
Posting to public Slack channels
Creating or closing Jira tickets in customer-facing projects

Synchronous Approval (Human-in-the-Loop)

The agent pauses execution and sends an approval request to Slack. A human reviews and approves or denies.

from datetime import datetime, timezone


class ApprovalDeniedError(Exception):
    pass


def deploy_to_production(params):
    approval_request = {
        "agent_id": "incident-responder",
        "action": "deploy_to_production",
        "params": params,
        "risk_score": 8,
        "requested_at": datetime.now(timezone.utc).isoformat()
    }

    # Send to Slack approval channel
    # (slack_client, wait_for_approval, execute_deployment, and log_rejected_action
    # stand in for your own integration code)
    slack_response = slack_client.send_approval_request(
        channel="#agent-approvals",
        request=approval_request
    )

    # Wait for human decision (timeout after 5 minutes); anything but "approved",
    # including a timeout, is treated as a denial
    decision = wait_for_approval(slack_response.thread_ts, timeout=300)

    if decision == "approved":
        execute_deployment(params)
    else:
        log_rejected_action(approval_request, decision)
        raise ApprovalDeniedError(f"Deployment not approved: {decision}")

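The deploy_to_production example leaves wait_for_approval undefined. One possible implementation is sketched below. It assumes a Slack interaction handler (not shown, hypothetical) calls record_decision when a reviewer clicks Approve or Deny. The waiting side blocks on a condition variable keyed by the message thread timestamp. A timeout returns "timeout", which the caller treats as a denial, so the agent fails closed.

```python
import threading


class ApprovalBroker:
    """Blocks a requester until a reviewer records a decision or the timeout expires."""

    def __init__(self):
        self._decisions = {}
        self._cond = threading.Condition()

    def record_decision(self, thread_ts, decision):
        # Called by the (hypothetical) Slack interaction handler
        with self._cond:
            self._decisions[thread_ts] = decision
            self._cond.notify_all()

    def wait_for_approval(self, thread_ts, timeout):
        with self._cond:
            decided = self._cond.wait_for(lambda: thread_ts in self._decisions, timeout=timeout)
            if not decided:
                return "timeout"  # fail closed: callers treat this as a denial
            return self._decisions.pop(thread_ts)
```

This in-process version only works when the Slack handler runs in the same process as the agent. With several agent replicas, the decisions would need to live in a shared store instead.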
Asynchronous Approval (Post-Hoc Review)

The agent executes the action immediately but flags it for review. If a reviewer later rejects it, the system triggers a rollback or alert.

This works for low-risk actions where speed matters more than pre-approval.
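A minimal sketch of post-hoc review follows. It assumes each action is registered together with an undo callable; the class and method names are hypothetical. The action runs immediately and is queued for review. A later rejection invokes the undo and raises an alert.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PendingReview:
    action: str
    params: dict
    undo: Callable[[dict], None]  # reverses the action
    review_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class PostHocReviewQueue:
    def __init__(self, alert):
        self.alert = alert  # e.g. a function that notifies the security channel
        self.pending = {}

    def execute(self, action, params, do, undo):
        result = do(params)  # act first; review happens afterward
        item = PendingReview(action, params, undo)
        self.pending[item.review_id] = item
        return item.review_id, result

    def review(self, review_id, approved):
        item = self.pending.pop(review_id)
        if not approved:
            # Rejected after the fact: reverse the action and tell a human
            item.undo(item.params)
            self.alert(f"Rejected {item.action}; rollback executed")
```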

Risk-Scored Auto-Approval

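The harness, not the model, calculates a risk score from 0 to 10 for each action, so a manipulated prompt cannot persuade the agent to rate its own action as safe. Low-risk actions (score below 3) are auto-approved. Medium-risk actions (3-7) require one approver. High-risk actions (8-10) require two approvers.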

Risk factors:

Which system is being modified (production vs. dev)
What data is being accessed (customer PII vs. internal logs)
Time of day (business hours vs. 3 AM)
Recent agent error rate (if agent failed 3 times in the last hour, increase risk score)
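The factors above can be combined as in the sketch below. The weights and the 08:00-18:00 UTC business-hours window are illustrative assumptions, not values from the source. The point is the shape: each factor adds points, the total is capped at 10, and the score maps to the approval tiers described above.

```python
from dataclasses import dataclass


@dataclass
class ActionContext:
    environment: str           # "production" or "dev"
    touches_customer_pii: bool
    hour_utc: int              # 0-23
    failures_last_hour: int


def risk_score(ctx):
    # Illustrative weights; the maximum possible total is exactly 10
    score = 0
    if ctx.environment == "production":
        score += 4
    if ctx.touches_customer_pii:
        score += 3
    if not 8 <= ctx.hour_utc < 18:  # outside assumed business hours
        score += 1
    if ctx.failures_last_hour >= 3:
        score += 2
    return min(score, 10)


def required_approvers(score):
    if score < 3:
        return 0  # auto-approve
    if score <= 7:
        return 1
    return 2
```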

Audit Logging and Observability

Every agent action must be logged with enough detail to reconstruct what happened and why.

Minimum Log Fields

timestamp: When the action occurred
agent_id: Which agent took the action
tool_name: Which tool was called
params: Input parameters (sanitized to remove secrets)
result: Success or failure
decision_trace: Why the agent chose this action (LLM reasoning trace)
approval_status: Auto-approved, human-approved, or rejected
risk_score: Calculated risk score for the action
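The sketch below builds one audit record with those fields as a JSON line. Secrets are redacted by parameter name. The SECRET_KEYS list is an assumption to adapt per tool. Name-based redaction will not catch a secret embedded in free text, so it complements keeping secrets out of tool parameters rather than replacing that practice.

```python
import json
from datetime import datetime, timezone

# Assumed parameter names whose values must never reach the log
SECRET_KEYS = {"password", "token", "secret", "api_key", "authorization"}


def sanitize(value):
    """Recursively replace values stored under secret-looking keys."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if k.lower() in SECRET_KEYS else sanitize(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    return value


def audit_record(agent_id, tool_name, params, result, decision_trace,
                 approval_status, risk_score):
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "tool_name": tool_name,
        "params": sanitize(params),
        "result": result,
        "decision_trace": decision_trace,
        "approval_status": approval_status,
        "risk_score": risk_score,
    }, sort_keys=True)
```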

Log Storage

Logs must be immutable and tamper-evident. Options:

AWS CloudWatch Logs with IAM policies that deny logs:DeleteLogGroup and logs:DeleteLogStream (retention policies control expiry, not tampering)
Splunk or Datadog with role-based access control
S3 bucket with Object Lock enabled in compliance mode
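The storage options above provide immutability at the storage layer. Hash chaining is a complementary technique, shown here for illustration rather than taken from the source. It makes tampering detectable in the records themselves. Each entry carries the SHA-256 of the previous entry, so editing any record breaks verification from that point on. The function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64


def entry_hash(prev_hash, record):
    # Canonical JSON so the same record always hashes the same way
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def append_entry(chain, record):
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify_chain(chain):
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

A chain alone does not reveal records deleted from the end. Periodically writing the latest hash to a separate store, such as the Object Lock bucket, closes that gap.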

Alerting on Anomalies

Set up alerts for:

Agent attempts unauthorized tool calls (permission denied errors)
Agent makes more than N API calls per minute (possible runaway loop)
Agent accesses resources outside its normal scope (e.g., suddenly querying HR data)
Agent approval requests spike (possible attack or misconfiguration)
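The "more than N API calls per minute" alert can be implemented as a sliding-window counter, as sketched below. The class name and thresholds are hypothetical. Wiring the return value to an alerting or circuit-breaker hook is left to the caller. In practice one counter would be kept per agent.

```python
import time
from collections import deque


class RateAlarm:
    """Reports when more than max_calls events fall within window_seconds."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock
        self.events = deque()

    def record(self):
        """Record one API call; return True if the threshold is now exceeded."""
        now = self.clock()
        self.events.append(now)
        # Drop timestamps that have slid out of the window
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events) > self.max_calls
```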

Incident Response Hooks

When an agent takes an unauthorized or harmful action, the system must respond quickly.

Rollback Mechanisms

For state-changing actions, implement rollback:

Jira ticket creation: Store ticket ID and provide delete_ticket function
AWS security group modification: Store previous rule set and provide restore_security_group function
Slack message posting: Store message ID and provide delete_message function
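A rollback journal ties these together. Each state-changing call records how to reverse itself, and a rollback undoes actions newest first. In the sketch, the undo callables stand in for the delete_ticket, restore_security_group, and delete_message functions named above, which remain hypothetical helpers. A failed undo is collected and reported rather than stopping the remaining rollbacks.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class JournalEntry:
    description: str
    undo: Callable[[Any], None]
    undo_state: Any  # e.g. ticket ID, previous rule set, message timestamp


class RollbackJournal:
    """Records how to reverse each state-changing call and undoes them newest first."""

    def __init__(self):
        self.entries = []

    def record(self, description, undo, undo_state):
        self.entries.append(JournalEntry(description, undo, undo_state))

    def rollback_all(self):
        failed = []
        while self.entries:
            entry = self.entries.pop()  # LIFO: later actions may depend on earlier ones
            try:
                entry.undo(entry.undo_state)
            except Exception as exc:  # keep going; report what could not be undone
                failed.append((entry.description, exc))
        return failed
```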

Kill Switch

A single API call or Slack command should disable the agent immediately:

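import json
import logging

import boto3

logger = logging.getLogger("agent-kill-switch")
iam_client = boto3.client("iam")
ecs_client = boto3.client("ecs")

# Inline policy with an explicit deny; an explicit deny overrides every allow
DENY_ALL_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Deny", "Action": "*", "Resource": "*"}]
})


def emergency_shutdown(agent_id, reason):
    # Revoke all OAuth tokens (oauth_client is a placeholder for your token service)
    oauth_client.revoke_tokens(agent_id)

    # Deny all AWS actions for the role; IAM evaluates role policies on every
    # request, so this also applies to temporary credentials already issued
    iam_client.put_role_policy(
        RoleName=f"agent-{agent_id}",
        PolicyName="emergency-deny-all",
        PolicyDocument=DENY_ALL_POLICY
    )

    # Scale the ECS service (assumed to be named after the agent) to zero so the
    # scheduler does not replace stopped tasks, then stop the running ones
    ecs_client.update_service(cluster="ai-agents", service=agent_id, desiredCount=0)
    running = ecs_client.list_tasks(cluster="ai-agents", serviceName=agent_id)["taskArns"]
    for task_arn in running:
        ecs_client.stop_task(cluster="ai-agents", task=task_arn, reason=reason[:255])

    logger.critical("Agent %s emergency shutdown: %s", agent_id, reason)

    # Alert security team (pagerduty_client is a placeholder)
    pagerduty_client.trigger_incident(
        title=f"Agent {agent_id} emergency shutdown",
        severity="critical",
        details=reason
    )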

Forensic Log Replay

After an incident, security teams need to replay what the agent did. Store:

Full LLM prompt and response for each decision
Tool call parameters and responses
Intermediate reasoning steps
External API responses

This allows post-mortem analysis: “Why did the agent decide to delete that S3 bucket?”
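A question like that is answerable only if every stored step shares a correlation ID. The sketch below assumes JSON Lines records with hypothetical decision_id, seq, and kind fields. It groups the records into per-decision timelines and then finds the decisions that ended in a given tool call.

```python
import json
from collections import defaultdict


def load_trace(lines):
    """Group JSON Lines records by decision_id, ordered by sequence number."""
    decisions = defaultdict(list)
    for line in lines:
        if line.strip():
            rec = json.loads(line)
            decisions[rec["decision_id"]].append(rec)
    for steps in decisions.values():
        steps.sort(key=lambda r: r["seq"])
    return dict(decisions)


def explain(decisions, tool_name, resource):
    """Return the ordered steps of every decision that issued the given tool call."""
    return [
        steps for steps in decisions.values()
        if any(
            s["kind"] == "tool_call" and s.get("tool") == tool_name
            and s.get("resource") == resource
            for s in steps
        )
    ]
```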

Deployment Architecture

The agent runs in a controlled environment with network and compute boundaries.

Compute Isolation

Agent runs in dedicated ECS tasks or Kubernetes pods
No SSH access to agent runtime
Secrets injected via AWS Secrets Manager or Kubernetes secrets
Outbound network traffic restricted to approved API endpoints

Network Boundaries

Agent cannot access the internet directly
All tool calls go through internal API gateway
API gateway enforces rate limits and permission checks
Egress traffic logged and monitored
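At the gateway, the approved-endpoint rule reduces to an exact host allowlist. The hosts below are placeholders, and the function name is hypothetical. An application-level check can be bypassed by code that opens its own sockets, so network controls such as security groups, Kubernetes NetworkPolicy, or an egress proxy should enforce the same list.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the endpoints your tools actually call
ALLOWED_HOSTS = {
    "your-domain.atlassian.net",
    "api.github.com",
    "slack.com",
}


def is_egress_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False  # plaintext egress is never allowed
    # hostname excludes any userinfo, so "api.github.com@evil.example" resolves to evil.example
    host = (parts.hostname or "").lower()
    return host in allowed_hosts  # exact match: no wildcard subdomains
```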

Secrets Management

OAuth client secrets and tokens stored in AWS Secrets Manager
Secrets rotated every 30 days
Agent retrieves secrets at runtime, never stores them on disk
Secrets access logged to CloudTrail
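Retrieving secrets at runtime without writing them to disk can look like the sketch below. The value lives only in process memory and is re-fetched after a TTL, so a rotated secret is picked up without a restart. The fetch callable is injected. The usage comment shows a boto3 Secrets Manager call with a hypothetical secret name. The 300-second TTL is an assumption chosen to be far shorter than the 30-day rotation period.

```python
import threading
import time


class InMemorySecret:
    """Holds a secret in process memory only and re-fetches it after ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._value = None
        self._fetched_at = float("-inf")

    def get(self):
        with self._lock:
            if self._clock() - self._fetched_at >= self._ttl:
                self._value = self._fetch()
                self._fetched_at = self._clock()
            return self._value


# Usage (secret name is hypothetical):
# sm = boto3.client("secretsmanager")
# jira_secret = InMemorySecret(
#     lambda: sm.get_secret_value(SecretId="ai-agent/jira")["SecretString"])
```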

Failure Modes and Mitigations

Failure Mode	Impact	Mitigation
Agent calls unauthorized API	Permission denied error, logged attempt	Policy engine blocks call, alert triggered
Agent enters infinite loop	API rate limit exhaustion	Circuit breaker stops agent after N failed calls
LLM hallucinates tool parameters	Invalid API call, possible data corruption	Schema validation on all tool inputs
OAuth token compromised	Attacker gains agent permissions	Short-lived tokens, rotation, anomaly detection
Approver unavailable	High-risk action blocked indefinitely	Timeout with fallback to secondary approver
Audit logs deleted	Loss of forensic evidence	Immutable log storage with object lock
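The "schema validation on all tool inputs" mitigation can be sketched as follows. The validator rejects unknown tools, invented parameters, missing required parameters, and wrong types before any API call is made. This hand-rolled version stays within the standard library, and the schemas are hypothetical. Tool definitions for LLM agents are often already written as JSON Schema, which a library validator can enforce instead.

```python
# Hypothetical schemas: parameter name -> (expected type, required)
TOOL_SCHEMAS = {
    "jira.create_ticket": {"project": (str, True), "summary": (str, True), "priority": (str, False)},
    "aws.describe_instances": {"region": (str, True), "max_results": (int, False)},
}


class ToolInputError(ValueError):
    pass


def validate_tool_input(tool_name, params):
    schema = TOOL_SCHEMAS.get(tool_name)
    if schema is None:
        raise ToolInputError(f"unknown tool {tool_name!r}")
    unknown = set(params) - set(schema)
    if unknown:
        # Reject hallucinated parameters instead of silently dropping them
        raise ToolInputError(f"unexpected parameters: {sorted(unknown)}")
    for name, (expected_type, required) in schema.items():
        if name not in params:
            if required:
                raise ToolInputError(f"missing required parameter {name!r}")
            continue
        value = params[name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields
        if (not isinstance(value, expected_type)) or (isinstance(value, bool) and expected_type is not bool):
            raise ToolInputError(f"{name!r} must be {expected_type.__name__}")
    return params
```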

Technical Verdict

Use this architecture when:

You are deploying AI agents in regulated industries (finance, healthcare, government)
Agents need access to production systems with sensitive data
Compliance requires audit trails and approval workflows
You need to demonstrate to auditors that agents operate within defined boundaries

Avoid this architecture when:

Agents only read public data or operate in sandboxed environments
You are prototyping and need to move fast (start simple, add controls later)
Your organization lacks the infrastructure to run policy engines, approval workflows, and centralized logging

The key insight is that employee AI usage and production AI agents require different security models. Conflating them leads to either over-restrictive policies that block productivity or under-restrictive policies that create risk. Build the right harness for the right problem.

Source Links

Primary Source: Securing AI Agents in a Bank