
Flowsint's Graph-Based Investigation Engine: How OSINT Tools Handle Entity Relationships, Enrichment Pipelines, and Autonomous Exploration


Source: github.com

Flowsint is a self-hosted OSINT investigation tool built around a graph database and a plugin system for entity enrichment. It currently ranks #7 on GitHub’s TypeScript trending list with 4,228 stars. The project offers concrete lessons about how autonomous exploration tools chain transformations, manage graph state, and handle deployment boundaries when privacy is a hard requirement.

Architecture: Graph State and Enricher Orchestration

Flowsint stores entities (domains, IPs, ASNs, websites) and relationships in a graph. Each enricher is a plugin that consumes one entity type and produces new entities or edges. The orchestration layer chains these enrichers automatically when you add a seed entity.

Core components:

  • Graph store: Persists entities, relationships, and historical snapshots across sessions
  • Enricher registry: Maps entity types to available transformations (domain → IP, IP → ASN, domain → subdomain)
  • Execution queue: Schedules enricher jobs and tracks completion state
  • Frontend canvas: Renders the graph and exposes manual enricher triggers

The system avoids infinite loops by tracking which enrichers have already run on a given entity. Once a domain has been resolved to IPs, the DNS enricher will not re-run unless you explicitly reset its state.
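The run-once rule can be pictured as a small bookkeeping structure. This is a minimal sketch, not Flowsint's actual implementation; `RunTracker` and the enricher name `dns_resolution` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class RunTracker:
    """Remembers which enrichers have already run on which entity (hypothetical)."""

    def __init__(self) -> None:
        # entity key (e.g. "domain:example.com") -> names of completed enrichers
        self._done = defaultdict(set)

    def should_run(self, entity_key: str, enricher: str) -> bool:
        return enricher not in self._done[entity_key]

    def mark_done(self, entity_key: str, enricher: str) -> None:
        self._done[entity_key].add(enricher)

    def reset(self, entity_key: str, enricher: str) -> None:
        # Explicit reset: lets the enricher run again on this entity
        self._done[entity_key].discard(enricher)


tracker = RunTracker()
key = "domain:example.com"
first = tracker.should_run(key, "dns_resolution")
tracker.mark_done(key, "dns_resolution")
second = tracker.should_run(key, "dns_resolution")
tracker.reset(key, "dns_resolution")
third = tracker.should_run(key, "dns_resolution")
print(first, second, third)  # True False True
```

The reset is the only path back to a second run, which matches the behavior described above.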

Enricher Pipeline: Chaining Without Circular Dependencies

Flowsint ships with 15+ domain enrichers. Here’s how a typical chain unfolds:

  1. You add a domain entity (example.com)
  2. DNS Resolution enricher runs, creates IP entities and edges
  3. Reverse DNS enricher runs on each IP, discovers additional domains
  4. Subdomain Discovery enricher runs on the original domain, creates subdomain entities
  5. WHOIS enricher runs on the root domain, adds registration metadata
  6. ASN enricher runs on IPs, creates ASN entities and edges

Each enricher declares input and output types. The orchestrator uses these declarations to build a dependency graph. If an enricher produces an entity type that another enricher consumes, the second enricher is queued automatically.
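To make the type-driven queueing concrete, here is a minimal sketch of a registry built from input and output declarations. The `EnricherSpec` class, the enricher names, and the exact type labels are assumptions for illustration; Flowsint's real registry may be organized differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnricherSpec:
    name: str
    input_type: str
    output_types: tuple


# Hypothetical registry mirroring part of the chain described above
REGISTRY = [
    EnricherSpec("dns_resolution", "domain", ("ip",)),
    EnricherSpec("reverse_dns", "ip", ("domain",)),
    EnricherSpec("whois", "domain", ()),  # adds metadata only, no new entities
    EnricherSpec("asn_lookup", "ip", ("asn",)),
]


def enrichers_for(entity_type: str) -> list:
    """Names of enrichers that accept the given entity type."""
    return [spec.name for spec in REGISTRY if spec.input_type == entity_type]


def downstream(spec: EnricherSpec) -> list:
    """Enrichers that become eligible when `spec` produces new entities."""
    names = []
    for out_type in spec.output_types:
        for name in enrichers_for(out_type):
            if name not in names:
                names.append(name)
    return names


print(enrichers_for("ip"))      # ['reverse_dns', 'asn_lookup']
print(downstream(REGISTRY[0]))  # ['reverse_dns', 'asn_lookup']
```

Because DNS resolution emits IP entities, every enricher that accepts an IP is queued next, without any explicit wiring between the two plugins.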

Deduplication strategy:

  • Entities are keyed by type and value (domain:example.com, ip:192.0.2.1)
  • If two enrichers produce the same entity, the graph merges them
  • Relationships are deduplicated by source, target, and edge type
  • Historical snapshots preserve the state before each merge
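The first three rules can be sketched with a dictionary keyed by `type:value` and a set of edge triples. This is an in-memory illustration under assumed names (`Graph`, `add_entity`, `add_edge`, the `resolves_to` edge type), not the graph store Flowsint actually uses, and it leaves snapshots out.

```python
class Graph:
    """Toy in-memory graph with key-based deduplication (hypothetical)."""

    def __init__(self):
        self.entities = {}  # "type:value" -> merged properties
        self.edges = set()  # (source_key, target_key, edge_type)

    def add_entity(self, etype, value, **props):
        key = f"{etype}:{value}"
        # setdefault + update merges a duplicate into the existing node
        self.entities.setdefault(key, {}).update(props)
        return key

    def add_edge(self, source, target, edge_type):
        # A set drops an identical (source, target, type) triple silently
        self.edges.add((source, target, edge_type))


g = Graph()
a = g.add_entity("domain", "example.com", source="seed")
b = g.add_entity("domain", "example.com", registrar="Example Registrar")
ip = g.add_entity("ip", "192.0.2.1")
g.add_edge(a, ip, "resolves_to")
g.add_edge(b, ip, "resolves_to")

print(len(g.entities), len(g.edges))  # 2 1
print(g.entities[a])  # {'source': 'seed', 'registrar': 'Example Registrar'}
```

Two enrichers reporting the same domain end up on one node whose properties are the union of what each reported.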

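This approach terminates even though the enricher types themselves form a cycle: DNS resolution maps domain → IP, and reverse DNS maps IP → domain. Because entities are deduplicated and each enricher runs at most once per entity, a domain → IP → domain → IP chain cannot loop forever. It stops as soon as a pass produces no new entities. The only way to re-run an enricher on an entity is to reset its state manually.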
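A short worklist simulation shows why a domain → IP → domain chain stops on its own. The lookup tables below are made-up stand-ins for real DNS and reverse DNS calls, and `explore` is a hypothetical orchestrator loop, not Flowsint's code.

```python
from collections import deque

# Made-up lookup tables standing in for real network calls
DNS = {"example.com": ["192.0.2.1"], "mail.example.com": ["192.0.2.1"]}
RDNS = {"192.0.2.1": ["example.com", "mail.example.com"]}


def dns(entity):
    # domain -> IP entities
    return [("ip", ip) for ip in DNS.get(entity[1], [])]


def reverse_dns(entity):
    # IP -> domain entities
    return [("domain", name) for name in RDNS.get(entity[1], [])]


ENRICHERS = {"domain": [dns], "ip": [reverse_dns]}


def explore(seed):
    seen = {seed}
    queue = deque([seed])
    runs = 0
    while queue:
        entity = queue.popleft()
        for enricher in ENRICHERS.get(entity[0], []):
            runs += 1
            for found in enricher(entity):
                if found not in seen:  # dedup by (type, value)
                    seen.add(found)
                    queue.append(found)
    return seen, runs


entities, runs = explore(("domain", "example.com"))
print(len(entities), runs)  # 3 3
```

Each entity enters the queue once, so each enricher runs once per entity, and the loop ends after three runs even though the two enrichers feed each other.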

Deployment Model: Docker, Localhost, and Session Boundaries

Flowsint deploys as a Docker Compose stack with three containers:

  • Frontend: React app served on localhost:5173
  • Backend: API server with enricher execution logic
  • Database: Graph store (likely Neo4j or a similar graph database)

The deployment is localhost-only by default. There are no default credentials. You must visit /register and create an account before you can start an investigation.

Security boundaries:

Layer          | Mechanism                  | Implication
---------------|----------------------------|--------------------------------------------------
Network        | Localhost-only binding     | No external access without a reverse proxy
Authentication | User registration required | Each user gets isolated graph state
Session        | Cookie-based auth          | Sessions persist across browser restarts
Data           | Local Docker volumes       | No cloud sync; outbound calls only from enrichers

This model works well for single-user investigations on a trusted machine. It breaks down if you need multi-user collaboration or remote access. The localhost-only binding means you cannot expose Flowsint to a team without adding a reverse proxy and handling TLS termination yourself.

Autonomous Exploration vs. Manual Queries

Flowsint supports two modes:

Manual mode: You click an entity, select an enricher from a menu, and wait for results. This is useful when you know exactly which transformation you need.

Autonomous mode: You add a seed entity and let the orchestrator run all applicable enrichers. The graph grows automatically as new entities are discovered.

The autonomous mode is not fully autonomous in the agent sense. It does not make decisions about which enrichers to prioritize or when to stop. It simply runs every enricher that matches an entity type until the queue is empty.

Stopping conditions:

  • All enrichers have run on all entities
  • You manually pause the execution queue
  • An enricher fails and you have configured the system to halt on errors

There is no built-in budget for API calls, execution time, or graph size. If you seed Flowsint with a popular domain, the subdomain enricher could discover thousands of entities and trigger thousands of downstream enricher jobs.
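If you wrap or fork the orchestrator, a budget check is straightforward to add. The sketch below is one possible shape, with a hypothetical `ExplorationBudget` class and arbitrary default limits; nothing like it ships with Flowsint according to the description above.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExplorationBudget:
    """Caps on jobs, graph size, and wall-clock time (hypothetical)."""

    max_jobs: int = 500
    max_entities: int = 2000
    max_seconds: float = 300.0
    jobs_run: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def exhausted(self, entity_count: int) -> Optional[str]:
        """Return the reason the budget is spent, or None to keep going."""
        if self.jobs_run >= self.max_jobs:
            return "job limit reached"
        if entity_count >= self.max_entities:
            return "entity limit reached"
        if time.monotonic() - self.started_at >= self.max_seconds:
            return "time limit reached"
        return None


budget = ExplorationBudget(max_jobs=5, max_entities=1000, max_seconds=60)
entity_count = 1
reason = None
while reason is None:
    budget.jobs_run += 1
    entity_count += 10  # pretend each job discovers 10 entities
    reason = budget.exhausted(entity_count)

print(budget.jobs_run, reason)  # 5 job limit reached
```

Checking the budget between jobs rather than inside them keeps the enrichers themselves unchanged.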

State Management: Historical Snapshots and Relationship Merging

Flowsint preserves historical snapshots of the graph. When an enricher adds new entities or relationships, the system creates a snapshot before merging. You can rewind to a previous state if an enricher produces bad data.

Snapshot strategy:

  • Snapshots are keyed by timestamp and enricher ID
  • Each snapshot stores a diff (added entities, added edges, modified properties)
  • Rewinding applies diffs in reverse order
  • Snapshots are stored in the same graph database as the live state

This approach works well for small investigations (hundreds of entities). It becomes expensive for large graphs because every enricher run on every entity writes its own snapshot record. If you run 50 enrichers on 1,000 entities, you could end up with 50 × 1,000 = 50,000 snapshot records.
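The diff-and-rewind idea can be sketched as a history of additive diffs that are undone newest first. `Snapshot` and `SnapshotGraph` are hypothetical names, the sketch keeps everything in memory rather than in a graph database, and it omits modified properties for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    timestamp: float
    enricher_id: str
    added_entities: list = field(default_factory=list)
    added_edges: list = field(default_factory=list)


class SnapshotGraph:
    def __init__(self):
        self.entities = set()
        self.edges = set()
        self.history = []

    def apply(self, snap):
        # Record only what is genuinely new, so a rewind cannot remove
        # something an earlier snapshot already added
        snap.added_entities = [e for e in snap.added_entities if e not in self.entities]
        snap.added_edges = [e for e in snap.added_edges if e not in self.edges]
        self.entities.update(snap.added_entities)
        self.edges.update(snap.added_edges)
        self.history.append(snap)

    def rewind(self, keep):
        """Undo all but the first `keep` snapshots, newest first."""
        while len(self.history) > keep:
            snap = self.history.pop()
            self.edges.difference_update(snap.added_edges)
            self.entities.difference_update(snap.added_entities)


g = SnapshotGraph()
g.apply(Snapshot(1.0, "seed", ["domain:example.com"]))
g.apply(Snapshot(2.0, "dns_resolution", ["ip:192.0.2.1"],
                 [("domain:example.com", "ip:192.0.2.1", "resolves_to")]))
g.apply(Snapshot(3.0, "asn_lookup", ["asn:AS64500"],
                 [("ip:192.0.2.1", "asn:AS64500", "belongs_to")]))
g.rewind(keep=2)

print(sorted(g.entities))  # ['domain:example.com', 'ip:192.0.2.1']
print(len(g.edges))        # 1
```

Rewinding past the ASN lookup removes only what that run added and leaves the DNS result intact.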

Relationship merging:

When two enrichers produce the same relationship (domain → IP), Flowsint merges them by adding metadata to the edge. The edge stores:

  • First seen timestamp
  • Last seen timestamp
  • List of enrichers that confirmed the relationship

This metadata helps you understand which relationships are stable and which are transient. If a domain resolves to an IP today but not tomorrow, the edge remains in the graph with updated timestamps.
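A minimal sketch of that merge, assuming edges are keyed by (source, target, edge type). `EdgeMeta`, `observe`, and the `resolves_to` label are illustrative names rather than Flowsint's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class EdgeMeta:
    first_seen: datetime
    last_seen: datetime
    confirmed_by: list = field(default_factory=list)


edges = {}  # (source, target, edge_type) -> EdgeMeta


def observe(source, target, edge_type, enricher, seen_at):
    """Record one sighting of a relationship, merging into any existing edge."""
    key = (source, target, edge_type)
    meta = edges.get(key)
    if meta is None:
        edges[key] = EdgeMeta(seen_at, seen_at, [enricher])
        return
    meta.first_seen = min(meta.first_seen, seen_at)
    meta.last_seen = max(meta.last_seen, seen_at)
    if enricher not in meta.confirmed_by:
        meta.confirmed_by.append(enricher)


t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
observe("domain:example.com", "ip:192.0.2.1", "resolves_to", "dns_resolution", t1)
observe("domain:example.com", "ip:192.0.2.1", "resolves_to", "reverse_dns", t2)

meta = edges[("domain:example.com", "ip:192.0.2.1", "resolves_to")]
print(meta.confirmed_by)                       # ['dns_resolution', 'reverse_dns']
print((meta.last_seen - meta.first_seen).days)  # 1
```

An edge whose last-seen timestamp stops advancing is a candidate for a transient relationship.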

Observability: Enricher Logs and Execution Traces

Flowsint exposes enricher execution logs in the UI. Each log entry shows:

  • Enricher name
  • Input entity
  • Output entities
  • Execution time
  • Error messages (if any)

There is no distributed tracing or structured logging. If an enricher fails, you see the error message but not the stack trace or the API response that caused the failure.
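One way to close that gap in your own wrapper is to emit one JSON record per enricher run, including the traceback on failure. The function `run_logged` and the failing enricher below are hypothetical; they only illustrate the fields listed above plus the ones that are missing.

```python
import json
import time
import traceback


def run_logged(name, func, entity):
    """Run one enricher and return (outputs, JSON log line)."""
    record = {"enricher": name, "input": entity}
    start = time.perf_counter()
    try:
        outputs = func(entity)
        record["status"] = "ok"
    except Exception as exc:
        outputs = []
        record["status"] = "error"
        record["error"] = f"{type(exc).__name__}: {exc}"
        record["traceback"] = traceback.format_exc()
    record["outputs"] = outputs
    record["duration_ms"] = round((time.perf_counter() - start) * 1000, 2)
    return outputs, json.dumps(record)


def failing_whois(entity):
    # Hypothetical enricher that always fails
    raise TimeoutError("whois server did not answer")


_, line = run_logged("whois", failing_whois, "domain:example.com")
entry = json.loads(line)
print(entry["status"], entry["error"])  # error TimeoutError: whois server did not answer
```

A line-per-run JSON log can be filtered with ordinary tools, which is usually enough for a single-machine deployment.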

Failure modes:

  • Rate limiting: Enrichers that call external APIs (WHOIS, DNS) can hit rate limits. Flowsint does not retry or back off automatically.
  • Timeout: Long-running enrichers (subdomain discovery) can time out. The default timeout is not documented.
  • Invalid data: If an enricher produces malformed entities, the graph store rejects them but does not log the validation error in a structured way.
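Since retries are left to you, a custom enricher can wrap its own external calls. The following is a minimal exponential backoff sketch; `RateLimited` is a hypothetical exception your enricher would raise on a rate-limit response, and the `sleep` parameter exists only so the example can run without waiting.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised when an upstream API rejects a request."""


def with_backoff(func, *args, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(*args), retrying on RateLimited with doubling delays."""
    for attempt in range(attempts):
        try:
            return func(*args)
        except RateLimited:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)


calls = {"count": 0}


def flaky_lookup(domain):
    # Fails twice, then succeeds (stand-in for a rate-limited API)
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RateLimited(domain)
    return ["192.0.2.1"]


delays = []
result = with_backoff(flaky_lookup, "example.com", sleep=delays.append)
print(result, delays)  # ['192.0.2.1'] [1.0, 2.0]
```

Adding random jitter to each delay is a common refinement when several enrichers hit the same API at once.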

You can add custom enrichers by writing Python plugins. The plugin API is minimal: you implement a function that takes an entity and returns a list of new entities. Flowsint handles the rest.

Code: Enricher Plugin Interface

Here’s a simplified example of a custom enricher plugin:

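import socket

# NOTE: the flowsint.* import paths and the Enricher base class follow this
# article's simplified example; check the project source for the real plugin API.
from flowsint.enrichers import Enricher
from flowsint.entities import Domain, IP


def resolve_domain(hostname: str) -> list[str]:
    """Return the unique addresses for hostname, or an empty list on failure.

    Placeholder that uses the system resolver; swap in your own DNS client.
    """
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # info[4] is the sockaddr tuple; its first item is the address string
    return sorted({info[4][0] for info in infos})


class CustomDNSEnricher(Enricher):
    input_type = Domain
    output_types = [IP]

    def enrich(self, entity: Domain) -> list[IP]:
        # Resolve the domain, then wrap each address in a new IP entity
        return [IP(value=ip) for ip in resolve_domain(entity.value)]

    def metadata(self):
        return {
            "name": "Custom DNS Resolver",
            "description": "Resolves domains using a custom DNS server",
            "timeout": 30,
            "retry": False,
        }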

The orchestrator reads input_type and output_types to build the dependency graph. When a Domain entity is added, the orchestrator queues this enricher automatically.

Privacy and Data Residency

Flowsint’s localhost-only deployment means the graph, snapshots, and logs stay on your machine. The only outbound traffic comes from enrichers that explicitly call external services. The WHOIS and DNS enrichers query public services, which reveals the queried names to those services, but you can disable them or replace them with self-hosted alternatives.

Data retention:

  • Graph state persists in Docker volumes
  • Snapshots are never pruned automatically
  • You must manually delete investigations to free disk space

This model works well for sensitive investigations where data residency is a legal or ethical requirement. It breaks down if you need to share findings with a team or archive investigations to cold storage.

Technical Verdict

Use Flowsint when:

  • You need a self-hosted OSINT tool with full data control
  • Your investigations involve domain reconnaissance and IP mapping
  • You want a visual graph interface for exploring entity relationships
  • You are comfortable managing Docker deployments and writing Python plugins

Avoid Flowsint when:

  • You need multi-user collaboration or remote access
  • Your investigations require real-time enrichment with strict latency SLAs
  • You need built-in rate limiting, retries, or distributed tracing
  • You want a fully autonomous agent that decides which enrichers to run

Flowsint is a solid foundation for building custom OSINT workflows. The enricher plugin system is simple and the graph state management is transparent. The localhost-only deployment is a feature, not a bug, if privacy is your top concern. The lack of observability tooling and execution controls means you will need to add your own monitoring if you run large investigations.