mech.app

Scientific Agent Skills: 135-Tool Library Adopts Open Standard for Runtime-Agnostic Research Workflows

How a 135-skill library uses the Agent Skills standard to make genomics, molecular dynamics, and geospatial tools portable across AI coding assistants.

Source: github.com
The Agent Skills standard defines a JSON schema format that allows tool definitions to work across Cursor, Claude Code, Codex, and any other compliant AI coding assistant. The Scientific Agent Skills library implements this standard with 135 domain-specific tools covering cancer genomics, drug-target binding, molecular dynamics, RNA velocity analysis, geospatial science, and time series forecasting. Each skill is a Python function with a structured schema that declares inputs, outputs, dependencies, and execution requirements.

The library was recently renamed from “Claude Scientific Skills” to “Scientific Agent Skills” to reflect this shift toward runtime-agnostic tooling. The same skill definitions now work across multiple agent environments without modification. This portability matters when scientific workflows span different development tools and teams need to share skill libraries across environments.

Skill Schema Structure

Each skill in the repository follows a consistent schema format. Here’s what a typical skill definition looks like, based on the repository’s structure:

{
  "name": "analyze_tcga_mutations",
  "version": "1.0.0",
  "description": "Retrieve and analyze mutation data from TCGA for a specific cancer type",
  "parameters": {
    "cancer_type": {
      "type": "string",
      "required": true,
      "description": "TCGA cancer type abbreviation (e.g., BRCA, LUAD)"
    },
    "gene_list": {
      "type": "array",
      "items": {"type": "string"},
      "required": false,
      "description": "Optional list of genes to filter"
    }
  },
  "returns": {
    "type": "object",
    "properties": {
      "mutation_count": {"type": "integer"},
      "top_mutated_genes": {"type": "array"},
      "pathway_enrichment": {"type": "object"}
    }
  },
  "dependencies": [
    "pandas>=1.5.0",
    "biopython>=1.79",
    "requests>=2.28.0"
  ],
  "credentials": ["TCGA_API_KEY"],
  "execution_context": "local"
}

The schema enforces parameter types and required fields. The dependencies array lists Python packages with version constraints. The credentials array declares required API keys or tokens. The execution_context field hints at where the skill should run (local process, remote API, cloud function), but the standard does not mandate how runtimes interpret this field.

What the standard does not enforce: dependency isolation strategy, credential storage format, error handling beyond HTTP status codes, or telemetry hooks. These are left to runtime implementers. This creates portability at the cost of implementation variance across runtimes.
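To make the parameter rules concrete, here is a minimal sketch of how a runtime might check a call against a skill's declared parameters. The `validate_parameters` function and the `JSON_TYPES` mapping are hypothetical and are not part of the library or the standard; only the field names (`parameters`, `type`, `required`) come from the example schema above.

```python
# Hypothetical validator: only the field names ("parameters", "type",
# "required") come from the example schema; the function is an illustration.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def validate_parameters(schema: dict, params: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    declared = schema.get("parameters", {})
    for name, spec in declared.items():
        if name not in params:
            if spec.get("required", False):
                problems.append(f"missing required parameter: {name}")
            continue
        expected = JSON_TYPES.get(spec.get("type"))
        value = params[name]
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected is not None and (
            not isinstance(value, expected)
            or (isinstance(value, bool) and spec["type"] in ("integer", "number"))
        ):
            problems.append(
                f"{name}: expected {spec['type']}, got {type(value).__name__}"
            )
    for name in params:
        if name not in declared:
            problems.append(f"unknown parameter: {name}")
    return problems


schema = {
    "parameters": {
        "cancer_type": {"type": "string", "required": True},
        "gene_list": {"type": "array", "items": {"type": "string"}, "required": False},
    }
}
print(validate_parameters(schema, {"cancer_type": "BRCA"}))
print(validate_parameters(schema, {"gene_list": "TP53"}))
```

The sketch checks only top-level types and required fields. Validating `items` inside arrays, or the `returns` block, would need additional logic.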

Dependency Isolation in K-Dense BYOK

The K-Dense BYOK companion project provides one concrete implementation of dependency management. Examining its repository reveals a Docker-based approach for heavy scientific libraries and a lightweight virtual environment strategy for simpler skills.

The BYOK project uses a two-tier isolation model:

Tier 1 (lightweight skills): Skills with pure Python dependencies (pandas, numpy, requests) run in a shared virtual environment. The runtime creates this environment on first launch and reuses it for subsequent invocations. This reduces startup latency for common data analysis tasks.

Tier 2 (heavy scientific libraries): Skills requiring RDKit, MDAnalysis, GDAL, or TensorFlow run in pre-built Docker containers. The runtime pulls the appropriate container image, mounts input data as a volume, executes the skill, and retrieves output. This avoids version conflicts between incompatible scientific libraries.

The Modal integration adds a third tier for cloud compute. When a skill declares execution_context: "cloud" and the user has configured Modal credentials, the runtime offloads execution to a Modal function. This is used for GPU-accelerated molecular dynamics simulations and large-scale genomic data processing that exceed local machine resources.

This three-tier model solves the dependency conflict problem but introduces operational complexity. Users must have Docker installed for Tier 2 skills. Cloud execution requires Modal account setup and API key configuration. The runtime must handle container pull failures, network timeouts, and cloud quota limits. The library itself does not provide this infrastructure. BYOK does.
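The tier decision described above can be sketched as a small dispatch function. This is an illustration under stated assumptions, not BYOK's actual code: `select_tier`, `package_name`, and the `HEAVY_PACKAGES` set are hypothetical names, and falling back to local execution when Modal is not configured is a design choice made for this sketch.

```python
# Illustrative only: the package set and both functions are assumptions,
# not BYOK's actual dispatch code.
import re

HEAVY_PACKAGES = {"rdkit", "mdanalysis", "gdal", "tensorflow"}


def package_name(requirement: str) -> str:
    """Extract the lowercase package name from a spec such as 'pandas>=1.5.0'."""
    return re.split(r"[<>=!~\[; ]", requirement.strip(), maxsplit=1)[0].lower()


def select_tier(skill: dict, docker_available: bool, modal_configured: bool) -> str:
    """Choose 'cloud', 'docker', or 'venv' for a skill definition."""
    if skill.get("execution_context") == "cloud" and modal_configured:
        return "cloud"
    names = {package_name(dep) for dep in skill.get("dependencies", [])}
    heavy = sorted(names & HEAVY_PACKAGES)
    if heavy:
        if not docker_available:
            raise RuntimeError(f"{skill['name']} needs Docker for: {heavy}")
        return "docker"
    return "venv"


tcga = {
    "name": "analyze_tcga_mutations",
    "dependencies": ["pandas>=1.5.0", "biopython>=1.79", "requests>=2.28.0"],
    "execution_context": "local",
}
# Hypothetical skill name, used only to exercise the heavy-library path.
md_sim = {
    "name": "run_md_simulation",
    "dependencies": ["MDAnalysis>=2.0"],
    "execution_context": "cloud",
}
print(select_tier(tcga, docker_available=False, modal_configured=False))
print(select_tier(md_sim, docker_available=True, modal_configured=True))
```

Raising early when Docker is missing gives the user a clear message before any container pull is attempted.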

Credential Management and API Rate Limits

The library declares required credentials in each skill’s schema but does not store or manage credentials. The BYOK implementation uses a .env file for local development and environment variables for production deployments. When a skill requires NCBI_API_KEY, the runtime reads it from the environment and injects it at invocation time.
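A minimal sketch of that injection step, assuming the runtime resolves credentials from environment variables named in the schema's `credentials` array. `resolve_credentials` and `MissingCredentialError` are hypothetical names introduced here, not BYOK's API.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised before invocation when a declared credential is not configured."""


def resolve_credentials(skill, environ=None):
    """Return {name: value} for every credential the skill schema declares."""
    env = os.environ if environ is None else environ
    declared = skill.get("credentials", [])
    missing = [name for name in declared if not env.get(name)]
    if missing:
        raise MissingCredentialError(
            f"{skill['name']} requires unset credentials: {', '.join(missing)}"
        )
    return {name: env[name] for name in declared}


skill = {"name": "analyze_ncbi_gene_v2", "credentials": ["NCBI_API_KEY"]}
fake_env = {"NCBI_API_KEY": "placeholder-key"}  # stand-in for os.environ
print(resolve_credentials(skill, fake_env))
```

Failing before the skill runs, rather than inside it, keeps the error message specific to configuration instead of surfacing as an opaque HTTP 401 from the remote API.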

Rate limit handling is inconsistent across skills. Some skills include retry logic with exponential backoff. Others propagate HTTP 429 errors directly to the user. The repository does not provide a centralized rate limiter or quota tracker. This means skills that call the same API (e.g., multiple NCBI database queries) do not coordinate their rate limit budgets. A workflow that invokes ten NCBI skills in rapid succession will likely hit rate limits.
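For skills that currently propagate HTTP 429 errors, a workflow author could wrap the call in retry logic. The sketch below shows exponential backoff with jitter. `RateLimitError` and `call_with_backoff` are hypothetical names, and the assumption is that the skill (or a thin wrapper around it) raises `RateLimitError` when the API responds with 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error a skill raises when an API responds with HTTP 429."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * 2 ** attempt  # 1s, 2s, 4s, ...
            # Jitter keeps parallel callers from retrying in lockstep.
            sleep(delay + random.uniform(0, 0.1 * delay))


# Demo with a stand-in skill that is rate limited twice before succeeding.
attempts = {"count": 0}


def flaky_ncbi_query():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimitError("HTTP 429")
    return {"gene": "TP53"}


waits = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_ncbi_query, sleep=waits.append)
print(result, len(waits))
```

Passing `sleep` as a parameter is only there so the demo runs instantly. Real use would leave the default `time.sleep`.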

The skill schema includes optional rate_limit metadata:

{
  "rate_limit": {
    "requests_per_second": 3,
    "daily_quota": 10000
  }
}

This metadata is advisory. A runtime can read it and enforce backoff, but the standard does not require it, and the BYOK implementation does not currently enforce rate limits based on it. Users must handle rate limit errors manually or add retry logic to their workflows.
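A runtime that chose to honor the `rate_limit` metadata could share one limiter among all skills that call the same API. The following is a sketch of that idea, not existing BYOK behavior: `SharedRateLimiter` and `limiter_for` are hypothetical, the API key `"ncbi"` is an assumed grouping label, and `daily_quota` is deliberately left unenforced.

```python
import threading
import time


class SharedRateLimiter:
    """Spaces calls so at most `requests_per_second` start in any second."""

    def __init__(self, requests_per_second, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def acquire(self) -> float:
        """Block until the caller may send a request; return the seconds waited."""
        with self._lock:
            now = self._clock()
            wait = max(0.0, self._next_allowed - now)
            # Reserve the next slot before releasing the lock.
            self._next_allowed = max(now, self._next_allowed) + self._interval
        if wait:
            self._sleep(wait)
        return wait


_limiters = {}


def limiter_for(api_name, rate_limit, **kwargs):
    """Return the single limiter shared by every skill that calls `api_name`."""
    if api_name not in _limiters:
        _limiters[api_name] = SharedRateLimiter(
            rate_limit["requests_per_second"], **kwargs
        )
    return _limiters[api_name]


# Demo with a frozen fake clock so nothing actually sleeps.
slept = []
limiter = limiter_for(
    "ncbi",
    {"requests_per_second": 4, "daily_quota": 10000},
    clock=lambda: 100.0,
    sleep=slept.append,
)
waits = [limiter.acquire() for _ in range(3)]
print(waits)  # [0.0, 0.25, 0.5]
```

Each skill would call `limiter_for("ncbi", ...).acquire()` before its request, so ten NCBI skills invoked in rapid succession would be spaced out rather than rejected.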

Long-Running Job Execution Patterns

The repository includes examples of skills that handle long-running compute jobs. Molecular dynamics simulations and large-scale pathway enrichment analyses can run for minutes to hours. Examining the actual skill implementations reveals three patterns in use:

Pattern | Skills Using It | Implementation Details
Synchronous blocking | 80% of skills | Skill blocks until complete, returns result directly
Polling with job ID | 15% of skills | Skill submits job to external API, returns job ID, runtime polls status endpoint
Modal async execution | 5% of skills | Skill delegates to Modal function, returns immediately, result retrieved via callback

The synchronous pattern dominates because most skills complete in under 30 seconds. Database queries, data transformations, and statistical analyses fit this model. The skill function runs, waits for the result, and returns.

The polling pattern appears in skills that call external compute services. For example, the protein-ligand docking skill submits a job to a remote docking server, receives a job ID, and returns it to the runtime. The runtime must then poll the server’s status endpoint until the job completes. The repository includes a helper function for polling, but it is not part of the Agent Skills standard. Each runtime must implement its own polling logic or use the provided helper.
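A runtime that implements its own polling might use a loop like the one below. This is a generic sketch, not the repository's helper: `poll_until_done` is a hypothetical name, and the status dictionary shape (`state`, `result`, `error`) is an assumption about what a status endpoint returns.

```python
import time


def poll_until_done(get_status, job_id, interval=5.0, timeout=3600.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status(job_id) until the job completes, fails, or times out."""
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)
        state = status["state"]
        if state == "completed":
            return status["result"]
        if state == "failed":
            raise RuntimeError(f"job {job_id} failed: {status.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {state} after {timeout} seconds")
        sleep(interval)


# Demo with a fake docking server that finishes on the third status check.
# The result payload is made-up placeholder data.
responses = iter([
    {"state": "queued"},
    {"state": "running"},
    {"state": "completed", "result": {"poses": 3}},
])
naps = []  # record sleeps instead of waiting
result = poll_until_done(lambda job_id: next(responses), "job-123", sleep=naps.append)
print(result, naps)
```

A fixed interval is the simplest choice. For jobs that run for hours, growing the interval over time would reduce load on the status endpoint.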

The Modal async pattern is specific to the BYOK implementation. Skills that declare execution_context: "cloud" and use Modal for compute return immediately after submitting the job. The Modal function posts results to a webhook when complete. The BYOK runtime listens for these webhooks and updates the workflow state. This pattern is not portable to other runtimes unless they implement compatible webhook handling.

The lack of a standardized async execution model means workflows that mix local and cloud skills require runtime-specific orchestration code. A workflow that runs in BYOK with Modal integration will not run in Cursor without modification.

Multi-Step Workflow Orchestration

The library provides atomic skills. The agent runtime composes them into workflows. A typical cancer genomics analysis workflow might look like this:

  1. retrieve_tcga_data(cancer_type="BRCA") returns mutation data.
  2. perform_pathway_enrichment(mutations=step1_output) analyzes pathways.
  3. visualize_pathways(enrichment_results=step2_output) generates plots.

The runtime handles parameter passing between steps. If step 1 returns a pandas DataFrame, the runtime serializes it (usually to JSON or Parquet) and passes it to step 2. If step 2 crashes, the runtime is responsible for logging the error with enough context to diagnose it: input data, stack trace, environment state.

The Agent Skills standard does not define workflow primitives. There is no standard way to express loops, conditionals, or parallel execution. The runtime must provide these features. BYOK uses a simple sequential execution model: run skill A, pass output to skill B, run skill B, pass output to skill C. If any step fails, the workflow stops.

More sophisticated runtimes could implement DAG-based workflows with parallel execution, conditional branches, and retry policies. But these are runtime features, not library features. The library provides the tools. The runtime provides the orchestration.
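The sequential model described above fits in a few lines. The sketch below is an assumption about how such a runner could look, not BYOK's implementation: `run_workflow` and `WorkflowError` are hypothetical, and the three skill functions are stand-ins that return placeholder data rather than calling real APIs.

```python
class WorkflowError(RuntimeError):
    """Wraps the failure of one step with its position and name."""

    def __init__(self, step_index, step_name, cause):
        super().__init__(f"step {step_index} ({step_name}) failed: {cause!r}")
        self.step_index = step_index
        self.step_name = step_name


def run_workflow(steps, initial_input):
    """Run (name, skill) pairs in order, feeding each output to the next step."""
    output = initial_input
    completed = []
    for index, (name, skill) in enumerate(steps, start=1):
        try:
            output = skill(output)
        except Exception as exc:
            # Stop at the first failure, as in the sequential model.
            raise WorkflowError(index, name, exc) from exc
        completed.append(name)
    return output, completed


# Stand-in skills with placeholder data; the real skills call external services.
def retrieve_tcga_data(cancer_type):
    return {"cancer_type": cancer_type, "mutations": ["TP53", "PIK3CA"]}


def perform_pathway_enrichment(mutations):
    return {"pathways": sorted(mutations["mutations"])}


def visualize_pathways(enrichment_results):
    return f"plot of {len(enrichment_results['pathways'])} pathways"


steps = [
    ("retrieve_tcga_data", retrieve_tcga_data),
    ("perform_pathway_enrichment", perform_pathway_enrichment),
    ("visualize_pathways", visualize_pathways),
]
final, completed = run_workflow(steps, "BRCA")
print(final)  # plot of 2 pathways
```

A DAG-based runtime would replace the `for` loop with a scheduler, but the contract per step (take the previous output, return the next input) could stay the same.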

Versioning and API Evolution

The library uses semantic versioning for skills. Each skill has a version number in its schema. When a skill depends on an external API, version mismatches surface as runtime errors. For example, the NCBI E-utilities API changed its response format in 2024. Skills written for the old format fail with parsing errors when they receive the new format.

The repository handles this by maintaining multiple versions of affected skills. The analyze_ncbi_gene_v1 skill works with the old API format. The analyze_ncbi_gene_v2 skill works with the new format. The agent can request a specific version or use the latest stable version.

The skill schema includes a deprecated field:

{
  "name": "analyze_ncbi_gene_v1",
  "version": "1.0.0",
  "deprecated": true,
  "deprecation_message": "Use analyze_ncbi_gene_v2 for compatibility with NCBI API changes after 2024-03-01",
  "migration_guide": "https://github.com/K-Dense-AI/scientific-agent-skills/blob/main/docs/migration/ncbi_v2.md"
}

The runtime can warn users when they invoke a deprecated skill and suggest the replacement. But the standard does not mandate automated migration. Users must update their workflows manually.
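One way a runtime might surface that warning is through Python's standard `warnings` module. The helper names below (`deprecation_notice`, `warn_if_deprecated`) are hypothetical; the schema fields they read are the ones shown in the example above.

```python
import warnings
from typing import Optional


def deprecation_notice(schema: dict) -> Optional[str]:
    """Build the warning text for a deprecated skill, or return None."""
    if not schema.get("deprecated", False):
        return None
    notice = schema.get("deprecation_message") or f"{schema['name']} is deprecated"
    guide = schema.get("migration_guide")
    return f"{notice} (migration guide: {guide})" if guide else notice


def warn_if_deprecated(schema: dict) -> None:
    """Emit a DeprecationWarning before a deprecated skill is invoked."""
    notice = deprecation_notice(schema)
    if notice is not None:
        # stacklevel=2 attributes the warning to the caller, not this helper.
        warnings.warn(notice, DeprecationWarning, stacklevel=2)


schema_v1 = {
    "name": "analyze_ncbi_gene_v1",
    "version": "1.0.0",
    "deprecated": True,
    "deprecation_message": "Use analyze_ncbi_gene_v2",
}
print(deprecation_notice(schema_v1))  # Use analyze_ncbi_gene_v2
```

Note that Python hides `DeprecationWarning` by default outside `__main__` and test runners, so an agent runtime would likely need to display the notice itself or adjust its warning filters.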

Security Boundaries and Isolation

The library does not enforce security boundaries. Skills are Python functions that run with the same permissions as the agent runtime. If the runtime has file system access, skills have file system access. If the runtime can make network requests, skills can make network requests.

The BYOK implementation runs lightweight skills in the same process as the agent, and the Docker containers it uses for heavy libraries exist for dependency isolation rather than as a security boundary. This means a malicious or buggy skill can read environment variables, access the file system, or make arbitrary network requests. There is no sandboxing, network policy enforcement, or capability-based security.

A production deployment would need to add isolation layers:

  • Run skills in separate processes with restricted file system access (chroot, namespaces).
  • Use network policies to whitelist allowed domains per skill.
  • Rotate credentials periodically and audit all skill invocations.
  • Run high-risk skills (those that execute user-provided code or access sensitive databases) in isolated containers or VMs.

The library provides the skill definitions. The runtime provides the security perimeter. For local development and research use, the BYOK model is acceptable. For multi-tenant production deployments, additional isolation is required.
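As a small first step toward the isolation layers listed above, a runtime could launch each skill in a child process whose environment contains only the credentials that skill declares. The sketch below is hypothetical (`scoped_env` and `run_skill_subprocess` are illustrative names) and is not a sandbox: it limits which secrets a skill can read from its environment, but does nothing to restrict file system or network access.

```python
import os
import subprocess
import sys


def scoped_env(declared, all_known, environ=None):
    """Copy the environment, dropping every known credential the skill did not declare."""
    source = os.environ if environ is None else environ
    hidden = set(all_known) - set(declared)
    return {name: value for name, value in source.items() if name not in hidden}


def run_skill_subprocess(code, declared, all_known, timeout=60.0):
    """Run Python source in a child process that sees only its declared credentials."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=scoped_env(declared, all_known),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


# Demo on a fake environment: the skill declares only NCBI_API_KEY.
fake_environ = {"PATH": "/usr/bin", "NCBI_API_KEY": "n", "TCGA_API_KEY": "t"}
env = scoped_env(["NCBI_API_KEY"], ["NCBI_API_KEY", "TCGA_API_KEY"], fake_environ)
print(sorted(env))  # ['NCBI_API_KEY', 'PATH']
```

The function removes known credentials rather than starting from an empty environment, because interpreters on some platforms need inherited variables to start. A stricter deployment would combine this with containers or namespaces.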

Observability and Instrumentation

The repository does not include built-in observability. Skills log to stdout using Python’s logging module, but there is no structured telemetry, distributed tracing, or dependency graph visualization.

The BYOK implementation adds basic instrumentation by wrapping skill invocations with timing and error capture:

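import time

def invoke_skill(skill_name, parameters):
    # skill_registry maps skill names to callables and log_invocation records
    # the outcome; both are defined elsewhere in the runtime.
    start_time = time.time()
    try:
        result = skill_registry[skill_name](**parameters)
        duration = time.time() - start_time
        log_invocation(skill_name, parameters, result, duration, success=True)
        return result
    except Exception as e:
        # Record the failure, then re-raise so the caller still sees the error.
        duration = time.time() - start_time
        log_invocation(skill_name, parameters, None, duration, success=False, error=str(e))
        raise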

This captures timing, input parameters, the result (from which output size can be derived), and error messages. But it does not build a dependency graph, track resource usage (memory, CPU, disk I/O), or provide distributed tracing across cloud-executed skills.

For production use, you would need to add:

  • OpenTelemetry instrumentation to track skill invocations as spans.
  • Resource usage metrics (memory, CPU, GPU utilization) for cloud-executed skills.
  • Dependency graph construction by tracking which skills were invoked and in what order.
  • Structured error context (stack trace, input parameters, environment state) attached to all error messages.

The Agent Skills standard could evolve to include observability hooks (pre-invocation callbacks, post-invocation telemetry), but the current version leaves this to runtime implementers.
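Two of the production items listed above, invocation order and structured error context, can be approximated with the standard library alone. The sketch below is a stand-in for real tracing, not OpenTelemetry instrumentation: `InvocationRecorder` is a hypothetical class, and it logs parameter names rather than values to avoid writing large inputs or secrets to the log.

```python
import functools
import json
import time


class InvocationRecorder:
    """Collects one structured record per skill invocation, in call order."""

    def __init__(self):
        self.records = []

    def wrap(self, skill_name, func):
        @functools.wraps(func)
        def wrapper(**parameters):
            record = {
                "skill": skill_name,
                "parameter_names": sorted(parameters),
                "success": False,
                "error": None,
            }
            start = time.perf_counter()
            try:
                result = func(**parameters)
                record["success"] = True
                return result
            except Exception as exc:
                record["error"] = f"{type(exc).__name__}: {exc}"
                raise
            finally:
                # Runs on both success and failure, so every call is recorded.
                record["duration_s"] = round(time.perf_counter() - start, 6)
                record["order"] = len(self.records) + 1
                self.records.append(record)

        return wrapper

    def to_json_lines(self):
        """Serialize the records as one JSON object per line."""
        return "\n".join(json.dumps(r, sort_keys=True) for r in self.records)


def broken_skill():
    raise ValueError("bad input")


recorder = InvocationRecorder()
fetch = recorder.wrap("retrieve_tcga_data", lambda cancer_type: {"rows": 1})
broken = recorder.wrap("broken_skill", broken_skill)

fetch(cancer_type="BRCA")
try:
    broken()
except ValueError:
    pass
print(recorder.to_json_lines())
```

The ordered records give a linear invocation history. Building a true dependency graph would additionally require tracking which outputs feed which inputs.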

Portability vs. Optimization Trade-Offs

By decoupling skill definitions from runtime-specific APIs, the library gains portability across Cursor, Claude Code, Codex, and future compliant runtimes. But it loses the ability to leverage runtime-specific optimizations.

Claude Code has native support for streaming results from long-running skills. A Claude-specific skill could stream intermediate results (e.g., partial pathway enrichment scores as they are computed) to provide real-time feedback. A runtime-agnostic skill cannot assume this feature exists. It must return the complete result at the end.

Cursor has built-in credential management with OAuth flows. A Cursor-specific skill could trigger an OAuth flow to obtain NCBI credentials without requiring the user to manually configure API keys. A runtime-agnostic skill must assume credentials are already available in the environment.

Codex offers GPU-accelerated execution for certain skill types. A Codex-specific skill could declare GPU requirements and rely on the runtime to provision appropriate hardware. A runtime-agnostic skill must either run on CPU or delegate to an external compute service (like Modal).

The portability/optimization trade-off is acceptable when workflows span multiple tools and environments. A genomics pipeline might start in a Jupyter notebook, move to a cloud-based agent for heavy compute, and finish in a local IDE for visualization. A runtime-agnostic skill library lets the same skill definitions follow the pipeline through each of these environments, although cloud-executed steps still depend on runtime-specific orchestration.

The cost is increased complexity in the orchestration layer. Each runtime must implement the Agent Skills standard, handle dependency isolation, manage credentials, and provide error handling. The library pushes these concerns to the runtime, which means runtime quality varies. A poorly implemented runtime might have slow skill invocation, inadequate error messages, or fragile dependency management.

Technical Verdict

Use this library when:

  • You need access to 78+ scientific databases (NCBI, UniProt, PDB, ChEMBL, GEO) without writing custom API clients for each.
  • Your workflows span multiple AI coding assistants (Cursor, Claude Code, Codex) and you need skill definitions that work everywhere.
  • You want to avoid vendor lock-in by adopting an open standard for agent tooling.
  • You need a catalog of domain-specific tools (genomics, chemistry, geospatial, time series) that handle parameter validation and dependency declarations.

Avoid this library when:

  • You need real-time or low-latency execution guarantees that the polling-based and webhook-based job models cannot provide (e.g., real-time control systems, latency-sensitive applications).
  • Your runtime does not support Docker or cloud compute, and you need to run heavy scientific libraries (RDKit, MDAnalysis, GDAL) that require containerized isolation.
  • You require multi-tenant isolation with fine-grained access control (the library assumes a single-user environment with shared credentials).
  • You need runtime-specific optimizations (streaming results, native OAuth flows, GPU provisioning) that the Agent Skills standard does not expose.
  • You must run in air-gapped environments (many skills depend on external APIs and databases that require internet access).

The most significant architectural decision is the adoption of the open Agent Skills standard. This makes the library portable but shifts responsibility for dependency isolation, credential management, job orchestration, and observability to the agent runtime. The quality of the user experience depends on how well the runtime implements these features. The K-Dense BYOK project provides one reference implementation with Docker-based isolation and Modal cloud compute integration. Other runtimes will make different trade-offs.


  • Primary repository: [K-Dense-AI/scientific-agent-skills](https