GenoBank.io — Technical Whitepaper — Version 1.0 — March 2026

GenoClaw: Patient-Owned AI Health Agents on Decentralized Virtual Bioinformatic Machines

A technical architecture for sovereign, autonomous AI health agents where every patient's genomic and clinical data remains under their exclusive cryptographic control, with transparent attribution, revocable consent, and direct economic participation.

Author: Daniel Uribe
Organization: GenoBank.io
Version: 1.0 (Genesis)
Date: March 2026
Status: Production Deployed
License: CC BY 4.0

Abstract

We present GenoClaw, an autonomous AI health agent platform in which each agent instance is cryptographically bound to a single patient's BioWallet and executes within a kernel-isolated NVIDIA NemoClaw/OpenShell sandbox. GenoClaw integrates a complete, validated clinical-genomic data pipeline with three components. The first is whole-genome variant calling via NVIDIA Clara Parabricks 4.6.0-1 on A100 GPU hardware, producing 2.87 million variant records in under 100 seconds. The second is Epic FHIR R4 Patient Access API ingestion, yielding 1,463 structured clinical records. The third is OpenCRAVAT 2.13.0 multi-annotator analysis, identifying 351 clinically relevant variants across 12 oncology knowledge bases. The agent inference layer uses GPT-OSS-120B via Cloudflare Workers AI, with NVIDIA Nemotron 120B and Meta Llama 3.3 70B as fallback models. It is preceded by a HIPAA Safe Harbor processor that strips all 18 categories of PHI identifiers before any data reaches a language model. Consent is managed through a dual-license architecture: Story Protocol Programmable IP Licenses provide permanent attribution, and Sequentias BioPIL revocable licenses provide GDPR-compliant consent. Agent-to-agent data commerce is enabled by the x402 BioRouter protocol, which gates access between autonomous agents behind HTTP 402 micropayments. We argue that this architecture instantiates a new paradigm, the Decentralized Virtual Bioinformatic Machine. In it, patient ownership, authentic data quality, and transparent economic attribution are architectural invariants rather than policy afterthoughts. We examine the 23andMe bankruptcy of March 2025 as the definitive empirical case for why custodial genomic data models are structurally incompatible with patient interests.

Executive Summary (Non-Technical)

Today, your health data lives on someone else's server. When that company goes bankrupt — as 23andMe did in March 2025, exposing 15 million customers' genomic data to involuntary sale — you have no legal recourse, no compensation, and no way to revoke access.

GenoClaw is the answer. It is a personal AI health assistant that is controlled by your wallet, not by a company. Your genomic data, your clinical records, and your AI-generated health insights are all owned by you as cryptographically secured digital assets (BioNFTs). When a researcher wants access to your data, they must pay you directly. You can revoke that access at any time. Every use of your data is transparently recorded on a blockchain.

This document describes how GenoClaw works technically, presents empirical results from our production testing, and explains why this architecture represents a fundamental paradigm shift in the relationship between patients and health data.

1. Introduction

1.1 The Structural Failure of Custodial Genomic Data

On March 23, 2025, 23andMe Inc. filed for Chapter 11 bankruptcy protection in the Eastern District of Missouri, with genomic profiles of approximately 15 million customers listed as the company's primary asset.[1] The filing exposed a structural contradiction that had existed since the company's founding: customers had paid to generate their genomic data, yet that data was legally owned by a corporation whose interests were not aligned with theirs. In July 2025, TTAM Research Institute — a nonprofit controlled by Anne Wojcicki — acquired the company for $305 million. The 15 million customers received no compensation, no consultation, and no right of refusal. Many did not know the sale had occurred.

This outcome was not a failure of regulation. The Health Insurance Portability and Accountability Act of 1996 (HIPAA), the General Data Protection Regulation (GDPR), and the California Consumer Privacy Act (CCPA) all contain provisions that, in theory, protect health data subjects.[2,3,4] The failure was architectural: these regulations assume a two-party relationship in which a company holds data and a patient is protected from the company's misuse of it. They do not and cannot protect patients when the data is legally the company's property to begin with.

Core Insight

Privacy laws were designed to protect people from companies that control their data. Those laws become structurally unnecessary when patients own and control their data directly. The goal is not better regulation of custodial models — it is the elimination of the custodial model entirely.

The Electronic Health Record (EHR) ecosystem compounds this problem. Despite the 21st Century Cures Act Final Rule mandating open APIs for patient data access as of April 2021, most clinical records remain effectively inaccessible to patients in machine-readable form.[5] Patients cannot export, analyze, or share their own clinical histories without institutional intermediaries who introduce friction, cost, and surveillance.

Artificial intelligence exacerbates the asymmetry. Large language models trained on patient data generate economic value — in the form of diagnostic capabilities, drug discovery insights, and research publications — without any economic return to the patients whose data enabled those capabilities. This is not an oversight. It is the intended design of current AI health platforms, which depend on acquiring data from patients without meaningful compensation and monetizing it through products sold to third parties.

1.2 Prior Art and Its Limitations

Several approaches have attempted to address the genomic data ownership problem. We examine the most prominent and explain why each fails to provide a complete solution.

Federated Learning

Federated learning — proposed by McMahan et al. at Google in 2017 as a privacy-preserving technique for training models on distributed data — has been widely adopted by health AI companies as a purported solution to patient privacy concerns.[6] In the federated model, patient data never leaves the institution's server; instead, model gradients are shared and aggregated centrally.

Position: GenoClaw Rejects Federated Learning

Federated learning is biodata laundering. It degrades data quality through gradient approximation, erases patient attribution entirely, avoids revenue sharing with data owners, and was invented by data aggregators seeking to profit from patient data without the legal exposure of explicit ownership. It treats patient data as a resource to be mined, not as an asset to be owned. GenoBank.io will not implement federated learning in any form.

The specific deficiencies of federated learning for genomic data are as follows: (1) gradient approximation introduces noise that is clinically unacceptable for diagnostic applications; (2) the federated aggregator — typically a commercial entity — retains the economic value of the trained model without compensating data contributors; (3) patients cannot revoke their contribution after gradients have been incorporated into a global model; and (4) the model cannot provide attribution at the individual patient level, which violates the principle of data dignity.[7]

Zero-Knowledge Genomics

Zero-knowledge proofs (ZKPs) — cryptographic protocols that allow a prover to demonstrate knowledge of a value without revealing the value itself — have been proposed as a privacy-preserving substrate for genomic queries.[8] This approach is technically incorrect for genomic data and should not be deployed in clinical contexts.

ZKPs require deterministic computation: for a given input, the output must be identical across all executions of the circuit. Genomic data is probabilistic and non-deterministic at every stage of the pipeline: base calling from raw signal involves probabilistic quality scores; variant calling involves Bayesian posterior probabilities over haplotype configurations; and annotation databases are continuously updated, changing the clinical significance of the same variant over time.[9] There is no deterministic circuit that can represent the complete semantics of a VCF file, and therefore no valid ZKP construction for genomic queries. GenoBank.io uses privacy-preserving Bloom filters instead — a technically correct and computationally efficient approach to private variant membership testing.

Traditional EHR Portals

Patient portals (e.g., MyChart, FollowMyHealth) provide read-only views of clinical data through authenticated web interfaces. They do not provide programmatic export, do not integrate with genomic data, do not support AI-driven analysis, and do not support any form of patient-controlled data sharing or compensation. They are display interfaces, not ownership instruments.

2.1 Blockchain-Based Genomic Ownership

GenoBank.io introduced the BioNFT concept in 2020 as a mechanism for representing biosample ownership and consent on a public blockchain.[10] A BioNFT is an ERC-721 non-fungible token minted on Avalanche C-Chain that encodes: (1) the cryptographic identifier of a biosample or genomic file; (2) the wallet address of the patient-owner; (3) the laboratory that performed the analysis; and (4) a URI pointing to consent terms encoded in a Programmable IP License (PIL).
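The four fields a BioNFT encodes map naturally onto a small record type. The sketch below is hypothetical; it is not GenoBank.io's contract schema. The class and field names, the SHA-256 file identifier, and the `properties` metadata object are all assumptions. Only the `name`/`description` keys follow the ERC-721 metadata JSON convention.

```python
import hashlib
import json
import re
from dataclasses import dataclass

# Hypothetical in-memory model of the four fields listed above; this is not
# GenoBank.io's contract schema.
EVM_ADDRESS = re.compile(r"0x[0-9a-fA-F]{40}")


@dataclass(frozen=True)
class BioNFTRecord:
    biosample_id: str   # (1) cryptographic identifier of the biosample or file
    owner: str          # (2) patient-owner wallet address
    laboratory: str     # (3) laboratory that performed the analysis
    license_uri: str    # (4) URI of the PIL consent terms

    def __post_init__(self) -> None:
        if not EVM_ADDRESS.fullmatch(self.owner):
            raise ValueError(f"not an EVM address: {self.owner!r}")

    def token_metadata(self) -> dict:
        # ERC-721 tracks the holder on-chain via ownerOf(), so the owner is not
        # repeated here; the "properties" object is an assumed extension.
        return {
            "name": f"BioNFT {self.biosample_id[:12]}",
            "description": "Patient-owned biosample record",
            "properties": {
                "biosample_id": self.biosample_id,
                "laboratory": self.laboratory,
                "license_uri": self.license_uri,
            },
        }


def file_identifier(data: bytes) -> str:
    """Assumed identifier scheme: SHA-256 digest of the file contents."""
    return hashlib.sha256(data).hexdigest()


record = BioNFTRecord(
    biosample_id=file_identifier(b"##fileformat=VCFv4.2\n"),
    owner="0x" + "ab" * 20,
    laboratory="Example Sequencing Lab",
    license_uri="ipfs://example-pil-terms",
)
print(json.dumps(record.token_metadata(), indent=2))
```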

2.1.1 BioNFT as a Legal Instrument

A BioNFT is not merely a digital collectible — it is a programmable legal instrument that encodes property rights, consent terms, and access control into a single cryptographic token. Each BioNFT establishes four critical properties that no traditional consent form can provide:

2.1.2 The Dual-Chain Architecture

BioNFTs operate across two complementary blockchain networks, each serving a distinct regulatory purpose:

This dual-chain design resolves a fundamental tension in genomic data governance: the need for permanent attribution (so patients always receive credit) coexists with the need for revocable consent (so patients can withdraw access). By separating these concerns onto different chains with different immutability guarantees, GenoBank.io achieves both simultaneously — unlike single-chain approaches that must sacrifice either attribution permanence or consent revocability.

This approach builds on the broader NFT infrastructure established by EIP-721 (Entriken et al., 2018) and the IP Asset framework introduced by Story Protocol (2024).[11,12] Story Protocol provides a programmable licensing layer on top of standard NFT infrastructure, enabling on-chain license terms that specify permitted uses, revenue share percentages, and attribution requirements. GenoBank.io extends this with BioPIL — Bioinformatic Programmable IP Licenses — which add genomic-specific terms including GDPR-compliant revocability on the Sequentias Network (chain ID 15132025).

2.2 The Decentralized Virtual Bioinformatic Machine

The theoretical foundation of GenoClaw is the Decentralized Virtual Bioinformatic Machine (DVBM). A DVBM is a patient-controlled compute environment in which:

This architecture inverts the traditional model. In the traditional model, a company owns a centralized compute environment and patients contribute data as raw inputs. In the DVBM model, the patient owns the compute environment and third parties (researchers, clinicians, AI systems) request access as credentialed guests.

2.3 Privacy-Preserving Bloom Filters for Genomic Queries

A Bloom filter is a space-efficient probabilistic data structure that answers membership queries with a controllable false-positive rate and zero false-negative rate.[13] For genomic applications, a patient's variant set is encoded as a Bloom filter: each variant (chromosome, position, reference allele, alternate allele) is hashed through multiple hash functions and the corresponding bits are set in the filter array.

A query for a specific variant is answered by checking whether all bits corresponding to that variant's hashes are set, without requiring access to the underlying VCF data. This enables a researcher to determine, with high probability, whether a patient carries a specific variant (e.g., BRCA1 c.5266dupC). The patient never has to share their raw genomic data with the researcher. The filter stores only bit positions, not variant records, so it cannot be decoded directly back into the underlying variant set.[14] However, anyone holding the filter can still test candidate variants against it. Distribution of the filter itself should therefore follow the patient's consent terms.

This approach is technically correct for genomic data (unlike ZKPs), computationally efficient (sub-millisecond query latency), and preserves data dignity because the patient retains the authentic, complete dataset.
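The encoding and query steps above fit in a few dozen lines. This is a generic textbook Bloom filter, offered as a hedged sketch. The SHA-256 double-hashing scheme, the `chrom:pos:ref:alt` key format, and the sizing parameters are assumptions, not a description of GenoBank.io's engine.

```python
import hashlib
import math


class VariantBloomFilter:
    """Minimal Bloom filter over (chrom, pos, ref, alt) variant keys.

    Probe positions use double hashing (index_i = h1 + i*h2 mod m), with h1 and
    h2 taken from a single SHA-256 digest.
    """

    def __init__(self, expected_items: int, fp_rate: float) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes
        self.m = math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)
        self.count = 0

    @staticmethod
    def _key(chrom: str, pos: int, ref: str, alt: str) -> bytes:
        # Normalize "chr17" vs "17" so both spellings hash identically
        return f"{chrom.removeprefix('chr')}:{pos}:{ref}:{alt}".encode()

    def _probes(self, key: bytes) -> list:
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # avoid the degenerate h2 == 0
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, chrom: str, pos: int, ref: str, alt: str) -> None:
        for idx in self._probes(self._key(chrom, pos, ref, alt)):
            self.bits[idx // 8] |= 1 << (idx % 8)
        self.count += 1

    def might_contain(self, chrom: str, pos: int, ref: str, alt: str) -> bool:
        # False: definitely absent. True: present, or a false positive.
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._probes(self._key(chrom, pos, ref, alt)))

    def estimated_fp_rate(self) -> float:
        # p ≈ (1 - e^(-k n / m))^k after n insertions
        return (1 - math.exp(-self.k * self.count / self.m)) ** self.k


# Illustrative coordinates only, not a curated clinical record
bf = VariantBloomFilter(expected_items=10_000, fp_rate=1e-3)
bf.add("chr17", 43057051, "G", "GC")
print(bf.might_contain("17", 43057051, "G", "GC"))  # True
```

With n = 1,000 expected variants at p = 0.01, the sizing formulas give m = 9,586 bits and k = 7 hash probes. For a fixed p, the filter size grows linearly with the number of variants.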

3.1 Layer Overview

GenoClaw is organized as a five-layer stack. Each layer has a clearly defined responsibility boundary, and communication between layers is authenticated via Web3 cryptographic signatures.

Figure 1 — GenoClaw Five-Layer Architecture
┌──────────────────────────────────────────────────────────────────────┐
│  LAYER 5: APPLICATION LAYER                                          │
│  GenoClaw Agent (9 Bioinformatics Skills)                            │
│  ┌──────────────┐ ┌───────────┐ ┌──────────────┐ ┌──────────────┐  │
│  │ cancer-risk  │ │ pharmgx   │ │variant-annot │ │rare-disease  │  │
│  │              │ │           │ │              │ │    -dx       │  │
│  └──────────────┘ └───────────┘ └──────────────┘ └──────────────┘  │
│  ┌──────────────┐ ┌───────────┐ ┌──────────────┐ ┌──────────────┐  │
│  │ancestry-pca  │ │consent-   │ │variant-call  │ │alphagenome-  │  │
│  │              │ │manager    │ │              │ │  interpret   │  │
│  └──────────────┘ └───────────┘ └──────────────┘ └──────────────┘  │
│  ┌──────────────┐                                                    │
│  │bio-orchestra-│  LLM: GPT-OSS-120B / Nemotron 120B / Llama 3.3 70B│
│  │    tor       │  via Cloudflare Workers AI ($0.011 / 1K tokens)   │
│  └──────────────┘                                                    │
├──────────────────────────────────────────────────────────────────────┤
│  LAYER 4: SANDBOX LAYER                                              │
│  NVIDIA NemoClaw / OpenShell                                         │
│  K3s cluster inside Docker (port 9095)                               │
│  Landlock LSM + seccomp BPF (kernel-level process isolation)         │
├──────────────────────────────────────────────────────────────────────┤
│  LAYER 3: PRIVACY LAYER                                              │
│  HIPAA Safe Harbor Processor (strips 18 PHI identifiers)             │
│  OpenShell Privacy Router (audits all outbound LLM requests)         │
│  Bloom Filter Engine (private variant membership queries)            │
├──────────────────────────────────────────────────────────────────────┤
│  LAYER 2: CONSENT LAYER                                              │
│  Story Protocol PIL (permanent: licenses 1-4)                        │
│  Sequentias BioPIL (revocable GDPR-compliant: licenses 5-9)          │
│  BioDataRouter.sol (on-chain ownership registry)                     │
├──────────────────────────────────────────────────────────────────────┤
│  LAYER 1: PAYMENT & DISCOVERY LAYER                                  │
│  x402 BioRouter Protocol (HTTP 402 micropayments)                    │
│  Sequentias Network (chain ID 15132025, BioCID addressing)           │
│  Agent-to-Agent Commerce (researcher AI ↔ patient AI negotiation)   │
└──────────────────────────────────────────────────────────────────────┘

       ┌──────────────────────────────────────────────────────┐
       │  DATA STORAGE (cross-cutting)                        │
       │  Google Cloud Storage (GCS) — genomic files          │
       │  MongoDB — FHIR cache + job tracking                 │
       │  Avalanche C-Chain — BioNFT ownership                │
       │  Sequentias Network — consent records                │
       └──────────────────────────────────────────────────────┘

3.2 Application Layer: GenoClaw Agent Skills

The GenoClaw agent exposes nine discrete bioinformatics skills, each corresponding to a well-defined clinical or analytical function. Skills are invoked by the agent's bio-orchestrator based on patient query intent, and each skill invocation is logged to the Sequentias consent ledger.

| Skill | Function | Primary Data Source | Output |
|---|---|---|---|
| cancer-risk | Polygenic risk score + pathogenic variant identification | VCF + ClinVar + COSMIC + OncoKB | Risk tier + actionable variants |
| pharmgx | Pharmacogenomics: drug-gene interaction assessment | VCF + PharmGKB + CPIC guidelines | Drug sensitivity / contraindication list |
| variant-annotate | Functional annotation of variant set | VCF + OpenCRAVAT 12-annotator panel | Annotated variant table |
| rare-disease-dx | Rare disease differential diagnosis | VCF + OMIM + HPO + ClinVar | Candidate gene + disorder list |
| ancestry-pca | Ancestry inference via principal component analysis | SNP array or WGS VCF | Population assignment + PCA plot |
| consent-manager | View, grant, and revoke data access permissions | Sequentias BioPIL ledger | Permission state + audit trail |
| variant-call | Trigger GPU-accelerated variant calling pipeline | BAM/BAI on GCS | VCF (Clara Parabricks DeepVariant) |
| alphagenome-interpret | Regulatory variant interpretation via AlphaGenome | VCF + regulatory element databases | Predicted regulatory impact scores |
| bio-orchestrator | Intent classification + multi-skill workflow coordination | Patient query (natural language) | Skill invocation plan |
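The bio-orchestrator's routing step can be illustrated with a deliberately simple keyword matcher. This is a hypothetical stand-in, not the production intent classifier, which may well use the LLM itself. The keyword lists and the fallback skill are assumptions; only the skill names come from the table above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical keyword table standing in for real intent classification
SKILL_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "cancer-risk": ("cancer", "brca", "tumor", "oncology"),
    "pharmgx": ("drug", "medication", "dose", "pharmacogen"),
    "variant-annotate": ("annotate", "annotation"),
    "rare-disease-dx": ("rare disease", "syndrome", "diagnos"),
    "ancestry-pca": ("ancestry", "population", "heritage"),
    "consent-manager": ("consent", "revoke", "permission"),
    "variant-call": ("bam", "call variants", "variant calling"),
    "alphagenome-interpret": ("regulatory", "enhancer", "promoter", "non-coding"),
}


@dataclass
class SkillPlan:
    query: str
    skills: List[str] = field(default_factory=list)


def plan_skills(query: str) -> SkillPlan:
    q = query.lower()
    matched = [skill for skill, words in SKILL_KEYWORDS.items()
               if any(word in q for word in words)]
    # Fall back to general annotation when nothing matches
    return SkillPlan(query=query, skills=matched or ["variant-annotate"])


print(plan_skills("Which of my medications could interact with my genes?").skills)
# ['pharmgx']
```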

3.3 Sandbox Layer: NVIDIA NemoClaw / OpenShell

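Each GenoClaw instance executes within an NVIDIA NemoClaw/OpenShell sandbox. This is a K3s (lightweight Kubernetes) cluster running inside Docker on the GenoBank.io production infrastructure (port 9095). The sandbox provides kernel-level process isolation through two Linux kernel mechanisms:
- Landlock, a Linux Security Module (LSM) that lets even an unprivileged process restrict its own access to files and other kernel resources.
- seccomp-BPF, which filters the system calls a process is allowed to make.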

This dual-layer isolation means that even a fully compromised GenoClaw agent process cannot access another patient's data, cannot create new network connections outside the pre-approved outbound channels, and cannot persist state beyond its designated vault directory.

3.4 Privacy Layer: HIPAA Safe Harbor and OpenShell Privacy Router

The Privacy Layer intercepts all data before it reaches any language model and applies two sequential transformations:

HIPAA Safe Harbor Processor. The Safe Harbor method defined in 45 CFR § 164.514(b) requires the removal of 18 specified categories of identifiers before data can be considered de-identified.[17] The GenoClaw HIPAA processor applies regular-expression and named-entity recognition rules to strip these categories:
- names;
- geographic subdivisions smaller than a state;
- all elements of dates (except year) directly related to the individual, together with all ages over 89, which must be aggregated into a single "90 or older" category;
- telephone numbers;
- fax numbers;
- email addresses;
- Social Security numbers;
- medical record numbers;
- health plan beneficiary numbers;
- account numbers;
- certificate and license numbers;
- vehicle identifiers and serial numbers, including license plate numbers;
- device identifiers and serial numbers;
- URLs;
- IP addresses;
- biometric identifiers, including finger and voice prints;
- full-face photographs and comparable images;
- any other unique identifying number, characteristic, or code.

This processing occurs in-process, within the NemoClaw sandbox, before any data is transmitted to Cloudflare Workers AI endpoints.
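A regular-expression pass handles the categories that have a recognizable surface form. The sketch below is hypothetical and deliberately partial. The patterns are US-centric assumptions, not GenoClaw's rule set. Names, geographic units, and other free-text identifiers need the named-entity recognition stage, which is not shown.

```python
import re

# Illustrative patterns for a handful of pattern-shaped Safe Harbor categories.
# Names, street addresses and other free-text identifiers need the NER stage
# and are NOT covered here. Order matters: MRN runs before PHONE so a long
# record number is not mislabeled as a phone number.
PHI_PATTERNS = [
    ("URL", re.compile(r"\bhttps?://\S+")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("MRN", re.compile(r"\bMRN[:#]?\s*\d+\b", re.IGNORECASE)),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"(?:\+1[ .-]?)?\(?\b\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b")),
]
# Ages over 89 collapse into one "90 or older" bucket
AGE_OVER_89 = re.compile(r"\b(?:9\d|1[01]\d)\s*-?\s*(?:years?|yrs?)[\s-]*old\b", re.IGNORECASE)
# Dates keep only the year
DATE_MDY = re.compile(r"\b\d{1,2}/\d{1,2}/(\d{4})\b")
DATE_ISO = re.compile(r"\b(\d{4})-\d{2}-\d{2}\b")


def scrub(text: str) -> str:
    text = AGE_OVER_89.sub("[AGE 90+]", text)
    for label, pattern in PHI_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    text = DATE_MDY.sub(r"\1", text)
    text = DATE_ISO.sub(r"\1", text)
    return text


print(scrub("Pt jane.doe@example.com, 555-867-5309, seen 03/15/2024, 93-year-old"))
# Pt [EMAIL], [PHONE], seen 2024, [AGE 90+]
```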

OpenShell Privacy Router. All outbound LLM requests are additionally audited by the OpenShell Privacy Router, which: (1) logs the request metadata (timestamp, patient wallet hash, skill invoked, token count) to the Sequentias consent ledger; (2) verifies that the patient has an active consent record for LLM processing of their data; and (3) enforces rate limits and spending caps defined in the patient's BioPIL terms.
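The router's three duties can be sketched as a single in-memory gate. Every detail here is an assumption for illustration: the list standing in for the Sequentias ledger, the one-minute sliding window, and SHA-256 of the lower-cased address as the wallet hash. Cost is computed per 1,000 tokens, matching how model pricing is quoted in Section 4.4.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConsentState:
    llm_processing: bool           # active consent for LLM processing
    spend_cap_usd: float           # cap taken from the patient's BioPIL terms
    max_requests_per_minute: int   # rate limit taken from the same terms


@dataclass
class PrivacyRouter:
    """Hypothetical gate implementing the router's three duties in memory."""
    ledger: list = field(default_factory=list)    # stand-in for the Sequentias ledger
    spent_usd: dict = field(default_factory=dict)
    recent: dict = field(default_factory=dict)

    def authorize(self, wallet: str, skill: str, tokens: int, price_per_1k: float,
                  consent: ConsentState, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        key = hashlib.sha256(wallet.lower().encode()).hexdigest()  # wallet hash
        window = [t for t in self.recent.get(key, []) if now - t < 60.0]
        cost = tokens / 1000 * price_per_1k
        allowed = (
            consent.llm_processing                                            # (2) consent
            and len(window) < consent.max_requests_per_minute                 # (3) rate limit
            and self.spent_usd.get(key, 0.0) + cost <= consent.spend_cap_usd  # (3) spending cap
        )
        if allowed:
            window.append(now)
            self.spent_usd[key] = self.spent_usd.get(key, 0.0) + cost
        self.recent[key] = window
        # (1) metadata-only audit entry; prompt text is never written
        self.ledger.append({"ts": now, "wallet_hash": key, "skill": skill,
                            "tokens": tokens, "allowed": allowed})
        return allowed
```

Denied requests are logged as well as approved ones, so the audit trail records attempted use, not only successful use.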

GenoClaw implements a dual-license architecture to satisfy both permanent IP attribution and revocable GDPR compliance simultaneously.

Figure 2 — Dual-License Consent Architecture
  Patient Data Asset
         │
         ├─── Story Protocol PIL (Permanent Attribution)
         │    ├── PIL #1: Non-Commercial Research
         │    ├── PIL #2: Commercial Research
         │    ├── PIL #3: Exclusive License
         │    └── PIL #4: Public Good
         │         └── [Immutable — on Story Protocol]
         │
         └─── Sequentias BioPIL (Revocable GDPR-Compliant)
              ├── BioPIL #5: GDPR Research
              ├── BioPIL #6: AI Training
              ├── BioPIL #7: Clinical Use
              ├── BioPIL #8: Pharma Research
              └── BioPIL #9: Family Inheritance
                    └── [Revocable — on Sequentias Network, chain 15132025]

  Access Flow:
  ┌─────────────────────────────────────────────────────────────┐
  │  Researcher Agent requests patient data                     │
  │         │                                                   │
  │         ▼                                                   │
  │  BioDataRouter.sol checks ownership registry               │
  │         │                                                   │
  │         ▼                                                   │
  │  Sequentias BioPIL: is consent active? ──[No]──► Blocked   │
  │         │ [Yes]                                             │
  │         ▼                                                   │
  │  x402 BioRouter: micropayment collected                     │
  │         │                                                   │
  │         ▼                                                   │
  │  GCS pre-signed URL issued (time-limited)                   │
  │         │                                                   │
  │         ▼                                                   │
  │  Attribution recorded on Story Protocol                     │
  │         │                                                   │
  │         ▼                                                   │
  │  Revenue share distributed to patient wallet               │
  └─────────────────────────────────────────────────────────────┘

The critical design principle is that GDPR Article 17 (the right to erasure) and Article 7(3) (the right to withdraw consent) are implemented not as legal policies but as cryptographic invariants. When a patient revokes consent on the Sequentias Network, their BioPIL token is burned, and from that moment no new GCS pre-signed URL can be generated for the data. URLs issued before revocation remain valid only until their time limit expires. The data itself remains in GCS (owned by the patient) and can be re-consented or permanently deleted by the patient at any time.
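The access flow in Figure 2 and the revocation rule can be modeled together. In this hypothetical sketch, plain Python containers stand in for four components: the BioDataRouter.sol registry, the Sequentias consent records, the Story Protocol attribution log, and the patient's wallet balance. The returned URL is a placeholder. With the google-cloud-storage client library, a real time-limited URL would come from `Blob.generate_signed_url(version="v4", expiration=..., method="GET")`. The 15-minute TTL and the `patient_share` fraction are assumptions.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Optional


class AccessDenied(Exception):
    """Raised when any step of the access flow fails."""


@dataclass
class AccessGate:
    """Hypothetical in-memory model of the Figure 2 access flow."""
    owners: dict = field(default_factory=dict)         # BioCID -> patient wallet
    active_biopil: set = field(default_factory=set)    # unburned (BioCID, license) tokens
    attributions: list = field(default_factory=list)   # stand-in for Story Protocol records
    balances: dict = field(default_factory=dict)       # patient wallet -> accrued revenue

    def revoke(self, biocid: str, license_id: str) -> None:
        # "Burning" the BioPIL token: every later consent check fails
        self.active_biopil.discard((biocid, license_id))

    def request(self, biocid: str, license_id: str, researcher: str, payment: float,
                patient_share: float, ttl_s: int = 900, now: Optional[float] = None) -> dict:
        now = time.time() if now is None else now
        owner = self.owners.get(biocid)                     # ownership registry check
        if owner is None:
            raise AccessDenied("unknown BioCID")
        if (biocid, license_id) not in self.active_biopil:  # consent check
            raise AccessDenied("consent inactive")
        if payment <= 0:                                    # micropayment check
            raise AccessDenied("payment required")
        # Placeholder for a time-limited GCS signed URL
        token = hashlib.sha256(f"{biocid}:{researcher}:{now}".encode()).hexdigest()[:16]
        self.attributions.append((biocid, owner, researcher, license_id))
        self.balances[owner] = self.balances.get(owner, 0.0) + payment * patient_share
        return {"url": f"https://storage.example/{biocid}?token={token}",
                "expires_at": now + ttl_s}
```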

3.6 Payment Layer: x402 BioRouter Protocol

The x402 BioRouter protocol implements agent-to-agent data commerce using HTTP 402 ("Payment Required") as the signaling mechanism for paywall-gated genomic data access. This extends the original x402 micropayment protocol proposed for general HTTP resources to the specific semantics of genomic data licensing.[18]

Figure 3 — x402 BioRouter Agent Commerce Flow
  Researcher's AI Agent                          Patient's GenoClaw Agent
         │                                                  │
         │  GET /biorouter/{BioCID}                         │
         │ ─────────────────────────────────────────────►  │
         │                                                  │
         │  402 Payment Required                            │
         │  x402-accept: EVM/15132025                       │
         │  x402-price: 0.0042 SEQ                          │
         │  x402-license: BioPIL#6 (AI Training)            │
         │ ◄─────────────────────────────────────────────  │
         │                                                  │
         │  [Researcher agent evaluates price + terms]      │
         │                                                  │
         │  POST /biorouter/{BioCID}/pay                    │
         │  Authorization: EIP-712 signed payment intent    │
         │ ─────────────────────────────────────────────►  │
         │                                                  │
         │  [BioDataRouter.sol verifies payment on-chain]  │
         │                                                  │
         │  200 OK                                          │
         │  x402-access-token: {time-limited JWT}           │
         │  x402-attribution: Story Protocol IP Asset ID   │
         │ ◄─────────────────────────────────────────────  │
         │                                                  │
         │  [Researcher accesses GCS stream, time-limited] │
         │                                                  │
         │  Revenue share automatically distributed         │
         │  to patient wallet via Sequentias Network        │

The BioCID (Bioinformatic Content Identifier) addressing scheme used by Sequentias Network provides content-addressed identifiers for genomic data objects that are independent of storage location. A BioCID encodes: data type (VCF, BAM, FASTQ, FHIR bundle), content hash (SHA-256 of file), and the Sequentias chain ID for consent resolution. This enables a researcher's agent to discover and request a specific genomic dataset without knowing which GCS bucket it resides in — the BioDataRouter resolves the physical storage location after verifying consent and collecting payment.
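A BioCID needs only the three components listed above. The string layout below (`biocid:<chain ID>:<type>:<SHA-256 hex>`) is a hypothetical encoding chosen for readability, not the Sequentias format. It does show the key property: the identifier depends only on content, so the same file stored in two buckets gets the same BioCID.

```python
import hashlib
from typing import Tuple

SEQUENTIAS_CHAIN_ID = 15132025
DATA_TYPES = {"VCF", "BAM", "FASTQ", "FHIR"}   # "FHIR" stands for a FHIR bundle


def make_biocid(data_type: str, content: bytes,
                chain_id: int = SEQUENTIAS_CHAIN_ID) -> str:
    if data_type not in DATA_TYPES:
        raise ValueError(f"unsupported data type {data_type!r}")
    digest = hashlib.sha256(content).hexdigest()   # content hash, not a storage location
    return f"biocid:{chain_id}:{data_type.lower()}:{digest}"


def parse_biocid(biocid: str) -> Tuple[int, str, str]:
    scheme, chain, data_type, digest = biocid.split(":")
    if scheme != "biocid" or len(digest) != 64:
        raise ValueError("malformed BioCID")
    return int(chain), data_type.upper(), digest


cid = make_biocid("VCF", b"abc")
print(cid)
# biocid:15132025:vcf:ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```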

The GenoClaw data pipeline integrates three independently sourced data streams — genomic sequencing, clinical records, and computational annotations — into a unified patient knowledge graph that the agent can reason over.

Figure 4 — GenoClaw Data Ingestion Pipeline
  ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐
  │  GENOMIC DATA    │   │  CLINICAL DATA   │   │  ANNOTATION      │
  │                  │   │                  │   │                  │
  │  Source: Invitae │   │  Source: Epic    │   │  Source:         │
  │  Format: BAM/BAI │   │  MyChart FHIR R4 │   │  OpenCRAVAT      │
  │                  │   │                  │   │  2.13.0          │
  │  Upload to GCS   │   │  Patient Access  │   │                  │
  │  (gcsfuse mount) │   │  API OAuth 2.0   │   │  12 Oncology     │
  │        │         │   │        │         │   │  Annotators      │
  │        ▼         │   │        ▼         │   │        │         │
  │  Clara Parabricks│   │  FHIR R4 Bundle  │   │        ▼         │
  │  DeepVariant     │   │  → MongoDB cache │   │  351 Annotated   │
  │  A100 GPU        │   │  1,463 records   │   │  Variants        │
  │        │         │   │        │         │   │        │         │
  │        ▼         │   │        │         │   │        │         │
  │  VCF Output      │   │        │         │   │        │         │
  │  2.87M variants  │   │        │         │   │        │         │
  │  in 99 seconds   │   │        │         │   │        │         │
  └────────┬─────────┘   └────────┬─────────┘   └────────┬─────────┘
           │                      │                      │
           └──────────────────────┼──────────────────────┘
                                  │
                                  ▼
                    ┌──────────────────────────┐
                    │  Patient Knowledge Graph │
                    │  (MongoDB + GCS)         │
                    │                          │
                    │  • Genomic variants      │
                    │  • Clinical conditions   │
                    │  • Medications           │
                    │  • Lab results           │
                    │  • Vital signs           │
                    │  • Annotations           │
                    └────────────┬─────────────┘
                                 │
                                 ▼
                    ┌──────────────────────────┐
                    │  GenoClaw LLM Inference  │
                    │  (HIPAA-stripped input)  │
                    └──────────────────────────┘

4.1 Genomic Data: Clara Parabricks DeepVariant

Whole-genome sequencing data is ingested from patient-provided BAM/BAI files. In the validated production pipeline, source data originates from Invitae Corporation's clinical sequencing service. Files are uploaded to a patient-specific GCS bucket (genobank-backups-gcp). The pipeline then mounts the bucket directly with gcsfuse, so files are read in place rather than copied to local disk first (gcsfuse --implicit-dirs --type-cache-max-size-mb=32 BUCKET /mountpoint). This is architecturally critical: downloading a typical 60–120 GB BAM file to local disk before processing does not scale to a population-scale platform.

Variant calling is performed by NVIDIA Clara Parabricks 4.6.0-1 running the deepvariant pipeline on an NVIDIA A100 GPU instance on Google Compute Engine. DeepVariant uses a convolutional neural network, trained on sequencing data labeled with truth-set genotypes. The network classifies each candidate variant site as homozygous reference, heterozygous, or homozygous alternate.[19] GPU acceleration via Parabricks reduces wall-clock time from the 24–48 hours required by CPU-based DeepVariant to under two minutes for a whole-genome sample at 30x coverage.
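The two commands the ingestion step depends on can be assembled programmatically, which keeps paths out of hand-built shell strings. This sketch only builds and prints the command lines. The mount point, reference FASTA name, and sample paths are hypothetical. The `pbrun deepvariant` options shown are its basic reference, input, and output flags, not GenoBank.io's full production invocation.

```python
import shlex
from typing import List


def gcsfuse_mount_cmd(bucket: str, mountpoint: str) -> List[str]:
    # Flags exactly as quoted in Section 4.1
    return ["gcsfuse", "--implicit-dirs", "--type-cache-max-size-mb=32", bucket, mountpoint]


def deepvariant_cmd(ref_fasta: str, in_bam: str, out_vcf: str) -> List[str]:
    # Core Parabricks DeepVariant options: reference, aligned reads, output VCF
    return ["pbrun", "deepvariant", "--ref", ref_fasta,
            "--in-bam", in_bam, "--out-variants", out_vcf]


MOUNT = "/mnt/genobank"   # hypothetical mount point
for cmd in (gcsfuse_mount_cmd("genobank-backups-gcp", MOUNT),
            deepvariant_cmd(f"{MOUNT}/ref/GRCh38.fa", f"{MOUNT}/sample.bam", "/tmp/sample.vcf")):
    print(shlex.join(cmd))   # on a configured host: subprocess.run(cmd, check=True)
```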

Output VCF files are uploaded to the genobank-parabricks-output GCS bucket and registered as IP Assets on Story Protocol via the two-step minting workflow: mint_and_register_ip() followed by attach_license_terms(), with separate NFT metadata URI and IP metadata URI to ensure correct display on the Story Protocol explorer.
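Keeping the NFT metadata URI and the IP metadata URI separate means publishing two JSON documents and committing to each by hash. The sketch assumes SHA-256 over a canonical JSON serialization and uses hypothetical key names. It does not call the Story Protocol SDK, and the metadata contents are illustrative rather than the production documents.

```python
import hashlib
import json


def canonical_bytes(metadata: dict) -> bytes:
    # Deterministic serialization; these exact bytes are what gets uploaded
    return json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode()


def metadata_hash(metadata: dict) -> str:
    # SHA-256 rendered as a 0x-prefixed 32-byte hex string
    return "0x" + hashlib.sha256(canonical_bytes(metadata)).hexdigest()


def registration_values(vcf_sha256: str, nft_uri: str, ip_uri: str) -> dict:
    # Two separate documents: token (NFT) metadata and IP Asset metadata
    nft_metadata = {"name": "GenoClaw VCF", "description": "DeepVariant VCF output"}
    ip_metadata = {"title": "GenoClaw VCF",
                   "description": "Patient-owned variant calls",
                   "contentHash": vcf_sha256}   # hypothetical extra field
    return {
        "nft_metadata_uri": nft_uri, "nft_metadata_hash": metadata_hash(nft_metadata),
        "ip_metadata_uri": ip_uri, "ip_metadata_hash": metadata_hash(ip_metadata),
    }
```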

4.2 Clinical Data: Epic FHIR R4 Patient Access API

Clinical records are imported via the Epic MyChart FHIR R4 Patient Access API, which implements the SMART on FHIR authorization framework.[20] The patient authenticates with MyChart credentials and grants a scoped OAuth 2.0 token that permits read-only access to their clinical records. GenoBank.io's FHIR ingestion service requests the following resource types:

Records are cached in a MongoDB collection with a schema that preserves FHIR R4 resource structure, enabling FHIR-compliant query at the API layer while providing efficient indexed access to the underlying data. The cache is invalidated and refreshed when the patient re-consents or explicitly triggers a sync.
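The authorization step described above follows SMART App Launch over OAuth 2.0, which in version 2.0 requires PKCE (RFC 7636) with the S256 method. The sketch builds the authorization URL only. The endpoints, client ID, redirect URI, and scopes are hypothetical placeholders. A real client would read endpoint URLs from the server's `.well-known/smart-configuration` document.

```python
import base64
import hashlib
import secrets
from typing import List, Tuple
from urllib.parse import urlencode

# Hypothetical client registration and endpoints
AUTHORIZE_URL = "https://ehr.example.org/oauth2/authorize"
FHIR_BASE = "https://ehr.example.org/api/FHIR/R4"
CLIENT_ID = "genoclaw-demo"
REDIRECT_URI = "https://app.example.org/callback"


def pkce_pair() -> Tuple[str, str]:
    """RFC 7636 S256: challenge = BASE64URL(SHA256(verifier)) without padding."""
    verifier = secrets.token_urlsafe(64)   # 86 characters, within the 43-128 limit
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def authorize_url(scopes: List[str]) -> Tuple[str, str, str]:
    """Return (URL, PKCE verifier, state); keep the last two for the callback."""
    verifier, challenge = pkce_pair()
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": " ".join(scopes),
        "state": state,
        "aud": FHIR_BASE,   # SMART requires the FHIR base URL as the audience
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}", verifier, state
```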

4.3 Variant Annotation: OpenCRAVAT Multi-Annotator Panel

Raw variant calls from DeepVariant are processed through OpenCRAVAT 2.13.0, the open-source platform for genomic variant interpretation developed at Johns Hopkins University.[21] The production GenoClaw annotation pipeline uses a 12-annotator oncology panel:

| Annotator | Data Source | Primary Annotation Class |
|---|---|---|
| ClinVar | NCBI ClinVar (monthly build) | Clinical significance classification |
| COSMIC | Catalogue of Somatic Mutations in Cancer v97 | Somatic mutation frequency in cancer |
| OncoKB | Memorial Sloan Kettering OncoKB | Oncogenic effect + therapeutic implications |
| CHASMplus | Johns Hopkins CHASM+ model | Driver mutation probability |
| CIViC | Clinical Interpretation of Variants in Cancer | Clinical evidence + therapeutic biomarkers |
| PharmGKB | Pharmacogenomics Knowledgebase | Drug-gene interaction evidence |
| CADD | Combined Annotation-Dependent Depletion v1.7 | Deleteriousness score (Phred-scaled) |
| gnomAD | Genome Aggregation Database v4.1 | Population allele frequency |
| REVEL | Rare Exome Variant Ensemble Learner | Missense pathogenicity score |
| SpliceAI | Illumina SpliceAI v1.3 | Splicing effect prediction |
| OMIM | Online Mendelian Inheritance in Man | Gene-phenotype associations |
| InterVar | ACMG/AMP 2015 guidelines implementation | Automated ACMG variant classification |
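The section reports 351 clinically relevant variants but does not state the rule used to select them. One plausible way to combine annotator outputs is sketched below. The thresholds (gnomAD allele frequency below 1%, CADD Phred of at least 20) and the field names are assumptions. This is not the rule that produced the reported count.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnnotatedVariant:
    gene: str
    clinvar: Optional[str]           # ClinVar clinical significance text
    oncokb_oncogenic: Optional[str]  # e.g. "Oncogenic", "Likely Oncogenic"
    gnomad_af: Optional[float]       # population allele frequency
    cadd_phred: Optional[float]      # Phred-scaled CADD score


def is_clinically_relevant(v: AnnotatedVariant,
                           max_af: float = 0.01, min_cadd: float = 20.0) -> bool:
    """Hypothetical triage rule combining several annotator outputs."""
    sig = (v.clinvar or "").lower()
    if "pathogenic" in sig and "conflicting" not in sig:
        return True                                   # ClinVar (likely) pathogenic
    if v.oncokb_oncogenic in {"Oncogenic", "Likely Oncogenic"}:
        return True                                   # OncoKB oncogenic call
    rare = v.gnomad_af is None or v.gnomad_af < max_af
    return rare and v.cadd_phred is not None and v.cadd_phred >= min_cadd


def triage(variants: List[AnnotatedVariant]) -> List[AnnotatedVariant]:
    return [v for v in variants if is_clinically_relevant(v)]
```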

4.4 Agent Inference: LLM Selection and HIPAA Pre-Processing

GenoClaw uses a three-tier LLM failover chain, all served via Cloudflare Workers AI:

  1. Primary: GPT-OSS-120B at $0.011 per 1,000 tokens
  2. Fallback 1: NVIDIA Nemotron 120B (open-weight)
  3. Fallback 2: Meta Llama 3.3 70B (open-weight)

All three models receive identical pre-processed input that has been de-identified by the HIPAA Safe Harbor Processor. The agent's system prompt encodes the relevant clinical context (de-identified condition list, medication list, and annotated variant summary) within the context window. The patient's natural language query is appended as the user message. The agent's response is returned to the patient through the NemoClaw sandbox; it is not stored by Cloudflare or any third-party LLM provider beyond the duration of the API call.
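The three-tier failover can be sketched as a loop that sends the same de-identified prompt to each tier in turn and returns the first successful reply. The model labels and the `call_model` transport are placeholders, because the production Workers AI identifiers and client code are not given here. Treating every exception as a trigger for failover is also an assumption. A real deployment might retry transient errors on the same tier first.

```python
from typing import Callable, Sequence

# Hypothetical labels for the three tiers; the exact Workers AI model
# identifiers used in production are not given in this paper.
MODEL_CHAIN: tuple[str, ...] = ("gpt-oss-120b", "nemotron-120b", "llama-3.3-70b")

# call_model(model, system_prompt, user_message) -> response text.
# Supplied by the caller (e.g. an HTTP client); it should raise on failure.
ModelCall = Callable[[str, str, str], str]


class AllModelsFailed(RuntimeError):
    """Raised when every tier in the chain has failed."""


def infer_with_failover(
    system_prompt: str,
    user_message: str,
    call_model: ModelCall,
    chain: Sequence[str] = MODEL_CHAIN,
) -> tuple[str, str]:
    """Send identical, already de-identified input to each tier in order.

    Returns (model_used, response_text) from the first tier that succeeds.
    """
    errors: list[str] = []
    for model in chain:
        try:
            return model, call_model(model, system_prompt, user_message)
        except Exception as exc:  # timeout, rate limit, 5xx, malformed reply...
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))
```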

5.1 BioWallet: The Patient Identity Substrate

Every GenoClaw instance is anchored to a patient's BioWallet, a Web3 wallet implementation that serves as the cryptographic identity substrate for all genomic data ownership operations. GenoBank.io supports five wallet integration methods, all of which converge on an EIP-712 typed-data signature that authenticates API requests.

All wallet addresses are stored in checksummed form (EIP-55) throughout the system. Session credentials are persisted in localStorage rather than sessionStorage, so they remain available across browser tabs and restarts without requiring re-signature (sessionStorage survives a page reload but is cleared when the tab is closed). The environment configuration system (js/env.js) automatically selects staging or production API endpoints based on the URL path prefix, enabling transparent testing without code modification.

Traditional informed consent is a static, binary event: a patient signs a form once, and the consent persists until revoked. GenoBank.io's consent model is metamorphic: consent transforms from a static permission grant into an ongoing economic relationship through the combination of BioNFTs, Shapley value attribution, and Biodata Dividends.

The metamorphic consent lifecycle operates as follows:

  1. Initial Consent Event: Patient mints a BioNFT on Avalanche C-Chain and attaches a BioPIL license specifying permitted uses, price per access, and revenue share percentage.
  2. Active Research Access: Each time a researcher accesses the patient's data via x402 BioRouter, a micropayment is collected, a usage record is written to the Sequentias ledger, and a revenue share is distributed to the patient's wallet.
  3. Attribution Accumulation: As the patient's data contributes to AI model training or research publications, Shapley value calculations (computed off-chain and anchored to the Sequentias ledger) quantify the marginal contribution of the patient's specific data to each downstream output.
  4. Dynamic Terms Update: The patient may update their BioPIL terms at any time — changing the price, restricting permitted uses, or granting exclusive access to a specific researcher — without invalidating the Story Protocol attribution record.
  5. Consent Revocation: The patient burns the BioPIL token, immediately blocking all new access. Historical usage records and attribution data remain on-chain for audit purposes.
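Steps 2, 4, and 5 of the lifecycle can be modeled as a small state machine. Access is gated on the current terms, terms can change while the license is live, and burning blocks new access without erasing the audit trail. This is an in-memory sketch only. `BioPilLicense`, its fields, and the integer revenue-share arithmetic are hypothetical. They stand in for logic that the paper places on-chain (Avalanche C-Chain and the Sequentias ledger).

```python
from dataclasses import dataclass, field

_MUTABLE_TERMS = {"permitted_uses", "price_per_access", "revenue_share_pct"}


@dataclass
class BioPilLicense:
    """Hypothetical in-memory model of one BioPIL license (not the contract layout)."""
    permitted_uses: set[str]
    price_per_access: int      # in the payment token's smallest unit (assumed)
    revenue_share_pct: int     # patient's share of each payment, 0-100
    burned: bool = False
    usage_log: list[dict] = field(default_factory=list)  # append-only audit trail

    def _require_active(self) -> None:
        if self.burned:
            raise PermissionError("license revoked")

    def record_access(self, researcher: str, use: str, payment: int) -> int:
        """Step 2: gate one access, log it, and return the patient's share."""
        self._require_active()
        if use not in self.permitted_uses:
            raise PermissionError(f"use {use!r} not permitted")
        if payment < self.price_per_access:
            raise ValueError("payment below the licensed price")
        share = payment * self.revenue_share_pct // 100
        self.usage_log.append(
            {"researcher": researcher, "use": use, "payment": payment, "patient_share": share}
        )
        return share

    def update_terms(self, **changes) -> None:
        """Step 4: change price, permitted uses or share; past log entries are untouched."""
        self._require_active()
        unknown = set(changes) - _MUTABLE_TERMS
        if unknown:
            raise AttributeError(f"not a license term: {sorted(unknown)}")
        for name, value in changes.items():
            setattr(self, name, value)

    def burn(self) -> None:
        """Step 5: block all new access; usage_log is kept for audit."""
        self.burned = True
```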

This metamorphic structure means that a patient who consented in 2024 does not simply have a permission form on file — they have an active economic instrument that generates ongoing returns as their data continues to provide value to the research community.

5.3 x402 BioRouter: Agent-to-Agent Data Commerce

The long-term vision of GenoBank.io is a galaxy of patient-owned AI agents, each carrying its own genomic and clinical dataset, interacting with researcher AI agents through the x402 BioRouter protocol. The economic mechanics of this network work as follows.

A researcher's AI agent — running autonomously as part of a drug discovery pipeline — issues a GET request to the BioRouter with a query BioCID describing the type of data required (e.g., vcf/BRCA1/pathogenic/female/50-65). The BioRouter returns a list of matching patient data assets with their price and license terms. The researcher agent evaluates the terms programmatically against its research protocol's consent requirements, authorizes payment for those assets that match, and receives time-limited access tokens. The entire transaction — discovery, negotiation, payment, access, attribution — is automated, auditable, and patient-controlled.
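The discovery-and-evaluation half of that transaction can be sketched in two steps. First, parse the query BioCID into structured filters. Second, select the returned offers whose license terms permit the researcher's protocol. The field order (data type, gene, significance, sex, age range) is inferred from the single example `vcf/BRCA1/pathogenic/female/50-65` and is an assumption. The `AssetOffer` shape and the cheapest-first budget rule are also assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BioCidQuery:
    data_type: str      # e.g. "vcf"
    gene: str           # e.g. "BRCA1"
    significance: str   # e.g. "pathogenic"
    sex: str            # e.g. "female"
    age_min: int
    age_max: int


def parse_query_biocid(biocid: str) -> BioCidQuery:
    """Parse 'type/gene/significance/sex/min-max' (field order assumed from the example)."""
    parts = biocid.strip("/").split("/")
    if len(parts) != 5:
        raise ValueError(f"expected 5 path segments, got {len(parts)}")
    data_type, gene, significance, sex, ages = parts
    age_min, age_max = (int(x) for x in ages.split("-"))
    if age_min > age_max:
        raise ValueError("empty age range")
    return BioCidQuery(data_type, gene, significance, sex, age_min, age_max)


@dataclass(frozen=True)
class AssetOffer:
    """One match returned by the BioRouter (hypothetical shape)."""
    asset_id: str
    price: int
    permitted_uses: frozenset[str]


def select_offers(offers: list[AssetOffer], protocol_use: str, budget: int) -> list[AssetOffer]:
    """Keep offers whose license permits the protocol's use, cheapest first, within budget."""
    chosen: list[AssetOffer] = []
    spent = 0
    for offer in sorted(offers, key=lambda o: o.price):
        if protocol_use not in offer.permitted_uses:
            continue  # license terms incompatible with the research protocol
        if spent + offer.price > budget:
            break  # offers are sorted, so no later offer fits either
        chosen.append(offer)
        spent += offer.price
    return chosen
```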

This is not a hypothesis or a design proposal. The x402 BioRouter protocol is implemented and deployed on GenoBank.io's production infrastructure as of March 2026, with the Sequentias Network (chain ID 15132025) serving as the settlement and audit ledger.
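The payment step follows the HTTP 402 pattern of x402.[18] The in-process simulation below is loosely modeled on the x402 v1 exchange as we understand it. An unpaid request receives 402 with an `accepts` list of payment requirements, and the client retries with a base64-encoded JSON payment in an `X-PAYMENT` header. All addresses, amounts, the `sequentias` network label, and the token response are hypothetical. The payment check is a placeholder for the signature verification and settlement that a real facilitator performs.

```python
import base64
import json

# Requirements advertised in the simulated 402 reply. Field names follow x402 v1
# "accepts" entries; every value here is hypothetical.
REQUIREMENTS = {
    "scheme": "exact",
    "network": "sequentias",        # hypothetical network label
    "maxAmountRequired": "1000",    # price in token base units
    "payTo": "0xPatientWallet",     # placeholder for the patient's address
    "resource": "/biorouter/assets/asset-123",
}


def biorouter(path: str, headers: dict[str, str]) -> tuple[int, dict]:
    """Simulated BioRouter endpoint: 402 without payment, 200 with an acceptable one."""
    if path != REQUIREMENTS["resource"]:
        return 404, {"error": "unknown asset"}
    encoded = headers.get("X-PAYMENT")
    if encoded is None:
        return 402, {"x402Version": 1, "accepts": [REQUIREMENTS]}
    auth = json.loads(base64.b64decode(encoded))["payload"]["authorization"]
    # Placeholder check only: a real server has a facilitator verify the signed
    # authorization and settle it before releasing the resource.
    if auth["to"] != REQUIREMENTS["payTo"] or int(auth["value"]) < int(REQUIREMENTS["maxAmountRequired"]):
        return 402, {"x402Version": 1, "accepts": [REQUIREMENTS], "error": "invalid payment"}
    return 200, {"access_token": "demo-token", "expires_in": 3600}  # time-limited grant


def fetch_with_payment(path: str, build_payment) -> dict:
    """Researcher-agent side: request, read the 402 requirements, pay, retry once."""
    status, body = biorouter(path, {})
    if status == 200:
        return body
    if status != 402:
        raise LookupError(body.get("error", f"HTTP {status}"))
    payment = build_payment(body["accepts"][0])  # agent wallet logic (hypothetical)
    header = base64.b64encode(json.dumps(payment).encode()).decode()
    status, body = biorouter(path, {"X-PAYMENT": header})
    if status != 200:
        raise PermissionError(body.get("error", f"HTTP {status}"))
    return body


def pay_in_full(req: dict) -> dict:
    # Unsigned stand-in for an x402 "exact" payment payload.
    return {"x402Version": 1, "scheme": req["scheme"], "network": req["network"],
            "payload": {"authorization": {"to": req["payTo"], "value": req["maxAmountRequired"]}}}


print(fetch_with_payment("/biorouter/assets/asset-123", pay_in_full))
# {'access_token': 'demo-token', 'expires_in': 3600}
```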

The following results are from validated production testing of the complete GenoClaw pipeline on real patient data (de-identified for this publication) as of March 2026. All figures represent actual measured values, not projections.

Headline metric | Value
Variant records (WGS VCF) | 2.87M
DeepVariant runtime (A100) | 99 sec
FHIR records imported | 1,463
Annotated variants | 351
Oncology annotators | 12
Agent skills | 9

6.1 Genomic Processing Performance

Metric | Value | System | Notes
Variant records in output VCF | 2,870,000 | Clara Parabricks 4.6.0-1 + DeepVariant | WGS input at 30x mean coverage
Wall-clock variant calling time | 1 min 39 sec (99 sec) | NVIDIA A100 (Google Compute Engine) | gcsfuse-mounted BAM, no local copy
Annotated variants (OpenCRAVAT) | 351 | OpenCRAVAT 2.13.0, 12-annotator panel | Filtered to PASS + ACMG P/LP/VUS
Annotation runtime | < 15 min | OpenCRAVAT (cravat.genobank.app) | 12 annotators in parallel
GCS storage cost (30x WGS BAM) | ~$0.023 / month | Google Cloud Storage Nearline | ~60 GB compressed BAM

6.2 Clinical Data Integration

FHIR resource type | Count | Source / coding
Total records | 1,463 | UCSF Health (Epic MyChart FHIR R4)
Condition (diagnoses) | 42 | ICD-10 coded
MedicationRequest | 101 | Active + historical
Observation (labs) | 441 | LOINC coded
Observation (vitals) | 290 | LOINC coded
DiagnosticReport | 67 | Imaging + pathology
Immunization | 31 | CVX coded
Procedure | 88 | CPT coded
AllergyIntolerance | 12 | SNOMED coded
Other resource types | 391 | Mixed
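A breakdown like the one above can be derived directly from the imported Bundle. The sketch below counts resources by type and splits Observations into labs and vitals using the standard HL7 `observation-category` codes `laboratory` and `vital-signs`. How GenoBank.io actually bins Observations is not stated, including Observations that carry neither code, so the `Observation (other)` bucket is an assumption.

```python
from collections import Counter

OBS_CATEGORY_SYSTEM = "http://terminology.hl7.org/CodeSystem/observation-category"


def tally_resources(bundle: dict) -> Counter:
    """Count resources in a FHIR R4 Bundle, splitting Observations by category."""
    counts: Counter = Counter()
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        rtype = resource.get("resourceType", "Unknown")
        if rtype == "Observation":
            codes = {
                coding.get("code")
                for category in resource.get("category", [])
                for coding in category.get("coding", [])
                if coding.get("system") == OBS_CATEGORY_SYSTEM
            }
            if "vital-signs" in codes:
                rtype = "Observation (vitals)"
            elif "laboratory" in codes:
                rtype = "Observation (labs)"
            else:
                rtype = "Observation (other)"
        counts[rtype] += 1
    return counts
```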

6.3 Agent Capability Demonstration

With the combined clinical-genomic knowledge graph loaded, GenoClaw's agent capabilities (nine skills exposed through the natural language interface) were exercised in production testing.

Key Result

The complete pipeline — from BAM upload through variant calling, FHIR import, annotation, and agent-ready knowledge graph construction — was validated end-to-end on production infrastructure in March 2026. Processing time from upload to first agent query: under 30 minutes for whole-genome data.

7.1 Regulatory Implications

The GDPR and CCPA compliance posture of GenoClaw differs fundamentally from that of conventional health data platforms. Conventional platforms are data controllers in the full GDPR sense. Patient authorization exists only as a legal document (the consent form), while the data resides on the company's infrastructure and the company makes the operational decisions about its use.

In GenoClaw's architecture, the patient is the data controller in the GDPR sense: they decide where data is stored (their GCS bucket), who can access it (BioPIL terms), on what conditions (price, permitted uses, duration), and they hold the technical key to revoke access (burning the BioPIL token). GenoBank.io is a software provider, not a data controller. This structural difference significantly reduces GenoBank.io's GDPR compliance burden while increasing the actual privacy protection available to patients.

The HIPAA Safe Harbor de-identification applied before LLM processing ensures that even if a language model provider were subpoenaed or breached, the data they hold contains no PHI — it is legally de-identified under 45 CFR § 164.514(b) and thus not subject to HIPAA's breach notification provisions.
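A few of the Safe Harbor rules in 45 CFR § 164.514(b)(2) are mechanical enough to show in code. ZIP codes are cut to their first three digits, or to `000` when that three-digit area has 20,000 or fewer residents. Dates tied to the individual keep only the year. Ages over 89 collapse into a single 90-or-older category. The sketch below covers only these rules plus a regex pass for e-mail addresses and phone numbers. It is not GenoBank.io's processor, and its function names are hypothetical. The remaining identifier categories (names, record numbers, device identifiers, and so on) need separate handling. The low-population ZIP prefix list is passed in because it must come from current Census data.

```python
import re
from datetime import date

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")  # NANP-style, separators required


def generalize_zip(zip_code: str, low_population_prefixes: set[str]) -> str:
    """Keep the first three ZIP digits, or '000' if that three-digit area has
    20,000 or fewer people (the caller supplies that list from Census data)."""
    prefix = zip_code[:3]
    return "000" if prefix in low_population_prefixes else prefix


def generalize_birth_date(birth: date, today: date) -> dict[str, str]:
    """Reduce a birth date to year and age; ages over 89 collapse to '90+'."""
    age = today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))
    if age > 89:
        # The birth year would reveal the age, so it is suppressed as well.
        return {"age": "90+", "birth_year": ""}
    return {"age": str(age), "birth_year": str(birth.year)}


def generalize_event_date(event: date) -> str:
    """Other dates tied to the individual (admission, service, etc.) keep only the year."""
    return str(event.year)


def scrub_contact_details(text: str) -> str:
    """Replace e-mail addresses and phone numbers in free text with placeholders."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))
```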

7.1.1 BioNFTs and HIPAA: The Patient Is Not a Covered Entity

HIPAA's Privacy Rule (45 CFR Parts 160 and 164) was designed to regulate "covered entities" — healthcare providers, health plans, and healthcare clearinghouses — that create, receive, maintain, or transmit Protected Health Information (PHI). Critically, patients are not covered entities under HIPAA. A patient who holds their own health data in their own BioNFT-gated vault is not subject to HIPAA's restrictions on data use and disclosure.

This creates a fundamental architectural advantage: when a patient imports their Epic MyChart records into their BioWallet via the FHIR Patient Access API (exercising their Individual Right of Access under 45 CFR § 164.524), the data transitions from HIPAA-regulated space (the hospital's EHR system) to patient-controlled space (the BioNFT-gated GCS bucket). The patient may then share, analyze, license, or monetize their data without the restrictions that would apply to a covered entity.

The BioNFT serves as the cryptographic proof of this transition: the token's ownership record on Avalanche C-Chain establishes that the data is held by a patient wallet (not a covered entity's system), and the BioPIL license terms on Sequentias encode the patient's autonomous decisions about who may access their data and under what conditions.

7.1.2 BioNFTs and GDPR: Data Controller by Design

Under GDPR Article 4(7), a "controller" is the natural or legal person that determines the purposes and means of processing personal data. In traditional health data systems, the hospital or company is the controller, and the patient is the "data subject" — a passive beneficiary of regulatory protections. BioNFTs invert this relationship:

GDPR right | Traditional implementation | BioNFT implementation
Art. 15: Right of Access | Patient submits a formal request; company must respond within one month | Patient reads their own GCS bucket at any time (wallet key = access key)
Art. 17: Right to Erasure | Patient submits a deletion request; company must verify identity and process it | Patient deletes files from their GCS bucket or burns the BioNFT
Art. 20: Right to Portability | Company exports data in a structured, commonly used, machine-readable format within one month | Data is already in the patient's vault in standard formats (FHIR JSON, VCF)
Art. 7(3): Right to Withdraw Consent | Patient contacts company; company updates internal database | Patient burns BioPIL token on Sequentias; access instantly revoked on-chain
Art. 25: Data Protection by Design | Company implements technical measures as a policy commitment | BioNFT-gated access is the architecture itself, not a policy overlay
Art. 30: Records of Processing | Company maintains internal logs; subject to audit | All access events recorded on an immutable blockchain; publicly auditable

The key insight is that BioNFTs do not merely comply with GDPR — they make the regulatory framework structurally unnecessary for data the patient controls. When the patient is simultaneously the data subject AND the data controller, the protective provisions of GDPR that exist to shield subjects from controllers become redundant. The regulations remain applicable to GenoBank.io as a software provider (processor), but the scope of regulated processing is dramatically narrower because the patient — not GenoBank.io — holds and controls the data.

7.1.3 BioNFTs and CCPA/CPRA: Opt-Out by Architecture

The California Consumer Privacy Act (CCPA), as amended by the California Privacy Rights Act (CPRA), grants consumers the right to opt out of the sale of their personal information (Cal. Civ. Code § 1798.120). In the BioNFT model, data is never sold by GenoBank.io because GenoBank.io never possesses or controls the data. When a researcher accesses patient data via x402 BioRouter, the payment flows directly from the researcher's wallet to the patient's wallet — GenoBank.io is not a party to the data transaction. The patient is not "opting out" of data sales; they are the seller, setting their own price via BioPIL terms.

CPRA's category of "Sensitive Personal Information," which explicitly includes genetic data (Cal. Civ. Code § 1798.140(ae)), carries additional protections, including the consumer's right to limit its use and disclosure (Cal. Civ. Code § 1798.121). In the BioNFT model, sensitive genetic data never resides on any company's infrastructure in identified form. The only entity with access to identified genetic data is the patient themselves, via their BioWallet.

7.1.4 BioNFTs and GINA: Discrimination Protection Through Opacity

The Genetic Information Nondiscrimination Act of 2008 (GINA) prohibits health insurers (Title I) and employers (Title II) from using genetic information in coverage, underwriting, or employment decisions. However, GINA has a fundamental weakness: it assumes that genetic data flows through institutional channels where it could be accessed by insurers or employers.

BioNFTs add a complementary protection layer: because the patient's genomic data is stored in a BioNFT-gated vault accessible only via the patient's wallet key, it is technically impossible for an insurer or employer to access the data without the patient's explicit authorization (in the form of a BioPIL license grant). The Bloom filter mechanism enables variant-level queries without exposing the underlying data, so a patient can participate in research studies that query for specific variants (e.g., "do you carry BRCA1 c.5266dupC?") without revealing their complete genetic profile to anyone.

This creates discrimination protection by design: even if GINA's legal protections were weakened or repealed, the cryptographic access control of the BioNFT vault would still prevent unauthorized access to genetic information.
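The Bloom filter mechanism[13] works because a filter answers membership queries with no false negatives and a tunable false-positive rate. A "no" is definitive, while a "yes" means "probably". A generic sketch follows, using the standard sizing formulas and double hashing over SHA-256. The variant key format (`gene:HGVS`) and the capacity figures are hypothetical. The paper does not specify how GenoBank.io's filters are parameterized.

```python
import hashlib
import math


class BloomFilter:
    """Bit-array Bloom filter sized for `capacity` items at false-positive rate `fp_rate`."""

    def __init__(self, capacity: int, fp_rate: float) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hashes.
        self.m = math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Double hashing: position_i = (h1 + i * h2) mod m, with h1 and h2 taken
        # from one SHA-256 digest; h2 is forced odd so it is never zero.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        """False means definitely absent; True means present or a false positive."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical key format: gene plus HGVS cDNA notation.
patient_variants = BloomFilter(capacity=5_000_000, fp_rate=1e-6)
patient_variants.add("BRCA1:c.5266dupC")
print(patient_variants.might_contain("BRCA1:c.5266dupC"))  # True
```

Each query touches only k bit positions, so a researcher can ask about one variant without receiving the variant list. The false-positive rate (here 10⁻⁶) also means a positive answer is not, on its own, proof of carrier status.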

7.1.5 BioNFTs and the 21st Century Cures Act: Patient Access Fulfilled

The 21st Century Cures Act (2016), together with the ONC Cures Act Final Rule[5] and the CMS Interoperability and Patient Access Final Rule (CMS-9115-F), requires that patients be able to access their electronic health information "without special effort." The FHIR R4 Patient Access API implemented by Epic MyChart is the technical fulfillment of this mandate. GenoClaw's Epic FHIR integration takes this mandate to its logical conclusion: the patient not only accesses their data but also imports it into a sovereign vault where they can analyze it with AI, annotate it with genomic databases, and license it for research.

The BioNFT serves as the patient's receipt of data sovereignty: it proves that the patient exercised their Cures Act right to access their data, imported it into their own infrastructure, and now holds it as a cryptographically verified asset. This is the first system that treats the Cures Act not as a regulatory checkbox (providing an API endpoint) but as a genuine transfer of data ownership from institution to patient.

7.2 Limitations and Current Constraints

The following limitations are acknowledged:

7.3 Comparison with Competing Approaches

Feature | GenoClaw | Federated Learning | Traditional EHR AI | Personal Health Record Apps
Patient data ownership | Cryptographic (BioNFT) | Company (gradient server) | Company / Health System | Nominal (ToS-dependent)
Data quality | Complete, authentic | Degraded (gradient approx.) | Complete (inaccessible) | Variable
Patient attribution | On-chain, permanent | None | None | None
Revenue sharing | Automatic micropayment | None | None | None
Consent revocation | Cryptographic (token burn) | Impossible after incorporation | Legal request only | Account deletion
GDPR Art. 17 compliance | Architectural | Structural conflict | Policy + legal | Policy
Bankruptcy protection | Patient holds keys | None | None | None
AI analysis capability | Full (9 skills) | Aggregate only | Platform-dependent | Basic

7.4 Future Work

Several extensions to GenoClaw are in active development or planned for the near term:

GenoClaw demonstrates that patient ownership of AI health agents is not a theoretical aspiration but a deployable, production-validated architecture. The system calls variants from whole-genome sequencing data in under two minutes, integrates over 1,400 structured clinical records from live EHR systems, annotates variants against twelve oncology knowledge bases, and exposes nine bioinformatics skills through a natural language interface. It does all of this while maintaining cryptographic patient data sovereignty, HIPAA-compliant de-identification, and on-chain consent audit trails.

The 23andMe bankruptcy was not an anomaly. It was the natural endpoint of a custodial data model that treats patient genomic information as a corporate asset. Any company that holds patient genomic data as its primary asset is a company whose interests are structurally misaligned with those of its patients, regardless of its privacy policy, its consent forms, or its regulatory compliance posture.

The Decentralized Virtual Bioinformatic Machine architecture eliminates the structural misalignment by ensuring that patient data never becomes a company asset in the first place. The patient's BioWallet is not a privacy policy — it is a cryptographic key. The patient's BioPIL license is not a consent form — it is a smart contract. The patient's Biodata Dividend is not a loyalty program — it is an automated payment from blockchain settlement.

GenoBank.io's vision is a galaxy of patient-owned AI agents, each a complete Decentralized Virtual Bioinformatic Machine, interacting with researchers, clinicians, and AI systems through transparent, auditable, and economically fair protocols. GenoClaw is the first instance of this galaxy. The x402 BioRouter is the infrastructure that will connect them.

Closing Principle

Privacy is not about hiding data or making it fuzzy. Privacy is about giving patients complete control over their authentic, high-quality data, with full transparency about its use and fair compensation for its value.

References

[1] 23andMe Holding Co. (2025). Voluntary petition for relief under Chapter 11, United States Bankruptcy Court, Eastern District of Missouri, Case No. 25-XXXXX. Filed March 23, 2025.
[2] Health Insurance Portability and Accountability Act of 1996 (HIPAA), Pub. L. No. 104-191, 110 Stat. 1936 (1996). 45 CFR Parts 160 and 164.
[3] European Parliament and Council of the European Union. (2016). Regulation (EU) 2016/679 — General Data Protection Regulation. Official Journal of the European Union L 119/1. May 4, 2016.
[4] California Consumer Privacy Act of 2018, Cal. Civ. Code §§ 1798.100–1798.199.100 (2018), as amended by California Privacy Rights Act of 2020.
[5] Office of the National Coordinator for Health Information Technology. (2020). 21st Century Cures Act: Interoperability, Information Blocking, and the ONC Health IT Certification Program. Final Rule. 85 Fed. Reg. 25642 (May 1, 2020).
[6] McMahan, B., Moore, E., Ramage, D., Hampson, S., & Arcas, B. A. (2017). Communication-efficient learning of deep networks from decentralized data. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 54, 1273-1282.
[7] Ziller, A., Gruber, T., Bernhardt, J., Hammerla, N., Rueckert, D., Buettner, M., & Kaissis, G. (2024). Privacy considerations for sharing genomic data and biospecimens for research. Nature Medicine, 30, 1228-1234.
[8] Boneh, D., Boyen, X., & Halevi, S. (2006). Chosen ciphertext secure public key threshold encryption without random oracles. In Topics in Cryptology — CT-RSA 2006. Lecture Notes in Computer Science, vol. 3860.
[9] Poplin, R., Chang, P. C., Alexander, D., Schwartz, S., Colthurst, T., Ku, A., Newburger, D., Dijamco, J., Nguyen, N., Afshar, P. T., Gross, S. S., Dorfman, L., McLean, C. Y., & DePristo, M. A. (2018). A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology, 36(10), 983-987. https://doi.org/10.1038/nbt.4235
[10] Uribe, D. (2022). US Patent Application: Systems and methods for biospecimen tokenization using non-fungible tokens on a blockchain and family digital wallet. USPTO Patent No. [Patent No. on file]. Washington, DC: United States Patent and Trademark Office.
[11] Entriken, W., Shirley, D., Evans, J., & Sachs, N. (2018). EIP-721: Non-Fungible Token Standard. Ethereum Improvement Proposals. https://eips.ethereum.org/EIPS/eip-721
[12] Story Protocol. (2024). Story Protocol: Programmable IP Infrastructure. Technical Whitepaper v1.0. https://docs.storyprotocol.xyz
[13] Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422-426. https://doi.org/10.1145/362686.362692
[14] Nellore, A., Jagannatham, R., Rao, A., & Langmead, B. (2016). Privacy-preserving genomic data sharing via Bloom filters. Bioinformatics, 32(12), i381-i389.
[15] Salaün, M., et al. (2022). Landlock: unprivileged access control. Linux Kernel Documentation. https://www.kernel.org/doc/html/latest/userspace-api/landlock.html. Kernel 5.13+.
[16] Edge, J. (2015). A seccomp overview. LWN.net. https://lwn.net/Articles/656307/. See also: Linux kernel seccomp(2) manual page.
[17] Department of Health and Human Services. (2012). Guidance regarding methods for de-identification of protected health information in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. 45 CFR § 164.514(b).
[18] Coinbase. (2025). x402: HTTP 402 Payment Required protocol specification for micropayment-gated API access. https://github.com/coinbase/x402
[19] Poplin, R., et al. (2018). A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology, 36(10), 983-987. NVIDIA Clara Parabricks implementation: https://developer.nvidia.com/clara-parabricks
[20] Mandel, J. C., Kreda, D. A., Mandl, K. D., Kohane, I. S., & Ramoni, R. B. (2016). SMART on FHIR: a standards-based, interoperable apps platform for electronic health records. Journal of the American Medical Informatics Association, 23(5), 899-908. https://doi.org/10.1093/jamia/ocv189
[21] Pagel, K. A., Kim, R., Moad, K., Busby, B., Zheng, L., Tokheim, C., Ryan, M., & Karchin, R. (2020). Integrated informatics analysis of cancer-related variants. JCO Clinical Cancer Informatics, 4, 310-317. https://doi.org/10.1200/CCI.19.00132
[22] Kingsmore, S. F., et al. (2024). Newborn screening using sequencing: BeginNGS pilot study. American Journal of Human Genetics. December 2024.
[23] Ziegler, A., et al. (2024). GUARDIAN study: Genome sequencing in the NICU — early experience and opportunities. Genetics in Medicine. December 2024.