System Catalog Query Optimization

Automated RBAC drift detection is only as trustworthy as the catalog reads that feed it. Every diff, every score, and every audit artifact downstream inherits the fidelity — and the latency — of the queries that pull privilege state out of a live database. When platform operations teams and database reliability engineers extract grants across a heterogeneous fleet, an unoptimized catalog scan is not a minor inefficiency: it is a correctness and availability hazard. A full sweep of information_schema against a production instance holding tens of thousands of roles and hundreds of thousands of object grants can hold a read transaction open for minutes, pin a connection from an already-strained pool, and return a snapshot that no longer reflects the state the diff engine believes it captured.

The failure scenario is concrete. A nightly extraction job issues an unpredicated SELECT against information_schema.role_table_grants on every instance in parallel. On the largest instance the query plan degenerates to a sequential scan over the underlying pg_class and pg_attribute relations; the job exceeds its statement timeout, the extraction worker retries, and the retry storm exhausts the connection pool that the application also depends on. The drift run either aborts with a partial dataset — silently under-reporting real privilege changes — or completes so late that the resulting evidence artifact carries a capture timestamp auditors will not accept as “current.” Optimizing catalog queries is therefore a foundational control, not a performance nicety. This work sits at the front of the pipeline defined in Cross-Environment Privilege Extraction & Parsing, and every technique below exists to make catalog interrogation fast enough to run continuously and precise enough to be legally defensible.

Why Naive Catalog Sweeps Break Drift Pipelines

Three distinct failure modes make broad metadata sweeps unusable at scale, and each one maps to a specific optimization discipline covered later on this page.

Plan degeneration. information_schema views are portable SQL abstractions layered over the native catalogs. They join and filter far more than a drift extraction needs, and their predicates rarely align with the indexes on the underlying pg_catalog or data-dictionary relations. The planner often cannot push a selective filter all the way down through the view's joins and unions, so a query that “should” touch a few hundred rows scans millions.
Connection pressure. A long-running catalog read holds a connection and, under some isolation levels, a snapshot that pins vacuum. Fan out that read across an entire fleet concurrently and the extraction tier competes with production traffic for the same finite pool.
Snapshot skew. The longer a sweep runs, the wider the window between the first row read and the last. A drift diff assumes a coherent point-in-time capture; a slow scan violates that assumption and produces deltas that are artifacts of timing rather than real change.

The remedy is the same in every engine: anchor each query to an indexed principal identifier, filter by object type and scope at the source, unpack native privilege structures instead of re-deriving them through portable views, and bind the grantee as a parameter so the planner can reuse a cached plan. Where a single instance is too large for one synchronous read, partition the work and hand it to the batching layer described in Async Privilege Batching.

Prerequisites and Catalog Read Permissions

Before implementing the queries below, confirm the following in each target environment. Getting the permission grants wrong is the single most common reason an extraction job silently returns an incomplete privilege set.

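PostgreSQL 12+. aclexplode() and acldefault() are available in all supported versions. The extraction role needs no superuser rights. By default every role can read pg_class, pg_namespace, pg_auth_members, and the pg_roles view, and relacl is returned for every relation regardless of the reader’s own privileges. The under-reporting risk lies in the portable views: information_schema.role_table_grants and its siblings only return grants whose grantor or grantee is a role enabled for the current session, so an auditor account reading them sees a partial privilege set without any error. Predefined roles such as pg_monitor and pg_read_all_stats control access to statistics and settings, not ACL visibility, so they do not widen what these catalog queries return. If a hardened deployment has revoked the default read access to pg_catalog, restore SELECT on the specific catalogs the extraction role queries.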
Oracle 12c+. Reading DBA_TAB_PRIVS, DBA_ROLE_PRIVS, and DBA_SYS_PRIVS requires SELECT ANY DICTIONARY or the SELECT_CATALOG_ROLE. Prefer a named auditor account with exactly these grants over reusing an application schema.
MySQL 8.0+. Role edges live in mysql.role_edges; object grants live in mysql.tables_priv, mysql.db, and the information_schema.*_privileges views. The extraction account needs SELECT on the mysql system schema (or the ROLE_ADMIN and SELECT privileges granted through an auditor role). Roles are a MySQL 8.0 feature; on 5.7 there is no role graph to extract.
Python 3.11+ with asyncpg (PostgreSQL), python-oracledb in thin mode (Oracle), and aiomysql or asyncmy (MySQL). Pin these in your lockfile — catalog column names and driver defaults do shift between major versions.
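A read-only transaction discipline: every extraction should run inside a read-only transaction (START TRANSACTION READ ONLY, SET TRANSACTION READ ONLY as the first statement of the transaction, or the driver equivalent) so the catalog read can never mutate state, no matter what a code path does. In PostgreSQL, SET TRANSACTION affects only the current transaction, so issuing it outside a transaction block has no lasting effect.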
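
As a minimal sketch of that discipline, the mapping below pairs each engine with a statement that puts the connection or transaction into read-only mode. READ_ONLY_SQL and read_only_statement are hypothetical names introduced for illustration; the statements themselves are standard for each engine, but whether a session-wide or per-transaction form suits a given driver and pooler is an assumption to verify in your environment.

```python
# Hypothetical helper: engine label -> statement that enforces read-only access.
READ_ONLY_SQL = {
    # Session-wide default for every later transaction on this connection.
    "postgresql": "SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY",
    # Session-wide access mode for subsequent transactions.
    "mysql": "SET SESSION TRANSACTION READ ONLY",
    # Transaction-scoped: must be the first statement of each transaction.
    "oracle": "SET TRANSACTION READ ONLY",
}


def read_only_statement(engine: str) -> str:
    """Return the read-only setup statement for a supported engine label."""
    try:
        return READ_ONLY_SQL[engine.lower()]
    except KeyError:
        raise ValueError(f"unsupported engine: {engine!r}") from None
```

Failing loudly on an unknown engine label is deliberate: a connection that silently skips the read-only setup would weaken the safety contract without anyone noticing.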

Building the Optimized Extraction Queries

The following four steps produce a coherent, indexed extraction path for each engine. Each query is written to run under a bound parameter and a READ ONLY transaction, and each returns rows already shaped for the normalizer that hands off to Schema Validation Pipelines.

Step 1 — Targeted PostgreSQL grant extraction with aclexplode

Rather than joining through information_schema.role_table_grants, read the native ACL array directly from pg_class.relacl and expand it with aclexplode(). This avoids the view’s internal catalog scans and, when the grantee is bound as a parameter, lets the planner resolve the role through the unique index on pg_authid.rolname (pg_roles is a view over pg_authid).

-- Extract object-level grants for a specific grantee without full catalog sweeps.
-- pg_class.oid is indexed; the nspname filter on pg_namespace is cheap.
SELECT
    r.rolname                          AS grantee,
    n.nspname                          AS table_schema,
    c.relname                          AS table_name,
    acl.privilege_type,
    acl.is_grantable
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
JOIN pg_catalog.pg_roles r     ON r.rolname = $1   -- bind the grantee
-- Unnest the ACL array to get one row per privilege.
CROSS JOIN LATERAL aclexplode(COALESCE(c.relacl, acldefault('r', c.relowner))) AS acl(
    grantor, grantee_oid, privilege_type, is_grantable
)
WHERE acl.grantee_oid = r.oid
  AND c.relkind IN ('r', 'v', 'm')   -- tables, views, materialized views
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY n.nspname, c.relname, acl.privilege_type;
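
To make the expansion concrete, here is a pure-Python illustration (not PostgreSQL's implementation) of how one aclitem in its text form, grantee=privileges/grantor, maps to per-privilege rows. It uses role names where aclexplode() actually returns OIDs. explode_aclitem and ACL_LETTERS are hypothetical names, and the parser assumes plain, unquoted role names.

```python
# Privilege letters used in the text form of a table-level aclitem.
ACL_LETTERS = {
    "r": "SELECT", "w": "UPDATE", "a": "INSERT", "d": "DELETE",
    "D": "TRUNCATE", "x": "REFERENCES", "t": "TRIGGER",
    "m": "MAINTAIN",  # PostgreSQL 17 and later
}


def explode_aclitem(item: str) -> list[dict]:
    """Expand 'grantee=privs/grantor' into one dict per privilege.

    Assumes unquoted role names; an empty grantee means PUBLIC.
    """
    grantee, rest = item.split("=", 1)
    privs, grantor = rest.rsplit("/", 1)
    rows = []
    for i, ch in enumerate(privs):
        if ch == "*":
            continue  # '*' marks grant option on the preceding letter
        rows.append({
            "grantor": grantor,
            "grantee": grantee or "PUBLIC",
            "privilege_type": ACL_LETTERS[ch],
            "is_grantable": privs[i + 1 : i + 2] == "*",
        })
    return rows
```

For example, explode_aclitem("ci_reader=r*w/app_owner") yields a grantable SELECT row and a non-grantable UPDATE row, which is the same one-row-per-privilege shape the SQL above produces.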

aclexplode() unpacks the native ACL stored in pg_class.relacl, so the query never touches the heavier information_schema path. Binding the grantee as $1 lets the planner reuse a prepared plan and look the role up through the rolname index on pg_authid (the table behind pg_roles) instead of scanning the role catalog. Verify the plan before shipping:

EXPLAIN (ANALYZE, BUFFERS)
SELECT 1
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_roles r ON r.rolname = 'ci_reader'
CROSS JOIN LATERAL aclexplode(COALESCE(c.relacl, acldefault('r', c.relowner))) acl
WHERE acl.grantee_oid = r.oid;
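-- Expect an Index Scan (or Index Cond) on pg_authid, the table behind pg_roles,
-- when the role catalog is large. A Seq Scan over pg_class is normal here because
-- relacl is not indexed; compare its buffer counts against the information_schema path.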
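
A plan check like this can be automated. The sketch below assumes the plan was captured with EXPLAIN (FORMAT JSON) and already decoded into Python objects (for example with json.loads), then walks the node tree using the standard "Node Type", "Relation Name", and "Plans" keys. scan_nodes and unexpected_seq_scans are hypothetical helper names, and the allow-list of relations is a policy choice, not something the source prescribes.

```python
def scan_nodes(plan: dict) -> list[tuple[str, str]]:
    """Collect (node type, relation name) for every node that reads a relation."""
    found = []
    if "Relation Name" in plan:
        found.append((plan["Node Type"], plan["Relation Name"]))
    for child in plan.get("Plans", []):
        found.extend(scan_nodes(child))
    return found


def unexpected_seq_scans(plan: dict, allowed: set[str]) -> set[str]:
    """Return relations that were sequentially scanned but are not allow-listed."""
    return {
        rel
        for node_type, rel in scan_nodes(plan)
        if node_type == "Seq Scan" and rel not in allowed
    }
```

EXPLAIN (FORMAT JSON) returns a one-element array whose object holds the root node under "Plan", so a caller would pass decoded[0]["Plan"] and fail the build if the returned set is not empty.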

For the role graph itself — who is a member of whom — read pg_auth_members directly rather than reconstructing hierarchy from grants:

SELECT
    m.rolname   AS member,
    g.rolname   AS granted_role,
    a.admin_option
FROM pg_catalog.pg_auth_members a
JOIN pg_catalog.pg_roles m ON m.oid = a.member
JOIN pg_catalog.pg_roles g ON g.oid = a.roleid
ORDER BY member, granted_role;
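
Once the membership edges are extracted, the effective role set for a principal is a graph traversal. The following is a minimal sketch, assuming the rows have been reduced to (member, granted_role) pairs; effective_roles is a hypothetical helper, and it deliberately ignores inheritance settings such as NOINHERIT, which a real normalizer would need to honor.

```python
from collections import defaultdict, deque


def effective_roles(edges: list[tuple[str, str]], member: str) -> set[str]:
    """Return every role `member` holds directly or through nested membership.

    `edges` are (member, granted_role) pairs from the membership query.
    """
    direct = defaultdict(set)
    for m, g in edges:
        direct[m].add(g)

    seen: set[str] = set()
    queue = deque([member])
    while queue:
        # .get avoids creating empty entries for leaf roles.
        for role in direct.get(queue.popleft(), ()):
            if role not in seen:  # guards against revisiting shared ancestors
                seen.add(role)
                queue.append(role)
    return seen
```

Breadth-first traversal with a visited set keeps the work linear in the number of edges, even when several roles share the same parent.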

Step 2 — Predicate-pushed Oracle data-dictionary extraction

Oracle’s data dictionary views (DBA_TAB_PRIVS, DBA_ROLE_PRIVS, DBA_SYS_PRIVS) are unforgiving of unpredicated access — an open SELECT provokes a full scan of very large dictionary structures. Anchor every query on GRANTEE and, for object privileges, OWNER, and let cursor sharing amortize parse cost across grantees.

-- Object privileges for one grantee, scoped away from Oracle-maintained schemas.
SELECT
    grantee,
    owner,
    table_name,
    privilege,
    grantable
FROM dba_tab_privs
WHERE grantee = :grantee
  AND owner NOT IN ('SYS', 'SYSTEM', 'OUTLN', 'DBSNMP', 'XDB')
ORDER BY owner, table_name, privilege;

The full-page walkthrough in Extracting user grants from Oracle data dictionary shows how bind variables plus CURSOR_SHARING keep parse overhead flat when you iterate over thousands of grantees, and how to union DBA_ROLE_PRIVS and DBA_SYS_PRIVS into the same canonical row shape. Bind the grantee as :grantee — never interpolate it into the SQL text — so the shared pool caches a single plan instead of hard-parsing one statement per user.
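
One way to picture the shared row shape is a small mapping function applied to each fetched row. This is a sketch under stated assumptions: rows arrive as dicts keyed by the uppercase dictionary column names, and the output keys (kind, target, and so on) are a hypothetical canonical shape, not the one defined by the linked walkthrough.

```python
def shape_oracle_row(view: str, row: dict) -> dict:
    """Map one DBA_*_PRIVS row (column name -> value) to a shared record shape."""
    if view == "DBA_TAB_PRIVS":
        return {
            "grantee": row["GRANTEE"],
            "kind": "object",
            "target": f'{row["OWNER"]}.{row["TABLE_NAME"]}',
            "privilege": row["PRIVILEGE"],
            "is_grantable": row["GRANTABLE"] == "YES",
        }
    if view == "DBA_SYS_PRIVS":
        return {
            "grantee": row["GRANTEE"],
            "kind": "system",
            "target": None,
            "privilege": row["PRIVILEGE"],
            "is_grantable": row["ADMIN_OPTION"] == "YES",
        }
    if view == "DBA_ROLE_PRIVS":
        return {
            "grantee": row["GRANTEE"],
            "kind": "role",
            "target": None,
            "privilege": row["GRANTED_ROLE"],
            "is_grantable": row["ADMIN_OPTION"] == "YES",
        }
    raise ValueError(f"unsupported dictionary view: {view!r}")
```

Converting the 'YES'/'NO' flags to booleans at this point means later stages never have to know which engine a record came from.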

Step 3 — MySQL role-graph and object-grant extraction

MySQL 8.0 splits its RBAC state across the mysql system schema. The role graph lives in mysql.role_edges; object grants live in mysql.tables_priv and the information_schema privilege views. Bind the grantee and read the native tables directly.

-- Role membership edges for one account (MySQL 8.0+).
SELECT
    FROM_USER          AS granted_role,
    FROM_HOST          AS granted_role_host,
    TO_USER            AS grantee,
    TO_HOST            AS grantee_host,
    WITH_ADMIN_OPTION  AS admin_option
FROM mysql.role_edges
WHERE TO_USER = ?          -- bind the grantee account
ORDER BY FROM_USER;

-- Table-level object grants for one account, scoped away from system schemas.
SELECT
    `User`         AS grantee,
    `Db`           AS table_schema,
    `Table_name`   AS table_name,
    `Table_priv`   AS privileges     -- SET column: 'Select,Insert,Update'
FROM mysql.tables_priv
WHERE `User` = ?
  AND `Db` NOT IN ('mysql', 'sys', 'performance_schema', 'information_schema')
ORDER BY `Db`, `Table_name`;

The behavioral gap that trips normalizers: MySQL stores table privileges as a single SET column (Select,Insert,Update) that must be split into one row per privilege to match PostgreSQL’s already-per-privilege output. That split belongs in the Cross-DB Parser Adapters layer, not in the SQL, so each engine’s extraction query stays as thin and index-friendly as possible.
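
The split itself is small. The sketch below assumes the driver returns Table_priv as a comma-separated string (some drivers return a Python set instead) and that rows use the column aliases from the query above. split_table_priv is a hypothetical name, and treating the Grant member as the is_grantable flag rather than as a privilege of its own is an assumption about the canonical shape.

```python
def split_table_priv(row: dict) -> list[dict]:
    """Expand one mysql.tables_priv row into one record per privilege."""
    names = [p.strip() for p in (row["privileges"] or "").split(",") if p.strip()]
    # 'Grant' in the SET means the account may pass its table privileges on.
    grantable = any(p.lower() == "grant" for p in names)
    return [
        {
            "grantee": row["grantee"],
            "table_schema": row["table_schema"],
            "table_name": row["table_name"],
            "privilege_type": p.upper(),
            "is_grantable": grantable,
        }
        for p in names
        if p.lower() != "grant"
    ]
```

Uppercasing the privilege name aligns MySQL's 'Select' with the 'SELECT' that PostgreSQL and Oracle report, so the diff engine compares like with like.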

Step 4 — Wrapping the queries in a bounded async worker

Extraction rarely fits in a single synchronous transaction window across a fleet, so wrap each query in an async worker that binds parameters, enforces a READ ONLY transaction, and applies a hard statement timeout. This is the unit the batching layer schedules.

import asyncio
import asyncpg

EXTRACT_PG_GRANTS = """
SELECT r.rolname AS grantee, n.nspname AS table_schema,
       c.relname AS table_name, acl.privilege_type, acl.is_grantable
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
JOIN pg_catalog.pg_roles r     ON r.rolname = $1
CROSS JOIN LATERAL aclexplode(COALESCE(c.relacl, acldefault('r', c.relowner))) AS acl(
    grantor, grantee_oid, privilege_type, is_grantable)
WHERE acl.grantee_oid = r.oid
  AND c.relkind IN ('r', 'v', 'm')
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY n.nspname, c.relname, acl.privilege_type
"""


async def extract_grants(pool: asyncpg.Pool, grantee: str,
                         statement_timeout_ms: int = 5000) -> list[dict]:
    """Read one grantee's object grants under a read-only, timeout-bounded txn.

    Idempotent: issues no writes and returns the same rows for the same
    catalog state, so repeated runs are safe to retry.
    """
    async with pool.acquire() as conn:
        await conn.execute(f"SET statement_timeout = {int(statement_timeout_ms)}")
        async with conn.transaction(readonly=True):
            rows = await conn.fetch(EXTRACT_PG_GRANTS, grantee)
    return [dict(r) for r in rows]


async def extract_many(dsn: str, grantees: list[str], concurrency: int = 8):
    """Bounded fan-out so the extraction tier never starves the shared pool."""
    pool = await asyncpg.create_pool(dsn, min_size=2, max_size=concurrency)
    sem = asyncio.Semaphore(concurrency)

    async def _one(g: str):
        async with sem:
            return g, await extract_grants(pool, g)

    try:
        return dict(await asyncio.gather(*(_one(g) for g in grantees)))
    finally:
        await pool.close()

The pool's max_size caps how many connections the extraction tier can open, the Semaphore holds the number of in-flight extractions to the same bound so tasks do not pile up waiting on pool.acquire(), and statement_timeout makes a runaway plan fail fast and loud instead of holding a snapshot open. Full retry, checkpoint, and partition strategy live in Async Privilege Batching.
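
The bounding behavior can be checked without a database. The sketch below reuses the same semaphore pattern with a fake extractor that only sleeps, and records the peak number of simultaneous calls. bounded_gather and peak_concurrency are hypothetical names for illustration, and the sleep stands in for the catalog query.

```python
import asyncio


async def bounded_gather(items, worker, concurrency: int):
    """Run worker(item) for every item with at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def _one(item):
        async with sem:
            return await worker(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(_one(i) for i in items))


async def peak_concurrency(concurrency: int = 3, n: int = 10) -> int:
    """Return the highest number of fake extractions running at once."""
    in_flight = peak = 0

    async def fake_extract(grantee: str) -> str:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stands in for the catalog query
        in_flight -= 1
        return grantee

    await bounded_gather([f"role_{i}" for i in range(n)], fake_extract, concurrency)
    return peak
```

Running asyncio.run(peak_concurrency(3, 10)) should never report more than 3, which is the property the extraction tier relies on to stay inside its connection budget.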

Idempotency and Read-Only Safety Contract

Catalog extraction must be a pure read: running it once or a hundred times against an unchanged catalog must return the same privilege set and must never mutate the target. Three properties make that contract enforceable.

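Read-only transactions everywhere. Every extraction runs inside a read-only transaction (SET TRANSACTION READ ONLY or START TRANSACTION READ ONLY, depending on the engine and driver). This is defense in depth on top of an extraction account that holds no write privileges: PostgreSQL, for example, rejects writes and DDL issued inside a read-only transaction, so a bug that constructs a DDL string fails instead of executing. It also lets you run the entire pipeline in a genuine dry-run mode, because there is no other mode: extraction never writes to the target in the first place.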
Deterministic ordering and shaping. Every query carries an explicit ORDER BY on (schema, object, privilege). Deterministic row order means the serialized snapshot is byte-stable for identical catalog state, which is what lets the diff engine and the evidence hash converge. Two runs against the same state produce the same artifact, so a re-run is a no-op rather than a spurious “change.”
Checkpointed cursors, not partial commits. When extraction is partitioned across batches, persist the batch cursor and a capture timestamp to a local state table — never to the target. A retried batch resumes from its checkpoint and overwrites its own partition in the local snapshot, so retries are idempotent by construction and never double-count a grant.
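
The checkpoint property can be sketched with a local SQLite state store. This is an illustration under stated assumptions: the table name snapshot_partition, its columns, and the helper names are all hypothetical, and a production store might be a different local database entirely. The point it shows is that a retried batch replaces its own partition instead of appending to it.

```python
import json
import sqlite3


def open_state(path: str = ":memory:") -> sqlite3.Connection:
    """Open the local state store (never the target database)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS snapshot_partition ("
        " run_id TEXT, partition_key TEXT, captured_at TEXT, rows_json TEXT,"
        " PRIMARY KEY (run_id, partition_key))"
    )
    return db


def save_partition(db, run_id: str, partition_key: str,
                   captured_at: str, rows: list[dict]) -> None:
    """Write or overwrite one partition so a retry replaces its own rows."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO snapshot_partition VALUES (?, ?, ?, ?)",
            (run_id, partition_key, captured_at, json.dumps(rows, sort_keys=True)),
        )


def pending(db, run_id: str, all_partitions: list[str]) -> list[str]:
    """Return partitions that have no checkpoint yet for this run."""
    done = {
        r[0]
        for r in db.execute(
            "SELECT partition_key FROM snapshot_partition WHERE run_id = ?",
            (run_id,),
        )
    }
    return [p for p in all_partitions if p not in done]
```

Because the primary key is (run_id, partition_key), saving the same partition twice leaves exactly one row, so a grant can never be counted twice across retries.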

Because the read is pure, “dry-run” and “production” are the same code path with the same side effects (none) on the target. The only thing that changes downstream is whether the resulting canonical matrix is handed to a reporting-only diff or to a remediation apply — a decision that belongs to Drift Detection Engines & Diff Logic, not to extraction.

Compliance Alignment and Evidence Artifacts

Optimized catalog extraction is the evidence-generation step for several access-control controls, and the artifact it emits is what an auditor actually inspects.

SOC 2 CC6.1 / CC6.3 (logical access). A timestamped, point-in-time capture of who holds which privileges on which objects is direct evidence that logical access is provisioned and reviewed. The determinism contract above is what makes that evidence reproducible on demand.
PCI-DSS Requirement 7 (need-to-know access). The per-grantee, per-object output maps cleanly onto Requirement 7’s demand to demonstrate least-privilege assignment across the cardholder data environment.
HIPAA §164.312(a)(1) (access control). The role-graph extraction from pg_auth_members / mysql.role_edges / DBA_ROLE_PRIVS documents the technical access-control mechanism protecting electronic PHI.

The artifact itself is a JSON privilege matrix with one record per (grantee, object, privilege, is_grantable) tuple plus a capture timestamp and the source instance identifier. Because extraction is deterministic, that JSON can be hashed and the hash recorded as a tamper-evident manifest — auditors can re-run the extraction and confirm the hash matches. Vendor-specific quirks (PostgreSQL’s boolean is_grantable versus Oracle’s 'YES'/'NO' GRANTABLE, or MySQL’s SET-encoded Table_priv) are reconciled into one canonical shape by the Schema Validation Pipelines before the artifact is finalized, so the evidence a compliance officer reviews is engine-neutral.
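
A minimal sketch of the manifest hash, assuming each record is a dict with grantee, object, privilege, and is_grantable keys (the key names are illustrative). manifest_hash is a hypothetical helper. It hashes only the privilege records and leaves the capture timestamp out of the digest, on the assumption that a re-run at a later time should still reproduce the same hash for unchanged state.

```python
import hashlib
import json


def manifest_hash(records: list[dict]) -> str:
    """SHA-256 over a canonical JSON encoding of the privilege records."""
    # Sort records so input order cannot change the digest.
    ordered = sorted(
        records,
        key=lambda r: (r["grantee"], r["object"], r["privilege"], r["is_grantable"]),
    )
    # Fixed key order and separators keep the byte stream stable across runs.
    payload = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Storing the timestamp and instance identifier next to the hash, rather than inside the hashed payload, keeps the artifact traceable while still letting an auditor confirm the digest from a fresh extraction.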

Troubleshooting Matrix

Failure scenario	Root-cause signature	Remediation
Extraction returns fewer grants than a manual audit shows	Query reads a visibility-filtered view (`information_schema` privilege views, Oracle `ALL_*`/`USER_*`) that only returns grants involving the session's own roles	Read the native catalogs instead: `pg_class.relacl` via `aclexplode()` (PostgreSQL, readable by any role by default), `DBA_*_PRIVS` with `SELECT_CATALOG_ROLE` (Oracle), or `mysql.*` with `SELECT` on the system schema (MySQL); re-run and diff the row count
Query exceeds statement timeout on the largest instance	`EXPLAIN` shows the `information_schema` view expansion or a full dictionary scan, with no selective grantee predicate applied	Confirm the query reads the native catalog and the grantee is bound as a parameter (not interpolated), verify the `pg_roles.rolname` / `DBA_TAB_PRIVS.GRANTEE` filter is present, and partition by schema via the batching layer
Connection pool exhaustion during a fleet run	Unbounded fan-out opened more connections than the pool budget	Cap concurrency with a semaphore and set `max_size` on the pool below the instance’s available connections; back off on transient acquire failures
Diff reports “changes” that revert on the next run	Non-deterministic row order or an unstable snapshot window produced timing artifacts	Add explicit `ORDER BY (schema, object, privilege)` to every query and ensure each capture runs in a single read-only transaction
Oracle parse CPU spikes as grantee count grows	Grantee interpolated into SQL text forces a hard parse per user	Switch to a bind variable (`:grantee`) so the shared pool reuses one cursor; with binds, the default `CURSOR_SHARING = EXACT` is enough, and `FORCE` is only a stopgap for SQL text you cannot change
MySQL grants missing all role-based privileges	Reading only `tables_priv` ignores privileges inherited through the role graph	Also extract `mysql.role_edges` and resolve inherited grants during normalization in the parser-adapter layer

Where to Go Next

The techniques on this page are the read layer; the surrounding stages turn those reads into a diff-ready, audited dataset:

Extracting user grants from Oracle data dictionary — the full bind-variable and cursor-sharing walkthrough for DBA_*_PRIVS extraction.
Async Privilege Batching — partitioning, bounded concurrency, retry, and checkpointing that let these queries run continuously across a fleet.
Schema Validation Pipelines — type coercion and privilege-hierarchy mapping that turn raw catalog rows into the canonical, auditable matrix.
