Skip to main content

Handling Hot Keys in List Partitioned Tables

Hot keys in list-partitioned tables cause severe write amplification and I/O bottlenecks when specific discrete values receive disproportionate traffic. Without intervention, concentrated inserts trigger row-level lock contention, buffer cache thrashing, and degraded query planner performance. This guide provides exact diagnostic steps, routing adjustments, and DDL configurations to mitigate partition skew without sacrificing query performance. Foundational routing concepts and architectural decision matrices are detailed in Partitioning Implementation Patterns & Routing.

Key Operational Objectives:

  • Identify partition skew using I/O metrics and lock contention traces.
  • Apply key salting and sub-partitioning to distribute hot values.
  • Implement dynamic routing fallbacks for unbalanced discrete categories.

Root Cause Analysis: Identifying Partition Skew

Before applying structural changes, quantify the skew using database telemetry. Static list routing often masks underlying I/O bottlenecks until transactional latency spikes. Compare your current setup with Range Partitioning Strategies to understand why sequential write patterns fail under discrete category spikes.

Diagnostic Playbook

  1. Monitor Partition-Level I/O & Lock Waits: Track pg_stat_user_tables for disproportionate n_tup_ins and n_tup_upd counts. Correlate with pg_stat_activity to isolate Lock wait events on hot partitions.
  2. Analyze Buffer Cache Efficiency: Low cache hit ratios on specific partitions indicate hot key thrashing. Use pg_statio_user_tables to measure heap and index block reads per partition.
  3. Validate Cardinality Distribution: Compare actual partition row counts against expected business logic distributions. A single partition holding >30% of total writes is a confirmed hot key candidate.

Diagnostic Query (PostgreSQL):

SELECT
  schemaname,
  relname          AS partition_name,
  n_live_tup,
  n_tup_ins,
  n_tup_upd,
  heap_blks_read,
  heap_blks_hit
FROM pg_stat_user_tables
WHERE relname LIKE 'events_%'
ORDER BY n_tup_ins DESC
LIMIT 5;

Failure Mode: Relying solely on table-level metrics obscures partition-level contention. Always drill down to partition-specific statistics before authorizing schema changes.


Mitigation Strategy 1: Hash Sub-Partitioning

Key salting distributes write load across multiple physical segments by sub-partitioning a hot list value using hash distribution. PostgreSQL supports sub-partitioning natively: a list child partition can itself be partitioned by hash.

Zero-Downtime Execution Path

  1. Create the new sub-partitioned structure alongside the legacy table.
  2. Backfill historical data using batched INSERT ... SELECT with LIMIT/OFFSET or logical replication.
  3. Deploy dual-write routing in the application layer.
  4. Cutover traffic using a feature flag, then drop the legacy table after validation.

Configuration Example (PostgreSQL native sub-partitioning):

-- Parent table partitioned by LIST on event_type
CREATE TABLE events (
  event_id   UUID         NOT NULL DEFAULT gen_random_uuid(),
  event_type VARCHAR(50)  NOT NULL,
  payload    JSONB,
  created_at TIMESTAMPTZ  NOT NULL DEFAULT NOW()
) PARTITION BY LIST (event_type);

-- Hot category: sub-partition by hash on event_id to spread writes
CREATE TABLE events_click PARTITION OF events
  FOR VALUES IN ('CLICK')
  PARTITION BY HASH (event_id);

CREATE TABLE events_click_0 PARTITION OF events_click FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_click_1 PARTITION OF events_click FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_click_2 PARTITION OF events_click FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_click_3 PARTITION OF events_click FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Low-volume categories remain as simple list partitions
CREATE TABLE events_purchase PARTITION OF events FOR VALUES IN ('purchase', 'refund');
CREATE TABLE events_default  PARTITION OF events DEFAULT;

Operational Note: PostgreSQL requires that event_id appear in the PRIMARY KEY for this to enforce uniqueness across sub-partitions. Queries filtering only on event_type = 'CLICK' will touch all four sub-partitions, but that is an acceptable cost for eliminating write contention. Validate sub-partition row counts post-migration to confirm uniform distribution.


Mitigation Strategy 2: Dynamic Routing & Automated Workflows

When DDL changes are restricted or traffic patterns are highly volatile, implement application-level routing logic to redirect hot keys to underutilized partitions.

Implementation Workflow

  1. Threshold Monitoring: Track partition IOPS and queue depth via Prometheus/Grafana.
  2. Consistent Hashing Fallback: When a partition exceeds defined IOPS thresholds, route new writes to an overflow segment using hash routing on the primary key.
  3. Automated Provisioning: Trigger infrastructure-as-code pipelines to attach new overflow partitions when trending keys breach capacity limits.

Application-Level Routing Logic (TypeScript/Node.js):

function routePartition(key: string, hotKeys: Set<string>): string {
  if (hotKeys.has(key)) {
    // Deterministic overflow routing to prevent lock contention
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = ((hash << 5) - hash + key.charCodeAt(i)) | 0;
    }
    return `overflow_${Math.abs(hash) % 8}`;
  }
  return `partition_${key}`;
}

Failure Mode: Routing table desynchronization during deployment causes split-brain writes. Use a distributed configuration store (etcd, Consul) with versioned routing manifests and atomic hot-reload capabilities.


Lifecycle Management for Skewed Partitions

Sustained hot key traffic requires proactive retention and maintenance strategies.

  • Tiered Storage Policies: Offload historical hot key data to cold storage (e.g., AWS S3, Azure Blob) using partition detachment (ALTER TABLE events DETACH PARTITION events_click CONCURRENTLY) and external table mapping.
  • Retention Alignment: Align archival policies with partition growth rates. Automate DROP TABLE for detached partitions after successful archival via cron or event-driven schedulers.
  • Composite Key Optimization: Combine tenant_id + event_type as a composite list key to prevent full partition scans during analytical queries while maintaining insertion throughput.

Zero-Downtime Maintenance: Always execute ANALYZE and VACUUM during off-peak windows. Use DETACH PARTITION CONCURRENTLY (PostgreSQL 14+) to swap hot partitions with pre-warmed staging tables without blocking active transactions.


Common Mistakes & Failure Modes

Mistake Operational Impact Mitigation
Over-partitioning to mitigate hot keys Excessive partitions increase catalog metadata overhead and complicate VACUUM/ANALYZE cycles without resolving the I/O bottleneck Limit partition count to a few hundred per table. Use sub-partitioning or hash routing instead of proliferating list values
Salting without aligning query predicates Salting breaks partition pruning for queries that filter only on the original unsalted key, forcing full scans Ensure application queries include the sub-partition key, or use generated columns to maintain pruning compatibility
Static routing without dynamic fallback Hardcoded list-to-partition mappings fail during traffic spikes, causing lock contention and cascading connection pool exhaustion Implement circuit breakers and consistent hashing fallbacks with real-time IOPS monitoring

FAQ

How do I detect hot keys in an existing list-partitioned table? Monitor partition-level I/O metrics, row lock wait times, and buffer pool hit ratios. Compare partition sizes and write frequencies against expected cardinality distributions. Use EXPLAIN (ANALYZE, BUFFERS) to identify sequential scans on disproportionately large partitions.

Does key salting impact query performance for hot partitions? Yes, if queries filter on the original unsalted key. Use application-level query rewriting, generated columns, or composite partition pruning to maintain read performance while distributing writes across sub-partitions.

When should I switch from list to hash partitioning for hot keys? Switch when discrete categories lose semantic routing value or when hot keys consistently exceed 30% of total write volume. Hash partitioning guarantees uniform distribution but sacrifices category-based partition pruning and complicates targeted archival workflows.