Task Processing System
The task processing system handles operations ranging from quick validations to long-running deployments. It serves as the execution backbone of the platform, accepting requests, scheduling them into priority queues, executing them in ephemeral workers, and streaming progress back to clients.
Task Lifecycle
A task moves through a defined sequence of stages from submission to completion. The client submits a task to the API, which performs validation and priority analysis before enqueueing the task. The client receives a task identifier and a WebSocket channel for progress updates. A worker manager polls the queue, spawns an ephemeral worker for the task, and the worker executes while reporting progress. Upon completion, the worker terminates and its resources are cleaned up.
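A minimal client-side sketch of this flow, assuming a runtime with global fetch and WebSocket support, might look like the following. The endpoint path, field names, and message shapes are illustrative assumptions rather than the actual API.

```typescript
// Hypothetical client flow: submit a task, then stream progress updates.
// Endpoint paths, field names, and message shapes are assumptions for illustration.

interface TaskSubmission {
  type: string;               // e.g. "lint", "deploy-validation"
  payload: Record<string, unknown>;
}

interface TaskAccepted {
  taskId: string;             // identifier returned after validation and enqueueing
  channelUrl: string;         // WebSocket channel for progress updates
}

async function submitTask(apiBase: string, task: TaskSubmission): Promise<TaskAccepted> {
  const res = await fetch(`${apiBase}/tasks`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(task),
  });
  if (!res.ok) throw new Error(`Submission rejected: ${res.status}`);
  return (await res.json()) as TaskAccepted;
}

function streamProgress(channelUrl: string, onUpdate: (msg: unknown) => void): WebSocket {
  const socket = new WebSocket(channelUrl);
  socket.onmessage = (event) => onUpdate(JSON.parse(event.data));
  socket.onclose = () => console.log("Task channel closed");
  return socket;
}

// Usage: submit a quick validation and log progress until the channel closes.
async function main() {
  const accepted = await submitTask("https://example.internal/api", {
    type: "config-check",
    payload: { target: "staging" },
  });
  streamProgress(accepted.channelUrl, (msg) => console.log("progress:", msg));
}

main().catch(console.error);
```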
Queue Classification
Tasks are routed into one of three queues based on their characteristics and latency requirements.
The critical queue handles short operations that typically block interactive workflows. Linting, quick validations, and configuration checks fall into this category. The queue maintains dedicated worker capacity to ensure rapid feedback, with tasks typically completing in seconds rather than minutes.
The normal queue serves as the default for day-to-day automation work. Package installations, deployment validations, environment preparation, and source tracking operations run through this queue. Tasks are scheduled fairly to balance throughput and responsiveness, accommodating the variable execution times of different operation types.
The batch queue handles CPU-intensive or long-running work. Full test suites, bulk operations, nightly builds, and comprehensive org validations are routed here. Batch work can be deprioritized when the system is under load and is often scheduled to take advantage of off-peak capacity.
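The routing decision can be pictured as a simple classifier over task type and expected duration. The task type names and duration thresholds in this sketch are assumptions for illustration, not the system's actual routing rules.

```typescript
// Hypothetical routing of tasks into the three queues.
// Task type names and duration thresholds are illustrative assumptions.

type Queue = "critical" | "normal" | "batch";

interface TaskSpec {
  type: string;
  estimatedSeconds: number;   // rough expected runtime supplied or inferred at submission
}

const CRITICAL_TYPES = new Set(["lint", "quick-validation", "config-check"]);
const BATCH_TYPES = new Set(["full-test-suite", "bulk-operation", "nightly-build", "org-validation"]);

function classify(task: TaskSpec): Queue {
  if (CRITICAL_TYPES.has(task.type) && task.estimatedSeconds <= 30) return "critical";
  if (BATCH_TYPES.has(task.type) || task.estimatedSeconds > 600) return "batch";
  return "normal";            // default queue for day-to-day automation work
}

console.log(classify({ type: "lint", estimatedSeconds: 5 }));               // "critical"
console.log(classify({ type: "package-install", estimatedSeconds: 120 }));  // "normal"
console.log(classify({ type: "nightly-build", estimatedSeconds: 3600 }));   // "batch"
```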
Worker Execution Model
Each worker follows a strict lifecycle designed to maintain isolation and repeatability across task executions.
A worker starts from a clean container image with no residual state from previous executions. It loads only the credentials required for its assigned task using just-in-time secret access. The runtime environment is prepared according to task requirements, and execution begins with progress updates streamed to the task service throughout. Upon completion, credentials are explicitly cleared, temporary state is removed, and the container terminates.
This model is intentionally stateless between runs. It reduces the chance of cross-task leakage, avoids credential persistence, and makes resource cleanup predictable. A worker that fails unexpectedly leaves no sensitive data behind because credentials exist only in memory during execution.
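The lifecycle can be sketched as a single run function that acquires credentials just in time and always clears them regardless of outcome. The secret-store and task-service interfaces here are hypothetical stand-ins for whatever the platform actually uses.

```typescript
// Hypothetical worker lifecycle: clean start, just-in-time credentials,
// progress reporting, and guaranteed cleanup. All interfaces are illustrative.

interface SecretStore {
  fetch(taskId: string): Promise<Map<string, string>>;  // just-in-time credential access
}

interface TaskService {
  reportProgress(taskId: string, percent: number, message: string): Promise<void>;
  complete(taskId: string, ok: boolean): Promise<void>;
}

async function runWorker(
  taskId: string,
  secrets: SecretStore,
  taskService: TaskService,
  execute: (
    creds: Map<string, string>,
    report: (percent: number, message: string) => Promise<void>,
  ) => Promise<void>,
): Promise<void> {
  // Credentials exist only in memory for the duration of execution.
  const creds = await secrets.fetch(taskId);
  try {
    await execute(creds, (p, m) => taskService.reportProgress(taskId, p, m));
    await taskService.complete(taskId, true);
  } catch (err) {
    await taskService.complete(taskId, false);
    throw err;
  } finally {
    creds.clear();             // explicit credential wipe before the container terminates
  }
}
```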
Resource Management
The task processing system implements resource controls to maintain stability and fair allocation across concurrent workloads.
Each queue type has its own worker pool with independent scaling characteristics. The critical pool maintains a minimum number of available workers to ensure low-latency response for interactive operations. The normal pool scales based on demand, adding workers when queue depth increases and removing them during idle periods. The batch pool uses excess capacity, scaling down aggressively when higher-priority work requires resources.
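One way to picture the per-pool behavior is as a small scaling policy evaluated against queue depth. The numbers below are illustrative assumptions, not actual defaults.

```typescript
// Hypothetical per-pool scaling policy; the values are illustrative, not actual defaults.

interface PoolPolicy {
  min: number;                 // workers kept warm even when idle
  max: number;                 // hard ceiling for the pool
  scaleUpAtQueueDepth: number; // add a worker when backlog reaches this depth
}

const POOL_POLICIES: Record<"critical" | "normal" | "batch", PoolPolicy> = {
  critical: { min: 4, max: 16, scaleUpAtQueueDepth: 1 },   // always-warm capacity for fast feedback
  normal:   { min: 1, max: 32, scaleUpAtQueueDepth: 5 },   // demand-driven scaling
  batch:    { min: 0, max: 64, scaleUpAtQueueDepth: 20 },  // uses excess capacity, shrinks aggressively
};

function desiredWorkers(pool: keyof typeof POOL_POLICIES, queueDepth: number, current: number): number {
  const p = POOL_POLICIES[pool];
  if (queueDepth >= p.scaleUpAtQueueDepth) return Math.min(p.max, current + 1);
  if (queueDepth === 0) return Math.max(p.min, current - 1);
  return Math.max(p.min, current);
}
```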
Resource quotas operate at multiple levels. Per-tenant limits prevent any single organization from monopolizing system capacity. Queue-specific allocations ensure that batch work cannot starve critical operations. Individual task boundaries constrain memory and CPU usage to prevent runaway operations from affecting other tasks.
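The layered quota check could be expressed as a single admission function; the limit names and the shape of the request are assumptions for illustration.

```typescript
// Hypothetical multi-level quota check; limits and field names are illustrative.

interface Limits {
  tenantConcurrent: number;    // per-tenant cap on concurrently running tasks
  queueShare: number;          // fraction of total workers one queue may hold
  taskMemoryMb: number;        // per-task memory ceiling
  taskCpuCores: number;        // per-task CPU ceiling
}

function admit(
  limits: Limits,
  tenantRunning: number,
  queueWorkers: number,
  totalWorkers: number,
  request: { memoryMb: number; cpuCores: number },
): { ok: boolean; reason?: string } {
  if (tenantRunning >= limits.tenantConcurrent) return { ok: false, reason: "tenant quota exceeded" };
  if (totalWorkers > 0 && queueWorkers / totalWorkers > limits.queueShare)
    return { ok: false, reason: "queue allocation exhausted" };
  if (request.memoryMb > limits.taskMemoryMb || request.cpuCores > limits.taskCpuCores)
    return { ok: false, reason: "task exceeds per-task resource bounds" };
  return { ok: true };
}
```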
Error Handling
The system implements error handling at multiple levels to maintain reliability while providing clear feedback to users.
When a task fails, the system captures detailed error information including stack traces, environment state, and operation context. This information is persisted for debugging and is made available to clients through the WebSocket channel and API queries. Any partial changes are rolled back where possible, and the task is marked with a failed status that includes the error details.
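A failed-task record exposed to clients might be shaped along these lines; the field names are assumptions for illustration, not the documented schema.

```typescript
// Hypothetical shape of the error record attached to a failed task.
// Field names are assumptions for illustration.

interface TaskFailure {
  taskId: string;
  status: "failed";
  error: {
    message: string;
    stackTrace?: string;                          // captured when available
    operationContext: Record<string, unknown>;    // what the task was doing at failure time
    environment: Record<string, string>;          // sanitized environment state for debugging
  };
  rolledBack: boolean;                            // whether partial changes were reverted
  failedAt: string;                               // ISO-8601 timestamp
}
```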
Worker failures (crashes, timeouts, resource exhaustion) trigger forced cleanup of the worker container and its resources. The associated task is marked as failed with appropriate context, and clients are notified through the real-time update channel. The system maintains an audit trail of worker failures for operational monitoring and capacity planning.
Queue state is preserved across system restarts. Tasks that were in progress when the system stopped are detected during startup and can be requeued or marked as interrupted based on configuration. Clients can reconnect to existing task channels after transient disconnections and resume receiving progress updates from where they left off.
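Resuming a progress stream after a transient disconnect could look like the sketch below; the sequence-number query parameter and message shape are assumptions about how the channel tracks position.

```typescript
// Hypothetical reconnection that resumes from the last update the client saw.
// The `since` query parameter and message shape are illustrative assumptions.

function subscribeWithResume(channelUrl: string, onUpdate: (msg: { seq: number }) => void): void {
  let lastSeq = 0;

  const connect = () => {
    const socket = new WebSocket(`${channelUrl}?since=${lastSeq}`);
    socket.onmessage = (event) => {
      const msg = JSON.parse(event.data) as { seq: number };
      lastSeq = msg.seq;       // remember position so a reconnect can pick up here
      onUpdate(msg);
    };
    // On a transient drop, reconnect and replay anything missed since lastSeq.
    socket.onclose = () => setTimeout(connect, 1000);
  };

  connect();
}
```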