Observability (Logging, Metrics, Tracing)¶
Observability is lightweight and pluggable; structured logging is enabled by default.
Events¶
Core events emitted (subscribe via auto_workflow.subscribe):
- flow_started
- flow_completed
- task_started
- task_retry
- task_failed
- task_succeeded
Example subscriber:
from auto_workflow import subscribe
def on_flow_started(payload):
    print("flow started:", payload)
subscribe("flow_started", on_flow_started)
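When several events need the same handling, registration can be done in a loop. The sketch below tallies events by name. `make_counter` and `register_counters` are hypothetical helpers, not part of auto_workflow, and the payload is treated as opaque because its shape is not specified here. The only library call assumed is `subscribe(event_name, callback)` as shown above.

```python
from collections import Counter

# Event names taken from the list of core events above.
CORE_EVENTS = (
    "flow_started", "flow_completed",
    "task_started", "task_retry", "task_failed", "task_succeeded",
)

counts = Counter()

def make_counter(event_name):
    """Build a subscriber that tallies one event; the payload is ignored."""
    def handler(payload):
        counts[event_name] += 1
    return handler

def register_counters(subscribe, events=CORE_EVENTS):
    """Register one counting handler per event via the given subscribe function."""
    for name in events:
        subscribe(name, make_counter(name))
```

To wire it up, import `subscribe` from `auto_workflow` and call `register_counters(subscribe)`; after a run, `counts` should hold one tally per event name. Passing `subscribe` in as an argument keeps the helper easy to exercise with a stand-in function.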
Logging¶
Structured logging is registered by default at import time and writes human-readable ("pretty") logs
to the auto_workflow.tasks logger. A stdout handler is attached by default.
You can control this via environment variables:
- AUTO_WORKFLOW_DISABLE_STRUCTURED_LOGS=1 — disable structured logging entirely
- AUTO_WORKFLOW_LOG_LEVEL=DEBUG|INFO|... — change log level (default INFO)
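These variables can also be set from Python. A minimal sketch, assuming the variables are read once when structured logging is registered at import time, so they must be in place before the first import of the library:

```python
import os

# Run this before the first `import auto_workflow`. Assumption: the variables
# are read when structured logging is registered at import time.
os.environ["AUTO_WORKFLOW_LOG_LEVEL"] = "DEBUG"

# To switch structured logging off instead:
# os.environ["AUTO_WORKFLOW_DISABLE_STRUCTURED_LOGS"] = "1"
```

Setting the variables in the shell or the process manager has the same effect and avoids any dependence on import order.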
Programmatic control:
from auto_workflow.logging_middleware import register_structured_logging
register_structured_logging()     # idempotent; pretty output enabled by default
Emitted log events:
- flow_started: flow, run_id, ts
- flow_completed: flow, run_id, tasks, ts
- task_started: task, node, ts
- task_ok: task, flow, run_id, ts, duration_ms
- task_err: task, flow, run_id, ts, duration_ms, error
Example pretty output:
2025-10-12 00:22:48+0100 | INFO | flow_started | flow=etl_flow run_id=...
2025-10-12 00:22:48+0100 | INFO | task_started | task=extract_raw node=extract_raw:1
2025-10-12 00:22:48+0100 | INFO | task_ok | flow=etl_flow run_id=... task=extract_raw duration=56.2ms
2025-10-12 00:22:48+0100 | INFO | flow_completed | flow=etl_flow run_id=... tasks=4
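Because the output goes through the auto_workflow.tasks logger, standard `logging` tools should apply to it. The sketch below attaches a second handler that keeps only warnings and above. It uses only the Python standard library; the in-memory buffer and the format string are illustrative choices, not library defaults.

```python
import io
import logging

logger = logging.getLogger("auto_workflow.tasks")

# Send a copy of each record to an in-memory buffer. A logging.FileHandler
# or any other logging.Handler can be attached the same way.
buffer = io.StringIO()
extra_handler = logging.StreamHandler(buffer)
extra_handler.setFormatter(
    logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")
)

# Filter on the handler, not the logger, so the library's own level
# setting (AUTO_WORKFLOW_LOG_LEVEL) is left untouched.
extra_handler.setLevel(logging.WARNING)
logger.addHandler(extra_handler)
```

Filtering at the handler level means the default stdout handler keeps its normal verbosity while the extra destination receives only problems.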
Metrics¶
The default in-memory provider records counters and simple histograms:
- tasks_succeeded
- tasks_failed
- task_duration_ms
- cache_hits
- cache_sets
- dedup_joins (number of followers waiting on an in-flight identical task)
Swap out the provider with your own implementation via set_metrics_provider().
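A replacement provider might look like the sketch below. The provider interface is not documented on this page, so the method names `inc` and `observe` are assumptions; check the built-in in-memory provider for the real method names before using this.

```python
from collections import defaultdict

class ListMetricsProvider:
    """Hypothetical provider. The method names `inc` and `observe` are
    assumptions, not the library's confirmed interface."""

    def __init__(self):
        self.counters = defaultdict(int)     # e.g. tasks_succeeded, cache_hits
        self.histograms = defaultdict(list)  # e.g. task_duration_ms

    def inc(self, name, value=1):
        """Add `value` to a named counter."""
        self.counters[name] += value

    def observe(self, name, value):
        """Record one observation for a named histogram."""
        self.histograms[name].append(value)

    def mean(self, name):
        """Average of the recorded observations, or None if there are none."""
        values = self.histograms.get(name, [])
        return sum(values) / len(values) if values else None
```

Once the method names match the real interface, an instance would be passed to set_metrics_provider(). Keeping raw observations in a list is simple but unbounded; a long-running process would likely want buckets or a forwarding client instead.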
Tracing¶
A lightweight DummyTracer exposes an async context manager, span(name, **attrs). Each flow and each task execution is wrapped in a span, which provides hook points for OpenTelemetry or custom logging.
What You Get¶
| Span Name Pattern | Attributes Provided | When Emitted |
|---|---|---|
| flow:<flow_name> | run_id (planned) | At flow start/end |
| task:<task_name> | node (unique node ID) | For every task execution |
Custom Recording Tracer Example¶
See examples/tracing_custom.py for a richer script. Minimal inline version:
from auto_workflow.tracing import set_tracer
from contextlib import asynccontextmanager
import time
class RecordingTracer:
    @asynccontextmanager
    async def span(self, name: str, **attrs):
        start = time.perf_counter()  # monotonic clock, suited to measuring durations
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            print(f"span={name} attrs={attrs} ms={elapsed_ms:.2f}")
set_tracer(RecordingTracer())
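A tracer with this shape can be checked without running a flow by driving span() directly. The sketch below uses a variant that stores spans in a list instead of printing them. `CollectingTracer` and `demo` are illustrative names, and the nesting of a task span inside a flow span is an assumption about how the library calls the tracer.

```python
import asyncio
import time
from contextlib import asynccontextmanager

class CollectingTracer:
    """Stores finished spans in memory instead of printing them."""

    def __init__(self):
        self.spans = []  # (name, attrs, elapsed_ms), in order of completion

    @asynccontextmanager
    async def span(self, name: str, **attrs):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.spans.append((name, attrs, elapsed_ms))

async def demo(tracer):
    # Mimic the span name patterns from the table: a flow span around a task span.
    async with tracer.span("flow:etl_flow"):
        async with tracer.span("task:extract_raw", node="extract_raw:1"):
            await asyncio.sleep(0)  # stand-in for real task work

tracer = CollectingTracer()
asyncio.run(demo(tracer))
print([name for name, _, _ in tracer.spans])
```

Spans are appended when they finish, so the inner task span appears in the list before the enclosing flow span.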
OpenTelemetry Integration Sketch¶
from contextlib import asynccontextmanager
from opentelemetry import trace
from auto_workflow.tracing import set_tracer
otel = trace.get_tracer("auto_workflow")
class OTELTracer:
    @asynccontextmanager
    async def span(self, name: str, **attrs):
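        # Turn off the context manager's own exception handling so the error
        # is recorded once, by the except block below.
        with otel.start_as_current_span(
            name, record_exception=False, set_status_on_exception=False
        ) as sp:
            for k, v in attrs.items():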
                sp.set_attribute(k, v)
            try:
                yield
            except Exception as e:  # record & re-raise
                sp.record_exception(e)
                sp.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
                raise
set_tracer(OTELTracer())
Planned Enhancements¶
- Error flag & retry metrics as span attributes
- Cache hit / dedup indicators
- Optional span sampling configuration
See the Logging section above for the built-in middleware and controls.