Caching & Artifacts¶
Two related mechanisms help you control recomputation and handle large payloads: result caching and artifact persistence.
## Result Cache
Enable caching per task with `cache_ttl` (seconds):

```python
@task(cache_ttl=600)
def expensive(x: int) -> int:
    return compute(x)
```

If a cached entry exists and is still fresh, the task body is skipped and the cached value is returned.
### Backends
Configured via `load_config()` values:

- `result_cache = "memory"` (default)
- `result_cache = "filesystem"` -> stores a pickled `(timestamp, value)` tuple in the path given by `result_cache_path` (default `.aw_cache`); the freshness check is sketched below.
### Cache Key
The default key combines the function's qualified name with a deterministic string serialization of its arguments. Override it with `cache_key_fn`.
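A custom key helps when the default is a poor identity for a call, for example when only part of an argument should matter or when you want to version the cache. The exact `cache_key_fn` signature is not shown in this section, so the sketch below assumes it receives the task's arguments and returns a string:

```python
# Assumption: cache_key_fn is called with the task's arguments and
# must return a stable string key; verify against the actual API.
@task(cache_ttl=600, cache_key_fn=lambda x: f"expensive:v2:{x}")
def expensive(x: int) -> int:
    return compute(x)
```

Embedding a version tag (here `v2`) in the key is a common way to invalidate stale entries after the function's logic changes.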
## Artifact Persistence
For large results, mark the task with `persist=True`:

```python
@task(persist=True)
def produce():
    return {"data": list(range(1000))}
```
The output becomes an `ArtifactRef` (a lightweight handle). Retrieve the underlying value:

```python
from auto_workflow.artifacts import get_store

val = get_store().get(ref)
```
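A persisted value is typically materialized only where it is needed. The sketch below assumes an `ArtifactRef` can be passed into a downstream task and that `@task()` with default options is valid; both are assumptions to verify:

```python
from auto_workflow.artifacts import get_store

@task()  # assumption: bare @task() with defaults is accepted
def consume(ref) -> int:
    payload = get_store().get(ref)  # resolve the handle only here
    return len(payload["data"])
```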
### Artifact Backends
- Memory (default)
- Filesystem (`artifact_store = "filesystem"`, directory `artifact_store_path`, default `.aw_artifacts`)
- Serializer: `artifact_serializer = "pickle"` (default) or `"json"` (JSON-serializable values only)
- Security note: pickle is only safe in trusted environments; prefer `json` for simple types.
- The filesystem backend writes and reads directly to disk to avoid keeping duplicate in-memory copies (sketched below).
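As a rough illustration of the direct-to-disk behavior noted above (this is not the library's implementation; the function, layout, and key scheme are invented):

```python
import json
import pickle
import uuid
from pathlib import Path

def put_artifact(value, root: Path = Path(".aw_artifacts"),
                 serializer: str = "pickle") -> str:
    """Write value straight to disk and return a string handle,
    so no duplicate in-memory copy of the payload is kept."""
    root.mkdir(exist_ok=True)
    path = root / uuid.uuid4().hex
    if serializer == "json":
        # JSON-serializable values only, but safe to load anywhere
        path.write_text(json.dumps(value))
    else:
        # pickle accepts arbitrary objects but is unsafe on untrusted data
        with path.open("wb") as f:
            pickle.dump(value, f)
    return path.name
```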
## Choosing Between Cache & Artifact
| Use Case | Mechanism |
|---|---|
| Avoid re-running a deterministic, expensive function | Result cache (`cache_ttl`) |
| Pass a large payload downstream without duplicating it in memory | Artifact (`persist=True`) |
| Both (skip recompute and offload memory) | `persist=True` + `cache_ttl` (see sketch below) |
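Combining the two is a matter of setting both options on the task; the function below is hypothetical. On the first run the payload is persisted to the artifact store, and on later runs within the TTL the task body is skipped entirely (presumably with the lightweight reference, rather than the payload, living in the result cache):

```python
@task(persist=True, cache_ttl=3600)
def build_dataset() -> dict:
    # Heavy payload: stored once in the artifact store; downstream
    # consumers receive an ArtifactRef instead of the full dict.
    return {"data": list(range(1_000_000))}
```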