
ADLS2 Connector (Azure Data Lake Storage Gen2)

The ADLS2 connector provides a production-friendly synchronous client for Azure Data Lake Storage Gen2, with lazy imports, robust error mapping, and ergonomic helpers.

Installation

poetry install -E connectors-adls2
# or
poetry install -E connectors-all

Quick usage

from auto_workflow.connectors import adls2

with adls2.client("default") as fs:
    # Ensure container exists
    fs.create_container("bronze", exist_ok=True)
    fs.make_dirs("bronze", "events/2025-10-19", exist_ok=True)
    fs.upload_bytes(
        container="bronze",
        path="events/2025-10-19/sample.csv",
        data=b"id,name\n1,alice\n",
        content_type="text/csv",
        overwrite=True,
    )
    paths = list(fs.list_paths("bronze", prefix="events/2025-10-19/"))

Authentication & connection

Pick one:

  • Connection string (recommended):
      • Env: AUTO_WORKFLOW_CONNECTORS_ADLS2_<PROFILE>__CONNECTION_STRING="..."
      • Aliases: __CONN_STR, __DSN
  • Account URL + DefaultAzureCredential (AAD-based; see the sketch after this list):
      • Env: AUTO_WORKFLOW_CONNECTORS_ADLS2_<PROFILE>__ACCOUNT_URL="https://<acct>.dfs.core.windows.net"
      • Env: AUTO_WORKFLOW_CONNECTORS_ADLS2_<PROFILE>__USE_DEFAULT_CREDENTIALS=true
  • Custom credential object (advanced): set via config/JSON overlay as credential.
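
For the AAD-based option, a minimal sketch of what usage looks like once ACCOUNT_URL and USE_DEFAULT_CREDENTIALS are exported for the default profile (see Environment overrides below). DefaultAzureCredential then resolves credentials from environment variables, a managed identity, or a developer sign-in such as the Azure CLI; container and path names here are placeholders:

from auto_workflow.connectors import adls2

# Assumes the "default" profile has ACCOUNT_URL set and USE_DEFAULT_CREDENTIALS=true,
# so the connector authenticates via azure-identity's DefaultAzureCredential.
with adls2.client("default") as fs:
    print(fs.exists("bronze", "events/2025-10-19/sample.csv"))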

Environment overrides

Prefix: AUTO_WORKFLOW_CONNECTORS_ADLS2_<PROFILE>__

export AUTO_WORKFLOW_CONNECTORS_ADLS2_DEFAULT__CONNECTION_STRING="DefaultEndpointsProtocol=..."
export AUTO_WORKFLOW_CONNECTORS_ADLS2_DEFAULT__ACCOUNT_URL="https://myacct.dfs.core.windows.net"
export AUTO_WORKFLOW_CONNECTORS_ADLS2_DEFAULT__USE_DEFAULT_CREDENTIALS=true
export AUTO_WORKFLOW_CONNECTORS_ADLS2_DEFAULT__RETRIES__ATTEMPTS=5
export AUTO_WORKFLOW_CONNECTORS_ADLS2_DEFAULT__TIMEOUTS__CONNECT_S=2.0
export AUTO_WORKFLOW_CONNECTORS_ADLS2_DEFAULT__TIMEOUTS__OPERATION_S=30.0
export AUTO_WORKFLOW_CONNECTORS_ADLS2_DEFAULT__JSON='{"connection_string":"..."}'
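
The <PROFILE> segment selects which configuration profile the variables apply to and corresponds to the name passed to adls2.client(). A sketch with a hypothetical second profile named raw (an assumption for illustration; only default appears above):

import os

from auto_workflow.connectors import adls2

# Same effect as: export AUTO_WORKFLOW_CONNECTORS_ADLS2_RAW__CONNECTION_STRING="..."
# Set in-process here only for illustration; in practice it comes from the shell or deployment environment.
os.environ["AUTO_WORKFLOW_CONNECTORS_ADLS2_RAW__CONNECTION_STRING"] = "DefaultEndpointsProtocol=..."

with adls2.client("raw") as fs:
    fs.create_container("landing", exist_ok=True)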

Operations

  • upload_bytes(container, path, data, content_type=None, metadata=None, overwrite=True, chunk_size=None, timeout=None) -> etag
  • download_bytes(container, path, start=None, end=None, timeout=None) -> bytes
  • download_stream(container, path, chunk_size=None, timeout=None) -> Iterator[bytes]
  • list_paths(container, prefix=None, recursive=True, timeout=None) -> Iterator[dict]
  • exists(container, path, timeout=None) -> bool
  • delete_path(container, path, recursive=False, timeout=None) -> None
  • make_dirs(container, path, exist_ok=True, timeout=None) -> None
  • create_container(container, exist_ok=True, timeout=None) -> None
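
A short sketch exercising a few of these together, continuing the quick-usage example (container and path names are placeholders):

from auto_workflow.connectors import adls2

with adls2.client("default") as fs:
    # Read the whole object into memory
    data = fs.download_bytes("bronze", "events/2025-10-19/sample.csv")

    # Stream a larger object in chunks instead of buffering it all at once
    with open("sample_copy.csv", "wb") as out:
        for chunk in fs.download_stream("bronze", "events/2025-10-19/sample.csv", chunk_size=4 * 1024 * 1024):
            out.write(chunk)

    # Existence check, then recursive cleanup of the date partition
    if fs.exists("bronze", "events/2025-10-19/sample.csv"):
        fs.delete_path("bronze", "events/2025-10-19", recursive=True)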

Notes:

  • content_type uses Azure Blob ContentSettings under the hood when available.
  • Errors map to project exceptions (AuthError, NotFoundError, TimeoutError, TransientError, PermanentError) with status-aware mapping for HttpResponseError.
  • Imports are lazy; missing extras yield an informative ImportError.
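
The mapping keeps exception handling independent of the Azure SDK. A sketch assuming the exception classes are importable from auto_workflow.connectors (the exact module path may differ in this project):

from auto_workflow.connectors import adls2
from auto_workflow.connectors import NotFoundError, TransientError  # assumed import location

with adls2.client("default") as fs:
    try:
        data = fs.download_bytes("bronze", "missing/file.csv")
    except NotFoundError:
        data = None  # a 404 surfaces as the project's NotFoundError, not a raw HttpResponseError
    except TransientError:
        # Throttling / transient server failures: safe to retry with backoff
        raise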

Example flow

See examples/adls_csv_flow.py for a CSV roundtrip flow that creates a container, writes a CSV, reads it back, and cleans up.
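
For orientation, a condensed sketch of such a roundtrip using only the operations documented above (illustrative names, not a copy of the example file):

import csv
import io

from auto_workflow.connectors import adls2

with adls2.client("default") as fs:
    fs.create_container("scratch", exist_ok=True)

    # Write a small CSV
    fs.upload_bytes(
        container="scratch",
        path="demo/people.csv",
        data=b"id,name\n1,alice\n2,bob\n",
        content_type="text/csv",
        overwrite=True,
    )

    # Read it back and parse
    text = fs.download_bytes("scratch", "demo/people.csv").decode("utf-8")
    rows = list(csv.DictReader(io.StringIO(text)))
    assert rows[1]["name"] == "bob"

    # Clean up
    fs.delete_path("scratch", "demo", recursive=True)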