databricks-L3 · DE Pro · Domain 2 — Data Ingestion

Data Ingestion (Formats and Sources)

Stand up ingestion that survives reality: an Auto Loader config across multiple formats, schema evolution with a rescue path, a streaming source, and a disciplined trigger choice (availableNow vs processingTime). Prove intake, don't recite it.

Your mentor

Rao Venkataraman

Ingestion is the front door. If schema drift or a bad trigger choice gets through here, everything downstream pays. Wire Auto Loader, handle the rescue column, pick the trigger on purpose. I'll keep you honest about why.
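
One way to "pick the trigger on purpose" is to keep the rationale next to the setting. The sketch below is illustrative only: `choose_trigger` and its `max_latency_seconds` parameter are hypothetical names, not part of the course material. It returns keyword arguments of the kind PySpark's `DataStreamWriter.trigger` accepts (`availableNow=True` or `processingTime="<interval>"`), assuming that "no latency target" means a scheduled incremental run is enough.

```python
from typing import Optional


def choose_trigger(max_latency_seconds: Optional[int]) -> dict:
    """Pick trigger kwargs and record why (hypothetical helper).

    None  -> no latency target: drain the backlog, then stop.
    N > 0 -> keep the stream running, one micro-batch every N seconds.
    """
    if max_latency_seconds is None:
        return {
            "kwargs": {"availableNow": True},
            "rationale": "No latency target: process what has arrived, "
                         "then stop, so a scheduler owns the cadence.",
        }
    if max_latency_seconds <= 0:
        raise ValueError("max_latency_seconds must be positive")
    return {
        "kwargs": {"processingTime": f"{max_latency_seconds} seconds"},
        "rationale": f"Data must land within about {max_latency_seconds}s, "
                     "so the stream stays up and polls on that interval.",
    }


choice = choose_trigger(None)
print(choice["kwargs"], "-", choice["rationale"])
```

On a real stream the result would be applied as `writer.trigger(**choice["kwargs"])`, and the rationale string can go straight into the design explanation.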

The deliverable · 8 gates

▸Auto Loader (cloudFiles) configuration

▸Multi-format coverage (e.g. JSON + CSV/Parquet)

▸Schema evolution mode + rescued-data column

▸Streaming source definition

▸Trigger discipline (availableNow / processingTime) with rationale

▸Naming reconciliation note

▸Ingestion validation evidence

▸Short design explanation
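
A minimal sketch of how the configuration, format, schema-evolution, and streaming-source gates might fit together in PySpark. The function names, paths, and table name are hypothetical; the option keys (`cloudFiles.format`, `cloudFiles.schemaLocation`, `cloudFiles.schemaEvolutionMode`, `cloudFiles.inferColumnTypes`, `rescuedDataColumn`) are Auto Loader and file-format options as I understand them from the Databricks documentation, so verify them against your runtime version. `start_ingest` only runs on a Databricks cluster where the `cloudFiles` source is available.

```python
SUPPORTED_FORMATS = {"json", "csv", "parquet"}


def autoloader_options(file_format: str, schema_location: str) -> dict:
    """Build Auto Loader reader options for one source format."""
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {file_format}")
    options = {
        "cloudFiles.format": file_format,
        # Where Auto Loader tracks the inferred schema across runs.
        "cloudFiles.schemaLocation": schema_location,
        # New columns stop the stream; on restart the schema includes them.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
        # Values that do not fit the schema are kept here, not dropped.
        "rescuedDataColumn": "_rescued_data",
    }
    if file_format in {"json", "csv"}:
        # Text formats otherwise infer every column as string.
        options["cloudFiles.inferColumnTypes"] = "true"
    if file_format == "csv":
        options["header"] = "true"
    return options


def start_ingest(spark, file_format, source_path, schema_location,
                 checkpoint_location, target_table):
    """Wire one Auto Loader stream into a table (assumes Databricks)."""
    reader = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(file_format, schema_location))
    )
    return (
        reader.load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_location)
        .option("mergeSchema", "true")  # let the sink accept new columns
        .trigger(availableNow=True)     # swap for processingTime if needed
        .toTable(target_table)
    )


# Hypothetical path; prints the options without touching Spark.
print(autoloader_options("csv", "/tmp/example/_schemas/orders_csv"))
```

Keeping the options in a plain dictionary means the same bundle can show one configuration per format, and the dictionary itself can be attached as validation evidence.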

Your work · artifact bundle

Submit the bundle to the kernel. It grades the artifact gate-by-gate — the mentor never gives a verdict.

Kernel evaluation

PASS = ∏ γᵢ over 8 gates · any failed factor ⇒ FAIL
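
The product rule can be illustrated in a few lines. This sketch assumes each γᵢ is 0 (gate failed) or 1 (gate passed), which the page implies but does not state; `overall_pass` is a hypothetical name, and a gate with no result is treated as failed.

```python
import math

GATES = [
    "Recognition Filter",
    "Naming Reconciliation",
    "Auto Loader Config",
    "Schema Evolution",
    "Format Coverage",
    "Streaming Source",
    "Trigger Discipline",
    "Evidence Completeness",
]


def overall_pass(gate_results: dict) -> bool:
    """Multiply the per-gate factors; a single 0 zeroes the product."""
    factors = [1 if gate_results.get(name, False) else 0 for name in GATES]
    return math.prod(factors) == 1


all_green = {name: True for name in GATES}
one_red = {**all_green, "Trigger Discipline": False}
print(overall_pass(all_green))  # True
print(overall_pass(one_red))    # False
```

Because the factors multiply rather than average, seven strong gates cannot compensate for one failed gate.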

Recognition Filter

Naming Reconciliation

Auto Loader Config

Schema Evolution

Format Coverage

Streaming Source

Trigger Discipline

Evidence Completeness

World output

Intake Manifold

Locked until every gate passes. Nothing downstream can attach to an unverified layer.