DATABRICKS lessons

databricks-L3 · DE Pro · Domain 2 — Data Ingestion

Data Ingestion (Formats and Sources)

Stand up ingestion that survives reality: an Auto Loader config across multiple formats, schema evolution with a rescue path, a streaming source, and disciplined trigger choice (availableNow for incremental batch runs vs processingTime for an always-on stream). Prove intake, don't recite it.

Not submitted

Your mentor

Rao Venkataraman

Ingestion is the front door. If schema drift or a bad trigger choice gets through here, everything downstream pays. Wire Auto Loader, handle the rescue column, pick the trigger on purpose. I'll keep you honest about why.

The deliverable · 8 gates

Auto Loader (cloudFiles) configuration
Multi-format coverage (e.g. JSON + CSV/Parquet)
Schema evolution mode + rescued-data column
Streaming source definition
Trigger discipline (availableNow / processingTime) with rationale
Naming reconciliation note
Ingestion validation evidence
Short design explanation
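
One way the first five gates could fit together is sketched below. Treat it as a starting point under stated assumptions, not the graded answer: the helper names (autoloader_options, trigger_kwargs, start_ingest) and every path or table name passed to them are hypothetical, and the choice of addNewColumns over the rescue mode is illustrative. The option keys themselves (cloudFiles.format, cloudFiles.schemaLocation, cloudFiles.schemaEvolutionMode, rescuedDataColumn) are standard Auto Loader options.

```python
from typing import Dict

SUPPORTED_FORMATS = {"json", "csv", "parquet"}


def autoloader_options(file_format: str, schema_location: str) -> Dict[str, str]:
    """Build cloudFiles reader options for one source format (hypothetical helper)."""
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {file_format}")
    options = {
        "cloudFiles.format": file_format,
        # Where Auto Loader persists the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
        # addNewColumns: the stream stops when a new column appears and picks
        # up the evolved schema on restart. "rescue" would instead keep the
        # schema fixed and route new columns to the rescued-data column.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
        # Values that do not fit the schema land here instead of being dropped.
        "rescuedDataColumn": "_rescued_data",
    }
    if file_format == "csv":
        options["header"] = "true"  # assumption: the CSV files carry a header row
    return options


def trigger_kwargs(mode: str, interval: str = "1 minute") -> Dict[str, object]:
    """Translate a deliberate trigger choice into writeStream.trigger() kwargs."""
    if mode == "availableNow":
        # Process everything that has arrived, then stop: suits scheduled jobs.
        return {"availableNow": True}
    if mode == "processingTime":
        # Keep the stream running and poll on a fixed interval.
        return {"processingTime": interval}
    raise ValueError(f"unknown trigger mode: {mode}")


def start_ingest(spark, file_format, source_path, target_table, state_path,
                 mode="availableNow"):
    """Wire reader, checkpoint and trigger; needs a Databricks SparkSession."""
    reader = spark.readStream.format("cloudFiles")
    for key, value in autoloader_options(file_format, f"{state_path}/schema").items():
        reader = reader.option(key, value)
    return (
        reader.load(source_path)
        .writeStream
        .option("checkpointLocation", f"{state_path}/checkpoint")
        .trigger(**trigger_kwargs(mode))
        .toTable(target_table)
    )
```

Calling start_ingest once per format, each with its own state path, keeps the schema and checkpoint state of the JSON and CSV/Parquet feeds separate. For validation evidence, counting target rows where _rescued_data is not null after a run is one simple check that the rescue path is actually being exercised.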

Your work · artifact bundle


Submit the bundle to the kernel. It grades the artifact gate by gate; the mentor never gives a verdict.

Kernel evaluation

PASS = ∏ γᵢ over 8 gates · any failed factor ⇒ FAIL
Recognition Filter
Naming Reconciliation
Auto Loader Config
Schema Evolution
Format Coverage
Streaming Source
Trigger Discipline
Evidence Completeness
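
The product rule above can be read as a plain conjunction. The sketch below assumes each γᵢ is 1 for a passed gate and 0 for a failed one; the page does not say how the kernel scores a gate internally, so kernel_verdict is a hypothetical illustration, not the kernel's code.

```python
from math import prod
from typing import Dict

GATES = [
    "Recognition Filter",
    "Naming Reconciliation",
    "Auto Loader Config",
    "Schema Evolution",
    "Format Coverage",
    "Streaming Source",
    "Trigger Discipline",
    "Evidence Completeness",
]


def kernel_verdict(results: Dict[str, bool]) -> str:
    """Return PASS only if every gate passed (assumes gamma_i is 0 or 1)."""
    missing = [gate for gate in GATES if gate not in results]
    if missing:
        raise ValueError(f"no result for gates: {missing}")
    # A single zero factor drives the whole product to zero.
    product = prod(1 if results[gate] else 0 for gate in GATES)
    return "PASS" if product == 1 else "FAIL"
```

Under this reading there is no partial credit: seven passed gates and one failed gate still yield FAIL.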

World output

Intake Manifold

Locked until every gate passes. Nothing downstream can attach to an unverified layer.