Did you learn to swim without water? 🏊‍♂️ Building data solutions without real data is just as impossible!
Art by @basilonmypizza
For years, IT has relied on the Dev → Test → Preprod → Prod model. While this works for classic software, it falls short for data and AI. Test data rarely captures the real-world complexity created daily in fast-paced, unpredictable business processes. Only live data reveals the true patterns needed to build robust pipelines and train accurate models. 📊
The solution is simple: give data teams a safe, agile environment - the Lab - to work directly with live data without jeopardizing production. The Lab mirrors the “real” environment, the Fab, but is optimized for rapid experimentation. 🚀
In a Nutshell:
Lab:
- Enables quick onboarding of new data, libraries, and AI models 🔍.
- Serves as the research phase, where experiments run, failures are expected, and learning is constant 💡.
Fab:
- Operates stable, scalable pipelines across the organization 🔒.
- Changes here undergo tight controls and 4-eye reviews for production-grade stability.
Both environments share the same underlying structure (imagine two identical Databricks workspaces) but have different access controls:
- Data Visibility: The Lab can read Fab data, but the Fab cannot read Lab data 👀.
- Security: Both enforce fine-grained measures like row-level security and masking 🛡️.
- Consumption: Only the Fab serves production data products; Lab data remains isolated from live processes 🔄.
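To make the three rules above concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular platform enforces permissions: the names `Env`, `can_read`, and `can_serve_production` are hypothetical, and row-level security and masking are left out.

```python
from enum import Enum


class Env(Enum):
    LAB = "lab"
    FAB = "fab"


def can_read(reader: Env, data: Env) -> bool:
    """Data Visibility: the Lab reads Lab and Fab data; the Fab reads only Fab data."""
    if reader is Env.LAB:
        return True
    return data is Env.FAB


def can_serve_production(source: Env) -> bool:
    """Consumption: only Fab data may back a production data product."""
    return source is Env.FAB
```

In a real workspace these rules would live in the platform's grants and policies; the point of the sketch is that the visibility is one-directional.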
Once a pipeline or model proves successful in the Lab, it transitions to the Fab via CI/CD. Keeping Lab and Fab structurally similar (often differing only by schema name) makes this handoff seamless. With robust modeling practices (e.g., Data Vault), updates roll out continuously without breaking existing processes. 🔄
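As a rough sketch of what "differing only by schema name" could look like, the same pipeline statement can be rendered for either environment by swapping the target schema. Everything here is assumed for illustration: the table names `customer_orders` and `raw_orders`, the schema names `lab` and `fab`, and the helper `render_pipeline`.

```python
from string import Template

# Hypothetical pipeline: reads Fab source data, writes to the target schema.
PIPELINE_SQL = Template(
    "CREATE OR REPLACE TABLE ${schema}.customer_orders AS "
    "SELECT * FROM fab.raw_orders WHERE status = 'complete'"
)

ALLOWED_SCHEMAS = {"lab", "fab"}


def render_pipeline(target_schema: str) -> str:
    """Render the same pipeline for Lab or Fab; only the target schema changes."""
    if target_schema not in ALLOWED_SCHEMAS:
        raise ValueError(f"unknown target schema: {target_schema!r}")
    return PIPELINE_SQL.substitute(schema=target_schema)
```

A CI/CD job would then call `render_pipeline("fab")` on the exact code that was tested with `render_pipeline("lab")`, so the handoff changes one parameter and no logic.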
We follow the “Up or Out” principle: if a pipeline or model succeeds, it moves up to the Fab 🚀; if not, it’s deleted from the Lab ❌. Because no production process depends on Lab data, this cleanup does not affect live processes.
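The “Up or Out” rule can be sketched as a simple triage plus a guard that production reads nothing from the Lab. This is a hedged illustration: the `Experiment` record, the `succeeded` flag, and the `lab.` schema prefix are assumptions, and what counts as "success" is for each team to define.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    name: str
    succeeded: bool


def up_or_out(experiments: list[Experiment]) -> tuple[list[str], list[str]]:
    """Split Lab experiments into those to promote to the Fab and those to delete."""
    promote = [e.name for e in experiments if e.succeeded]
    delete = [e.name for e in experiments if not e.succeeded]
    return promote, delete


def lab_dependencies(production_inputs: dict[str, list[str]]) -> list[str]:
    """Return production jobs that read any table in the (assumed) 'lab.' schema."""
    return sorted(
        job
        for job, tables in production_inputs.items()
        if any(table.startswith("lab.") for table in tables)
    )
```

If `lab_dependencies` ever returns a non-empty list, a production job has started leaning on Lab data and the cleanup step is no longer safe to run.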
Lab & Fab offer the best of both worlds: fast, real-data experimentation coupled with a stable production environment. That’s how we unlock the true potential of data and AI - and that’s how I like my swimming! 🌊
How does your team balance innovation and stability? Let’s discuss!
Art by @basilonmypizza: https://lnkd.in/eF8FkWzN
More on the modern data and AI platform: https://lnkd.in/eunKKHgK
More on AI model deployment: https://lnkd.in/eFJedckv