Ford Motor Company — GCP · Vertex AI · Tekton · Airflow
95% reduction in onboarding time — weeks to minutes
The Problem
Onboarding a new data source at Ford was a deeply manual process that took months. A data engineer had to gather full schema details from source teams, hand-author data dictionary JSON files for tables with hundreds (sometimes thousands) of columns, write ETL scripts from scratch, provision GCS buckets, create BigQuery tables via Terraform, set up Airflow DAGs, configure CI/CD pipelines in Tekton, and validate everything before a single byte of data moved.
A simple source took at least one month. Complex sources with large schemas or ambiguous metadata regularly took two to four months. With dozens of ingestion requests in the backlog at any time, engineering capacity was perpetually tied up in work that was repetitive and high-effort but low in differentiation.
The Solution
We built an autonomous multi-agent system deployed on Google Cloud Run. An operator fills in a lightweight intake form with the source name, file format (CSV, fixed-width, or TXT), connection details, and target dataset, and the system handles the rest.
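The intake form can be modeled as a small typed record that is validated before any agent runs. This is a minimal sketch, assuming Python; the class and field names (`IntakeRequest`, `FileFormat`, the `connection` dictionary) are hypothetical, and the dataset-name check reflects BigQuery's restriction of dataset IDs to letters, digits, and underscores.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from enum import Enum

# BigQuery dataset IDs may contain only letters, digits, and underscores.
_DATASET_ID = re.compile(r"^[A-Za-z0-9_]{1,1024}$")


class FileFormat(Enum):
    """File formats accepted by the intake form."""
    CSV = "csv"
    FIXED_WIDTH = "fixed_width"
    TXT = "txt"


@dataclass
class IntakeRequest:
    """One intake-form submission (all names here are hypothetical)."""
    source_name: str
    file_format: FileFormat
    connection: dict[str, str]
    target_dataset: str

    @classmethod
    def from_form(cls, form: dict) -> IntakeRequest:
        # FileFormat(...) raises ValueError for an unsupported format string.
        return cls(
            source_name=str(form.get("source_name", "")).strip(),
            file_format=FileFormat(form["file_format"]),
            connection=dict(form.get("connection", {})),
            target_dataset=str(form.get("target_dataset", "")).strip(),
        )

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the request is valid."""
        problems = []
        if not self.source_name:
            problems.append("source_name is required")
        if not self.connection:
            problems.append("connection details are required")
        if not _DATASET_ID.match(self.target_dataset):
            problems.append(f"invalid BigQuery dataset ID: {self.target_dataset!r}")
        return problems
```

Rejecting malformed requests at intake means downstream agents never have to guess at missing fields.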
The system uses a hierarchical multi-agent architecture: an orchestrator agent receives the request and delegates to specialized subagents for schema inference, data dictionary generation, ETL script authoring, Terraform config generation, Airflow DAG creation, and infrastructure provisioning. Once all artifacts are generated and validated, the system raises a pull request for human review. After approval, it triggers the Tekton CI/CD pipeline, monitors build status, tracks the first BigQuery load, and sends a completion notification, with no further manual intervention.
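The delegation pattern can be sketched as an orchestrator that calls each subagent in turn, passing along the artifacts produced so far, and stops at a human approval gate before anything is deployed. This is a minimal illustration, not the production implementation: the stage names, the `Orchestrator` class, and the `approve` callback (standing in for pull-request review) are hypothetical. The write-up does not say exactly when provisioning runs relative to review, so this sketch simply runs it with the other listed subagents.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical types: a subagent receives the intake request plus the
# artifacts produced so far, and returns the new artifacts it generated.
Artifacts = dict[str, str]
SubAgent = Callable[[dict, Artifacts], Artifacts]

# Subagents named in the write-up, run before the pull request is raised.
PRE_REVIEW_STAGES = [
    "schema_inference",
    "data_dictionary",
    "etl_script",
    "terraform_config",
    "airflow_dag",
    "infrastructure_provisioning",
]
# Steps that run only after a human approves the pull request.
POST_APPROVAL_STAGES = ["trigger_tekton", "monitor_build", "track_first_load", "notify"]


class Orchestrator:
    def __init__(self, subagents: dict[str, SubAgent],
                 approve: Callable[[Artifacts], bool]) -> None:
        self.subagents = subagents
        self.approve = approve  # stand-in for human review of the pull request
        self.log: list[str] = []

    def _run_stages(self, stages: list[str], request: dict, artifacts: Artifacts) -> None:
        for stage in stages:
            produced = self.subagents[stage](request, artifacts)
            # Minimal validation: every stage must contribute something.
            if not produced:
                raise RuntimeError(f"{stage} produced no artifacts")
            artifacts.update(produced)
            self.log.append(stage)

    def run(self, request: dict) -> Artifacts:
        artifacts: Artifacts = {}
        self._run_stages(PRE_REVIEW_STAGES, request, artifacts)
        # Human-in-the-loop gate: nothing is deployed without approval.
        if not self.approve(artifacts):
            self.log.append("rejected")
            return artifacts
        self._run_stages(POST_APPROVAL_STAGES, request, artifacts)
        return artifacts
```

In practice each subagent would wrap an LLM call or a template renderer; injecting them as plain callables keeps the orchestrator testable with stubs.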
System Flow
Tech Stack
Key Engineering Challenges
The core architectural challenge was decomposing the ingestion workflow into discrete, independently reliable subagents. Each subagent needed to handle partial failures gracefully and feed structured outputs to the next stage without human intervention. We designed a DAG-style execution model in which the orchestrator tracks state across agents and retries failed steps, feeding the failure context back into each retry.
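A DAG-style execution model with context-aware retries might look like the following, using Python's standard-library `graphlib.TopologicalSorter` to order steps by their dependencies. Passing earlier error messages into each retry is one reading of "retries failed steps with context"; the `run_dag` function, the step signature, and the retry limit are assumptions for illustration.

```python
from __future__ import annotations

from graphlib import TopologicalSorter
from typing import Callable

# A step receives its dependencies' outputs plus error messages from earlier
# failed attempts, so a retry can adjust (e.g. re-prompt an LLM-backed agent).
Step = Callable[[dict[str, str], list[str]], str]


def run_dag(steps: dict[str, Step], deps: dict[str, set[str]],
            max_attempts: int = 3) -> dict[str, str]:
    """Run steps in dependency order, retrying each failed step with error context."""
    graph = {name: deps.get(name, set()) for name in steps}
    outputs: dict[str, str] = {}  # shared state tracked across agents
    for name in TopologicalSorter(graph).static_order():
        upstream = {d: outputs[d] for d in graph[name]}
        errors: list[str] = []
        for attempt in range(1, max_attempts + 1):
            try:
                outputs[name] = steps[name](upstream, list(errors))
                break
            except Exception as exc:
                errors.append(f"attempt {attempt}: {exc}")
        else:
            raise RuntimeError(
                f"step {name!r} failed after {max_attempts} attempts: {errors}")
    return outputs
```

Because each step only sees its upstream outputs and its own error history, a failure stays local to one subagent instead of restarting the whole run.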
Integrating with Ford's internal Tekton pipelines required navigating complex org-level security policies, custom pipeline triggers, and non-standard build environments. The agent had to trigger pipelines, monitor their multi-stage execution, and report meaningful status back to the operator, not just pass/fail.
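Reporting more than pass/fail largely comes down to reading per-stage status instead of only the overall PipelineRun result. Tekton records progress on each TaskRun as a condition of type `Succeeded` whose `status` is `True`, `False`, or `Unknown` (not yet finished), alongside `reason` and `message` fields. The helper below assumes the TaskRuns have already been fetched as dictionaries (for example through the Kubernetes API); the fetching step, the function names, and the output wording are hypothetical.

```python
from __future__ import annotations

# Map Tekton's Succeeded-condition status to operator-facing wording.
STATE = {"True": "succeeded", "False": "failed", "Unknown": "in progress"}


def succeeded_condition(resource: dict) -> dict:
    """Return the 'Succeeded' condition of a TaskRun-shaped dict, or {} if absent."""
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("type") == "Succeeded":
            return cond
    return {}


def summarize_stages(task_runs: list[dict]) -> list[str]:
    """Build one readable status line per pipeline stage."""
    summary = []
    for tr in task_runs:
        name = tr.get("metadata", {}).get("name", "<unnamed>")
        cond = succeeded_condition(tr)
        state = STATE.get(cond.get("status", ""), "not started")
        line = f"{name}: {state}"
        if cond.get("reason"):
            line += f" ({cond['reason']})"
        # Only failures carry the message through, to keep the report short.
        if state == "failed" and cond.get("message"):
            line += f": {cond['message']}"
        summary.append(line)
    return summary
```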
Some source tables had thousands of columns with ambiguous or undocumented types. Manually authoring Terraform schema files for these was the single biggest time sink in the old process. We built a schema inference subagent that samples source data, resolves type conflicts, infers nullability constraints, and generates the full Terraform HCL and data dictionary JSON automatically.
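The core of a schema-inference step can be sketched as follows: infer a type for each sampled value, widen conflicting types (INT64 with FLOAT64 becomes FLOAT64, any other conflict becomes STRING), mark a column NULLABLE if any sampled value matches a null marker, and emit BigQuery's JSON schema format inside a Terraform `google_bigquery_table` resource. The type patterns, null markers, widening rules, function names, and data dictionary layout are all assumptions; the real subagent's conflict-resolution logic is not described. Because nullability comes from a sample, a REQUIRED mode is only a guess for the human review step to confirm.

```python
from __future__ import annotations

import json
import re

# Hypothetical type patterns, checked in order; the first match wins.
_PATTERNS = [
    ("BOOL", re.compile(r"^(true|false)$", re.IGNORECASE)),
    ("INT64", re.compile(r"^[+-]?\d+$")),
    ("FLOAT64", re.compile(r"^[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?$")),
    ("DATE", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
]
NULL_TOKENS = {"", "null", "NULL", "NA"}  # hypothetical null markers


def infer_value_type(value: str) -> str:
    for name, pattern in _PATTERNS:
        if pattern.match(value):
            return name
    return "STRING"


def merge_types(a: str | None, b: str) -> str:
    """Resolve a type conflict by widening; unrelated types fall back to STRING."""
    if a is None or a == b:
        return b
    if {a, b} == {"INT64", "FLOAT64"}:
        return "FLOAT64"
    return "STRING"


def infer_schema(columns: list[str], rows: list[list[str]]) -> list[dict]:
    schema = []
    for i, col in enumerate(columns):
        col_type, has_null = None, False
        for row in rows:
            value = row[i].strip() if i < len(row) else ""
            if value in NULL_TOKENS:
                has_null = True
                continue
            col_type = merge_types(col_type, infer_value_type(value))
        schema.append({
            "name": col,
            "type": col_type or "STRING",  # all-null column: default to STRING
            "mode": "NULLABLE" if has_null or col_type is None else "REQUIRED",
        })
    return schema


def to_terraform(dataset: str, table: str, schema: list[dict]) -> str:
    """Render a google_bigquery_table resource with the schema as a JSON heredoc."""
    body = json.dumps(schema, indent=2)
    return (
        f'resource "google_bigquery_table" "{table}" {{\n'
        f'  dataset_id = "{dataset}"\n'
        f'  table_id   = "{table}"\n'
        f"  schema     = <<EOF\n{body}\nEOF\n"
        "}\n"
    )


def to_data_dictionary(table: str, schema: list[dict]) -> str:
    # Hypothetical dictionary layout; the actual format is not described.
    entries = [{"column": f["name"], "type": f["type"],
                "nullable": f["mode"] == "NULLABLE"} for f in schema]
    return json.dumps({"table": table, "columns": entries}, indent=2)
```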