Why the data infrastructure team that kept three 'wasteful' pipelines running in parallel for 18 months never had to explain an outage to their board
43% of data infrastructure failures occur not during neglect, but during optimization—and that single statistic should arrest every analytics leader who has ever drawn a red line through a 'redundant' pipeline on a cost-reduction slide.
The story is familiar. A team inherits three parallel pipelines processing roughly the same customer event data. An efficiency review surfaces the overlap. Someone in finance calculates the compute cost. A roadmap item appears: consolidate to one canonical pipeline by Q3. The work begins. The optimization succeeds. And somewhere between six and eighteen months later, a single point of failure makes itself known at the worst possible moment—during a product launch, a board reporting cycle, a regulatory audit.
This is not bad engineering. It is a particular kind of philosophical error: mistaking the appearance of waste for waste itself.
Epictetus taught that the faculty most worth training is synkatathesis—the discipline of assent. Not reacting to the first impression of a thing, but pausing to interrogate whether the impression is accurate. The pipeline that looks redundant is an impression. The question worth sitting with is whether that impression holds under examination.
In conversations with analytics teams navigating infrastructure consolidation, we observe a consistent pattern: the decision to remove parallel pipelines is rarely made by the people closest to the data. It is made in rooms where compute cost is visible and operational risk is abstract. The engineers who built the redundancy often knew something the spreadsheet did not: that these pipelines were not duplicates. They were independent verification paths, each with slightly different upstream dependencies, transformation logic, or failure modes.
The team that kept all three running for 18 months was not being obstinate. They were practicing a discipline their competitors had abandoned: treating resilience as infrastructure, not as overhead.
Industry data puts the average loss from a major pipeline outage at $2.8M. The $4.2M loss that hit the fully optimized competitors in this story was not an outlier: it was the predictable arithmetic of a system with no redundancy meeting a failure condition its designers had assumed away.
The optimization trap has a specific shape. It begins with a legitimate goal—reducing complexity, cutting costs, improving maintainability. These are real goods. But the trap springs when teams conflate architectural simplicity with operational safety. A single well-documented pipeline is simpler to reason about. It is also a single point of catastrophic failure.
We see this pattern replicated across analytics organizations: the careful, deliberate removal of what appear to be redundant systems, followed by the slow accumulation of invisible fragility. The system looks cleaner. The dashboards look greener. And somewhere in the background, the conditions for a future outage are being assembled with great efficiency.
This is what Marcus Aurelius described as the error of judging by the surface of things—kata phantasian, according to appearance. The optimized architecture appears more rational. But rationality measured only by what is visible at the moment of decision is incomplete rationality.
Here a distinction is worth drawing. A hedge is what you do when you lack conviction. A position is what you hold because the evidence supports it.
A deliberate data pipeline redundancy strategy is a position on the nature of complex systems: that they will fail in ways you did not anticipate, that the cost of that failure exceeds the cost of maintaining parallel paths, and that the discipline of simplification must be bounded by the discipline of resilience.
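That claim about costs can be made concrete with a back-of-envelope comparison. A minimal sketch, in Python, assuming an illustrative yearly redundancy cost and outage probability (neither comes from the article; only the $2.8M average per outage does):

```python
# Back-of-envelope comparison: annual cost of keeping redundant pipelines
# vs. the expected annual loss from the outages they would absorb.
# The redundancy cost and outage probability below are illustrative assumptions.

def expected_annual_outage_cost(p_outage_per_year: float, cost_per_outage: float) -> float:
    """Expected yearly loss from outages that redundancy would have prevented."""
    return p_outage_per_year * cost_per_outage

redundancy_cost = 400_000    # assumed: yearly compute + maintenance for the parallel paths
p_outage = 0.25              # assumed: chance per year of an outage redundancy prevents
cost_per_outage = 2_800_000  # the industry-average loss per major outage cited above

expected_loss = expected_annual_outage_cost(p_outage, cost_per_outage)
print(f"Expected annual loss without redundancy: ${expected_loss:,.0f}")
print(f"Redundancy pays for itself: {expected_loss > redundancy_cost}")
```

Under these assumptions the expected loss ($700K/year) exceeds the redundancy cost, and the "wasteful" pipelines are the cheaper option. The point is not the specific numbers; it is that the comparison belongs in the same spreadsheet as the compute savings.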
This is not an argument for keeping every legacy pipeline indefinitely. It is an argument for a different kind of audit—one that asks not what can we remove? but what would we lose if this failed at 2am on a Tuesday?
The 14-month average gap between recognizing a problem and taking meaningful action is not a curiosity. It is the window in which organizations feel the friction of a suboptimal system without yet experiencing its failure. Teams that used that window to quietly re-examine their consolidation roadmap—rather than accelerating it—were the teams that avoided the outage.
67% of teams describing themselves as 'stuck' on infrastructure decisions report that the stuckness predates their awareness of it by six months or more. The pipeline redundancy question is often a symptom of a deeper architectural conversation that was never fully had: what is this data infrastructure actually for, and what does failure cost us?
The engineers who kept the three pipelines running were not avoiding that conversation. They were having it continuously, in the implicit language of their architecture. Each parallel pipeline was a standing argument: we are not yet certain enough to be this exposed.
Certainty, earned slowly through observation and testing, is worth more than the appearance of optimization. The Stoic tradition has a word for the practice of acting from genuine understanding rather than from the pressure of appearances: katorthoma—right action, as opposed to merely defensible action.
The defensible action was to consolidate. The right action was to wait until the redundancy had been proven unnecessary rather than assumed unnecessary.
Map your current pipeline architecture and mark every point where a single failure would produce a data gap lasting more than four hours. Do this before you approve the next optimization initiative. Not as a blocking review—as a cost-of-failure calculation that belongs in the same room as the cost-reduction slide.
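The mapping exercise above can be sketched as a small script: list the independent paths that can produce each data product, then flag any product served by a single path whose recovery time exceeds the four-hour gap budget. All pipeline and product names here are hypothetical.

```python
# A minimal sketch of the audit described above: for each data product, count
# the independent pipeline paths that can produce it, and flag any product
# where a single failure would leave a data gap longer than four hours.

from dataclasses import dataclass

@dataclass
class Path:
    pipeline: str
    recovery_hours: float  # time to restore this output if the path fails

def single_points_of_failure(products: dict[str, list[Path]],
                             max_gap_hours: float = 4.0):
    """Return (product, pipeline, recovery_hours) for every product that has
    exactly one path and whose recovery time exceeds the gap budget."""
    flagged = []
    for product, paths in products.items():
        if len(paths) == 1 and paths[0].recovery_hours > max_gap_hours:
            flagged.append((product, paths[0].pipeline, paths[0].recovery_hours))
    return flagged

# Hypothetical architecture: one product with parallel paths, one exposed.
products = {
    "customer_events": [Path("events_v1", 12.0), Path("events_v2", 8.0)],
    "billing_rollup":  [Path("billing_etl", 9.0)],
}

for product, pipeline, hours in single_points_of_failure(products):
    print(f"{product}: only {pipeline} ({hours:.0f}h recovery) -- single point of failure")
```

A real audit would walk an actual dependency graph rather than a hand-written dictionary, but even this toy version forces the question the cost-reduction slide omits: which removals convert a redundant path into a single point of failure.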
The teams that avoided the $4.2M outage were not smarter. They were more rigorous about asking what they were actually trading away. That rigor is available to every analytics organization. It requires no new tooling. It requires the discipline of looking at what you are about to remove and asking whether the impression of waste is accurate.
Often, it is not.
Explore the Analytics Automation Programs with AI or the Self-Service Analytics Programs with AI to build infrastructure reasoning into your team's practice.