Valenx Press · 6 min read
Engineering Manager First 90 Days: Recovering from a Toxic AWS Team Culture
TL;DR
The first 90 days must be spent dismantling the toxic patterns, establishing decisive processes, and aligning the team with measurable outcomes. Anything less is a temporary band‑aid that will collapse under the weight of AWS’s scale. If you cannot neutralize the cultural poison by day 60, the only viable path is to re‑staff the team.
Who This Is For
This guide is for engineers who have just accepted an Engineering Manager role overseeing a mid‑size AWS service (10‑15 engineers) that has been flagged for chronic blame‑shifting, undocumented hand‑offs, and senior‑staff attrition exceeding 30 % in the last six months. You likely earn a base of $170,000‑$190,000, have two prior people‑management cycles behind you, and are being asked to turn the group around while keeping the service’s SLA above 99.9 %.
How should an Engineering Manager assess a toxic AWS team in the first 30 days?
The judgment is that a rapid, data‑driven audit of communication artifacts beats informal “pulse checks” every time. In a Q1 debrief, the senior director asked why the on‑call rotation had three incidents of “no‑owner” alerts in the past month; the answer was a missing run‑book page that had never been reviewed. I spent the first ten days pulling JIRA tickets, CloudWatch logs, and Slack threads for the last 90 days, then mapped ownership, latency, and escalation frequency.
The resulting heat map highlighted three engineers who were repeatedly bypassed for critical decisions, a clear sign of hidden power structures. Triangulating signals, rather than simply talking to the team, revealed the true fault lines. The audit produced a one‑page “cultural defect matrix” that became the baseline for all subsequent interventions.
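The audit described above can be sketched in a few lines. This is a minimal illustration, not the tooling actually used: the `Ticket` fields, the `UNOWNED` bucket, and the sample rows are all hypothetical stand-ins for whatever your JIRA export contains.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ticket:
    """One exported ticket. All field names are hypothetical export columns."""
    key: str
    owner: Optional[str]      # None models a "no-owner" alert or ticket
    hours_to_resolve: float
    escalated: bool


def ownership_matrix(tickets):
    """Aggregate ticket count, mean resolution latency, and escalation rate per owner."""
    grouped = defaultdict(list)
    for ticket in tickets:
        grouped[ticket.owner or "UNOWNED"].append(ticket)

    matrix = {}
    for owner, items in grouped.items():
        matrix[owner] = {
            "tickets": len(items),
            "mean_hours": sum(i.hours_to_resolve for i in items) / len(items),
            "escalation_rate": sum(i.escalated for i in items) / len(items),
        }
    return matrix


# Illustrative sample data, not real audit results.
tickets = [
    Ticket("OPS-1", "alice", 2.0, False),
    Ticket("OPS-2", "alice", 4.0, True),
    Ticket("OPS-3", None, 6.0, True),
]
matrix = ownership_matrix(tickets)
for owner, row in sorted(matrix.items()):
    print(owner, row)
```

Sorting the rows by `escalation_rate` or by the size of the `UNOWNED` bucket gives a rough starting point for the heat map; who gets bypassed in decisions still has to come from reading the threads themselves.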
What signals indicate that the culture is beyond repair versus fixable?
The judgment is that a pattern of repeated, undocumented “quick fixes” signals irredeemable decay, whereas isolated incidents of miscommunication are fixable. During the second‑week HC meeting, the hiring manager pushed back on the notion that the team could survive without senior staff because the burn‑out rate had already forced two senior engineers out in 45 days.
The discussion surfaced a critical metric: the defect‑reopen rate had risen from 12 % to 28 % after the last leadership change, and the same three engineers were still the bottleneck. Not “a few bad days”, but “systemic knowledge hoarding” was the root cause. When senior staff turnover exceeds 20 % in a quarter and the remaining engineers cannot articulate the product’s core metrics, the culture is beyond repair; the only remedy is a targeted off‑boarding and fresh hiring plan.
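The repair-versus-rebuild rule above can be written down as a small check. This is a sketch under stated assumptions: the 20 % turnover limit comes from the text, but the function names, the boolean for "can the team articulate core metrics", and the senior headcount of six are hypothetical.

```python
def senior_turnover_rate(departures: int, seniors_at_quarter_start: int) -> float:
    """Fraction of senior staff who left during the quarter."""
    if seniors_at_quarter_start <= 0:
        raise ValueError("seniors_at_quarter_start must be positive")
    return departures / seniors_at_quarter_start


def reopen_rate(reopened: int, resolved: int) -> float:
    """Fraction of resolved defects that were later reopened."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return reopened / resolved


def culture_verdict(turnover: float,
                    team_can_state_core_metrics: bool,
                    turnover_limit: float = 0.20) -> str:
    """Apply the rule from the text: high senior turnover plus a team
    that cannot articulate core metrics means 'beyond repair'."""
    if turnover > turnover_limit and not team_can_state_core_metrics:
        return "beyond repair"
    return "fixable"


# Two senior departures (from the text) out of a hypothetical six seniors.
turnover = senior_turnover_rate(departures=2, seniors_at_quarter_start=6)
verdict = culture_verdict(turnover, team_can_state_core_metrics=False)
rate = reopen_rate(reopened=28, resolved=100)  # illustrative counts for a 28 % rate
print(f"turnover={turnover:.0%}, reopen rate={rate:.0%}, verdict={verdict}")
```

The value of encoding the rule is less the code than the commitment: the thresholds are fixed before the data comes in, so the verdict cannot drift with the politics of the week.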
Which interventions deliver measurable improvement by day 60?
The judgment is that transparent sprint‑review rituals and a calibrated “ownership charter” produce the first quantifiable gains. In a day‑45 debrief, the product owner complained that the feature flag rollout had stalled because “no one owned the rollback plan”. I introduced a mandatory ownership charter for each epic, signed by an engineer, a QA lead, and a reliability owner.
Within two weeks, the average time‑to‑resolution for production incidents fell from 4.3 hours to 2.1 hours, and the defect‑reopen rate dropped to 15 %. Not “more meetings”, but “clear, enforceable contracts” between roles made the difference. The key was to tie the charter to the team’s performance bonus, which was adjusted from a flat $10,000 to a variable $2,500‑$5,000 based on SLA compliance, sharpening accountability.
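To put the gains above on a common scale, it helps to express them as relative reductions. The sketch below uses the figures quoted in this article and assumes the 28 % reopen rate mentioned earlier was the baseline before the charter; the helper name is hypothetical.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Incident time-to-resolution, in hours (figures from the text).
mttr_cut = relative_reduction(4.3, 2.1)

# Defect-reopen rate; 0.28 as the baseline is an assumption.
reopen_cut = relative_reduction(0.28, 0.15)

print(f"time-to-resolution reduced by {mttr_cut:.1%}")  # 51.2%
print(f"reopen rate reduced by {reopen_cut:.1%}")       # 46.4%
```

Reporting both the absolute figures and the relative change makes it harder for a skeptical stakeholder to dismiss the result as noise on a small base.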
How to communicate a culture reset without triggering political backlash?
The judgment is that framing the reset as “risk mitigation for the service” neutralizes senior‑staff defensiveness more effectively than “culture overhaul”. In a day‑55 one‑on‑one with the principal engineer, I stated, “Our incident‑response metrics are exposing a risk that could cost us $250,000 in downtime penalties; we need a unified process to mitigate that.” The engineer responded positively, noting that his team’s recent “shadow deployments” were the source of undocumented failures.
Not “telling them they’re toxic”, but “presenting the business impact” opened the door for collaboration. The communication plan included a three‑stage rollout: (1) data release, (2) joint solution design, and (3) formal policy adoption. Each stage was documented in Confluence and signed off by the VP of Engineering, preventing rumor‑driven resistance.
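The risk framing in this section lands better when the dollar figure is backed by arithmetic. The sketch below converts the 99.9 % SLA into a monthly downtime budget and prices an outage; the 30-day month and the $5,000-per-minute penalty rate are hypothetical assumptions, not figures from the service.

```python
def downtime_budget_minutes(sla: float, period_days: int = 30) -> float:
    """Minutes of downtime allowed in the period before the SLA is breached."""
    if not 0.0 < sla <= 1.0:
        raise ValueError("sla must be a fraction in (0, 1]")
    return (1.0 - sla) * period_days * 24 * 60


def penalty_exposure(downtime_minutes: float, cost_per_minute: float) -> float:
    """Dollar exposure for a given amount of downtime at an assumed rate."""
    return downtime_minutes * cost_per_minute


budget = downtime_budget_minutes(0.999)
print(f"99.9% SLA allows about {budget:.1f} minutes of downtime per 30 days")

# Hypothetical rate: at $5,000 per minute, 50 minutes of downtime costs $250,000.
exposure = penalty_exposure(downtime_minutes=50, cost_per_minute=5_000)
print(f"exposure: ${exposure:,.0f}")
```

Substitute the penalty terms from your own service agreement before putting a number like this in front of a principal engineer or a VP.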
What compensation adjustments matter when rescuing a team?
The judgment is that targeted equity refreshes for high‑impact engineers outweigh broad salary bumps in a high‑cost environment like AWS. In a day‑70 compensation review, the finance lead offered a $5,000 base increase to all engineers but reserved a 0.03 % equity grant for the two engineers who owned the most critical incidents.
The equity grant translated to an estimated $12,000 annualized value at the current market cap, which proved a stronger retention lever than the base raise. Not “equal raises across the board”, but “strategic equity stakes” aligned personal upside with service reliability. The final package combined a $3,500 base increase, a $7,500 signing bonus for the new senior hire, and the equity grant, creating a compensation mix that emphasized performance‑linked rewards.
Preparation Checklist
- Review the last 90 days of JIRA, CloudWatch, and Slack data to build a defect‑ownership matrix.
- Conduct a confidential “toxicity pulse” survey with a single open‑ended question to capture unfiltered sentiment.
- Draft an ownership charter template and circulate it for feedback before the first sprint planning.
- Align performance bonuses with SLA metrics; set clear thresholds for variable pay.
- Schedule a one‑on‑one with each senior engineer to surface hidden blockers and negotiate equity adjustments.
- Work through a structured preparation system (the PM Interview Playbook covers culture‑diagnosis frameworks with real debrief examples) to ensure you have concrete talking points for leadership.
- Prepare a risk‑impact slide that quantifies downtime costs in dollar terms for each critical service.
Mistakes to Avoid
- BAD: Relying on “team‑building” outings to fix deep‑rooted process gaps.
- GOOD: Deploying transparent ownership charters that tie directly to measurable incident metrics.
- BAD: Offering a blanket 5 % salary increase without addressing the equity imbalance that drives senior‑staff attrition.
- GOOD: Targeting equity grants to the engineers who own the most risky production pathways, thereby aligning incentives with reliability.
- BAD: Ignoring the data from the 90‑day audit and proceeding with gut‑feel decisions about staffing.
- GOOD: Using the defect‑ownership matrix to identify bottlenecks, then reassigning or hiring to eliminate knowledge silos.
FAQ
What is the first concrete step to prove the team’s cultural issues? Start by extracting a 90‑day artifact log (JIRA, CloudWatch, Slack) and mapping ownership gaps; the resulting matrix is the non‑negotiable evidence base for any change initiative.
How do I keep senior engineers from sabotaging the reset? Present the change as a risk‑mitigation requirement tied to measurable downtime costs; this reframes the agenda from “culture policing” to “business protection” and reduces defensive pushback.
When should I consider off‑boarding versus rebuilding the team? If senior‑staff turnover exceeds 20 % in a quarter and the defect‑reopen rate remains above 25 % despite ownership charters, the culture is likely beyond repair and a strategic off‑boarding plan should be executed.