# Case Studies: Troubleshooting and Pitfalls

## Quick Navigation

- **Learn Basics**: Simple Feature Specs - See good examples first
- **Process Help**: Process Guide - Avoid pitfalls with a systematic approach
- **Better Prompting**: Prompting Strategies - Communicate more effectively
- **Execution Issues**: Execution Guide - Fix implementation problems
This section documents common mistakes, failed approaches, and lessons learned from real-world spec-driven development experiences. Learning from these pitfalls can help you avoid similar issues and recover when problems arise.
## Common Pitfalls by Phase

### Requirements Phase Pitfalls

#### Pitfall 1: Vague or Ambiguous Requirements

**What Went Wrong:** A team specified a requirement as "The system should be fast and user-friendly." This led to disagreements during implementation about what constituted acceptable performance and usability.

**Example of Poor Requirement:**

```markdown
### Requirement 1

**User Story:** As a user, I want the application to be fast, so that I have a good experience.

#### Acceptance Criteria

1. WHEN using the application THEN it should be fast
2. WHEN navigating THEN it should be responsive
```
**What Should Have Been Done:**

```markdown
### Requirement 1

**User Story:** As a user, I want page loads to complete quickly, so that I can accomplish my tasks efficiently.

#### Acceptance Criteria

1. WHEN loading the main dashboard THEN the page SHALL render within 2 seconds
2. WHEN clicking navigation links THEN the new page SHALL load within 1.5 seconds
3. WHEN submitting forms THEN the system SHALL provide feedback within 500ms
4. IF network conditions are poor THEN the system SHALL show loading indicators after 1 second
```
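Concrete thresholds like these can be enforced in CI rather than just documented. Below is a minimal sketch of such a check, assuming Playwright as the end-to-end test runner; the `/dashboard` route is a hypothetical stand-in for the main dashboard from criterion 1:

```typescript
// Hedged sketch: wall-clock load-time check for acceptance criterion 1,
// assuming Playwright. '/dashboard' is a hypothetical route.
import { test, expect } from '@playwright/test';

test('main dashboard renders within 2 seconds', async ({ page }) => {
  const start = Date.now();
  await page.goto('/dashboard');
  // Wait until network activity settles, approximating "fully rendered".
  await page.waitForLoadState('networkidle');
  expect(Date.now() - start).toBeLessThan(2000);
});
```

Wall-clock assertions are sensitive to CI hardware and network conditions, so teams often run them against a controlled environment or track the numbers as trends rather than hard failures.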
**Recovery Strategy:**
- Stop implementation and return to requirements clarification
- Define specific, measurable criteria for all subjective terms
- Get stakeholder agreement on concrete metrics
- Update the requirements document before proceeding
#### Pitfall 2: Missing Edge Cases and Error Scenarios

**What Went Wrong:** A user authentication system was specified without considering password reset, account lockout, or concurrent login scenarios. This led to security vulnerabilities and poor user experience.

**Example of Incomplete Requirements:**

```markdown
### Requirement 1

**User Story:** As a user, I want to log in with email and password, so that I can access my account.

#### Acceptance Criteria

1. WHEN providing correct credentials THEN the system SHALL authenticate the user
2. WHEN providing incorrect credentials THEN the system SHALL show an error
```
**What Should Have Been Done:**

```markdown
### Requirement 1

**User Story:** As a user, I want to log in securely with email and password, so that I can access my account while maintaining security.

#### Acceptance Criteria

1. WHEN providing correct credentials THEN the system SHALL authenticate and create a session
2. WHEN providing incorrect credentials THEN the system SHALL show a generic error message
3. WHEN login fails 5 times THEN the system SHALL temporarily lock the account for 15 minutes
4. WHEN already logged in elsewhere THEN the system SHALL handle concurrent sessions appropriately
5. IF the account is locked THEN the system SHALL provide password reset options
6. WHEN the session expires THEN the system SHALL require re-authentication
```
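Criteria 3 and 5 imply per-account failure tracking. A minimal sketch of that lockout logic follows; the in-memory `Map` and all names are illustrative, and a real deployment would persist attempts in a shared store so counts survive restarts and work across instances:

```typescript
// Illustrative lockout tracker for criteria 3 and 5. In-memory storage is a
// simplification; a real system would use a database or shared cache.
const MAX_ATTEMPTS = 5;
const LOCK_MS = 15 * 60 * 1000; // 15 minutes, per criterion 3

interface AttemptRecord {
  failures: number;
  lockedUntil?: number;
}

const attempts = new Map<string, AttemptRecord>();

export function isLocked(email: string): boolean {
  const rec = attempts.get(email);
  return !!rec?.lockedUntil && rec.lockedUntil > Date.now();
}

export function recordFailure(email: string): void {
  const rec = attempts.get(email) ?? { failures: 0 };
  rec.failures += 1;
  if (rec.failures >= MAX_ATTEMPTS) {
    rec.lockedUntil = Date.now() + LOCK_MS; // lock the account
    rec.failures = 0; // reset the counter for after the lock expires
  }
  attempts.set(email, rec);
}

export function recordSuccess(email: string): void {
  attempts.delete(email); // successful login clears the failure history
}
```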
**Recovery Strategy:**
- Conduct a systematic review of all failure scenarios
- Consider the "unhappy path" for every user story
- Add security and edge case requirements
- Review with security experts if handling sensitive data
#### Pitfall 3: Technology-Specific Requirements

**What Went Wrong:** Requirements specified "The system must use React and Node.js" instead of focusing on functional needs. This limited design flexibility and made the spec less reusable.

**Example of Technology-Coupled Requirements:**

```markdown
### Requirement 1

**User Story:** As a developer, I want to use React for the frontend, so that the UI is interactive.

#### Acceptance Criteria

1. WHEN building the UI THEN it SHALL use React components
2. WHEN handling state THEN it SHALL use Redux
```
**What Should Have Been Done:**

```markdown
### Requirement 1

**User Story:** As a user, I want an interactive web interface, so that I can efficiently manage my data.

#### Acceptance Criteria

1. WHEN interacting with forms THEN changes SHALL be reflected immediately without page refresh
2. WHEN data updates THEN the interface SHALL update automatically
3. WHEN using the interface THEN it SHALL work on modern web browsers
4. IF JavaScript is disabled THEN core functionality SHALL still be accessible
```
**Recovery Strategy:**
- Separate functional requirements from implementation choices
- Move technology decisions to the design phase
- Focus requirements on user value and business outcomes
- Allow design phase to evaluate technology options
### Design Phase Pitfalls

#### Pitfall 4: Over-Engineering from the Start

**What Went Wrong:** A simple content management feature was designed with microservices, event sourcing, and complex caching layers before actual usage patterns were understood.

**Example of Over-Engineered Design:**

```markdown
## Architecture

The content management system will use:

- 5 microservices with separate databases
- Event sourcing for all data changes
- Redis cluster for distributed caching
- Message queues for all inter-service communication
- Elasticsearch for content search
```
**What Should Have Been Done:**

```markdown
## Architecture

The content management system will start with:

- Single service with clear module boundaries
- Traditional database with proper indexing
- Simple caching for frequently accessed content
- Direct API calls between modules
- Database full-text search initially

## Future Scaling Considerations

- Module boundaries designed to support future service extraction
- Database schema designed to support event sourcing if needed
- Caching layer abstracted to support distributed caching
- API design supports future microservices architecture
```
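The "caching layer abstracted" note is easy to realize concretely: have callers depend on a small interface so the initial in-process cache can later be swapped for a distributed one without touching call sites. A hedged sketch, with all names illustrative:

```typescript
// Illustrative cache abstraction: callers depend on the interface, so the
// in-memory implementation can later be replaced by a Redis-backed one.
interface ContentCache {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

class InMemoryCache implements ContentCache {
  private store = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | undefined> {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt < Date.now()) return undefined;
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}
// A future RedisCache would implement the same interface, changing no callers.
```

The same seam-first approach applies to the other scaling notes: a clear module boundary or repository interface today is what makes service extraction tractable later.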
**Recovery Strategy:**
- Start with the simplest design that meets requirements
- Design for future scalability without implementing it initially
- Plan clear upgrade paths for when complexity is needed
- Focus on solving current problems, not hypothetical future ones
#### Pitfall 5: Insufficient Error Handling Design

**What Went Wrong:** A payment processing system design focused on the happy path but didn't adequately plan for network failures, timeout scenarios, or partial payment states.

**Example of Incomplete Error Handling:**

```markdown
## Payment Processing Flow

1. Validate payment information
2. Charge payment method
3. Update order status
4. Send confirmation email
```
**What Should Have Been Done:**

```markdown
## Payment Processing Flow

### Happy Path

1. Validate payment information
2. Charge payment method
3. Update order status
4. Send confirmation email

### Error Scenarios

- **Validation Failure**: Return specific field errors, log the attempt
- **Payment Declined**: Store the attempt, offer alternative payment methods
- **Network Timeout**: Implement retry with exponential backoff
- **Partial Charge**: Implement idempotency keys and a reconciliation process
- **Database Failure**: Queue status updates, implement eventual consistency
- **Email Failure**: Queue the email for retry; don't fail the payment

### Recovery Mechanisms

- Automatic retry for transient failures
- Manual reconciliation tools for payment discrepancies
- Customer service tools for payment issue resolution
- Monitoring and alerting for payment system health
```
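The retry and idempotency items interact: retrying a charge is only safe when the gateway can deduplicate attempts. A minimal sketch of retry with exponential backoff; the injected `ChargeFn` gateway client is a hypothetical stand-in for a real payment provider SDK:

```typescript
type ChargeResult = { status: 'charged' | 'declined' };
// Hypothetical gateway client signature; a real provider SDK replaces this.
type ChargeFn = (amountCents: number, idempotencyKey: string) => Promise<ChargeResult>;

// Retry transient failures with exponential backoff. The same idempotency key
// is passed on every attempt so the gateway can deduplicate partial charges.
async function chargeWithRetry(
  charge: ChargeFn,
  amountCents: number,
  idempotencyKey: string,
  maxAttempts = 4
): Promise<ChargeResult> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await charge(amountCents, idempotencyKey);
    } catch (err) {
      // Give up after the last attempt; surface for manual reconciliation.
      if (attempt === maxAttempts) throw err;
      const delayMs = 500 * 2 ** (attempt - 1); // 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error('unreachable');
}
```

Reusing the same idempotency key across attempts is the crucial detail; a fresh key per retry would reintroduce the partial-charge problem the design calls out.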
**Recovery Strategy:**
- Map out all possible failure points in the system
- Design specific handling for each type of failure
- Implement monitoring and alerting for error conditions
- Create manual recovery procedures for complex failures
#### Pitfall 6: Ignoring Non-Functional Requirements

**What Went Wrong:** A data processing system was designed without considering performance, security, or scalability requirements, leading to production issues.

**Example of Missing Non-Functional Considerations:**

```markdown
## Data Processing Design

The system will:

- Read data from CSV files
- Transform data according to business rules
- Store results in database
```
**What Should Have Been Done:**

```markdown
## Data Processing Design

### Functional Design

- Read data from CSV files with configurable batch sizes
- Transform data using pluggable business rule engine
- Store results with transaction management

### Non-Functional Design

- **Performance**: Process 10,000 records per minute minimum
- **Scalability**: Support horizontal scaling for larger datasets
- **Security**: Encrypt data at rest and in transit
- **Reliability**: Implement checkpointing for recovery from failures
- **Monitoring**: Track processing metrics and error rates
- **Maintainability**: Support hot-swapping of business rules
```
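To make the reliability line less abstract, here is a rough sketch of batch checkpointing: persist how far processing got after each committed batch so a crashed run resumes instead of starting over. The file-based storage and all names are illustrative:

```typescript
import { promises as fs } from 'fs';

// Illustrative checkpointing: after each committed batch, record how far we
// got so a crashed run resumes from the next batch instead of record 0.
const CHECKPOINT_FILE = 'checkpoint.json'; // illustrative location

async function loadCheckpoint(): Promise<number> {
  try {
    const raw = await fs.readFile(CHECKPOINT_FILE, 'utf8');
    return JSON.parse(raw).nextBatch as number;
  } catch {
    return 0; // no checkpoint yet: start from the beginning
  }
}

async function saveCheckpoint(nextBatch: number): Promise<void> {
  await fs.writeFile(CHECKPOINT_FILE, JSON.stringify({ nextBatch }));
}

async function processAll(
  batches: string[][],                             // pre-split CSV rows
  processBatch: (rows: string[]) => Promise<void>  // transform + store, transactional
): Promise<void> {
  for (let i = await loadCheckpoint(); i < batches.length; i++) {
    await processBatch(batches[i]);
    await saveCheckpoint(i + 1); // only advance after the batch commits
  }
}
```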
**Recovery Strategy:**
- Review requirements for implicit non-functional needs
- Add performance, security, and scalability considerations
- Design monitoring and observability from the start
- Plan for operational concerns like deployment and maintenance
### Tasks Phase Pitfalls

#### Pitfall 7: Tasks Too Large or Vague

**What Went Wrong:** Implementation tasks were defined as "Implement user management" and "Build the API," leading to unclear progress tracking and difficulty estimating work.

**Example of Poor Task Definition:**

```markdown
- [ ] 1. Implement user management
  - Build all user-related functionality
  - _Requirements: 1.1, 1.2, 1.3, 2.1, 2.2_

- [ ] 2. Build the API
  - Create REST endpoints for all features
  - _Requirements: 3.1, 3.2, 4.1_
```
**What Should Have Been Done:**

```markdown
- [ ] 1. Create user data model and validation
  - Implement User interface with TypeScript types
  - Create email validation with regex pattern
  - Add password strength validation (8+ chars, mixed case, numbers)
  - Write unit tests for validation functions
  - _Requirements: 1.1, 1.2_

- [ ] 2. Implement user registration endpoint
  - Create POST /api/users endpoint with request validation
  - Add duplicate email checking with appropriate error response
  - Implement password hashing using bcrypt
  - Write integration tests for registration flow
  - _Requirements: 1.1, 1.3_

- [ ] 3. Build user authentication endpoint
  - Create POST /api/auth/login endpoint
  - Implement credential verification and JWT token generation
  - Add rate limiting for login attempts
  - Write integration tests for authentication flow
  - _Requirements: 2.1, 2.2_
```
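Task 1 is now concrete enough to sketch directly. A minimal version of the validators it calls for; the email regex is a common pragmatic pattern, not a full RFC 5322 check:

```typescript
// Simplified email check: non-space/@ chars, an @, then a domain with a dot.
// A common pragmatic pattern, not exhaustive RFC 5322 validation.
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

export function isValidEmail(email: string): boolean {
  return EMAIL_RE.test(email);
}

// Password strength per the task: 8+ chars, mixed case, at least one number.
export function isStrongPassword(password: string): boolean {
  return (
    password.length >= 8 &&
    /[a-z]/.test(password) &&
    /[A-Z]/.test(password) &&
    /[0-9]/.test(password)
  );
}
```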
**Recovery Strategy:**
- Break large tasks into specific, testable units
- Each task should be completable in 1-2 days maximum
- Include specific deliverables and acceptance criteria
- Reference specific requirements for each task
#### Pitfall 8: Missing Dependencies and Sequencing

**What Went Wrong:** Tasks were defined without considering dependencies, leading to blocked work and an inefficient development flow.

**Example of Poor Task Sequencing:**

```markdown
- [ ] 1. Build user interface components
- [ ] 2. Implement API endpoints
- [ ] 3. Create database schema
- [ ] 4. Set up authentication middleware
```
**What Should Have Been Done:**

```markdown
- [ ] 1. Set up project infrastructure
  - Create database schema and migrations
  - Set up development environment and dependencies
  - Configure testing framework
  - _Requirements: Foundation for all other tasks_

- [ ] 2. Implement core data models
  - Create User model with validation
  - Implement database repository layer
  - Write unit tests for data models
  - _Requirements: 1.1, 1.2_

- [ ] 3. Build authentication services
  - Implement password hashing and verification
  - Create JWT token generation and validation
  - Write unit tests for authentication logic
  - _Requirements: 2.1, 2.2_

- [ ] 4. Create API endpoints
  - Build user registration endpoint using authentication services
  - Implement login endpoint with token generation
  - Add authentication middleware for protected routes
  - Write integration tests for complete API flows
  - _Requirements: 1.1, 2.1, 3.1_

- [ ] 5. Build user interface components
  - Create registration form with validation
  - Implement login form with error handling
  - Add authenticated user dashboard
  - Write component tests and user interaction tests
  - _Requirements: 3.2, 3.3_
```
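Task 3's "JWT token generation and validation" might look like the following with the widely used `jsonwebtoken` package (an assumed library choice); the payload shape, secret sourcing, and one-hour expiry are illustrative:

```typescript
import jwt from 'jsonwebtoken';

// Secret sourcing is illustrative; use real secret management in production.
const JWT_SECRET = process.env.JWT_SECRET ?? 'dev-only-secret';

export function issueToken(userId: string): string {
  // 'sub' (subject) is the standard JWT claim for the user identifier.
  return jwt.sign({ sub: userId }, JWT_SECRET, { expiresIn: '1h' });
}

// Returns the user id, or null if the token is expired, malformed,
// or signed with the wrong key.
export function verifyToken(token: string): string | null {
  try {
    const payload = jwt.verify(token, JWT_SECRET) as { sub: string };
    return payload.sub;
  } catch {
    return null;
  }
}
```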
**Recovery Strategy:**
- Map out dependencies between tasks
- Sequence tasks so that each builds on completed work
- Identify critical path items that block other work
- Consider parallel work streams where possible
#### Pitfall 9: Insufficient Testing Strategy

**What Went Wrong:** Tasks focused only on feature implementation without adequate testing, leading to bugs discovered late in development.

**Example of Testing-Light Tasks:**

```markdown
- [ ] 1. Implement user registration
  - Create registration form
  - Add backend validation
  - Store user in database
  - _Requirements: 1.1_

- [ ] 2. Add user login
  - Create login form
  - Verify credentials
  - Create user session
  - _Requirements: 2.1_
```
**What Should Have Been Done:**

```markdown
- [ ] 1. Implement user registration with comprehensive testing
  - Create User model with validation rules
  - Write unit tests for User model validation edge cases
  - Implement registration API endpoint with error handling
  - Write integration tests for registration flow including error scenarios
  - Create registration form with client-side validation
  - Write end-to-end tests for complete registration user journey
  - _Requirements: 1.1_

- [ ] 2. Add user login with security testing
  - Implement credential verification with secure password comparison
  - Write unit tests for authentication logic including timing attacks
  - Create login API endpoint with rate limiting
  - Write integration tests for login flow including brute force scenarios
  - Build login form with proper error handling
  - Write end-to-end tests for login user journey and security measures
  - _Requirements: 2.1_
```
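The "secure password comparison" bullet in task 2 usually means delegating to bcrypt rather than comparing strings: `bcrypt.compare` re-hashes the candidate with the salt embedded in the stored hash, so the expensive hash dominates the timing regardless of where a mismatch occurs. A sketch assuming the `bcrypt` npm package:

```typescript
import bcrypt from 'bcrypt';

const SALT_ROUNDS = 12; // cost factor: higher is slower and harder to brute-force

export async function hashPassword(plain: string): Promise<string> {
  return bcrypt.hash(plain, SALT_ROUNDS);
}

// Returns true only if the candidate matches the stored hash. Callers should
// report a generic "invalid credentials" error either way, so attackers can't
// tell whether the email or the password was wrong.
export async function verifyPassword(plain: string, storedHash: string): Promise<boolean> {
  return bcrypt.compare(plain, storedHash);
}
```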
**Recovery Strategy:**
- Add testing requirements to every implementation task
- Include unit, integration, and end-to-end testing
- Consider security testing for sensitive functionality
- Plan for both positive and negative test scenarios
## Recovery Strategies for Common Problems

### When Requirements Are Unclear Mid-Implementation

**Symptoms:**

- Developers asking frequent clarification questions
- Implementation decisions being made without stakeholder input
- Features being built that don't match user expectations

**Recovery Steps:**

1. **Stop Implementation**: Pause coding work to prevent building the wrong thing
2. **Document Assumptions**: List all assumptions being made about unclear requirements
3. **Stakeholder Review**: Schedule an immediate review with business stakeholders
4. **Clarify and Update**: Update the requirements document with specific, measurable criteria
5. **Impact Assessment**: Evaluate what work needs to be redone
6. **Resume with Clarity**: Continue implementation only after requirements are clear
### When Design Doesn't Support Requirements

**Symptoms:**

- Implementation tasks seem impossible or overly complex
- Performance requirements can't be met with the current design
- Security or scalability concerns emerge during implementation

**Recovery Steps:**

1. **Identify Root Cause**: Determine which requirements the design fails to support
2. **Design Review**: Conduct a thorough review of design decisions
3. **Alternative Evaluation**: Research alternative architectural approaches
4. **Stakeholder Communication**: Explain trade-offs and get input on priorities
5. **Design Revision**: Update the design document with the new approach
6. **Task Adjustment**: Revise implementation tasks to match the new design
### When Implementation Tasks Are Blocked

**Symptoms:**

- Tasks can't be started due to missing dependencies
- Work is proceeding in the wrong order
- Team members are waiting for others to complete prerequisite work

**Recovery Steps:**

1. **Dependency Mapping**: Create a visual map of all task dependencies (see the sketch after this list)
2. **Critical Path Analysis**: Identify which tasks are blocking the most other work
3. **Parallel Work Identification**: Find tasks that can be done simultaneously
4. **Task Resequencing**: Reorder tasks to optimize workflow
5. **Resource Reallocation**: Assign team members to unblocked work
6. **Regular Check-ins**: Implement daily standups to catch blocking issues early
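Dependency mapping doesn't have to stay on a whiteboard. Once tasks and their prerequisites are written down as data, a topological sort produces a workable order and immediately exposes cycles; the task names below are hypothetical:

```typescript
// Each task lists the tasks that must finish before it can start.
const dependsOn: Record<string, string[]> = {
  'database schema': [],
  'data models': ['database schema'],
  'auth services': ['data models'],
  'api endpoints': ['data models', 'auth services'],
  'ui components': ['api endpoints'],
};

// Depth-first topological sort; throws if the dependency graph has a cycle.
function buildOrder(graph: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, 'visiting' | 'done'>();

  function visit(task: string): void {
    if (state.get(task) === 'done') return;
    if (state.get(task) === 'visiting') throw new Error(`cycle at "${task}"`);
    state.set(task, 'visiting');
    for (const dep of graph[task] ?? []) visit(dep);
    state.set(task, 'done');
    order.push(task); // all prerequisites are already in the list
  }

  for (const task of Object.keys(graph)) visit(task);
  return order;
}

console.log(buildOrder(dependsOn));
// -> ['database schema', 'data models', 'auth services', 'api endpoints', 'ui components']
```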
### When Quality Issues Emerge Late

**Symptoms:**

- Bugs discovered during integration testing
- Performance problems in production-like environments
- Security vulnerabilities found during review

**Recovery Steps:**

1. **Issue Triage**: Categorize problems by severity and impact
2. **Root Cause Analysis**: Determine why issues weren't caught earlier
3. **Testing Gap Analysis**: Identify what testing was missing
4. **Process Improvement**: Add missing testing types to future tasks
5. **Immediate Fixes**: Address critical issues blocking progress
6. **Prevention Planning**: Update the spec process to prevent similar issues
## Lessons Learned from Failed Approaches

### Case Study 1: The Over-Specified Spec

**Background:** A team created a 200-page specification document that attempted to define every possible detail of a content management system before any implementation began.

**What Went Wrong:**

- Specification took 3 months to write
- Requirements changed during the long specification phase
- Implementation revealed many specification assumptions were wrong
- Team spent more time updating documentation than building features

**Key Lessons:**

- Specifications should be detailed enough to guide implementation, not replace thinking
- Start with core functionality and iterate
- Validate assumptions with prototypes before full specification
- Keep specifications living documents that evolve with understanding
### Case Study 2: The Technology-First Design

**Background:** A team decided to use microservices, event sourcing, and GraphQL for a simple inventory management system because these were "modern" technologies.

**What Went Wrong:**

- Development time increased 3x due to complexity
- Simple features required changes across multiple services
- Debugging became extremely difficult
- Team spent more time on infrastructure than business logic

**Key Lessons:**

- Choose technology based on requirements, not trends
- Start simple and add complexity only when needed
- Consider team expertise when making technology choices
- Focus on solving business problems, not showcasing technology
### Case Study 3: The Missing Monitoring Spec

**Background:** A data processing pipeline was thoroughly specified for functionality but had no monitoring, logging, or observability requirements.

**What Went Wrong:**

- Production issues were impossible to debug
- No visibility into system performance or health
- Customer issues couldn't be traced to root causes
- System reliability was poor due to lack of operational insight

**Key Lessons:**

- Operational requirements are as important as functional ones
- Monitoring and observability should be specified from the start
- Consider the full lifecycle of the system, not just initial functionality
- Include operational runbooks and troubleshooting procedures
## Prevention Strategies

### Requirements Phase Prevention

- **Use Concrete Examples**: Always include specific examples of expected behavior
- **Define Acceptance Tests**: Write testable acceptance criteria for every requirement
- **Consider Edge Cases**: Systematically think through error scenarios and boundary conditions
- **Stakeholder Review**: Get explicit approval from business stakeholders before proceeding
- **Prototype Validation**: Build small prototypes to validate assumptions
### Design Phase Prevention

- **Start Simple**: Begin with the simplest design that meets requirements
- **Plan for Evolution**: Design for future needs without implementing them initially
- **Consider Operations**: Include monitoring, logging, and maintenance in design
- **Review Trade-offs**: Explicitly document design decisions and their trade-offs
- **Validate with Implementation**: Build proof-of-concept for complex design decisions
### Tasks Phase Prevention

- **Right-Size Tasks**: Each task should be completable in 1-2 days
- **Include Testing**: Every implementation task should include corresponding tests
- **Map Dependencies**: Understand and document task dependencies
- **Plan Integration**: Include tasks for integrating components together
- **Consider Deployment**: Include tasks for deployment and operational concerns
## Quick Reference: Warning Signs

### Requirements Warning Signs

- Requirements use subjective terms without definition ("fast", "user-friendly")
- No error scenarios or edge cases considered
- Technology choices embedded in requirements
- Stakeholders haven't reviewed or approved requirements
### Design Warning Signs
- Design is much more complex than requirements suggest
- No consideration of non-functional requirements
- No error handling or failure scenarios planned
- Design decisions not justified or documented
### Tasks Warning Signs
- Tasks are too large (more than 2-3 days of work)
- No testing included in implementation tasks
- Dependencies between tasks not considered
- No integration or deployment tasks included
## When to Start Over

Sometimes the best recovery strategy is to restart with lessons learned.

**Consider Restarting When:**

- Fundamental misunderstanding of user needs
- Technical approach is completely wrong
- Spec has become too complex to follow
- More time spent on fixes than forward progress

**How to Restart Effectively:**

1. Document lessons learned from the failed attempt
2. Identify the root cause of failure
3. Start with a simplified scope
4. Apply prevention strategies from the beginning