Codebase Analysis Workflows

These procedures drive a comprehensive codebase analysis and populate the analysis model that output adapters consume.

Analysis Workflow Overview

flowchart LR
    subgraph Setup
        S0[Phase 0: Setup]
    end

    subgraph Discovery
        S1[Phase 1: Reconnaissance]
        S2[Phase 2: Technology]
        S3[Phase 3: Interfaces]
        S4[Phase 4: Architecture]
        S5[Phase 5: Data Flow]
        S6[Phase 6: Dependencies]
        S7[Phase 7: Errors]
    end

    subgraph Output
        M[Analysis Model]
        A1[Adapter 1]
        A2[Adapter 2]
        AN[Adapter N]
    end

    S0 --> S1 --> S2 --> S3 --> S4 --> S5 --> S6 --> S7 --> M
    M --> A1
    M --> A2
    M --> AN

Phase 0: Setup

Goal: Select output adapters and establish preferences.

0.1 Select Output Adapters

What outputs do you need from this analysis?

☐ Architecture documentation (architecture-docs)
☐ Coding context for AI (coding-context)
☐ Product specification (product-spec)
☐ C4 model / Structurizr (structurizr)
☐ ArchiMate model (archimate)

Select one or more. Default: Architecture documentation

0.2 Diagram Format Preference

What diagram format do you prefer?

1. **Mermaid** (Recommended) - GitHub, GitLab, most markdown viewers
2. **ASCII** - Universal, no rendering needed
3. **PlantUML** - Feature-rich, requires renderer
4. **Excalidraw** - Hand-drawn style, VS Code extension, collaborative

Store preference in meta.preferences.diagram_format.

Note: Excalidraw outputs .excalidraw JSON files that can be edited in VS Code with the Excalidraw extension or exported to PNG/SVG for embedding.

0.3 Initialize Analysis Model

Create empty analysis model structure:

analysis_model:
  meta:
    project_name: ""
    analysis_date: "{today}"
    preferences:
      diagram_format: "mermaid"
      output_adapters: []
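As a minimal sketch in Python, the skeleton above can be produced by a small factory. The function name and the lowercase format identifiers are hypothetical, and falling back to `architecture-docs` when no adapter is chosen is an assumption based on the default in 0.1.

```python
from datetime import date

# Lowercase identifiers for the four choices in 0.2 (assumed naming)
DIAGRAM_FORMATS = {"mermaid", "ascii", "plantuml", "excalidraw"}


def init_analysis_model(diagram_format="mermaid", output_adapters=None):
    """Return an empty analysis model shaped like the YAML skeleton."""
    if diagram_format not in DIAGRAM_FORMATS:
        raise ValueError(f"unsupported diagram format: {diagram_format}")
    if not output_adapters:
        # Fall back to the 0.1 default when nothing was selected
        output_adapters = ["architecture-docs"]
    return {
        "analysis_model": {
            "meta": {
                "project_name": "",
                "analysis_date": date.today().isoformat(),  # e.g. "2024-06-01"
                "preferences": {
                    "diagram_format": diagram_format,
                    "output_adapters": list(output_adapters),
                },
            }
        }
    }
```

Returning a plain dict keeps the model trivially serializable to the YAML shape shown above.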

Output: Preferences captured, model initialized

Phase 1: Reconnaissance

Goal: High-level understanding and documentation inventory.
Model Sections: reconnaissance, meta

1.1 Project Identification

Capture project metadata:
- Project name (from package.json, README, or directory name)
- Repository URL (from git remote)
- Current commit hash
- Description

Populate: meta.project_name, meta.repository, meta.commit, meta.description
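The name precedence listed in 1.1 can be sketched as a pure function over file contents, which keeps it easy to test. `pick_project_name` is a hypothetical helper, and treating the README's first `# ` heading as the title is an assumption. The repository URL and commit hash can come from `git remote get-url origin` and `git rev-parse HEAD`.

```python
import json


def pick_project_name(package_json_text=None, readme_text=None, directory_name=""):
    """Pick a project name: package.json "name", then README title, then directory."""
    if package_json_text:
        try:
            manifest = json.loads(package_json_text)
        except json.JSONDecodeError:
            manifest = None  # malformed manifest: fall through to the next source
        if isinstance(manifest, dict) and manifest.get("name"):
            return manifest["name"]
    if readme_text:
        for line in readme_text.splitlines():
            if line.startswith("# "):
                return line[2:].strip()  # first level-1 Markdown heading
    return directory_name
```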

1.2 Documentation Inventory

Find all existing documentation:

# Search patterns
README*.md, readme*.md
docs/, documentation/, wiki/
*.md in root
adr/, decisions/
api.yaml, openapi.yaml, swagger.*

Exclude AI agent configuration:

.ai-toolkit/, .aider/, .cursor/, .continue/
AGENTS.md, CLAUDE.md, .cursorrules, .aider*

Note: Also exclude any AI toolkit submodule directory, which may be named differently.

For each documentation file, capture:
- location: File path
- type: readme, api-doc, adr, architecture, guide
- coverage: high, medium, low
- last_updated: Date of the last Git modification

Populate: reconnaissance.documentation[]
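A minimal sketch of the inventory filter, assuming repository-relative POSIX paths as input. `classify_doc` and its constants are hypothetical; only Markdown files under the documentation and ADR directories are considered, and anything that is not a README, API spec, or ADR is labeled `guide`, because separating `architecture` docs from guides needs content inspection. Coverage and last_updated are out of scope here.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# AI agent configuration to skip; add your toolkit submodule's name here
EXCLUDED_DIRS = {".ai-toolkit", ".aider", ".cursor", ".continue"}
EXCLUDED_FILES = {"AGENTS.md", "CLAUDE.md", ".cursorrules"}
API_SPEC_NAMES = {"api.yaml", "openapi.yaml"}


def classify_doc(path):
    """Return a documentation type for a repo-relative path, or None to skip it."""
    p = PurePosixPath(path)
    parent_dirs = set(p.parts[:-1])
    if parent_dirs & EXCLUDED_DIRS or p.name in EXCLUDED_FILES or p.name.startswith(".aider"):
        return None
    name = p.name.lower()
    if fnmatchcase(name, "readme*.md"):  # covers README*.md and readme*.md
        return "readme"
    if name in API_SPEC_NAMES or fnmatchcase(name, "swagger.*"):
        return "api-doc"
    if p.suffix.lower() != ".md":
        return None
    if parent_dirs & {"adr", "decisions"}:
        return "adr"
    if parent_dirs & {"docs", "documentation", "wiki"}:
        return "guide"  # simplification: architecture docs need content inspection
    if len(p.parts) == 1:
        return "guide"  # other Markdown files in the repository root
    return None
```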

1.3 Project Structure

Identify:
- Entry points (main files, index files)
- Source directories (src/, lib/, app/)
- Test locations (tests/, __tests__/, spec/)
- Configuration files

Populate: architecture.structure

Phase 2: Technology Stack

Goal: Complete technology inventory.
Model Section: technologies

2.1 Languages

For each language found:

1. Identify it from file extensions and config files.
2. Extract its version from:
   - tsconfig.json, package.json (engines)
   - pyproject.toml, setup.py
   - go.mod, Cargo.toml, etc.
3. Mark the primary language (the one with the most code).

Populate: technologies.languages[]
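The detection and primary-language rule in 2.1 can be sketched as follows, assuming "most code" means most source lines and that per-file line counts are already collected. The extension table is a small illustrative subset and `detect_languages` is a hypothetical name; version extraction is not covered.

```python
from collections import Counter
from pathlib import PurePosixPath

# Illustrative subset of extension -> language mappings
EXTENSION_LANGUAGES = {
    ".py": "Python",
    ".ts": "TypeScript", ".tsx": "TypeScript",
    ".js": "JavaScript", ".jsx": "JavaScript",
    ".go": "Go",
    ".rs": "Rust",
}


def detect_languages(files):
    """files: iterable of (path, line_count). Returns languages, largest share first."""
    totals = Counter()
    for path, line_count in files:
        language = EXTENSION_LANGUAGES.get(PurePosixPath(path).suffix.lower())
        if language:
            totals[language] += line_count
    grand_total = sum(totals.values())
    return [
        {
            "name": language,
            "lines": lines,
            "share": round(lines / grand_total, 3),
            "primary": rank == 0,  # the language with the most code
        }
        for rank, (language, lines) in enumerate(totals.most_common())
    ]
```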

2.2 Frameworks

Scan package manifests for frameworks:

Category	Examples
Web	Express, FastAPI, Gin, Rails
Frontend	React, Vue, Angular, Svelte
Testing	Jest, pytest, Go testing
ORM	Prisma, SQLAlchemy, GORM

For each framework:
- name, version
- purpose: web, frontend, testing, database, etc.
- evidence: Files where detected

Populate: technologies.frameworks[]
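For npm projects, framework detection can be sketched as a lookup of declared dependencies against a table of known frameworks. The table, the `detect_frameworks` name, and recording the manifest path as evidence are assumptions; the version is the declared range from package.json, not the installed version.

```python
import json

# Illustrative npm package -> (framework, purpose) table; extend per ecosystem
KNOWN_FRAMEWORKS = {
    "express": ("Express", "web"),
    "react": ("React", "frontend"),
    "vue": ("Vue", "frontend"),
    "@angular/core": ("Angular", "frontend"),
    "svelte": ("Svelte", "frontend"),
    "jest": ("Jest", "testing"),
    "prisma": ("Prisma", "database"),
    "@prisma/client": ("Prisma", "database"),
}


def detect_frameworks(package_json_text, manifest_path="package.json"):
    """Return framework entries (name, version, purpose, evidence) found in package.json."""
    manifest = json.loads(package_json_text)
    found = {}
    for section in ("dependencies", "devDependencies"):
        for package, version in manifest.get(section, {}).items():
            if package not in KNOWN_FRAMEWORKS:
                continue
            name, purpose = KNOWN_FRAMEWORKS[package]
            # Prisma ships as two packages; merge them into one entry
            entry = found.setdefault(
                name, {"name": name, "version": version, "purpose": purpose, "evidence": []}
            )
            if manifest_path not in entry["evidence"]:
                entry["evidence"].append(manifest_path)
    return list(found.values())
```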

2.3 Libraries

Categorize dependencies:

Category	Examples
Utility	lodash, ramda
HTTP	axios, requests
Auth	passport, PyJWT
Validation	zod, pydantic
Logging	winston, structlog

Populate: technologies.libraries[]

2.4 Infrastructure

Identify from:
- Dockerfiles, docker-compose.yml
- Infrastructure as code (Terraform, CloudFormation)
- Configuration files (database URLs, cache configs)
- Environment variable references

Type	Examples
Database	PostgreSQL, MongoDB, MySQL
Cache	Redis, Memcached
Queue	RabbitMQ, Kafka, SQS
Storage	S3, GCS, local filesystem

Populate: technologies.infrastructure[]
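Environment variables often carry connection URLs whose scheme reveals the backing technology. A minimal sketch, assuming the variables have already been read into a dict (from a .env file or a docker-compose `environment` block, for instance); the scheme table and `infra_from_env` are hypothetical and cover only common conventions.

```python
from urllib.parse import urlparse

# Assumed mapping from connection-URL scheme to (type, technology)
SCHEME_INFRA = {
    "postgres": ("database", "PostgreSQL"),
    "postgresql": ("database", "PostgreSQL"),
    "mysql": ("database", "MySQL"),
    "mongodb": ("database", "MongoDB"),
    "mongodb+srv": ("database", "MongoDB"),
    "redis": ("cache", "Redis"),
    "rediss": ("cache", "Redis"),
    "amqp": ("queue", "RabbitMQ"),
    "amqps": ("queue", "RabbitMQ"),
    "s3": ("storage", "S3"),
    "gs": ("storage", "GCS"),
}


def infra_from_env(env):
    """env: mapping of variable name -> value. Returns detected infrastructure entries."""
    found = []
    for var, value in env.items():
        scheme = urlparse(value).scheme.lower()
        if scheme in SCHEME_INFRA:
            kind, technology = SCHEME_INFRA[scheme]
            found.append({"type": kind, "name": technology, "evidence": var})
    return found
```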

2.5 Build Tools

Document:
- Package manager (npm, yarn, pip, go mod)
- Build scripts (build, test, deploy commands)
- CI/CD configuration

Populate: technologies.build_tools[]

Phase 3: Interface Discovery

Goal: Map all system boundaries.
Model Section: interfaces

3.1 API Endpoints

Search for route definitions:

// Express patterns
app.get('/path', handler)
router.post('/path', handler)

// Decorators
@Get('/path')
@app.route('/path')

For each endpoint:
- path, method
- handler: file:line reference
- auth: none, jwt, session, api_key
- request_schema, response_schema (if discoverable)

Populate: interfaces.apis[]
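A regex sketch for the Express-style patterns above, assuming single-line calls on receivers literally named `app` or `router`. `find_express_routes` is hypothetical; decorator routes (`@Get`, `@app.route`), multi-line calls, and other receiver names need additional patterns or an AST-based parser, and `auth` detection is left out.

```python
import re

# Matches app.get('/path', ...) and router.post("/path", ...) style calls
ROUTE_RE = re.compile(
    r"""\b(?:app|router)\.(get|post|put|patch|delete)\(\s*['"]([^'"]+)['"]"""
)


def find_express_routes(source, filename):
    """Return endpoint entries with a file:line handler reference."""
    endpoints = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in ROUTE_RE.finditer(line):
            method, path = match.groups()
            endpoints.append(
                {"path": path, "method": method.upper(), "handler": f"{filename}:{lineno}"}
            )
    return endpoints
```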

3.2 Events/Messages

Find async communication:

// Publishers
eventBus.emit('event.name', payload)
queue.publish('queue-name', message)

// Consumers
@Subscribe('event.name')
queue.consume('queue-name', handler)

For each event:
- name, type (published, consumed, or both)
- schema (if available)
- publishers[], consumers[]

Populate: interfaces.events[]
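The publisher and consumer patterns above can be collected per event name, with `type` derived from which sides were seen in this codebase. A sketch assuming single-line calls with string-literal names; the regexes and `find_events` are hypothetical.

```python
import re

PUBLISH_RE = re.compile(r"""(?:\.emit|\.publish)\(\s*['"]([^'"]+)['"]""")
CONSUME_RE = re.compile(r"""(?:@Subscribe|\.consume)\(\s*['"]([^'"]+)['"]""")


def find_events(files):
    """files: mapping of filename -> source text. Returns event entries."""
    events = {}
    for filename, source in files.items():
        for lineno, line in enumerate(source.splitlines(), start=1):
            ref = f"{filename}:{lineno}"
            for regex, role in ((PUBLISH_RE, "publishers"), (CONSUME_RE, "consumers")):
                for match in regex.finditer(line):
                    name = match.group(1)
                    entry = events.setdefault(
                        name, {"name": name, "publishers": [], "consumers": []}
                    )
                    entry[role].append(ref)
    # Derive type from which sides appear in this codebase
    for entry in events.values():
        if entry["publishers"] and entry["consumers"]:
            entry["type"] = "both"
        elif entry["publishers"]:
            entry["type"] = "published"
        else:
            entry["type"] = "consumed"
    return list(events.values())
```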

3.3 External Integrations

Find third-party service calls:

// HTTP clients
axios.get('https://api.stripe.com/...')
requests.post('https://api.sendgrid.com/...')

// SDKs
new Stripe(apiKey)
twilio.messages.create(...)

For each integration:
- name: Service name
- direction: inbound, outbound, bidirectional
- type: rest, graphql, sdk
- auth: How it is authenticated

Populate: interfaces.integrations[]
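For raw HTTP clients, one approach is to scan for URL literals and group them by host. A sketch under these assumptions: the host-to-service table and `find_outbound_integrations` are hypothetical, every match is recorded as an outbound `rest` integration, SDK constructors such as `new Stripe(apiKey)` need their own patterns, and URLs in comments or docs will also match, so results need review.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s'\"`)]+")

# Hypothetical host -> service name table; unknown hosts keep their hostname
KNOWN_HOSTS = {
    "api.stripe.com": "Stripe",
    "api.sendgrid.com": "SendGrid",
    "api.twilio.com": "Twilio",
}


def find_outbound_integrations(files):
    """files: mapping of filename -> source text. Returns integration entries."""
    found = {}
    for filename, source in files.items():
        for lineno, line in enumerate(source.splitlines(), start=1):
            for url in URL_RE.findall(line):
                host = urlparse(url).hostname
                if not host:
                    continue
                name = KNOWN_HOSTS.get(host, host)
                entry = found.setdefault(
                    name, {"name": name, "direction": "outbound", "type": "rest", "evidence": []}
                )
                entry["evidence"].append(f"{filename}:{lineno}")
    return list(found.values())
```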

3.4 CLI Interfaces

Find command-line entry points:
- Main CLI files
- Subcommands
- Arguments and options

Populate: interfaces.cli[]

Phase 4: Architecture Synthesis

Goal: Understand component structure and patterns.
Model Section: architecture

4.1 Component Identification

Identify logical components:

Type	Indicators
Service	Service classes, handlers
Module	Directory with index, __init__.py
Handler	Route handlers, controllers
Repository	Data access classes
Utility	Helper functions, utils

For each component:
- name, type
- location: Directory or file
- responsibilities[]: What it does
- dependencies[]: Other components it uses
- interfaces[]: APIs/events it exposes

Populate: architecture.components[]

4.2 Layer Analysis

Identify architectural layers:

Layer	Typical Names
Presentation	routes, controllers, handlers
Application	services, use-cases
Domain	models, entities, domain
Data	repositories, db, data
Infrastructure	infra, external, integrations

For each layer:
- name
- components[]: Components in this layer
- boundaries: How layer isolation is enforced

Populate: architecture.layers[]
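The layer table above translates directly into a directory-name heuristic. A sketch assuming components are identified by path; `classify_layer` and `group_by_layer` are hypothetical, and the first matching layer wins when a path contains names from several layers. Boundaries still need manual review.

```python
from pathlib import PurePosixPath

# Directory-name heuristics taken from the layer table
LAYER_NAMES = {
    "presentation": {"routes", "controllers", "handlers"},
    "application": {"services", "use-cases"},
    "domain": {"models", "entities", "domain"},
    "data": {"repositories", "db", "data"},
    "infrastructure": {"infra", "external", "integrations"},
}


def classify_layer(component_path):
    """Return the first layer whose typical name appears in the path, else None."""
    parts = {part.lower() for part in PurePosixPath(component_path).parts}
    for layer, names in LAYER_NAMES.items():
        if parts & names:
            return layer
    return None


def group_by_layer(component_paths):
    """Group component paths by layer; unmatched paths go to "unclassified"."""
    layers = {}
    for path in component_paths:
        layer = classify_layer(path) or "unclassified"
        layers.setdefault(layer, []).append(path)
    return layers
```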

4.3 Pattern Recognition

Identify common patterns:

Pattern	Indicators
MVC	Controllers, Models, Views
Repository	Repository classes with CRUD
Factory	create*, factory functions
Dependency Injection	Container, inject decorators
Event Sourcing	Event stores, apply methods

For each pattern:
- name
- where_used[]: Locations
- evidence[]: File references

Populate: architecture.patterns[]

Phase 5: Data Flow

Goal: Trace data through the system.
Model Section: data

5.1 Entity Discovery

Find data models:

# ORM models
class User(Model):
@Entity()
type User struct {}

# Schemas
UserSchema = z.object({...})

For each entity:
- name
- location: Model file
- fields[]: Name, type, constraints
- relationships[]: Foreign keys, associations
- storage: Table/collection name

Populate: data.entities[]

5.2 Data Flow Tracing

For key data paths, trace:

Entry: Where data enters (API, event, file)
Validation: How validated
Transformation: How modified
Storage: Where persisted
Output: Where returned/emitted

Populate: data.flows[]

5.3 Data Lifecycle

For important entities, document:

create: How/where created
read: Query patterns
update: Modification paths
delete: Deletion approach (soft/hard)
retention: Time kept

Populate: data.lifecycle[]

Phase 6: Dependency Health

Goal: Assess package health and security.
Model Section: dependencies

6.1 Package Inventory

List all dependencies from manifests:
- package.json (npm/yarn)
- requirements.txt, pyproject.toml (pip)
- go.mod (Go)
- Cargo.toml (Rust)

6.2 Version Analysis

For each package:
- current_version: Installed version
- latest_version: Latest available version
- update_type: major, minor, patch, up-to-date
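`update_type` can be derived by comparing version components. A minimal sketch assuming plain `MAJOR.MINOR.PATCH` strings: pre-release and build suffixes are dropped rather than ordered as Semantic Versioning specifies, and non-numeric components raise `ValueError`. For real ecosystems, a dedicated parser (such as `packaging.version` for Python) is safer. Both function names are hypothetical.

```python
def parse_version(version):
    """Parse 'MAJOR.MINOR.PATCH' (leading 'v' allowed; -pre and +build suffixes dropped)."""
    core = version.lstrip("v").split("-")[0].split("+")[0]
    parts = [int(p) for p in core.split(".")]
    while len(parts) < 3:
        parts.append(0)  # treat "1.2" as "1.2.0"
    return tuple(parts[:3])


def update_type(current, latest):
    """Classify the gap between installed and latest versions."""
    cur, new = parse_version(current), parse_version(latest)
    if new <= cur:
        return "up-to-date"
    if new[0] != cur[0]:
        return "major"
    if new[1] != cur[1]:
        return "minor"
    return "patch"
```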

6.3 Vulnerability Check

Check for known vulnerabilities:
- CVE references
- Security advisories
- Severity ratings

Populate: dependencies.packages[].vulnerabilities[]

6.4 Maintenance Status

Assess health:
- last_publish: When the package was last published
- deprecated: Whether it is officially deprecated
- license: License type and restrictions

6.5 Health Summary

Aggregate:
- Total packages
- Outdated count
- Vulnerable count
- Deprecated count
- Unmaintained count (no release in 2+ years)

Populate: dependencies.health_summary
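The aggregation in 6.5 can be sketched as counts over per-package records, assuming each record already carries `update_type` (6.2), `vulnerabilities` (6.3), and `deprecated` plus `last_publish` as a `date` (6.4). The 730-day threshold approximates "2+ years stale" and is an assumption; `health_summary` is a hypothetical name.

```python
from datetime import date, timedelta

# "2+ years stale", approximated as 730 days since the last publish
STALE_AFTER = timedelta(days=730)


def health_summary(packages, today: date):
    """Aggregate per-package results from 6.2-6.4 into dependencies.health_summary."""
    return {
        "total": len(packages),
        "outdated": sum(p["update_type"] != "up-to-date" for p in packages),
        "vulnerable": sum(bool(p.get("vulnerabilities")) for p in packages),
        "deprecated": sum(bool(p.get("deprecated")) for p in packages),
        "unmaintained": sum(today - p["last_publish"] >= STALE_AFTER for p in packages),
    }
```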

Phase 7: Error Handling

Goal: Understand error patterns and gaps.
Model Section: error_handling

7.1 Error Handling Patterns

Find error handling code:

// Try-catch
try { ... } catch (e) { ... }

// Error middleware
app.use((err, req, res, next) => ...)

// Error boundaries
class ErrorBoundary extends Component

For each pattern:
- type: try-catch, middleware, boundary, etc.
- location: File reference
- coverage: What it handles

Populate: error_handling.patterns[]

7.2 Error Propagation

Trace how errors flow:
- Where they originate
- How they are caught or wrapped
- What reaches clients

Populate: error_handling.propagation[]

7.3 Gap Identification

Find unhandled scenarios:
- Async operations without catch
- Missing error responses
- Silent failures

For each gap:
- location
- risk: high, medium, low
- description
- recommendation

Populate: error_handling.gaps[]

7.4 Logging Assessment

Document logging setup:
- framework: Winston, Bunyan, etc.
- levels[]: Which levels are used
- destinations[]: Console, file, service

Populate: error_handling.logging

Finalization

Quality Assessment

After all phases, assess:

documentation_coverage: Based on reconnaissance
test_coverage: Presence/quality of tests
type_safety: Static typing usage
code_organization: Structure quality

Populate: quality

Recommendations

Compile prioritized recommendations:

Immediate (critical/high): Security, stability issues
Improvements (medium/low): Architecture, maintainability

Populate: recommendations

Model Validation

Before passing to adapters:

meta.project_name populated
meta.analysis_date set
At least one phase completed
Evidence references are valid file paths
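These checks can be sketched as a function that returns a list of problems. Assumptions: `model` is the contents of `analysis_model`, only strings stored under `evidence` keys are checked (other `file:line` fields such as `handler` are not), and "at least one phase completed" is approximated by at least one non-empty phase section. All names are hypothetical.

```python
from pathlib import Path

PHASE_SECTIONS = ("reconnaissance", "technologies", "interfaces", "architecture",
                  "data", "dependencies", "error_handling")


def iter_evidence(node):
    """Yield every string stored under an "evidence" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "evidence":
                items = value if isinstance(value, list) else [value]
                for item in items:
                    if isinstance(item, str):
                        yield item
            else:
                yield from iter_evidence(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_evidence(item)


def evidence_path(ref):
    """Strip a trailing ":line" from "path:line" references."""
    path, sep, line = ref.rpartition(":")
    return path if sep and line.isdigit() else ref


def validate_model(model, repo_root):
    """Return a list of problems; an empty list means the model is ready for adapters."""
    problems = []
    meta = model.get("meta", {})
    if not meta.get("project_name"):
        problems.append("meta.project_name is empty")
    if not meta.get("analysis_date"):
        problems.append("meta.analysis_date is not set")
    if not any(model.get(section) for section in PHASE_SECTIONS):
        problems.append("no phase section is populated")
    for ref in iter_evidence(model):
        if not (Path(repo_root) / evidence_path(ref)).is_file():
            problems.append(f"evidence path not found: {ref}")
    return problems
```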

Verification Pattern

Throughout analysis, follow:

DISCOVER → CAPTURE → EVIDENCE → VERIFY

Discover: Find information in code
Capture: Add to analysis model
Evidence: Include file:line reference
Verify: Cross-check with documentation

This keeps each entry in the model traceable to a source location and checked against existing documentation.

Partial Analysis

For targeted analysis, run specific phases:

Request	Phases
"Check tech stack"	Phase 2 only
"Map APIs"	Phase 3 only
"Dependency health"	Phase 6 only

Adapters handle partial models gracefully.