Analysis Model Specification¶
Internal data model produced by codebase analysis. Output adapters consume this model to generate various formats.
Overview¶
The analysis model is the structured output of codebase analysis. It captures everything discovered about a codebase in a format-agnostic way that can be transformed into:
- Architecture documentation
- Coding context (AGENTS.md, CONTEXT.md)
- Product specifications
- C4 models (Structurizr DSL)
- Enterprise models (ArchiMate)
Model Structure¶
analysis_model:
# Metadata
meta:
project_name: string
description: string
repository: string
commit: string
analysis_date: string
analyzer: string
# Phase 1: Reconnaissance
reconnaissance:
documentation:
- location: string # File path
type: string # readme, api-doc, adr, inline, etc.
coverage: string # high, medium, low
last_updated: string # Date or "unknown"
accuracy: string # verified, unverified, outdated
# Phase 2: Technology Discovery
technologies:
languages:
- name: string
version: string
evidence: string[] # File paths where detected
primary: boolean # Is this the primary language?
frameworks:
- name: string
version: string
purpose: string # web, api, testing, etc.
evidence: string[]
libraries:
- name: string
version: string
category: string # utility, database, auth, etc.
evidence: string[]
infrastructure:
- name: string
type: string # database, cache, queue, storage, etc.
purpose: string
evidence: string[]
build_tools:
- name: string
config_file: string
scripts: object # Key build/test/deploy scripts
# Phase 3: Interface Discovery
interfaces:
apis:
- path: string
method: string # GET, POST, etc.
handler: string # File:line
auth: string # none, jwt, session, api_key, etc.
request_schema: object
response_schema: object
description: string
events:
- name: string
type: string # published, consumed, both
schema: object
publishers: string[] # Components that publish
consumers: string[] # Components that consume
evidence: string[]
integrations:
- name: string
direction: string # inbound, outbound, bidirectional
type: string # rest, graphql, grpc, webhook, etc.
auth: string
base_url: string
evidence: string[]
cli:
- command: string
description: string
handler: string # File:line
# Phase 4: Architecture Discovery
architecture:
components:
- name: string
type: string # service, module, library, handler, etc.
location: string # Directory or file
responsibilities: string[]
dependencies: string[] # Other component names
interfaces: string[] # API paths or event names exposed
layers:
- name: string # presentation, api, domain, data, etc.
components: string[] # Component names in this layer
boundaries: string # How layer boundaries are enforced
patterns:
- name: string # mvc, repository, factory, etc.
where_used: string[] # Locations where pattern is applied
evidence: string[]
structure:
entry_points: string[] # Main files, index files
config_locations: string[]
test_locations: string[]
# Phase 5: Data Discovery
data:
entities:
- name: string
location: string # Model/schema file
fields:
- name: string
type: string
constraints: string[]
relationships:
- target: string # Other entity name
type: string # one-to-one, one-to-many, many-to-many
foreign_key: string
storage: string # Table name, collection, etc.
flows:
- name: string
source: string # Entry point
transformations: string[] # Processing steps
destination: string # Where data ends up
data_types: string[] # Entity names involved
lifecycle:
- entity: string
create: string # Where/how created
read: string # Where/how read
update: string # Where/how updated
delete: string # Where/how deleted
retention: string # Policy if known
# Phase 6: Dependency Health
dependencies:
packages:
- name: string
current_version: string
latest_version: string
update_type: string # major, minor, patch, up-to-date
last_publish: string # Date
vulnerabilities:
- id: string
severity: string
description: string
license: string
deprecated: boolean
health_summary:
total: number
outdated: number
vulnerable: number
deprecated: number
unmaintained: number # No updates in 2+ years
# Phase 7: Error Handling
error_handling:
patterns:
- type: string # try-catch, error-boundary, middleware, etc.
location: string
coverage: string # Scope of what it handles
propagation:
- source: string # Where errors originate
handlers: string[] # Where they're caught
recovery: string # How recovered/handled
user_facing: boolean # Does it reach users?
gaps:
- location: string
risk: string # high, medium, low
description: string
recommendation: string
logging:
framework: string
levels: string[] # error, warn, info, debug
destinations: string[] # console, file, service
# Quality Indicators
quality:
documentation_coverage: string # high, medium, low
test_coverage: string # high, medium, low, none
type_safety: string # strong, partial, none
code_organization: string # excellent, good, fair, poor
# Recommendations
recommendations:
immediate:
- priority: string # critical, high, medium
category: string # security, architecture, maintainability
finding: string
recommendation: string
location: string
improvements:
- category: string
finding: string
recommendation: string
Model Sections by Analysis Phase¶
| Phase | Model Section | Description |
|---|---|---|
| 1: Reconnaissance | reconnaissance |
Documentation inventory |
| 2: Technology Stack | technologies |
Languages, frameworks, libraries, infra |
| 3: Interface Mapping | interfaces |
APIs, events, integrations, CLI |
| 4: Architecture Synthesis | architecture |
Components, layers, patterns |
| 5: Data Flow | data |
Entities, flows, lifecycle |
| 6: Dependency Health | dependencies |
Package health, vulnerabilities |
| 7: Error Handling | error_handling |
Error patterns, gaps |
| - | quality, recommendations |
Cross-cutting findings |
Using the Model¶
For Adapters¶
Each output adapter reads relevant sections of the model:
| Adapter | Primary Sections Used |
|---|---|
| architecture-docs | All sections |
| coding-context | technologies, architecture, interfaces, quality |
| product-spec | interfaces, data, architecture.components |
| structurizr | architecture, interfaces, technologies.infrastructure |
| archimate | All sections (maps to ArchiMate layers) |
Model Population¶
The analysis workflow populates the model incrementally:
Phase 1 → reconnaissance populated
Phase 2 → technologies populated
Phase 3 → interfaces populated
Phase 4 → architecture populated
Phase 5 → data populated
Phase 6 → dependencies populated
Phase 7 → error_handling populated
Final → quality, recommendations populated
Partial Analysis¶
Adapters should handle partially populated models gracefully: - Check if sections exist before accessing - Provide sensible defaults or "Not analyzed" markers - Document which phases are required for each adapter
Evidence Tracking¶
Most model elements include evidence fields pointing to source files. This enables:
- Traceability - Link findings back to code
- Verification - Human can check accuracy
- Updates - Know what to re-analyze when code changes
Evidence format: file/path:line or file/path for general references.
Extension Points¶
The model can be extended for specific needs:
# Custom extensions under 'extensions' key
extensions:
security:
# Security-specific analysis data
auth_mechanisms: [...]
sensitive_data: [...]
performance:
# Performance-specific data
hotspots: [...]
caching: [...]
Output adapters can define their own extension schemas.
Example: Minimal Model¶
analysis_model:
meta:
project_name: "acme-api"
description: "REST API for Acme Corp"
repository: "github.com/acme/api"
commit: "abc123"
analysis_date: "2024-01-15"
technologies:
languages:
- name: "TypeScript"
version: "5.0"
primary: true
evidence: ["package.json", "tsconfig.json"]
frameworks:
- name: "Express"
version: "4.18"
purpose: "web"
evidence: ["package.json", "src/app.ts"]
interfaces:
apis:
- path: "/api/users"
method: "GET"
handler: "src/routes/users.ts:15"
auth: "jwt"
architecture:
components:
- name: "UserController"
type: "handler"
location: "src/controllers/user.ts"
responsibilities: ["Handle user CRUD operations"]
dependencies: ["UserService", "AuthMiddleware"]
Validation¶
Before passing to adapters, validate:
- Required fields -
meta.project_name,meta.analysis_date - Reference integrity - Component dependencies reference existing components
- Evidence exists - File paths in evidence are valid
Adapters should validate their required sections on input.