Architecture & Pipeline Overview
The Cervical Cytology AI project is an end-to-end framework. Rather than a toy application, it simulates a production medical AI pipeline: it ingests raw whole-slide-equivalent images (WSI), segments individual cells, classifies each cell, aggregates the predictions into a slide-level risk, and reports the outcome.
The 4-Stage Pipeline
1. Segmentation (Cellpose)
Extracts individual cells from original microscope images.
- Tool: Cellpose (cyto2 model).
- Function: Robust to overlapping cells, cytoplasm artifacts, and various staining techniques.
- Output: Segmentation masks and individual bounding-box cell crops, ready for classification.
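The step from a segmentation mask to per-cell crops can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the mask is a 2-D integer label image in which 0 is background and each cell carries its own positive label (the convention Cellpose masks follow, to the best of our understanding). The names `label_bounding_boxes`, `crop_box`, and the `pad` parameter are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (row_min, col_min, row_max_exclusive, col_max_exclusive)
BBox = Tuple[int, int, int, int]


def label_bounding_boxes(mask: Sequence[Sequence[int]], pad: int = 0) -> Dict[int, BBox]:
    """Return one bounding box per non-zero label in a 2-D label mask.

    Hypothetical helper: a single pass over the mask tracks the min/max
    row and column seen for each label.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    boxes: Dict[int, List[int]] = {}
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            label = int(value)
            if label == 0:  # 0 is assumed to mean background
                continue
            box = boxes.get(label)
            if box is None:
                boxes[label] = [r, c, r + 1, c + 1]
            else:
                box[0] = min(box[0], r)
                box[1] = min(box[1], c)
                box[2] = max(box[2], r + 1)
                box[3] = max(box[3], c + 1)
    # Optional padding around each cell, clamped to the image bounds.
    return {
        label: (max(r0 - pad, 0), max(c0 - pad, 0),
                min(r1 + pad, height), min(c1 + pad, width))
        for label, (r0, c0, r1, c1) in boxes.items()
    }


def crop_box(image: Sequence[Sequence[int]], box: BBox) -> List[List[int]]:
    """Cut the rectangular region described by `box` out of a 2-D image."""
    r0, c0, r1, c1 = box
    return [list(row[c0:c1]) for row in image[r0:r1]]


if __name__ == "__main__":
    toy_mask = [
        [0, 1, 1, 0],
        [0, 1, 0, 2],
        [0, 0, 0, 2],
    ]
    print(label_bounding_boxes(toy_mask))  # {1: (0, 1, 2, 3), 2: (1, 3, 3, 4)}
```

A real pipeline would more likely slice NumPy arrays directly; the pure-Python version above only illustrates the bookkeeping.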
2. Classification (MobileViT)
A lightweight Vision Transformer (MobileViT) classifies each segmented cell into one of 5 categories:
- Superficial-Intermediate (Normal)
- Parabasal (Normal)
- Metaplastic (Benign/Reactive)
- Koilocytotic (Low Grade Lesion)
- Dyskeratotic (High Grade Lesion)
Why MobileViT-S?
We use MobileViT-S (~5.6M params) for efficient Edge AI deployment, allowing near real-time processing and the possibility of running on constrained hospital hardware. Read more in our Research Perspective.
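To make the five-class output concrete, here is a small sketch of how a classifier's raw scores (logits) for one cell could be decoded into a category and its clinical grouping. The index order in `CLASS_NAMES` is an assumption chosen for illustration (alphabetical); the real order depends on how the training labels were encoded. The function names are hypothetical and the model itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

# Assumed index order (alphabetical); the project's actual encoding may differ.
CLASS_NAMES = [
    "Dyskeratotic",
    "Koilocytotic",
    "Metaplastic",
    "Parabasal",
    "Superficial-Intermediate",
]

# Clinical grouping of each category, taken from the list above.
CLASS_GROUPS = {
    "Superficial-Intermediate": "Normal",
    "Parabasal": "Normal",
    "Metaplastic": "Benign/Reactive",
    "Koilocytotic": "Low Grade Lesion",
    "Dyskeratotic": "High Grade Lesion",
}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits: Sequence[float]) -> Tuple[str, str, float]:
    """Map one cell's logits to (category, clinical group, probability)."""
    if len(logits) != len(CLASS_NAMES):
        raise ValueError(f"expected {len(CLASS_NAMES)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    name = CLASS_NAMES[best]
    return name, CLASS_GROUPS[name], probs[best]


if __name__ == "__main__":
    name, group, prob = decode_prediction([0.1, 3.0, 0.2, -1.0, 0.5])
    print(f"{name} ({group}), p={prob:.2f}")
```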
3. Aggregation Logic
The ClinicalAggregator (found in src/aggregate_adaptive.py) analyzes the distribution of cell types across a slide.
- Role: Rather than reporting on single cells, it groups all cell predictions for the slide (or Field of View).
- Logic: It applies thresholds to determine if a slide is "Negative for Intraepithelial Lesion or Malignancy" (NILM), "Low-Grade Squamous Intraepithelial Lesion" (LSIL), or "High-Grade Squamous Intraepithelial Lesion" (HSIL).
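The threshold idea can be illustrated with a deliberately simple sketch. It is not the ClinicalAggregator implementation: the function name `aggregate_slide`, the rule that the most severe finding wins, and both 5% thresholds are assumptions made purely for illustration. The actual thresholds and logic live in src/aggregate_adaptive.py.

```python
from collections import Counter
from typing import Iterable

# Hypothetical thresholds: minimum fraction of cells of a given type
# needed to flag the slide. The project's real values are not shown here.
HSIL_THRESHOLD = 0.05
LSIL_THRESHOLD = 0.05


def aggregate_slide(
    predictions: Iterable[str],
    hsil_threshold: float = HSIL_THRESHOLD,
    lsil_threshold: float = LSIL_THRESHOLD,
) -> str:
    """Turn per-cell class names into a slide-level label (NILM, LSIL, HSIL)."""
    counts = Counter(predictions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot aggregate a slide with no cell predictions")

    dyskeratotic_fraction = counts["Dyskeratotic"] / total
    koilocytotic_fraction = counts["Koilocytotic"] / total

    # Check the most severe category first so it takes precedence.
    if dyskeratotic_fraction >= hsil_threshold:
        return "HSIL"
    if koilocytotic_fraction >= lsil_threshold:
        return "LSIL"
    return "NILM"


if __name__ == "__main__":
    cells = ["Superficial-Intermediate"] * 90 + ["Koilocytotic"] * 10
    print(aggregate_slide(cells))  # LSIL
```

Working on fractions rather than raw counts keeps the decision independent of how many cells were segmented, and a small non-zero threshold tolerates the occasional misclassified cell.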
4. Clinical Reporting
Translates JSON findings into structured PDF reports.
- Modules: src/report_gen.py
- Output Types:
  - JSON Reports: Machine-readable statistics of the AI's findings.
  - PDF Reports: A fully formatted, multi-section clinical document including Summary, Specimen Adequacy, Quantitative Analysis (color-coded by severity), and mandatory Disclaimers suitable for simulated clinical workflows.
Check the /data/reports/ directories (e.g. data/reports/json and data/reports/pdf/) for sample outputs demonstrating this reporting capability.
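As a rough idea of what a machine-readable report might contain, the sketch below serializes per-class counts and fractions alongside the slide-level label. The schema (field names such as `slide_id` and `cell_fractions`), the disclaimer wording, and the function `build_json_report` are all hypothetical; the sample files in the reports directory show the format the project actually produces.

```python
import json
from collections import Counter
from typing import Iterable


def build_json_report(slide_id: str, predictions: Iterable[str], diagnosis: str) -> str:
    """Serialize slide-level findings to a JSON string (hypothetical schema)."""
    counts = Counter(predictions)
    total = sum(counts.values())
    ordered = sorted(counts.items())  # stable, alphabetical key order
    report = {
        "slide_id": slide_id,
        "diagnosis": diagnosis,
        "total_cells": total,
        "cell_counts": dict(ordered),
        "cell_fractions": {name: round(n / total, 4) for name, n in ordered},
        # Placeholder wording; the project's mandatory disclaimers are not reproduced here.
        "disclaimer": "Simulated workflow output; not a clinical diagnosis.",
    }
    return json.dumps(report, indent=2)


if __name__ == "__main__":
    cells = ["Parabasal", "Parabasal", "Koilocytotic", "Metaplastic"]
    print(build_json_report("slide-001", cells, "LSIL"))
```

Keeping the JSON as the single source of findings means a PDF renderer only has to format values, not recompute them.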
Folder Organization
├── data/ # Datasets, reports, intermediate processing data
├── docs/ # Markdown documentation and GitHub Pages files
├── outputs/ # Output from analysis
├── notebooks/ # Training notebooks and experiments
├── sandbox/ # Test lab
├── scripts/ # Executable helper scripts and CLI testing tools
├── src/ # Core AI Pipeline
│ ├── config.py # Global Configuration and Data Contracts
│ ├── aggregate_adaptive.py # Aggregation engine
│ ├── report_gen.py # PDF Generation Engine
│ └── ... # Segmentation and inference
└── weights/ # Trained PyTorch state dicts (.pth)