Credits & Positioning
Attribution to the leaderboard solutions and the public-notebook lineage
This work builds on the ISIC-2024 community. Reused components and original contributions are stated explicitly, with attribution.
The leaderboard champions
This study does not beat the unconstrained private leaderboard, and does not attempt to: the two ingredients the top teams relied on (external data, synthetic positives) are banned here. Their results are the benchmark the constrained claim is positioned against.
- 1st (Ilya Novoselskiy): private pAUC 0.17264 (post-competition reports cite ~0.1755 for the best unconstrained configuration). An ensemble of EVA-02 + EdgeNeXt image models fused with a GBDT tabular stack. The solution used external ISIC-archive dermoscopy data and ~30,000 Stable-Diffusion-synthesized malignant lesions, both banned here. Their own ablation reports that the ~30k synthetic lesions added only +0.0007 pAUC, which puts a quantitative bound on the value of synthetic augmentation for a 393-positive task.
- 2nd (uchiyama33): image + tabular ensemble; the source of the EMA + mixup recipe adopted for the small backbone.
- 3rd (kyohei-123): image/tabular blend.
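The two operations behind the EMA + mixup recipe credited above can be sketched in plain Python. This is only an illustration of the general technique, not the 2nd-place team's code or this project's training loop; the function names, the decay of 0.999, and the alpha of 0.4 are assumptions chosen for the example.

```python
import random

def ema_update(ema_weights, weights, decay=0.999):
    """Return the exponential moving average of model weights after one step."""
    # decay is an assumed value; a decay too close to 1 makes the average lag behind.
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]

def mixup(x_a, y_a, x_b, y_b, alpha=0.4, rng=random):
    """Blend two examples and their labels with a Beta(alpha, alpha) coefficient."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = lam * y_a + (1.0 - lam) * y_b
    return x, y, lam

# Toy usage: two "weights" and two flattened "images" with binary labels.
ema = ema_update([1.0, 1.0], [0.0, 2.0], decay=0.9)
x_mix, y_mix, lam = mixup([0.0, 0.0], 0.0, [1.0, 1.0], 1.0, rng=random.Random(0))
```

The mixed label is a soft target in [0, 1], so the loss has to accept non-integer labels when this is used in training.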
Leaderboard comparison
| Solution | External data? | Synthetic data? | pAUC |
|---|---|---|---|
| 1st — Ilya Novoselskiy (EVA-02 + EdgeNeXt + GBDT) | Yes | Yes (~30k) | 0.17264 |
| 2nd — uchiyama33 (image + tabular ensemble) | Yes | Yes | — |
| 3rd — kyohei-123 (image + tabular blend) | Yes | Yes | — |
| Ours (single-dataset, no external, no synthetic) | No | No | CV 0.17376 |
Champions used external ISIC-archive dermoscopy and synthetic positives, both banned here; their own ablation reports the ~30k synthetic lesions added only +0.0007 pAUC. Ours is an out-of-fold CV number, not private LB.
The public-notebook lineage
The tabular feature engineering and overall recipe descend from public Kaggle notebooks, reused with attribution:
| Source | Reused / learned |
|---|---|
| greysky — “only tabular” | LightGBM hyperparameters and the tabular-first thesis; the lgb_bag params are greysky-lineage, and the bagging + undersampling structure follows this work. |
| snnclsr | Tabular feature-engineering patterns and CV structure. |
| motono0223 | Image-baseline training recipe (small backbones, undersampling). |
| andreasbis — “CNN preds as features” | The stacking idea: feed an image model’s OOF probability into the GBDT as a feature. |
| richolson | Additional baseline and ensembling practice. |
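The stacking idea in the table (an image model's OOF probability fed to the GBDT as a feature) depends on each row being scored by a model that never trained on it. Below is a minimal sketch of that mechanic, assuming folds are grouped by patient; the helper names and the trivial stand-in "model" are hypothetical and are not taken from the referenced notebooks or from this repository.

```python
def grouped_folds(groups, n_folds):
    """Assign every group (e.g. a patient id) to exactly one fold."""
    unique = sorted(set(groups))
    fold_of_group = {g: i % n_folds for i, g in enumerate(unique)}
    return [fold_of_group[g] for g in groups]

def oof_predictions(rows, labels, groups, fit, predict, n_folds=3):
    """Predict each row with a model trained only on the other folds."""
    folds = grouped_folds(groups, n_folds)
    oof = [None] * len(rows)
    for k in range(n_folds):
        train_idx = [i for i, f in enumerate(folds) if f != k]
        valid_idx = [i for i, f in enumerate(folds) if f == k]
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        for i in valid_idx:
            oof[i] = predict(model, rows[i])
    return oof

# Hypothetical stand-in for the image model: predicts the training positive rate.
def fit_mean(rows, labels):
    return sum(labels) / len(labels)

def predict_mean(model, row):
    return model

rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
labels = [0, 0, 1, 0, 1, 0]
patients = ["p1", "p1", "p2", "p2", "p3", "p3"]

oof = oof_predictions(rows, labels, patients, fit_mean, predict_mean)
# Append the OOF score as one extra tabular column for the downstream GBDT.
stacked_rows = [row + [p] for row, p in zip(rows, oof)]
```

Grouping by patient keeps all lesions from one person in the same fold, which is one common way to avoid leakage; the folds used in this project may be built differently.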
The official ISIC-2024 scorer (src/metric_official.py, © 2024 N. R. Kurtansky, MSKCC) is vendored verbatim and attributed, used only to verify the metric implementation.
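For orientation, the pAUC figures quoted in this section are partial areas under the ROC curve restricted to the region above 80% TPR, so a perfect ranking scores 0.2 and an uninformative one 0.02. The following is a minimal pure-Python sketch under that assumption; it is an independent illustration with a hypothetical function name, not the vendored src/metric_official.py.

```python
def pauc_above_tpr(labels, scores, min_tpr=0.80):
    """Area between the ROC curve and the line TPR = min_tpr, over FPR in [0, 1]."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")

    # Walk thresholds from high to low score; tied scores move as one step.
    order = sorted(range(len(labels)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            tp += labels[order[j]]
            fp += 1 - labels[order[j]]
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j

    area = 0.0
    for (f0, t0), (f1, t1) in zip(points, points[1:]):
        if t1 <= min_tpr or f1 == f0:
            continue  # segment lies on or below the cut-off, or is vertical
        if t0 < min_tpr:
            # Start the segment where it crosses TPR = min_tpr.
            f0 = f0 + (f1 - f0) * (min_tpr - t0) / (t1 - t0)
            t0 = min_tpr
        area += (f1 - f0) * ((t0 - min_tpr) + (t1 - min_tpr)) / 2.0
    return area

perfect = pauc_above_tpr([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])   # about 0.2
chance = pauc_above_tpr([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])    # about 0.02
```

On this scale the gap between 0.17264 and 0.17376 is small relative to the 0.2 ceiling, which is worth keeping in mind when comparing a CV number with a leaderboard number.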
Reused vs. original
Reused (with attribution):
- The tabular-first thesis and LightGBM hyperparameter lineage (greysky).
- The CNN-preds-as-a-feature stacking idea (andreasbis).
- The EMA + mixup small-backbone recipe (2nd/4th-place public methodology).
- The official pAUC scorer (Kurtansky / MSKCC), vendored to verify the implementation.
Original:
- The efficiency frontier as the deliverable. The task is reframed from one private-LB number to a quality-vs-cost Pareto frontier across params, FLOPs, and CPU latency; a model counts only if it earns its cost, and every reported model is one measured point.
- An audited no-external / no-synthetic discipline. A locked scope (docs/DECISIONS.md), a cv-guardian with veto power, and a hard no-leak guarantee make the constraint auditable rather than asserted.
- The image-space ugly-duckling. The tabular ugly-duckling idea extended into the image expert (a within-patient image-deviation signal), tested and reported despite not improving the score.
- A complete negative-results catalogue. Meta-stacker, learned MoE gate, image embeddings, heavy transformers, over-strong EMA — each logged with its number. The finding that complexity consistently reduces the score at 393 positives is itself a contribution.
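The within-patient deviation behind the ugly-duckling idea can be illustrated as a per-patient z-score: each lesion is compared with the same patient's other lesions rather than with the whole dataset. This is a sketch under assumptions, using a single scalar per lesion and a hypothetical function name; the image-space variant described above works on image-model outputs and may be computed differently.

```python
from collections import defaultdict
from statistics import mean, pstdev

def within_patient_zscore(patient_ids, values, eps=1e-8):
    """Score each lesion by how far it sits from its own patient's mean."""
    by_patient = defaultdict(list)
    for pid, v in zip(patient_ids, values):
        by_patient[pid].append(v)
    stats = {pid: (mean(vs), pstdev(vs)) for pid, vs in by_patient.items()}
    # eps keeps the division defined when a patient's lesions are all identical.
    return [(v - stats[pid][0]) / (stats[pid][1] + eps)
            for pid, v in zip(patient_ids, values)]

# Toy data: patient "a" has one lesion that stands out, patient "b" has none.
patients = ["a", "a", "a", "a", "b", "b"]
scores = [0.1, 0.1, 0.1, 0.9, 0.5, 0.5]
z = within_patient_zscore(patients, scores)
```

In the toy data the fourth lesion gets the largest z-score because it differs from its patient's other lesions, while both lesions of patient "b" score zero.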
Positioning
- SOTA within the no-external-data / no-synthetic / single-dataset class, on a transparent efficiency frontier with leak-audited CV.
- Not a claim to beat the champions’ unconstrained ~0.1755: that score relied on the banned ingredients, so matching it is ruled out here by construction.
- The best constrained point is OOF pAUC 0.17376 at 15.8 M params / 2.46 GFLOPs / 61 ms CPU; the projected private-test score is ~0.01–0.02 lower, i.e. roughly 0.154–0.164 (see Results).
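The notion of an efficiency frontier used here can be made concrete with a small dominance filter: a model stays on the frontier only if no other model is at least as good on quality and on every cost axis. The sketch below is an assumption about how such a filter could look, with hypothetical function and field names; only the first data point comes from this section, and the two placeholders are invented purely for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b on every axis and strictly better on one.

    Quality (pauc) is maximised; params, gflops and latency are minimised.
    """
    costs = ("params_m", "gflops", "cpu_ms")
    no_worse = a["pauc"] >= b["pauc"] and all(a[c] <= b[c] for c in costs)
    better = a["pauc"] > b["pauc"] or any(a[c] < b[c] for c in costs)
    return no_worse and better

def pareto_frontier(models):
    """Keep only the models that no other model dominates."""
    return [m for m in models
            if not any(dominates(o, m) for o in models if o is not m)]

models = [
    # The one measured point quoted in this section.
    {"name": "best-constrained", "pauc": 0.17376,
     "params_m": 15.8, "gflops": 2.46, "cpu_ms": 61},
    # Placeholders below are invented for illustration only, not measured results.
    {"name": "placeholder-tiny", "pauc": 0.1600,
     "params_m": 2.0, "gflops": 0.30, "cpu_ms": 9},
    {"name": "placeholder-heavy", "pauc": 0.1700,
     "params_m": 80.0, "gflops": 15.0, "cpu_ms": 400},
]
frontier = [m["name"] for m in pareto_frontier(models)]
```

In this toy set the heavy placeholder drops out because it costs more on every axis and scores lower, which is the sense in which a model has to earn its cost.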