Credits & Positioning

Attribution to the leaderboard solutions and the public-notebook lineage

This work builds on the ISIC-2024 community. Reused components and original contributions are stated explicitly, with attribution.

The leaderboard champions

This study does not beat the unconstrained private leaderboard, and does not attempt to: the two ingredients the top teams relied on (external data, synthetic positives) are banned here. Their results are the benchmark the constrained claim is positioned against.

Note: 1st place, Ilya Novoselskiy

Private pAUC 0.17264 (post-competition reports cite ~0.1755 for the best unconstrained configuration). The solution is an ensemble of EVA-02 and EdgeNeXt image models fused with a GBDT tabular stack. It used external ISIC-archive dermoscopy data and ~30,000 Stable-Diffusion-synthesized malignant lesions, both banned here. The author's own ablation reports that the ~30k synthetic lesions added only +0.0007 pAUC, a quantitative indication of how little synthetic augmentation contributed on a 393-positive task.

Note: 2nd & 3rd place
  • 2nd — uchiyama33 — image + tabular ensemble; the source of the EMA + mixup recipe adopted for the small backbone.
  • 3rd — kyohei-123 — image/tabular blend.

Leaderboard comparison

Table 3. Honest comparison to top Kaggle solutions (champion private-LB pAUC; ours is leak-audited cross-validation).
Solution | External data? | Synthetic data? | pAUC
1st: Ilya Novoselskiy (EVA-02 + EdgeNeXt + GBDT) | Yes | Yes (~30k) | 0.17264
2nd: uchiyama33 (image + tabular ensemble) | Yes | Yes | (not listed)
3rd: kyohei-123 (image + tabular blend) | Yes | Yes | (not listed)
Ours (single-dataset, no external, no synthetic) | No | No | CV 0.17376

Champions used external ISIC-archive dermoscopy and synthetic positives, both banned here; their own ablation reports the ~30k synthetic lesions added only +0.0007 pAUC. Ours is an out-of-fold CV number, not private LB.

The public-notebook lineage

The tabular feature engineering and overall recipe descend from public Kaggle notebooks, reused with attribution:

Source | Reused / learned
greysky (“only tabular”) | LightGBM hyperparameters and the tabular-first thesis; the lgb_bag params are greysky-lineage, and the bagging + undersampling structure follows this work.
snnclsr | Tabular feature-engineering patterns and CV structure.
motono0223 | Image-baseline training recipe (small backbones, undersampling).
andreasbis (“CNN preds as features”) | The stacking idea: feed an image model’s OOF probability into the GBDT as a feature.
richolson | Additional baseline and ensembling practice.
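
The stacking idea credited to andreasbis can be sketched as follows. This is a minimal illustration, not the project's pipeline: grouping folds by patient id, the helper names (`grouped_folds`, `oof_predictions`, `append_oof_feature`), and the toy base-rate "image model" are all assumptions made for the example. What it shows is the ordering that keeps the feature leak-free: each row's image probability comes from a model that never saw that row's group, and only then is it appended to the tabular features.

```python
from typing import Callable, Hashable, List, Sequence


def grouped_folds(groups: Sequence[Hashable], n_folds: int) -> List[int]:
    """Assign every distinct group (assumed here: a patient id) to one fold, round-robin."""
    fold_of_group = {}
    for g in groups:
        if g not in fold_of_group:
            fold_of_group[g] = len(fold_of_group) % n_folds
    return [fold_of_group[g] for g in groups]


def oof_predictions(X, y, groups, fit: Callable, predict: Callable,
                    n_folds: int = 5) -> List[float]:
    """Return one out-of-fold prediction per row.

    `fit(X_train, y_train)` and `predict(model, X_valid)` are hypothetical
    callables standing in for the image model's training and inference.
    """
    folds = grouped_folds(groups, n_folds)
    oof = [0.0] * len(X)
    for k in range(n_folds):
        train_idx = [i for i, f in enumerate(folds) if f != k]
        valid_idx = [i for i, f in enumerate(folds) if f == k]
        if not valid_idx:
            continue  # fewer groups than folds
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in valid_idx])
        for i, p in zip(valid_idx, preds):
            oof[i] = p
    return oof


def append_oof_feature(tabular_rows, oof):
    """Stacking step: the OOF image probability becomes one extra tabular column."""
    return [list(row) + [p] for row, p in zip(tabular_rows, oof)]


# Toy stand-in for an image model: it only predicts the training base rate.
def fit_base_rate(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_base_rate(model, X_valid):
    return [model] * len(X_valid)


# Made-up toy data, for illustration only.
images = ["img0", "img1", "img2", "img3", "img4", "img5"]
labels = [0, 1, 0, 0, 0, 1]
patients = ["p1", "p1", "p2", "p2", "p3", "p3"]
tabular = [[0.1, 3.0], [0.4, 2.0], [0.2, 5.0], [0.3, 1.0], [0.9, 4.0], [0.5, 2.5]]

oof = oof_predictions(images, labels, patients,
                      fit_base_rate, predict_base_rate, n_folds=3)
stacked = append_oof_feature(tabular, oof)  # rows now have 3 columns
```

The rows in `stacked` are what a GBDT would be trained on. Reusing the same fold assignment for the GBDT's own cross-validation is one way to keep the two levels consistent, though the source does not say how the project handles this.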

The official ISIC-2024 scorer (src/metric_official.py, © 2024 N. R. Kurtansky, MSKCC) is vendored verbatim and attributed, used only to verify the metric implementation.
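
For readers unfamiliar with the metric, the ISIC-2024 score is the partial area under the ROC curve above 80% true-positive rate, so its ceiling is 0.2. The sketch below is an independent, from-scratch computation of that quantity in plain Python. It is not the vendored `src/metric_official.py`, and the function names are hypothetical; it is the kind of second implementation one could compare against the official scorer on the same predictions.

```python
def roc_points(labels, scores):
    """ROC curve as (fpr, tpr) points, one point per distinct score value."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Rows with tied scores move the curve in a single diagonal step.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def pauc_above_tpr(labels, scores, min_tpr=0.80):
    """Area between the ROC curve and the line tpr = min_tpr (trapezoid rule)."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y1 <= min_tpr:
            continue  # segment lies entirely at or below the threshold
        if y0 < min_tpr:
            # Segment crosses the threshold: keep only the part above it.
            t = (min_tpr - y0) / (y1 - y0)
            x0 = x0 + t * (x1 - x0)
            y0 = min_tpr
        area += (x1 - x0) * ((y0 + y1) / 2 - min_tpr)
    return area


perfect = pauc_above_tpr([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
inverted = pauc_above_tpr([1, 1, 0, 0], [0.1, 0.2, 0.8, 0.9])
```

A perfect ranking reaches the 0.2 ceiling (up to floating-point rounding) and a fully inverted ranking scores 0. Against that 0 to 0.2 range, the differences quoted in this section, such as +0.0007, are small.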

Reused vs. original

Tip: Reused (with attribution)
  • The tabular-first thesis and LightGBM hyperparameter lineage (greysky).
  • The CNN-preds-as-a-feature stacking idea (andreasbis).
  • The EMA + mixup small-backbone recipe (2nd/4th-place public methodology).
  • The official pAUC scorer (Kurtansky / MSKCC), vendored to verify the implementation.
Important: Original contributions
  1. The efficiency frontier as the deliverable. The task is reframed from one private-LB number to a quality-vs-cost Pareto frontier across params, FLOPs, and CPU latency; a model counts only if it earns its cost, and every reported model is one measured point.
  2. An audited no-external / no-synthetic discipline. A locked scope (docs/DECISIONS.md), a cv-guardian with veto power, and a hard no-leak guarantee make the constraint auditable rather than asserted.
  3. The image-space ugly-duckling. The tabular ugly-duckling idea extended into the image expert (a within-patient image-deviation signal), tested and reported despite not improving the score.
  4. A complete negative-results catalogue. Meta-stacker, learned MoE gate, image embeddings, heavy transformers, over-strong EMA — each logged with its number. The finding that complexity consistently reduces the score at 393 positives is itself a contribution.
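
The "earns its cost" rule in contribution 1 is a Pareto-dominance test. A minimal sketch of that test follows, assuming higher score is better and lower params, FLOPs, and latency are better. The `Point` type, the function names, and the three candidate models with their numbers are made up for illustration and are not measurements from this study.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    name: str
    score: float       # quality, higher is better
    params_m: float    # millions of parameters, lower is better
    gflops: float      # lower is better
    latency_ms: float  # CPU latency, lower is better


def dominates(a: Point, b: Point) -> bool:
    """True if `a` is no worse than `b` on every axis and strictly better on one."""
    a_costs = (a.params_m, a.gflops, a.latency_ms)
    b_costs = (b.params_m, b.gflops, b.latency_ms)
    no_worse = a.score >= b.score and all(x <= y for x, y in zip(a_costs, b_costs))
    strictly_better = a.score > b.score or any(x < y for x, y in zip(a_costs, b_costs))
    return no_worse and strictly_better


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates with invented numbers.
candidates = [
    Point("tiny", score=0.80, params_m=1.0, gflops=0.1, latency_ms=5.0),
    Point("mid", score=0.90, params_m=10.0, gflops=2.0, latency_ms=50.0),
    Point("heavy", score=0.88, params_m=80.0, gflops=15.0, latency_ms=400.0),
]
frontier = pareto_frontier(candidates)
frontier_names = [p.name for p in frontier]
```

Here "heavy" drops off the frontier because "mid" scores higher at lower cost on every axis, while "tiny" stays because nothing is cheaper.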

Positioning

Warning: The claim
  • SOTA within the no-external-data / no-synthetic / single-dataset class, on a transparent efficiency frontier with leak-audited CV.
  • Not a claim to beat the champions’ unconstrained ~0.1755, which relied on the banned ingredients and therefore cannot be matched like-for-like under these constraints.
  • The best constrained point is OOF pAUC 0.17376 at 15.8 M params / 2.46 GFLOPs / 61 ms CPU; the projected private-test score is about 0.01 to 0.02 lower than that (see Results).
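
The projection in the last bullet is plain subtraction. Spelled out with only the numbers quoted in this section (the variable names are illustrative):

```python
oof_pauc = 0.17376           # best constrained OOF pAUC, from the text
gap_low, gap_high = 0.01, 0.02  # approximate CV-to-private gap, from the text
champion_private = 0.17264   # 1st-place private-LB pAUC, from the text

projected_high = round(oof_pauc - gap_low, 5)   # optimistic end of the range
projected_low = round(oof_pauc - gap_high, 5)   # pessimistic end of the range
below_champion = projected_high < champion_private
```

The projected range is roughly 0.154 to 0.164, below the champion's private-LB 0.17264 at both ends. This matches the positioning above: the OOF figure exceeds 0.17264 numerically, but the two numbers are not measured on the same data.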
