Credits & Positioning

Attribution to the leaderboard solutions and the public-notebook lineage

This work builds on the ISIC-2024 community. Reused components and original contributions are stated explicitly, with attribution.

The leaderboard champions

This study does not beat the unconstrained private leaderboard, and does not attempt to: the two ingredients the top teams relied on (external data, synthetic positives) are banned here. Their results are the benchmark the constrained claim is positioned against.

1st place — Ilya Novoselskiy

Private pAUC 0.17264 (post-competition reports cite ~0.1755 for the best unconstrained configuration). An ensemble of EVA-02 + EdgeNeXt image models fused with a GBDT tabular stack. The solution used external ISIC-archive dermoscopy data and ~30,000 Stable-Diffusion-synthesized malignant lesions, both banned here. Their own ablation reports the ~30k synthetic lesions added only +0.0007 pAUC — a quantitative bound on synthetic augmentation for a 393-positive task.

2nd & 3rd place

2nd — uchiyama33 — image + tabular ensemble; the source of the EMA + mixup recipe adopted for the small backbone.
3rd — kyohei-123 — image/tabular blend.
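
For readers unfamiliar with the two techniques named in the 2nd-place entry, the following is a generic sketch of an exponential moving average (EMA) over weights and a mixup blend of two samples. It is not the 2nd-place code: the function names, `decay=0.99`, and `alpha=0.4` are illustrative assumptions, and weights are plain lists rather than framework tensors.

```python
import random


def ema_update(ema_weights, weights, decay=0.99):
    """One EMA step per weight: decay * ema + (1 - decay) * current."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


def mixup(x_a, y_a, x_b, y_b, alpha=0.4, rng=random):
    """Blend two samples and their labels with lam drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = lam * y_a + (1.0 - lam) * y_b
    return x, y, lam


# Toy usage: the EMA copy moves halfway toward the current weights.
smoothed = ema_update([1.0, 1.0], [0.0, 2.0], decay=0.5)

# Toy usage: mixing a positive (label 1.0) with a negative (label 0.0).
mixed_x, mixed_y, lam = mixup([0.0], 1.0, [1.0], 0.0, rng=random.Random(0))
```

A higher `decay` makes the averaged weights change more slowly; the negative-results catalogue on this page lists over-strong EMA as one of the settings that did not help.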

Leaderboard comparison

Table 3. Honest comparison to top Kaggle solutions (champion private-LB pAUC; ours is leak-audited cross-validation).
Solution	External data?	Synthetic data?	pAUC (private LB unless marked CV)
1st — Ilya Novoselskiy (EVA-02 + EdgeNeXt + GBDT)	Yes	Yes (~30k)	0.17264
2nd — uchiyama33 (image + tabular ensemble)	Yes	Yes	—
3rd — kyohei-123 (image + tabular blend)	Yes	Yes	—
Ours (single-dataset, no external, no synthetic)	No	No	CV 0.17376

Champions used external ISIC-archive dermoscopy and synthetic positives, both banned here; their own ablation reports the ~30k synthetic lesions added only +0.0007 pAUC. Ours is an out-of-fold CV number, not private LB.

The public-notebook lineage

The tabular feature engineering and overall recipe descend from public Kaggle notebooks, reused with attribution:

Source	Reused / learned
greysky — “only tabular”	LightGBM hyperparameters and the tabular-first thesis; the `lgb_bag` params are greysky-lineage, and the bagging + undersampling structure follows this work.
snnclsr	Tabular feature-engineering patterns and CV structure.
motono0223	Image-baseline training recipe (small backbones, undersampling).
andreasbis — “CNN preds as features”	The stacking idea: feed an image model’s OOF probability into the GBDT as a feature.
richolson	Additional baseline and ensembling practice.
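
The stacking idea credited to andreasbis can be sketched as follows: each image is scored by a model that never saw its fold during fitting, and that out-of-fold (OOF) score is appended to the tabular row before the GBDT is trained. This is a minimal illustration, not the project's pipeline. `fit_centroids` and `predict_centroids` are hypothetical stand-ins for an image model, the toy data are invented, and grouping folds by patient is an assumption of this example.

```python
def fit_centroids(rows, labels):
    """Hypothetical stand-in for an image model: per-class mean of feature 0."""
    pos = [r[0] for r, y in zip(rows, labels) if y == 1]
    neg = [r[0] for r, y in zip(rows, labels) if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)


def predict_centroids(model, row):
    """Score in [0, 1]; closer to the positive centroid gives a higher score."""
    pos_mean, neg_mean = model
    d_pos, d_neg = abs(row[0] - pos_mean), abs(row[0] - neg_mean)
    return d_neg / (d_pos + d_neg)


def oof_predictions(rows, labels, groups, n_folds, fit, predict):
    """Score every row with a model fitted on the other folds only."""
    unique_groups = sorted(set(groups))
    fold_of = {g: i % n_folds for i, g in enumerate(unique_groups)}
    oof = [None] * len(rows)
    for fold in range(n_folds):
        train_idx = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        valid_idx = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        for i in valid_idx:
            oof[i] = predict(model, rows[i])
    return oof


# Invented toy data: one image-derived feature and two tabular features per lesion.
image_feats = [[0.1], [0.2], [0.9], [0.8], [0.15], [0.85]]
tabular = [[3.0, 1.0], [2.5, 0.0], [7.0, 1.0], [6.5, 0.0], [2.0, 1.0], [8.0, 0.0]]
labels = [0, 0, 1, 1, 0, 1]
patients = ["p1", "p1", "p2", "p2", "p3", "p3"]

oof = oof_predictions(image_feats, labels, patients, 3, fit_centroids, predict_centroids)

# The stacked rows are what a GBDT would be trained on.
stacked = [row + [score] for row, score in zip(tabular, oof)]
```

Using OOF scores rather than in-sample scores matters because an in-sample score already encodes the label of the row it is attached to, which would leak into the second-stage model.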

Reused vs. original

Reused (with attribution)

The tabular-first thesis and LightGBM hyperparameter lineage (greysky).
The CNN-preds-as-a-feature stacking idea (andreasbis).
The EMA + mixup small-backbone recipe (2nd/4th-place public methodology).
The official pAUC scorer (Kurtansky / MSKCC), vendored to verify the implementation.
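
For intuition about the metric itself: the ISIC-2024 pAUC is the area under the ROC curve that lies above 80% true-positive rate, so values range from 0 to 0.2. The sketch below is a dependency-free re-derivation, not the vendored Kurtansky / MSKCC scorer; the function name and the trapezoidal clipping are this example's own choices.

```python
def partial_auc_above_tpr(y_true, y_score, min_tpr=0.8):
    """Area between the ROC curve and the horizontal line TPR = min_tpr."""
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")

    # Build ROC points, emitting one point per distinct score so ties
    # become a single diagonal segment.
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j

    # Trapezoid rule on max(TPR - min_tpr, 0), clipping the segment
    # that crosses the min_tpr line.
    area = 0.0
    for (f0, t0), (f1, t1) in zip(points, points[1:]):
        if t1 <= min_tpr:
            continue
        if t0 < min_tpr:
            frac = (min_tpr - t0) / (t1 - t0)
            f0 = f0 + frac * (f1 - f0)
            t0 = min_tpr
        area += (f1 - f0) * ((t0 - min_tpr) + (t1 - min_tpr)) / 2.0
    return area


perfect = partial_auc_above_tpr([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
inverted = partial_auc_above_tpr([0, 0, 1, 1], [0.9, 0.8, 0.2, 0.1])
constant = partial_auc_above_tpr([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5])
```

A perfect ranking reaches the 0.2 ceiling, a fully inverted ranking scores 0, and a constant score gives the chance-level value of 0.02.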

Original contributions

The efficiency frontier as the deliverable. The task is reframed from one private-LB number to a quality-vs-cost Pareto frontier across params, FLOPs, and CPU latency; a model counts only if it earns its cost, and every reported model is one measured point.
An audited no-external / no-synthetic discipline. A locked scope (docs/DECISIONS.md), a cv-guardian with veto power, and a hard no-leak guarantee make the constraint auditable rather than asserted.
The image-space ugly-duckling. The tabular ugly-duckling idea extended into the image expert (a within-patient image-deviation signal), tested and reported despite not improving the score.
A complete negative-results catalogue. Meta-stacker, learned MoE gate, image embeddings, heavy transformers, over-strong EMA — each logged with its number. The finding that complexity consistently reduces the score at 393 positives is itself a contribution.
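
One way to read the "within-patient deviation" behind the ugly-duckling idea is a per-patient z-score: how far a lesion's value sits from the mean of that patient's own lesions. The sketch below is an assumption about the general shape of such a signal, not the project's feature code; the function name, the scalar input, and the `eps` guard are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def within_patient_deviation(patient_ids, values, eps=1e-8):
    """Z-score of each value relative to the same patient's other lesions."""
    by_patient = defaultdict(list)
    for pid, v in zip(patient_ids, values):
        by_patient[pid].append(v)
    stats = {pid: (mean(vs), pstdev(vs)) for pid, vs in by_patient.items()}
    # eps keeps the division defined for patients with one lesion or no spread.
    return [
        (v - stats[pid][0]) / (stats[pid][1] + eps)
        for pid, v in zip(patient_ids, values)
    ]


# Invented toy data: patient "a" has one lesion that stands out; "b" has one lesion.
deviation = within_patient_deviation(["a", "a", "a", "b"], [1.0, 1.0, 4.0, 2.0])
```

In the image-space variant described above, the scalar would be replaced by something derived from the image expert, such as a distance between a lesion's embedding and the patient's mean embedding.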

Positioning

The claim

SOTA within the no-external-data / no-synthetic / single-dataset class, on a transparent efficiency frontier with leak-audited CV.
Not a claim to beat the champions’ unconstrained ~0.1755, which used the banned ingredients and is impossible to match by construction.
The best constrained point is OOF pAUC 0.17376 at 15.8 M params / 2.46 GFLOPs / 61 ms CPU latency; the projected private-test score is roughly 0.01–0.02 lower (see Results).
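
The frontier framing can be made concrete with a small dominance filter: a model stays on the frontier only if no other model is at least as good on score and on every cost axis while being strictly better on one. This is a sketch under stated assumptions. The field names and the four models are invented for illustration and are not measured points from this study.

```python
COSTS = ("params_m", "gflops", "cpu_ms")  # lower is better on each cost axis


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = a["score"] >= b["score"] and all(a[k] <= b[k] for k in COSTS)
    better = a["score"] > b["score"] or any(a[k] < b[k] for k in COSTS)
    return no_worse and better


def pareto_front(models):
    """Keep the models that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


# Invented toy points on an arbitrary score scale.
models = [
    {"name": "A", "score": 0.90, "params_m": 10.0, "gflops": 2.0, "cpu_ms": 50.0},
    {"name": "B", "score": 0.80, "params_m": 20.0, "gflops": 3.0, "cpu_ms": 60.0},
    {"name": "C", "score": 0.95, "params_m": 40.0, "gflops": 8.0, "cpu_ms": 200.0},
    {"name": "D", "score": 0.70, "params_m": 1.0, "gflops": 0.1, "cpu_ms": 5.0},
]

front_names = [m["name"] for m in pareto_front(models)]
```

Here B drops out because A scores higher at lower cost on every axis, while C and D stay: C buys a higher score at higher cost, and D is the cheapest option.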
