Dataset Description

To obtain the data, follow the instructions on the Data Download page. After your request is approved, you will be able to download the data from that page.


Dataset Structure

hecktor2026_training/
├── CHUM-001
│   ├── CHUM-001__CT.nii.gz
│   ├── CHUM-001__PT.nii.gz
│   └── CHUM-001.nii.gz          # Labels: Background=0, GTVp=1, GTVn=2
├── CHUM-002
├── …
└── HECKTOR_2026_Training_Task.csv   # Clinical data

The modalities and EHR (clinical) features available inside each task container will exactly match those provided in the training data.
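As an illustration of the directory layout above, the following minimal sketch walks a dataset root and collects the CT, PET, and label paths for each case. The function names `find_cases` and `missing_images` are hypothetical helpers, and the handling of absent label files is an assumption (for example, for cases without ground truth), not something the challenge specifies.

```python
from pathlib import Path


def find_cases(root):
    """Map each case ID to its CT, PET and (optional) label file paths."""
    cases = {}
    for case_dir in sorted(Path(root).iterdir()):
        if not case_dir.is_dir():
            continue  # skips the clinical CSV stored at the top level
        case_id = case_dir.name
        label = case_dir / f"{case_id}.nii.gz"
        cases[case_id] = {
            "ct": case_dir / f"{case_id}__CT.nii.gz",
            "pt": case_dir / f"{case_id}__PT.nii.gz",
            # Assumption: a label file may be absent, so report None then.
            "label": label if label.exists() else None,
        }
    return cases


def missing_images(cases):
    """Return the IDs of cases that lack a CT or PET file on disk."""
    return [
        case_id
        for case_id, files in cases.items()
        if not (files["ct"].exists() and files["pt"].exists())
    ]
```

Calling `find_cases("hecktor2026_training")` would return one entry per case folder, and `missing_images` can then serve as a quick integrity check after download.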


Data Specification

  • Image data (PET/CT)
    All tasks include PET and CT for each patient. File naming:

    • CenterName_PatientID__Modality.nii.gz
    • __CT.nii.gz — computed tomography
    • __PT.nii.gz — positron emission tomography
  • Segmentations
    One label file per patient: PatientID.nii.gz

    • Label 0 — background
    • Label 1 — primary tumor (GTVp)
    • Label 2 — lymph nodes (GTVn)
  • Clinical information
    The provided CSV file will contain the following information:

    • PatientID
    • CenterID
    • Age
    • Gender
    • Tobacco Consumption
    • Alcohol Consumption
    • HPV Status
    • Relapse
    • RFS
    • T-stage
    • N-stage
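Since some clinical variables may be missing, it can help to count the gaps per column before modelling. The sketch below uses only the Python standard library. The header spellings, the tokens treated as missing, and the two sample rows are illustrative assumptions and are not taken from the real CSV.

```python
import csv
import io

# Assumption: how missing values are encoded in the real file is unknown.
MISSING_TOKENS = {"", "na", "nan", "none"}


def load_clinical(handle):
    """Read the clinical table from an open text file into a list of dicts."""
    return list(csv.DictReader(handle))


def count_missing(rows):
    """Count missing entries per column across a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if column is None:
                continue  # surplus fields beyond the header are ignored
            is_missing = value is None or value.strip().lower() in MISSING_TOKENS
            counts[column] = counts.get(column, 0) + int(is_missing)
    return counts


# Made-up sample for illustration only; real headers and values may differ.
sample = io.StringIO(
    "PatientID,CenterID,Age,HPV Status\n"
    "CHUM-001,CHUM,62,\n"
    "CHUM-002,CHUM,,1\n"
)
missing = count_missing(load_clinical(sample))
```

For the real file, `load_clinical` would be given a handle opened on `HECKTOR_2026_Training_Task.csv`.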

Each training and test case consists of one 3D FDG-PET volume registered to a 3D CT volume of the head and neck region, together with expert-delineated contours of the ground-truth lesions (the contours are provided only for the training cases). The segmentation labels take three values: 0 = background, 1 = primary gross tumor volume (GTVp), and 2 = nodal gross tumor volume (GTVn). If multiple lymph nodes are involved, all nodal lesions share the same label 2.
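The label convention described above can be checked with a small sanity test. This is a minimal sketch: `summarize_labels` is a hypothetical helper that accepts any flat sequence of voxel values (for instance, a flattened array obtained from a NIfTI reader of your choice) and reports the voxel count per class.

```python
from collections import Counter

# Label values as stated in the dataset description.
LABEL_NAMES = {0: "background", 1: "GTVp", 2: "GTVn"}


def summarize_labels(voxels):
    """Count voxels per label name; reject values outside {0, 1, 2}."""
    counts = Counter(int(v) for v in voxels)
    unexpected = set(counts) - set(LABEL_NAMES)
    if unexpected:
        raise ValueError(f"unexpected label values: {sorted(unexpected)}")
    return {name: counts.get(value, 0) for value, name in LABEL_NAMES.items()}
```

Because all nodal lesions share label 2, the `GTVn` count is the total over every involved lymph node; separating individual nodes would need an extra step such as connected-component analysis.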

For each patient, clinical information is provided, including recruiting center, age, sex, tobacco and alcohol consumption, performance status, HPV status, and treatment (radiotherapy alone or chemoradiotherapy). Some variables may be missing for a subset of patients. For the training cases only, additional outcome variables are supplied: TN stage, recurrence flag, and recurrence-free survival. These variables, together with the annotated segmentation masks, are intended to guide model development, since lesion segmentation, staging, and prognosis prediction are all expected outputs of the system. TN staging follows the AJCC/UICC 7th Edition. The M stage is not part of the prediction, because the majority of cases are M0 and most PET/CT scans are cropped to the head and neck region, which makes it difficult to detect spread of the cancer to distant organs.

Some entries may contain missing data, but the 2026 edition includes significant updates.

The data originate from FDG-PET and low-dose, non-contrast-enhanced CT images of the head and neck (H&N) region, acquired with combined PET/CT scanners.


Participating Centers

Data were collected from 11 centers:

Center | Acronym | PET/CT scanner
Hôpital général juif, Montréal, CA | HGJ | Discovery ST, GE Healthcare
Centre hospitalier universitaire de Sherbrooke, Sherbrooke, CA | CHUS | GeminiGXL 16, Philips
Hôpital Maisonneuve-Rosemont, Montréal, CA | HMR | Discovery STE, GE Healthcare
Centre hospitalier de l’Université de Montréal, Montréal, CA | CHUM | Discovery STE, GE Healthcare
Centre Hospitalier Universitaire Vaudois, CH | CHUV | PET/CT GE Discovery D690 TOF
Centre Hospitalier Universitaire de Poitiers, FR | CHUP | Biograph mCT 40 ToF, Siemens
MD Anderson Cancer Center, Houston, Texas, USA | MDA | Discovery HR, Discovery RX, Discovery ST, Discovery STE (GE Healthcare)
UniversitätsSpital Zürich, CH | USZ | Discovery HR, Discovery RX, Discovery STE, Discovery LS, Discovery 690 (GE Healthcare)
Centre Henri Becquerel, Rouen, FR | CHB | GE710, GE Healthcare
Centre Hospitalier Universitaire de Nantes, FR | CHUN | Siemens mCT 64 vision
Centre Hospitalier Universitaire de Brest, FR | CHUB | Philips GEMINI, Siemens Biograph, Siemens Biograph Vision

In total, there are more than 1200 cases from 11 centers. The training set contains approximately 700 cases from 8 centers. The test set contains approximately 400 new, previously unseen cases from 3 centers. The training and test cohorts are representative of the real-world population of patients accepted for initial staging of oropharyngeal cancer.
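Because the test centers differ from the training centers, one possible way to estimate generalization during development is to hold out one training center at a time. The sketch below is only a suggestion and not part of the challenge protocol. It assumes the center acronym is the part of the case ID before the first hyphen (as in `CHUM-001`); `MDA-001` in the usage note is a hypothetical ID.

```python
def center_of(case_id):
    """Return the center acronym, assumed to precede the first hyphen."""
    return case_id.split("-", 1)[0]


def leave_one_center_out(case_ids):
    """Yield (held_out_center, train_ids, val_ids) for each center in turn."""
    centers = sorted({center_of(case_id) for case_id in case_ids})
    for held_out in centers:
        val_ids = [c for c in case_ids if center_of(c) == held_out]
        train_ids = [c for c in case_ids if center_of(c) != held_out]
        yield held_out, train_ids, val_ids
```

For example, `list(leave_one_center_out(["CHUM-001", "CHUM-002", "MDA-001"]))` produces two folds, one validating on the CHUM cases and one on the MDA case.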

The preprocessing of PET/CT images involves (for both the training and test cases): (i) computation of the Standardized Uptake Value (SUV) for the PET images and (ii) conversion of the DICOM file format to NIfTI format.
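For readers unfamiliar with SUV, the sketch below shows the commonly used body-weight normalization. The description does not state which SUV variant or decay-correction convention the organizers applied, so the formula choice and all parameter names here are assumptions. The released PET volumes are already converted, so this step does not need to be repeated.

```python
# Fluorine-18 half-life in seconds (about 109.77 minutes).
F18_HALF_LIFE_S = 6586.2


def decay_corrected_dose(injected_dose_bq, elapsed_s, half_life_s=F18_HALF_LIFE_S):
    """Dose remaining after elapsed_s seconds of radioactive decay."""
    return injected_dose_bq * 2.0 ** (-elapsed_s / half_life_s)


def suv_body_weight(activity_bq_per_ml, weight_kg, injected_dose_bq, elapsed_s):
    """Body-weight SUV: tissue activity divided by dose per gram of body weight.

    Assumes 1 g of tissue is about 1 mL, the usual convention for this variant.
    """
    dose_bq = decay_corrected_dose(injected_dose_bq, elapsed_s)
    return activity_bq_per_ml * (weight_kg * 1000.0) / dose_bq
```

With this definition, a tracer distributed perfectly uniformly through the body would give an SUV of 1 everywhere, which is why values well above 1 indicate elevated uptake.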


Validation and Testing Process

No test data will be shared directly with participants. Instead, evaluation will be conducted exclusively through Docker container submissions on the Grand Challenge platform.