Dataset Description
To obtain the data, follow the instructions on the Data Download page. After your request is approved, you will be able to download the data from that page.
Dataset Structure
hecktor2026_training/
├── CHUM-001
│   ├── CHUM-001__CT.nii.gz
│   ├── CHUM-001__PT.nii.gz
│   └── CHUM-001.nii.gz    # Labels: Background=0, GTVp=1, GTVn=2
├── CHUM-002
├── …
└── HECKTOR_2026_Training_Task.csv    # Clinical data
For each task, the modalities and electronic health record (EHR) features available inside the evaluation container will exactly match those provided in the training data.
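The layout shown in this section can be walked with a few lines of Python. The following is a minimal sketch, not an official loader: the function name `list_cases` is hypothetical, and it assumes every patient folder follows the naming of the `CHUM-001` example (files prefixed with the folder name). It only collects paths; reading the NIfTI volumes themselves would require a library such as nibabel or SimpleITK.

```python
from pathlib import Path


def list_cases(root):
    """Map each patient folder to its CT, PET, and label paths (None if absent)."""
    cases = {}
    for case_dir in sorted(Path(root).iterdir()):
        if not case_dir.is_dir():
            continue  # skips the clinical CSV stored at the top level
        pid = case_dir.name
        # Assumed naming, taken from the CHUM-001 example in the tree above.
        expected = {
            "ct": case_dir / f"{pid}__CT.nii.gz",
            "pt": case_dir / f"{pid}__PT.nii.gz",
            "label": case_dir / f"{pid}.nii.gz",
        }
        cases[pid] = {key: (path if path.exists() else None)
                      for key, path in expected.items()}
    return cases
```

Calling `list_cases("hecktor2026_training")` returns a dictionary keyed by patient folder name. Entries set to `None` flag files that are absent, which makes incomplete cases easy to spot before training.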
Data Specification
- Image data (PET/CT)
  All tasks include PET and CT for each patient. File naming: CenterName_PatientID__Modality.nii.gz
  - __CT.nii.gz: computed tomography
  - __PT.nii.gz: positron emission tomography
- Segmentations
  One label file per patient: PatientID.nii.gz
  - Label 0: background
  - Label 1: primary tumor (GTVp)
  - Label 2: lymph nodes (GTVn)
- Clinical information
  The provided CSV file will contain the following information:
  - PatientID
  - CenterID
  - Age
  - Gender
  - Tobacco Consumption
  - Alcohol Consumption
  - HPV Status
  - Relapse
  - RFS
  - T-stage
  - N-stage
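Because some clinical fields can be empty, it helps to normalize missing values when reading the CSV. The sketch below is illustrative only: the function name `read_clinical` is hypothetical, the exact header spellings and the tokens used for missing values (`MISSING`) are assumptions, and the two sample rows are made-up placeholders rather than real patient data.

```python
import csv
import io

# Assumed encodings of a missing value; adjust after inspecting the real file.
MISSING = {"", "NA", "NaN", "nan"}


def read_clinical(handle):
    """Return one dict per patient, with missing entries mapped to None."""
    rows = []
    for row in csv.DictReader(handle):
        clean = {}
        for key, value in row.items():
            if key is None:
                continue  # ignore surplus cells beyond the header
            if value is None or value.strip() in MISSING:
                clean[key] = None
            else:
                clean[key] = value.strip()
        rows.append(clean)
    return rows


# Made-up rows that only demonstrate the handling of empty fields.
sample = io.StringIO(
    "PatientID,CenterID,Age,HPV Status\n"
    "CHUM-001,CHUM,61,\n"
    "CHUM-002,CHUM,NA,1\n"
)
records = read_clinical(sample)
```

For the real file, pass an open handle instead of the in-memory sample, for example `open("HECKTOR_2026_Training_Task.csv", newline="")`. Values are kept as strings so that any numeric conversion stays an explicit, per-column decision.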
Each training and test case consists of one 3D FDG-PET volume registered to a 3D CT volume of the head and neck region, together with expert-delineated contours of the ground-truth lesions (the contours are provided only for the training cases). The segmentation labels take three values: 0 = background, 1 = primary gross tumor volume (GTVp), and 2 = nodal gross tumor volume (GTVn). If multiple lymph nodes are involved, all nodal lesions share the same label 2.
For each patient, clinical information is provided, including recruiting center, age, sex, tobacco and alcohol consumption, performance status, HPV status, and treatment (radiotherapy alone or chemoradiotherapy). Some variables may be missing for a subset of patients. For the training cases only, additional outcome variables are supplied: TN stage, recurrence flag, and recurrence-free survival. These variables, together with the annotated segmentation masks, are intended to guide model development, since lesion segmentation, staging, and prognosis prediction are all expected outputs of the system. TN staging follows the AJCC/UICC 7th Edition. The M stage is not part of the prediction because the majority of cases are M0 and most PET/CT scans are cropped to the head and neck region, which makes it difficult to detect spread of the cancer to distant organs.
Some entries may contain missing data, but the 2026 edition includes significant updates.
The data originate from FDG-PET and low-dose non-contrast-enhanced CT images of the head and neck (H&N) region, acquired with combined PET/CT scanners.
Participating Centers
Data were collected from 11 centers:
| Center | Acronym | PET/CT scanner |
|---|---|---|
| Hôpital général juif, Montréal, CA | HGJ | Discovery ST, GE Healthcare |
| Centre hospitalier universitaire de Sherbrooke, Sherbrooke, CA | CHUS | GeminiGXL 16, Philips |
| Hôpital Maisonneuve-Rosemont, Montréal, CA | HMR | Discovery STE, GE Healthcare |
| Centre hospitalier de l’Université de Montréal, Montréal, CA | CHUM | Discovery STE, GE Healthcare |
| Centre Hospitalier Universitaire Vaudois, CH | CHUV | PET/CT GE Discovery D690 TOF |
| Centre Hospitalier Universitaire de Poitiers, FR | CHUP | Biograph mCT 40 ToF, Siemens |
| MD Anderson Cancer Center, Houston, Texas, USA | MDA | Discovery HR, Discovery RX, Discovery ST, Discovery STE (GE Healthcare) |
| UniversitätsSpital Zürich, CH | USZ | Discovery HR, Discovery RX, Discovery STE, Discovery LS, Discovery 690 (GE Healthcare) |
| Centre Henri Becquerel, Rouen, FR | CHB | GE710, GE Healthcare |
| Centre Hospitalier Universitaire de Nantes, FR | CHUN | Siemens mCT 64 vision |
| Centre Hospitalier Universitaire de Brest, FR | CHUB | Philips GEMINI, Siemens Biograph, Siemens Biograph Vision |
In total, more than 1200 cases were collected from 11 centers. The training set contains approximately 700 cases from 8 centers. The test set contains approximately 400 new, previously unseen cases from 3 centers. The training and test cohorts are representative of the real-world population of patients accepted for initial staging of oropharyngeal cancer.
The preprocessing of PET/CT images involves (for both the training and test cases): (i) computation of the Standardized Uptake Value (SUV) for the PET images and (ii) conversion of the DICOM file format to NIfTI format.
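The provided PET volumes are already expressed in SUV, so participants do not need to repeat this step. For readers unfamiliar with the quantity, the sketch below shows the commonly used body-weight SUV. This page does not state which normalization the organizers applied, so the body-weight variant, the function name `suv_body_weight`, and all parameter names are assumptions. The code also assumes that the activity concentration is decay-corrected to the scan start time.

```python
F18_HALF_LIFE_S = 6586.2  # fluorine-18 half-life, about 109.77 minutes


def suv_body_weight(activity_bq_per_ml, injected_dose_bq, weight_kg,
                    delay_s, half_life_s=F18_HALF_LIFE_S):
    """Body-weight SUV for one voxel value.

    delay_s is the time in seconds between tracer injection and scan start.
    """
    # Decay-correct the injected dose to the scan start time.
    decayed_dose_bq = injected_dose_bq * 2.0 ** (-delay_s / half_life_s)
    # SUV = tissue concentration / (dose per gram of body weight),
    # treating 1 mL of tissue as roughly 1 g.
    return activity_bq_per_ml * (weight_kg * 1000.0) / decayed_dose_bq
```

For example, a voxel at 5000 Bq/mL in a 70 kg patient injected with 200 MBq exactly one half-life before the scan gives 5000 × 70000 / 1e8 = 3.5.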
Validation and Testing Process
No test data will be shared directly with participants. Instead, evaluation will be conducted exclusively through Docker container submissions on the Grand Challenge platform.