Introduction

In the age of personalized medicine, early and accurate assessment of tumor response is essential to optimize cancer treatment and patient management. A variety of approaches for measuring response have been developed and, to date, the accepted response criteria have mostly relied on anatomic imaging: the World Health Organization (WHO) criteria were proposed first, in 1976, followed by the Response Evaluation Criteria in Solid Tumors (RECIST) in 2000 and RECIST 1.1 in 2009. With the introduction of newer cytostatic, rather than cytotoxic, cancer treatments, anatomic criteria proved not fully adequate for response assessment [1]. In this scenario, positron emission tomography/computed tomography (PET/CT) with [18F]fluorodeoxyglucose ([18F]FDG) emerged as a useful tool capable of providing prognostically relevant imaging biomarkers [2]. Increased [18F]FDG uptake is observed in most malignant tumors and is usually related to proliferative activity and tumor cell viability; after effective therapy, tumoral [18F]FDG uptake declines rapidly, preceding changes in tumor size and reflecting the rate of tumor cell kill [2]. Furthermore, in FDG-avid malignancies, [18F]FDG PET can discriminate more accurately whether residual disease detected on morphological imaging consists of metabolically viable tumor or scar tissue. Finally, this response can be quantified by measuring changes in semiquantitative parameters, thus guiding the patient's subsequent therapeutic workup [3, 4]. The first PET-based scoring system, the well-known European Organization for Research and Treatment of Cancer (EORTC) criteria, appeared in 1999 [2]. Since then, over the last 20 years, nuclear medicine has gained ground, becoming essential for the evaluation of several tumor histological types.
Following the success of hybrid imaging, several PET/CT criteria have been proposed to standardize response assessment in various solid and non-solid tumors. An exemplary model is the Deauville criteria, which have been incorporated into all major oncological guidelines and represent a fundamental tool to guide the management of patients with FDG-avid lymphoma [5, 6]. For other tumors, by contrast, several response criteria have been proposed, but no single method has been fully accepted. Moreover, the introduction of new treatment options (e.g., immunotherapy) has highlighted the need to redefine imaging criteria for new patterns of response [7, 8]. This systematic review aims to report the main PET/CT criteria proposed for [18F]FDG-avid tumors, to guide physicians toward the standardization and adoption of the best response criteria for each oncological patient.

Research strategy

The review collected all the PET-based response criteria proposed in the literature up to November 30, 2022. A bibliographic search was performed in the PubMed/MEDLINE database to find original articles on the use of [18F]FDG PET/CT criteria for response assessment in different oncological diseases, following the NCCN guidelines "Treatment by Cancer Type" [9]. Accordingly, we included original articles published in English that evaluated [18F]FDG PET therapy response criteria in humans. The references of the retrieved articles were also checked for additional papers. Criteria proposed for non-FDG PET/CT were excluded from this review and are discussed in a separate article [10]. First, we classified the criteria used to assess response, including those applied across different tumor types, according to treatment type. Then, we selected treatment response criteria by cancer type, and finally, the main findings of the emerging criteria were discussed.

Figure 1 summarizes the proposed criteria by therapy and cancer type.

Fig. 1

The [18F]FDG PET/CT criteria categorized by therapy and cancer type

[18F]FDG PET treatment response criteria by therapy type

Treatment response criteria to standard therapy

In 1999, the European Organization for Research and Treatment of Cancer (EORTC) criteria were first proposed, based on ten studies including a total of 95 patients, six of which were performed in primary brain tumors [2]. The EORTC PET study group recommended reporting [18F]FDG uptake as the standardized uptake value (SUV) normalized for body surface area (SUVBSA, with BSA in m2) and using an empirical 25% cutoff for clinical response assessment, while a 15%–25% reduction is accepted after one cycle of chemotherapy [2]. After the EORTC criteria, further proposals led to the PET Response Criteria in Solid Tumors (PERCIST, version 1.0) in 2009. These criteria were based on several studies in different cancer types, including small cell lung cancer (SCLC), colorectal cancer, non-Hodgkin lymphoma (NHL), esophageal cancer, and the Ewing sarcoma family of tumors [3, 11]. Following the RECIST model, both the EORTC PET response criteria and PERCIST include four response categories: complete metabolic response (CMR), partial metabolic response (PMR), stable metabolic disease (SMD), and progressive metabolic disease (PMD). However, EORTC and PERCIST show key differences in the metrics used for the analysis, in the (slightly different) thresholds defining tumor response and progression, and in how lesions are selected on the baseline and follow-up scans [12]. Specifically, PERCIST recommends using the peak SUV normalized for lean body mass (SULpeak), which is less influenced by body fat content [3, 13]; therapy response is therefore expressed as the percentage change in SULpeak (or in the sum of lesion SULs) between the pre- and post-treatment scans [3]. Although the single-pixel maximum SUV (SUVmax) is simple to measure in an operator-independent way, it is more susceptible to noise than SULpeak, with the risk of overestimating tumor [18F]FDG uptake [12].
Moreover, PERCIST proposed a more stringent 30% SULpeak cutoff [3] and a minimum baseline SUL level for target lesions, to avoid overestimating response or progression [11, 12].
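To make the normalizations above concrete, the sketch below computes body-weight SUV, SUL, and the PERCIST 1.0 minimum baseline uptake for a measurable target lesion (1.5 × mean liver SUL + 2 standard deviations). It is a minimal illustration, not a validated implementation: all function names are hypothetical, the injected dose is assumed to be decay-corrected to scan time, and lean body mass is estimated with the Janmahasatian formula, which is only one of the formulas used by different software packages.

```python
# Sketch only: dose assumed decay-corrected to scan time; 1 mL of tissue ~ 1 g.


def suv_body_weight(conc_kbq_per_ml: float, dose_kbq: float, weight_kg: float) -> float:
    """Body-weight SUV (g/mL): tissue concentration / (injected dose / body weight)."""
    return conc_kbq_per_ml / (dose_kbq / (weight_kg * 1000.0))


def lean_body_mass_kg(weight_kg: float, height_m: float, male: bool) -> float:
    """Lean body mass via the Janmahasatian formula (assumption: other tools use James)."""
    bmi = weight_kg / height_m ** 2
    if male:
        return 9270.0 * weight_kg / (6680.0 + 216.0 * bmi)
    return 9270.0 * weight_kg / (8780.0 + 244.0 * bmi)


def sul(conc_kbq_per_ml: float, dose_kbq: float, weight_kg: float,
        height_m: float, male: bool) -> float:
    """SUV normalized to lean body mass (SUL): body weight is replaced by LBM."""
    lbm_kg = lean_body_mass_kg(weight_kg, height_m, male)
    return suv_body_weight(conc_kbq_per_ml, dose_kbq, lbm_kg)


def percist_min_measurable_sul(liver_sul_mean: float, liver_sul_sd: float) -> float:
    """PERCIST 1.0 minimum baseline SULpeak for a measurable target lesion:
    1.5 x mean liver SUL + 2 x its SD (3-cm ROI in the right liver lobe).
    If the liver is abnormal, PERCIST uses a blood-pool threshold instead (not shown)."""
    return 1.5 * liver_sul_mean + 2.0 * liver_sul_sd


# Hypothetical patient: 370 MBq injected, 74 kg, 1.75 m, male; lesion 5 kBq/mL.
suv_value = suv_body_weight(5.0, 370_000.0, 74.0)          # 1.0 g/mL
sul_value = sul(5.0, 370_000.0, 74.0, 1.75, male=True)      # lower than the SUV
threshold = percist_min_measurable_sul(2.0, 0.3)            # about 3.6
```

Because lean body mass is always smaller than total body weight, SUL is lower than body-weight SUV for the same voxel, and the difference grows with body fat, which is why PERCIST prefers it.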

For the EORTC criteria, target lesions are selected on the baseline scan, the same lesions are re-identified on the follow-up scan, and their changes in [18F]FDG uptake are measured. According to PERCIST, the lesion with the highest [18F]FDG uptake on the baseline scan and the one on the follow-up scan should be assessed, and these are not necessarily the same lesion; this approach eliminates variability in the selection of target lesions and simplifies response assessment by comparing only two measurements [12]. In addition, mirroring RECIST 1.1, PERCIST 1.0 recommends evaluating, as a secondary measure of response, the change in the sum of SULs of up to five of the hottest lesions (up to two per organ), which typically correspond to the target lesions identified by RECIST 1.1 [3, 12].
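The difference in lesion selection can be illustrated with a short sketch. It assumes the PERCIST 1.0 thresholds for the hottest lesion (PMR: at least 30% and 0.8 SUL-unit decrease; PMD: more than 30% and 0.8 SUL-unit increase) and, for EORTC, a percent change in the summed SUV of the same target lesions. The summation, the function names, and the handling of CMR as a visual input flag are illustrative assumptions rather than the criteria's formal wording.

```python
from typing import Sequence


def pct_change(before: float, after: float) -> float:
    """Percent change from baseline; baseline must be positive."""
    if before <= 0:
        raise ValueError("baseline uptake must be positive")
    return 100.0 * (after - before) / before


def percist_response(baseline_sulpeaks: Sequence[float],
                     followup_sulpeaks: Sequence[float],
                     new_lesion: bool = False,
                     complete_resolution: bool = False) -> str:
    """PERCIST-style call on the single hottest lesion of each scan.

    The two hottest lesions may be different lesions. CMR needs a visual
    read, so it is passed in as a flag rather than computed.
    """
    if new_lesion:
        return "PMD"
    if complete_resolution:
        return "CMR"
    base = max(baseline_sulpeaks)
    follow = max(followup_sulpeaks)
    change = pct_change(base, follow)
    delta = follow - base
    if change <= -30.0 and delta <= -0.8:   # >= 30% and >= 0.8 SUL-unit drop
        return "PMR"
    if change > 30.0 and delta > 0.8:       # > 30% and > 0.8 SUL-unit rise
        return "PMD"
    return "SMD"


def eortc_response(baseline_suvs: Sequence[float],
                   followup_suvs: Sequence[float],
                   new_lesion: bool = False,
                   complete_resolution: bool = False,
                   after_one_cycle: bool = False) -> str:
    """EORTC-style call on the SAME target lesions, using summed SUV (assumption)."""
    if len(baseline_suvs) != len(followup_suvs):
        raise ValueError("EORTC follows the same target lesions on both scans")
    if new_lesion:
        return "PMD"
    if complete_resolution:
        return "CMR"
    change = pct_change(sum(baseline_suvs), sum(followup_suvs))
    # EORTC accepts a 15%-25% reduction after one cycle; 15% is assumed here.
    pmr_cutoff = 15.0 if after_one_cycle else 25.0
    if change <= -pmr_cutoff:
        return "PMR"
    if change > 25.0:
        return "PMD"
    return "SMD"


print(percist_response([10.0], [7.3]), eortc_response([10.0], [7.3]))  # SMD PMR
```

With a single lesion falling from 10.0 to 7.3 (about −27%), the sketch returns SMD under PERCIST but PMR under EORTC: a discordance driven purely by the 25% vs. 30% cutoff.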

As shown in Table 1, PERCIST also added total lesion glycolysis (TLG) as a secondary outcome measure for PMD, defined as an increase of more than 75% in TLG with no decline in SUL. TLG is the product of the metabolic volume of the tumor volume of interest (VOI) and the mean activity within this VOI. Although TLG provides additional information and is a promising tool for response evaluation, explicit methodological details should be reported whenever it is used [3, 14].
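TLG as defined above can be computed from a segmented VOI as metabolic tumor volume (MTV) × SUVmean, which equals the sum of the included voxel SUVs times the voxel volume. In the sketch below, the segmentation threshold is a free, hypothetical parameter (fixed-SUV, percentage-of-SUVmax, and liver-based thresholds are all in use), which is exactly the kind of methodological detail that should be reported; the PMD check encodes the more-than-75% TLG increase without SUL decline described above.

```python
from typing import Sequence


def total_lesion_glycolysis(voxel_suvs: Sequence[float], voxel_volume_ml: float,
                            threshold_suv: float) -> float:
    """TLG = MTV (mL) x SUVmean inside a thresholded VOI.

    Equivalent to sum(SUV of included voxels) x voxel volume.
    threshold_suv is a hypothetical, user-chosen segmentation rule.
    """
    inside = [v for v in voxel_suvs if v >= threshold_suv]
    if not inside:
        return 0.0
    mtv_ml = len(inside) * voxel_volume_ml
    suv_mean = sum(inside) / len(inside)
    return mtv_ml * suv_mean


def tlg_progression(tlg_base: float, tlg_follow: float,
                    sul_base: float, sul_follow: float) -> bool:
    """PMD by TLG: > 75% TLG increase with no decline in SUL."""
    if tlg_base <= 0:
        raise ValueError("baseline TLG must be positive")
    increase_pct = 100.0 * (tlg_follow - tlg_base) / tlg_base
    return increase_pct > 75.0 and sul_follow >= sul_base


# Hypothetical VOI: four 0.5-mL voxels, SUV threshold 2.5 keeps three of them.
tlg = total_lesion_glycolysis([1.0, 3.0, 5.0, 6.0], 0.5, 2.5)  # about 7.0
```

Changing `threshold_suv` changes both MTV and SUVmean, so TLG values obtained with different segmentation rules are not interchangeable.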

Table 1 Overview of PET standard therapy response criteria

In a comparative study, EORTC and PERCIST criteria showed almost perfect agreement in determining tumor response in patients with solid tumors. The disagreement (3.4%) was due to the different approaches (multiple vs. single lesions) and the different response cutoff values (25% vs. 30%). However, EORTC may be more practical for clinical use, since SUVmax is still the most widely used parameter to express tumor metabolic activity [11]. Nonetheless, PERCIST seems more appropriate for clinical trials, because it provides a more detailed, well-defined, reliable, and robust standardized approach [12].
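Agreement statements such as "almost perfect" are commonly based on Cohen's κ, which corrects the observed agreement for the agreement expected by chance, and are conventionally labeled with the Landis and Koch benchmarks (0.61–0.80 substantial, 0.81–1.00 almost perfect). The sketch below shows the computation on hypothetical category lists; it is a generic illustration, not the analysis of the cited study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters (or two criteria) classifying the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty ratings of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of marginal proportions, summed over categories.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n ** 2
    if expected == 1.0:
        return float("nan")  # undefined: both raters used one identical category
    return (observed - expected) / (1.0 - expected)


def landis_koch(kappa: float) -> str:
    """Verbal label for kappa following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical calls by two criteria on six patients.
eortc_calls = ["PMR", "PMR", "SMD", "PMD", "SMD", "PMR"]
percist_calls = ["PMR", "SMD", "SMD", "PMD", "SMD", "PMR"]
k = cohen_kappa(eortc_calls, percist_calls)   # 17/23
print(round(k, 2), landis_koch(k))             # 0.74 substantial
```

Here 5 of 6 calls agree (83%), yet κ is only about 0.74, because part of that raw agreement would be expected by chance given how often each category is used.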

A slightly modified PERCIST (mPERCIST) was applied to evaluate therapy response in 17 consecutive patients with liver metastases from pancreatic cancer receiving 90Y-microsphere radioembolization [15]. Namely, SUV was normalized to body weight (SUVpeak, in g/mL) [13] and assessed with a 30% cutoff [1]. Michl and colleagues demonstrated a significant prognostic value of metabolic response assessed by mPERCIST based on SUVpeak and TLG, with longer overall survival (OS), progression-free survival (PFS), and time to progression (TTP) in responders [15]. These results are consistent with the study by Ahmaddy et al., which enrolled 22 patients with advanced radioiodine (RAI)-refractory differentiated thyroid carcinoma receiving lenvatinib and showed a significant correlation of tumor response assessed by mPERCIST with PFS and disease-specific survival (DSS) [1]. In a similar cohort of 25 patients with advanced metastatic RAI-refractory thyroid cancer treated with lenvatinib, Rendl et al. showed the applicability and clinical value of a further PERCIST 1.0 adaptation, called PERCISTmax, based on SUVmax [13]. Comparing PERCISTmax with EORTC, PERCIST 1.0, and mPERCIST, they found equal performance, with 100% agreement in identifying complete response and progressive disease requiring treatment changes, while small differences were observed in classification as stable disease or partial metabolic response. This study supported the hottest-lesion approach, which may reflect the most biologically active lesion in thyroid carcinoma and could be sufficient for response assessment compared with analysis of all lesions. In this cohort, the performance of PERCIST 1.0 based on SULpeak seemed limited by the high proportion of small tumor lesions [13]. Response categories are presented in Table 1.

Treatment response criteria to immunotherapy

Immunotherapy is a new treatment approach for many types of malignancies, used in combination or as first-line treatment, mainly in advanced stages of disease [16, 17]. Its rationale is to reshape the tumor microenvironment and restore immune surveillance against cancer cells [18], using immunomodulatory monoclonal antibodies against tumor cells or blocking immunological checkpoints [19]. In this context, immune-checkpoint inhibitors (ICIs) targeting programmed cell death protein 1 (PD-1), programmed death ligand 1 (PD-L1), and cytotoxic T lymphocyte-associated protein 4 (CTLA-4) have demonstrated considerable clinical benefit in different tumor types, such as lung cancer, melanoma, head and neck cancer, and bladder cancer [20]. However, not all patients benefit from ICI therapy, and severe immune-related adverse events can occur [21]. In addition, the high economic burden of these treatments calls for better patient selection and prompt discontinuation of the drug if no benefit is achieved [22]. In light of these considerations, proper evaluation of the response to immunotherapy is increasingly crucial. Compared with standard therapy, immunotherapy poses specific challenges, such as pseudoprogression. Pseudoprogression results from activation of the immune system around the tumor: it consists of an initial increase in tumor volume and/or number of lesions (due to infiltration by inflammatory cells, which mimics cancer progression), followed by tumor shrinkage and a favourable effect on patient outcome [23]. If response is assessed with conventional RECIST, these patients may initially meet the criteria for progressive disease (PD) but later show a reduction in tumor burden and a favourable final outcome.
Conventional CT-based response assessment has been modified to overcome this limitation through the immune-related response criteria (irRC) and the immune RECIST (iRECIST) criteria [24, 25]. Given the added value of [18F]FDG PET/CT in this field [23], several PET/CT-based criteria for therapy response evaluation have recently been proposed beyond EORTC and PERCIST 1.0 [2, 3]. For example, in patients with advanced melanoma treated with ipilimumab, the PET Response Evaluation Criteria for Immunotherapy (PERCIMT) classify the appearance of new functional lesions, even without a CT correlate, as PD [26]. Similarly, the immunotherapy-modified PERCIST (imPERCIST) criteria include new lesions in the quantification of tumor [18F]FDG uptake, and a patient is classified as PMD only if the [18F]FDG uptake of the measured lesions increases by at least 30%. The imPERCIST5 criteria use the sum of SULpeak values of up to five lesions to assess response [27]. In a similar setting, Cho et al. demonstrated that combining PET-based (EORTC and PERCIST 1.0) with CT-based (RECIST 1.1 and irRC) response assessment on PET/CT scans performed early in the course of ICI therapy may predict eventual response in patients with advanced melanoma, even in the presence of an initial increase in [18F]FDG uptake, probably associated with immune activation [28]. All these criteria have been shown to further improve the prognostic value of [18F]FDG PET/CT. In non-small cell lung cancer (NSCLC), the immune PET Response Criteria in Solid Tumors (iPERCIST, adapted from PERCIST) introduced a dual-time-point evaluation: PMD on the post-treatment scan (SCAN-2) is classified as "unconfirmed progressive metabolic disease" (UPMD) and re-evaluated 4 weeks later with SCAN-3 to confirm PMD. Patients with CMR, PMR, or SMD at SCAN-2 or SCAN-3 were considered responders, whereas patients with UPMD confirmed at SCAN-3 were considered non-responders [29].
Response categories are reported in Table 2.
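The two immunotherapy-specific rules described above, the inclusion of new lesions in the imPERCIST5 sum and the iPERCIST confirmation step, can be sketched as follows. This is an illustration under stated assumptions: the functions are hypothetical, no per-organ lesion cap is modeled, PMR is assumed to use the PERCIST 30% decrease, and the SCAN-3 category is taken as an input without modeling which scan it is referenced to.

```python
from typing import Optional, Sequence

CATEGORIES = ("CMR", "PMR", "SMD", "PMD")


def sum_top5(sulpeaks: Sequence[float]) -> float:
    """Sum of SULpeak of up to five hottest lesions (no per-organ cap modeled)."""
    return sum(sorted(sulpeaks, reverse=True)[:5])


def impercist5(baseline: Sequence[float], followup_existing: Sequence[float],
               followup_new: Sequence[float] = ()) -> str:
    """imPERCIST5-style call: new lesions enter the sum instead of meaning PMD outright."""
    base = sum_top5(baseline)
    follow = sum_top5(list(followup_existing) + list(followup_new))
    change = 100.0 * (follow - base) / base
    if change >= 30.0:
        return "PMD"
    if change <= -30.0:  # assumption: same 30% decrease as PERCIST
        return "PMR"
    return "SMD"


def ipercist(scan2: str, scan3: Optional[str] = None) -> str:
    """iPERCIST-style dual-time-point rule; inputs are PERCIST categories."""
    if scan2 not in CATEGORIES:
        raise ValueError("unknown category: " + scan2)
    if scan2 != "PMD":
        return "responder"
    # PMD at SCAN-2 is only unconfirmed (UPMD) until SCAN-3, about 4 weeks later.
    if scan3 is None:
        return "UPMD: repeat PET in about 4 weeks"
    if scan3 not in CATEGORIES:
        raise ValueError("unknown category: " + scan3)
    return "non-responder" if scan3 == "PMD" else "responder"


# A small new lesion does not by itself make this patient PMD.
print(impercist5([5.0, 4.0, 3.0], [4.0, 3.0, 2.0], followup_new=[2.5]))  # SMD
```

In the example, the summed uptake falls from 12.0 to 11.5 despite a new lesion, so the call stays SMD, whereas a rule that treats any new lesion as progression would have returned PMD.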

Table 2 Overview of the treatment response criteria to immunotherapy

Treatment response criteria by cancer type

Head and neck cancer: NI-RADS, Hopkins, Deauville, Porceddu, and Cuneo criteria

In patients with head and neck squamous cell carcinoma (HNSCC), international oncological guidelines recommend [18F]FDG PET/CT [30, 31] to assess response to chemoradiotherapy 3 months after the end of treatment [9, 32, 33]. The widespread use of this functional imaging method, which has proved highly sensitive in defining locoregional and distant disease extent, has led to the development of several PET-based response criteria to standardize the interpretation of imaging patterns.

The Head and Neck Imaging Reporting and Data System (NI-RADS) was developed as an interpretative framework to standardize the reporting of post-treatment contrast-enhanced CT (CECT) and [18F]FDG PET/CECT. Both the primary tumor site and the neck are assessed for recurrence, and each is assigned a category with related management recommendations, as reported in Table 3 [34]. The numerical categories range from 0 (incomplete imaging) to 4 (definite recurrence). NI-RADS 1 (no evidence of local recurrence or adenopathy) represents a study with benign findings and expected post-treatment changes; NI-RADS 2 (low suspicion) indicates indeterminate findings that are likely post-treatment changes, although tumor recurrence remains possible; finally, NI-RADS 3 (high suspicion) represents findings highly suspicious for residual or recurrent tumor.

Table 3 The Head and Neck Imaging Reporting and Data System (NI-RADS) category with related management action

Several studies have also demonstrated the feasibility of NI-RADS for outcome prediction, showing a strong association between the score and positive disease rates for the primary site, lymph nodes, and all target sites combined. Positive disease rates (recurrence/persistence) of 3.8% for NI-RADS 1, 17.2% for NI-RADS 2, and 59.4% for NI-RADS 3 were reported [35]. Hus et al. encouraged the use of NI-RADS for post-treatment evaluation and further confirmed the prognostic value of PET/CECT in 199 HNSCC patients, reporting increasing failure rates for NI-RADS 1, 2, and 3 at the primary site (6.4%, 11.1%, and 38.5%), at the nodal site (2.5%, 6.3%, and 50%), and overall, combining primary and nodal sites (4.3%, 9.1%, and 42.1%), respectively. Conversely, the NI-RADS category was not significantly associated with treatment failure at the primary tumor site when applied for surveillance of surgically treated HNSCC patients with or without chemoradiotherapy [36]. Later, the same group observed higher inter-radiologist agreement for NI-RADS categories than for prose descriptions (i.e., lexicon responses) at both the primary and neck sites in a total of 80 patients [37]. In 110 HNSCC patients, Wangaryattawanich et al. reported a higher negative predictive value for patients with a complete response classified as NI-RADS 1 (91%) than for NI-RADS 2 (85%). This result suggests that patients with an incomplete response should undergo closer imaging surveillance, possibly with follow-up extended to 16 months, to detect treatment failure early and optimize patient outcome [38].

NI-RADS yields many indeterminate cases because it relies on subjective interpretation of focal mild-to-moderate mucosal [18F]FDG uptake without a reference region, which makes it harder to separate cases than with the other interpretative visual criteria developed in recent years [39]. Namely, the Deauville score (DS), Hopkins score (HS), 6-point Cuneo score (CS), and Porceddu score (PS) were introduced and compared, but none has been formally approved [39, 40]. They differ in the number of response categories and in the reference background used for therapy response, such as the internal jugular vein (IJV) for HS or the mediastinal blood pool for DS, as reported in Table 4.

Table 4 [18F]FDG PET/CT therapy response criteria used in head and neck cancer patients

Several authors agree that the Hopkins score provides excellent prediction of PFS and OS [41,42,43], with fewer indeterminate cases. However, this criterion showed a low negative predictive value (NPV) of 87.6% for human papillomavirus (HPV)-positive and 77.4% for HPV-negative patients. On the other hand, in a multicenter study of 350 patients from 11 centers, Bonomo et al. [40] reported that the six-point CS is feasible and provides a better positive predictive value (PPV) than HS. In contrast, in a large cohort of 562 HNSCC patients, Zhong et al. showed that all four interpretative criteria have comparable diagnostic performance, but PS and DS minimize indeterminate results while maintaining a high NPV [39]. The prognostic value of PET is more uncertain, with a low PPV, when [18F]FDG uptake is equivocal or indeterminate under any of the four criteria [39]. The Cuneo criteria seem to improve the PPV of post-treatment evaluation by introducing a new intermediate score that takes the local background into account [43, 44]. Distinguishing benign post-treatment inflammation from residual disease remains of paramount clinical importance, as each scenario requires substantially different patient management. Meanwhile, as advocated by NI-RADS, indeterminate cases may be followed with closer non-invasive imaging in the form of PET/CECT, and a second interval PET/CT response assessment may be introduced [39].
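Because these criteria differ mainly in how indeterminate scores are handled, the choice of positivity cutoff directly shifts PPV and NPV. The sketch below computes sensitivity, specificity, PPV, and NPV from (score, disease status) pairs for a chosen set of positive scores; the reads are hypothetical and not taken from any cited cohort.

```python
from typing import Dict, Iterable, Set, Tuple


def _ratio(num: int, den: int) -> float:
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def diagnostic_metrics(reads: Iterable[Tuple[int, bool]],
                       positive_scores: Set[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, NPV for a visual score dichotomized at positive_scores."""
    tp = fp = tn = fn = 0
    for score, disease_present in reads:
        called_positive = score in positive_scores
        if called_positive and disease_present:
            tp += 1
        elif called_positive:
            fp += 1
        elif disease_present:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical reads: (score on a five-point scale, disease confirmed at follow-up).
reads = [(1, False), (2, False), (3, False), (3, True),
         (4, True), (5, True), (4, False), (1, True)]
strict = diagnostic_metrics(reads, {4, 5})      # score 3 read as negative
lenient = diagnostic_metrics(reads, {3, 4, 5})  # score 3 read as positive
```

Counting score 3 as positive in this toy dataset raises sensitivity and NPV but lowers specificity and PPV, which is the trade-off behind the efforts to reduce indeterminate categories.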

Lung cancer: Hopkins criteria

[18F]FDG PET/CT is a fundamental imaging method in the diagnostic workup of lung cancer, and its role in diagnosis and staging is standardized by international guidelines. Although its use in treatment response assessment is not the standard of care, promising evidence supports its increasing application in evaluating treatment response, mainly with the introduction of new therapies [9]. In this setting, several studies have proposed PET-based quantitative parameters as reliable biomarkers of survival in lung cancer patients, in both pre-treatment and post-treatment settings [45]. After 2009, PERCIST criteria were applied for a systematic and structured PET-based evaluation of therapy response [11], but they are difficult to implement in clinical practice. The Hopkins criteria [43] were also validated in 2016 by Sheikhbahaei et al. for therapy response assessment in lung cancer patients [45]. The Hopkins criteria are a five-point qualitative scoring system assigned to the primary tumor, locoregional mediastinal disease, and distant metastatic sites, with metabolic activity in the mediastinal blood pool as reference [45] (Table 5). Sheikhbahaei et al. conducted a retrospective study of 201 patients with small cell lung cancer (SCLC) or non-SCLC who underwent [18F]FDG PET/CT after treatment completion (surgical resection, chemotherapy, radiation therapy, or any combination of these), and demonstrated high sensitivity, specificity, and accuracy of the Hopkins criteria in predicting survival. The mean interval between treatment completion and post-treatment [18F]FDG PET/CT was 7.5 weeks, but no clear indication of the appropriate timing for re-evaluation was provided [45]. In 2020, Riyami et al. compared the Hopkins criteria with PET semiquantitative analysis, confirming that these criteria ensure a reproducible qualitative assessment of therapeutic response and can be of great value for patient management; they observed substantial inter-reader agreement and almost perfect agreement when categorizing patients as positive or negative [46]. In addition, the authors recorded the highest SUVmax values in the mediastinal blood pool (at the aortic arch, sparing the vessel walls), in the liver background (right lobe, excluding regions involved by disease), and within active disease at the primary tumor site, lymph nodes, or distant metastases, and categorized patients according to the five-point scale. No significant difference in inter-reader or inter-criteria agreement was found when the Hopkins score was based on SUVmax as a semiquantitative measure of tracer uptake, highlighting that simplified visual assessment is a sufficiently reliable scoring method [46].
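The semiquantitative variant assessed by Riyami et al. can be approximated by comparing lesion SUVmax with the SUVmax of the mediastinal blood pool and liver. The sketch below is a hypothetical rendering of the five-point scale: the focal/diffuse distinction still comes from the reader, the ratio defining "intense" uptake is an assumed parameter, and treating scores 4–5 as positive is an assumed dichotomization, so the mapping should be checked against Table 5 before any use.

```python
def hopkins_score(lesion_suvmax: float, mbp_suvmax: float, liver_suvmax: float,
                  diffuse: bool, intense_ratio: float = 2.0) -> int:
    """Hypothetical SUVmax-based approximation of the five-point Hopkins scale.

    diffuse: the reader's judgment that uptake is diffuse (inflammatory-looking)
    rather than focal. intense_ratio: assumed lesion/liver ratio for 'intense'.
    """
    if lesion_suvmax <= mbp_suvmax:
        return 1  # uptake not above the mediastinal blood pool
    if diffuse:
        return 3  # diffuse uptake above blood pool or liver
    if lesion_suvmax <= liver_suvmax:
        return 2  # above blood pool but not above liver
    if lesion_suvmax >= intense_ratio * liver_suvmax:
        return 5  # focal and intense uptake above liver
    return 4      # focal uptake above liver


def hopkins_positive(score: int) -> bool:
    """Assumed dichotomization: scores 4-5 positive, 1-3 negative."""
    return score >= 4
```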

Table 5 Five-point scoring system (Hopkins criteria) for therapy response assessment in lung cancer patients

Lymphoma: Lugano, LYRIC, and RECIL criteria

Nowadays, [18F]FDG PET/CT is a well-recognized tool for staging and treatment response assessment in Hodgkin lymphoma (HL) and FDG-avid non-Hodgkin lymphoma (NHL) [5, 47], and has become essential in patient diagnosis and workup. The standardized use of PET/CT in lymphoma has led to one of the most widely used PET/CT criteria: the Deauville score, a visual five-point scale with five metabolic response categories that uses mediastinal blood pool and liver uptake as reference regions. Additionally, a score X was introduced to describe new areas of uptake unlikely to be related to lymphoma (Table 6) [47,48,49].
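A semiquantitative approximation of the visual Deauville scale, comparing lesion SUVmax with background, mediastinal blood pool, and liver, is sketched below. The cutoff for "markedly higher than liver" is not fixed by the scale itself; the factor of 2 used here is an assumption, and score X is returned only when the reader flags new uptake as unlikely to be lymphoma. The function name and parameters are hypothetical.

```python
def deauville(lesion_suvmax: float, background_suvmax: float, mbp_suvmax: float,
              liver_suvmax: float, new_lymphoma_lesion: bool = False,
              unlikely_lymphoma: bool = False, marked_ratio: float = 2.0) -> str:
    """Semiquantitative approximation of the visual five-point Deauville scale."""
    if unlikely_lymphoma:
        return "X"  # new uptake judged unlikely to be related to lymphoma
    if new_lymphoma_lesion or lesion_suvmax > marked_ratio * liver_suvmax:
        return "5"  # markedly above liver and/or new lesions
    if lesion_suvmax <= background_suvmax:
        return "1"  # no uptake above background
    if lesion_suvmax <= mbp_suvmax:
        return "2"  # uptake not above mediastinum
    if lesion_suvmax <= liver_suvmax:
        return "3"  # above mediastinum but not above liver
    return "4"      # moderately above liver
```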

Table 6 Overview of Deauville criteria

The five-point scale, adopted in 2009 at the International Workshop on Interim-PET Scan in Lymphoma in Deauville, was subsequently incorporated into a more detailed response assessment system, the Lugano classification, used for both interim and end-of-treatment assessment [5, 47, 48]. Scores 1 and 2 represent a complete metabolic response at both evaluations (interim and end-of-treatment PET/CT). Score 3 also represents a good response at the end-of-treatment evaluation in HL, diffuse large B-cell lymphoma (DLBCL), and follicular lymphoma (FL). However, the timing, the clinical context, and the ongoing therapies need to be taken into account when interpreting the intermediate score 3 [5]. Scores 4 and 5, conversely, are interpreted differently depending on the timing of the assessment. On interim evaluation, nodal or extranodal lesions with [18F]FDG uptake reduced from baseline may indicate chemosensitive disease and represent a partial metabolic response. At the end of treatment, residual metabolic disease with a score of 4 or 5 is considered treatment failure, even if uptake has decreased from the interim or baseline PET/CT. Moreover, scores 4 and 5 are considered treatment failure at both evaluations when residual disease uptake has not decreased (or has increased) from baseline and/or when new foci are detected [5, 48].
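The timing-dependent reading of the five-point scale described above can be summarized as a small decision function. It is a sketch that encodes only the rules stated in this paragraph; the function name and return labels are hypothetical, and score 3 at interim is deliberately left to clinical judgment.

```python
def lugano_reading(score: int, timepoint: str, uptake_reduced: bool,
                   new_lesions: bool = False) -> str:
    """Map a five-point score to a Lugano-style metabolic reading.

    timepoint: 'interim' or 'end' (end of treatment).
    uptake_reduced: residual uptake is lower than at baseline.
    """
    if timepoint not in ("interim", "end"):
        raise ValueError("timepoint must be 'interim' or 'end'")
    if score in (1, 2):
        return "CMR"
    if score == 3:
        # Good response at end of treatment in HL, DLBCL, and FL; interim
        # interpretation depends on timing, context, and ongoing therapy.
        return "CMR" if timepoint == "end" else "context-dependent"
    if score in (4, 5):
        if new_lesions or not uptake_reduced:
            return "treatment failure"  # at both time points
        # Reduced uptake: PMR at interim, but still failure at end of treatment.
        return "PMR" if timepoint == "interim" else "treatment failure"
    raise ValueError("score must be between 1 and 5")
```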

Notably, in interim response assessment, the five-point scale has proved reliable in terms of inter-observer agreement for HL [47, 48], which is both the most therapy-sensitive and the most FDG-avid type of lymphoma [50, 51]. In this context, in the study by Biggi et al., independent agreement among four reviewers was reached in 252 of 260 (97%) patients with advanced HL [52]. Agreement was also good for DLBCL and FL [53, 54]. However, given the variable PPV of [18F]FDG PET/CT across studies, the prognostic value of scores 4 and 5, mainly for some NHL subtypes, is still under investigation, and other semiquantitative parameters have also been explored [55,56,57,58,59].

The Lugano classification was developed for conventional treatments. However, the increasing number of biological agents, such as ICIs, requires flexibility in interpreting the recommendations to account for their biological or immunomodulatory properties [60]. Namely, tumor flare/pseudoprogression may occur during the first 2–3 weeks after the start of treatment and is characterized by a rapid, self-limited increase in the size and FDG uptake of the disease, reflecting transient and massive immune recruitment at the tumor site. Conversely, some patients may experience hyperprogression, characterized by true tumor overgrowth and poor prognosis [61, 62]. In 2016, the Lymphoma Response to Immunomodulatory Therapy Criteria (LYRIC) were proposed as an adaptation of the Lugano classification for the evaluation of lymphoma after immune-based treatment. LYRIC introduced the concept of indeterminate response (IR), instead of progression, to classify such lesions until a biopsy or subsequent imaging after 12 weeks confirms or excludes true disease progression [60, 63]. Subsequent literature showed a trend toward considering IR as true progression, mainly in the case of IR(2), defined as new lesions or growth of existing lesions by ≥ 50% without overall progression (< 50% increase) at any time during treatment. This reflects the results of Chen et al., who showed that all patients classified as IR per LYRIC at early response assessment were subsequently confirmed as true PMD on the next PET scan, while a trend toward worse OS was observed in IR(2) patients, especially in the presence of new lesions [64].
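The LYRIC indeterminate-response categories can be sketched from the product of perpendicular diameters (PPD) of each lesion and their sum (SPD). Only the IR(2) rule is described in the text above; the IR(1) rule (SPD increase of at least 50% within the first 12 weeks) and the IR(3) rule (increased FDG uptake without growth in lesion size or number) follow the 2016 LYRIC proposal as we understand it, and growth of a "set of lesions" is simplified here to single lesions. The function is hypothetical.

```python
from typing import Optional, Sequence


def _pct(before: float, after: float) -> float:
    """Percent change; baseline PPD/SPD must be positive."""
    return 100.0 * (after - before) / before


def lyric_ir(baseline_ppd: Sequence[float], current_ppd: Sequence[float],
             weeks_on_therapy: float, new_lesion: bool = False,
             fdg_increase_only: bool = False) -> Optional[str]:
    """Return 'IR(1)', 'IR(2)', 'IR(3)', or None (assess with Lugano instead).

    PPD values (cm^2) are matched by index: the same lesions at both time points.
    """
    if len(baseline_ppd) != len(current_ppd) or not baseline_ppd:
        raise ValueError("need matched, non-empty lesion lists")
    spd_change = _pct(sum(baseline_ppd), sum(current_ppd))
    max_growth = max(_pct(b, c) for b, c in zip(baseline_ppd, current_ppd))
    if spd_change >= 50.0 and weeks_on_therapy <= 12:
        return "IR(1)"  # early overall growth, possibly flare
    if spd_change < 50.0 and (new_lesion or max_growth >= 50.0):
        return "IR(2)"  # new or growing lesions without overall progression
    if fdg_increase_only:
        return "IR(3)"  # metabolic increase without size or number increase
    return None
```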

Lastly, in 2017, the International Response Evaluation Criteria in Lymphoma (RECIL) were proposed. Unlike the standard criteria, the RECIL group recommended unidimensional measurements of up to three target lesions, combining the change in the sum of their longest diameters with the PET Deauville score to define CR and PR, but relying on CT measurements to define SD and PD to avoid metabolic misinterpretation. In this context, the provisional minor response (MR) category was introduced, defined as a reduction of ≥ 10% but < 30% in the sum of the longest diameters of target lesions, without any new lesions, irrespective of PET results [65]. Berzaczy et al. compared RECIL and Lugano criteria in 54 patients with [18F]FDG-avid NHL, assessing agreement at interim and end-of-treatment evaluation. When MR was recorded as PR, agreement between RECIL and Lugano was 83.3% at interim restaging (κ = 0.69) and 90.7% at end-of-treatment evaluation (κ = 0.79). Moreover, when MR was considered responding disease, RECIL- and Lugano-based responses at interim and end-of-treatment restaging showed a comparable association with 2-year CR status, confirming the prognostic value of PET-based response in [18F]FDG-avid lymphomas [66]. Response categories are reported in Table 7.
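The RECIL logic can be sketched as follows, using the sum of longest diameters (SLD) of up to three target lesions. The MR band (decrease of at least 10% but less than 30%, no new lesions) is taken from the text; the 30% decrease for CR/PR, the use of Deauville 1–3 to separate CR from PR, and the 20% increase for PD are assumptions based on our reading of the RECIL proposal and should be verified against Table 7. The function is hypothetical and ignores other CR requirements, such as nodal size normalization.

```python
from typing import Sequence


def recil(baseline_ld: Sequence[float], current_ld: Sequence[float],
          deauville: int, new_lesion: bool = False) -> str:
    """RECIL-style category from the sum of longest diameters (SLD, mm)."""
    if not 1 <= len(baseline_ld) <= 3 or len(current_ld) != len(baseline_ld):
        raise ValueError("use the same 1-3 target lesions at both time points")
    base, now = sum(baseline_ld), sum(current_ld)
    change = 100.0 * (now - base) / base
    if new_lesion or change >= 20.0:
        return "PD"  # CT-based; assumed threshold
    if change <= -30.0:
        return "CR" if deauville <= 3 else "PR"  # PET separates CR from PR
    if change <= -10.0:
        return "MR"  # >= 10% but < 30% decrease, irrespective of PET
    return "SD"
```

Collapsing "MR" into "PR" reproduces the grouping used by Berzaczy et al. when comparing RECIL with Lugano.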

Table 7 Overview of the [18F]FDG PET treatment response criteria in lymphoma patients

Multiple Myeloma: IMPeTUs criteria

Nowadays, [18F]FDG PET/CT is used in multiple myeloma (MM) for staging, accurate evaluation of response to therapy, detection of extramedullary (EM) disease, and evaluation of relapse, with prognostic insights [67]. In 2015, an Italian group of nuclear medicine experts, haematologists, and medical physicists introduced the Italian Myeloma criteria for PET Use (IMPeTUs). Using a five-point scale, this descriptive system evaluates the metabolic state of all aspects of MM: the bone marrow (BM), the number and location of focal PET-positive lesions with or without osteolytic features, the presence and site of EM disease, the presence of paramedullary (PM) disease, and the presence of fractures. The visual degree of [18F]FDG uptake of the target lesion and EM lesions is graded according to the Deauville score. Table 8 shows the IMPeTUs criteria [68].

Table 8 IMPeTUs criteria for response assessment in multiple myeloma patients

Subsequently, the same team assessed these criteria in a cohort of 86 symptomatic MM patients enrolled in the multicenter, phase 3 EMN02 study. [18F]FDG PET/CT scans were performed at baseline, after induction, and after treatment, before the start of maintenance therapy. Post-induction and end-of-therapy PET/CT were performed 15 ± 5 days after induction and 90 ± 10 days after autologous stem cell transplantation (ASCT), respectively. The authors reported interobserver agreement of at least 75% for all criteria items, reaching 100% for the detection of skull lesions after therapy. Specifically, concordance was ≥ 75% for bone marrow [18F]FDG uptake intensity, ≥ 76% for the focal score, ≥ 95% for extramedullary disease spread, ≥ 76% for the number of focal lesions, ≥ 77% for the number of lytic lesions, and ≥ 92% for the presence of fractures. Interestingly, agreement was highest at the end-of-treatment time point [69]. Recently, Sachpekidis et al., in 47 patients with newly diagnosed MM, explored the potential role of IMPeTUs criteria in patient stratification and response assessment, identifying parameters correlated with patient outcomes, such as the number of focal [18F]FDG-avid lesions and PM or EM disease [70]. Moreover, Zamagni et al. found that focal lesion or BM [18F]FDG uptake lower than the liver background after therapy is an independent predictor of improved PFS and OS and can be proposed as the standardized criterion for PET complete metabolic response, confirming the value of the Deauville score in MM [71].
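The PET complete-response definition suggested by Zamagni et al. can be expressed over an IMPeTUs-style structured read. In the sketch below, the record fields mirror the items listed above, scores are Deauville-like grades from 1 to 5, and "uptake lower than the liver background" is assumed to correspond to grades 1–3; extramedullary lesions are recorded but, following the text, not used in the CMR check. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImpetusRead:
    """Hypothetical structured record of an IMPeTUs-style PET/CT read."""
    bone_marrow_score: int                            # Deauville-like grade 1-5
    focal_scores: List[int] = field(default_factory=list)
    extramedullary_scores: List[int] = field(default_factory=list)
    n_lytic_lesions: int = 0
    paramedullary: bool = False
    fractures: bool = False


def pet_cmr(read: ImpetusRead) -> bool:
    """Assumed CMR rule: BM and all focal lesions graded 1-3 (not above liver)."""
    scores = [read.bone_marrow_score] + read.focal_scores
    return all(1 <= s <= 3 for s in scores)
```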

Discussion

Properly assessing treatment response is a crucial issue in clinical oncology practice. Therapies are becoming more specific and targeted, with many lines available for different oncological diseases; it may therefore be difficult to decide whether to continue, change, or stop a course of treatment, which calls for objective tools. Because [18F]FDG PET/CT is versatile, its role in the management of oncologic patients has grown over time, especially owing to its prognostic insights. Many studies have demonstrated that [18F]FDG PET/CT can add value, alongside common CT criteria, in the evaluation of treatment response [23]. With the introduction of cytostatic rather than cytotoxic treatments into clinical practice, metabolic evaluation has proven useful, since these newer therapies may not lead to a significant decrease in tumor size or to restoration of normal morphology [1]. Furthermore, immunotherapy has raised issues that conventional imaging alone could not overcome. Doubtful response patterns that may occur in some scenarios can be evaluated more accurately with [18F]FDG PET/CT than with CT, and a therapy scheme can be continued with clinical benefit even in the presence of apparently unfavourable metabolic or morphologic findings on imaging. The ability to distinguish benign post-treatment inflammation from residual metabolically active disease remains of paramount clinical importance to guide patient management correctly [7]. Based on these considerations, [18F]FDG PET/CT response evaluation entered clinical practice in 1999 with the EORTC criteria, followed by PERCIST 1.0 in 2009 [2, 3]. These generic criteria may be applied to therapy response assessment in all solid tumors. However, they are neither fully validated nor widely used, leaving some clinical needs unmet.
Since some tumor types respond worse than others, various modified response criteria for particular tumor types and/or therapies have been developed [12].

In this review, we have listed and discussed the most relevant [18F]FDG PET/CT criteria developed to optimize response assessment for specific therapies and oncological diseases. With the introduction of immunotherapy into clinical practice, new imaging challenges emerged (e.g., pseudoprogression), and much effort has gone into standardizing post-treatment image interpretation. The immune-modified criteria (PERCIMT, iPERCIST, imPERCIST5) have been shown to outperform the conventional EORTC criteria in predicting patient outcome, with higher sensitivity (94% vs. 64%) and specificity (84% vs. 80%) [29, 72], especially when applied at earlier time points (PECRIT) [73]. Some authors have suggested integrating functional and anatomic parameters [28, 74, 75] or introducing a dual-time-point evaluation to further improve the prognostic value of PET-based immunotherapy response assessment [29]. However, data are still insufficient, and larger prospective trials with long-term follow-up will be needed to identify the best response criteria [76].

For several oncological diseases, early detection of recurrence and adequate assessment of therapy response are crucial. In HNSCC, NI-RADS was developed as an interpretative system and showed significant prognostic value [36], except when applied to surveillance of the primary tumor site in surgically treated patients. To overcome these NI-RADS limitations, closer non-invasive imaging surveillance at different time points was suggested [38], and other visual interpretative criteria (DS, HS, CS, and PS) were introduced to reduce the number of indeterminate cases, but none has yet been formally adopted [39, 40]. The Hopkins qualitative scoring system was subsequently adapted and validated for therapy response assessment in lung cancer patients, demonstrating high sensitivity, specificity, accuracy, and reliability in predicting survival [45, 46]. Notably, our analysis highlighted the lack of defined criteria for other cancer types, such as breast, gastrointestinal, and gynecological malignancies. This shortage may reflect the paucity of randomized clinical trials aimed at validating [18F]FDG PET/CT for therapy response assessment in these specific settings. The need to correctly and objectively interpret post-treatment functional status in specific oncological patients has led to the adaptation of existing semiquantitative and visual criteria to specific tumor types, but robust evidence is still lacking [77, 78, 79, 80, 81, 82].

Conversely, PET/CT criteria play a well-recognized, essential role in HL and FDG-avid NHL and are routinely used in clinical practice to guide patient management. In 2016, the Lugano classification was adapted into LYRIC for the specific evaluation of immunotherapy response in lymphoma patients. The main difference is the introduction of the indeterminate response (IR) category for cases awaiting biopsy or follow-up imaging to distinguish pseudoprogression from true progression [60, 63]. Finally, the gap in [18F]FDG PET/CT criteria for MM was filled in 2015 by the IMPeTUs criteria, which are based on the Deauville scoring system [68] and have shown an important role in patient risk stratification. These criteria require further study but could serve as a basis for harmonizing and standardizing PET response assessment in MM patients [71].

Conclusions

The increasingly important role of [18F]FDG PET/CT in response assessment across oncological diseases has led to the development of many PET-based criteria for the evaluation of therapy response, especially after the introduction of new biological therapeutic agents. Moreover, given the growing inclusion of PET/CT in oncological guidelines and the success of some response criteria (e.g., Deauville), translating these objective criteria into clinical practice is of paramount importance to improve the management of cancer patients. In this context, considerable effort is being made to standardize and identify the best [18F]FDG PET response criteria for each oncological patient, although tumor-specific criteria still require further validation.