stat_compare_means comparisons with multiple groups #65

grst · 2018-01-23T13:29:10Z

I have the following plot and want to want to compare HER2+ to triple-negative
for each gene (41BB, CD8A, ...) using ggpubr's stat_compare_means.
In the end I want to have 6 p-values, one over each gene.

Is that possible currently with ggpubr? Otherwise I would highly appreciate that feature to be implemented.

What I tried:

simply adding + stat_compare_means() which obviously compares all groups
adding + stat_compare_means(comparisons = list(c("HER2+", "triple-negative")) which fails with Computation failed in stat_signif(): missing value where TRUE/FALSE needed
adding + stat_compare_means(comparisons = list(c("HER2+", "triple-negative"), c("HER2+", "triple-negative"), c("HER2+", "triple-negative"), c("HER2+", "triple-negative"), c("HER2+", "triple-negative"), c("HER2+", "triple-negative"))) which fails with the same error message.

See also my question on stackoverflow

The text was updated successfully, but these errors were encountered:

kassambara · 2018-01-23T13:40:57Z

Hi,

Groups to be compared should be on x-axis not in the legends.

I would go as follow: Plot y= expression by x = status and then facet by SYMBOL.

ggplot(mydata, aes(x = status, y = expression)) +
  facet_wrap(~SYMBOL)+
  stat_compare_means(comparisons = list(c("HER2+", "triple-negative")))

This will create a multi-panel plot. Each panel correspond to a gene

grst · 2018-01-23T13:43:47Z

Thanks, that's a good workaround!

grst · 2018-01-23T14:04:37Z

One issue remains: p-value adjustment does not apply for facetted plots. I think it should.
See also #59

kassambara · 2018-01-23T14:41:19Z

To display adjusted p-values, use : stat_compare_means(aes(label=..p.adj..))

grst · 2018-01-23T14:55:42Z

Yes, but it does not adjust for multiple facets.

Take the example below, it displays exactly the same p-value both for adjusted and non-adjusted:

library(ggplot2)
library(ggpubr)
library(tidyverse)

a = rnorm(mean=1, 100)
b = rnorm(mean=1.5, 100)

data = rbind_all(list(
  data.frame(facet=1, group="A", value=a),
  data.frame(facet=2, group="A", value=a),
  data.frame(facet=1, group="B", value=b),
  data.frame(facet=2, group="B", value=b)
))

data %>%
  ggplot(aes(y=value, x=group)) + 
  geom_boxplot() + 
  facet_wrap(~facet) + 
  stat_compare_means(aes(label=..p..))

data %>%
  ggplot(aes(y=value, x=group)) + 
  geom_boxplot() + 
  facet_wrap(~facet) + 
  stat_compare_means(aes(label=..p.adj..))

kassambara · 2018-01-23T15:15:53Z

ok, I can see now:

The R code below returns the adjusted p-value:

compare_means(value ~ group, group.by = "facet", data = data)

But, the function stat_compare_means() does not display the adjusted p-value.

This issue is related to the way ggplot2 facet works. When using facet, statiscal computation is applied to each single panel independently. As for each panel we have only one single comparison, the adjusted p-value remains unchanged.

I will work on this. A solution would be to give an option to users to annotate the plot using the output of compare_means()

ishengtsai · 2018-01-24T07:56:22Z

Hi,

I have exactly the same the problem. Just leaving a note here if an update is available.

kylec · 2018-03-14T20:07:18Z

A hack to annotate with adjusted pvals

library(ggplot2)
library(ggpubr)
library(tidyverse)
library(ggsignif)

df = ToothGrowth

# annotation table with adjusted pvals and y-position of the labels
anno_df = compare_means(len ~ supp, group.by = "dose", data = df) %>%
 mutate(y_pos = 40)

ggplot(df, aes(x=supp, y=len)) + 
  geom_boxplot(position=position_dodge()) + 
  geom_point(aes(color=supp), position=position_jitterdodge()) + 
  facet_wrap(~dose) + 
  ggsignif::geom_signif(
    data=anno_df, 
    aes(xmin=group1, xmax=group2, annotations=p.adj, y_position=y_pos), 
    manual=TRUE
  )

dregele · 2018-06-06T16:50:06Z

Another hacky solution by modifying the ggplot object itself. This keeps the placement and formatting of the labels created by ggpubr. Should also help #86 with some adjustments.

df <- ToothGrowth

## construct plot
boxp <- ggplot(df, aes(x=supp, y=len)) + 
  geom_boxplot(position=position_dodge()) + 
  geom_point(aes(color=supp), position=position_jitterdodge()) + 
  facet_wrap(~dose) +
  stat_compare_means()

## build plot
boxp2 <- ggplot_build(boxp)

## look at data and find the label
boxp2$data

## in this case, 'label'
##  replace with adjusted p-value
boxp2$data[[3]]$label <- p.adjust(boxp2$data[[3]]$p, method = "holm")

## plot anew
plot(ggplot_gtable(boxp2))

atakanekiz · 2018-06-28T03:52:55Z

I tried these hacks but couldn't get it to work. Is there any development on this issue? In my case, what I'm trying to achieve is slightly different, however. I have 4 groups plotted on the same graph (the same variable measured) and I'd like to add p.adj values on selected pairs of the data. Any help is appreciated.

kassambara · 2018-06-28T05:52:24Z

Please provide a reproducible R code.

Specify the argument aes(label = ..p.adj..) in the function stat_compare_means(). Read more: http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/76-add-p-values-and-significance-levels-to-ggplots/

The following solution works for me:

library(ggpubr)
library(ggsignif)
library(tidyverse)

df <- ToothGrowth

# annotation table with adjusted pvals and y-position of the labels
anno_df <- compare_means(len ~ supp, group.by = "dose", data = df, p.adjust.method = "holm") %>%
 mutate(y_pos = 40, p.adj = format.pval(p.adj, digits = 2))

# Create a boxplot and add custom p-value
ggboxplot(df, x = "supp", y = "len", ggtheme = theme_bw())+
  facet_wrap(~dose) + 
  geom_signif(
    data=anno_df, 
    aes(xmin = group1, xmax = group2, annotations = p.adj, y_position = y_pos), 
    manual= TRUE
  )

atakanekiz · 2018-06-28T16:11:35Z

Hi, please see my reprex below. I tried @dregele's solution and it worked for me. It is a nice workaround but involves few extra lines. The issue of that approach is scalability, I think. I want to create a loop function in which numerous graphs are generated by going through different genes. The main problem I'm running into in this implementation is the need of passing gene names (variable names in the data.frame) quoted (for ggboxplot() function) and unquoted (for compare_means() function).

I also tried to edit the function itself to replace p value with p.adj but I wasn't able to get it to work. If ggpubr can do this in a simple addition of a string, that would be amazing! Thanks!

library(ggpubr)
#> Loading required package: ggplot2
#> Loading required package: magrittr

# Simulate data frame of 20 observations for a gene across 4 groups of sample (5 each)
df <- data.frame(matrix(NA, nrow=20, ncol = 2))
colnames(df) <- c("Sample", "Gene_expr")
df$Sample <- rep(c("Control", "Drug_1", "Drug_2", "Drug_3"), each = 5)
df$Gene_expr <- c(rnorm(5, mean = 1, sd = 1),
                  rnorm(5, mean = 2, sd = 1.5),
                  rnorm(5, mean = 4, sd = 0.5),
                  rnorm(5, mean = 8, sd = 2))



# Comparisons I care about
my_comparisons <- list(c("Control", "Drug_1"),c("Control", "Drug_2"),c("Control", "Drug_3"))


# See all pair-wise comparisons
compare_means(Gene_expr~Sample, df, method = "t.test")
#> # A tibble: 6 x 8
#>   .y.       group1  group2        p   p.adj p.format p.signif method
#>   <chr>     <chr>   <chr>     <dbl>   <dbl> <chr>    <chr>    <chr> 
#> 1 Gene_expr Control Drug_1 0.0143   0.0178  0.01431  *        T-test
#> 2 Gene_expr Control Drug_2 0.000441 0.00265 0.00044  ***      T-test
#> 3 Gene_expr Control Drug_3 0.00121  0.00483 0.00121  **       T-test
#> 4 Gene_expr Drug_1  Drug_2 0.000950 0.00475 0.00095  ***      T-test
#> 5 Gene_expr Drug_1  Drug_3 0.00297  0.00890 0.00297  **       T-test
#> 6 Gene_expr Drug_2  Drug_3 0.00891  0.0178  0.00891  **       T-test

# This plots unadjusted p value (p.format column)
ggboxplot(df, x="Sample", y="Gene_expr", fill = "Sample")+
  stat_compare_means(comparisons=my_comparisons, method = "t.test", aes(label=..p.adj..))

Created on 2018-06-28 by the reprex package (v0.2.0).

atakanekiz · 2018-06-28T19:29:23Z

Tried with that order but didn't work for me (please see below). Also, a point of curiosity: Are the p.signif symbols determined based on p value, or based on the value of p.adj?

library(ggpubr)
#> Loading required package: ggplot2
#> Loading required package: magrittr

set.seed(123)

# Simulate data frame of 20 observations for a gene across 4 groups of sample (5 each)
df <- data.frame(matrix(NA, nrow=20, ncol = 2))
colnames(df) <- c("Sample", "Gene_expr")
df$Sample <- rep(c("Control", "Drug_1", "Drug_2", "Drug_3"), each = 5)
df$Gene_expr <- c(rnorm(5, mean = 1, sd = 1),
                  rnorm(5, mean = 2, sd = 1.5),
                  rnorm(5, mean = 4, sd = 0.5),
                  rnorm(5, mean = 8, sd = 2))



# Comparisons I care about
my_comparisons <- list(c("Control", "Drug_1"),c("Control", "Drug_2"),c("Control", "Drug_3"))


# See all pair-wise comparisons
compare_means(Gene_expr~Sample, df, method = "t.test")
#> # A tibble: 6 x 8
#>   .y.       group1  group2        p   p.adj p.format p.signif method
#>   <chr>     <chr>   <chr>     <dbl>   <dbl> <chr>    <chr>    <chr> 
#> 1 Gene_expr Control Drug_1 0.425    0.425   0.42489  ns       T-test
#> 2 Gene_expr Control Drug_2 0.000521 0.00313 0.00052  ***      T-test
#> 3 Gene_expr Control Drug_3 0.00379  0.0190  0.00379  **       T-test
#> 4 Gene_expr Drug_1  Drug_2 0.0454   0.0952  0.04543  *        T-test
#> 5 Gene_expr Drug_1  Drug_3 0.00431  0.0190  0.00431  **       T-test
#> 6 Gene_expr Drug_2  Drug_3 0.0317   0.0952  0.03173  *        T-test


# This plots unadjusted p value (p.format column)
ggboxplot(df, x="Sample", y="Gene_expr", fill = "Sample")+
  stat_compare_means(mapping=aes(label=..p.adj..), comparisons=my_comparisons, method = "t.test")

Created on 2018-06-28 by the reprex package (v0.2.0).

kassambara · 2018-06-29T05:30:01Z

In your example, you are comparing each group against a reference, here "Control".

Load required packages and create a demo data set

library(ggpubr)
library(tidyverse)

# Create a demo gene expression data
#+++++++++++++++++++++++++++++++++++++
Sample <- rep(c("Control", "Drug_1", "Drug_2", "Drug_3"), each = 5)
gene1 <- c(
  rnorm(5, mean = 1, sd = 1),rnorm(5, mean = 2, sd = 1.5),
  rnorm(5, mean = 4, sd = 0.5), rnorm(5, mean = 8, sd = 2)
  )
gene2 <- 2*gene1
gene3 <- 1.5*gene1
df <- data_frame(Sample = Sample, gene1 = gene1, gene2 = gene2, gene3 = gene3)

# Inspect 6 random rows
#+++++++++++++++++++++++++++++++++++++
set.seed(123)
sample_n(df,  6)

# A tibble: 6 x 4
   Sample     gene1     gene2     gene3
                   
1  Drug_1  1.824137  3.648274  2.736206
2  Drug_2  3.475553  6.951107  5.213330
3  Drug_1  3.920832  7.841665  5.881248
4  Drug_3 10.589527 21.179053 15.884290
5  Drug_3  9.651080 19.302159 14.476620
6 Control  1.800554  3.601109  2.700831

Easy solution to visualize the gene expression with p-value (comparing to the reference group)

Adjusted p-values are shown:

ggboxplot(df, x="Sample", y="gene1", fill = "Sample")+
  stat_compare_means(
    mapping=aes(label = format.pval(..p.adj.., digits = 1)),
    method = "t.test", ref.group = "Control"
    )

Solution two

# Helper functions to visualize one gene expression with pvalue
#+++++++++++++++++++++++++++++++++++++
plot_expr <- function(.gene, .data, .grp = "Sample", .ref.group = "Control"){
  # Compute p-values
  comparison.formula <- paste0(.gene, "~", .grp) %>%
    as.formula()
  pvalues <- compare_means(
    comparison.formula,  data = .data,
    method = "t.test", ref.group = .ref.group
    )
  # P-value y coordinates
  y.max <- .data %>% 
    pull(.gene) %>% max(na.rm = TRUE) 
  p.value.y.coord <- rep(y.max, nrow(pvalues))
  
  step.increase <- (1:nrow(pvalues))*(y.max/10)
  p.value.y.coord <- p.value.y.coord + step.increase
  
  pvalues <- pvalues %>%
    mutate(
      y.coord =  p.value.y.coord, 
      p.adj = format.pval(p.adj, digits = 1)
      )
  
  ggboxplot(.data, x = .grp, y = .gene, fill = .grp)+
    ggsignif::geom_signif(
      data = pvalues, 
      aes(xmin = group1, xmax = group2, annotations = p.adj, y_position = y.coord), 
      manual= TRUE, tip_length = 0.03, size = 0.4
    )
}

# Visualize gene 1
plot_expr("gene1", .data = df, .grp = "Sample", .ref.group = "Control")

atakanekiz · 2018-06-29T06:18:19Z

Thanks for the insights. In my reprex, I just focused on the comparisons with control group as an example. I'm also interested in comparing the drug treatments with each other as well for my actual data analysis. How can I go about doing this in that case?

I can think of passing one of the groups (i.e. Drug_1) as the reference group, but then, I will have 3- way comparisons involving that group. Sometimes I may only want to show one of these.

My end goal is generating something like this (with p.adj values):

Created on 2018-06-29 by the reprex package (v0.2.0).

atakanekiz · 2018-06-30T05:08:31Z

(To answer my previous question)

With some help from Stack Overflow, I edited the plotting function that @kassambara wrote to include a comparisons arguments that can be specified during calling the function. Hopefully, it will be useful to others.

library(ggpubr)
#> Loading required package: ggplot2
#> Loading required package: magrittr
library(tidyverse)

# Create a demo gene expression data

set.seed(1)
Sample <- rep(c("Control", "Drug_1", "Drug_2", "Drug_3"), each = 5)
gene1 <- c(
  rnorm(5, mean = 1, sd = 1),rnorm(5, mean = 2, sd = 1.5),
  rnorm(5, mean = 4, sd = 0.5), rnorm(5, mean = 8, sd = 2)
)
gene2 <- 2*gene1
gene3 <- 1.5*gene1
df <- data_frame(Sample = Sample, gene1 = gene1, gene2 = gene2, gene3 = gene3)

# Inspect 6 random rows
set.seed(123)
sample_n(df,  6)
#> # A tibble: 6 x 4
#>   Sample  gene1  gene2  gene3
#>   <chr>   <dbl>  <dbl>  <dbl>
#> 1 Drug_1  0.769  1.54   1.15 
#> 2 Drug_2  4.56   9.12   6.84 
#> 3 Drug_1  3.11   6.21   4.66 
#> 4 Drug_3  7.91  15.8   11.9  
#> 5 Drug_3  7.97  15.9   12.0  
#> 6 Control 0.374  0.747  0.560



plot_adjP <- function(.gene, .data, .grp = "Sample", comparisons = NULL){
  # Compute p-values
  comparison.formula <- paste0(.gene, "~", .grp) %>%
    as.formula()
  pvalues <- compare_means(
    comparison.formula,  data = .data,
    method = "t.test"
  )
  
  # If a comparison list is provided, extract the comparisons of interest for plotting
  if(!is.null(comparisons)){
  comparisons <- list(c("Control", "Drug_1"), c("Drug_2" ,"Drug_3"))
  pvalues <- map_df(comparisons, ~ pvalues %>% filter(group1 == .x[1] & group2 == .x[2]))
  
  }
  
  # P-value y coordinates
  y.max <- .data %>% 
    pull(.gene) %>% max(na.rm = TRUE) 
  p.value.y.coord <- rep(y.max, nrow(pvalues))
  
  step.increase <- (1:nrow(pvalues))*(y.max/10)
  p.value.y.coord <- p.value.y.coord + step.increase
  
  pvalues <- pvalues %>%
    mutate(
      y.coord =  p.value.y.coord, 
      p.adj = format.pval(p.adj, digits = 1)
    )
  
  ggboxplot(.data, x = .grp, y = .gene, fill = .grp)+
    ggsignif::geom_signif(
      data = pvalues, 
      aes(xmin = group1, xmax = group2, annotations = p.adj, y_position = y.coord), 
      manual= TRUE, tip_length = 0.03, size = 0.4
    )
}

comparisons <- list(c("Control", "Drug_1"), c("Drug_2", "Drug_3"))

# Visualize gene 1
plot_adjP("gene1", .data = df, .grp = "Sample", comparisons = comparisons)
#> Warning: Ignoring unknown aesthetics: xmin, xmax, annotations, y_position

plot_adjP("gene1", .data = df, .grp = "Sample")
#> Warning: Ignoring unknown aesthetics: xmin, xmax, annotations, y_position

kassambara · 2018-06-30T10:58:20Z

Oh great, thank you for your contribution.

I'll add new features in ggpubr, to simplify this process.

I think also that, when users provide a specific number of comparisons to be performed, the pvalues should be adjusted/re-adjusted according to that number of tests, but not according to the number of all possible comparisons.

So in the above helper function, after filtering (using map_df), one might need to re-adjust the p-values. I'll take this into account in the new features

atakanekiz · 2018-06-30T14:02:10Z

Perfect! Thanks! Ggpubr rocks

FFoQS90 · 2018-07-06T13:07:36Z

Hi,
I am trying to include p-values in my plot with stat_compare_means but I am having problems when I want to plot multiple groups.
So I want to do pairwise comparisons of 3 groups, which works out fine with only one column:

But when I want to do this for multiple columns I get this:

It is only plotting p-values from 2 comparisons for the first column and 1 for the second....What am I doing wrong here?

When I run compare_means it seems to calculate all the possible combinations just fine.

Here my code to reproduce the problem with some test data:

create test data.frame
Species <- c("a","b","c","d","e")
Location <- c("LocA","LocB","LocC")
values <- runif(60,min = 0, max = 150000)
data <- data.frame(Species,Location,values)
make list with comparisons
test_comparisons <- list(c("LocA","LocB"),c("LocB", "LocC"),c("LocA", "LocC"))
plot
testplot <- ggplot(data, aes(x=Location, y=values, fill=Species))+geom_bar(stat = "identity", position = "dodge")+scale_y_log10()+facet_grid(~Species)+stat_compare_means(comparisons = test_comparisons, label = "p.signif")
run compare means
test<-compare_means(values~Location,data = data, method = "wilcox.test",group.by = "Species")

kassambara · 2018-07-07T06:54:42Z

@FFoQS90: the problem is that you're trying to add p-values by comparing one numeric value vs one numeric value. You should compare groups of values.

The following R code works for me.

library(ggpubr)
ggboxplot(
  ToothGrowth, x = "dose", y = "len", 
  facet.by = "supp", ylim = c(0, 45)
  ) +
  stat_compare_means(
    comparisons = list(c("0.5", "1"), c("1", "2"),  c("0.5", "2")), 
    label = "p.signif"
    )

ggbarplot(
  ToothGrowth, x = "dose", y = "len",
  add = "mean_se", 
  facet.by = "supp", ylim = c(0, 45)
  ) +
  stat_compare_means(
    comparisons = list(c("0.5", "1"), c("1", "2"),  c("0.5", "2")), 
    label = "p.signif"
    )

FFoQS90 · 2018-07-09T08:49:59Z

Thank you for helping me out here.
Actually, I am comparing groups of values. Only 4 per group in the case of the test dataset but that should be sufficient for demonstration reasons.

I could get it to work by using ggbarplot or ggboxplot(...facet.by=..) instead of ggplot(data, aes..)+geom_bar()+facet_grid()!!

Is it possible that geom_bar() or facet_grid() are somehow causing the problem?

FFoQS90 · 2018-07-12T09:26:38Z

Thanks, the new function stat_pvalue_manual works great!!
I really appreciate your work on ggpubr. It is an awesome package.

kassambara · 2018-07-23T21:45:02Z

Now, the new function stat_pvalue_manual() [In ggpubr] can be used to add manually p-values/adjusted p-values generated elsewhere:

library(tidyverse)
library(rstatix)   
library(ggpubr)

# Pairwise t-test between groups
stat.test <- ToothGrowth %>%
  group_by(dose) %>%
  t_test(len ~ supp) %>%
  adjust_pvalue() %>%
  mutate(y.position = 35)
stat.test

# A tibble: 3 x 9
   dose   .y. group1 group2  statistic      p method  p.adj y.position
                         
1   0.5   len     OJ     VC  3.1697328 0.0064 T-test 0.0128         35
2   1.0   len     OJ     VC  4.0327696 0.0010 T-test 0.0030         35
3   2.0   len     OJ     VC -0.0461361 0.9600 T-test 0.9600         35

# Create a box plot and add the p-value
p <- ggboxplot(
  ToothGrowth, x = "supp", y = "len",
  color = "supp", palette = "jco",
  facet.by = "dose", ylim = c(0, 40)
)

p + stat_pvalue_manual(stat.test, label = "p.adj")

richardbagz · 2018-11-06T13:46:08Z

Hi, Is it possible to not have to specify the y.position i.e. coordinates for the absolute positioning of the p-value label?
I have facetted plots for which I used facet_wrap (~ data , scales = "free_y") to get the y-axis scale to be "free".

Looking forward to a work around this issue.
Thanks

audyavar · 2019-03-02T12:38:45Z

Is it possible to do stat_compare_means on a stacked barplot between 2-factor levels of genes (x-axis) vs 2-factor levels on y-axis?

bznes · 2019-03-15T21:47:48Z

Part of the issue may be the version of ggpubr. I posted a similar question earlier this week:
https://github.com/kassambara/ggpubr/issues/48#issue-272641556))

While I had just downloaded the package and thought I was running the most current package, sessionInfo() said I was running 0.1.5. After I updated to ggpubr_0.2, I was able to get stat_compare_means to recognize data=subset(data,by=Treatment) for my comparisons. One thing to note, use facet.by="Treatment" in the ggbarplot line to check that the pvalues shown when you do the combined plot are for the subsetted data. When it wasn't working in the older version, I would get a p-value for the pooled group (treated and untreated), ignoring the selected data. This may help part of your problem (and provide a work around if you put every comparison you want in the my_comparisons) but I have not found a way to do those comparisons automatically.
my_comparisons=list(c("C1","C2"), c("C1","D"),c("C2","D"),c("D","E"),c("D","F"))
d.treated <- d[d$Treatment == "treated",]
ggbarplot(d=d, x = "Group",y="response",color="Treatment",add = "mean_se",position = position_dodge(0.8))+ stat_compare_means(data=d.treated,comparisons=my_comparisons)

ggbarplot(d=d, x = "Group",y="response",color="Treatment",facet.by="Treatment",add = "mean_se",position = position_dodge(0.8))+ stat_compare_means(comparisons=my_comparisons)

My question now is if there is a way to move the brackets above the first plot to center over the data being compared, not centered over the group for both treatment levels?

xiaorenwu1111 · 2019-03-30T11:15:37Z

@FFoQS90: the problem is that you're trying to add p-values by comparing one numeric value vs one numeric value. You should compare groups of values.

The following R code works for me.

library(ggpubr)
ggboxplot(
  ToothGrowth, x = "dose", y = "len", 
  facet.by = "supp", ylim = c(0, 45)
  ) +
  stat_compare_means(
    comparisons = list(c("0.5", "1"), c("1", "2"),  c("0.5", "2")), 
    label = "p.signif"
    )

ggbarplot(
  ToothGrowth, x = "dose", y = "len",
  add = "mean_se", 
  facet.by = "supp", ylim = c(0, 45)
  ) +
  stat_compare_means(
    comparisons = list(c("0.5", "1"), c("1", "2"),  c("0.5", "2")), 
    label = "p.signif"
    )

Hello,
I met some questions when used this packages.Could you help me?
This is my code：
library(ggpubr)
ggboxplot(
distance, x="Group", y="Distance",
add = "mean_se",
facet.by = "Method"
) +
stat_compare_means(
comparisons = list(c("A", "B"), c("B", "C"), c("C", "D")),
label = "p.signif"
)
This is my data structures:
Group Distance Method
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
B 162 Hamming distance
B 151 Hamming distance
B 90 Hamming distance
B 150 Hamming distance
B 131 Hamming distance
B 107 Hamming distance
B 145 Hamming distance
B 87 Hamming distance
B 103 Hamming distance
B 96 Hamming distance
B 114 Hamming distance
B 102 Hamming distance
B 103 Hamming distance
B 91 Hamming distance
B 71 Hamming distance
B 77 Hamming distance
B 77 Hamming distance
B 67 Hamming distance
B 40 Hamming distance
C 179 Hamming distance
C 167 Hamming distance
C 109 Hamming distance
C 152 Hamming distance
C 152 Hamming distance
C 152 Hamming distance
C 152 Hamming distance
C 152 Hamming distance
C 152 Hamming distance
C 152 Hamming distance
C 129 Hamming distance
C 89 Hamming distance
C 86 Hamming distance
C 109 Hamming distance
C 86 Hamming distance
C 93 Hamming distance
C 89 Hamming distance
C 80 Hamming distance
C 55 Hamming distance
D 275 Hamming distance
D 250 Hamming distance
D 193 Hamming distance
D 241 Hamming distance
D 235 Hamming distance
D 186 Hamming distance
D 240 Hamming distance
D 174 Hamming distance
D 183 Hamming distance
D 193 Hamming distance
D 171 Hamming distance
D 182 Hamming distance
D 169 Hamming distance
D 159 Hamming distance
D 141 Hamming distance
D 131 Hamming distance
D 122 Hamming distance
D 111 Hamming distance
D 94 Hamming distance
I want to draw a picture like this:

But I used above data and code can't get similar picture.

And I got some warning messages:
Warning messages:
1: Computation failed in stat_signif():
missing value where TRUE/FALSE needed
2: Computation failed in stat_signif():
missing value where TRUE/FALSE needed
3: Computation failed in stat_signif():
missing value where TRUE/FALSE needed
Could you help me ? Thank you !

youlijun · 2019-04-26T03:48:17Z

Hello everyone,
I am very interested in this, but I am in trouble.
I want to calculate the p values of gene1, gene2, and gene3 at one time instead of one by one. how to do ？I only want batch calculation p-value

youlijun · 2019-04-26T09:11:38Z

@kassambara

Payamdel · 2019-10-17T09:18:16Z

Hi
When using compare_means() to perform the test for multiple response variables at the same time, the p-values are not adjusted for multiple hypothesis testing across the variables!

Any idea to solve the issue?

Create some example data

set.seed(42) # to make the example reproducible
study_data <- data.frame(
  group = factor(c(rep("healthy", 50), rep("patient", 50))),
  responseA = c(rnorm(50, mean = 20, sd = 2), rnorm(50, mean = 22, sd = 3)),
  responseB = c(rnorm(50, mean = 12, sd = 1), rnorm(50, mean = 22, sd = 1)),
  responseC = c(rnorm(50, mean = 2.5, sd = 1), rnorm(50, mean = 3.5, sd = 1)))

First example:

  compare_means(
  c(responseA, responseB, responseC) ~ group, 
					study_data, 
					method = "wilcox.test", 
					p.adjust.method = "BH")

A tibble: 3 x 8

  .y.       group1  group2         p    p.adj p.format p.signif method  
  <fct>     <chr>   <chr>      <dbl>    <dbl> <chr>    <chr>    <chr>   
1 responseA healthy patient 1.57e- 5 1.60e- 5 1.6e-05  ****     Wilcoxon
2 responseB healthy patient 7.07e-18 7.10e-18 < 2e-16  ****     Wilcoxon
3 responseC healthy patient 1.47e- 5 1.50e- 5 1.5e-05  ****     Wilcoxon

Second Example:

  compare_means(
  c(responseA, responseB, responseC) ~ group, 
					study_data, 
					method = "wilcox.test", 
					p.adjust.method = "bonferroni")

A tibble: 3 x 8

  .y.       group1  group2         p    p.adj p.format p.signif method  
  <fct>     <chr>   <chr>      <dbl>    <dbl> <chr>    <chr>    <chr>   
1 responseA healthy patient 1.57e- 5 1.60e- 5 1.6e-05  ****     Wilcoxon
2 responseB healthy patient 7.07e-18 7.10e-18 < 2e-16  ****     Wilcoxon
3 responseC healthy patient 1.47e- 5 1.50e- 5 1.5e-05  ****     Wilcoxon

flimflamr · 2019-10-30T12:01:35Z

Hi,
silly little thing but driving me crazy: I can't get a "p=" in front of p-values when there are multiple comparisons in a plot, i.e. "stat_compare_means(aes(label = paste0("p = ", ..p.format..)),comparisons = Comps)", does not change the label and only gives the p-values. Nowhere online or in this forum have I seen "p=" in such cases, so I assume it isnt just me...

regan-hamel · 2020-06-19T09:26:57Z

if there is a way to move the brackets above the first plot to center over the data being compared, not centered over the group for both treatment levels?

Was this ever resolved?

kassambara · 2020-06-19T19:01:35Z

To add p-values onto grouped plots, please consider the following blog post:

How to Add P-Values onto a Grouped GGPLOT using the GGPUBR R Package

First, make sure you have installed the lates dev version of rstatix and ggpubr before following the tutorial:

devtools::install_github("kassambara/rstatix")
devtools::install_github("kassambara/ggpubr")

LaserKate · 2020-07-13T11:52:07Z

Could this be elaborated to multiple facets? For example:
df %>%
ggboxplot(x = "x", y = "y",
color = "y",
add = "jitter") +facet_grid(.~Species+Region, scales="free_x")

LaserKate · 2020-08-01T04:11:53Z

Potential solution to when you have multiple facets:
test< -df %>%
ggboxplot(x = "x", y = "y",
color = "y",
add = "jitter") +facet_grid(.~Species+Region, scales="free_x")

stat.test <- df %>%
group_by(Species,Region) %>%
t_test(y ~ x) %>%
adjust_pvalue(method = "bonferroni") %>%
add_significance("p.adj") %>%
add_xy_position(x = "x", dodge = 0.8)

test + stat_pvalue_manual(
stat.test, label = "{p.adj}", tip.length = 0
)

venkan · 2021-11-10T11:56:10Z

I am having an Error while using plot_adjP. I used this function on my data which is like below and see the following Error.

str(A4)
tibble [140 × 3] (S3: tbl_df/tbl/data.frame)
 $ NewGroup: chr [1:140] "EATL" "EATL" "EATL" "EATL" ...
 $ SMARCA4 : num [1:140] 6.34 6.51 7.63 7.06 6.49 ...
 $ SMARCB1 : num [1:140] 5.17 4.95 5.25 5.75 4.96 ...

Using the function on the above data.

plot_adjP("SMARCA4", .data = A4, .grp = "NewGroup")

This is the ERROR:

Error in t.test.default(xi, xj, paired = paired, alternative = alternative,  : 
  not enough 'x' observations
Called from: t.test.default(xi, xj, paired = paired, alternative = alternative, 
    ...)
Browse[1]>

HijaziHassan · 2022-08-03T21:17:02Z

Is there an argument that allow significance horizontal lines to be adjacent not above each other. Because as the number of categorical variables on x-axis increase the number of co

comparisons increase and thus the significance bar will take a lot of space. The figure makes my point clearer.

kassambara · 2022-11-24T00:00:45Z

@HijaziHassan using the argument step.increase might help:

suppressPackageStartupMessages(library(ggpubr))

# Pairwise comparisons: Specify the comparisons you want
my_comparisons <- list( c("0.5", "1"), c("1", "2"), c("0.5", "2") )
p <- ggboxplot(ToothGrowth, x = "dose", y = "len",
          color = "dose", palette = "npg")

  # Add pairwise comparisons p-value
p + stat_compare_means(comparisons = my_comparisons, step.increase = 0)

^{Created on 2022-11-24 by the reprex package (v0.3.0.9001)}

QBebel · 2023-03-03T10:40:41Z

Hi,
I am working on a paper and I m in charge to do the mat & meth. But a problem occrued while I was trying to perform 3 way ANOVA and to display p-value on my plot.

Here is my code

bxp_5 <- cf %>%
rowwise() %>%
mutate(Components = wrapit(Components, width = 25)) %>%
ungroup() %>%
ggboxplot(
x = "Indicators", y = "Scores",
color = "Systems"
)+
facet_grid( ~ Components, scales = "free", switch = "x", shrink = TRUE)+
scale_color_npg()+
scale_fill_npg()+
theme_light()+
theme(axis.title.x = element_blank())+
theme(axis.title.y = element_blank())+
stat_summary(geom = "point", aes(group=Systems),
fun = "mean", shape=10, size = 1, position=position_dodge(width=0.80))
print(bxp_5)

Here what R display

Then I run my test :
res.aov_5 <- cf %>% anova_test(Scores ~ Indicators * Systems)
res.aov_5

model_5 <- lm(Scores ~ Indicators * Systems, data = cf)
cf %>%
group_by(Indicators) %>%
anova_test(Scores ~ Systems, error = model_5)

pwc_5 <- cf %>%
group_by(Indicators) %>%
emmeans_test(Scores ~ Systems, p.adjust.method = "bonferroni")
pwc_5 %>% print(n = Inf)

pwc_5 <- pwc_5 %>% add_xy_position(x = "Indicators")
bxp_5 +
stat_pvalue_manual(
pwc_5, label = "p.adj.signif", tip.length = 0.01,
coord.flip = FALSE
) +
labs(
subtitle = get_test_label(res.aov_5, detailed = TRUE),
caption = get_pwc_label(pwc_5))+
ylim(0,NA)

And here is what the R console says Error in check_aesthetics(): ! Aesthetics must be either length 1 or the same as the data (285): group, step.increase, bracket.nudge.y and bracket.shorten
Here what is saying when I run rlang::last_error()

Backtrace:
▆

├─base (local) <fn>(x)
└─ggplot2:::print.ggplot(x)
├─ggplot2::ggplot_build(x)
└─ggplot2:::ggplot_build.ggplot(x)

└─ggplot2 (local) by_layer(function(l, d) l$compute_aesthetics(d, plot))

  └─ggplot2 (local) f(l = layers[[i]], d = data[[i]])

    └─l$compute_aesthetics(d, plot)

      └─ggplot2 (local) f(..., self = self)

        └─ggplot2:::check_aesthetics(evaled, n)

```
          └─rlang::abort(...)
```

Thank you a lot in advance for your countless help. I can provide the dataframe if needed.

phancanhtrinh · 2023-09-15T14:43:45Z

Does anyone know the method of multiple comparison procedure in stat_compare_means() - t-test? I used this for the graph; I need to write exactly what it is.
Thank you very much!

grst closed this as completed Jan 23, 2018

grst reopened this Jan 23, 2018

adomingues mentioned this issue Jun 14, 2018

How I keep the original test after transformed ylim #36

Closed

kassambara mentioned this issue Jun 29, 2018

p.adj was replaced by p.format in the figure #86

Closed

kassambara added a commit that referenced this issue Jul 10, 2018

new function to add p-value generated elsewhere #81 #65

bc495bf

kassambara closed this as completed Jul 23, 2018

kassambara mentioned this issue Jul 23, 2018

Does not plot adjusted p-values #59

Closed

lapotok mentioned this issue Dec 9, 2018

p.adj with stat_compare_means() #143

Closed

atakanekiz mentioned this issue May 26, 2019

meet some questions with this packages #167

Closed

ShixiangWang mentioned this issue Jun 3, 2019

boxplot多组比较p值矫正 ShixiangWang/sigminer#15

Closed

kassambara reopened this Jul 15, 2020

kassambara closed this as completed Nov 24, 2022

stat_compare_means comparisons with multiple groups #65

stat_compare_means comparisons with multiple groups #65

Comments

grst commented Jan 23, 2018

kassambara commented Jan 23, 2018

grst commented Jan 23, 2018

grst commented Jan 23, 2018

kassambara commented Jan 23, 2018

grst commented Jan 23, 2018

kassambara commented Jan 23, 2018

ishengtsai commented Jan 24, 2018

kylec commented Mar 14, 2018

dregele commented Jun 6, 2018

atakanekiz commented Jun 28, 2018 • edited

kassambara commented Jun 28, 2018

atakanekiz commented Jun 28, 2018 • edited

atakanekiz commented Jun 28, 2018

kassambara commented Jun 29, 2018 • edited

Load required packages and create a demo data set

Easy solution to visualize the gene expression with p-value (comparing to the reference group)

Solution two

atakanekiz commented Jun 29, 2018 • edited

atakanekiz commented Jun 30, 2018 • edited

kassambara commented Jun 30, 2018

atakanekiz commented Jun 30, 2018

FFoQS90 commented Jul 6, 2018

kassambara commented Jul 7, 2018

FFoQS90 commented Jul 9, 2018

FFoQS90 commented Jul 12, 2018

kassambara commented Jul 23, 2018

richardbagz commented Nov 6, 2018

audyavar commented Mar 2, 2019

bznes commented Mar 15, 2019

xiaorenwu1111 commented Mar 30, 2019

youlijun commented Apr 26, 2019

youlijun commented Apr 26, 2019

Payamdel commented Oct 17, 2019 • edited

Create some example data

First example:

A tibble: 3 x 8

Second Example:

A tibble: 3 x 8

flimflamr commented Oct 30, 2019

regan-hamel commented Jun 19, 2020

kassambara commented Jun 19, 2020

LaserKate commented Jul 13, 2020

LaserKate commented Aug 1, 2020

venkan commented Nov 10, 2021

HijaziHassan commented Aug 3, 2022

kassambara commented Nov 24, 2022

QBebel commented Mar 3, 2023

phancanhtrinh commented Sep 15, 2023

atakanekiz commented Jun 28, 2018 •

edited

atakanekiz commented Jun 28, 2018 •

edited

kassambara commented Jun 29, 2018 •

edited

atakanekiz commented Jun 29, 2018 •

edited

atakanekiz commented Jun 30, 2018 •

edited

Payamdel commented Oct 17, 2019 •

edited