Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stat_compare_means comparisons with multiple groups #65

Closed
grst opened this issue Jan 23, 2018 · 41 comments
Closed

stat_compare_means comparisons with multiple groups #65

grst opened this issue Jan 23, 2018 · 41 comments

Comments

@grst
Copy link

grst commented Jan 23, 2018

I have the following plot and want to want to compare HER2+ to triple-negative
for each gene (41BB, CD8A, ...) using ggpubr's stat_compare_means.
In the end I want to have 6 p-values, one over each gene.

Is that possible currently with ggpubr? Otherwise I would highly appreciate that feature to be implemented.

enter image description here

What I tried:

  • simply adding + stat_compare_means() which obviously compares all groups
  • adding + stat_compare_means(comparisons = list(c("HER2+", "triple-negative")) which fails with Computation failed in stat_signif(): missing value where TRUE/FALSE needed
  • adding + stat_compare_means(comparisons = list(c("HER2+", "triple-negative"), c("HER2+", "triple-negative"), c("HER2+", "triple-negative"), c("HER2+", "triple-negative"), c("HER2+", "triple-negative"), c("HER2+", "triple-negative"))) which fails with the same error message.

See also my question on stackoverflow

@kassambara
Copy link
Owner

Hi,

Groups to be compared should be on x-axis not in the legends.

I would go as follow: Plot y= expression by x = status and then facet by SYMBOL.

ggplot(mydata, aes(x = status, y = expression)) +
  facet_wrap(~SYMBOL)+
  stat_compare_means(comparisons = list(c("HER2+", "triple-negative")))

This will create a multi-panel plot. Each panel correspond to a gene

@grst
Copy link
Author

grst commented Jan 23, 2018

Thanks, that's a good workaround!

@grst grst closed this as completed Jan 23, 2018
@grst
Copy link
Author

grst commented Jan 23, 2018

One issue remains: p-value adjustment does not apply for facetted plots. I think it should.
See also #59

@grst grst reopened this Jan 23, 2018
@kassambara
Copy link
Owner

To display adjusted p-values, use : stat_compare_means(aes(label=..p.adj..))

@grst
Copy link
Author

grst commented Jan 23, 2018

Yes, but it does not adjust for multiple facets.

Take the example below, it displays exactly the same p-value both for adjusted and non-adjusted:

library(ggplot2)
library(ggpubr)
library(tidyverse)

a = rnorm(mean=1, 100)
b = rnorm(mean=1.5, 100)

data = rbind_all(list(
  data.frame(facet=1, group="A", value=a),
  data.frame(facet=2, group="A", value=a),
  data.frame(facet=1, group="B", value=b),
  data.frame(facet=2, group="B", value=b)
))

data %>%
  ggplot(aes(y=value, x=group)) + 
  geom_boxplot() + 
  facet_wrap(~facet) + 
  stat_compare_means(aes(label=..p..))

data %>%
  ggplot(aes(y=value, x=group)) + 
  geom_boxplot() + 
  facet_wrap(~facet) + 
  stat_compare_means(aes(label=..p.adj..)) 

@kassambara
Copy link
Owner

ok, I can see now:

The R code below returns the adjusted p-value:

compare_means(value ~ group, group.by = "facet", data = data)

But, the function stat_compare_means() does not display the adjusted p-value.

This issue is related to the way ggplot2 facet works. When using facet, statiscal computation is applied to each single panel independently. As for each panel we have only one single comparison, the adjusted p-value remains unchanged.

I will work on this. A solution would be to give an option to users to annotate the plot using the output of compare_means()

@ishengtsai
Copy link

Hi,

I have exactly the same the problem. Just leaving a note here if an update is available.

@kylec
Copy link

kylec commented Mar 14, 2018

A hack to annotate with adjusted pvals

library(ggplot2)
library(ggpubr)
library(tidyverse)
library(ggsignif)

df = ToothGrowth

# annotation table with adjusted pvals and y-position of the labels
anno_df = compare_means(len ~ supp, group.by = "dose", data = df) %>%
 mutate(y_pos = 40)

ggplot(df, aes(x=supp, y=len)) + 
  geom_boxplot(position=position_dodge()) + 
  geom_point(aes(color=supp), position=position_jitterdodge()) + 
  facet_wrap(~dose) + 
  ggsignif::geom_signif(
    data=anno_df, 
    aes(xmin=group1, xmax=group2, annotations=p.adj, y_position=y_pos), 
    manual=TRUE
  )

screen shot 2018-03-14 at 2 59 21 pm

@dregele
Copy link

dregele commented Jun 6, 2018

Another hacky solution by modifying the ggplot object itself. This keeps the placement and formatting of the labels created by ggpubr. Should also help #86 with some adjustments.

df <- ToothGrowth

## construct plot
boxp <- ggplot(df, aes(x=supp, y=len)) + 
  geom_boxplot(position=position_dodge()) + 
  geom_point(aes(color=supp), position=position_jitterdodge()) + 
  facet_wrap(~dose) +
  stat_compare_means()

## build plot
boxp2 <- ggplot_build(boxp)

## look at data and find the label
boxp2$data

## in this case, 'label'
##  replace with adjusted p-value
boxp2$data[[3]]$label <- p.adjust(boxp2$data[[3]]$p, method = "holm")

## plot anew
plot(ggplot_gtable(boxp2))

@atakanekiz
Copy link

atakanekiz commented Jun 28, 2018

I tried these hacks but couldn't get it to work. Is there any development on this issue? In my case, what I'm trying to achieve is slightly different, however. I have 4 groups plotted on the same graph (the same variable measured) and I'd like to add p.adj values on selected pairs of the data. Any help is appreciated.

@kassambara
Copy link
Owner

Please provide a reproducible R code.

Specify the argument aes(label = ..p.adj..) in the function stat_compare_means(). Read more: http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/76-add-p-values-and-significance-levels-to-ggplots/

The following solution works for me:

library(ggpubr)
library(ggsignif)
library(tidyverse)

df <- ToothGrowth

# annotation table with adjusted pvals and y-position of the labels
anno_df <- compare_means(len ~ supp, group.by = "dose", data = df, p.adjust.method = "holm") %>%
 mutate(y_pos = 40, p.adj = format.pval(p.adj, digits = 2))

# Create a boxplot and add custom p-value
ggboxplot(df, x = "supp", y = "len", ggtheme = theme_bw())+
  facet_wrap(~dose) + 
  geom_signif(
    data=anno_df, 
    aes(xmin = group1, xmax = group2, annotations = p.adj, y_position = y_pos), 
    manual= TRUE
  )

boxplot-with-custom-p-value

@atakanekiz
Copy link

atakanekiz commented Jun 28, 2018

Hi, please see my reprex below. I tried @dregele's solution and it worked for me. It is a nice workaround but involves few extra lines. The issue of that approach is scalability, I think. I want to create a loop function in which numerous graphs are generated by going through different genes. The main problem I'm running into in this implementation is the need of passing gene names (variable names in the data.frame) quoted (for ggboxplot() function) and unquoted (for compare_means() function).

I also tried to edit the function itself to replace p value with p.adj but I wasn't able to get it to work. If ggpubr can do this in a simple addition of a string, that would be amazing! Thanks!

library(ggpubr)
#> Loading required package: ggplot2
#> Loading required package: magrittr

# Simulate data frame of 20 observations for a gene across 4 groups of sample (5 each)
df <- data.frame(matrix(NA, nrow=20, ncol = 2))
colnames(df) <- c("Sample", "Gene_expr")
df$Sample <- rep(c("Control", "Drug_1", "Drug_2", "Drug_3"), each = 5)
df$Gene_expr <- c(rnorm(5, mean = 1, sd = 1),
                  rnorm(5, mean = 2, sd = 1.5),
                  rnorm(5, mean = 4, sd = 0.5),
                  rnorm(5, mean = 8, sd = 2))



# Comparisons I care about
my_comparisons <- list(c("Control", "Drug_1"),c("Control", "Drug_2"),c("Control", "Drug_3"))


# See all pair-wise comparisons
compare_means(Gene_expr~Sample, df, method = "t.test")
#> # A tibble: 6 x 8
#>   .y.       group1  group2        p   p.adj p.format p.signif method
#>   <chr>     <chr>   <chr>     <dbl>   <dbl> <chr>    <chr>    <chr> 
#> 1 Gene_expr Control Drug_1 0.0143   0.0178  0.01431  *        T-test
#> 2 Gene_expr Control Drug_2 0.000441 0.00265 0.00044  ***      T-test
#> 3 Gene_expr Control Drug_3 0.00121  0.00483 0.00121  **       T-test
#> 4 Gene_expr Drug_1  Drug_2 0.000950 0.00475 0.00095  ***      T-test
#> 5 Gene_expr Drug_1  Drug_3 0.00297  0.00890 0.00297  **       T-test
#> 6 Gene_expr Drug_2  Drug_3 0.00891  0.0178  0.00891  **       T-test

# This plots unadjusted p value (p.format column)
ggboxplot(df, x="Sample", y="Gene_expr", fill = "Sample")+
  stat_compare_means(comparisons=my_comparisons, method = "t.test", aes(label=..p.adj..))

Created on 2018-06-28 by the reprex package (v0.2.0).

@atakanekiz
Copy link

Tried with that order but didn't work for me (please see below). Also, a point of curiosity: Are the p.signif symbols determined based on p value, or based on the value of p.adj?

library(ggpubr)
#> Loading required package: ggplot2
#> Loading required package: magrittr

set.seed(123)

# Simulate data frame of 20 observations for a gene across 4 groups of sample (5 each)
df <- data.frame(matrix(NA, nrow=20, ncol = 2))
colnames(df) <- c("Sample", "Gene_expr")
df$Sample <- rep(c("Control", "Drug_1", "Drug_2", "Drug_3"), each = 5)
df$Gene_expr <- c(rnorm(5, mean = 1, sd = 1),
                  rnorm(5, mean = 2, sd = 1.5),
                  rnorm(5, mean = 4, sd = 0.5),
                  rnorm(5, mean = 8, sd = 2))



# Comparisons I care about
my_comparisons <- list(c("Control", "Drug_1"),c("Control", "Drug_2"),c("Control", "Drug_3"))


# See all pair-wise comparisons
compare_means(Gene_expr~Sample, df, method = "t.test")
#> # A tibble: 6 x 8
#>   .y.       group1  group2        p   p.adj p.format p.signif method
#>   <chr>     <chr>   <chr>     <dbl>   <dbl> <chr>    <chr>    <chr> 
#> 1 Gene_expr Control Drug_1 0.425    0.425   0.42489  ns       T-test
#> 2 Gene_expr Control Drug_2 0.000521 0.00313 0.00052  ***      T-test
#> 3 Gene_expr Control Drug_3 0.00379  0.0190  0.00379  **       T-test
#> 4 Gene_expr Drug_1  Drug_2 0.0454   0.0952  0.04543  *        T-test
#> 5 Gene_expr Drug_1  Drug_3 0.00431  0.0190  0.00431  **       T-test
#> 6 Gene_expr Drug_2  Drug_3 0.0317   0.0952  0.03173  *        T-test


# This plots unadjusted p value (p.format column)
ggboxplot(df, x="Sample", y="Gene_expr", fill = "Sample")+
  stat_compare_means(mapping=aes(label=..p.adj..), comparisons=my_comparisons, method = "t.test")

Created on 2018-06-28 by the reprex package (v0.2.0).

@kassambara
Copy link
Owner

kassambara commented Jun 29, 2018

In your example, you are comparing each group against a reference, here "Control".

Load required packages and create a demo data set

library(ggpubr)
library(tidyverse)

# Create a demo gene expression data
#+++++++++++++++++++++++++++++++++++++
Sample <- rep(c("Control", "Drug_1", "Drug_2", "Drug_3"), each = 5)
gene1 <- c(
  rnorm(5, mean = 1, sd = 1),rnorm(5, mean = 2, sd = 1.5),
  rnorm(5, mean = 4, sd = 0.5), rnorm(5, mean = 8, sd = 2)
  )
gene2 <- 2*gene1
gene3 <- 1.5*gene1
df <- data_frame(Sample = Sample, gene1 = gene1, gene2 = gene2, gene3 = gene3)

# Inspect 6 random rows
#+++++++++++++++++++++++++++++++++++++
set.seed(123)
sample_n(df,  6)
# A tibble: 6 x 4
   Sample     gene1     gene2     gene3
                   
1  Drug_1  1.824137  3.648274  2.736206
2  Drug_2  3.475553  6.951107  5.213330
3  Drug_1  3.920832  7.841665  5.881248
4  Drug_3 10.589527 21.179053 15.884290
5  Drug_3  9.651080 19.302159 14.476620
6 Control  1.800554  3.601109  2.700831

Easy solution to visualize the gene expression with p-value (comparing to the reference group)

Adjusted p-values are shown:

ggboxplot(df, x="Sample", y="gene1", fill = "Sample")+
  stat_compare_means(
    mapping=aes(label = format.pval(..p.adj.., digits = 1)),
    method = "t.test", ref.group = "Control"
    )

rplot27

Solution two

# Helper functions to visualize one gene expression with pvalue
#+++++++++++++++++++++++++++++++++++++
plot_expr <- function(.gene, .data, .grp = "Sample", .ref.group = "Control"){
  # Compute p-values
  comparison.formula <- paste0(.gene, "~", .grp) %>%
    as.formula()
  pvalues <- compare_means(
    comparison.formula,  data = .data,
    method = "t.test", ref.group = .ref.group
    )
  # P-value y coordinates
  y.max <- .data %>% 
    pull(.gene) %>% max(na.rm = TRUE) 
  p.value.y.coord <- rep(y.max, nrow(pvalues))
  
  step.increase <- (1:nrow(pvalues))*(y.max/10)
  p.value.y.coord <- p.value.y.coord + step.increase
  
  pvalues <- pvalues %>%
    mutate(
      y.coord =  p.value.y.coord, 
      p.adj = format.pval(p.adj, digits = 1)
      )
  
  ggboxplot(.data, x = .grp, y = .gene, fill = .grp)+
    ggsignif::geom_signif(
      data = pvalues, 
      aes(xmin = group1, xmax = group2, annotations = p.adj, y_position = y.coord), 
      manual= TRUE, tip_length = 0.03, size = 0.4
    )
}

# Visualize gene 1
plot_expr("gene1", .data = df, .grp = "Sample", .ref.group = "Control")

rplot26

@atakanekiz
Copy link

atakanekiz commented Jun 29, 2018

Thanks for the insights. In my reprex, I just focused on the comparisons with control group as an example. I'm also interested in comparing the drug treatments with each other as well for my actual data analysis. How can I go about doing this in that case?

I can think of passing one of the groups (i.e. Drug_1) as the reference group, but then, I will have 3- way comparisons involving that group. Sometimes I may only want to show one of these.

My end goal is generating something like this (with p.adj values):

Created on 2018-06-29 by the reprex package (v0.2.0).

@atakanekiz
Copy link

atakanekiz commented Jun 30, 2018

(To answer my previous question)

With some help from Stack Overflow, I edited the plotting function that @kassambara wrote to include a comparisons arguments that can be specified during calling the function. Hopefully, it will be useful to others.

library(ggpubr)
#> Loading required package: ggplot2
#> Loading required package: magrittr
library(tidyverse)

# Create a demo gene expression data
set.seed(1)
Sample <- rep(c("Control", "Drug_1", "Drug_2", "Drug_3"), each = 5)
gene1 <- c(
  rnorm(5, mean = 1, sd = 1),rnorm(5, mean = 2, sd = 1.5),
  rnorm(5, mean = 4, sd = 0.5), rnorm(5, mean = 8, sd = 2)
)
gene2 <- 2*gene1
gene3 <- 1.5*gene1
df <- data_frame(Sample = Sample, gene1 = gene1, gene2 = gene2, gene3 = gene3)

# Inspect 6 random rows
set.seed(123)
sample_n(df,  6)
#> # A tibble: 6 x 4
#>   Sample  gene1  gene2  gene3
#>   <chr>   <dbl>  <dbl>  <dbl>
#> 1 Drug_1  0.769  1.54   1.15 
#> 2 Drug_2  4.56   9.12   6.84 
#> 3 Drug_1  3.11   6.21   4.66 
#> 4 Drug_3  7.91  15.8   11.9  
#> 5 Drug_3  7.97  15.9   12.0  
#> 6 Control 0.374  0.747  0.560



plot_adjP <- function(.gene, .data, .grp = "Sample", comparisons = NULL){
  # Compute p-values
  comparison.formula <- paste0(.gene, "~", .grp) %>%
    as.formula()
  pvalues <- compare_means(
    comparison.formula,  data = .data,
    method = "t.test"
  )
  
  # If a comparison list is provided, extract the comparisons of interest for plotting
  if(!is.null(comparisons)){
  comparisons <- list(c("Control", "Drug_1"), c("Drug_2" ,"Drug_3"))
  pvalues <- map_df(comparisons, ~ pvalues %>% filter(group1 == .x[1] & group2 == .x[2]))
  
  }
  
  # P-value y coordinates
  y.max <- .data %>% 
    pull(.gene) %>% max(na.rm = TRUE) 
  p.value.y.coord <- rep(y.max, nrow(pvalues))
  
  step.increase <- (1:nrow(pvalues))*(y.max/10)
  p.value.y.coord <- p.value.y.coord + step.increase
  
  pvalues <- pvalues %>%
    mutate(
      y.coord =  p.value.y.coord, 
      p.adj = format.pval(p.adj, digits = 1)
    )
  
  ggboxplot(.data, x = .grp, y = .gene, fill = .grp)+
    ggsignif::geom_signif(
      data = pvalues, 
      aes(xmin = group1, xmax = group2, annotations = p.adj, y_position = y.coord), 
      manual= TRUE, tip_length = 0.03, size = 0.4
    )
}

comparisons <- list(c("Control", "Drug_1"), c("Drug_2", "Drug_3"))

# Visualize gene 1
plot_adjP("gene1", .data = df, .grp = "Sample", comparisons = comparisons)
#> Warning: Ignoring unknown aesthetics: xmin, xmax, annotations, y_position

plot_adjP("gene1", .data = df, .grp = "Sample")
#> Warning: Ignoring unknown aesthetics: xmin, xmax, annotations, y_position

@kassambara
Copy link
Owner

Oh great, thank you for your contribution.

I'll add new features in ggpubr, to simplify this process.

I think also that, when users provide a specific number of comparisons to be performed, the pvalues should be adjusted/re-adjusted according to that number of tests, but not according to the number of all possible comparisons.

So in the above helper function, after filtering (using map_df), one might need to re-adjust the p-values. I'll take this into account in the new features

@atakanekiz
Copy link

Perfect! Thanks! Ggpubr rocks

@FFoQS90
Copy link

FFoQS90 commented Jul 6, 2018

Hi,
I am trying to include p-values in my plot with stat_compare_means but I am having problems when I want to plot multiple groups.
So I want to do pairwise comparisons of 3 groups, which works out fine with only one column:

grafik

But when I want to do this for multiple columns I get this:

grafik

It is only plotting p-values from 2 comparisons for the first column and 1 for the second....What am I doing wrong here?

When I run compare_means it seems to calculate all the possible combinations just fine.

Here my code to reproduce the problem with some test data:

  • create test data.frame
    Species <- c("a","b","c","d","e")
    Location <- c("LocA","LocB","LocC")
    values <- runif(60,min = 0, max = 150000)
    data <- data.frame(Species,Location,values)

  • make list with comparisons
    test_comparisons <- list(c("LocA","LocB"),c("LocB", "LocC"),c("LocA", "LocC"))

  • plot
    testplot <- ggplot(data, aes(x=Location, y=values, fill=Species))+geom_bar(stat = "identity", position = "dodge")+scale_y_log10()+facet_grid(~Species)+stat_compare_means(comparisons = test_comparisons, label = "p.signif")

  • run compare means
    test<-compare_means(values~Location,data = data, method = "wilcox.test",group.by = "Species")

@kassambara
Copy link
Owner

@FFoQS90: the problem is that you're trying to add p-values by comparing one numeric value vs one numeric value. You should compare groups of values.

The following R code works for me.

library(ggpubr)
ggboxplot(
  ToothGrowth, x = "dose", y = "len", 
  facet.by = "supp", ylim = c(0, 45)
  ) +
  stat_compare_means(
    comparisons = list(c("0.5", "1"), c("1", "2"),  c("0.5", "2")), 
    label = "p.signif"
    )

rplot

ggbarplot(
  ToothGrowth, x = "dose", y = "len",
  add = "mean_se", 
  facet.by = "supp", ylim = c(0, 45)
  ) +
  stat_compare_means(
    comparisons = list(c("0.5", "1"), c("1", "2"),  c("0.5", "2")), 
    label = "p.signif"
    )

rplot32

@FFoQS90
Copy link

FFoQS90 commented Jul 9, 2018

Thank you for helping me out here.
Actually, I am comparing groups of values. Only 4 per group in the case of the test dataset but that should be sufficient for demonstration reasons.

I could get it to work by using ggbarplot or ggboxplot(...facet.by=..) instead of ggplot(data, aes..)+geom_bar()+facet_grid()!!

Is it possible that geom_bar() or facet_grid() are somehow causing the problem?

@FFoQS90
Copy link

FFoQS90 commented Jul 12, 2018

Thanks, the new function stat_pvalue_manual works great!!
I really appreciate your work on ggpubr. It is an awesome package.

@kassambara
Copy link
Owner

Now, the new function stat_pvalue_manual() [In ggpubr] can be used to add manually p-values/adjusted p-values generated elsewhere:

library(tidyverse)
library(rstatix)   
library(ggpubr)

# Pairwise t-test between groups
stat.test <- ToothGrowth %>%
  group_by(dose) %>%
  t_test(len ~ supp) %>%
  adjust_pvalue() %>%
  mutate(y.position = 35)
stat.test
# A tibble: 3 x 9
   dose   .y. group1 group2  statistic      p method  p.adj y.position
                         
1   0.5   len     OJ     VC  3.1697328 0.0064 T-test 0.0128         35
2   1.0   len     OJ     VC  4.0327696 0.0010 T-test 0.0030         35
3   2.0   len     OJ     VC -0.0461361 0.9600 T-test 0.9600         35
# Create a box plot and add the p-value
p <- ggboxplot(
  ToothGrowth, x = "supp", y = "len",
  color = "supp", palette = "jco",
  facet.by = "dose", ylim = c(0, 40)
)

p + stat_pvalue_manual(stat.test, label = "p.adj")

rplot

@richardbagz
Copy link

Hi, Is it possible to not have to specify the y.position i.e. coordinates for the absolute positioning of the p-value label?
I have facetted plots for which I used facet_wrap (~ data , scales = "free_y") to get the y-axis scale to be "free".

Looking forward to a work around this issue.
Thanks

@audyavar
Copy link

audyavar commented Mar 2, 2019

Is it possible to do stat_compare_means on a stacked barplot between 2-factor levels of genes (x-axis) vs 2-factor levels on y-axis?

@bznes
Copy link

bznes commented Mar 15, 2019

Part of the issue may be the version of ggpubr. I posted a similar question earlier this week:
https://github.com/kassambara/ggpubr/issues/48#issue-272641556))

While I had just downloaded the package and thought I was running the most current package, sessionInfo() said I was running 0.1.5. After I updated to ggpubr_0.2, I was able to get stat_compare_means to recognize data=subset(data,by=Treatment) for my comparisons. One thing to note, use facet.by="Treatment" in the ggbarplot line to check that the pvalues shown when you do the combined plot are for the subsetted data. When it wasn't working in the older version, I would get a p-value for the pooled group (treated and untreated), ignoring the selected data. This may help part of your problem (and provide a work around if you put every comparison you want in the my_comparisons) but I have not found a way to do those comparisons automatically.
my_comparisons=list(c("C1","C2"), c("C1","D"),c("C2","D"),c("D","E"),c("D","F"))
d.treated <- d[d$Treatment == "treated",]
ggbarplot(d=d, x = "Group",y="response",color="Treatment",add = "mean_se",position = position_dodge(0.8))+ stat_compare_means(data=d.treated,comparisons=my_comparisons)
image
ggbarplot(d=d, x = "Group",y="response",color="Treatment",facet.by="Treatment",add = "mean_se",position = position_dodge(0.8))+ stat_compare_means(comparisons=my_comparisons)
image

My question now is if there is a way to move the brackets above the first plot to center over the data being compared, not centered over the group for both treatment levels?

@xiaorenwu1111
Copy link

@FFoQS90: the problem is that you're trying to add p-values by comparing one numeric value vs one numeric value. You should compare groups of values.

The following R code works for me.

library(ggpubr)
ggboxplot(
  ToothGrowth, x = "dose", y = "len", 
  facet.by = "supp", ylim = c(0, 45)
  ) +
  stat_compare_means(
    comparisons = list(c("0.5", "1"), c("1", "2"),  c("0.5", "2")), 
    label = "p.signif"
    )

rplot

ggbarplot(
  ToothGrowth, x = "dose", y = "len",
  add = "mean_se", 
  facet.by = "supp", ylim = c(0, 45)
  ) +
  stat_compare_means(
    comparisons = list(c("0.5", "1"), c("1", "2"),  c("0.5", "2")), 
    label = "p.signif"
    )

rplot32

Hello,
I met some questions when used this packages.Could you help me?
This is my code:
library(ggpubr)
ggboxplot(
distance, x="Group", y="Distance",
add = "mean_se",
facet.by = "Method"
) +
stat_compare_means(
comparisons = list(c("A", "B"), c("B", "C"), c("C", "D")),
label = "p.signif"
)
This is my data structures:
Group Distance Method
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
A 0 Hamming distance
B 162 Hamming distance
B 151 Hamming distance
B 90 Hamming distance
B 150 Hamming distance
B 131 Hamming distance
B 107 Hamming distance
B 145 Hamming distance
B 87 Hamming distance
B 103 Hamming distance
B 96 Hamming distance
B 114 Hamming distance
B 102 Hamming distance
B 103 Hamming distance
B 91 Hamming distance
B 71 Hamming distance
B 77 Hamming distance
B 77 Hamming distance
B 67 Hamming distance
B 40 Hamming distance
C 179 Hamming distance
C 167 Hamming distance
C 109 Hamming distance
C 152 Hamming distance
C 152 Hamming distance
C 152 Hamming distance
C 152 Hamming distance
C 152 Hamming distance
C 152 Hamming distance
C 152 Hamming distance
C 129 Hamming distance
C 89 Hamming distance
C 86 Hamming distance
C 109 Hamming distance
C 86 Hamming distance
C 93 Hamming distance
C 89 Hamming distance
C 80 Hamming distance
C 55 Hamming distance
D 275 Hamming distance
D 250 Hamming distance
D 193 Hamming distance
D 241 Hamming distance
D 235 Hamming distance
D 186 Hamming distance
D 240 Hamming distance
D 174 Hamming distance
D 183 Hamming distance
D 193 Hamming distance
D 171 Hamming distance
D 182 Hamming distance
D 169 Hamming distance
D 159 Hamming distance
D 141 Hamming distance
D 131 Hamming distance
D 122 Hamming distance
D 111 Hamming distance
D 94 Hamming distance
I want to draw a picture like this:
42407858-33b43f68-81c3-11e8-8668-67c916b007d3

But I used above data and code can't get similar picture.
Rplot03

And I got some warning messages:
Warning messages:
1: Computation failed in stat_signif():
missing value where TRUE/FALSE needed
2: Computation failed in stat_signif():
missing value where TRUE/FALSE needed
3: Computation failed in stat_signif():
missing value where TRUE/FALSE needed
Could you help me ? Thank you !

@youlijun
Copy link

Hello everyone,
I am very interested in this, but I am in trouble.
I want to calculate the p values of gene1, gene2, and gene3 at one time instead of one by one. how to do ?I only want batch calculation p-value

@youlijun
Copy link

@kassambara

@Payamdel
Copy link

Payamdel commented Oct 17, 2019

Hi
When using compare_means() to perform the test for multiple response variables at the same time, the p-values are not adjusted for multiple hypothesis testing across the variables!

Any idea to solve the issue?

Create some example data

set.seed(42) # to make the example reproducible
study_data <- data.frame(
  group = factor(c(rep("healthy", 50), rep("patient", 50))),
  responseA = c(rnorm(50, mean = 20, sd = 2), rnorm(50, mean = 22, sd = 3)),
  responseB = c(rnorm(50, mean = 12, sd = 1), rnorm(50, mean = 22, sd = 1)),
  responseC = c(rnorm(50, mean = 2.5, sd = 1), rnorm(50, mean = 3.5, sd = 1)))

First example:

  compare_means(
  c(responseA, responseB, responseC) ~ group, 
					study_data, 
					method = "wilcox.test", 
					p.adjust.method = "BH")

A tibble: 3 x 8

  .y.       group1  group2         p    p.adj p.format p.signif method  
  <fct>     <chr>   <chr>      <dbl>    <dbl> <chr>    <chr>    <chr>   
1 responseA healthy patient 1.57e- 5 1.60e- 5 1.6e-05  ****     Wilcoxon
2 responseB healthy patient 7.07e-18 7.10e-18 < 2e-16  ****     Wilcoxon
3 responseC healthy patient 1.47e- 5 1.50e- 5 1.5e-05  ****     Wilcoxon

Second Example:

  compare_means(
  c(responseA, responseB, responseC) ~ group, 
					study_data, 
					method = "wilcox.test", 
					p.adjust.method = "bonferroni")

A tibble: 3 x 8

  .y.       group1  group2         p    p.adj p.format p.signif method  
  <fct>     <chr>   <chr>      <dbl>    <dbl> <chr>    <chr>    <chr>   
1 responseA healthy patient 1.57e- 5 1.60e- 5 1.6e-05  ****     Wilcoxon
2 responseB healthy patient 7.07e-18 7.10e-18 < 2e-16  ****     Wilcoxon
3 responseC healthy patient 1.47e- 5 1.50e- 5 1.5e-05  ****     Wilcoxon

@flimflamr
Copy link

Hi,
silly little thing but driving me crazy: I can't get a "p=" in front of p-values when there are multiple comparisons in a plot, i.e. "stat_compare_means(aes(label = paste0("p = ", ..p.format..)),comparisons = Comps)", does not change the label and only gives the p-values. Nowhere online or in this forum have I seen "p=" in such cases, so I assume it isnt just me...

@regan-hamel
Copy link

if there is a way to move the brackets above the first plot to center over the data being compared, not centered over the group for both treatment levels?

Was this ever resolved?

@kassambara
Copy link
Owner

To add p-values onto grouped plots, please consider the following blog post:

How to Add P-Values onto a Grouped GGPLOT using the GGPUBR R Package

First, make sure you have installed the lates dev version of rstatix and ggpubr before following the tutorial:

devtools::install_github("kassambara/rstatix")
devtools::install_github("kassambara/ggpubr")

@LaserKate
Copy link

Could this be elaborated to multiple facets? For example:
df %>%
ggboxplot(x = "x", y = "y",
color = "y",
add = "jitter") +facet_grid(.~Species+Region, scales="free_x")

@kassambara kassambara reopened this Jul 15, 2020
@LaserKate
Copy link

Potential solution to when you have multiple facets:
test< -df %>%
ggboxplot(x = "x", y = "y",
color = "y",
add = "jitter") +facet_grid(.~Species+Region, scales="free_x")

stat.test <- df %>%
group_by(Species,Region) %>%
t_test(y ~ x) %>%
adjust_pvalue(method = "bonferroni") %>%
add_significance("p.adj") %>%
add_xy_position(x = "x", dodge = 0.8)

test + stat_pvalue_manual(
stat.test, label = "{p.adj}", tip.length = 0
)

@venkan
Copy link

venkan commented Nov 10, 2021

I am having an Error while using plot_adjP. I used this function on my data which is like below and see the following Error.

str(A4)
tibble [140 × 3] (S3: tbl_df/tbl/data.frame)
 $ NewGroup: chr [1:140] "EATL" "EATL" "EATL" "EATL" ...
 $ SMARCA4 : num [1:140] 6.34 6.51 7.63 7.06 6.49 ...
 $ SMARCB1 : num [1:140] 5.17 4.95 5.25 5.75 4.96 ...

Using the function on the above data.

plot_adjP("SMARCA4", .data = A4, .grp = "NewGroup")

This is the ERROR:

Error in t.test.default(xi, xj, paired = paired, alternative = alternative,  : 
  not enough 'x' observations
Called from: t.test.default(xi, xj, paired = paired, alternative = alternative, 
    ...)
Browse[1]> 

@HijaziHassan
Copy link

Is there an argument that allow significance horizontal lines to be adjacent not above each other. Because as the number of categorical variables on x-axis increase the number of co
significancy
comparisons increase and thus the significance bar will take a lot of space. The figure makes my point clearer.

@kassambara
Copy link
Owner

@HijaziHassan using the argument step.increase might help:

suppressPackageStartupMessages(library(ggpubr))

# Pairwise comparisons: Specify the comparisons you want
my_comparisons <- list( c("0.5", "1"), c("1", "2"), c("0.5", "2") )
p <- ggboxplot(ToothGrowth, x = "dose", y = "len",
          color = "dose", palette = "npg")

  # Add pairwise comparisons p-value
p + stat_compare_means(comparisons = my_comparisons, step.increase = 0)

Created on 2022-11-24 by the reprex package (v0.3.0.9001)

@QBebel
Copy link

QBebel commented Mar 3, 2023

Hi,
I am working on a paper and I m in charge to do the mat & meth. But a problem occrued while I was trying to perform 3 way ANOVA and to display p-value on my plot.

Here is my code

bxp_5 <- cf %>%
rowwise() %>%
mutate(Components = wrapit(Components, width = 25)) %>%
ungroup() %>%
ggboxplot(
x = "Indicators", y = "Scores",
color = "Systems"
)+
facet_grid( ~ Components, scales = "free", switch = "x", shrink = TRUE)+
scale_color_npg()+
scale_fill_npg()+
theme_light()+
theme(axis.title.x = element_blank())+
theme(axis.title.y = element_blank())+
stat_summary(geom = "point", aes(group=Systems),
fun = "mean", shape=10, size = 1, position=position_dodge(width=0.80))
print(bxp_5)

Here what R display
image

Then I run my test :
res.aov_5 <- cf %>% anova_test(Scores ~ Indicators * Systems)
res.aov_5

model_5 <- lm(Scores ~ Indicators * Systems, data = cf)
cf %>%
group_by(Indicators) %>%
anova_test(Scores ~ Systems, error = model_5)

pwc_5 <- cf %>%
group_by(Indicators) %>%
emmeans_test(Scores ~ Systems, p.adjust.method = "bonferroni")
pwc_5 %>% print(n = Inf)

pwc_5 <- pwc_5 %>% add_xy_position(x = "Indicators")
bxp_5 +
stat_pvalue_manual(
pwc_5, label = "p.adj.signif", tip.length = 0.01,
coord.flip = FALSE
) +
labs(
subtitle = get_test_label(res.aov_5, detailed = TRUE),
caption = get_pwc_label(pwc_5))+
ylim(0,NA)

And here is what the R console says Error in check_aesthetics(): ! Aesthetics must be either length 1 or the same as the data (285): group, step.increase, bracket.nudge.y and bracket.shorten
Here what is saying when I run rlang::last_error()

Backtrace:

  1. ├─base (local) <fn>(x)
  2. └─ggplot2:::print.ggplot(x)
  3. ├─ggplot2::ggplot_build(x)
  4. └─ggplot2:::ggplot_build.ggplot(x)
  5. └─ggplot2 (local) by_layer(function(l, d) l$compute_aesthetics(d, plot))
    
  6.   └─ggplot2 (local) f(l = layers[[i]], d = data[[i]])
    
  7.     └─l$compute_aesthetics(d, plot)
    
  8.       └─ggplot2 (local) f(..., self = self)
    
  9.         └─ggplot2:::check_aesthetics(evaled, n)
    
  10.           └─rlang::abort(...)
    

Thank you a lot in advance for your countless help. I can provide the dataframe if needed.

@phancanhtrinh
Copy link

Does anyone know the method of multiple comparison procedure in stat_compare_means() - t-test? I used this for the graph; I need to write exactly what it is.
Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests