Project Name  Stars  Downloads  Repos Using This  Packages Using This  Most Recent Commit  Total Releases  Latest Release  Open Issues  License  Language 

Stan  2,411  2 days ago  167  bsd-3-clause  C++
Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.  
Turing.jl  1,815  2 days ago  69  mit  Julia  
Bayesian inference with probabilistic programming.  
Pymc Resources  1,722  3 months ago  31  mit  Jupyter Notebook  
PyMC educational resources  
Pytorch Bayesiancnn  1,181  2 months ago  15  mit  Python  
Bayesian Convolutional Neural Network with Variational Inference based on Bayes by Backprop in PyTorch.  
Rstan  952  5 days ago  325  C++  
RStan, the R interface to Stan  
Bayesian Stats Modelling Tutorial  601  a year ago  14  mit  Jupyter Notebook  
How to do Bayesian statistical modelling using numpy and PyMC3  
Bayesian Analysis Recipes  510  a year ago  2  mit  Jupyter Notebook  
A collection of Bayesian data analysis recipes using PyMC3  
Soss.jl  401  a month ago  106  mit  Julia  
Probabilistic programming via source rewriting  
Rstanarm  341  2 months ago  148  gpl-3.0  R
rstanarm R package for Bayesian applied regression modeling  
Statsexpressions  295  1  2  3 days ago  26  August 11, 2022  17  other  R  
Tidy data frames and expressions with statistical summaries 📜 
{statsExpressions}: Tidy dataframes and expressions with statistical details

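The {statsExpressions} package has two key aims: to provide a consistent syntax for statistical analysis with tidy data, and to provide pre-formatted statistical expressions that plotting functions can display.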
Statistical packages exhibit substantial diversity in terms of their syntax and expected input type. This can make it difficult to switch from one statistical approach to another. For example, some functions expect vectors as inputs, while others expect dataframes. Depending on whether it is a repeated measures design or not, different functions might expect data to be in wide or long format. Some functions can internally omit missing values, while other functions error in their presence. Furthermore, if someone wishes to utilize the objects returned by these packages downstream in their workflow, this is not straightforward either because even functions from the same package can return a list, a matrix, an array, a dataframe, etc., depending on the function.
This is where {statsExpressions} comes in: It can be thought of as a unified portal through which most of the functionality in these underlying packages can be accessed, with a simpler interface and no requirement to change data format.
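To make the diversity described above concrete, here is a minimal base R sketch (no {statsExpressions} involved). The choice of these three functions and the `shapes` vector are illustrative assumptions, not taken from the package; the point is only that the same dataset has to be handed over in three different ways and comes back in three different shapes.

```r
# Three common analyses on the same data, each with its own input style.
tt  <- stats::t.test(mtcars$wt, mu = 3)                 # expects a numeric vector
fit <- stats::aov(wt ~ factor(cyl), data = mtcars)      # expects a formula plus a data frame
cm  <- stats::cor(mtcars[, c("mpg", "wt")])             # expects a numeric data frame or matrix

# Each returns a different kind of object, so downstream code must special-case them.
shapes <- c(t.test = class(tt)[1], aov = class(fit)[1], cor = class(cm)[1])
print(shapes)
```

A wrapper that always takes a data frame and always returns a one-row data frame removes this special-casing, which is the design the paragraph above describes.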
This package forms the statistical processing backend for the ggstatsplot package.
For more documentation, see the dedicated website.
Type  Source  Command

Release  CRAN  install.packages("statsExpressions")
Development  GitHub  pak::pak("IndrajeetPatil/statsExpressions")
The package can be cited as:
citation("statsExpressions")
To cite package 'statsExpressions' in publications use:
Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes
and Expressions with Statistical Details. Journal of Open Source
Software, 6(61), 3236, https://doi.org/10.21105/joss.03236
A BibTeX entry for LaTeX users is
@Article{,
doi = {10.21105/joss.03236},
url = {https://doi.org/10.21105/joss.03236},
year = {2021},
publisher = {{The Open Journal}},
volume = {6},
number = {61},
pages = {3236},
author = {Indrajeet Patil},
title = {{statsExpressions: {R} Package for Tidy Dataframes and Expressions with Statistical Details}},
journal = {{Journal of Open Source Software}},
}
Summary of available analyses
Test  Function 

one-sample t-test  one_sample_test()
two-sample t-test  two_sample_test()
one-way ANOVA  oneway_anova()
correlation analysis  corr_test()
contingency table analysis  contingency_table()
meta-analysis  meta_analysis()
pairwise comparisons  pairwise_comparisons() 
Summary of details available for analyses
Analysis  Hypothesis testing  Effect size estimation 

(one/two-sample) t-test  Yes  Yes
one-way ANOVA  Yes  Yes
correlation  Yes  Yes
(one/two-way) contingency table  Yes  Yes
random-effects meta-analysis  Yes  Yes
Summary of supported statistical approaches
Description  Parametric  Nonparametric  Robust  Bayesian 

Between group/condition comparisons  Yes  Yes  Yes  Yes
Within group/condition comparisons  Yes  Yes  Yes  Yes
Distribution of a numeric variable  Yes  Yes  Yes  Yes
Correlation between two variables  Yes  Yes  Yes  Yes
Association between categorical variables  Yes  Yes  No  Yes
Equal proportions for categorical variable levels  Yes  Yes  No  Yes
Random-effects meta-analysis  Yes  No  Yes  Yes
To illustrate the simplicity of this syntax, let's say we want to run a one-way ANOVA. If we first run a nonparametric ANOVA and then decide to run a robust ANOVA instead, the syntax remains the same and the statistical approach can be modified by changing a single argument:
mtcars %>% oneway_anova(cyl, wt, type = "nonparametric")
#> # A tibble: 1 × 15
#> parameter1 parameter2 statistic df.error p.value
#> <chr> <chr> <dbl> <int> <dbl>
#> 1 wt cyl 22.8 2 0.0000112
#> method effectsize estimate conf.level conf.low
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Kruskal-Wallis rank sum test Epsilon2 (rank) 0.736 0.95 0.624
#> conf.high conf.method conf.iterations n.obs expression
#> <dbl> <chr> <int> <int> <list>
#> 1 1 percentile bootstrap 100 32 <language>
mtcars %>% oneway_anova(cyl, wt, type = "robust")
#> # A tibble: 1 × 12
#> statistic df df.error p.value
#> <dbl> <dbl> <dbl> <dbl>
#> 1 12.7 2 12.2 0.00102
#> method
#> <chr>
#> 1 A heteroscedastic one-way ANOVA for trimmed means
#> effectsize estimate conf.level conf.low conf.high
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Explanatory measure of effect size 1.05 0.95 0.843 1.50
#> n.obs expression
#> <int> <list>
#> 1 32 <language>
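As a rough cross-check of the nonparametric output above, the test statistic can be reproduced with base R alone. This sketch assumes that the nonparametric branch wraps `stats::kruskal.test()`, as the summary tables later in this document state; the data frame `kw_row` and its column names are chosen here only to mirror the tibble and are not produced by the package. The effect size and its confidence interval are not reproduced.

```r
# Kruskal-Wallis rank sum test of car weight across cylinder counts.
kw <- stats::kruskal.test(wt ~ cyl, data = mtcars)

# Flatten the "htest" list into a one-row data frame by hand.
kw_row <- data.frame(
  statistic = unname(kw$statistic),
  df.error  = unname(kw$parameter),
  p.value   = kw$p.value,
  method    = kw$method
)
print(kw_row)
```

Doing this flattening by hand for every test is the step that a unified interface makes unnecessary.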
All possible output dataframes from functions are tabulated here: https://indrajeetpatil.github.io/statsExpressions/articles/web_only/dataframe_outputs.html
Needless to say, this will also work with the kable function to generate a table:
set.seed(123)
# one-sample robust t-test
# we will leave the `expression` column out; it's not needed when using only the dataframe
mtcars %>%
one_sample_test(wt, test.value = 3, type = "robust") %>%
dplyr::select(-expression) %>%
knitr::kable()
statistic  p.value  n.obs  method  effectsize  estimate  conf.level  conf.low  conf.high 

1.179181  0.275  32  Bootstrap-t method for one-sample test  Trimmed mean  3.197  0.95  2.854246  3.539754
These functions are also compatible with other popular data manipulation packages.
For example, let's say we want to run a one-sample t-test for all levels of a certain grouping variable. We can use dplyr to do so:
# for reproducibility
set.seed(123)
library(dplyr)
# grouped operation
# running one-sample test for all levels of grouping variable `cyl`
mtcars %>%
group_by(cyl) %>%
group_modify(~ one_sample_test(.x, wt, test.value = 3), .keep = TRUE) %>%
ungroup()
#> # A tibble: 3 × 16
#> cyl mu statistic df.error p.value method alternative
#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 4 3 -4.16 10 0.00195 One Sample t-test two.sided
#> 2 6 3 0.870 6 0.418 One Sample t-test two.sided
#> 3 8 3 4.92 13 0.000278 One Sample t-test two.sided
#> effectsize estimate conf.level conf.low conf.high conf.method
#> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 Hedges' g -1.16 0.95 -1.88 -0.402 ncp
#> 2 Hedges' g 0.286 0.95 -0.388 0.937 ncp
#> 3 Hedges' g 1.24 0.95 0.544 1.91 ncp
#> conf.distribution n.obs expression
#> <chr> <int> <list>
#> 1 t 11 <language>
#> 2 t 7 <language>
#> 3 t 14 <language>
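For comparison, the same grouped one-sample t-tests can be sketched in base R without dplyr. This is not how the package does it internally; the helper data frame `res` and its column names are assumptions chosen to mirror the output above, and effect sizes are omitted.

```r
# Split car weights by number of cylinders, then test each group against mu = 3.
by_cyl <- split(mtcars$wt, mtcars$cyl)

res <- do.call(rbind, lapply(names(by_cyl), function(g) {
  tt <- stats::t.test(by_cyl[[g]], mu = 3)
  data.frame(
    cyl       = g,
    statistic = unname(tt$statistic),
    df.error  = unname(tt$parameter),
    p.value   = tt$p.value,
    n.obs     = length(by_cyl[[g]])
  )
}))
print(res)
```

The sign of `statistic` shows on which side of the test value each group mean lies: four-cylinder cars are on average lighter than 3 (thousand lbs), so their statistic is negative.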
Note that expression here means a pre-formatted in-text statistical result. In addition to other details contained in the dataframe, there is also a column titled expression, which contains an expression with statistical details that can be displayed in a plot.
For all statistical test expressions, the default template attempts to follow the gold standard for statistical reporting.
For example, here are results from Welch's t-test:
Let's load the needed library for visualization:
library(ggplot2)
Note that when used in a geometric layer, the expression needs to be parsed.
# displaying mean for each level of `cyl`
centrality_description(mtcars, cyl, wt) |>
ggplot(aes(cyl, wt)) +
geom_point() +
geom_label(aes(label = expression), parse = TRUE)
Here are a few examples for supported analyses.
The returned data frame will always have a column called expression.
Assuming there is only a single result you need to display in a plot, you have two options: extract the expression from the list column (results_data$expression[[1]]) and use it without parsing, or use the list column as is and parse it (parse(text = results_data$expression)).
If you want to display more than one expression in a plot, you will have to parse them.
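The difference between the two options can be illustrated with base R alone. In the sketch below, `expr_obj` is a hypothetical stand-in for one element of the expression column (it is written by hand, not produced by the package); it shows that such an element is already a language object, whereas its text form has to go through `parse()` before plotmath can use it.

```r
# Hand-written stand-in for one element of the `expression` list column.
expr_obj <- quote(list(italic(p) == "0.34", italic("n")["obs"] == "32"))

# Option 1: the element is already a language object and can be used directly.
print(is.language(expr_obj))

# Option 2: a text representation must be parsed back into an expression first.
expr_txt <- paste(deparse(expr_obj), collapse = "")
reparsed <- parse(text = expr_txt)
print(reparsed[[1]])
```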
set.seed(123)
library(ggridges)
results_data <- oneway_anova(iris, Species, Sepal.Length, type = "robust")
# create a ridgeplot
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
geom_density_ridges() +
labs(
title = "A heteroscedastic one-way ANOVA for trimmed means",
subtitle = results_data$expression[[1]]
)
set.seed(123)
library(WRS2)
library(ggbeeswarm)
results_data <- oneway_anova(
WineTasting,
Wine,
Taste,
paired = TRUE,
subject.id = Taster,
type = "np"
)
ggplot2::ggplot(WineTasting, aes(Wine, Taste, color = Wine)) +
geom_quasirandom() +
labs(
title = "Friedman's rank sum test",
subtitle = parse(text = results_data$expression)
)
set.seed(123)
library(gghalves)
results_data <- two_sample_test(ToothGrowth, supp, len)
ggplot(ToothGrowth, aes(supp, len)) +
geom_half_dotplot() +
labs(
title = "Two-Sample Welch's t-test",
subtitle = parse(text = results_data$expression)
)
set.seed(123)
library(tidyr)
library(PairedData)
data(PrisonStress)
# get data in tidy format
df <- pivot_longer(PrisonStress, starts_with("PSS"), names_to = "PSS", values_to = "stress")
results_data <- two_sample_test(
data = df,
x = PSS,
y = stress,
paired = TRUE,
subject.id = Subject,
type = "np"
)
# plot
paired.plotProfiles(PrisonStress, "PSSbefore", "PSSafter", subjects = "Subject") +
labs(
title = "Two-sample Wilcoxon paired test",
subtitle = parse(text = results_data$expression)
)
set.seed(123)
# dataframe with results
results_data <- one_sample_test(mtcars, wt, test.value = 3, type = "bayes")
# creating a histogram plot
ggplot(mtcars, aes(wt)) +
geom_histogram(alpha = 0.5) +
geom_vline(xintercept = mean(mtcars$wt), color = "red") +
labs(subtitle = parse(text = results_data$expression))
Let's look at another example where we want to run correlation analysis:
set.seed(123)
# dataframe with results
results_data <- corr_test(mtcars, mpg, wt, type = "nonparametric")
# create a scatter plot
ggplot(mtcars, aes(mpg, wt)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x) +
labs(
title = "Spearman's rank correlation coefficient",
subtitle = parse(text = results_data$expression)
)
For categorical/nominal data (one-sample):
set.seed(123)
# dataframe with results
results_data <- contingency_table(
as.data.frame(table(mpg$class)),
Var1,
counts = Freq,
type = "bayes"
)
# create a pie chart
ggplot(as.data.frame(table(mpg$class)), aes(x = "", y = Freq, fill = factor(Var1))) +
geom_bar(width = 1, stat = "identity") +
theme(axis.line = element_blank()) +
# cleaning up the chart and adding results from one-sample proportion test
coord_polar(theta = "y", start = 0) +
labs(
fill = "Class",
x = NULL,
y = NULL,
title = "Pie Chart of class (type of car)",
caption = parse(text = results_data$expression)
)
You can also use these functions to get the expressions in return without having to display them in plots:
set.seed(123)
# Pearson's chi-squared test of independence
contingency_table(mtcars, am, vs)$expression[[1]]
#> list(chi["Pearson"]^2 * "(" * 1 * ")" == "0.91", italic(p) ==
#> "0.34", widehat(italic("V"))["Cramer"] == "0.00", CI["95%"] ~
#> "[" * "0.00", "1.00" * "]", italic("n")["obs"] == "32")
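The test statistic and p-value embedded in the expression above can be checked against base R. This sketch assumes that the underlying call is `stats::chisq.test()` without Yates' continuity correction; the summary tables later in this document name the function, but the `correct = FALSE` setting is an inference from the reported value, not something stated by the package documentation quoted here.

```r
# Cross-tabulate transmission type (am) against engine shape (vs).
tab <- table(mtcars$am, mtcars$vs)

# Pearson's chi-squared test of independence, no continuity correction.
chi <- stats::chisq.test(tab, correct = FALSE)

print(round(unname(chi$statistic), 2))  # test statistic
print(unname(chi$parameter))            # degrees of freedom
print(round(chi$p.value, 2))            # p-value
```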
set.seed(123)
library(metaviz)
library(metaplus)
# dataframe with results
results_data <- meta_analysis(dplyr::rename(mozart, estimate = d, std.error = se))
# meta-analysis forest plot with results from random-effects meta-analysis
viz_forest(
x = mozart[, c("d", "se")],
study_labels = mozart[, "study_name"],
xlab = "Cohen's d",
variant = "thick",
type = "cumulative"
) +
labs(
title = "Meta-analysis of Pietschnig, Voracek, and Formann (2010) on the Mozart effect",
subtitle = parse(text = results_data$expression)
) +
theme(text = element_text(size = 12))
Sometimes you may not wish to include so many details in the subtitle. In that case, you can extract the expression and copy-paste only the part you wish to include. For example, here only the statistic and p-value are included:
set.seed(123)
# extracting detailed expression
(res_expr <- oneway_anova(iris, Species, Sepal.Length, var.equal = TRUE)$expression[[1]])
#> list(italic("F")["Fisher"](2, 147) == "119.26", italic(p) ==
#> "1.67e-31", widehat(omega["p"]^2) == "0.61", CI["95%"] ~
#> "[" * "0.53", "1.00" * "]", italic("n")["obs"] == "150")
# adapting the details to your liking
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
geom_boxplot() +
labs(subtitle = ggplot2::expr(paste(
NULL, italic("F"), "(", "2", ",", "147", ") = ", "119.26", ", ",
italic("p"), " = ", "1.67e-31"
)))
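As an alternative to copy-pasting, the expression can in principle be trimmed programmatically, because it is an ordinary `list(...)` call that supports subsetting. The sketch below uses a hand-written, shortened stand-in (`res_expr_demo`, hypothetical) rather than real package output, and simply keeps the function name plus the first two components.

```r
# Hand-written stand-in with the same shape as the expression shown above.
res_expr_demo <- quote(list(
  italic("F")["Fisher"](2, 147) == "119.26",
  italic(p) == "1.67e-31",
  widehat(omega["p"]^2) == "0.61"
))

# Element 1 of a call is the function (`list`); elements 2 and 3 are the
# test statistic and the p-value, so keeping 1:3 drops everything else.
short_expr <- res_expr_demo[1:3]
print(short_expr)
```

The result is still a language object of the same form, so it should be usable wherever the full expression is; whether the positions of the components are stable across tests and package versions is not guaranteed here.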
Here a go-to summary about the statistical test carried out and the returned effect size for each function is provided. This should be useful if one needs to find out more information about how an argument is resolved in the underlying package or if one wishes to browse the source code. So, for example, if you want to know more about how a one-way (between-subjects) ANOVA is carried out, you can run ?stats::oneway.test in your R console.
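To see what such an underlying function returns on its own, it can be called directly. The following sketch runs Fisher's one-way ANOVA on the iris data with `stats::oneway.test()`; the formula interface and `var.equal = TRUE` are standard arguments of that function, while the comparison with the F(2, 147) = 119.26 expression shown earlier is offered as a sanity check rather than a guarantee about package internals.

```r
# Fisher's one-way ANOVA (equal variances assumed) called directly.
fit <- stats::oneway.test(Sepal.Length ~ Species, data = iris, var.equal = TRUE)

print(round(unname(fit$statistic), 2))  # F statistic
print(unname(fit$parameter))            # numerator and denominator df
print(fit$method)                       # description of the test
```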
centrality_description
Type  Measure  Function used 

Parametric  mean  datawizard::describe_distribution() 
Nonparametric  median  datawizard::describe_distribution() 
Robust  trimmed mean  datawizard::describe_distribution() 
Bayesian  MAP  datawizard::describe_distribution() 
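The first three centrality measures in this table have direct base R counterparts, sketched below for car weight by cylinder count. The 20% trimming fraction is an assumption made for illustration and may differ from the package default, and the MAP estimate is left out because base R has no single function for it.

```r
# Group-wise centrality measures computed with base R.
centrality <- data.frame(
  cyl          = sort(unique(mtcars$cyl)),
  mean         = as.numeric(tapply(mtcars$wt, mtcars$cyl, mean)),
  median       = as.numeric(tapply(mtcars$wt, mtcars$cyl, median)),
  trimmed_mean = as.numeric(tapply(mtcars$wt, mtcars$cyl, mean, trim = 0.2))
)
print(centrality)
```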
oneway_anova
Hypothesis testing
Type  No. of groups  Test  Function used

Parametric  > 2  Fisher's or Welch's one-way ANOVA  stats::oneway.test()
Nonparametric  > 2  Kruskal-Wallis one-way ANOVA  stats::kruskal.test()
Robust  > 2  Heteroscedastic one-way ANOVA for trimmed means  WRS2::t1way()
Bayes Factor  > 2  Fisher's ANOVA  BayesFactor::anovaBF()
Effect size estimation
Type  No. of groups  Effect size  CI available?  Function used

Parametric  > 2  partial eta-squared, partial omega-squared  Yes  effectsize::omega_squared(), effectsize::eta_squared()
Nonparametric  > 2  rank epsilon squared  Yes  effectsize::rank_epsilon_squared()
Robust  > 2  Explanatory measure of effect size  Yes  WRS2::t1way()
Bayes Factor  > 2  Bayesian R-squared  Yes  performance::r2_bayes()
Hypothesis testing
Type  No. of groups  Test  Function used

Parametric  > 2  One-way repeated measures ANOVA  afex::aov_ez()
Nonparametric  > 2  Friedman rank sum test  stats::friedman.test()
Robust  > 2  Heteroscedastic one-way repeated measures ANOVA for trimmed means  WRS2::rmanova()
Bayes Factor  > 2  One-way repeated measures ANOVA  BayesFactor::anovaBF()
Effect size estimation
Type  No. of groups  Effect size  CI available?  Function used

Parametric  > 2  partial eta-squared, partial omega-squared  Yes  effectsize::omega_squared(), effectsize::eta_squared()
Nonparametric  > 2  Kendall's coefficient of concordance  Yes  effectsize::kendalls_w()
Robust  > 2  Algina-Keselman-Penfield robust standardized difference average  Yes  WRS2::wmcpAKP()
Bayes Factor  > 2  Bayesian R-squared  Yes  performance::r2_bayes()
two_sample_test
Hypothesis testing
Type  No. of groups  Test  Function used

Parametric  2  Student's or Welch's t-test  stats::t.test()
Nonparametric  2  Mann-Whitney U test  stats::wilcox.test()
Robust  2  Yuen's test for trimmed means  WRS2::yuen()
Bayesian  2  Student's t-test  BayesFactor::ttestBF()
Effect size estimation
Type  No. of groups  Effect size  CI available?  Function used

Parametric  2  Cohen's d, Hedges' g  Yes  effectsize::cohens_d(), effectsize::hedges_g()
Nonparametric  2  r (rank-biserial correlation)  Yes  effectsize::rank_biserial()
Robust  2  Algina-Keselman-Penfield robust standardized difference  Yes  WRS2::akp.effect()
Bayesian  2  difference  Yes  bayestestR::describe_posterior()
Hypothesis testing
Type  No. of groups  Test  Function used

Parametric  2  Student's t-test  stats::t.test()
Nonparametric  2  Wilcoxon signed-rank test  stats::wilcox.test()
Robust  2  Yuen's test on trimmed means for dependent samples  WRS2::yuend()
Bayesian  2  Student's t-test  BayesFactor::ttestBF()
Effect size estimation
Type  No. of groups  Effect size  CI available?  Function used

Parametric  2  Cohen's d, Hedges' g  Yes  effectsize::cohens_d(), effectsize::hedges_g()
Nonparametric  2  r (rank-biserial correlation)  Yes  effectsize::rank_biserial()
Robust  2  Algina-Keselman-Penfield robust standardized difference  Yes  WRS2::wmcpAKP()
Bayesian  2  difference  Yes  bayestestR::describe_posterior()
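The "Student's or Welch's" distinction in the parametric between-subjects row comes down to one argument of `stats::t.test()`. A minimal base R sketch on the ToothGrowth data used earlier follows; the `dfs` vector is only a convenience for comparison and is not part of any package output.

```r
# Welch's t-test is the default in stats::t.test() (var.equal = FALSE).
welch <- stats::t.test(len ~ supp, data = ToothGrowth)

# Student's t-test assumes equal variances and pools them.
student <- stats::t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)

# Welch's correction shows up as fractional, smaller degrees of freedom.
dfs <- c(welch = unname(welch$parameter), student = unname(student$parameter))
print(dfs)
print(welch$method)
```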
one_sample_test
Hypothesis testing
Type  Test  Function used 

Parametric  One-sample Student's t-test  stats::t.test()
Nonparametric  One-sample Wilcoxon test  stats::wilcox.test()
Robust  Bootstrap-t method for one-sample test  WRS2::trimcibt()
Bayesian  One-sample Student's t-test  BayesFactor::ttestBF()
Effect size estimation
Type  Effect size  CI available?  Function used 

Parametric  Cohen's d, Hedges' g  Yes  effectsize::cohens_d(), effectsize::hedges_g()
Nonparametric  r (rank-biserial correlation)  Yes  effectsize::rank_biserial()
Robust  trimmed mean  Yes  WRS2::trimcibt()
Bayes Factor  difference  Yes  bayestestR::describe_posterior()
corr_test
Hypothesis testing and Effect size estimation
Type  Test  CI available?  Function used 

Parametric  Pearson's correlation coefficient  Yes  correlation::correlation()
Nonparametric  Spearman's rank correlation coefficient  Yes  correlation::correlation()
Robust  Winsorized Pearson's correlation coefficient  Yes  correlation::correlation()
Bayesian  Bayesian Pearson's correlation coefficient  Yes  correlation::correlation()
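For orientation, the first two rows have base R analogues in `stats::cor.test()`. The sketch below is not what the package calls (the table lists `correlation::correlation()`); it only illustrates the two coefficients and one practical difference: base R reports a confidence interval for Pearson's r but not for Spearman's rho. Setting `exact = FALSE` is a choice made here to avoid the ties warning.

```r
# Pearson's product-moment correlation, with a confidence interval.
pearson <- stats::cor.test(mtcars$mpg, mtcars$wt, method = "pearson")

# Spearman's rank correlation; exact = FALSE because the data contain ties.
spearman <- stats::cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = FALSE)

print(unname(pearson$estimate))
print(pearson$conf.int)
print(unname(spearman$estimate))
print(is.null(spearman$conf.int))  # base R gives no CI for Spearman's rho
```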
contingency_table
Hypothesis testing
Type  Design  Test  Function used 

Parametric/Nonparametric  Unpaired  Pearson's chi-squared test  stats::chisq.test()
Bayesian  Unpaired  Bayesian Pearson's chi-squared test  BayesFactor::contingencyTableBF()
Parametric/Nonparametric  Paired  McNemar's chi-squared test  stats::mcnemar.test()
Bayesian  Paired  No  No 
Effect size estimation
Type  Design  Effect size  CI available?  Function used 

Parametric/Nonparametric  Unpaired  Cramer's V  Yes  effectsize::cramers_v()
Bayesian  Unpaired  Cramer's V  Yes  effectsize::cramers_v()
Parametric/Nonparametric  Paired  Cohen's g  Yes  effectsize::cohens_g()
Bayesian  Paired  No  No  No 
Hypothesis testing
Type  Test  Function used 

Parametric/Nonparametric  Goodness of fit chi-squared test  stats::chisq.test()
Bayesian  Bayesian Goodness of fit chi-squared test  (custom)
Effect size estimation
Type  Effect size  CI available?  Function used 

Parametric/Nonparametric  Pearson's C  Yes  effectsize::pearsons_c()
Bayesian  No  No  No 
meta_analysis
Hypothesis testing and Effect size estimation
Type  Test  Effect size  CI available?  Function used 

Parametric  Meta-analysis via random-effects models  beta  Yes  metafor::rma()
Robust  Meta-analysis via robust random-effects models  beta  Yes  metaplus::metaplus()
Bayes  Meta-analysis via Bayesian random-effects models  beta  Yes  metaBMA::meta_random()
ggstatsplot
Note that these functions were initially written to display results from statistical tests on ready-made {ggplot2} plots implemented in {ggstatsplot}.
For detailed documentation, see the package website: https://indrajeetpatil.github.io/ggstatsplot/
Here is an example from {ggstatsplot} of what the plots look like when the expressions are displayed in the subtitle.
The hex sticker and the schematic illustration of general workflow were generously designed by Sarah Otterstetter (Max Planck Institute for Human Development, Berlin).
Bug reports, suggestions, questions, and (most of all) contributions are welcome.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.