Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy. Some of its main features are listed below. For a full list of available functions, please refer to the API documentation.
Pingouin is designed for users who want simple yet exhaustive statistical functions.
For example, the ttest_ind function of SciPy returns only the T-value and the p-value. By contrast, the ttest function of Pingouin returns the T-value, the p-value, the degrees of freedom, the effect size (Cohen's d), the 95% confidence intervals of the difference in means, the statistical power and the Bayes Factor (BF10) of the test.
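To make the contrast concrete, the sketch below shows the two values returned by SciPy's ttest_ind and computes two of the extra quantities (degrees of freedom and Cohen's d) by hand with NumPy. The sample data and the helper name cohens_d are illustrative assumptions, not part of Pingouin, and Pingouin's own implementation may differ in detail.

```python
import numpy as np
from scipy import stats


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = a.size, b.size
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Hypothetical samples, for illustration only
rng = np.random.default_rng(123)
a = rng.normal(4, 1, 30)
b = rng.normal(5, 1, 30)

res = stats.ttest_ind(a, b)   # SciPy: test statistic and p-value only
dof = a.size + b.size - 2     # degrees of freedom of the equal-variance t-test
d = cohens_d(a, b)            # effect size, computed by hand

print(f"T = {res.statistic:.3f}, p = {res.pvalue:.3f}, dof = {dof}, d = {d:.3f}")
```

Everything beyond the first two numbers has to be assembled manually in this approach, which is the gap the single ttest call is meant to close.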
If you have questions, please ask them in GitHub Discussions.
The main dependencies of Pingouin are:
In addition, some functions require:
Pingouin is a Python 3 package and is currently tested for Python 3.7-3.10. It does not support Python 2.
Pingouin can be easily installed using pip:
pip install pingouin
or conda:
conda install -c conda-forge pingouin
New releases are frequent, so always make sure that you have the latest version:
pip install --upgrade pingouin
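After installing or upgrading, the installed version can be checked from Python. This is a minimal sketch using only the standard library; it assumes Python 3.8 or later (where importlib.metadata is available), and the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None when Pingouin is not installed
print(installed_version("pingouin"))
```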
Click on the link below and navigate to the notebooks/ folder to run a collection of interactive Jupyter notebooks showing the main functionalities of Pingouin. There is no need to install Pingouin beforehand: the notebooks run in a Binder environment.
import numpy as np
import pingouin as pg
np.random.seed(123)
mean, cov, n = [4, 5], [(1, .6), (.6, 1)], 30
x, y = np.random.multivariate_normal(mean, cov, n).T
# T-test
pg.ttest(x, y)
     T  dof  alternative  p-val          CI95%  cohen-d    BF10  power
-3.401   58    two-sided  0.001  [-1.68 -0.43]    0.878  26.155  0.917
pg.corr(x, y)
 n      r       CI95%  p-val    BF10  power
30  0.595  [0.3 0.79]  0.001  69.723  0.950
# Introduce an outlier
x[5] = 18
# Use the robust biweight midcorrelation
pg.corr(x, y, method="bicor")
 n      r        CI95%  p-val  power
30  0.576  [0.27 0.78]  0.001  0.933
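To illustrate why a robust estimator helps once an outlier is present, here is a minimal NumPy sketch of the biweight midcorrelation, following the usual textbook definition (median-centred deviations, down-weighted by their distance in units of 9 times the median absolute deviation). It is not Pingouin's code; the function name bicor, the toy data and the assumption that the median absolute deviation is non-zero are all illustrative.

```python
import numpy as np


def bicor(x, y, c=9.0):
    """Biweight midcorrelation (textbook form); assumes a non-zero MAD for x and y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)

    def weighted_dev(v):
        med = np.median(v)
        mad = np.median(np.abs(v - med))
        u = (v - med) / (c * mad)
        # Points further than c * MAD from the median get a weight of exactly zero
        w = (1 - u**2) ** 2 * (np.abs(u) < 1)
        return (v - med) * w

    a, b = weighted_dev(x), weighted_dev(y)
    return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))


# Hypothetical, nearly linear data with one gross outlier in x
x_clean = np.arange(1, 21, dtype=float)
y_toy = 2 * x_clean + 0.5 * (-1.0) ** np.arange(20)
x_out = x_clean.copy()
x_out[5] = 100.0

pearson_r = np.corrcoef(x_out, y_toy)[0, 1]
robust_r = bicor(x_out, y_toy)
print(f"Pearson: {pearson_r:.3f}, biweight midcorrelation: {robust_r:.3f}")
```

In this toy example the outlier receives zero weight, so the robust estimate stays close to the underlying linear relationship while the Pearson coefficient collapses.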
The pingouin.normality function works with lists, arrays, or pandas DataFrames in wide or long format.
print(pg.normality(x)) # Univariate normality
print(pg.multivariate_normality(np.column_stack((x, y)))) # Multivariate normality
    W   pval  normal
0.615  0.000   False
(False, 0.00018)
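The W, pval and normal columns above suggest a Shapiro-Wilk test followed by a comparison against a significance threshold. The sketch below reproduces that logic with SciPy under this assumption; the helper name is_normal, the 0.05 threshold and the skewed toy data are illustrative and not taken from Pingouin.

```python
import numpy as np
from scipy import stats


def is_normal(values, alpha=0.05):
    """Shapiro-Wilk test: return (W, p-value, True if normality is not rejected)."""
    w, p = stats.shapiro(values)
    return float(w), float(p), bool(p > alpha)


# Hypothetical, strongly right-skewed sample
skewed = np.exp(np.linspace(0, 5, 30))
w, p, normal = is_normal(skewed)
print(f"W = {w:.3f}, p = {p:.3f}, normal = {normal}")
```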
# Read an example dataset
df = pg.read_dataset('mixed_anova')
# Run the ANOVA
aov = pg.anova(data=df, dv='Scores', between='Group', detailed=True)
print(aov)
Source       SS   DF     MS      F  p-unc    np2
Group     5.460    1  5.460  5.244  0.023  0.029
Within  185.343  178  1.041    nan    nan    nan
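The columns of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, and partial eta-squared (np2) is the effect sum of squares divided by the effect plus error sums of squares. The sketch below implements a one-way ANOVA on hypothetical toy groups with NumPy, then applies the same formulas to the rounded SS and DF values printed above. The function name one_way_anova is an assumption, not a Pingouin function.

```python
import numpy as np


def one_way_anova(groups):
    """One-way between-subjects ANOVA: sums of squares, F-value and partial eta-squared."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = all_values.size - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    np2 = ss_between / (ss_between + ss_within)
    return {"SS_between": ss_between, "SS_within": ss_within, "F": f_value, "np2": np2}


# Hypothetical toy data, for illustration only
res = one_way_anova([[1, 2, 3], [2, 3, 4]])
print(res)

# Same arithmetic applied to the rounded SS and DF values of the table above
f_table = (5.460 / 1) / (185.343 / 178)
np2_table = 5.460 / (5.460 + 185.343)
print(round(f_table, 3), round(np2_table, 3))
```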
pg.rm_anova(data=df, dv='Scores', within='Time', subject='Subject', detailed=True)
Source       SS   DF     MS      F  p-unc   ng2    eps
Time      7.628    2  3.814  3.913  0.023  0.04  0.999
Error   115.027  118  0.975    nan    nan   nan    nan
# FDR-corrected post hocs with Hedges' g effect size
posthoc = pg.pairwise_tests(data=df, dv='Scores', within='Time', subject='Subject',
                            parametric=True, padjust='fdr_bh', effsize='hedges')
# Pretty printing of table
pg.print_table(posthoc, floatfmt='.3f')
Contrast  A        B        Paired  Parametric      T     dof  alternative  p-unc  p-corr  p-adjust   BF10  hedges
Time      August   January  True    True        1.740  59.000  two-sided    0.087   0.131  fdr_bh    0.582   0.328
Time      August   June     True    True        2.743  59.000  two-sided    0.008   0.024  fdr_bh    4.232   0.483
Time      January  June     True    True        1.024  59.000  two-sided    0.310   0.310  fdr_bh    0.232   0.170
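The corrected p-values in the table can be understood through the Benjamini-Hochberg procedure that the fdr_bh option refers to: sort the p-values, scale the i-th smallest by m / i, then enforce monotonicity from the largest p-value downward. The following is a minimal NumPy sketch of that procedure, not Pingouin's implementation; the function name fdr_bh is hypothetical. Applied to the three rounded uncorrected p-values above, it matches the corrected column up to rounding.

```python
import numpy as np


def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


# Rounded uncorrected p-values from the table above
p_unc = [0.087, 0.008, 0.310]
print(fdr_bh(p_unc))
```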
# Compute the two-way mixed ANOVA
aov = pg.mixed_anova(data=df, dv='Scores', between='Group', within='Time',
                     subject='Subject', correction=False, effsize="np2")
pg.print_table(aov)
Source          SS  DF1  DF2     MS      F  p-unc    np2    eps
Group        5.460    1   58  5.460  5.052  0.028  0.080    nan
Time         7.628    2  116  3.814  4.027  0.020  0.065  0.999
Interaction  5.167    2  116  2.584  2.728  0.070  0.045    nan
import pandas as pd
np.random.seed(123)
z = np.random.normal(5, 1, 30)
data = pd.DataFrame({'X': x, 'Y': y, 'Z': z})
pg.pairwise_corr(data, columns=['X', 'Y', 'Z'], method='pearson')
X  Y  method   alternative   n      r         CI95%  p-unc   BF10  power
X  Y  pearson  two-sided    30  0.366   [0.01 0.64]  0.047  1.500  0.525
X  Z  pearson  two-sided    30  0.251  [-0.12 0.56]  0.181  0.534  0.272
Y  Z  pearson  two-sided    30  0.020  [-0.34 0.38]  0.916  0.228  0.051
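The confidence intervals in the CI95% column are consistent with the standard Fisher z-transformation of the correlation coefficient. The sketch below, using only the Python standard library, assumes that method; the function name corr_ci is hypothetical. For the X-Y row (r = 0.366, n = 30) it gives the interval shown in the table after rounding to two decimals.

```python
import math
from statistics import NormalDist


def corr_ci(r, n, confidence=0.95):
    """Confidence interval of a Pearson correlation via the Fisher z-transformation."""
    z = math.atanh(r)                  # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)        # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


lo, hi = corr_ci(0.366, 30)
print(round(lo, 2), round(hi, 2))
```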
data.ptests(paired=True, stars=False)
       X      Y      Z
X      -  0.226  0.165
Y  1.238      -  0.658
Z  1.424  0.447      -
pg.linear_regression(data[['X', 'Z']], data['Y'])
names        coef     se       T   pval     r2  adj_r2  CI[2.5%]  CI[97.5%]
Intercept   4.650  0.841   5.530  0.000  0.139   0.076     2.925      6.376
X           0.143  0.068   2.089  0.046  0.139   0.076     0.003      0.283
Z          -0.069  0.167  -0.416  0.681  0.139   0.076    -0.412      0.273
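The coef, se and T columns of a regression table follow from ordinary least squares: the coefficients solve the least-squares problem, the standard errors come from the residual variance and the inverse of X'X, and each T-value is a coefficient divided by its standard error. The sketch below shows this with NumPy on hypothetical toy data; the function name ols is an assumption and this is not Pingouin's implementation.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares with an intercept: coefficients, standard errors, T-values."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]          # residual degrees of freedom
    sigma2 = resid @ resid / dof           # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se, coef / se


# Hypothetical toy data, for illustration only
x_toy = [0, 1, 2, 3, 4]
y_toy = [1.1, 2.9, 5.2, 6.8, 9.0]
coef, se, t_values = ols(x_toy, y_toy)
print(coef, se, t_values)
```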
pg.mediation_analysis(data=data, x='X', m='Z', y='Y', seed=42, n_boot=1000)
path        coef     se   pval  CI[2.5%]  CI[97.5%]  sig
Z ~ X      0.103  0.075  0.181    -0.051      0.256  No
Y ~ Z      0.018  0.171  0.916    -0.332      0.369  No
Total      0.136  0.065  0.047     0.002      0.269  Yes
Direct     0.143  0.068  0.046     0.003      0.283  Yes
Indirect  -0.007  0.025  0.898    -0.069      0.029  No
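The Total, Direct and Indirect rows are related: in a linear mediation model with a single mediator, the total effect equals the direct effect plus the indirect effect, where the indirect effect is the product of the X-to-mediator slope and the mediator-to-Y slope (controlling for X). The sketch below checks this identity with plain least squares on simulated data. All variable names and the simulated coefficients are hypothetical, and the bootstrap used for the indirect effect's confidence interval (the n_boot argument above) is deliberately left out.

```python
import numpy as np


def slopes(predictors, y):
    """Least-squares slopes of y on the given predictors (an intercept is included)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


# Simulated data, for illustration only
rng = np.random.default_rng(42)
n = 200
x_sim = rng.normal(size=n)
m_sim = 0.5 * x_sim + rng.normal(size=n)
y_sim = 0.3 * x_sim + 0.4 * m_sim + rng.normal(size=n)

(a,) = slopes([x_sim], m_sim)               # path X -> M
(total,) = slopes([x_sim], y_sim)           # total effect of X on Y
direct, b = slopes([x_sim, m_sim], y_sim)   # direct effect, and path M -> Y given X
indirect = a * b

print(f"total = {total:.3f}, direct + indirect = {direct + indirect:.3f}")
```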
data = pg.read_dataset('chi2_independence')
expected, observed, stats = pg.chi2_independence(data, x='sex', y='target')
stats
test                lambda    chi2    dof      p  cramer  power
pearson              1.000  22.717  1.000  0.000   0.274  0.997
cressie-read         0.667  22.931  1.000  0.000   0.275  0.998
log-likelihood       0.000  23.557  1.000  0.000   0.279  0.998
freeman-tukey       -0.500  24.220  1.000  0.000   0.283  0.998
mod-log-likelihood  -1.000  25.071  1.000  0.000   0.288  0.999
neyman              -2.000  27.458  1.000  0.000   0.301  0.999
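The rows of this table differ only in the lambda parameter of the power divergence family of statistics: lambda = 1 gives Pearson's chi-squared and lambda = 0 gives the log-likelihood ratio (G) statistic. The sketch below computes the statistic with NumPy for a hypothetical 2x2 contingency table. It applies no continuity correction, assumes all observed counts are positive, and covers only the cases lambda = 0 and lambda not in {0, -1}; the function name is an assumption, not a Pingouin function.

```python
import numpy as np


def power_divergence_stat(observed, lambda_):
    """Power divergence statistic of a contingency table (no continuity correction)."""
    obs = np.asarray(observed, dtype=float)
    # Expected counts under independence: row total * column total / grand total
    expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()
    if lambda_ == 0:
        # Limit of the general formula: log-likelihood ratio (G) statistic
        return 2.0 * np.sum(obs * np.log(obs / expected))
    scale = 2.0 / (lambda_ * (lambda_ + 1.0))
    return scale * np.sum(obs * ((obs / expected) ** lambda_ - 1.0))


# Hypothetical 2x2 table of counts, for illustration only
table = [[30, 20], [15, 35]]
for name, lam in [("pearson", 1.0), ("cressie-read", 2 / 3), ("log-likelihood", 0.0)]:
    print(name, round(power_divergence_stat(table, lam), 3))
```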
Several functions of Pingouin can be used directly as pandas DataFrame methods. Try for yourself with the code below:
import pingouin as pg
# Example 1 - ANOVA
df = pg.read_dataset('mixed_anova')
df.anova(dv='Scores', between='Group', detailed=True)
# Example 2 - Pairwise correlations
data = pg.read_dataset('mediation')
data.pairwise_corr(columns=['X', 'M', 'Y'], covar=['Mbin'])
# Example 3 - Partial correlation matrix
data.pcorr()
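For readers curious how a library can attach its own functions to DataFrame objects, pandas offers a public extension API for this. The sketch below uses pandas' register_dataframe_accessor, which places methods under a namespace (here the hypothetical name toystats) rather than directly on the DataFrame as in the examples above. It is a different mechanism shown for illustration only, not a description of how Pingouin does it.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("toystats")
class ToyStatsAccessor:
    """Hypothetical accessor exposing a small statistic as df.toystats.<method>()."""

    def __init__(self, df):
        self._df = df

    def mean_diff(self, a, b):
        """Difference between the means of columns a and b."""
        return float(self._df[a].mean() - self._df[b].mean())


toy = pd.DataFrame({"X": [1.0, 2.0, 3.0], "Y": [2.0, 4.0, 6.0]})
print(toy.toystats.mean_diff("Y", "X"))
```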
The functions that are currently supported as pandas methods are:
Pingouin was created and is maintained by Raphael Vallat, a postdoctoral researcher at UC Berkeley, mostly during his spare time. Contributions are more than welcome so feel free to contact me, open an issue or submit a pull request!
To see the code or report a bug, please visit the GitHub repository.
This program is provided with NO WARRANTY OF ANY KIND. Pingouin is still under heavy development and there are likely hidden bugs. Always double-check the results with other statistical software.
Contributors
If you want to cite Pingouin, please use the publication in JOSS:
Several functions of Pingouin were inspired by R or Matlab toolboxes, including: