Mental illness affects 1 out of every 5 adults. It costs the US about $1 trillion in lost annual productivity. Despite how common this condition is, people often avoid seeking support. This analysis examines the technology industry, a market worth $1.6 trillion, to ask: what factors can contribute to creating a workplace that feels comfortable and receptive to mental health support?
The world is in the middle of a health care crisis. Unfortunately, the COVID-19 pandemic represents only part of a larger story. Anxiety and depression are also on the rise, and people cannot find the help that they need to address these mental health conditions. According to an American Psychological Association poll of nearly 1,800 psychologists, 74 percent said more patients were seeking treatment for anxiety disorders than before the pandemic. Nearly 30 percent of providers reported seeing more patients overall (New York Times, 2021). Research that identifies the need for employers to step up to meet this crisis will affect policy and lives. This study is important for showing how what we communicate to staff about mental health services can influence the maintenance and/or improvement of business operations.
This data comes from Open Sourcing Mental Illness, a nonprofit dedicated to raising awareness, educating, and providing resources to support mental wellness in the tech and open-source communities. The survey for the year 2016 contains 1,434 responses, and measures attitudes towards mental health among tech workers with and without a mental health disorder. The analysis included 1,004 responses after data cleaning. Data was filtered or dropped by these criteria:
Pandas/Python supported this pre-processing of information.
To host the data and ensure access for all 6 team members, we used Amazon Web Services. An SQL relational database stores the data in tables linked by primary and foreign keys across survey years. PostgreSQL and pgAdmin best suited the project because the survey only contains about 1,000 records. Other benefits included:
Please visit Entity Relational Diagram for details about table structure.
To load the data into the tables, we used the Python package SQLAlchemy, in particular its create_engine function and session objects. From the built database, Final_Project:_Mental_Health, it is possible to query the data directly into a pandas DataFrame in a Jupyter notebook using SQL code.
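The load-and-query round trip described above can be sketched as follows. This is a minimal illustration, not the project's actual loading script: the table name, column names, and rows are hypothetical, and an in-memory SQLite database stands in for the PostgreSQL instance so the example runs without credentials.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Hypothetical cleaned survey rows; the column names are illustrative only.
responses = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "survey_year": [2016, 2016, 2016],
    "country_work": ["United States", "Netherlands", "Germany"],
})

# Assumption: in-memory SQLite keeps the sketch self-contained. For PostgreSQL
# the URL would look like "postgresql://user:password@host:5432/dbname"
# (all placeholders).
engine = create_engine("sqlite:///:memory:")

with engine.begin() as conn:
    # Write the DataFrame to a table, then read it back with plain SQL.
    responses.to_sql("responses", conn, index=False, if_exists="replace")
    by_country = pd.read_sql(
        text(
            "SELECT country_work, COUNT(*) AS n FROM responses "
            "GROUP BY country_work ORDER BY country_work"
        ),
        conn,
    )

print(by_country)
```

Only the connection URL would need to change to point the same code at a hosted database.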
Datasource Link.
Characteristics of Respondents
The analysis examined the influence of gender, age, country-worked, and company size on people's perception of mental health resources and stigma in the workplace. Most survey respondents identified as male: 722 respondents (72% of the total) identified as male, and 259 respondents (26% of the total) identified as female. An exceedingly small group identified as non-binary (2%). The middle half of workers (25th to 75th percentile) fell in the age range of 28-38. The United States and Britain accounted for 842 (84%) of the survey respondents' countries of work. Respondents came from companies of various sizes, the most common being companies of 26-100 people or 1000+ people.
Comfort Discussing Mental Health with a Supervisor
Willingness to discuss mental health issues with coworkers, by age.
To view additional analysis visualizations created so far, please visit our images folder.
A plurality of respondents (38%) said they felt comfortable talking about mental health with their supervisors. Overall, across gender, country, age and company size, the yes, no, and maybe responses fell in nearly the same proportions, splitting almost evenly into thirds (34% maybe, 28% no, and 38% yes). This shows that respondents, overall, were divided on the question. The only variable that showed notable differences was country-worked. 56% of respondents who worked in the Netherlands said yes, they felt comfortable discussing mental health with a supervisor. In contrast, only 16% of respondents who worked for companies in Germany felt similarly. However, only 31 respondents worked in the Netherlands and 44 respondents worked in Germany. Since most respondents came from Britain and the US, these countries' 842 responses (40% and 38% yes, respectively) drove the overall results for country-worked.
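A breakdown like the one above (share of yes/no/maybe within each country, alongside the group sizes that qualify it) can be computed with a row-normalized crosstab. This is a sketch on made-up rows; the column names are assumptions and the values do not come from the survey.

```python
import pandas as pd

# Hypothetical toy responses; values do not come from the survey.
df = pd.DataFrame({
    "country_work": ["US", "US", "US", "US", "NL", "NL"],
    "comfortable_supervisor": ["Yes", "No", "Maybe", "Yes", "Yes", "No"],
})

# Row-normalized crosstab: each row sums to 1, so cells are within-country shares.
shares = pd.crosstab(
    df["country_work"], df["comfortable_supervisor"], normalize="index"
)

# Group sizes, to judge how much weight each country's percentages deserve.
counts = df["country_work"].value_counts()

print(shares.round(2))
print(counts)
```

Reporting the counts next to the shares makes it easier to see when a striking percentage rests on only a few dozen respondents.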
Figure A
Perception that Identification of Having a Mental Illness Would Hurt a Career
Despite some inclination to share about their mental health, almost half of all respondents believed that being identified as having a mental health diagnosis would hurt their career. Women feared retribution more than men in this scenario. 134 (52%) of women believed a mental health diagnosis either had hurt or would hurt their career. By comparison, 313 (43%) of men believed the same. Canadian workers also felt the most concerned in comparison to workers of other countries. 62% of the 60 workers in Canadian companies said they thought being identified with a mental illness had affected or would affect their career. On the other hand, only 11 respondents from companies in the Netherlands thought this. Size of company also made a small difference. Half of respondents from companies with more than 1000 workers thought being identified with a mental illness would hurt their careers. Conversely, only 44% of respondents from companies of 100-500 people thought this was true. Identification with a mental illness showed some of the greatest variance among subgroups' responses (e.g. men vs women), even while the overarching categories (gender vs age) showed similar answers.
Figure B
Observation of Negative Consequences for a Coworker Revealing Mental Illness
Nevertheless, when asked if they had observed coworkers receiving negative consequences for revealing a mental health condition, almost 90% of both men and women said no. 91% of people, regardless of country-worked, also said no, as did 91% of people from every company size. Across age groups, fewer than 10% of people said yes to seeing consequences. Even though people feared for their careers if they were identified with a mental health condition, they lacked any observable evidence of others receiving different treatment because of a mental health condition.
Belief that Mental Health and Physical Health Receive the Same Regard
It also appeared respondents did not know if the workplace treated physical health and mental health the same. Just over 40% of men and women said, "I don't know." 80 women (31% of female respondents) and 219 men (30% of male respondents) answered yes. Gender did not affect perception in this case. Younger workers seemed to show the most uncertainty around this subject, with 48% saying "I don't know" in comparison to 39% of workers ages 38 or more. As people aged, they seemed to shift more towards no than yes, and away from "I don't know." This suggests that uncertainty, rather than observation of negative workplace practices, is influencing people and their choices about mental health in the workplace. Workers appear to form a clearer view the longer they are in the workforce, but the answer remains unclear to most people regardless of company size, country, etc.
Awareness of Mental Health Coverage and Employers' Discussion of Mental Health
When asked about awareness of mental health coverage, 412 (41%) of respondents answered, "I am not sure." Many people did not know about their workplace's coverage for mental health conditions, regardless of gender, age, company size or country-worked. Employers also made little effort to spread awareness of available resources. 708 (71%) of respondents said their employers had not discussed mental health options. This perception also did not vary by respondent characteristics.
Figure C
Please see charts and results in an interactive form on our website here: https://www.mentalhealthintech.com/
Click here to see the interactive dashboard in Tableau.
Model Type and Goal
Our intention is to predict an output from previous examples. To achieve this, we will use a supervised machine learning model.
This kind of model allows us to use training data to learn a link between the input and the output. Because the training data is labeled, its predictions can be checked directly against known answers, which makes the results easier to validate than those of unsupervised learning.
Code-link
Datasource Link.
Goal:
Our goal is to classify accurately whether an individual is currently diagnosed with a mental health disorder, based on that individual's answers in the dataset.
The focus here is on individuals who work in a tech company.
After training and testing on our data, we will be able to predict an individual's mental health disorder status for new respondents, even when that answer is missing from the new data.
For the cleaning and preprocessing steps, see the Data Source/Pre-Processing module.
Visualizations of the cleaning step have been created for a better understanding of our features.
Once all desired features were cleaned up for our model, we encoded our data using LabelEncoder from the scikit-learn library. This step changes each categorical feature into unique numeric identifiers. LabelEncoder encodes values as integers between 0 and n-1, where n is the number of distinct categories in the column. If a category repeats, it receives the same value as assigned earlier.
Using LabelEncoder instead of, for example, the pandas get_dummies function creates a fitted encoder which persists and can be applied to new datasets that use the same categorical variables, with consistent results.
Once encoded, our dataset is ready to be used with a machine learning algorithm.
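The encoding step can be sketched as below, with one fitted encoder kept per column so that new data receives the same codes. The column names and category values are hypothetical, chosen only to resemble survey fields.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical columns; not the project's actual feature names.
train = pd.DataFrame({
    "gender": ["male", "female", "male", "non-binary"],
    "company_size": ["26-100", "1000+", "26-100", "100-500"],
})

# Fit one encoder per categorical column and keep it for later reuse.
encoders = {}
encoded = train.copy()
for col in train.columns:
    encoders[col] = LabelEncoder().fit(train[col])
    encoded[col] = encoders[col].transform(train[col])

# New data with the same categories maps to the same integer codes.
new_rows = pd.DataFrame({"gender": ["female"], "company_size": ["1000+"]})
new_codes = {
    col: int(encoders[col].transform(new_rows[col])[0]) for col in new_rows
}

print(encoded)
print(new_codes)
```

Note that a fitted LabelEncoder raises an error on a category it has not seen, so new datasets must use the same category values. The integer codes follow sorted order of the labels and carry no meaning beyond identity.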
Our target is: "Do you currently have a mental health disorder?"
To predict it, we decided to use insights related to the target:
** Demographic information: Age / Gender / Country where an individual lives and works.
** Company information: Size / Work position.
** Current and previous employers' information: Provide MH benefits / Current employer / Previous employer.
** Information about mental health disorder: Have been previously diagnosed with MH disorder / Able to take a leave if diagnosed with MH disorder / MH disorder from family history / Have been seeking help from MH professional.
We split our data into a 75% training set and a 25% testing set. Giving the training set the larger share lets the model see enough examples to learn how to map inputs to outputs, while still holding back enough data for a meaningful test. When splitting the dataset, we stratify on the target so that the train and test sets have approximately the same percentage of samples of each target class as the complete set.
This analysis employed a Random Forest Classifier because of its versatility: it can be used for both classification and regression tasks. It also tends to achieve higher accuracy than a single decision tree by combining many trees trained on random samples of the data. Compared to a simple decision tree, instead of searching for the most important feature overall while splitting a node, it searches for the best feature among a random subset of features.
We will use it for its diagnostic classification abilities.
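A stratified 75/25 split can be sketched as follows. The data here is synthetic, and the random_state value is an arbitrary assumption; only the 25% test size and the stratification come from the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 40% positive class.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.array([1] * 80 + [0] * 120)

# stratify=y keeps the class ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

print(len(X_train), len(X_test))
print(y_train.mean(), y_test.mean())
```

Both splits keep the 40% positive rate of the full set, which is what stratification is meant to guarantee.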
Benefits:
Limitations:
The main limitation of random forest is that many trees can make the algorithm too slow and ineffective for real-time predictions.
Since a random forest combines multiple decision trees, it becomes more difficult to interpret.
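A minimal sketch of fitting a random forest and reading its feature importances, which can partly offset the interpretability limitation noted above. The data is synthetic (generated with make_classification), and the hyperparameters shown are assumptions rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the encoded survey.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# max_features="sqrt": each split considers a random subset of the features.
model = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=0
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
importances = model.feature_importances_  # one value per feature, sums to 1

print(f"test accuracy: {accuracy:.3f}")
print(importances.round(3))
```

Lowering n_estimators is the usual lever when prediction speed matters, at some cost in stability.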
Model improvements, changes, additional training:
To improve our classification, we also ran a logistic regression model on randomly oversampled training data. Random oversampling adjusts the class distribution of a data set (the ratio between the different classes/categories represented) by randomly duplicating examples from the minority class and adding them to the training dataset. Doing so should help to achieve a better prediction accuracy score.
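Random oversampling followed by logistic regression can be sketched as below. The source does not name the library used, so this version relies on sklearn.utils.resample as an assumption, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Synthetic imbalanced training set: 90 majority rows, 30 minority rows.
rng = np.random.default_rng(0)
X_major = rng.normal(loc=0.0, size=(90, 3))
X_minor = rng.normal(loc=1.5, size=(30, 3))

# Duplicate minority rows at random (with replacement) until classes match.
X_minor_up = resample(
    X_minor, replace=True, n_samples=len(X_major), random_state=0
)

X_bal = np.vstack([X_major, X_minor_up])
y_bal = np.array([0] * len(X_major) + [1] * len(X_minor_up))

clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)

print(X_bal.shape, int(y_bal.sum()))
```

Oversampling should be applied to the training split only. Duplicating rows before splitting would leak copies of training examples into the test set and inflate the score.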
To keep improving our model, we will keep refining our features. More features have been added as a test, and the accuracy score improved (+2%). That said, the link between the new features and the primary target is not as strong, so while they improve the performance of the model, they do not make it more pertinent than the model with the original features.
We stopped using under-sampling as it did not help to improve our model. Indeed, under-sampling can result in dropping a lot of information. Even if this dropped information belongs to the majority class, it is useful information for a modeling algorithm.
Our model is currently trained on 75% of the data and tested on the remaining 25%. To train our model further, we are thinking about adding survey answers from a different year to our training data. More data will likely give us more accurate predictions, since the model will be trained on more information.
After using a Random Forest Classifier to predict our target based on related features, our accuracy score is 78.6%, with a precision of 79%, a recall (sensitivity) of 79% and an F1 score of 0.79.
The F1 score, the harmonic mean of precision and recall, matches both values, which tells us that sensitivity and precision are balanced in our model.
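The relationship between these metrics can be checked with a few lines of arithmetic. The confusion-matrix counts in the first call are hypothetical, chosen only to illustrate the formulas; the second call uses the precision and recall reported above.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not the project's confusion matrix.
toy = classification_metrics(tp=8, fp=2, fn=2, tn=8)

# Reported values from the text: precision 0.79, recall 0.79.
f1_reported = f1_from(0.79, 0.79)

print(toy)
print(round(f1_reported, 2))
```

When precision and recall are equal, F1 equals that shared value, which is consistent with the reported 0.79.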
Our model also performs better when predicting a positive MH diagnosis than a negative one. Looking at the confusion matrix for the random forest classifier, even if the accuracy score (~78%) could be better, the model performs well at finding individuals who are positively diagnosed (true positives = 91) and negatively diagnosed (true negatives = 61).
After oversampling, our accuracy score and all other metrics improve to 83%, which means our model predicts the correct output more than 4 out of 5 times. In conclusion, as of now, the model is successful at answering our question, and should remain so as more data is added.
Using supervised learning, we also tried logistic regression as a model to predict seeking treatment as a target. It works by learning the link between inputs and outputs from training data.
Target and Features Engineering
Our target is: "Have you ever sought treatment for a mental health issue from a mental health professional?"
To predict it, we identified insights related to the target:
Limitations:
Without accurate information, people make decisions based on personal biases, opinions or best guesses. Despite evidence to the contrary, workers believed that if an employer identified them with a mental health condition, then their careers would suffer. This felt more probable for women than men, a possible amplification of women's minority status in the tech industry. Representing only 1 out of every 4 workers, women already face different characterizations than men because of their lower representation. While workers had not seen negative consequences for a mental health diagnosis among coworkers, they also did not notice employers spreading awareness of mental health resources or coverage. Omitting the subject from discussion prevents its normalization. This could be affecting people's willingness to be identified as having a mental health condition. Even for this survey, only 420 (41%) of the total respondents were willing to answer if they had a current mental health diagnosis.
Presentation:
Please see more of our presentation and summary in Google Slides