DISCLAIMER: This application is used for demonstrative and illustrative purposes only and does not constitute an offering that has gone through regulatory review. It is not intended to serve as a medical application. There is no representation as to the accuracy of the output of this application and it is presented without warranty.
This application was built to demonstrate IBM's Watson Natural Language Classifier (NLC). The data set we will be using, ICD-10-GT-AA.csv, contains a subset of ICD-10 entries. ICD-10 is the 10th revision of the International Statistical Classification of Diseases and Related Health Problems. In short, it is a medical classification list maintained by the World Health Organization (WHO) that contains codes for diseases, signs and symptoms, abnormal findings, complaints, social circumstances, and external causes of injury or disease. Hospitals and insurance companies alike could save time and money by using Watson to tag records with the most accurate ICD-10 codes.
This application is a Python web application built on the Flask microframework and based on earlier work by Ryan Anderson. It uses the Watson Python SDK to create the classifier, list classifiers, and classify the input text. It also uses the freely available ICD-10 API, which returns a name and description for a given ICD-10 code.
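The flow described above (classify the text, then look up each returned code) can be sketched as follows. This is a minimal illustration, not the app's actual code: `make_cached_lookup`, `describe`, and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real call to the ICD-10 API, whose URL and response format are not shown here. The sketch only assumes that a lookup returns a name and a description for a code.

```python
from functools import lru_cache


def make_cached_lookup(fetch, maxsize=256):
    """Wrap a code -> {"name", "description"} function with an in-memory cache."""
    @lru_cache(maxsize=maxsize)
    def lookup(code):
        return fetch(code)
    return lookup


def describe(codes, lookup):
    """Attach the looked-up name and description to each code."""
    return [{"code": code, **lookup(code)} for code in codes]


# Hypothetical stand-in for the ICD-10 API call; it records each request
# so the effect of the cache is visible.
calls = []


def fake_fetch(code):
    calls.append(code)
    return {"name": "name for " + code, "description": "description for " + code}


lookup = make_cached_lookup(fake_fetch)
rows = describe(["CODE_A", "CODE_B", "CODE_A"], lookup)
print(rows)
print(calls)  # the repeated code is served from the cache
```

Caching is a design choice worth considering here because the same few codes tend to come back for similar input text, and each lookup would otherwise be a separate network request.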
When the reader has completed this pattern, they will understand how to:
Clone the nlc-icd10-classifier repo locally. In a terminal, run:
git clone https://github.com/IBM/nlc-icd10-classifier
cd nlc-icd10-classifier
Create the following service:
Log into IBM's Watson Studio. Once in, you'll land on the dashboard.
Create a new project by clicking
+ New project and choosing
Enter a name for the project name and click
NOTE: By creating a project in Watson Studio, a free tier Object Storage service and Watson Machine Learning service will be created in your IBM Cloud account. Select the Free storage type to avoid fees.
Upon successful project creation, you are taken to a dashboard view of your project. Take note of the Settings tabs; we'll be using them to associate our project with any external assets (datasets and notebooks) and any IBM Cloud services.
The data used in this example is part of the ICD-10 data set. A cleaned version is available in the repo under data/ICD-10-GT-AA.csv. We'll now train an NLC model using this data.
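Before uploading a training file, it can help to check it locally. The sketch below assumes the CSV has two columns per row (text, then class) with no header, and that each text value should stay under 1024 characters; both are assumptions to verify against the NLC documentation and the actual file. `summarize_training_csv` is a hypothetical helper, and the sample rows are placeholders, not real ICD-10 entries.

```python
import csv
import io
from collections import Counter

MAX_TEXT_LENGTH = 1024  # assumed per-row text limit; confirm in the service docs


def summarize_training_csv(handle, max_text_length=MAX_TEXT_LENGTH):
    """Return (class_counts, problems) for a two-column text,class CSV."""
    class_counts = Counter()
    problems = []
    for row_number, row in enumerate(csv.reader(handle), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            problems.append((row_number, "missing text or class"))
            continue
        if len(row[0]) > max_text_length:
            problems.append((row_number, "text too long"))
            continue
        class_counts[row[1].strip()] += 1
    return class_counts, problems


# Placeholder rows for illustration only.
SAMPLE = io.StringIO(
    "first example phrase,CLASS_A\n"
    "second example phrase,CLASS_A\n"
    "third example phrase,CLASS_B\n"
    ",CLASS_B\n"
)
counts, problems = summarize_training_csv(SAMPLE)
print(counts.most_common())
print(problems)
```

To run it against the real file, pass an open file handle instead of `SAMPLE`, for example `open("data/ICD-10-GT-AA.csv", newline="")`.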
From the new project Overview panel, click + Add to project on the top right and choose the Natural Language Classifier asset type.
A new instance of the NLC tool will launch.
Add the data to your project by clicking the Browse button in the right-hand Upload to project section and browsing to the cloned repo. Choose the ICD-10-GT-AA.csv file you just uploaded and choose Add to model.
Click the Train model button to begin training. The model will take around an hour to train.
To check the status of the model, and to access it after it trains, go to the Models section of the Assets tab in your project. The model will show up when it is ready. Double-click the model to open its details.
The first line of the Overview tab contains the Model ID. Remember this value, as we'll need it in the next step.
Follow the steps below for deploying the application:
Click the Deploy to IBM Cloud button below.
From the IBM Cloud deployment page click the
From the Toolchains menu, click the Delivery Pipeline to watch while the app is deployed. Once deployed, the app can be viewed by clicking View app.
The app and service can be viewed in the IBM Cloud dashboard. The app will be named nlc-icd10-classifier, with a unique suffix.
We now need to add a few environment variables to the application's runtime so the right classifier service and model are used. Click on the application from the dashboard to view its settings.
Once viewing the application, click the Runtime option on the menu and navigate to the Environment Variables section.
After saving the environment variables, the app will restart. After the app restarts you can access it by clicking the Visit App URL button.
The general recommendation for Python development is to use a virtual environment (venv). To install and initialize a virtual environment, use the venv module on Python 3 (for Python 2.7, install the virtualenv library):
Create the virtual environment using Python. Use one of the two commands depending on your Python version.
Note: it may be named python3 on your system.
python -m venv mytestenv # Python 3.X
virtualenv mytestenv # Python 2.X
Now source the virtual environment. Use one of the two commands depending on your OS.
source mytestenv/bin/activate # Mac or Linux
./mytestenv/Scripts/activate # Windows PowerShell
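To confirm that activation worked, one option is to ask the interpreter itself. The sketch below applies to environments created with the Python 3 venv module, where `sys.prefix` differs from `sys.base_prefix` inside a virtual environment. `in_virtual_environment` is a hypothetical helper name; the optional arguments exist only to make the check easy to exercise.

```python
import sys


def in_virtual_environment(prefix=None, base_prefix=None):
    """Return True when the interpreter prefix differs from the base install."""
    prefix = sys.prefix if prefix is None else prefix
    if base_prefix is None:
        # base_prefix exists on Python 3.3+; fall back to prefix otherwise.
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return prefix != base_prefix


print("virtual environment active:", in_virtual_environment())
```

Running this before and after sourcing the activate script should print different values.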
TIP 💡 To terminate the virtual environment use the deactivate command.
Rename the env.example file to .env:
mv env.example .env
Update the .env file with the NLC credentials, using either the username/password or the API key:
# Replace the credentials here with your own using either USERNAME/PASSWORD or IAM_APIKEY
# Comment out the unset environment variables
# Rename this file to .env before running app.py.
CLASSIFIER_ID=<add_nlc_classifier_id>
NATURAL_LANGUAGE_CLASSIFIER_APIKEY=<add_nlc_apikey>
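A missing or unedited setting is easier to diagnose if the app fails fast at startup. The following is a minimal sketch of such a check, not the way app.py necessarily reads its configuration. It covers only the API key path and assumes the two variable names shown above; `load_settings` is a hypothetical helper, and the example values are placeholders.

```python
import os

REQUIRED_VARIABLES = ("CLASSIFIER_ID", "NATURAL_LANGUAGE_CLASSIFIER_APIKEY")


def load_settings(environ=None):
    """Return the required settings, or raise if any is unset or a placeholder."""
    environ = os.environ if environ is None else environ
    missing = [
        name for name in REQUIRED_VARIABLES
        # Values still in the "<add_...>" template form count as missing.
        if not environ.get(name) or environ[name].startswith("<")
    ]
    if missing:
        raise RuntimeError("Missing or placeholder settings: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARIABLES}


# Example with placeholder values; call load_settings() with no argument
# to read the real process environment instead.
example = {
    "CLASSIFIER_ID": "example-classifier-id",
    "NATURAL_LANGUAGE_CLASSIFIER_APIKEY": "example-key",
}
settings = load_settings(example)
print(sorted(settings))
```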
Install the app dependencies by running:
pip install -r requirements.txt
Start the app by running python app.py.
Open a browser and point to
The user enters text into the Text to classify: text box, and Watson NLC returns ICD-10 classifications with confidence scores.
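As a rough illustration of how such results might be trimmed for display, the sketch below keeps only the highest-confidence classes. It assumes the classification result is a dictionary with a `classes` list whose items carry `class_name` and `confidence` keys. `top_classes`, the threshold, and the limit are hypothetical, and the sample values are placeholders rather than real output.

```python
def top_classes(response, min_confidence=0.05, limit=3):
    """Return up to `limit` (class_name, confidence) pairs, highest first."""
    classes = response.get("classes", [])
    kept = [c for c in classes if c.get("confidence", 0.0) >= min_confidence]
    kept.sort(key=lambda c: c.get("confidence", 0.0), reverse=True)
    return [(c["class_name"], round(c["confidence"], 3)) for c in kept[:limit]]


# Placeholder result for illustration only.
sample_response = {
    "classes": [
        {"class_name": "CODE_B", "confidence": 0.06},
        {"class_name": "CODE_A", "confidence": 0.91},
        {"class_name": "CODE_C", "confidence": 0.03},
    ]
}
results = top_classes(sample_response)
print(results)
```

Filtering on a minimum confidence avoids showing codes the classifier itself considers unlikely; the right threshold depends on the data and should be tuned rather than taken from this sketch.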
This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.