Project Name | Description | Stars | Most Recent Commit | License | Language |
---|---|---|---|---|---|
Sense2vec | 🦆 Contextually-keyed word vectors | 1,450 | 4 months ago | mit | Python |
Text Analytics With Python | Code and datasets used in the book "Text Analytics with Python" (Apress/Springer): learn how to process, classify, cluster, summarize, and understand the syntax, semantics, and sentiment of text data with Python | 1,073 | 2 years ago | apache-2.0 | Jupyter Notebook |
Adam_qas | ADAM, a question answering system inspired by IBM Watson | 298 | 3 years ago | gpl-3.0 | Python |
Textpipe | Clean and extract metadata from text | 290 | 2 years ago | mit | Python |
Concise Concepts | An easy and intuitive approach to few-shot NER using most-similar expansion over spaCy embeddings, now with entity scoring | 208 | 10 days ago | mit | Python |
Nlpbuddy | A text analysis application for performing common NLP tasks through a web dashboard interface and an API | 82 | 4 years ago | agpl-3.0 | HTML |
Nlp | Free hands-on course with Python implementations and descriptions of several NLP algorithms and techniques, covering several modern platforms and libraries | 74 | a year ago | mit | HTML |
Nlp_workshop_odsc_europe20 | Extensive tutorials for the Advanced NLP Workshop at Open Data Science Conference Europe 2020, applying machine learning, deep learning, and deep transfer learning to NER, classification, recommendation/information retrieval, summarization, language translation, Q&A, and topic models | 36 | 3 years ago | gpl-3.0 | Jupyter Notebook |
Stock Prediction | Technical and sentiment analysis to predict the stock market with machine learning models based on historical time-series data and news article sentiment collected using APIs and web scraping | 33 | 3 months ago | | Jupyter Notebook |
Pyresearchinsights | End-to-end NLP tool to analyze research publications | 15 | 7 months ago | mit | Python |
Generate topic models using open text automatically extracted from various file formats in disk images. This project uses The Sleuth Kit (sleuthkit/sleuthkit) to parse file systems in disk images, textract (https://textract.readthedocs.io/en/stable/) to extract text from common file formats, gensim (https://radimrehurek.com/gensim/) to generate topic models, and pyLDAvis (bmabey/pyLDAvis) for visualization.
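Conceptually, the pipeline boils down to text extraction followed by gensim topic modeling and pyLDAvis visualization. The following is a minimal sketch of the gensim/pyLDAvis stage only, assuming text has already been pulled out of a disk image into an extracted_files directory (a hypothetical path); it is not the project's bcnlp_tm.py, which also drives The Sleuth Kit and textract itself.

import glob
from gensim import corpora, models
import pyLDAvis
import pyLDAvis.gensim_models   # module name in recent pyLDAvis releases; older versions expose pyLDAvis.gensim

# Tokenize the extracted text files (extracted_files/ is a hypothetical directory).
docs = []
for path in glob.glob("extracted_files/*.txt"):
    with open(path, encoding="utf-8", errors="ignore") as f:
        docs.append([t for t in f.read().lower().split() if t.isalpha()])

# Build a bag-of-words corpus and train an LDA topic model.
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]
lda = models.LdaModel(corpus=corpus, id2word=dictionary, num_topics=10)

# Prepare the interactive visualization and write it to an HTML file.
vis = pyLDAvis.gensim_models.prepare(lda, corpus, dictionary)
pyLDAvis.save_html(vis, "topics.html")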
The topic model generation tool depends on a number of external natural language processing and digital forensics libraries. For convenience, we have included a script that installs all the required dependencies on Ubuntu 18.04 LTS. This script installs certain tools (TSK, libewf, and several others) by compiling and installing them from source.
On an Ubuntu host or a clean virtual machine, first make sure git is installed:
$ sudo apt-get install git
Next, follow these steps:
$ git clone https://github.com/bitcurator/bitcurator-nlp-gentm
$ cd bitcurator-nlp-gentm
$ sudo ./setup.sh
This repository includes a sample Expert Witness Format disk image (govdocs45sampler.E01) in the disk_images directory. If you do not make any changes to the configuration file, the topic modeler and visualization tool will be run on text extracted from files discovered in this image.
To run the tool against other disk images (EWF or raw), simply copy those images into the disk_images directory and edit the [image_section] of the configuration file (config.txt) to include the relevant files. For example, if you had two images named testimage1.E01 and testimage2.dd, the section would be modified as follows:
# Disk images to process (the default location can be changed in the following section)
[image_section]
testimage1.E01 = 1
testimage2.dd = 1
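For reference, the [image_section] entries follow standard INI syntax, so they can be read with Python's configparser. The snippet below is illustrative only and is not necessarily how bcnlp_tm.py parses the file.

import configparser

config = configparser.ConfigParser()
config.read("config.txt")

# Keys set to 1 under [image_section] are the disk images to process.
# Note: configparser lowercases key names by default.
images = [name for name, flag in config.items("image_section") if flag == "1"]
print(images)   # e.g. ['testimage1.e01', 'testimage2.dd']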
Run the following command to extract text from the configured file types, start the topic modeling tool, and load the results into a browser window.
$ python bcnlp_tm.py
Depending on the size of your corpus, this may take some time. You will see a range of log output and (possibly) deprecation warnings related to the operation of gensim and other tools. The tool is operating normally unless it drops back to a terminal prompt with an error.
The results, based on the text extracted from the configured file types and processed using pyLDAvis, will appear automatically in a browser window. When you are finished viewing them, you can terminate the server in the existing terminal by typing "Ctrl-X" followed by "Ctrl-C".
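The browser view is backed by pyLDAvis's built-in local server. In a standalone script, the equivalent library call would look something like the line below (using the vis object from the earlier sketch); this illustrates the underlying library behavior and is not the project's exact code.

import pyLDAvis
pyLDAvis.show(vis)   # serves the visualization locally and opens a browser; blocks until interrupted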
Additional adjustments can be performed with command-line flags:
Usage: python bcnlp_tm.py [--topics <10>] [--tm <gensim|graphlab>] [--infile </directory/path>] [--config </path/to/config-file/>]
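For example (an illustrative invocation with a placeholder directory path), a run that builds a 20-topic gensim model from files in a local directory might look like:
$ python bcnlp_tm.py --topics 20 --tm gensim --infile /path/to/extracted/files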
Additional project information can be found on the BitCurator NLP wiki at https://github.com/BitCurator/bitcurator-nlp/wiki.
The BitCurator logo, BitCurator project documentation, and other non-software products of the BitCurator team are subject to the Creative Commons Attribution 4.0 International license (CC BY 4.0).
Unless otherwise indicated, software items in this repository are distributed under the terms of the GNU Lesser General Public License, Version 3. See the text file "COPYING" for further details about the terms of this license.
In addition to software produced by the BitCurator team, BitCurator packages and modifies open source software produced by other developers. Licenses and attributions are retained here where applicable.
If your Ubuntu VM does not already have a desktop environment (graphical UI), you will need to install one in order to view the results in a browser:
$ sudo apt-get update
$ sudo apt-get install ubuntu-desktop