A Deep Graph-based Toolbox for Fraud Detection
Introduction
May 2021 Update: DGFraud has been upgraded to TensorFlow 2.0! Please check out DGFraud-TF2.
DGFraud is a Graph Neural Network (GNN) based toolbox for fraud detection. It integrates the implementation and comparison of state-of-the-art GNN-based fraud detection models. An introduction to the implemented models can be found here.
We welcome contributions that add new fraud detectors or extend the features of the toolbox. Some of the planned features are listed in the TODO list.
If you use the toolbox in your project, please cite one of the two papers below and the algorithms you used:
CIKM'20 (PDF)
@inproceedings{dou2020enhancing,
title={Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters},
author={Dou, Yingtong and Liu, Zhiwei and Sun, Li and Deng, Yutong and Peng, Hao and Yu, Philip S},
booktitle={Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM'20)},
year={2020}
}
SIGIR'20 (PDF)
@inproceedings{liu2020alleviating,
title={Alleviating the Inconsistency Problem of Applying Graph Neural Network to Fraud Detection},
author={Liu, Zhiwei and Dou, Yingtong and Yu, Philip S. and Deng, Yutong and Peng, Hao},
booktitle={Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
year={2020}
}
Useful Resources
git clone https://github.com/safe-graph/DGFraud.git
cd DGFraud
python setup.py install
* python 3.6, 3.7
* tensorflow>=1.14.0,<2.0
* numpy>=1.16.4
* scipy>=1.2.0
* networkx<=1.11
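
The version constraints above can be checked before installing. The following is a minimal sketch, not part of DGFraud: `REQUIREMENTS`, `version_tuple` and `satisfies` are hypothetical names, and the comparison only looks at the leading numeric part of each version component (so `1.14.0rc1` is treated as `1.14.0`).

```python
import operator
from itertools import takewhile

# Constraints copied from the requirements list above.
REQUIREMENTS = {
    "tensorflow": [(">=", "1.14.0"), ("<", "2.0")],
    "numpy": [(">=", "1.16.4")],
    "scipy": [(">=", "1.2.0")],
    "networkx": [("<=", "1.11")],
}

OPS = {">=": operator.ge, "<=": operator.le, "<": operator.lt,
       ">": operator.gt, "==": operator.eq}


def version_tuple(version, width=3):
    """Turn '1.14.0' into (1, 14, 0), padding short versions with zeros."""
    parts = []
    for piece in version.split(".")[:width]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, constraints):
    """Return True if `version` meets every (operator, bound) constraint."""
    v = version_tuple(version)
    return all(OPS[op](v, version_tuple(bound)) for op, bound in constraints)
```

For example, `satisfies("1.15.0", REQUIREMENTS["tensorflow"])` holds, while a 2.x TensorFlow version does not meet the `<2.0` bound.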
We use the pre-processed DBLP dataset from Jhy1993/HAN. You can run FdGars, Player2Vec, GeniePath and GEM on the DBLP dataset. Unzip the archive before using the dataset:
cd dataset
unzip DBLP4057_GAT_with_idx_tra200_val_800.zip
We implement example graphs for SemiGNN, GAS and GEM in data_loader.py, because those models require unique graph structures or node types that cannot be found in open-source datasets.
For GraphConsis, we preprocessed the Yelp Spam Review Dataset with reviews as nodes and three relations as edges.
The dataset in .mat format is located at /dataset/YelpChi.zip. The .mat file includes:
* net_rur, net_rtr, net_rsr: three sparse matrices representing the three homo-graphs defined in the GraphConsis paper;
* features: a sparse matrix of 32-dimension handcrafted features;
* label: a numpy array with the ground truth of nodes. 1 represents spam and 0 represents benign.
The YelpChi data preprocessing details can be found in our CIKM'20 paper. To get the complete metadata of the Yelp dataset, please email [email protected].
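
As an illustration of how the fields listed above might be read, here is a minimal sketch using `scipy.io.loadmat`. The function name `load_yelpchi` is hypothetical and is not part of the toolbox; the sketch assumes the archive has already been unzipped and that the .mat file stores the keys exactly as named above.

```python
import numpy as np
import scipy.io as sio
import scipy.sparse as sp

# Key names taken from the description of the .mat file above.
RELATION_KEYS = ("net_rur", "net_rtr", "net_rsr")


def load_yelpchi(mat_path):
    """Load relation graphs, features and labels from a YelpChi-style .mat file.

    Hypothetical helper: `mat_path` is wherever the unzipped .mat file lives.
    """
    data = sio.loadmat(mat_path)
    # One sparse adjacency matrix per relation (homo-graph).
    relations = {key: sp.csr_matrix(data[key]) for key in RELATION_KEYS}
    features = sp.csr_matrix(data["features"])
    # loadmat returns at least 2-D arrays, so flatten the labels.
    labels = np.asarray(data["label"]).ravel().astype(int)
    return relations, features, labels
```

Calling `relations, features, labels = load_yelpchi(path_to_mat)` would then give one adjacency matrix per relation, and `labels == 1` marks the spam reviews.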
You can find the implemented models in the algorithms directory. For example, you can run Player2Vec using:
python Player2Vec_main.py
You can specify parameters for models when running the code.
Have a look at the load_data_dblp() function in utils/utils.py for an example.
In order to use your own data, you have to provide:
You can specify a dataset as follows:
python xx_main.py --dataset your_dataset
or by editing xx_main.py
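
The `--dataset` option shown above could be wired up roughly as follows. This is a sketch under stated assumptions, not the toolbox's actual entry point: `build_parser`, `pick_loader` and the `"dblp"` default are hypothetical, and the real `xx_main.py` scripts may define their flags differently.

```python
import argparse


def build_parser():
    """Build a parser with a single --dataset flag (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Run a model on a chosen dataset")
    # Assumed default; check the actual xx_main.py for the real one.
    parser.add_argument("--dataset", type=str, default="dblp",
                        help="name of the dataset to load")
    return parser


def pick_loader(name, loaders):
    """Look up the loading function registered for `name`."""
    if name not in loaders:
        raise ValueError("unknown dataset: {}".format(name))
    return loaders[name]
```

A script would call `build_parser().parse_args()` and pass `args.dataset` to `pick_loader` together with a dictionary that maps dataset names to loading functions.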
The repository is organized as follows:
* algorithms/ contains the implemented models and the corresponding example code;
* base_models/ contains the basic models (GCN);
* dataset/ contains the necessary dataset files;
* utils/ contains:
  * data_loader.py;
  * utils.py.

Model | Paper | Venue | Reference |
---|---|---|---|
SemiGNN | A Semi-supervised Graph Attentive Network for Financial Fraud Detection | ICDM 2019 | BibTex |
Player2Vec | Key Player Identification in Underground Forums over Attributed Heterogeneous Information Network Embedding Framework | CIKM 2019 | BibTex |
GAS | Spam Review Detection with Graph Convolutional Networks | CIKM 2019 | BibTex |
FdGars | FdGars: Fraudster Detection via Graph Convolutional Networks in Online App Review System | WWW 2019 | BibTex |
GeniePath | GeniePath: Graph Neural Networks with Adaptive Receptive Paths | AAAI 2019 | BibTex |
GEM | Heterogeneous Graph Neural Networks for Malicious Account Detection | CIKM 2018 | BibTex |
GraphSAGE | Inductive Representation Learning on Large Graphs | NIPS 2017 | BibTex |
GraphConsis | Alleviating the Inconsistency Problem of Applying Graph Neural Network to Fraud Detection | SIGIR 2020 | BibTex |
HACUD | Cash-Out User Detection Based on Attributed Heterogeneous Information Network with a Hierarchical Attention Mechanism | AAAI 2019 | BibTex |

Model | Application | Graph Type | Base Model |
---|---|---|---|
SemiGNN | Financial Fraud | Heterogeneous | GAT, LINE, DeepWalk |
Player2Vec | Cyber Criminal | Heterogeneous | GAT, GCN |
GAS | Opinion Fraud | Heterogeneous | GCN, GAT |
FdGars | Opinion Fraud | Homogeneous | GCN |
GeniePath | Financial Fraud | Homogeneous | GAT |
GEM | Financial Fraud | Heterogeneous | GCN |
GraphSAGE | Opinion Fraud | Homogeneous | GraphSAGE |
GraphConsis | Opinion Fraud | Heterogeneous | GraphSAGE |
HACUD | Financial Fraud | Heterogeneous | GAT |
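
To pick a model for a given setting, the comparison table above can be expressed as a small lookup. This is only an illustrative sketch: `MODELS` and `find_models` are hypothetical names, and the entries simply restate the table.

```python
# (application, graph type, base models), copied from the comparison table.
MODELS = {
    "SemiGNN": ("Financial Fraud", "Heterogeneous", ["GAT", "LINE", "DeepWalk"]),
    "Player2Vec": ("Cyber Criminal", "Heterogeneous", ["GAT", "GCN"]),
    "GAS": ("Opinion Fraud", "Heterogeneous", ["GCN", "GAT"]),
    "FdGars": ("Opinion Fraud", "Homogeneous", ["GCN"]),
    "GeniePath": ("Financial Fraud", "Homogeneous", ["GAT"]),
    "GEM": ("Financial Fraud", "Heterogeneous", ["GCN"]),
    "GraphSAGE": ("Opinion Fraud", "Homogeneous", ["GraphSAGE"]),
    "GraphConsis": ("Opinion Fraud", "Heterogeneous", ["GraphSAGE"]),
    "HACUD": ("Financial Fraud", "Heterogeneous", ["GAT"]),
}


def find_models(application=None, graph_type=None):
    """Return model names matching the given application and/or graph type."""
    return [
        name
        for name, (app, gtype, _base) in MODELS.items()
        if (application is None or app == application)
        and (graph_type is None or gtype == graph_type)
    ]
```

For instance, `find_models(graph_type="Homogeneous")` lists the models in the table that operate on homogeneous graphs.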
You are welcome to contribute to this open-source toolbox. Detailed instructions will be released soon. For now, you can create issues or email [email protected] with inquiries.