Awesome Open Source
Search results for ai safety
40 search results found
Awesome Machine Learning Interpretability (⭐ 3,241): A curated list of awesome responsible machine learning resources.

Giskard (⭐ 2,509): 🐢 The testing framework for ML models, from tabular to LLMs.

Safe Rlhf (⭐ 1,040): Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback.

Tiger (⭐ 337): Open-source LLM toolkit for building trustworthy LLM applications: TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning).

Guardrail (⭐ 278): Build LLM apps safely and securely. 🛡️

Thought Cloning (⭐ 217): [NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking.

Llm_rules (⭐ 172): RuLES: a benchmark for evaluating rule-following in language models.

Make Safe Ai (⭐ 143): How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚

Pretraining With Human Feedback (⭐ 97): Code accompanying the paper "Pretraining Language Models with Human Preferences."

Ethics (⭐ 94): Aligning AI With Shared Human Values (ICLR 2021).

Diffattack (⭐ 92): An unrestricted attack based on diffusion models that achieves both good transferability and imperceptibility.

Toolemu (⭐ 73): A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use.

Awesome Ai Safety (⭐ 64): A curated list of papers & technical articles on AI quality & safety. 📚

Safenlp (⭐ 57): Safety score for pre-trained language models.

Ais (⭐ 55): Toolkit for research purposes in AIS. See the website for the paper.

Entropic Out Of Distribution Detection (⭐ 47): Add scalable, state-of-the-art out-of-distribution detection (open-set recognition) support by changing two lines of code. Performs efficient inference (no increase in inference time) and detection without a drop in classification accuracy, hyperparameter tuning, or collecting additional data.

Flat (⭐ 46): [ICCV 2021 Oral] Fooling LiDAR by Attacking GPS Trajectory.

Rain (⭐ 40): [ICLR '24] RAIN: Your Language Models Can Align Themselves without Finetuning.

Awesome Ai Alignment (⭐ 37): A curated list of awesome resources for getting started with, and staying in touch with, Artificial Intelligence Alignment research.

Distinction Maximization Loss (⭐ 34): Improve out-of-distribution detection (open-set recognition) and uncertainty estimation by changing a few lines of code in your project. Performs efficient inference (no increase in inference time) without repetitive model training, hyperparameter tuning, or collecting additional data.

Cat (⭐ 30): [CoRL '23] Adversarial Training for Safe End-to-End Driving.

Sparse Probing Paper (⭐ 29): Full code for the sparse probing paper.

Stampy Ui (⭐ 28): AI Safety Q&A web frontend.

Ai Principles (⭐ 19): Alpha principles for the ethical use of AI and data-driven technologies in Ontario.

Aiwatch (⭐ 19): Website to track people, organizations, and products (tools, websites, etc.) in AI safety.

La Mbda (⭐ 16): LAMBDA is a model-based reinforcement learning agent that uses Bayesian world models for safe policy optimization.

Fssd_ood_detection (⭐ 14): Feature Space Singularity for Out-of-Distribution Detection.

Contranet (⭐ 14): The official implementation of ContraNet (NDSS 2022).

Beavertails (⭐ 12): BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).

Promptinject (⭐ 11): PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks.

Avg Avg (⭐ 11): [Findings of EMNLP 2022] Holistic Sentence Embeddings for Better Out-of-Distribution Detection.

Llm Cooperation (⭐ 9): Code and materials for the paper S. Phelps and Y. I. Russell, "Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics," working paper, arXiv:2305.07970, May 2023.

Safe Reward (⭐ 8): A prototype AI safety library that allows an agent to maximize its reward by solving a puzzle, in order to prevent the worst-case outcomes of perverse instantiation.

Universal Neurons (⭐ 8): Universal Neurons in GPT-2 Language Models.

Daisybell (⭐ 8): Scan your AI/ML models for problems before you put them into production.

Lawlia (⭐ 7): LAWLIA is an open-source computational legal framework that combines large language models with a structured syntactical grammar to facilitate precise legal assessments, truth values, and verdicts.

Amplification (⭐ 7): An implementation of iterated distillation and amplification.

Agi Safety Governance Practices (⭐ 6): Analysis of the survey "Towards best practices in AGI safety and governance: A survey of expert opinion."

Mithridates (⭐ 5): Measure and boost backdoor robustness.

Ais (⭐ 5): Common repository for our readings and discussions.
Related Searches
Python Ai Safety (17)
Jupyter Notebook Ai Safety (8)
Llm Ai Safety (8)
Artificial Intelligence Ai Safety (7)
Deep Learning Ai Safety (7)
Natural Language Processing Ai Safety (7)
Language Model Ai Safety (5)
Machine Learning Ai Safety (5)
Anomaly Detection Ai Safety (5)
Copyright 2018-2024 Awesome Open Source. All rights reserved.