Shortcuts
General AI
AI Safety and Transparent and Explainable AI (T/XAI): Foundation Model Transparency Index (FMTI); (XAITK-Saliency); XNLP: XAI for Natural Language Processing
AI Art Generators: AIGCBench; Ziqi Huang's Awesome Evaluation of Visual Generation; OpenAI's DALL·E 2; Bing Image Creator; Stability AI (Stable Diffusion, Visual ChatGPT); Midjourney; Dream by WOMBO
Natural Language Processing and Computational Linguistics
LLMs: 🤗 Open LLM Leaderboard; LLM-Leaderboard (community-based); Awesome Machine Generated Text; A Survey on Language, Multimodal, and Scientific GPT Models: Examining User-Friendly and Open-Sourced Large GPT Models; Prompt Engineering Guide; Awesome papers on LLMs detection; A Survey of Attributions for Large Language Models (2023); Factcheck-GPT (2023); DetectGPT (2022)
General Tools: NLTK (Natural Language Toolkit); spaCy (GitHub); Natural; CogCompNLP; Hugging Face (datasets; Write With Transformer); Talk to Transformer (InferKit online demo); quanteda: Quantitative Analysis of Textual Data in R (GitHub); gensim – Topic Modelling in Python; Transformer-XL; bert-as-service; BERTweet: A pre-trained language model for English Tweets (EMNLP 2020); RNNTagger; TreeTagger; Python Word Segmentation; Word Ninja; SymSpell (Python port: symspellpy); Language Style Transfer (NIPS 2017); GeoTxt (Transactions in GIS 2019); Edinburgh Geoparser; GeoPy; XAI for Natural Language Processing (AACL-IJCNLP 2020); Google's BERT (GitHub); (WuDaoCorpora; GitHub, GLM, CLM; BMInf); BERTective (EACL 2021); mauve-experiments (NeurIPS 2021)
Chinese NLP Resources: Pre-trained model repository (预训练模型仓库); Baidu ERNIE; Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab (Pengcheng PanGu-α); awesome-chinese-nlp (Guan Wang); Jieba Chinese word segmentation (“结巴”中文分词); THUAIPoet (Jiuge, 九歌) research group (Jiuge V2.0; BERT-CCPoem, MixPoet @ AAAI 2020, Stylistic Poetry @ EMNLP 2018, WMPoetry @ IJCAI 2018; CCPM = Chinese Classical Poetry Matching Dataset, Other datasets); Xiaoice, the girl poet (少女诗人小冰); tensorflow_poems / LiBai AI Composer / automatic classical Chinese poetry generator; Small Chinese corpus datasets (中文语料小数据)
Datasets: Nicolas Iderhoff's nlp-datasets; WordNet; Wikimedia Downloads (Frequency lists); Amazon MASSIVE dataset; WebNLG Challenge; Wiktextract (data @ kaikki.org); Use of corpora in translation studies @ Centre for Translation Studies, University of Leeds; OpenLexicon; Lexique (WorldLex: Blog, Twitter and Newspapers Word Frequencies for 66 languages); Datasets of Automatic Keyphrase Extraction @ LIAAD, INESC TEC; KPTimes Corpus @ INLG 2019; dewiki-wordrank; OAGSX Title Generation Dataset; OAGKX Keyword Generation Dataset; GeoNames; Awesome LLM-generated Text Detection (2023); Awesome papers on LLMs detection (2023); M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection (2023)
Privacy-related resources: (PrivaSeer Corpus @ ACL 2021, PrivBERT @ ACL 2021)
Federated Learning
General Resources: Awesome-Federated-Learning; The Federated Learning Portal
Open-source Tools: TensorFlow Federated (TFF) (GitHub); NVIDIA Clara; FedML: A Research Library and Benchmark for Federated Machine Learning (GitHub); (Federated Learning Research at WeBank AI)
Commercial Solutions: