Cyberspace of Shujun LI
Shortcuts
General AI
Transparent and Explainable AI (T/XAI):
Foundation Model Transparency Index (FMTI)
![XAITK](https://xaitk.org/assets/kitware/images/xaitk-wordmark-light.png) (XAITK-Saliency)
XNLP: XAI for Natural Language Processing
AI Art Generators:
OpenAI's DALL·E 2
Bing Image Creator
Stability AI (Stable Diffusion, Visual ChatGPT)
Midjourney
Dream by WOMBO
Natural Language Processing and Computational Linguistics
General Tools:
NLTK (Natural Language Toolkit)
spaCy (![textacy: NLP, before and after spaCy](https://textacy.readthedocs.io/en/latest/_static/textacy_logo.png))
![PyTorch-NLP](https://raw.githubusercontent.com/PetrochukM/PyTorch-NLP/master/docs/_static/img/logo.svg) (GitHub)
Natural
CogCompNLP
Hugging Face (datasets; Write With Transformer)
Talk to Transformer (InferKit online demo)
quanteda: Quantitative Analysis of Textual Data in R (GitHub)
gensim – Topic Modelling in Python
Transformer-XL
bert-as-service
BERTweet: A pre-trained language model for English Tweets (EMNLP 2020)
RNNTagger
TreeTagger
Python Word Segmentation
Word Ninja
SymSpell (Python port: symspellpy)
Language Style Transfer (NIPS 2017)
GeoTxt (Transactions in GIS 2019)
Edinburgh Geoparser
GeoPy
XAI for Natural Language Processing (AACL-IJCNLP 2020)
A Survey of Attributions for Large Language Models (2023)
Factcheck-GPT (2023)
DetectGPT (2022)
BERTective (EACL 2021)
mauve-experiments (NeurIPS 2021)
Pretrained Models:
🤗 Open LLM Leaderboard
LLM-Leaderboard (community-based)
Awesome Machine Generated Text
A Survey on Language, Multimodal, and Scientific GPT Models: Examining User-Friendly and Open-Sourced Large GPT Models
Awesome papers on LLMs detection
Pretrained Model Repository (预训练模型仓库)
OpenAI's ChatGPT
Google's BERT
![Microsoft DeepSpeed](https://www.deepspeed.ai/assets/images/deepspeed-logo-uppercase-bold-white-1.15.svg) (GitHub)
![悟道 (Wudao)](https://wudaoai.cn/assets/pc/wudao.svg) (WuDaoCorpora; GitHub, GLM, CLM; BMInf)
Chinese NLP Resources:
Baidu ERNIE
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab (鹏程.盘古α / PanGu-α)
awesome-chinese-nlp (Guan Wang)
Jieba (“结巴”) Chinese word segmentation
THUAIPoet (Jiuge, 九歌) research group (Jiuge V2.0; BERT-CCPoem, MixPoet @ AAAI 2020, Stylistic Poetry @ EMNLP 2018, WMPoetry @ IJCAI 2018; CCPM = Chinese Classical Poetry Matching Dataset, Other datasets)
Xiaoice (小冰), the girl poet
tensorflow_poems / LiBai AI Composer / automatic classical Chinese poetry writing robot
Small Chinese corpus datasets (中文语料小数据)
Datasets:
Nicolas Iderhoff's nlp-datasets
WordNet
Wikimedia Downloads
![Wiktionary](https://upload.wikimedia.org/wikipedia/meta/3/30/Wiktionary-logo-tiles_2x.png) (Frequency lists)
Amazon MASSIVE dataset
WebNLG Challenge
Wiktextract (data @ kaikki.org)
Use of corpora in translation studies @ Centre for Translation Studies, University of Leeds
OpenLexicon
Lexique (WorldLex: Blog, Twitter and Newspapers Word Frequencies for 66 languages)
Datasets of Automatic Keyphrase Extraction @ LIAAD, INESCTEC
KPTimes Corpus @ INLG 2019
dewiki-wordrank
OAGSX Title Generation Dataset
OAGKX Keyword Generation Dataset
GeoNames
Awesome LLM-generated Text Detection (2023)
Awesome papers on LLMs detection (2023)
M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection (2023)
Privacy-related resources:
![PrivaSeer](https://privaseer.ist.psu.edu/static/privaseer-logo-blue.jpg) (PrivaSeer Corpus @ ACL 2021, PrivBERT @ ACL 2021)
Federated Learning
General Resources:
Awesome-Federated-Learning
The Federated Learning Portal
Open-source Tools:
TensorFlow Federated (TFF) (GitHub)
NVIDIA Clara
FedML: A Research Library and Benchmark for Federated Machine Learning
![FedML-AI](https://avatars.githubusercontent.com/u/69099025?s=200&v=4) (GitHub)
![WeBank AI's Federated AI Ecosystem](https://avatars.githubusercontent.com/u/54675540?s=200&v=4) (Federated Learning Research at WeBank AI)
Commercial Solutions:
Disclaimer
All information on this website is for personal use, and Shujun Li is not responsible for any misuse of the information provided. The links listed on any page do not indicate any personal recommendation for any purpose to visitors of this website, as each link is included for a different reason meaningful for Shujun Li's personal use. Logo files of websites are used to facilitate recognition of the external links and do not represent endorsement by the corresponding websites of the content of this website. If the use of any logo file violates the copyrights or policies of any individuals or organisations, please contact Shujun Li so that he can remove the logo file or the whole link. Please also help report broken links and broken images on this website.