nlp_chinese_corpus
The premium open-source alternative to LDC (Linguistic Data Consortium)
🎯 Best for: AI researchers training Chinese language models.
What is nlp_chinese_corpus?
nlp_chinese_corpus replaces proprietary Chinese-language datasets for training machine learning models. It provides multi-gigabyte text corpora, including news and social media text, for NLP tasks.
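Because the corpora are multi-gigabyte, it usually makes sense to stream them rather than load a whole file into memory. The following is a minimal sketch, assuming a corpus file stored as JSON Lines (one JSON object per line) with the document text under a "content" key. Both the file layout and the field name are assumptions for illustration, and `iter_documents` is a hypothetical helper, not part of the project; check the actual files before relying on it.

```python
import io
import json
from typing import Iterable, Iterator


def iter_documents(lines: Iterable[str], text_field: str = "content") -> Iterator[str]:
    """Yield the text of each record, one line at a time.

    Assumes JSON Lines input; `text_field` is an assumed field name.
    Blank lines, malformed JSON, and records without usable text are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate the occasional corrupt line in a large dump
        if not isinstance(record, dict):
            continue
        text = record.get(text_field)
        if isinstance(text, str) and text:
            yield text


if __name__ == "__main__":
    # Stand-in for a real corpus file; with real data, pass an open file
    # handle instead, e.g. open(path, encoding="utf-8").
    sample = io.StringIO('{"content": "你好"}\n\nnot json\n{"content": "世界"}\n')
    for doc in iter_documents(sample):
        print(doc)
```

Since a file handle is itself an iterable of lines, only one line is held in memory at a time, regardless of how large the corpus file is.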
Tech Stack
Unknown
AI, ML & Data
Why nlp_chinese_corpus?
- Massive data volume
- Diverse sources
- Pre-cleaned formats
Limitations
- Large storage requirement
- Chinese language only
- Static dataset
Last Update: 3/5/2026
Forks: 1,556
Issues: 23
License: MIT
Financial Leak Detected
Stop the "SaaS Tax"
Your team could be burning cash. Switching to nlp_chinese_corpus instantly boosts your runway.
Competitor Cost: -$1,440 / year (est. based on LDC)
Self-Hosted: $0 / year
Team Size: 10 Users
SAVE 100%