Text Mining

Text Mining and Computational Text Analysis

This course introduces text preprocessing, keyword extraction, topic modeling, text classification, sentiment analysis, and LLM-assisted evidence extraction. It is designed for research tasks involving academic literature, policy documents, patents, online communities, and business texts.

TM · Graduate students / text analysis beginners · Chinese with English terms

Course Introduction

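The core goal of text mining is to turn unstructured text into an information resource that can be analyzed, interpreted, and modeled. It can serve keyword identification, topic discovery, and sentiment analysis, and it can further support knowledge discovery, literature reviews, policy analysis, and user profiling.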

This course connects technical procedures with research design. It covers not only algorithms but also how textual evidence can support a convincing academic argument.

Outline

Course Directory

Click a section to open the Markdown-based teaching document.

Module 1

Text Data and Preprocessing

Learn how to transform raw textual materials into analyzable research data.

TM-01

Text Data and Preprocessing

Learn how to collect, clean, segment, and structure text data for computational analysis.


Module 2

Topic Modeling and Text Classification

Understand classical and neural approaches to extracting semantic structures from texts.

TM-02

Topic Modeling and Semantic Structure Discovery

Understand topic modeling as a way to discover latent semantic structures in large-scale text collections.


Module 3

Research Design with Text Mining

Apply text mining to literature review, policy analysis, patent intelligence, and online community research.

TM-03

LLM-Assisted Evidence Extraction

Learn how to use large language models as theory-constrained evidence extractors rather than free-form summarizers.
