Mining Structured Knowledge from Massive Text Data: A Data-Driven Approach

Jiawei Han | Tuesday, July 16

Michael Aiken Chair Professor, Computer Science, University of Illinois at Urbana-Champaign

Real-world big data is largely unstructured, interconnected, and dynamic, in the form of natural language text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers rely on labor-intensive labeling and curation to extract knowledge from such data. However, such approaches may not scale, especially since many text corpora are highly dynamic and domain-specific. Fortunately, massive text data itself may disclose a large body of hidden patterns, structures, and knowledge. Equipped with domain-independent and domain-dependent knowledge bases, we can explore the power of massive data itself to turn unstructured data into structured knowledge.

In this lecture, we introduce a data-driven approach and a set of recently developed methods for exploring the power of big text data, including quality phrase mining, recognition and typing of entities and relations by distant supervision, pattern-based information extraction, multi-faceted taxonomy discovery, construction of multi-dimensional text cubes and networks, and their associated knowledge generation. We show that massive text data can be powerful at disclosing patterns and structures, and that exploiting this power is a promising way to turn such data into structured knowledge.

Outline of the lecture:

PART 1: Introduction

Why is mining structures in text a key problem for “turning big data to knowledge”?
Why a data-driven approach to text mining?

PART 2: Automated Phrase Mining

Different approaches to mining quality phrases
AutoPhrase: Exploring the power of distant supervision

PART 3: Automated Entity/Relation Recognition

Entity/relation recognition: weakly/distantly supervised approaches
Meta-pattern discovery and embedding in entity recognition

PART 4: Text Classification and Text Cube Construction

Embedding and text similarity
Text classification: Doc2Cube and WeSTClass approaches
Taxonomy generation: Set expansion, synonym discovery, and taxonomy mining
Text cube construction

PART 5: Exploring Multidimensional Structures for Knowledge Discovery

Multidimensional text analysis
User-guided topic mining

PART 6: Looking into the future

Multi-dimensional text-intensive knowledge network construction and exploration
