Quantifying Confounding Characteristics (of Language) in Unstructured Text

On Friday, April 21, 2017, Dr. Kobi Abayomi will give a presentation entitled “Quantifying Confounding Characteristics (of Language) in Unstructured Text” in AS 107.

Consistent and reliable methodologies for quantifying confoundedness in unstructured text across language features remain relatively unexplored in Natural Language Processing (NLP) work.
While automated measurement of syntactical features such as voice, tense, and lexical difficulty is well understood, less is known about how to quantify these features in the presence of confounding, and less still across multivariate modes of confoundedness. This work quantifies the presence of confounding characteristics of four types:

Neologisms: The presence of words nascent to common dictionaries.
Intentional Spelling Mistakes: Spelling errors beyond and separate from ordinary error.
Grammatical Errors: Syntactical errors.
Sarcasm: Dissonance between written and received intent.

We demonstrate the ability to find and measure confoundedness of each of these four types. A fortiori, for each factor, we present univariate and multivariate formulae that map the measurements of confoundedness into a single number, which we call a Q-score, reflecting the overall confoundedness of that factor. We conclude optimistically: we find reliable methods for quantifying confoundedness, and they illustrate valid variations across type and within sources. We believe more work at the balance of rule-based tokenization and statistical classification is needed, especially for sarcasm quantification, to advance the art.

Date: April 21, 2017
Time: 1:15-2:15 p.m.
Location: AS 107

