Mapping Texts

Thursday, 4 April 2024

I have just started reading Mapping Texts: Computational Text Analysis for the Social Sciences (Stoltz & Taylor, 2024, Oxford University Press) [at bol.com]. I bought it because it came by on a feed somewhere, with praise for the new metaphor the authors use (p. xiii):

Metaphors orient us. Many guides to text analysis use the mining metaphor. When mining, we search for a valuable vein to extract from valueless gangue. [...] Mapping, by contrast, is not about extraction. It is about reduction to aid interpretation. When mapping texts, we simplify their information, but always for particular uses. Many useful cartographies are based on the same territory: road maps, contour maps, political maps, to name a few. Wrangling, pruning, stopping, and transforming text all involve a decision informed by a particular goal. To put it plainly: there is not a sole kernel of truth to be extracted, but rather a range of empirical patterns. [...] While scale can undoubtedly be useful, iteration is the unsung hero of computational methods.

This idea immediately made me see "text mining" in a different light, because yes: what I want to do with computational text analysis is simplify information and map out references.
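A minimal sketch of that idea in Python, not taken from the book: the same short text is reduced in two different ways, each suited to a different purpose. The sample text, the stop list, and the function names are all invented for this illustration.

```python
import re
from collections import Counter

# Hypothetical sample text and stop list, invented for this illustration.
TEXT = "The map is not the territory. The map is a reduction of the territory."
STOPWORDS = {"the", "is", "a", "of", "not"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def content_word_map(text):
    """One 'map': counts of the words that are not on the stop list."""
    return Counter(t for t in tokenize(text) if t not in STOPWORDS)


def function_word_map(text):
    """Another 'map' of the same text: counts of the stop words only."""
    return Counter(t for t in tokenize(text) if t in STOPWORDS)


if __name__ == "__main__":
    print(content_word_map(TEXT))
    print(function_word_map(TEXT))
```

Neither reduction is the "true" one. The first might serve a question about what a text is about, the second a question about style, and both start from the same territory.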

Reading on, Stoltz and Taylor also give the following interesting definition of what text data and text metadata are (p. 13):

[...] sources of variation may be a principle for delimiting a corpus or a principle for balancing a corpus. [...] These sources of variation may be more internal to a text's content or more external to this content. We use this observation to organize our sources of variation. Text metadata is information associated with a text, but perhaps only indirectly derived from the text's contents (emphasis added), including: authors and audiences, publication location and date, and domain and media. Text data is information derived from the text's contents, including: languages and dialects, genres and topics, registers and styles.

I had categorized topic and language as metadata, but their classification is based on what can be derived from within the text itself (data) and what is more contextual (metadata).

An interesting difference in definition!
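The split between text data and text metadata described above can be sketched as a small record type in Python. This is only an illustration of the quoted categories, not a structure proposed by the book: the class names, field names, and example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextMetadata:
    """Context around a text, only indirectly derivable from its contents."""
    authors: list[str]
    audience: str
    publication_location: str
    publication_date: str
    domain: str
    medium: str


@dataclass
class TextData:
    """Properties derived from the text's own contents."""
    language: str
    genre: str
    topics: list[str]
    register: str


@dataclass
class CorpusText:
    """A single text in a corpus, with both kinds of information attached."""
    content: str
    metadata: TextMetadata
    data: TextData


# Hypothetical example values, invented for this illustration.
example = CorpusText(
    content="...",
    metadata=TextMetadata(
        authors=["(unknown)"],
        audience="general readers",
        publication_location="(unknown)",
        publication_date="2024-04-04",
        domain="social science",
        medium="blog",
    ),
    data=TextData(
        language="nl",
        genre="blog post",
        topics=["computational text analysis"],
        register="informal",
    ),
)
```

Under this layout, language and topic sit in `TextData` rather than `TextMetadata`, which mirrors the book's classification and not the one I had been using.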

So I am only on page 13, and I will update this post later with new insights.