Show simple item record

dc.contributor.author: KANTNER, Cathleen
dc.contributor.author: KUTTER, Amelie
dc.contributor.author: HILDEBRANDT, Andreas
dc.contributor.author: PÜTTCHER, Mark
dc.date.accessioned: 2016-03-15T13:46:23Z
dc.date.available: 2016-03-15T13:46:23Z
dc.date.issued: 2011
dc.identifier.issn: 2192-7278
dc.identifier.uri: https://hdl.handle.net/1814/40289
dc.description.abstract: Large digital text samples are promising sources for text-analytical research in the social sciences. However, they may turn out to be very troublesome when not cleaned of the 'noise' of doublets and sampling errors, which induce biases and distort the reliability of content-analytical results. This paper claims that these problems can be remedied by making innovative use of computational and corpus-linguistic procedures. Automatic pairwise document comparison based on a vector space model will bring doublets to light, while sampling errors can be discerned with the help of text-mining procedures that measure the 'keyness' of a document, i.e. the degree to which it contains or does not contain keywords representing the research topic.
dc.language.iso: en
dc.relation.ispartofseries: International Relations Online Working Paper
dc.relation.ispartofseries: 2011/2
dc.relation.uri: http://www.uni-stuttgart.de/soz/ib/forschung/IRWorkingPapers/
dc.rights: info:eu-repo/semantics/openAccess
dc.title: How to get rid of the noise in the corpus : cleaning large samples of digital newspaper texts
dc.type: Working Paper
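
The abstract above names two cleaning procedures: pairwise document comparison in a vector space model to detect doublets, and a keyness measure to detect sampling errors. As a rough, non-authoritative illustration only, the Python sketch below flags near-doublets via cosine similarity over simple bag-of-words term-frequency vectors and computes a crude keyness proxy as the share of tokens matching a topic keyword list. All function names, the similarity threshold, and the keyword list are hypothetical choices made for this sketch; the paper's actual corpus-linguistic procedures may differ.

```python
import math
import re
from collections import Counter
from itertools import combinations

def term_vector(text):
    """Build a simple term-frequency (bag-of-words) vector for one document."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    return Counter(tokens)

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two term-frequency vectors."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def find_doublets(documents, threshold=0.95):
    """Pairwise comparison: flag document pairs whose similarity exceeds the threshold."""
    vectors = {doc_id: term_vector(text) for doc_id, text in documents.items()}
    doublets = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(vectors.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            doublets.append((id_a, id_b, score))
    return doublets

def keyness(text, keywords):
    """Crude keyness proxy: share of tokens that belong to the topic keyword list."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in keywords)
    return hits / len(tokens)

if __name__ == "__main__":
    docs = {
        "a1": "EU leaders debated a joint military intervention in the crisis region.",
        "a2": "EU leaders debated a joint military intervention in the crisis region.",  # doublet of a1
        "a3": "The local football club celebrated its centenary with a street party.",
    }
    topic_keywords = {"military", "intervention", "crisis", "defence"}  # hypothetical keyword list
    print("Possible doublets:", find_doublets(docs))
    for doc_id, text in docs.items():
        print(doc_id, "keyness:", round(keyness(text, topic_keywords), 3))
```

A high threshold (here 0.95) would catch verbatim and near-verbatim doublets such as agency copy reprinted across newspapers, while documents with a keyness of zero would be candidates for sampling errors; both cut-offs are illustrative assumptions, not values taken from the paper.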


Files associated with this item


There are no files associated with this item.
