A Statistical Investigation into the Cross-Linguistic Distribution of Mass and Count Nouns: Morphosyntactic and Semantic Perspectives

Ritwik Kulkarni, Susan Rothstein, Alessandro Treves


We collected a database of how 1,434 nouns are used with respect to the mass/count distinction in six languages; additional informants characterized the semantics of the underlying concepts. Results indicate only weak correlations between semantics and syntactic usage. In five out of the six languages, roughly half the nouns in the database are used as pure count nouns in all respects; the other half differ from pure counts over distinct syntactic properties, with fewer nouns differing on more properties, and typically very few at the pure mass end of the spectrum. Such a graded distribution is similar across languages, but syntactic classes do not map onto each other, nor do they reflect, beyond weak correlations, semantic attributes of the concepts. Considerable variability is seen even among speakers of the same language. These findings are in line with the hypothesis that much of the mass/count syntax emerges from language- and even speaker-specific grammaticalization.


mass/count; entropy; variance; distribution

