Monojit Choudhury, Diptesh Chatterjee, and Animesh Mukherjee
Word co-occurrence networks are among the most commonly studied linguistic networks, and they are known to exhibit several interesting topological characteristics. In this article, we investigate the global topological properties of word co-occurrence networks and, in particular, present a detailed study of their spectrum. Our experiments reveal certain universal trends across the networks for seven different languages from three different language families, trends that have been neither reported nor explained by any previous study or model of word co-occurrence networks. We hypothesize that, since word co-occurrences are governed by the syntactic properties of a language, the network has a much more constrained topology than that predicted by the previously proposed growth model. A deeper empirical and theoretical investigation into the evolution of these networks further suggests that they have a core-periphery structure, in which the core hardly evolves with time and new words attach only to the periphery of the network. These properties are fundamental to the nature of word co-occurrence across languages.
Published in: Coling 2010
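
To make the objects named in the abstract concrete, the following is a minimal sketch of a word co-occurrence network and one quantity from its spectrum. The abstract does not describe how the networks were built, so everything here is an assumption made for illustration: edges join adjacent words within a sentence, edges are unweighted and undirected, and the function names `cooccurrence_network` and `principal_eigenvalue` are hypothetical. Power iteration is used only as a simple stand-in that estimates the largest adjacency eigenvalue; it is not the paper's method, which studies the full spectrum.

```python
import math
from collections import defaultdict


def cooccurrence_network(sentences):
    """Build an undirected, unweighted graph as a dict of neighbor sets.

    Assumption: two distinct words are linked if they appear next to
    each other in at least one sentence (whitespace tokenization).
    """
    adj = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            if a != b:  # skip self-loops from repeated words
                adj[a].add(b)
                adj[b].add(a)
    return adj


def principal_eigenvalue(adj, iterations=200):
    """Estimate the largest absolute adjacency eigenvalue by power iteration.

    For a unit vector x, the norm of A x converges to the largest
    absolute eigenvalue of the adjacency matrix A.
    """
    nodes = sorted(adj)
    if not nodes:
        return 0.0
    x = {v: 1.0 / math.sqrt(len(nodes)) for v in nodes}  # unit start vector
    estimate = 0.0
    for _ in range(iterations):
        # y = A x: each node sums the values of its neighbors
        y = {v: sum(x[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        if norm == 0.0:
            return 0.0
        estimate = norm
        x = {v: val / norm for v, val in y.items()}
    return estimate


# Toy corpus (hypothetical data, for illustration only)
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]
network = cooccurrence_network(corpus)
degrees = {word: len(neighbors) for word, neighbors in network.items()}
top_eigenvalue = principal_eigenvalue(network)
```

A real study would use a large corpus and a numerical library to obtain all eigenvalues; the dictionary-of-sets representation is chosen here only to keep the sketch dependency-free.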