Description
© 2015 Elsevier Ltd. With the increasing amount of information, mainly due to the explosive growth of the Internet, the demand for automatic text analysis applications has also grown. Complex networks are one of the tools that have gained importance in understanding problems in this area. They combine graph theory and statistical methods to model important problems. Across several research fields, complex networks are studied from various points of view, such as network topology, extraction of physical features and statistics, specific applications, comparison of metrics, and the study of physical phenomena. Linguistics is one area that has received great attention, particularly because of its close relationship with issues arising from the emergence of large text databases. Many studies have therefore emerged on modeling complex networks in this area, increasing the demand for efficient algorithms for feature extraction, observation of network dynamics, and comparison of behavior across different types of languages. Some works on specific languages, such as English, Chinese, French, Spanish, Russian and Arabic, have discussed the semantic aspects of these languages. An important feature of a network is its average clustering coefficient. This measure has a physical impact on studies of network topology and, consequently, on conclusions about the semantics of a language. However, its computational time is O(n^3), which makes it prohibitive to compute for today's large databases. The main contribution of this paper is the modeling of two complex networks: the first, in English, is constructed from a specific medical database; the second, in Portuguese, from a manually annotated journalistic database. The paper then studies the dynamics of these two networks.
We show their small-world behavior and the influence of hubs, which suggests that these databases have a high degree of modularity, indicating specific contexts of words. An efficient method for computing the clustering coefficient is also presented and can be applied to today's large databases. Other features, such as the fraction of reciprocal connections and the average connection density, are also calculated and discussed for both networks.
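
As an illustration of the measures named in the abstract, the following is a minimal Python sketch and not the method proposed in the paper. It assumes a directed word-adjacency network (an edge from each word to the word that follows it), computes the average clustering coefficient on the undirected view by intersecting neighbor sets instead of testing every node triple, and uses common textbook definitions for the fraction of reciprocal connections and the connection density. All function names are hypothetical, and the abstract does not state which exact definitions the authors use.

```python
def build_adjacency_graph(tokens):
    """Directed graph as {word: set(successors)}; edge u -> v when v follows u.

    Assumption: a simple word-adjacency model, chosen only for illustration.
    """
    succ = {}
    for u, v in zip(tokens, tokens[1:]):
        succ.setdefault(u, set())
        succ.setdefault(v, set())
        if u != v:  # ignore self-loops
            succ[u].add(v)
    return succ


def undirected_view(succ):
    """Symmetrize the directed graph into {word: set(neighbors)}."""
    nbrs = {u: set() for u in succ}
    for u, vs in succ.items():
        for v in vs:
            nbrs[u].add(v)
            nbrs[v].add(u)
    return nbrs


def average_clustering(nbrs):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    if not nbrs:
        return 0.0
    total = 0.0
    for u, ns in nbrs.items():
        k = len(ns)
        if k < 2:
            continue
        # Each link between two neighbors of u is seen from both ends.
        links = sum(len(nbrs[v] & ns) for v in ns) // 2
        total += 2.0 * links / (k * (k - 1))
    return total / len(nbrs)


def reciprocity(succ):
    """Fraction of directed edges u -> v whose reverse edge v -> u exists."""
    edges = sum(len(vs) for vs in succ.values())
    if edges == 0:
        return 0.0
    mutual = sum(1 for u, vs in succ.items() for v in vs if u in succ[v])
    return mutual / edges


def density(succ):
    """Directed edges divided by the n * (n - 1) possible ones."""
    n = len(succ)
    if n < 2:
        return 0.0
    return sum(len(vs) for vs in succ.values()) / (n * (n - 1))


if __name__ == "__main__":
    g = build_adjacency_graph("the cat saw the dog and the cat ran".split())
    print(average_clustering(undirected_view(g)), reciprocity(g), density(g))
```

Because the inner step only intersects the neighbor sets of adjacent nodes, the work grows with the node degrees rather than with the cube of the number of nodes, which is the kind of saving that matters on sparse text networks.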