Rank-frequency distribution of natural languages: a difference of probabilities approach.

2018 
The time variation of the rank $k$ of words for six Indo-European languages is obtained using data from Google Books. For low ranks the distinct languages behave differently, maybe due to syntaxis rules, whereas for $k>50$ the law of large numbers predominates. The dynamics of $k$ is described stochastically through a master equation governing the time evolution of its probability density, which is approximated by a Fokker-Planck equation that is solved analytically. The difference between the data and the asymptotic solution is identified with the transient solution, and good agreement is obtained.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    19
    References
    1
    Citations
    NaN
    KQI
    []