Abstract
Indic heritage knowledge is embedded in millions of manuscripts at various stages of digitization and analysis. Though numerous powerful tools have been developed for linguistic analysis of Samskrit texts, employing them together on large document collections and building end-user applications on top of them remains a challenge because their interfaces are not standardized. This paper examines the architectural needs of scalable Indic document analytics and presents our experience in building an actual system. Though the system is a work in progress, we demonstrate how careful metadata design enabled us to rapidly develop useful applications via extensive reuse of state-of-the-art analysis tools. The paper also offers an approach to standardizing linguistic analysis output and lays out guidelines for Indic document metadata design and storage.