The Academia Sinica Digital Humanities Research Platform develops digital tools to meet the demands of humanities research and to help scholars improve the quality of their work. We hope to integrate researchers, research data, and research tools to broaden the scope of research and reduce the time it takes. The Platform provides a comprehensive, cloud-based research environment that offers the data and tools scholars require. Researchers can upload their own texts and authority files, or use open texts and authority files that others have shared on the platform. Texts can be tagged with authority terms either manually or automatically, and authority terms can be organized into hierarchical categories. Once text tagging is complete, you can calculate authority term and N-gram statistics or conduct term co-occurrence analysis, and then present the results through data visualization methods such as statistical charts, word clouds, social network analysis graphs, and maps. The platform also provides similar-passage comparison, Boolean search, word proximity search, and statistical filtering, enabling researchers to carry out textual analysis easily.
Data Archives: Interface with the extensive text archives of the Academia Sinica Institute of History and Philology’s Scripta Sinica Database (about 260 million words), Kyoto University’s Kanseki Repository (about 1.3 billion words), Harvard University’s Chinese Text Project (about 5.1 billion words), and the Chinese Buddhist Electronic Text Association (CBETA) (about 0.6 billion words), along with open authority files for the names of places, dynasties, and people. Together these are the fundamental sources required for your research.
Shared Editing: Groups of users can jointly edit texts, authority files, and tagged content, enabling researchers to assemble different communities to tackle different research subjects.
Content Search: The structure and content of texts are open for browsing, Boolean search, multiple-term proximity search, and statistical filtering of search results, enabling researchers to quickly find keyword distributions and refine results.
Data Analysis: Analysis of authority terms, N-gram statistics, word frequency variation, and term co-occurrence across texts enables researchers to quickly find underlying relationships within big data.
Visualization: Pie charts, line charts, word clouds, social network analysis graphs, and geographic information system tools present the results of textual analysis in graphic form, enabling researchers to more intuitively survey phenomena concealed within the data.
Main Functions of the Digital Humanities Research Platform:
1. Upload texts and authority files, or import data from other systems (Scripta Sinica Database, CText, Kanripo, CBETA)
2. Add open data from other platform users, or share your own data
3. Download statistics and analysis results
4. Flexible, customizable search (Boolean, multiple-term proximity search, etc.)
5. Text similarity comparison
6. Word frequency statistics (authority terms, N-grams)
7. Word co-occurrence statistics
8. Data visualization (histograms, network graphs, etc.)
9. Spatiotemporal display integration (GIS)
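To give a concrete sense of what the word frequency and co-occurrence statistics in the list above compute, here is a minimal Python sketch. It is not the platform's implementation: the function names `ngram_counts` and `cooccurrence_counts`, the sample passages, and the choice to count co-occurrence per passage are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def ngram_counts(text: str, n: int) -> Counter:
    """Count overlapping character N-grams in a single text (illustrative helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cooccurrence_counts(passages: list[str], terms: list[str]) -> Counter:
    """Count, for each pair of terms, how many passages contain both (illustrative helper)."""
    counts: Counter = Counter()
    for passage in passages:
        # Sort the terms found in this passage so each pair has one canonical order.
        present = sorted(t for t in set(terms) if t in passage)
        counts.update(combinations(present, 2))
    return counts


# Hypothetical sample passages, used only to exercise the functions.
passages = ["孔子見老子", "孔子曰學而時習之", "老子曰道可道"]

# Count bigrams passage by passage so no N-gram spans a passage boundary.
bigrams: Counter = Counter()
for passage in passages:
    bigrams.update(ngram_counts(passage, 2))

pairs = cooccurrence_counts(passages, ["孔子", "老子", "曰"])

print(bigrams.most_common(3))
print(pairs)
```

Character N-grams are used here because classical Chinese text has no whitespace word boundaries. A term list such as `["孔子", "老子", "曰"]` stands in for the role an authority file would play.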
We also continue to develop related tools and technologies for the Digital Humanities Research Platform, including Linked Open Data (LOD), the International Image Interoperability Framework (IIIF), Optical Character Recognition (OCR) for Chinese characters, and Named Entity Recognition. Once these tools and technologies are ready, we will make them available to researchers as services.