Source retrieval plagiarism detection based on noun phrase and keyword phrase extraction

Published in Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15), 2015

This paper addresses the task of source retrieval from a large corpus of text documents. It introduces two methods for extracting important terms, weighted noun phrases and keyword phrases, from sentences that are long by word count. Queries are formed from the top-ranked sentences, and the system builds a comprehensive dataset of already downloaded sources that is used to filter queries. Each query is divided into two sub-queries, and the system extracts one snippet per sub-query for downloading.
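
The query-formation steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stopword list, the use of document term frequency as a stand-in for the paper's noun-phrase and keyword weighting, and the helper names `top_sentences`, `build_query`, and `split_query` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on",
             "from", "with", "by", "that", "this", "are", "as", "it"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def top_sentences(text, k=1):
    """Rank sentences by word count and keep the k longest."""
    sentences = split_sentences(text)
    return sorted(sentences, key=lambda s: len(tokenize(s)), reverse=True)[:k]


def build_query(sentence, doc_counts, max_terms=6):
    """Keep distinct content words, ordered by document frequency.

    Term frequency is a hypothetical weight used here in place of the
    paper's weighted noun phrases and keyword phrases.
    """
    seen = []
    for tok in tokenize(sentence):
        if tok not in STOPWORDS and tok not in seen:
            seen.append(tok)
    # sorted() is stable, so ties keep their order of appearance.
    return sorted(seen, key=lambda t: doc_counts[t], reverse=True)[:max_terms]


def split_query(terms):
    """Divide one query into two sub-queries of near-equal length."""
    mid = (len(terms) + 1) // 2
    return terms[:mid], terms[mid:]


text = "Short one. This sentence is clearly the longest one here. Mid size now."
doc_counts = Counter(tokenize(text))
best = top_sentences(text, k=1)[0]
query = build_query(best, doc_counts)
first_sub, second_sub = split_query(query)
```

Each sub-query would then be submitted to a search engine and one result snippet kept per sub-query. That retrieval step is omitted here because it depends on a search API the abstract does not describe.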

Recommended citation: Javad Rafiei Asl, Salar Mohtaj, Vahid Zarrabi, and Habibollah Asghari. "Source retrieval plagiarism detection based on noun phrase and keyword phrase extraction—Notebook for PAN at CLEF 2015." In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15). 2015. https://ceur-ws.org/Vol-1391/143-CR.pdf