Optimising and Automating the Choice of Search Strings when Investigating Possible Plagiarism

Fintan Culwin, Mike Child

2010

This paper describes how to optimise the use of Internet search engines when investigating a document for possible non-original content. Services such as Turnitin do not guarantee to identify all non-original content, so tutors must conduct manual searches when suspicion of non-originality remains. Previous studies have suggested that the investigator should manually select memorable phrases from the paper and submit them to a general search engine. The studies in this paper demonstrate that selecting phrases at random is just as effective. Several corpora of documents were obtained from a number of different academic areas, and several phrases were taken from each. Strings of increasing length, starting with a single word, were drawn from these phrases and submitted to specialised and general search engines, and the number of hits was recorded. A common finding of these searches was that, in almost all cases, strings of six words were sufficiently distinct to uniquely identify the document from which the string was taken. One consequence of this is that totally automated tools are possible for this search-engine-based non-originality detection.
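The procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' tool: the function names, the `max_words` parameter, and the `count_hits` callable are all hypothetical, and `count_hits` stands in for whatever search engine query an investigator would actually use. The sketch picks a random starting point in a document, builds strings of one to six words from it, and reports the shortest string that returns exactly one hit.

```python
import random
from typing import Callable, List, Optional


def incremental_strings(text: str, max_words: int = 6,
                        rng: Optional[random.Random] = None) -> List[str]:
    """Pick a random starting word and return strings of 1..max_words words."""
    rng = rng or random.Random()
    words = text.split()
    if len(words) < max_words:
        raise ValueError("text is shorter than max_words")
    # Random start position, chosen so that max_words words are available.
    start = rng.randrange(len(words) - max_words + 1)
    return [" ".join(words[start:start + n]) for n in range(1, max_words + 1)]


def shortest_unique_string(strings: List[str],
                           count_hits: Callable[[str], int]) -> Optional[str]:
    """Return the first string with exactly one hit, or None if there is none.

    count_hits is a hypothetical stand-in for a search engine query that
    returns the number of documents matching the exact string.
    """
    for s in strings:
        if count_hits(s) == 1:
            return s
    return None
```

In use, `count_hits` would wrap an exact-phrase query against a real search service. For experimentation, it can simply count the documents in a local corpus that contain the string.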
