Extracting Multiwords From Large Document Collection Based N-Gram

International Journal of P2P Network Trends and Technology (IJPTT)          
 
© 2013 by IJPTT Journal
Volume-3 Issue-3                           
Year of Publication : 2013
Authors : M. Nirmala, Dr. E. Ramaraj

Citation

M. Nirmala, Dr. E. Ramaraj. "Extracting Multiwords From Large Document Collection Based N-Gram". International Journal of P2P Network Trends and Technology (IJPTT), V3(3):38-41, May-Jun 2013, ISSN: 2249-2615, www.ijpttjournal.org. Published by Seventh Sense Research Group.

Abstract

Multiword terms (MWTs) are relevant strings of words in text collections. Once automatically extracted, they can be used by an Information Retrieval system to suggest conceptually interesting refinements of users' information needs. These multiword terms point to relevant information, often corresponding to topics and subtopics in the text collection, and may be especially useful for refining highly generic queries. A new approach is proposed for finding collocations in text documents. A collocation is a set of words that occur together in a corpus more often than chance would predict. Collocations are extracted based on the frequency of the joint occurrence of the words as well as the frequency of the individual occurrences of each word in the whole text. Intuitively, when a set of words is extracted as a collocation, the joint occurrence of the words must be high in comparison to that of the constituent individual words.
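
The frequency intuition described in the abstract can be illustrated with a minimal sketch. The abstract does not state the paper's actual scoring formula, so this example substitutes pointwise mutual information (PMI) over adjacent word pairs, a common stand-in for comparing joint frequency against individual word frequencies. The function name extract_collocations and the min_count parameter are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def extract_collocations(tokens, min_count=2):
    """Rank adjacent word pairs by PMI (an assumed score, not the paper's).

    PMI compares how often two words occur together with how often they
    would co-occur by chance given their individual frequencies.
    """
    unigrams = Counter(tokens)                  # individual occurrences
    bigrams = Counter(zip(tokens, tokens[1:]))  # joint (adjacent) occurrences
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), joint in bigrams.items():
        if joint < min_count:
            # Skip rare pairs: PMI is unreliable for very low counts.
            continue
        p_joint = joint / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_joint / (p_w1 * p_w2))

    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    text = "the multiword term and the multiword term or the simple word"
    for pair, score in extract_collocations(text.split()):
        print(pair, round(score, 3))
```

In this toy input, the pair ("multiword", "term") ranks above ("the", "multiword") because "the" also occurs frequently on its own, which lowers its association score. A real corpus would need tokenization, stop-word handling, and n-grams longer than two words, none of which are shown here.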

Keywords

Multiword terms (MWTs), Information, Collocations, Extraction, Text Document.