Information Retrieval - using Porter Stemming Algorithm
| Main Author: | Zulkifly, Zurida Azita |
|---|---|
| Format: | Final Year Project |
| Language: | English |
| Institution: | Universiti Teknologi Petronas |
| Record Id / ISBN-0: | utp-utpedia.7082 / |
| Published: | Universiti Teknologi Petronas, 2006 |
| Subjects: | |
| Online Access: | http://utpedia.utp.edu.my/7082/1/2006%20-%20loformation%20Retrieval%20-%20using%20Porter%20Stemming%20Algorithm.pdf http://utpedia.utp.edu.my/7082/ |
| Summary: |
Stemming is the process of removing or transforming the affixes found on a word:
inflectional endings (-s, -ing, -ed, etc.), derivational endings (-ion, -ative, -ity,
-ment, -less, etc.) and prefixes (un-, in-, etc.). The rationale for using stemming is
that similar words usually have similar meanings, so extending a query with words that
are similar in meaning to those it originally contained should increase the
effectiveness of the retrieval process. Many stemming methods have been developed.
However, the main focus of this project is the Porter Stemming Algorithm, which was
developed by M. F. Porter in 1980. The objective of this project is to develop a
system that demonstrates information retrieval using the Porter Stemming
Algorithm. The core problem in information retrieval is to return documents that are
relevant to the user's query. Performance is measured with two metrics: precision (the
fraction of retrieved documents that are relevant) and recall (the fraction of relevant
documents that are retrieved). The scope of the project is to implement the original
Porter Stemming Algorithm in the application in order to improve precision and recall
in the document retrieval process. Even though many improvements have been made to the
Porter algorithm, this project focuses on the original algorithm. The Porter
Stemming Algorithm has five phases, each with its own rules for stripping suffixes.
By implementing the algorithm, the application is expected to retrieve only
documents that are relevant to the user's query. |
|---|
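
The summary describes suffix stripping and the two evaluation measures only in words, so a small sketch may help make them concrete. The Python below is not the project's implementation: it applies only Step 1a (plural endings) of Porter's 1980 algorithm rather than all five phases, and the function names, the toy document collection, and the relevance judgments are all hypothetical, chosen purely for illustration.

```python
from __future__ import annotations


def porter_step_1a(word: str) -> str:
    """Apply only Step 1a of Porter's 1980 algorithm (plural endings).

    The full algorithm has further steps that are not reproduced here.
    """
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


def retrieve(query: str, docs: dict[str, str], stem=lambda w: w) -> set[str]:
    """Return ids of documents sharing at least one (stemmed) term with the query."""
    query_terms = {stem(w) for w in query.lower().split()}
    return {
        doc_id
        for doc_id, text in docs.items()
        if query_terms & {stem(w) for w in text.lower().split()}
    }


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection and relevance judgments (assumptions for this sketch).
docs = {
    "d1": "ponies graze quietly",
    "d2": "the pony ran",
    "d3": "cats chase mice",
    "d4": "a cat sleeps",
}
relevant = {"d3", "d4"}

plain = retrieve("cat", docs)                          # exact term matching
stemmed = retrieve("cat", docs, stem=porter_step_1a)   # matching on stems

print(precision_recall(plain, relevant))
print(precision_recall(stemmed, relevant))
```

On this toy collection, exact matching finds only `d4`, while matching on stems also finds `d3` because "cats" is reduced to "cat". That is the kind of recall gain the summary anticipates; whether precision also improves on a real collection depends on the data.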