Showing posts with label web mining. Show all posts

Tuesday, August 26, 2008

data preprocessing of web usage mining


One of the biggest difficulties in web usage mining is data preprocessing. The most common form of input data is a web server log in CLF or ECLF format, as above. Preprocessing should turn that log into a list of server sessions. A lot of research has been done on this step. The well-known doctoral thesis by Robert Walker Cooley presents a detailed process for preprocessing web usage data, including data cleaning, session identification, pageview identification, path completion and episode identification. But implementing it is still a problem. Does anyone have the source code or ready-made preprocessing tools? Please contact me at clearking@gmail.com.
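To make the first two steps concrete, here is a rough Python sketch of data cleaning and session identification on CLF lines. It is only a starting point under several assumptions of mine, not the procedure from Cooley's thesis: each host (IP address) is treated as one user, requests for images, stylesheets and scripts and requests with status 400 or higher are dropped, and a new session starts after 30 minutes of inactivity. The names `parse_line`, `clean`, `sessionize` and `preprocess` are made up for illustration. Pageview identification, path completion and episode identification are not covered.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Assumption: these suffixes mark embedded resources, not pages.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")

# Assumption: 30 minutes of inactivity ends a session.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Parse one CLF line into a dict, or return None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    return {
        "host": m.group("host"),
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "url": m.group("url"),
        "status": int(m.group("status")),
    }


def clean(records):
    """Data cleaning: drop unparsable lines, failed requests and embedded resources."""
    kept = []
    for r in records:
        if r is None or r["status"] >= 400:
            continue
        path = r["url"].split("?")[0].lower()
        if path.endswith(IGNORED_SUFFIXES):
            continue
        kept.append(r)
    return kept


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Session identification: group by host, split on gaps longer than timeout."""
    by_host = {}
    for r in sorted(records, key=lambda r: r["time"]):
        by_host.setdefault(r["host"], []).append(r)

    sessions = []
    for hits in by_host.values():
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit["time"] - prev["time"] > timeout:
                sessions.append(current)
                current = []
            current.append(hit)
        sessions.append(current)
    return sessions


def preprocess(lines):
    """Run parsing, cleaning and session identification on raw log lines."""
    return sessionize(clean(parse_line(line) for line in lines))
```

Calling `preprocess(open("access.log"))` (the file name is just a placeholder) would return a list of sessions, each one a list of request dicts in time order. With ECLF logs, the user agent could be combined with the host to tell users apart more reliably, but that is left out here.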