The ‘Hepatitis’ dataset (provided in ARFF format and available on Blackboard) contains information about patients affected by hepatitis. The task is to predict whether or not each patient has hepatitis (Histology: Yes or No).
You should use the Weka data mining package, which is installed on the university computers and is also available to download from http://www.cs.waikato.ac.nz/~ml/weka/
You should hand in a report covering the following:
Select a suitable tree-building algorithm and build a model. Describe how you split the data for training and testing. Interpret the output: the accuracy rate, which attributes were used to make the predictions, and how many nodes and leaves the tree contains.
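One common way to split the data is k-fold cross-validation (Weka's Explorer defaults to 10 folds), where each fold should keep roughly the class proportions of the whole dataset. The plain-Python sketch below illustrates stratified fold assignment; it is not Weka's implementation, and the function name `stratified_folds` and the 30/70 toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Return a fold index (0..k-1) for each instance, stratified by class label."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    folds = [0] * len(labels)
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's shuffled instances round-robin across the folds,
        # continuing from where the previous class stopped.
        for j, i in enumerate(indices):
            folds[i] = (offset + j) % k
        offset = (offset + len(indices)) % k
    return folds


# Hypothetical toy labels: 30 'Yes' and 70 'No' instances.
labels = ["Yes"] * 30 + ["No"] * 70
folds = stratified_folds(labels, k=10)
fold_counts = [
    (sum(1 for i, f in enumerate(folds) if f == fold and labels[i] == "Yes"),
     sum(1 for i, f in enumerate(folds) if f == fold and labels[i] == "No"))
    for fold in range(10)
]
print(fold_counts)  # each fold holds 3 'Yes' and 7 'No' instances
```

Each fold in turn serves as the test set while the other nine are used for training, so every instance is tested exactly once.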
Give a detailed technical description of the classification model: which algorithm is used, which tree induction method it employs, and which attribute selection criterion it applies. Include a diagram showing the structure of the model you built.
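Assuming the chosen algorithm is J48 (Weka's implementation of C4.5), the attribute selection criterion is the gain ratio: the information gain of a split divided by the split information, which penalises attributes with many values. The plain-Python sketch below computes it for a nominal attribute; it ignores missing values and numeric attributes, which C4.5 handles separately, and the toy columns are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(attribute_values, labels):
    """Information gain of splitting on a nominal attribute, divided by its split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(attribute_values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the class within each branch of the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)  # entropy of the attribute's own value distribution
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy columns.
toy_labels = ["Yes", "Yes", "No", "No"]
informative = ["a", "a", "b", "b"]    # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]  # each branch is still 50/50
print(gain_ratio(informative, toy_labels), gain_ratio(uninformative, toy_labels))
```

At each node the attribute with the highest score is chosen for the split, and the procedure repeats on each branch (top-down, divide-and-conquer induction).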
Show how varying the model parameters affects the results:
Set the ‘REP’ (Reduced Error Pruning) parameter to ‘TRUE’. Explain what this operation does, and report and explain any change in model accuracy.
Set the ‘unpruned’ parameter to ‘TRUE’. Report and explain any change in model accuracy and in the tree structure. Explain which pruning method this algorithm uses.
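Assuming J48 is used: with reducedErrorPruning enabled, J48 holds back one of numFolds folds (default 3) of the training data for pruning; otherwise it applies C4.5's pessimistic, confidence-based pruning controlled by confidenceFactor (default 0.25), and with unpruned set no pruning is done at all. Classic reduced error pruning works bottom-up and replaces a subtree with a leaf whenever that does not increase the error on the held-out pruning set. The plain-Python sketch below illustrates that idea on a hypothetical hand-built tree; it is not J48's code, and the attribute names `attr_a` and `attr_b` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    attribute: Optional[str] = None  # attribute tested here; None marks a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)  # attribute value -> subtree
    majority: str = ""  # majority class of the training instances reaching this node


def predict(node: Node, row: dict) -> str:
    while node.attribute is not None:
        child = node.children.get(row[node.attribute])
        if child is None:  # unseen value: fall back to this node's majority class
            return node.majority
        node = child
    return node.majority


def reduced_error_prune(node: Node, rows: List[dict], labels: List[str]) -> Node:
    """Prune bottom-up, using the held-out pruning rows that reach this node."""
    if node.attribute is None:
        return node
    for value, child in node.children.items():
        reaching = [i for i, r in enumerate(rows) if r[node.attribute] == value]
        reduced_error_prune(child, [rows[i] for i in reaching], [labels[i] for i in reaching])
    subtree_errors = sum(predict(node, r) != y for r, y in zip(rows, labels))
    leaf_errors = sum(y != node.majority for y in labels)
    # Replace the subtree with a leaf if that does not increase pruning-set error
    # (ties, including nodes that no pruning instance reaches, favour the simpler leaf).
    if leaf_errors <= subtree_errors:
        node.attribute, node.children = None, {}
    return node


# Hypothetical tree: the split on attr_b under attr_a == "yes" is assumed to overfit.
tree = Node("attr_a", majority="No", children={
    "yes": Node("attr_b", majority="Yes", children={
        "x": Node(majority="Yes"), "y": Node(majority="No")}),
    "no": Node(majority="No"),
})
# Hypothetical pruning set held out from the training data.
prune_rows = [{"attr_a": "yes", "attr_b": "x"}, {"attr_a": "yes", "attr_b": "y"},
              {"attr_a": "no", "attr_b": "x"}, {"attr_a": "no", "attr_b": "y"}]
prune_labels = ["Yes", "Yes", "No", "No"]
reduced_error_prune(tree, prune_rows, prune_labels)
print(tree.attribute, tree.children["yes"].attribute)  # attr_a None
```

The attr_b split is collapsed into a leaf because it makes one extra error on the pruning set, while the root split survives because removing it would double the error count. Pruning typically trades a little training accuracy for a smaller tree that generalises better; an unpruned tree shows the opposite trade-off.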
Compare the model’s ability to predict histology with that of two other models of your choice (for example, a neural network or an SVM). Which model classified the data most accurately, and what are the possible reasons for its superior performance?
Show the confusion matrix for the model and interpret it. Show the ROC curve for the decision tree and interpret it.
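For a chosen positive class, a confusion matrix counts true positives, false negatives, false positives and true negatives; Weka prints it with actual classes as rows and predicted classes as columns. An ROC curve plots the true positive rate against the false positive rate as the threshold on the predicted probability of the positive class is lowered, and the area under it (AUC) summarises how well the model ranks positives above negatives. The plain-Python sketch below computes both from hypothetical predictions, assuming ‘Yes’ is the positive class and 0.5 is the decision threshold; it is an illustration, not Weka's code.

```python
def confusion_matrix(actual, predicted, positive="Yes"):
    """Return (TP, FN, FP, TN) counts for the given positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fn, fp, tn


def roc_points(actual, scores, positive="Yes"):
    """(FPR, TPR) points from lowering the threshold; assumes both classes are present."""
    pos = sum(a == positive for a in actual)
    neg = len(actual) - pos
    ranked = sorted(zip(scores, actual), key=lambda t: -t[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, a) in enumerate(ranked):
        if a == positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs, so tied scores form one step.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


actual = ["Yes", "Yes", "No", "Yes", "No", "No"]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]  # hypothetical predicted P(Yes) per instance
predicted = ["Yes" if s >= 0.5 else "No" for s in scores]
tp, fn, fp, tn = confusion_matrix(actual, predicted)
accuracy = (tp + tn) / len(actual)
points = roc_points(actual, scores)
print((tp, fn, fp, tn), round(accuracy, 3), round(auc(points), 3))  # (3, 0, 1, 2) 0.833 0.889
```

Because a decision tree assigns the same class probability to every instance reaching a given leaf, its ROC curve usually has only a few steps, one per distinct leaf probability.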