3 Top Ways to Handle Missing Value Attributes Using RapidMiner Studio
Hello everyone, have a good day.
Large data sets often contain defects such as missing values. A missing value occurs when the value of one or more attributes in a data record is unknown. Missing values are commonly handled by imputation, that is, filling in the average or the most frequently occurring value, or by removing the affected attributes.
In data mining experiments on missing values, decision tree algorithms are regarded as able to handle them without data imputation. Three well-known decision tree algorithms are CART, ID3 and C4.5. This is the approach taken in research that evaluates the performance of decision tree algorithms on data with missing values by classifying without imputing or filling in the missing data.
Besides that, there are 3 methods that are often used to handle missing data: deleting the data or attributes that have missing values, filling them in with the average (Replace Missing Values), and imputing them using the k-NN algorithm (Impute Missing Values with k-NN).
These three methods are explained below using the RapidMiner Studio application.
The data above is an example of a data set that has missing values. We will handle it in all three ways.
Okay, let's go straight to the first method.
1. Removing Data or Attributes that Have Missing Values
- Add the local data to the process page.
- Search for the Filter Examples operator, then connect it to the local data.
- Click Filter Examples, look at the Parameters panel on the right, and in the condition class menu select no_missing_attributes.
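For readers who want to see the logic behind this filter, here is a minimal Python sketch of the same idea: keep only the records in which no attribute is missing. This is not RapidMiner code. The sample records, the attribute names and the function name `drop_rows_with_missing` are made up for illustration, and `None` is assumed to mark a missing value.

```python
# Hypothetical sample data; None marks a missing value.
rows = [
    {"age": 25, "income": 3000.0, "city": "Jakarta"},
    {"age": None, "income": 4200.0, "city": "Bandung"},
    {"age": 31, "income": None, "city": "Surabaya"},
    {"age": 40, "income": 5100.0, "city": "Medan"},
]


def drop_rows_with_missing(records):
    """Return only the records in which no attribute is None."""
    return [r for r in records if all(v is not None for v in r.values())]


complete_rows = drop_rows_with_missing(rows)

for row in complete_rows:
    print(row)
```

Note that this method shrinks the data set: every record with even one missing attribute is discarded, which may be too costly when many records are affected.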
2. Using the Replace Missing Values Operator
- Add the local data to the process page.
- Find the "Replace Missing Values" operator and drag it to the process page, then connect it to the local data.
- Click the "Replace Missing Values" operator, then look at the Parameters panel on the right.
- Under "attribute filter type", select subset, then click "Select Attributes".
- Select the attribute that has the missing values, move it to the right using the right arrow in the middle of the dialog box, then click "Apply".
- Back in the Parameters panel on the right, in the "default" section select average.
- Finally, run the process.
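The steps above fill each missing value with the average of the values that are present for that attribute. As a rough illustration only, the same calculation can be sketched in plain Python. The sample records and the function name `replace_missing_with_average` are hypothetical, and `None` is assumed to mark a missing value.

```python
from statistics import mean

# Hypothetical sample data; None marks a missing value.
rows = [
    {"name": "A", "income": 3000.0},
    {"name": "B", "income": None},
    {"name": "C", "income": 5000.0},
    {"name": "D", "income": 4000.0},
]


def replace_missing_with_average(records, attribute):
    """Return copies of the records with None in `attribute` replaced by
    the average of the observed (non-missing) values of that attribute."""
    observed = [r[attribute] for r in records if r[attribute] is not None]
    if not observed:
        # Nothing to average, so return unchanged copies.
        return [dict(r) for r in records]
    average = mean(observed)
    return [
        {**r, attribute: average if r[attribute] is None else r[attribute]}
        for r in records
    ]


filled_rows = replace_missing_with_average(rows, "income")

for row in filled_rows:
    print(row)
```

Only the selected attribute is touched, which mirrors choosing a subset of attributes in the operator. The average only makes sense for numeric attributes.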
3. Missing Value Imputation Using k-NN
- Find the operator named Impute Missing Values, then double-click it or drag it to the process page.
- Double-click the operator on the process page. After the new process page appears, search for the k-NN operator and connect it as shown below. If you want to return to the start page, just click the Process menu.
- Apart from that, you can also set which attributes are imputed, for example only a few attributes, all attributes, or attributes chosen based on the contents of the data, in the Parameters dialog box on the right side of the process page.
- The last step is to connect all the operators, then click Run to run the process. RapidMiner will then display the results of the imputation process using k-NN.
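To give a feel for what k-NN imputation does, here is a simplified Python sketch. It is not RapidMiner's implementation. It assumes a numeric attribute is being imputed, that the other attributes used for the distance are complete, that distance is plain Euclidean distance without normalization, and that the estimate is the unweighted average of the k nearest neighbours. The sample records and the function name `knn_impute` are hypothetical.

```python
import math

# Hypothetical sample data; None marks a missing value.
rows = [
    {"age": 25, "experience": 2, "income": 3000.0},
    {"age": 27, "experience": 3, "income": 3400.0},
    {"age": 45, "experience": 20, "income": 8000.0},
    {"age": 26, "experience": 2, "income": None},
]


def euclidean(a, b, features):
    """Euclidean distance between two records over the given features."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))


def knn_impute(records, target, features, k=2):
    """Return copies of the records where a missing `target` value is
    replaced by the average `target` of the k nearest complete records."""
    known = [r for r in records if r[target] is not None]
    result = []
    for r in records:
        if r[target] is not None or not known:
            # Value already present, or no complete record to learn from.
            result.append(dict(r))
            continue
        neighbours = sorted(known, key=lambda n: euclidean(r, n, features))[:k]
        estimate = sum(n[target] for n in neighbours) / len(neighbours)
        result.append({**r, target: estimate})
    return result


imputed_rows = knn_impute(rows, target="income", features=["age", "experience"], k=2)

for row in imputed_rows:
    print(row)
```

In this made-up example the record with the missing income is closest to the first two records, so its income is estimated from them rather than from the overall average. This is the main difference from the second method.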