Feature Selection

After reducing the gene list to a considerably smaller subset, accuracy can often be improved further by narrowing the list down to the few genes that have the most significant impact on the target variable. For that, we make use of feature selection methods. GeNet provides 3 feature selection methods, each of which selects the best n genes according to its algorithm. You can choose the number n using the scroll bar at the bottom of the page.

Principal Component Analysis (PCA)

PCA is a technique that uses an orthogonal transformation to convert a set of observations with correlated features into a set of linearly uncorrelated variables called principal components. GeNet selects n genes by taking the highest-scoring feature in each of the first n components.
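A minimal sketch of this selection rule, assuming a scikit-learn-style workflow (the expression matrix, gene names, and the choice of n here are hypothetical, not GeNet's actual data):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy expression matrix: 20 samples x 6 hypothetical genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
gene_names = [f"gene_{i}" for i in range(6)]  # hypothetical names

n = 3  # number of genes to keep (the slider value in the GeNet UI)
pca = PCA(n_components=n).fit(X)

# For each of the first n principal components, take the gene with the
# largest absolute loading; de-duplicate while preserving order, since
# two components may score the same gene highest.
top_per_component = np.abs(pca.components_).argmax(axis=1)
selected = list(dict.fromkeys(gene_names[i] for i in top_per_component))
print(selected)
```

Because the same gene can dominate more than one component, the de-duplicated result may contain fewer than n genes; a fuller implementation would walk down each component's loadings until n distinct genes are found.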

Random Forest

Random Forest consists of a number of decision trees. Every node in a decision tree is a condition on a single feature, designed to split the dataset in two so that similar response values end up in the same set. The optimal condition is chosen using a measure called impurity. When training a tree, it is possible to compute how much each feature decreases the weighted impurity. For a forest, the impurity decrease from each feature can be averaged over all trees, and the features are ranked according to this measure.
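The ranking described above corresponds to scikit-learn's `feature_importances_` attribute (mean impurity decrease, normalised to sum to 1). A small sketch with synthetic data, where only the first two hypothetical genes actually drive the label:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 100 samples x 5 hypothetical genes; the label
# depends only on gene_0 and gene_1.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
gene_names = [f"gene_{i}" for i in range(5)]

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ holds the impurity decrease per feature,
# averaged over all trees in the forest.
ranking = sorted(zip(gene_names, forest.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)

n = 2  # keep the n top-ranked genes
selected = [name for name, _ in ranking[:n]]
print(selected)
```

On this toy data the two informative genes should dominate the ranking, which is exactly the behaviour the selection method relies on.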

Extra Tree Classifier

The Extra Trees (extremely randomized trees) classifier is a meta-estimator that fits a number of randomized decision trees on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. Feature importance is derived from these trees in the same way as for Random Forest, and the n highest-ranked genes are selected.
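A sketch of this approach, again assuming scikit-learn and reusing the same synthetic setup as the Random Forest example (all names and data are illustrative):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Same toy setup: only gene_0 and gene_1 influence the label.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
gene_names = [f"gene_{i}" for i in range(5)]

# ExtraTreesClassifier randomizes split thresholds as well as the
# candidate features at each node, then averages over the ensemble.
model = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)

n = 2
order = np.argsort(model.feature_importances_)[::-1]  # descending importance
selected = [gene_names[i] for i in order[:n]]
print(selected)
```

Compared with Random Forest, the extra randomization in the splits tends to reduce variance at the cost of a small amount of bias, which is why the two ensembles can produce slightly different gene rankings on the same data.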