Gene expression
Gene expression is the process by which the information encoded in a gene is used to direct the assembly of a protein molecule. The transcription of genomic DNA to produce mRNA is the first step in the process of protein synthesis, and differences in gene expression are responsible for morphological and phenotypic differences.
Microarray technology
Microarray technology is one method for monitoring the expression levels of thousands of genes under a particular condition. A microarray is typically a glass slide onto which DNA molecules are fixed in an orderly manner at specific locations called spots (or probes). These spots are identified later in this document by probe_IDs. A microarray may contain thousands of spots, and each spot may contain a few million copies of identical DNA molecules that uniquely correspond to a gene.
There are many approaches to microarray data analysis, but one of the most popular is to compare the expression levels of a set of genes under two conditions: a test condition, Condition A, and a reference condition, Condition B. RNA is extracted from cells under each of the two conditions and labeled with different dyes (red and green) during the synthesis of cDNA by reverse transcriptase. The cDNA is then hybridized onto the microarray slide, where each cDNA molecule representing a gene binds to the spot containing its complementary DNA sequence. The slide is then excited with a laser at suitable wavelengths to detect the red and green dyes. The intensity of each color in the final image gives a measurement of the amount of cDNA bound to a spot, which is taken to be proportional to the initial number of RNA molecules present for that gene.
The final image undergoes image processing to reduce background noise, and the final expression level can be expressed as the ratio of the spot intensities for Condition A to those for the reference condition, Condition B. The expression ratio T_k is defined as
T_k=R_k/G_k
where, for each gene k on the array, R_k represents the spot intensity metric for the test sample and G_k represents the spot intensity metric for the reference sample.
If, for a particular gene k, the spot intensity metric for the test sample is higher than for the reference sample (T_k > 1), the gene is up-regulated; if it is lower (T_k < 1), the gene is down-regulated. If there is no significant change in the expression level, the expression ratio is close to 1. With this ratio, up-regulated and down-regulated genes map to intervals of different size: up-regulated genes take values between 1 and infinity, while down-regulated genes are compressed into the interval between 0 and 1. Partly for this reason, other transformations of the expression ratio, such as taking its logarithm (commonly base 2), are also used. How an expression level is represented depends on the dataset used in each study.
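The ratio and its asymmetry can be illustrated with a minimal Python sketch. It computes T_k = R_k / G_k as defined above and, as one commonly used alternative, the base-2 logarithm of that ratio, which places up- and down-regulation symmetrically around 0. The function names, the probe IDs, the intensity values, and the threshold of 1.0 are all assumptions made for illustration; they are not taken from GeNet.

```python
import math


def expression_ratio(test_intensity, reference_intensity):
    """Return T_k = R_k / G_k for a single spot."""
    if reference_intensity <= 0:
        raise ValueError("reference intensity must be positive")
    return test_intensity / reference_intensity


def log2_ratio(test_intensity, reference_intensity):
    """Return log2(T_k); a 2-fold increase gives +1, a 2-fold decrease gives -1."""
    ratio = expression_ratio(test_intensity, reference_intensity)
    if ratio <= 0:
        raise ValueError("test intensity must be positive")
    return math.log2(ratio)


def classify(log_ratio, threshold=1.0):
    """Label a gene using a symmetric, illustrative log2 threshold."""
    if log_ratio >= threshold:
        return "up-regulated"
    if log_ratio <= -threshold:
        return "down-regulated"
    return "unchanged"


# Made-up (R_k, G_k) intensities keyed by hypothetical probe IDs.
spots = {
    "probe_001": (2000.0, 500.0),
    "probe_002": (500.0, 2000.0),
    "probe_003": (800.0, 800.0),
}

ratios = {pid: expression_ratio(r, g) for pid, (r, g) in spots.items()}
log_ratios = {pid: log2_ratio(r, g) for pid, (r, g) in spots.items()}
labels = {pid: classify(value) for pid, value in log_ratios.items()}

for pid in spots:
    print(pid, ratios[pid], log_ratios[pid], labels[pid])
```

In this made-up data, a 4-fold increase and a 4-fold decrease give raw ratios of 4 and 0.25, which are not equally far from 1, whereas their log2 values are +2 and -2.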
The high dimensionality and noise of microarray gene expression data make extracting useful information from it challenging. GeNet simplifies this task by providing various methods for data pre-processing and for selecting the most prominent genes affecting a target (feature selection), among others.
Since microarray technology can determine the expression levels of thousands of genes from a single array of chemical sensors, it has become a popular gene expression screening tool in the molecular investigation of various diseases. Gene expression data from samples with and without a disease can be used to identify which genes could have an impact on that disease. GeNet provides a framework that uses machine learning techniques and statistical analysis to identify candidate biomarker genes for a particular disease. GeNet can also predict the status of a sample whose status is not yet known.
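To make the idea of comparing samples with and without a disease concrete, the following is a minimal sketch that ranks genes by how strongly their expression differs between the two groups, using Welch's t-statistic. This is a generic statistical illustration and not GeNet's API or its actual feature selection method; the function names, probe IDs, and log2 expression values are all hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    diff = mean(group_a) - mean(group_b)
    std_err = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    if std_err == 0:
        # No spread in either group: no difference, or an unbounded score.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / std_err


def rank_genes(disease, healthy):
    """Return probe IDs sorted by |t|, largest group difference first."""
    scores = {pid: welch_t(disease[pid], healthy[pid]) for pid in disease}
    return sorted(scores, key=lambda pid: abs(scores[pid]), reverse=True)


# Made-up log2 expression values: three samples per group for each probe.
disease = {
    "probe_001": [2.1, 1.9, 2.0],
    "probe_002": [0.1, -0.1, 0.0],
    "probe_003": [-1.0, -1.2, -0.8],
}
healthy = {
    "probe_001": [0.0, 0.2, -0.2],
    "probe_002": [0.0, 0.1, -0.1],
    "probe_003": [-0.2, -0.4, -0.3],
}

ranking = rank_genes(disease, healthy)
print(ranking)
```

With real data, thousands of genes are tested at once, so a ranking like this would normally be followed by a multiple-testing correction before any gene is treated as a candidate biomarker.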