Vol. 1, 1229-1236, November 2002     Molecular Cancer Therapeutics
© 2002 American Association for Cancer Research

Identification of Combination Gene Sets for Glioma Classification 1

Seungchan Kim, Edward R. Dougherty, Ilya Shmulevich, Kenneth R. Hess, Stanley R. Hamilton, Jeffrey M. Trent, Gregory N. Fuller and Wei Zhang2

Department of Electrical Engineering, Texas A&M University, College Station, Texas 77840 [S. K., E. R. D.]; Departments of Pathology [E. R. D., I. S., S. R. H., G. N. F., W. Z.] and Biostatistics [K. R. H.], The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030; and Cancer Genetics Branch, National Human Genome Research Institute, NIH, Bethesda, Maryland 20892-4470 [S. K., J. M. T.]


    Abstract
 
One goal for the gene expression profiling of cancer tissues is to identify signature genes that robustly distinguish different types or grades of tumors. Such signature genes would ideally provide a molecular basis for classification and also yield insight into the molecular events underlying different cancer phenotypes. This study applies a recently developed algorithm to identify not only single classifier genes but also gene sets (combinations) for use as glioma classifiers. Classifier genes identified by this algorithm are shown to be strong features by conservatively and collectively considering the misclassification errors of the feature sets. Applying this approach to a test set of 25 patients, we have identified the best single genes and two- to three-gene combinations for distinguishing four types of glioma: (a) oligodendroglioma; (b) anaplastic oligodendroglioma; (c) anaplastic astrocytoma; and (d) glioblastoma multiforme. Some of the identified genes, such as insulin-like growth factor-binding protein 2, have been confirmed to be associated with one of the tumor types. Using combinations of genes, the classification error rate can be significantly lowered. In many instances, neither of the individual genes of a two-gene set performs well as an accurate classifier, but the combination of the two genes forms a robust classifier with a small error rate. Two-gene and three-gene combinations thus provide robust classifiers possessing the potential to translate expression microarray results into diagnostic histopathological assays for clinical utilization.


    Introduction
 
Current estimates suggest that there are approximately 30,000–40,000 genes in the human genome (1, 2), and subsets of those genes are expressed in different cell types and in different cellular states. The combination of genes expressed at different levels determines the overall physiology of the cell. Two primary goals of functional genomics are to screen the massive amount of transcriptomic data generated by high-throughput cDNA microarray technology for the key genes and gene combinations that explain specific cellular phenotypes (e.g., disease) on a mechanistic level and to use these data to classify diseases on a molecular level (3–7).

An important consideration is that the number of genes in such gene feature sets should be sufficiently small so as to be potentially useful for clinical diagnosis/prognosis or as candidates for functional analysis to determine whether they could serve as useful targets for therapy. A number of classification approaches have been used to exploit the class-separating power of expression data; however, the size of the gene sets (sometimes as large as 70) renders the construction of practical immunohistochemical diagnostic/prognostic panels and the experimental design for functional testing problematic (3, 8, 9).

We use a recently proposed algorithm to identify strong gene feature sets that are responsible for distinct patient groups (10). These gene sets are "strong" in the sense that the algorithm builds classifiers from a probability distribution resulting from spreading the mass of the sample points to make the classification more difficult, while maintaining sample geometry. In an effort to identify the strong feature genes among the different histological diagnoses in patients with gliomas, we applied this method, in a proof-of-principle study, to glioma tissue specimens from 25 patients with four different types of glioma: (a) GM;3 (b) AA; (c) AO; and (d) low-grade OL. After finding the sets of genes that are capable of accurately classifying the different types of glioma, we have also identified strong features (genes) that are seemingly responsible for the distinct phenotype of each type of cancer.

Gliomas are the most common malignant primary brain tumors (11, 12). These tumors are derived from neuroepithelial cells and can be divided into two principal lineages: astrocytomas and OLs. Current glioma classification schemes are based on morphological feature assessment and remain highly subjective and problematic for many atypical cases. Diagnoses are often dependent on the relative weighting of specific morphological features by individual pathologists. We reason that by identification of robust signature gene classifiers using typical cases, the atypical cases can be classified based on the signature classifier genes in the future.


    Materials and Methods
 
Primary Glioma Tissues.
All primary glioma tissues were acquired from the Brain Tumor Center tissue bank of The University of Texas M. D. Anderson Cancer Center. Tissue bank specimens were quick-frozen shortly after surgical removal and stored at -80°C. Although it is not known whether or to what extent the time delay between tumor removal and tumor freezing affects gene expression, all of the tumor tissue samples used in this study were handled in an identical fashion and experienced a similar length of delay. Thus, the tumor-harvesting procedure would have affected all samples in a similar manner and would not be expected to have contributed to the difference in gene expression patterns seen among the samples. H&E-stained frozen tissue sections are routinely prepared from all tissue bank specimens for screening purposes. All tissue specimens for cDNA array analysis were screened by a neuropathologist (G. N. F.), and the diagnoses were independently confirmed by a second neuropathologist. The glioma tissue blocks were specifically selected for densest and purest tumor, and they were all comparatively and uniformly "pure." There was minimal contamination by normal brain parenchyma and minimal variation between samples in this regard. The tumors were diagnosed according to two commonly used criteria: (a) St. Anne-Mayo (11); and (b) the recently revised WHO Classification of Tumors of the Nervous System (WHO 2000; Ref. 12). In this study, the gliomas are termed according to the St. Anne-Mayo nomenclature as low-grade OL, AO, AA, and GM.

Isolation of Total RNA and mRNA from Tissues.
The tissues were ground to powder under frozen conditions, and tissue powder (0.3–1.5 g) was lysed in the lysis buffer TRI Reagent (Molecular Research Center, Cincinnati, OH). The RNA isolation was done as described previously (13).

Hybridization to the Human Atlas cDNA Expression Array Blots.
The cDNA microarray containing fragments representing 597 human genes with known functions and known tight transcriptional controls (Clontech Laboratories, Inc., Palo Alto, CA) was used for our experiments, as described previously (13). After a high-stringency wash, the hybridization pattern was analyzed by autoradiography and quantified by phosphorimaging.

Development of an Algorithm for Finding Strong Feature (Gene) Sets.
We desire classifiers that categorize sample tissues based on gene expression values. There are two reasons why we desire classifiers involving small numbers of genes: (a) the limited number of samples often available in clinical studies makes classifier design and error estimation problematic for large feature sets (14); and (b) small gene sets facilitate design of practical immunohistochemical diagnostic panels. Thus, we use a simple classifier and a small number of genes (at most three in this study) to form classifiers (10).

Given a set of features on which to base a classifier, two issues must be addressed: (a) design of a classifier from sample data; and (b) estimation of its error. When selecting features from a large class of potential features, the key issue is whether a particular feature set provides good classification. A key concern is the precision with which the error of the designed classifier estimates the error of the optimal classifier. When data are limited, an error estimator may be unbiased but may have a large variance and therefore may often be low. This can produce many feature sets and classifiers with low error estimates. The algorithm we use mitigates this problem by designing classifiers from a probability distribution resulting from spreading the mass of the sample points. The algorithm is parameterized by the variance of the distribution. The error gives a measure of the strength of the feature set as a function of the variance.

When the data are limited, and all of them are used to design the classifier, there are several ways to estimate the classifier error. We comment on two of these. The resubstitution estimate, $\hat{\varepsilon}_n$, for a sample of size $n$ is the fraction of errors made by the designed classifier on the sample. Typically, it is low-biased, meaning $E[\hat{\varepsilon}_n] \le E[\varepsilon_n]$, where $E[\varepsilon_n]$ is the expected value of the actual error. For LOO estimation, $n$ classifiers are designed from sample subsets formed by leaving out one data point at a time. Each is applied to the left-out point, and the estimator $\hat{\varepsilon}_n^{\mathrm{LOO}}$ is $1/n$ times the number of errors made by the $n$ classifiers. It is an unbiased estimator of $\varepsilon_{n-1}$, meaning that $E[\hat{\varepsilon}_n^{\mathrm{LOO}}] = E[\varepsilon_{n-1}]$. This unbiasedness comes at a cost: the variance of the LOO estimator is greater than that of resubstitution (15).
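The difference between the two estimators can be seen on a toy example. The following is a minimal sketch, not the classifier used in this study: it assumes a simple nearest-centroid rule, and the data points and function names (`fit_centroids`, `resubstitution_error`, `loo_error`) are hypothetical and chosen only for illustration.

```python
from statistics import mean

def fit_centroids(points, labels):
    """Return {label: centroid} for a nearest-centroid classifier."""
    centroids = {}
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        centroids[lab] = tuple(mean(c) for c in zip(*members))
    return centroids

def predict(centroids, p):
    """Assign p to the class whose centroid is closest (squared distance)."""
    def d2(c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return min(sorted(centroids), key=lambda lab: d2(centroids[lab]))

def resubstitution_error(points, labels):
    """Design on the full sample, then count errors on that same sample."""
    model = fit_centroids(points, labels)
    wrong = sum(predict(model, p) != l for p, l in zip(points, labels))
    return wrong / len(points)

def loo_error(points, labels):
    """Design n classifiers, each tested on the single point left out."""
    wrong = 0
    for i in range(len(points)):
        train_p = points[:i] + points[i + 1:]
        train_l = labels[:i] + labels[i + 1:]
        model = fit_centroids(train_p, train_l)
        wrong += predict(model, points[i]) != labels[i]
    return wrong / len(points)

# Hypothetical one-gene expression values for two classes (A and B).
points = [(0.0,), (1.0,), (2.0,), (3.9,), (5.0,), (6.0,), (7.0,), (8.0,)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(resubstitution_error(points, labels))  # 0.0
print(loo_error(points, labels))             # 0.125
```

In this toy sample the borderline point at 3.9 is classified correctly when it helps to design the classifier but is misclassified when it is left out, so resubstitution reports no error while LOO reports 1 of 8.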

For $\varsigma \ge 0$, the algorithm we use constructs from the sample data a linear classifier $\Psi_\varsigma$, where $\varsigma^2$ gives the variance of the distribution used to spread the data. Both $\Psi_\varsigma$ and its error, $\varepsilon_\varsigma$, are computed analytically. For $\varsigma = 0$, which means there is no spreading of the sample mass, $\varepsilon_\varsigma$ is equal to the resubstitution error estimate for the sample. Thus, the standard theory informs us that the variance of $\varepsilon_0$ is less than that of the LOO estimator. Moreover, model-based studies indicate that the variance of $\varepsilon_\varsigma$ decreases as $\varsigma$ increases. To standardize the interpretation of the results, $\varsigma$ is normalized relative to the variance of the data. Under this normalization, simulation studies with Gaussian distributions show $\varepsilon_\varsigma$ to be an unbiased estimator of the error of the optimal linear classifier for $\varsigma = 0.4$ and to be increasingly high-biased for increasing $\varsigma$. To obtain conservative estimates of the optimal error, we take $\varsigma \ge 0.4$. Moreover, for very small feature sets, we normalize by the maximum variance of the features. By being conservative, we reduce the chance that the resulting error estimate is optimistic. When considering a large number of potential feature sets in the presence of a small amount of data, the salient issue is one of data mining. Taking a conservative approach reduces the number of optimistic error estimates while at the same time selecting feature sets that perform well on a distribution that is significantly more dispersed than the actual data.

The concept of forming spread distributions from the data can be appreciated by reference to Fig. 1, which shows sample points from two classes (red and blue) based on measurements of genes g1 (horizontal axis) and g2 (vertical axis). Fig. 1a shows a linear classifier derived solely from the sample points. Fig. 1, b–d, shows samples constructed from the original sample points by deliberately adding artificial random noise of increasing variance to the original points to form larger samples that are spread about the original sample. A linear classifier has been derived for each synthetic sample. Increasing the variance increases the error. A classifier that has a small error for a large variance is desirable because its performance is more likely to be robust relative to new data. Because the implementation of this approach takes a long time if the Monte-Carlo method is used, the actual algorithm used does not use random synthetic data to find the classifier and its error but instead constructs class distributions from the sample data and then finds both the classifier and its error analytically via simple matrix operations (10).
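The Monte-Carlo picture described above can be sketched in a few lines. This is an illustration only and is not the analytic algorithm of Ref. 10: it assumes isotropic Gaussian noise with an absolute (unnormalized) SD, uses a nearest-centroid rule as a stand-in linear classifier, and all data values and function names are hypothetical.

```python
import random

def spread_sample(points, labels, sigma, copies, rng):
    """Replace each point by `copies` noisy replicates (Gaussian noise, SD sigma)."""
    new_points, new_labels = [], []
    for p, lab in zip(points, labels):
        for _ in range(copies):
            new_points.append(tuple(x + rng.gauss(0.0, sigma) for x in p))
            new_labels.append(lab)
    return new_points, new_labels

def centroid(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def linear_rule(points, labels):
    """Nearest-centroid rule for classes 0 and 1; its boundary is a hyperplane."""
    c0 = centroid([p for p, l in zip(points, labels) if l == 0])
    c1 = centroid([p for p, l in zip(points, labels) if l == 1])

    def classify(p):
        d0 = sum((a - b) ** 2 for a, b in zip(p, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(p, c1))
        return 0 if d0 <= d1 else 1

    return classify

def spread_error(points, labels, sigma, copies=200, seed=0):
    """Error of the rule designed on, and applied to, the spread sample."""
    rng = random.Random(seed)
    sp, sl = spread_sample(points, labels, sigma, copies, rng)
    rule = linear_rule(sp, sl)
    return sum(rule(p) != l for p, l in zip(sp, sl)) / len(sp)

# Hypothetical (g1, g2) expression pairs for two classes.
points = [(1.0, 1.2), (1.5, 0.8), (0.8, 1.6), (1.2, 1.0),
          (3.0, 3.1), (3.4, 2.7), (2.8, 3.3), (3.2, 2.9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

for sigma in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(sigma, spread_error(points, labels, sigma))
```

With no spreading the error equals the resubstitution error of the original sample (zero for this toy data), and the error grows as the spread increases. A feature set whose error stays small at a large spread is the kind of "strong" feature set sought here.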



 
Fig. 1. The concept of forming increasingly disperse distributions from the data can be appreciated in this figure, which shows sample points from two classes (red and blue) based on measurements of genes g1 (horizontal axis) and g2 (vertical axis). a, shown is a linear classifier derived solely from the sample points. b-d, shown are synthetic samples constructed from the original sample points by randomly adding noise of increasing variance to the original points to form larger samples that are spread about the original sample. Dotted circles are shown to represent a SD of spreading. A linear classifier has been derived for each synthetic sample. Increasing the variance increases the error. This method is called a Monte-Carlo simulation, but this simulation method is not used in the new method proposed. A new analytical method is developed to speed up the algorithm.

 

    Results and Discussions
 
Biologists are often interested in finding individual genes that have some influence on the system under study. In the context of classification, this approach translates into finding single-gene classifiers. Indeed, in the case of glioma classification, there appear to be cases in which single genes can provide decent classification, but this is certainly not always so; for instance, several genetic variations may interact to result in a phenotype. If we are interested in sets of genes that perform in a multivariate manner to provide strong classifiers, then we should look for pairs of genes that perform well and substantially better than either of the genes individually, triples of genes that perform well and substantially better than the best performing pair among the three, and so on. For any feature set, we let $\varepsilon_\varsigma$ denote the error of the optimal classifier for the feature set, and we let $\Delta(\varepsilon_\varsigma)$ denote the largest decrease in error for the full feature set relative to all of its subsets. The feature sets are first ranked based on the $\varsigma$-error, and they are ranked again based on the improvement, $\Delta(\varepsilon_\varsigma)$. For multiple-gene classifiers, we will focus on feature sets with high rank in both lists. Indeed, this is our major focus: to find strong feature sets in which all genes contribute to glioma discrimination.
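The improvement measure can be illustrated as follows. This sketch assumes one reading of the definition above, namely that the improvement is the error of the best-performing proper subset minus the error of the full feature set. The two IGFBP2-related error values are those quoted later in this section; the single-gene error for EPHA1 is a hypothetical placeholder, and the function name `improvement` is ours.

```python
from itertools import combinations

def improvement(feature_set, errors):
    """Error of the best proper subset minus the error of the full set.

    `errors` maps frozensets of gene names to classifier errors.
    """
    full = frozenset(feature_set)
    subset_errors = [
        errors[frozenset(sub)]
        for k in range(1, len(full))
        for sub in combinations(sorted(full), k)
    ]
    return min(subset_errors) - errors[full]

errors = {
    frozenset({"IGFBP2"}): 0.1392,           # quoted in the text
    frozenset({"EPHA1"}): 0.30,              # hypothetical placeholder value
    frozenset({"IGFBP2", "EPHA1"}): 0.0862,  # quoted in the text
}

delta = improvement({"IGFBP2", "EPHA1"}, errors)
print(round(delta, 4))  # 0.053
```

Measuring the pair against its better single gene, rather than against the worse one, keeps a strong single gene from making every pair that contains it look like a multivariate success.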

To aid in understanding the gene expression characteristics of the selected feature sets, all of the genes in the data set are clustered in such a way as to be close to other genes with similar expression. This is accomplished via hierarchical clustering using the Pearson correlation and average linkage. An added value to the clustering is that genes with known behavior can be used to analyze the results, and genes with unknown behavior can be placed into certain pathways for future functional testing.
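For readers unfamiliar with this clustering scheme, the following minimal sketch shows agglomerative clustering with a Pearson-correlation distance (taken here as 1 minus the correlation, an assumption on our part) and average linkage. The gene names and expression profiles are hypothetical, and a production analysis would normally rely on an established clustering package rather than this plain-Python loop.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    The distance between clusters is the mean of all pairwise
    (1 - Pearson correlation) distances between their members.
    """
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[n] for n in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical expression profiles across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.5, 2.0],
}

print(average_linkage(profiles, 2))
# [['geneA', 'geneB'], ['geneC', 'geneD']]
```

Because the distance depends on correlation rather than magnitude, genes that rise and fall together across samples are grouped even when their absolute expression levels differ.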

Classification Analysis for Glioma Data.
We applied the algorithm (10), which was described briefly in "Materials and Methods," to a set of gene expression profile data derived from 25 human glioma surgical tissue samples. The cDNA microarray experiments were carried out to gain expression information for 597 known cellular genes.

We designed two-class classifiers for the classification of OL from others, AO from others, AA from others, and GM from others. We limited the number of genes for each classifier to at most three, and the dispersion levels (amount of spread) of samples were varied from $\varsigma = 0.4$ to $\varsigma = 0.8$. We focus on $\varsigma = 0.6$ because it provides error estimation that is conservative but not excessively so (10). Even with analytic classifier design and error estimation, due to the number of potential feature sets and the various cases considered, the computations were done on a Beowulf-based supercomputer (16) at the Center for Information Technology at NIH.
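The size of the search can be checked with simple combinatorics. The sketch below counts the candidate feature sets of one, two, or three genes drawn from the 597 genes on the array, and multiplies by the four one-versus-others tasks. Variable names are ours, and the count is per dispersion level, because the number of levels evaluated is not stated here.

```python
from math import comb

n_genes = 597   # genes on the array
n_tasks = 4     # OL, AO, AA, and GM, each versus the others

# Number of feature sets of size 1, 2, and 3.
counts = {k: comb(n_genes, k) for k in (1, 2, 3)}
total_per_task = sum(counts.values())

print(counts)                    # {1: 597, 2: 177906, 3: 35284690}
print(total_per_task)            # 35463193
print(total_per_task * n_tasks)  # 141852772
```

Three-gene sets dominate the count, which is why an analytic error computation, rather than Monte-Carlo simulation for each candidate, matters in practice.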

Tables 1–4 show the feature sets identified for each classification category. The tables are constructed so that feature sets ranked high in both the $\varsigma$-error, $\varepsilon_\varsigma$, and the improvement of the $\varsigma$-error, $\Delta(\varepsilon_\varsigma)$, are listed. This is accomplished according to the following scheme: (a) the top three single-gene classifiers for the category are listed in each table; (b) two-gene classifiers ranked in the top N2 pairs for both $\varsigma$-error and improvement of $\varsigma$-error are included (N2 table-dependent); and (c) three-gene classifiers ranked in the top N3 triples for both $\varepsilon_\varsigma$ and $\Delta(\varepsilon_\varsigma)$ are included (N3 table-dependent). For comparison purposes, the LOO error estimate is also shown in the tables. As expected, overall the $\varsigma$-error is more conservative, so that when the $\varsigma$-error is very small, usually the LOO error is also very small or zero.
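The "top N in both lists" rule of steps (b) and (c) amounts to intersecting two rankings. The following is a minimal sketch with hypothetical gene-pair names, error values, and improvement values; the function name `select_in_both` is ours.

```python
def select_in_both(candidates, top_n):
    """Keep feature sets ranked in the top_n by error AND by improvement.

    `candidates` is a list of (name, error, improvement) tuples.
    Lower error is better; larger improvement is better.
    """
    by_error = sorted(candidates, key=lambda c: c[1])[:top_n]
    by_improvement = sorted(candidates, key=lambda c: c[2], reverse=True)[:top_n]
    keep = {c[0] for c in by_error} & {c[0] for c in by_improvement}
    # Report the survivors in order of increasing error.
    return [c[0] for c in by_error if c[0] in keep]

# Hypothetical two-gene classifiers: (name, error, improvement).
candidates = [
    ("A+B", 0.05, 0.08),
    ("A+C", 0.06, 0.01),
    ("B+D", 0.07, 0.09),
    ("C+D", 0.20, 0.10),
    ("B+C", 0.09, 0.02),
]

print(select_in_both(candidates, top_n=3))  # ['A+B', 'B+D']
```

In this toy list, "A+C" has a low error but adds almost nothing over its best single gene, and "C+D" improves greatly but remains a poor classifier, so neither is retained.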


 
Table 1 Feature sets to discriminate OL from others

Only pairwise classifiers that ranked higher than 100th in both lists are included. Triplet-wise classifiers are included only when they ranked higher than 50th in both lists. For any feature set, $\varepsilon_\varsigma$ denotes the error of the optimal classifier for the feature set, and $\Delta(\varepsilon_\varsigma)$ denotes the largest decrease in error for the full feature set relative to all of its subsets. LOO is computed by designing $n$ classifiers from sample subsets formed by leaving out one data point at a time; each classifier is then applied to the left-out point, and the LOO estimator is $1/n$ times the number of errors made by the $n$ classifiers.

 

 
Table 2 Feature sets to discriminate GM from others

Pairwise classifiers are selected when they are ranked at higher than 200th in both lists, and triplet-wise classifiers are selected only when ranked at higher than 50th.

 

 
Table 3 Feature sets to discriminate AO from others

Pairwise classifiers are selected when they are ranked at higher than 10th in both lists, and triplet-wise classifiers are selected only when ranked at higher than 50th.

 

 
Table 4 Feature sets to discriminate AA from others

Pairwise classifiers are selected when they are ranked at higher than 100th in both lists, and triplet-wise classifiers are selected only when ranked at higher than 50th.

 
To illustrate interpretation of the tables, consider discrimination of OL in Table 1. When selecting multivariate classifiers, we have removed all classifiers that include transducin β2 subunit 2 from the list because this gene itself has discriminating power so great that no matter what gene (even a noninformative gene) is used with it, the pairwise $\varsigma$-error is very low (at least as low as for the gene itself). Because of our desire to avoid this kind of redundancy in the tables, there are gene sets omitted from the two- or three-gene lists that possess smaller $\varsigma$-errors than those shown in the table. For instance, in Table 1, the $\varsigma$-error for the top-listed two-gene set is substantially greater than for any pair involving transducin β2 subunit 2, simply because adjoining any gene to transducin β2 subunit 2 produces a $\varsigma$-error no greater than that of transducin β2 subunit 2 itself. The complete performance lists for both error and improvement in error can be found in the supplementary information.4

The advantage of reporting the results in the way we have is that multivariate discriminatory power is revealed. This is clearly demonstrated in Table 1 with regard to cell surface glycoprotein MUC18. The gene does not appear on the single-gene list, indicating that its $\varsigma$-error exceeds 0.1115; however, it appears with clusterin (CLU) in the two-gene list and both with and without clusterin (CLU) in the three-gene list. The substantial improvement in each case demonstrates the significant contributions of the genes within each gene set.

There are other instances where the improvement in classification error is sufficient to warrant inclusion in a table. In Table 2, even though IGFBP2 is by itself a decent discriminator, the error improves markedly when it is combined with other genes, such as ephrin type A receptor 1 (EPHA1). The $\varsigma$-error decreases by more than 0.05, from 0.1392 (data not shown) to 0.0862. The improvement in the LOO error is larger, from 0.16 (4 of 25) to 0 (0 of 25). Because of this, feature sets including IGFBP2 are shown in the table. We recently studied IGFBP2 expression in 256 cases of gliomas of different grades using tissue microarray and found that IGFBP2 is overexpressed in 80% of GM tumors (Ref. 17; data not shown). Further testing with suitable antibodies will determine whether the combination of IGFBP2 and EPHA1 provides more accurate classification. Some of these multivariate discriminators are shown in Fig. 2.



 
Fig. 2. Multivariate discriminators for glioma classifications (examples): a, a strong multivariate (three-gene) discriminator of OL from other types of glioma. OL shows relatively low expression in all three genes. b, discriminator of GM from other types. GM shows relatively high expression of all three genes shown. c, discriminator of AO from others. d, discriminator of AA from others. Note that there is a clear separation between AO and others even though the hyperplane does not discriminate them perfectly. This is because of the nature of the algorithm, which tries to find the discriminator that performs well not only on the given data set but also in prediction. This is confirmed when the LOO error is computed: the LOO error for this feature set and these data is not 0 but 0.04 (1 of 25). This is important because it again shows that the designed classifier does not overfit the data. The scale on each axis represents a log2-transformed normalized intensity, log2 (intensity/median intensity of an array).
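The axis scale used in Fig. 2 can be reproduced with a short calculation. This sketch assumes strictly positive raw intensities from a single array; the intensity values and the function name `log2_median_normalize` are hypothetical.

```python
from math import log2
from statistics import median

def log2_median_normalize(intensities):
    """Return log2(intensity / median intensity of the array) for each spot."""
    m = median(intensities)
    return [log2(x / m) for x in intensities]

# Hypothetical raw intensities from one array.
raw = [100.0, 200.0, 400.0, 800.0, 1600.0]

print(log2_median_normalize(raw))  # [-2.0, -1.0, 0.0, 1.0, 2.0]
```

On this scale a value of 0 corresponds to the median intensity of the array, and each unit corresponds to a twofold change relative to that median.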

 
Clustering.
In Fig. 3, the global clustering map is shown on the left for the hierarchical clustering analysis outlined in "Materials and Methods." Four clusters are interesting with regard to discriminating OL and GM. Most genes found to singly discriminate OL from other types of glioma appear in the first cluster extracted on the right, and they are underexpressed in OL. Most genes found to classify GM from other types lie in the other three extracted clusters. In the first cluster, most of the genes are overexpressed in GM and underexpressed in AO and OL; in the second cluster, they are overexpressed in GM and underexpressed in OL; and in the third cluster, they are slightly underexpressed in GM.



 
Fig. 3. Hierarchical clustering of gene expression profile on glioma data set. Top panel, a cluster of genes that discriminate OL from other types of glioma. Most of the genes are underexpressed in OL. Second panel, a cluster of genes that classify GM from other types where most of the genes are overexpressed in GM and underexpressed in AO and OL. Third panel, a cluster of genes that classify GM from other types where most of the genes are overexpressed in GM and underexpressed in OL. Bottom panel, a cluster of genes that classify GM from other types where most of the genes are slightly underexpressed in GM.

 
Most genes identified as singly but only marginally classifying AO from the others are not clustered together as well as in the OL and the GM cases, nor are those classifying AA from the others. We find this interesting because it is consistent with the fact that AO and AA are more heterogeneous tumor types. This supports the usefulness of the multivariate approach. Had we tried only a univariate approach to identify a singleton discriminator, we would not have found feature sets that can discriminate these two classes from others.

Gliomas are very complex cancers involving different growth characteristics and cell lineage features (12). Because the original clone of tumor cells may exist at any stage of cell differentiation and may have different transformation events, the boundaries between tumor grades and tumor lineages can be blurred. This is reflected in the current morphologically based tumor classification schemes that often mix cell lineage features with tumor growth characteristics. The results are frequently subjective, and disagreements among pathologists regarding the identity of the tumor are not uncommon. The gene expression activities yielded by the study of molecular biology and genomic biology may provide a more objective method to classify diseases. This belief is based on the assumption that cell phenotypes have genotypic origins. Recent successes in subclassification of neoplasms within a disease group using gene expression profiles (3–7) provide support for such a belief.

Thus, the issue is how to best identify the strong feature genes that are closely linked to specific phenotypes from among the thousands of genes in gene expression profiles and how to determine whether this information really aids classification of tumors. There are many technical challenges in the path to accomplishing the task of finding the key links.

The first major roadblock is the small sample size issue inherent to microarray-based classification efforts (14). Contributing to this are the limited numbers of human tissues for study and the cost of such gene expression profiling projects. Because classifiers are designed from observed expression vectors that have randomness arising from biological and experimental variability, the design, performance evaluation, and application of classifiers must take this randomness into account, especially when the number of samples (tissue specimens) is small, which is the case in most human tissue-based microarray experiments.

Algorithms are therefore needed to identify robust classifiers from very limited data sets. Three criteria have to be met for an algorithm to be considered strong. First, given a set of variables, a classifier from the sample data should provide good classification over the general population. Second, the algorithm should be able to estimate the error of a designed classifier when data are limited. Third, given a large set of potential variables, the algorithm should be able to select a set of variables as inputs to the classifier.

Taking these issues into consideration, we used a recently developed method to find both strong classifiers and strong features (10). This algorithm considers the inherently variable or "high-noise" nature of microarray measurements. Using this algorithm, we have identified robust classifier gene sets containing one to three genes that distinguish each type of glioma from the other three. This provides guidance for the development of pathological assays using a reasonable number of markers for clinical use.

In a broader context, the approach applied in this study can be used to identify genes that contribute to the major differences between any two groups of samples analyzed, in the process of which some less understood phenotypes might be identified. For example, we might find strong feature gene sets that distinguish cancers with high metastatic potential from cancers with little or no metastatic potential or gene sets that identify cancers that will be sensitive to specific therapies versus those that will be resistant and continue to grow unabated through therapy. Current histology-based classification and grading systems can do neither of these. Identification of such strong feature genes may not only provide markers for diagnosis and disease management but may also provide novel potential targets for drug development. Cancers have complex features, but we cannot target all of these features for treatment. A method that could identify the strong features, both genotypically and phenotypically, would provide an ideal route to the heart of the problem. Future studies will tell whether the currently used algorithm or an improved one will achieve this goal.


    Acknowledgments
 
We thank Drs. Edward B. Suh and Robert L. Martino for providing the computational resource of the Beowulf clustered supercomputer at the Center for Information Technology of NIH for the heavy computation of the algorithm. We thank Beth Notzon for editorial assistance.


    Footnotes
 
1 Supported in part by the Tobacco Settlement Funds as appropriated by the Texas State Legislature, by a generous donation from the Michael and Betty Kadoorie Foundation, and by a grant from the Texas Higher Education Coordination Board (Grant 003657-0039-1999).

2 To whom requests for reprints should be addressed, at Cancer Genomics Core Laboratory, Department of Pathology, Box 85, The University of Texas M. D. Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX 77030. Phone: (713) 745-1103; Fax: (713) 792-5549; E-mail: wzhang@mdanderson.org

3 The abbreviations used are: GM, glioblastoma multiforme; OL, oligodendroglioma; AO, anaplastic oligodendroglioma; AA, anaplastic astrocytoma; IGFBP2, insulin-like growth factor-binding protein 2; LOO, leave-one-out.

4 Supplementary data is available at Molecular Cancer Therapeutics Online (http://mct.aacrjournals.org).

Received 2/14/02; revised 8/30/02; accepted 9/30/02.


    References
 

  1. Hogenesh, J. B., Ching, K. A., Batalov, S., Su, A. I., Walker, J. R., Zhou, Y., Kay, S. A., Schultz, P. G., and Cooke, M. P. A comparison of the Celera and Ensembl predicted gene sets reveals little overlap in novel genes.Cell , 106:413 –415,2001 .[CrossRef][Medline]
  2. Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. The sequence of the human genome. Science (Wash. DC), 291:1304–1351, 2001.
  3. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander, E. S. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science (Wash. DC), 286:531–537, 1999.
  4. Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A., Boldrick, J. C., Sabet, H., Tran, T., Yu, X., Powell, J. I., Yang, L., Marti, G. E., Moore, T., Hudson, J., Jr., Lu, L., Lewis, D. B., Tibshirani, R., Sherlock, G., Chan, W. C., Greiner, T. C., Weisenburger, D. D., Armitage, J. O., Warnke, R., Levy, R., Wilson, W., Grever, M. R., Byrd, J. C., Botstein, D., Brown, P. O., and Staudt, L. M. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature (Lond.), 403:503–511, 2000.
  5. Bittner, M., Meltzer, P., Chen, Y., Jiang, Y., Seftor, E., Hendrix, M., Radmacher, M., Simon, R., Yakhini, Z., Ben-Dor, A., Sampas, N., Dougherty, E., Wang, E., Marincola, F., Gooden, C., Lueders, J., Glatfelter, A., Pollock, P., Carpten, J., Gillanders, E., Leja, D., Dietrich, K., Beaudry, C., Berens, M., Alberts, D., and Sondak, V. Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature (Lond.), 406:536–540, 2000.
  6. Hedenfalk, I., Duggan, D., Chen, Y., Radmacher, M., Bittner, M., Simon, R., Meltzer, P., Gusterson, B., Esteller, M., Kallioniemi, O. P., Wilfond, B., Borg, A., and Trent, J. Gene expression profiles in hereditary breast cancer. N. Engl. J. Med., 344:539–548, 2001.
  7. Perou, C. M., Sorlie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Rees, C. A., Pollack, J. R., Ross, D. T., Johnsen, H., Akslen, L. A., Fluge, O., Pergamenschikov, A., Williams, C., Zhu, S. X., Lonning, P. E., Borresen-Dale, A. L., Brown, P. O., and Botstein, D. Molecular portraits of human breast tumours. Nature (Lond.), 406:747–752, 2000.
  8. Ben-Dor, A., Bruhn, L., Friedman, N., Nachman, I., Schummer, M., and Yakhini, Z. Tissue classification with gene expression profiles. J. Comput. Biol., 7:559–583, 2000.
  9. Khan, J., Wei, J. S., Ringner, M., Saal, L. H., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C. R., Peterson, C., and Meltzer, P. S. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med., 7:673–679, 2001.
  10. Kim, S., Dougherty, E. R., Barrera, J., Chen, Y., Bittner, M. L., and Trent, J. M. Strong feature sets from small samples. J. Comput. Biol., 9:129–148, 2002.
  11. Daumas-Duport, C., Scheithauer, B. W., and O’Fallon, J. Grading of astrocytomas. A simple and reproducible method. Cancer (Phila.), 62:2152–2165, 1988.
  12. Kleihues, P., and Cavenee, W. K. (eds.). Pathology and Genetics of Tumours of the Nervous System, 2nd ed. (WHO Classification of Tumours of the Nervous System). New York: Oxford University Press, 2000.
  13. Fuller, G. N., Rhee, C. H., Hess, K. R., Caskey, L. S., Wang, R., Bruner, J. M., Yung, W. K., and Zhang, W. Reactivation of insulin-like growth factor-binding protein 2 expression during glioblastoma transformation revealed by parallel gene expression profiling. Cancer Res., 59:4228–4232, 1999.
  14. Dougherty, E. R. Small sample issues for microarray-based classification. Comparative and Functional Genomics, 2:28–34, 2001.
  15. Devroye, L., Gyorfi, L., and Lugosi, G. A Probabilistic Theory of Pattern Recognition. New York: Springer, 1996.
  16. Sterling, T. L., Salmon, J., Becker, D. J., and Savarese, D. F. How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters. Cambridge, MA: The MIT Press, 1999.
  17. Wang, H. M., Wang, H., Zhang, W., and Fuller, G. N. Tissue microarrays: applications in neuropathology research, diagnosis, and education. Brain Pathol., 12:95–107, 2002.


