
1 Science Applications International Corp., National Cancer Institute; 2 Developmental Therapeutics Program, Screening Technologies Branch, National Cancer Institute at Frederick, Frederick, Maryland; and 3 Genetics Branch, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, Maryland
Requests for reprints: Ilan R. Kirsch, Amgen, 1201 Amgen Court West, AW1-J 4144, Seattle, WA 98119-3105. Phone: 206-265-7316; Fax: 206-216-5930. E-mail: lkirsch{at}amgen.com
| Abstract |
|---|
| Introduction |
|---|
The individual immortalized cancer cell lines in the National Cancer Institute (NCI) drug discovery panel (NCI-60) can be characterized and distinguished by a variety of abnormal karyotypic features in both the number and the structure of their component chromosomes. We have quantified the chromosomal aberrations of this cell line panel using spectral karyotyping to delineate ploidy as well as structural and numerical karyotypic complexity and heterogeneity (5). Numerical complexity describes the change in chromosome number compared with the ploidy level of the cell line; structural complexity reflects translocations, deletions, duplications, amplifications, insertions, and inversions of the chromosomes. Ongoing instability is revealed by the heterogeneity variable, which measures metaphase-to-metaphase variation in numerical or structural complexity within a given cell line. These quantifiable spectral karyotyping variables can be used as a descriptor of the chromosomal "state" for each cell line.
The connection between chromosomal instabilities and cancer has focused attention on defects in chromosomal segregation, telomere stability, cell cycle checkpoint regulation, and repair of DNA damage (6–10). The specific components of such processes are not yet fully delineated, but if a protein could be identified as a gatekeeper to maintaining chromosomal stability, manipulation of the function of this protein with small-molecule drugs, either directly or indirectly, might provide a powerful weapon for cancer therapy.
There is no requirement, however, that a single protein be the sole determinant of a cellular karyotype. Although contrary to the current development of precise molecularly targeted drugs, it is also possible that a particular karyotypic phenotype itself could be druggable. In an initial effort to identify lead compounds whose activity could be related to particular karyotypic observables, we studied a subset of drugs (n = 1,429, including many standards of the chemotherapeutic armamentarium) from the Developmental Therapeutics Program data repository that had been repeatedly screened against the NCI-60 (11). This prior study explored correlations between karyotypic variables and growth inhibition for this relatively restricted set of agents. Positive correlations were found, although infrequently; in general, for commonly used anticancer drugs, this analysis did not find evidence for a direct positive association (i.e., relatively increased sensitivity with increased karyotypic complexity) between cytotoxic profiles and the experimentally determined variables of karyotypic state. These results suggested that, among other possibilities, the mechanisms of action for many well-known anticancer agents were most likely not associated with chromosomal abnormalities, consistent with their somewhat limited utility, for the most part, in epithelial cancers that reside at the more karyotypically complex end of the cancer spectrum.
In this current study, we have explored the utility of the full set of publicly available screening data, consisting of cell-based growth inhibition data for 30,000 potential anticancer compounds, to delineate aspects of karyotypic variability that are distinct and uniquely identifiable within this data set. Association between the screening data organized via a self-organizing map (SOM) and karyotypes can be used to distinguish unique mechanisms of action associated with chromosomal aberrations. We delineate groups of compounds heretofore less well-characterized relative to agents already commonly used in oncology clinical practice. These agents or derivatives of them would be candidates for drugs developed for the treatment of karyotypically complex cancers. In addition, exploration of the structures and mechanisms of action of these compounds may provide insight into the nature of karyotypic instability itself.
| Materials and Methods |
|---|
Karyotypic Variables
Details of the karyotypic analysis of the NCI-60 have been described previously (5).
Self-Organizing Map
To simultaneously describe similarities between all GI50 data vectors, we have used a SOM (16) to organize cellular growth inhibition data derived from the NCI-60 tumor cell panels (15). The SOM algorithm identifies cluster vectors in the 60-dimensional data space by minimizing the deviation between the GI50 data vectors and the cluster vectors. Regions in GI50 space that are dense with data vectors attract many cluster vectors, and regions with few data vectors attract fewer cluster vectors, resulting in a division of response space that mimics information content. An advantage of SOM-reordered data is the ability to visualize the global clustering results in an interpretable manner. Our preferred method of display is the uniform projection of SOM clustering in high-dimensional space to a two-dimensional map. This mapping is both simple and retains a great deal of the original high-dimensional information. Additional details regarding the creation of and access to the GI50 SOM are given elsewhere (e.g., ref. 15).
Each GI50 data vector is thus uniquely assigned to a cluster vector on the SOM. Different compounds result in different profiles that are associated with different locations on the SOM. Occasionally, two compounds will generate similar profiles. The more similar, the closer they are then grouped, which can be visualized on the two-dimensional SOM where separate cluster vectors designate groupings of compounds demonstrating related profiles and cluster neighborhoods represent areas of associations between possible mechanisms of action.
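The clustering step described above can be illustrated with a minimal sketch. This is not the implementation used for the GI50 SOM (see ref. 15 for that); it is a generic online SOM written under stated assumptions: a small rectangular grid, Euclidean distance, a Gaussian neighborhood, and linearly decaying learning rate and neighborhood width. The function names, parameter values, and the three-dimensional toy profiles are hypothetical (the real data vectors are 60-dimensional).

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the cluster vector closest (Euclidean distance) to data vector x."""
    return min(range(len(weights)), key=lambda k: math.dist(weights[k], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM and return a flat, row-major list of cluster vectors."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumption: initialise each cluster vector as a copy of a random data vector.
    weights = [list(rng.choice(data)) for _ in grid]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # visit the data in random order
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)              # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.1  # neighbourhood width shrinks
            bmu = best_matching_unit(weights, x)
            for k, w in enumerate(weights):
                # Squared grid distance between node k and the winning node.
                d2 = (grid[k][0] - grid[bmu][0]) ** 2 + (grid[k][1] - grid[bmu][1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                # Pull node k toward x; nodes near the winner on the grid move most.
                for j in range(len(w)):
                    w[j] += lr * h * (x[j] - w[j])
            step += 1
    return weights


# Hypothetical toy profiles: two pairs of similar response patterns.
data = [[-1.0, 0.0, 1.0], [-0.9, 0.1, 0.8], [1.0, 0.0, -1.0], [0.9, -0.1, -1.1]]
weights = train_som(data, rows=2, cols=2)
assignments = [best_matching_unit(weights, x) for x in data]
```

Each entry of `assignments` is the map node to which the corresponding data vector is assigned, which is the sense in which every GI50 vector belongs to exactly one cluster vector.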
Statistical Analysis and Correlations
The growth inhibition data vectors used for the SOM construction encode drug concentrations and were Z-score normalized before clustering (i.e., the mean GI50 was subtracted from each measurement and the result was divided by the SD). Normalization to unitless data facilitates comparisons with other independently derived measures from these same tumor cells. The same normalization procedure was applied to construct karyotypic data vectors. Because our SOM is a representation of the normalized GI50 vector space, other data vectors may or may not be appropriately described by this same space. Of course, any data vector will have a minimum distance to one particular cluster vector (e.g., trivially, the null vector will find the cluster vector with the least variance). Quality-control checks of these results using standard measures of similarity revealed no anomalies (i.e., their profiles are similar). In addition, the capacity to find significant matches between cytotoxic profiles and karyotypic profiles is an indication that the karyotypic data can be described by the cytotoxicity-derived space encompassed by our SOM.
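As a small illustration of the Z-score normalization described above, the sketch below normalizes one data vector. The input values are hypothetical log-scale GI50 measurements, and the use of the sample SD (N − 1 denominator) is an assumption, because the text does not say which SD estimator was used.

```python
import statistics


def zscore(values):
    """Subtract the mean from each measurement and divide by the SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (N - 1 denominator)
    return [(v - mean) / sd for v in values]


# Hypothetical log10(GI50) values for one compound across five cell lines.
log_gi50 = [-4.0, -5.2, -6.1, -4.8, -7.0]
z = zscore(log_gi50)
```

The normalized vector is unitless, with mean 0 and SD 1, so a karyotypic vector normalized the same way can be compared with it directly.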
The next stage of our analysis addressed appropriate and robust means for assessing significance of similarity and possible means of data reduction to clarify the observables associated with the karyotypic state. The Pearson or sample correlation coefficient (PCC) between two vectors $\mathbf{u}$ and $\mathbf{v}$ is defined as

$$\mathrm{PCC}(\mathbf{u},\mathbf{v}) = \frac{\sum_{i=1}^{N}(u_i-\bar{u})(v_i-\bar{v})}{\sqrt{\sum_{i=1}^{N}(u_i-\bar{u})^2}\,\sqrt{\sum_{i=1}^{N}(v_i-\bar{v})^2}}$$

where $\bar{u}$ denotes the average of all elements in $\mathbf{u}$. The correlation coefficient measures the fidelity of a linear fit of v(u) and takes on values between −1 and +1. A correlation coefficient of 1 indicates that each vector is linearly dependent on the other; it does not mean that the vectors are exactly the same. A measure of how similar two data vectors are to each other can be gauged using linear regression. Associated with the correlation (PCC) is a P value, derived from PCC and N (the number of data points), that gauges whether the observed correlation could have arisen by chance (i.e., if P = 0.05, a correlation at least this strong would be expected in only 1 of 20 cases if the two vectors were unrelated). Moderate values of PCC are sometimes thought of as indicating strong relationships, but this may produce misleading results. Instead, the real strength of the relationship is best indicated by $\mathrm{PCC}^2$. Technically, this is the proportion of variance in one vector "explained" by linear regression on the other vector. Thus, even if there exists a nonrandom correlation (e.g., moderate PCC), the strength of the correlation need not be great (e.g., low $\mathrm{PCC}^2$). The observation that a correlative relationship exists can then be used to construct testable hypotheses to verify the existence of a statistically supportable connection between the observables. Thus, although the magnitude of the PCC may be small, a correlative analysis can establish important connections between variables.
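To make the distinction between the correlation coefficient and its square concrete, here is a minimal sketch of the Pearson correlation written in plain Python. The function name `pcc` and the example vectors are hypothetical. The P value is not computed here, and a vector with zero variance would raise a division error.

```python
import math


def pcc(u, v):
    """Pearson (sample) correlation coefficient between two equal-length vectors."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    # Sum of products of deviations from the means (numerator).
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    # Root sums of squared deviations (denominator).
    su = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sv = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (su * sv)


u = [1.0, 2.0, 3.0, 4.0]
scaled = [10.0, 20.0, 30.0, 40.0]  # linearly dependent on u, but not identical to it
shuffled = [2.0, 1.0, 4.0, 3.0]    # only partly follows u

r_scaled = pcc(u, scaled)          # 1.0 up to rounding
r_shuffled = pcc(u, shuffled)      # 0.6
explained = r_shuffled ** 2        # 0.36
```

In the second comparison a correlation of 0.6 corresponds to only 36% of the variance explained, which is the point made above about moderate correlations.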
Singular Value Decomposition
Methods of singular value decomposition are used to investigate the properties of the karyotypic variable vectors themselves. Briefly, we can form a matrix $K$ of N karyotypic variables as columns and M cell lines as rows. This matrix can always be decomposed into three matrices, $U$, $S$, and $V^{T}$, to form the matrix equation:

$$K = U S V^{T}$$
| Results |
|---|
Karyotypic Observable Projection on GI50 SOM
A projection of data derived from all the NCI-60 cell lines on the SOM generated from GI50 cytotoxicity data proceeds by finding the smallest distance between this data vector and all the cluster vectors describing the map. The growth inhibition pattern of sensitivity and insensitivity in GI50 reflects the characteristic cellular differences in growth inhibition after drug exposure in the assay. Correspondingly, the normalized karyotypic observables reflect the characteristic karyotypic differences between cells. Linking karyotypic data to GI50 data via the SOM attempts to delineate hypotheses about relationships between karyotype and drug sensitivity. This linkage presumes that a statistically significant similarity between the karyotypic differential pattern and the GI50 pattern of a drug provides a basis for hypothesizing that cells displaying relatively higher karyotypic measures are more sensitive to that drug than other cells with relatively lower karyotypic measures.
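The projection step described above amounts to a nearest-neighbor search over the cluster vectors. The sketch below shows that search for a single normalized data vector. The function name, the row-major map layout, the Euclidean distance, and the toy three-dimensional vectors are all assumptions made for illustration; the actual map and distance measure are described in ref. 15.

```python
import math


def project_onto_som(vector, cluster_vectors, n_cols):
    """Return ((row, col), distance) for the cluster vector closest to `vector`.

    Assumes cluster_vectors is a flat, row-major list describing a rectangular map.
    """
    distances = [math.dist(vector, c) for c in cluster_vectors]
    best = min(range(len(distances)), key=distances.__getitem__)
    return divmod(best, n_cols), distances[best]


# Hypothetical 2 x 2 map of cluster vectors and one normalized karyotypic vector.
cluster_vectors = [
    [1.0, 0.0, -1.0], [0.5, 0.5, -1.0],
    [-1.0, 0.0, 1.0], [0.0, 0.0, 0.0],
]
karyotypic_vector = [-0.9, 0.1, 0.8]
node, distance = project_onto_som(karyotypic_vector, cluster_vectors, n_cols=2)
```

Here the vector lands on the node in row 1, column 0, whose cluster vector has the most similar pattern of high and low values across the (toy) cell lines.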
Some typical correlation values and associated P values between specific karyotypic observables and GI50 data vectors are given in Table 1. For these compounds, the mean GI50 values across the cell lines given in Table 1 varied from the least sensitive, $10^{-4.3}$ mol/L, to the most sensitive, $10^{-7.3}$ mol/L, with an average growth inhibition concentration of $10^{-4.9}$ mol/L and individual cell GI50 values ranging from the highest test concentration of $10^{-4}$ mol/L down to $10^{-8}$ mol/L. As a reference for the GI50 values, it is worth noting that this same general range of mean sensitivities (based on GI50) is observed for compounds currently used as standard of care in clinical oncology practice (e.g., leucovorin $10^{-4.3}$ mol/L, 5-fluorouracil $10^{-4.6}$ mol/L, cisplatin $10^{-5.5}$ mol/L, gemcitabine $10^{-6.7}$ mol/L, and docetaxel $10^{-7.6}$ mol/L).
0.05 cut. In practical terms, this result indicates that the karyotypic profiles best match the cytotoxic profiles within the subregion P3. These compounds represent an unexplored set of chemical motifs whose activities correlate with the variability of the cellular karyotypes.
Independent assessments between each of our karyotypic measures find strong correlations between structural heterogeneity and numerical complexity as shown in columns 2 to 4 in Table 2, with a correlation coefficient of 0.66. Both of these karyotypic measures are then correlated to the numerical heterogeneity data vector with correlation coefficients of 0.50 and 0.56, respectively. The structural complexity variable contains the least information, as related to the other karyotypic measures, with correlation values ranging from 0.03 to 0.40.
0.01. These results are summarized in Table 3, and a more detailed list is available as supplementary documentation (see footnote 4). A total of 13 classes of potential agents are delineated as motifs A to M. The majority of these compounds were not identified in the previous pilot study (11), which covered a much smaller group of compounds (n = 1,429).
No mechanisms of action are known for the bis-naphthylcarboxamides, bis-naphthylureas, and anilinomalonyl phenylazopyrazoles, shown in motifs C and D. These drugs are most closely associated with the P region and correlate most strongly with the numerical complexity, numerical heterogeneity, and structural heterogeneity patterns. Compounds of motif D have been found inactive in NCI's anti-HIV screen. One of the pyridinethione carbonitrile nucleosides listed as motif E in Table 3 is a P-glycoprotein antagonist (27).
The pentachlorophenyl polypeptide esters defined as motif F correlate specifically with the numerical complexity. One compound in this set has been identified as a potential modifier of the c-erbB2 pathway (28), which is intimately connected to cell cycle control (29) and is thus indirectly or directly related to the chromosomal state of the cell.
Motifs G to J encompassing thiazolyl coumarins, anilino/phenoxy-carboxy/phenyl-6(7)-substituted quinoxalines, 1,8-bis(5-aryloxymethyl-4-anilino-1,2,4-triazol-2-yl)octanes, and 3-alkylidene-5,5-disubstituted tetrahydro-2-furanones, listed in Table 3, are not associated with any known mechanism of action or target. They are associated with all karyotypic variables, except for structural complexity, and cluster mainly in the P region of the SOM. Motif I appears in the S6 region of the map, which is colocalized with an abundance of topoisomerase inhibitor GI50 data vectors. This motif carries specificity for the structural heterogeneity karyotypic variable.
The 2-substituted mercapto-3H-quinazolines listed as motif K and mainly found in the P region of the SOM were originally tested for antibacterial, antifungal, and antiacetylcholinesterase activities (30). Subsequent studies involving these compounds have identified them as kinesin inhibitors (31). Because kinesin is directly involved in the mitotic spindle function, these compounds have received attention as antimitotic agents.
The N-(p-(substituted azole)phenyl) benzenesulfonamides defined as motif L are largely uncharacterized; however, it is interesting to note that this motif and motifs E and J are the only ones indicating specificity toward structural complexity. The 1,1-dimethyl-3-phenyl-3-pyrrolidinyl/4-morpholinyl naphthalans (motif M) are again a relatively unexplored group of structures, although loosely related substructures have been shown to be inhibitors of thymidylate synthase (32), which is critical for DNA repair and replication.
The structurally very different compound classes identified above can be agents of similar target groups as well as different pathways that are common to the particular chromosomal state. Using the GI50 responses for these structures, we have organized these around the karyotypic features.
Exploration of Karyotypic States
The data vectors of the four karyotypic variables are correlated with one another (Table 2) and are therefore not fully independent; thus, the information content of each variable is not unique to itself. We therefore did a singular value decomposition of the matrix formed by the four karyotypic data vectors and found w1 = 0.55, w2 = 0.25, w3 = 0.11, and w4 = 0.08. Applying the threshold of 0.7/4 = 0.175 retains only w1 and w2, which together account for 80% of the variance, indicating that there are truly only two independent biological processes represented in these data. The decomposition does not identify what these processes are. The bulk of the karyotypic data vectors can thus be reconstructed with only the data of the first two karyotypic base vectors Û1 and Û2. The correlation of all the karyotypic base vectors with the original karyotypic data vectors is given in Table 2. It is evident that Û1 carries the largest correlation with all four variables, each contributing differently to the base vector. On the other hand, Û2 is strongly positively correlated with the structural complexity variable, which was least related to the other three karyotypic variables.
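The component selection above can be checked with a few lines of arithmetic. The sketch below takes the four reported values and applies the 0.7/N threshold; the function name and the `factor` parameter are hypothetical, and the code only reproduces the selection step, not the decomposition itself.

```python
def select_components(w, factor=0.7):
    """Return the threshold, the 1-based indices of retained components, and their summed weight."""
    threshold = factor / len(w)
    kept = [i + 1 for i, wi in enumerate(w) if wi > threshold]
    explained = sum(w[i - 1] for i in kept)
    return threshold, kept, explained


# Values reported in the text for the four karyotypic data vectors.
w = [0.55, 0.25, 0.11, 0.08]
threshold, kept, explained = select_components(w)
```

With these inputs the threshold is 0.175, components 1 and 2 are retained, and together they account for 0.80 of the total, matching the figures quoted in the text.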
The representation of the data in terms of the karyotypic base vectors encompasses a model of the data that can be used separately to explore GI50 correlations. The strong similarity of Û1 to the numerical complexity, numerical heterogeneity, and structural heterogeneity karyotypic variables ensures that the same structural motif groups in Table 3 are retrieved using the karyotypic base vectors, albeit with different correlation strengths. Further analysis of the correlations among karyotypic base vectors with compound-induced growth inhibition patterns represented by the structural motifs in Table 3 confirms that these agents are mostly targeted toward the first and most important karyotypic base vector. Motifs E, J, and L are the only ones that seem to be targeting both the first and the second karyotypic base vectors. The other classes are to a greater or lesser degree correlated with Û2.
| Discussion |
|---|
To this aim, we have used SOMs (16) to investigate global trends in the NCI-60 growth inhibition data (15). A SOM attempts to describe the multidimensional space of growth inhibition patterns from all the screened compounds by assigning representative cluster vectors to describe this space. In essence, the algorithm proceeds via an iterative process designed to organize individual data vectors into clusters, where a single vector represents each cluster's members. The cluster vectors are then placed on a two-dimensional grid, organized to locate the most similar cluster vectors as nearest neighbors. The practical effect of this algorithm is first to cluster similar data vectors and second to display these results such that the most similar cluster vectors are close in this space (i.e., to provide a global perspective of the complete data set). The visualization of these results is conveniently done via a two-dimensional map that represents a significant reduction in dimensionality from the initial 60-dimensional space. An earlier analysis of the anticancer agents in this data set found that certain regions on the SOM could be associated with putative biological mechanisms of growth inhibition. In particular, regions on the SOM were delineated that account for agents described previously in the literature as active against DNA synthesis, mitosis, membranes, xenobiotic metabolism, etc. (19, 33). In addition to cataloging compounds according to a mechanism of action, the results revealed an inherent interconnectedness between various cellular processes and specific growth inhibition patterns.
We investigated the raw karyotypic variable data as well as linear combinations as needed to extract the most representative data set that could be associated with chromosomal aberrations. This strategy revealed many compound classes associated with cellular growth inhibition that have not been identified previously as potential effectors of the karyotype. The observation that two of the structural motifs, cytochalasins and 2-substituted mercapto-3H-quinazolines, are already known to act on mitotic function lends credence to this data mining effort. Some of the unexplored chemical motifs identified have been associated previously with a variety of cellular processes, including signal transduction, drug efflux, and DNA maintenance, that can be linked only circumstantially to the karyotype of a cell.
It is of some interest that the karyotypic variable least related to any of the others is structural complexity. It is tempting to conjecture that this might be expected given that, of all the variables, it is most difficult to imagine how a cell might detect the fact of an established chromosomal reconfiguration, which is what structural complexity measures. Excess or decreased numbers of chromosomes might be appreciated by a spindle or kinetochore sensor. Ongoing chromosomal gain or loss (numerical heterogeneity) or ongoing chromosomal breakage and rejoining (structural heterogeneity) might similarly be recognized by checkpoint or DNA repair mechanisms, but how a cell would recognize a reconfigured chromosome that contains a single centromere of one or another of the chromosomes involved in the reconfiguration is less easy to hypothesize given current knowledge about cellular function.
In summary, the set of drugs that have been identified via our karyotype/drug correlation analysis provides a set of lead compounds for further study and draws attention to several regions of the SOM. A striking correlation pattern indicates that the karyotypic observables are often correlated with a relatively unexplored region on the SOM. The SOM now provides the identity of compounds that share these growth inhibition patterns but until now have not been recognized in the literature as having an association with the karyotypic state of a cell. This provides several compounds that can be hypothesized to act, directly or indirectly, in a manner relevant to that state. Elucidation of the effect of these drugs is proposed for future assays using, for example, an interface with gene expression array analysis or, for a smaller set of representative compounds, investigation in the yeast haploid deletion system. If such screens identify genes or pathways implicated via the karyotype/drug correlations, we are in a position to use these discoveries to provide a novel set of cancer-relevant targets.
| Footnotes |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. government.
4 Supplementary material for this article is available at Molecular Cancer Therapeutics Online (http://mct.aacrjournals.org).
Received 7/5/05; accepted 8/10/05.
| References |
|---|