Search results
|
|
6 papers selected
|
| Categories: |
Functional genomics |
2003-04-18
|
| Title: |
A classification-based machine learning approach for the analysis of genome-wide expression data |
| Authors: |
Lyons-Weiler J, Patel S, Bhattacharya S |
| Ref: |
Genome Res 2003 Mar;13(3):503-12 |
| Abstract: |
Three important areas of data analysis for global gene expression analysis are class discovery, class prediction, and finding dysregulated genes (biomarkers). The clinical application of microarray data will require marker genes whose expression patterns are sufficiently well understood to allow accurate predictions on disease subclass membership. Commonly used methods of analysis include hierarchical clustering algorithms, t-, F-, and Z-tests, and machine learning approaches. We describe an approach called the maximum difference subset (MDSS) algorithm that combines classification algorithms, classical statistics, and elements of machine learning and provides a coherent framework. By integrating prediction accuracy, the MDSS algorithm learns the critical threshold of statistical significance (the alpha or P-value), eliminating the arbitrariness of setting a threshold of statistical significance and minimizing the effect of the normality assumptions. To reduce the false positive rate and to increase external validity of the predictive gene set, a jackknife step is used. This step identifies and removes genes in the initial MDSS with low combined predictive utility. The overall MDSS provides a prediction that is less dependent on an arbitrary study design (sample inclusion or exclusion) and should thus have high external validity. We demonstrate that this approach, unlike other published methods, identifies biomarkers capable of predicting the outcome of anthracycline-cytarabine chemotherapy in cases of acute myeloid leukemia. By incorporating two criteria (statistical significance and predictive utility), the approach learns the significance level relevant for a given data set. The MDSS approach can be used with any test and classifier operator pair. |
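The idea of letting prediction accuracy choose the significance threshold can be illustrated with a small sketch. This is not the authors' MDSS implementation: it assumes a cutoff on the absolute Welch t-statistic in place of a P-value, a nearest-centroid classifier, and leave-one-out evaluation, and it omits the jackknife step. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Absolute Welch t-statistic between two samples (lists of floats)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = abs(mean(a) - mean(b))
    if se == 0.0:
        return 0.0 if diff == 0.0 else float("inf")
    return diff / se

def select_genes(X, y, cutoff):
    """Indices of genes whose |t| between classes 0 and 1 is >= cutoff."""
    keep = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        if welch_t(a, b) >= cutoff:
            keep.append(g)
    return keep

def nearest_centroid(X, y, genes, sample):
    """Predict the class whose centroid (over selected genes) is closest."""
    dist = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroid = [mean(r[g] for r in rows) for g in genes]
        dist[lab] = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
    return min(dist, key=dist.get)

def loo_accuracy(X, y, cutoff):
    """Leave-one-out accuracy; genes are re-selected inside every fold."""
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(X_train, y_train, cutoff)
        if genes and nearest_centroid(X_train, y_train, genes, X[i]) == y[i]:
            hits += 1
    return hits / len(X)

def learn_cutoff(X, y, cutoffs):
    """Pick the candidate cutoff with the best leave-one-out accuracy."""
    return max(cutoffs, key=lambda c: loo_accuracy(X, y, c))

# Toy data (invented): gene 0 separates the classes, gene 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [3.0, 1.1], [3.1, 4.9], [2.9, 3.1]]
y = [0, 0, 0, 1, 1, 1]
best = learn_cutoff(X, y, [0.0, 5.0])
```

Re-selecting genes inside each fold keeps the held-out sample from influencing which genes are chosen, which is one way to limit the dependence on sample inclusion or exclusion that the abstract mentions.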
|
| Categories: |
Sequence based |
2002-06-04
|
| Title: |
Conserved codon composition of ribosomal protein coding genes in Escherichia coli, Mycobacterium tuberculosis and Saccharomyces cerevisiae: lessons from supervised machine learning in functional genomics |
| Authors: |
Lin K, Kuang Y, Joseph JS, Kolatkar PR |
| Ref: |
Nucleic Acids Res 2002 Jun 1;30(11):2599-607 |
| Abstract: |
Genomics projects have resulted in a flood of sequence data. Functional annotation currently relies almost exclusively on
inter-species sequence comparison and is restricted in cases of limited data from related species and widely divergent sequences
with no known homologs. Here, we demonstrate that codon composition, a fusion of codon usage bias and amino acid composition
signals, can accurately discriminate, in the absence of sequence homology information, cytoplasmic ribosomal protein genes from
all other genes of known function in Saccharomyces cerevisiae, Escherichia coli and Mycobacterium tuberculosis using an
implementation of support vector machines, SVM(light). Analysis of these codon composition signals is instructive in determining
features that confer individuality to ribosomal protein genes. Each of the sets of positively charged, negatively charged and small
hydrophobic residues, as well as codon bias, contribute to their distinctive codon composition profile. The representation of all these
signals is sensitively detected, combined and augmented by the SVMs to perform an accurate classification. Of special mention is an
obvious outlier, yeast gene RPL22B, highly homologous to RPL22A but employing very different codon usage, perhaps indicating a
non-ribosomal function. Finally, we propose that codon composition be used in combination with other attributes in gene/protein
classification by supervised machine learning algorithms. |
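As a rough illustration of the kind of feature the abstract describes, the sketch below computes a 64-dimensional vector of relative codon frequencies for an in-frame coding sequence. The function name and the exact normalization are assumptions made for illustration, not the encoding used in the paper; such a vector could serve as input to a classifier such as an SVM.

```python
from itertools import product

# All 64 codons in a fixed (alphabetical) order, so vectors are comparable.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]

def codon_composition(cds):
    """Return 64 relative codon frequencies for an in-frame coding sequence.

    Codons containing characters other than A, C, G, T are skipped, and a
    trailing partial codon is ignored.
    """
    seq = cds.upper().replace("U", "T")
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon in counts:
            counts[codon] += 1
            total += 1
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]

# Toy sequence (invented): ATG AAA AAA TAA
vec = codon_composition("ATGAAAAAATAA")
```

Because each codon maps to one amino acid, a vector like this carries both amino acid composition and synonymous codon preference, which matches the abstract's description of codon composition as a fusion of the two signals.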
|
| Categories: |
Misc |
2002-02-28
|
| Title: |
Machine learning of functional class from phenotype data |
| Authors: |
Clare A, King RD |
| Ref: |
Bioinformatics 2002 Jan;18(1):160-6 |
| Abstract: |
MOTIVATION: Mutant phenotype growth experiments are an important novel source of functional genomics data which have received
little attention in bioinformatics. We applied supervised machine learning to the problem of using phenotype data to predict the functional
class of Open Reading Frames (ORFs) in Saccharomyces cerevisiae. Three sources of data were used: TRansposon-Insertion
Phenotypes, Localization and Expression in Saccharomyces (TRIPLES), European Functional Analysis Network (EUROFAN) and
Munich Information Center for Protein Sequences (MIPS). The analysis of the data presented a number of challenges to machine
learning: multi-class labels, a large number of sparsely populated classes, the need to learn a set of accurate rules (not a complete
classification), and a very large amount of missing values. We modified the algorithm C4.5 to deal with these problems. RESULTS: Rules
were learnt which are accurate and biologically meaningful. The rules predict function of 83 ORFs of unknown function at an estimated
accuracy of ≥ 80%. |
|
| Categories: |
Functional genomics |
2002-01-07
|
| Title: |
Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning |
| Authors: |
Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RC, Gaasenbeek M, Angelo M, Reich M, Pinkus GS,
Ray TS, Koval MA, Last KW, Norton A, Lister TA, Mesirov J, Neuberg DS, Lander ES, Aster JC, Golub TR |
| Ref: |
Nat Med 2002 Jan;8(1):68-74 |
| Abstract: |
Diffuse large B-cell lymphoma (DLBCL), the most common lymphoid
malignancy in adults, is curable in less than 50% of patients. Prognostic
models based on pre-treatment characteristics, such as the International
Prognostic Index (IPI), are currently used to predict outcome in DLBCL.
However, clinical outcome models identify neither the molecular basis of
clinical heterogeneity, nor specific therapeutic targets. We analyzed the
expression of 6,817 genes in diagnostic tumor specimens from DLBCL
patients who received cyclophosphamide, adriamycin, vincristine and
prednisone (CHOP)-based chemotherapy, and applied a supervised
learning prediction method to identify cured versus fatal or refractory
disease. The algorithm classified two categories of patients with very
different five-year overall survival rates (70% versus 12%). The model
also effectively delineated patients within specific IPI risk categories who
were likely to be cured or to die of their disease. Genes implicated in
DLBCL outcome included some that regulate responses to
B-cell-receptor signaling, critical serine/threonine phosphorylation
pathways and apoptosis. Our data indicate that supervised learning
classification techniques can predict outcome in DLBCL and identify
rational targets for intervention. |
|
| Categories: |
Sequence based |
2001-12-03
|
| Title: |
Computational identification of promoters and first exons in the human genome |
| Authors: |
Ramana V. Davuluri, Ivo Grosse, Michael Q. Zhang |
| Ref: |
Nat Genet volume 29 no. 4 pp 412 - 417 (2001) |
| Abstract: |
The identification of promoters and first exons has been one of the most
difficult problems in gene-finding. We present a set of discriminant
functions that can recognize structural and compositional features such
as CpG islands, promoter regions and first splice-donor sites. We explain
the implementation of the discriminant functions into a decision tree that
constitutes a new program called FirstEF. By using different models to
predict CpG-related and non-CpG-related first exons, we showed by
cross-validation that the program could predict 86% of the first exons
with 17% false positives. We also demonstrated the prediction accuracy
of FirstEF at the genome level by applying it to the finished sequences of
human chromosomes 21 and 22 as well as by comparing the predictions
with the locations of the experimentally verified first exons. Finally, we
present the analysis of the predicted first exons for all of the 24
chromosomes of the human genome. |
| Comment: |
A nice machine learning approach that appears to be quite
a bit more sensitive than existing methods. |
|
| Categories: |
Functional genomics - Databases |
2001-09-14
|
| Title: |
Machine Learning for Science: State of the Art and Future Prospects |
| Authors: |
Eric Mjolsness, Dennis DeCoste |
| Ref: |
Science Volume 293, Number 5537, Issue of 14 Sep 2001, pp. 2051-2055. |
| Abstract: |
Recent advances in machine learning methods, along with successful applications across a wide variety of fields such as
planetary science and bioinformatics, promise powerful new tools for practicing scientists. This viewpoint highlights some
useful characteristics of modern machine learning methods and their relevance to scientific applications. We conclude
with some speculations on near-term progress and promising directions.
|
|
|