WO2007139777A2 - Methods for the diagnosis and prognosis of alzheimer's disease using csf protein profiling - Google Patents

Info

Publication number
WO2007139777A2
Authority
WO
WIPO (PCT)
Application number
PCT/US2007/012155
Other languages
French (fr)
Other versions
WO2007139777A3 (en
Inventor
Ronald C. Hendrickson
Jeffrey L. Seeburger
Matthew Wiener
Nathan A. Yates
Qinghua Song
Andy Liaw
Original Assignee
Merck & Co., Inc.
Application filed by Merck & Co., Inc.

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10 Signal processing, e.g. from mass spectrometry [MS] or from PCR

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Genetics & Genomics (AREA)
  • Analytical Chemistry (AREA)
  • Signal Processing (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Chemical & Material Sciences (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The present invention provides a novel and sensitive means of monitoring Alzheimer’s disease. The method consists of the construction of a statistically relevant classifier comprising a plurality of features, through the use of differential mass spectrometry of individual biomarkers to more accurately and objectively assess the status of an individual for the purposes of disease classification and predicting cognitive endpoints.

Description

TITLE OF THE INVENTION
METHODS FOR THE DIAGNOSIS AND PROGNOSIS OF ALZHEIMER'S DISEASE
USING CSF PROTEIN PROFILING
FIELD OF THE INVENTION
The present invention relates generally to the diagnosis and prognosis in the field of Alzheimer's disease. More specifically, it relates to biomarkers that can be used to classify Alzheimer's disease or to determine the efficacy of drugs given to treat Alzheimer's disease.
BACKGROUND OF THE INVENTION
Alzheimer's disease (AD) is a major neurodegenerative disease of unknown etiology that is characterized by the selective degeneration of basal forebrain cholinergic neurons. The degeneration of these cells leads to a secondary loss of neurons in the limbic system and cortex that control learning and memory. The consequent symptoms of the disease include a progressive loss of memory, the loss of the ability to communicate and the loss of other cognitive functions which occur over a course of approximately eight years. At some stage in this cognitive decline, patients often become bedridden and completely unable to care for themselves. Although several symptomatic therapies have been approved to provide some compensation for the cholinergic deficit, for example, Aricept (donepezil), the clinical effects of these are modest and none are able to significantly alter the course of the disease. Improving upon strategies for the treatment of AD has become a focus for the medical and scientific communities due to increases in the average age of the world population, the consequent increase in incidence and prevalence of age-related disorders such as AD, and the severe socioeconomic impact associated with supporting such cognitively impaired patients over the long term. Requisite to improving the treatment of AD is improving the ability of clinicians to accurately diagnose the disease early in its course and to accurately monitor the progression of the disease. Currently, a diagnosis of possible or probable AD is typically made based on clinical symptoms. A definitive diagnosis of AD can only be made post-mortem and requires a pathological examination of the affected brain tissue. The key pathological hallmarks of the disease are plaques consisting of deposited amyloid beta (Aβ) protein and tangles consisting of degenerated neuronal cells and their cytoskeletal elements (neurofibrillary tangles).
There are currently no tests that, in and of themselves, have been validated to identify AD and differentiate it from other diseases affecting cognition. Compared to the pathological diagnosis, the pre-mortem clinical diagnosis can achieve an accuracy of approximately 80% to 90% at the very best of centers. However, this level of diagnostic accuracy more commonly occurs at well-experienced AD centers and for patients who have been manifesting clinical symptoms for several years (Rasmusson, D. X., et al., Alzheimer Dis. Assoc. Disord., 10(4): 180-188 (1996); Frank, R.A. et al., Proceedings of the Biological Markers Working Group: NIA Initiative on Neuroimaging in Alzheimer's Disease, Neurobiol. Aging, 24: 521-536 (2003)). Following the clinical diagnosis, the progression of the disease is typically monitored through cognitive testing and assessment of everyday function. The course is often variable across patients and may be influenced by both organic and environmental elements.
The last decade has seen an increase in efforts to identify and validate AD-related biomarkers that might increase the sensitivity and specificity of diagnosis and provide a convenient and objective measure of disease progression (Regan Research Institute and NIA Consensus Report of the Working Group on: 'Molecular and Biochemical Markers of Alzheimer's Disease,' Neurobiol. Aging, 19(2): 109-116 (1998); Frank et al., 2003). Among the techniques that currently hold promise in this regard is the biochemical analysis of cerebrospinal fluid (CSF). The value of CSF analysis is based on the fact that the composition of this fluid may reflect brain biochemistry due to its direct contact with brain tissue.
The CSF proteins that have received the most attention are those thought to reflect key features of the disease pathogenesis, including Aβ deposition and neuronal degeneration. Studies have demonstrated reduced levels of the Aβ42 peptide in the CSF of clinically diagnosed AD patients compared to controls (Andreasen, N., et al., Arch. Neurol., 58: 373-379 (2001); NIA Consensus Report, 1998; Frank et al., 2003; Andreasen, N., et al., Clin. Neurol. Neurosurg., 107: 165-173 (2005)). Aβ42 is a cleavage product of the amyloid precursor protein (APP) and is thought to be a major constituent of the senile plaque. One theory of disease progression is that reduced CSF levels in AD patients may be due to increased deposition of the peptide in the brain. In contrast, many studies have shown that the expression of the Aβ40 peptide, another APP cleavage product that is also a plaque component, may be similar in clinically diagnosed AD and control CSF (Frank et al., 2003).
Biomarkers can be used for multiple purposes including as diagnostic markers to classify and identify patients, as drug response markers to confirm target engagement, as disease predictive markers to predict who is likely to develop the disease and as disease progression markers to reflect the progression of the pathophysiology. A recent review article describes not only the status of biochemical biomarkers but also imaging biomarkers and their use in longitudinal clinical trials (Thal, L. J., et al., Alzheimer Dis. Assoc. Disord., 20(1): 6-15 (2006)).
Mass spectrometry is capable of detecting large numbers of analytes in complex mixtures. A wide range of different analytes can be detected including those of environmental and biological importance. Peptides are an example of biologically important analytes. Peptides, and peptides derived from proteins, interact in complex ways to regulate cellular functions. Small changes in the abundance of particular proteins or their modifications can significantly alter the functioning of a cell, can impact on the overall health of an animal and can provide an indication as to the health of a cell or animal. Proteomic studies measuring peptide expression are increasingly making use of mass spectrometry (Smith, Richard D., Trends in Biotechnology, Vol. 20, No. 12 (Suppl.), S3-S7 (2002)).
SUMMARY OF THE INVENTION
The present invention relates to compositions and methods for classifying disease states in Alzheimer's disease using mass spectrometry data analysis techniques that can be employed to selectively identify analytes differing in abundance between different sample sets. The sample sets comprise CSF samples from ante-mortem confirmed cases of Alzheimer's disease (AD) and normal or non-AD controls used to build a classifier that can be used to analyze additional unknown samples.
In one embodiment the invention comprises a method for classifying disease states in Alzheimer's disease ("AD") comprising a) selecting a statistically relevant set of mass spectrometric features from human ante-mortem and healthy control fluid samples in which a plurality of features are differentially expressed to form a reference AD and control panel; b) conducting a linear discriminant analysis on the mass spectrometric feature data from step (a); c) obtaining a test fluid sample from a patient; d) gathering mass spectrometric data on the test sample including the features in the set of step (a); e) applying the results of step (d) to the linear discriminant analysis of step (b) to obtain an output; and f) determining from the output of step (e) the classification of the disease state, where the output is either AD or control. In another aspect of this embodiment, the set of mass spectrometric features is a plurality of features selected from the group of features listed in Table 4. In a more preferred aspect, the set of features comprises features selected from the group consisting of group 2 and group 51 of Table 4.
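By way of illustration only, steps (b), (e) and (f) of this embodiment can be sketched in Python as a two-class linear discriminant on two features. This is a minimal sketch and not the Applicants' implementation: the function names `fit_lda` and `classify`, the feature values, and the assumptions of equal class priors and a pooled covariance matrix are all hypothetical.

```python
from statistics import mean

def fit_lda(ad, control):
    """Fit a two-feature, two-class linear discriminant.

    ad, control: lists of (x, y) feature pairs (e.g. group ion intensities).
    Returns (w, threshold); a sample scores as "AD" when w . sample > threshold.
    Assumes equal class priors and a shared (pooled) covariance matrix.
    """
    def centroid(rows):
        return (mean(r[0] for r in rows), mean(r[1] for r in rows))

    m_ad, m_ct = centroid(ad), centroid(control)

    # Pooled within-class covariance (2 x 2, symmetric).
    sxx = sxy = syy = 0.0
    for rows, m in ((ad, m_ad), (control, m_ct)):
        for x, y in rows:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    dof = len(ad) + len(control) - 2
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof

    # w = inverse(pooled covariance) applied to (mean_AD - mean_control).
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("pooled covariance is singular")
    d0, d1 = m_ad[0] - m_ct[0], m_ad[1] - m_ct[1]
    w = ((syy * d0 - sxy * d1) / det, (-sxy * d0 + sxx * d1) / det)

    # With equal priors the boundary passes through the centroid midpoint.
    mid = ((m_ad[0] + m_ct[0]) / 2, (m_ad[1] + m_ct[1]) / 2)
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold

def classify(sample, w, threshold):
    """Apply the fitted discriminant to one test sample."""
    score = w[0] * sample[0] + w[1] * sample[1]
    return "AD" if score > threshold else "control"

# Hypothetical reference panel: two feature-group intensities per sample.
ad_panel = [(1.0, 2.1), (1.2, 1.8), (0.8, 2.3), (1.1, 2.0)]
control_panel = [(2.0, 3.1), (2.3, 2.9), (1.9, 3.4), (2.2, 3.0)]
w, threshold = fit_lda(ad_panel, control_panel)
print(classify((1.0, 2.0), w, threshold))
print(classify((2.1, 3.2), w, threshold))
```

In this sketch the reference panel of step (a) is reduced to two features per sample, in the manner of the two feature groups named in the preferred aspect; a larger feature set would require a general matrix inverse.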
Another embodiment of the invention comprises a method for predicting cognition scores for Alzheimer's disease ("AD") patients comprising a) selecting a statistically relevant set of mass spectrometric features from human ante-mortem and healthy control fluid samples in which a plurality of features are differentially expressed to form a reference AD and control panel; b) conducting a random forest analysis on the multi-analyte data from step (a); c) obtaining a test fluid sample from a patient; d) gathering mass spectrometric data on the test sample including the features in the set of step (a); e) applying the results of step (d) to the random forest analysis of step (b) to obtain an output; and f) determining from the output of step (e), where the output is the assignment of the cognition score. In another aspect of this embodiment, the set of mass spectrometric features is a plurality of features selected from the group of features listed in Table 4. In a more preferred aspect, the set of features comprises biomarkers selected from the group consisting of the summed features from group 2, group 3, group 7, group 18, group 19, group 24, group 28, group 36, group 37, group 42, group 51 and group 52.
In still another embodiment the invention comprises disease status biomarkers selected from the group consisting of SME1 (group 2) and SME2 (group 51).
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a graphical depiction of the validation of FTMS and dMS platform using human CSF. The coefficient of variation (CV) was determined for 7,478 molecular ion features binned at 10 unit intervals from five technical replicate experiments. The average and median CV were 36.3% and 31.5%, respectively.
Figure 2 shows the estimate of false positive rate and robustness of false positive estimate by receiver-operator-curve (ROC curve) analysis.
Figure 3 shows a subset of differentially expressed peptides identified by proteomic profiling of the OPTIMA pilot human CSF samples. The first column of each pair represents the control, while the second column represents the AD sample.
Figures 4A and 4B show a linear discriminant classification of AD versus control samples (done on two samples each) based on two groups of MS features ("+" : control; "•" : AD). Figure 4A shows the separation of two MS features; Figure 4B shows the separation of Aβ40 and Aβ42 done by ELISA.
Figures 5A and 5B show the cross-validated error rates versus the number of features for random forest classifiers. Figure 5A shows the cross-validated error rates versus number of features for random forest classifiers built from the immunodepletion data from the Blennow cross-sectional cohort. Figure 5B shows the cross-validated error rates versus number of features for random forest classifiers built from the combined immunodepletion and ultrafiltration data from the Blennow cross-sectional cohort.
Figures 6A and 6B show the estimated trends over time for two Subject Matter Expert ("SME") features in the longitudinal data: SME1 (Fig. 6A) and SME2 (Fig. 6B). In each figure, the left panel shows the AD group as compared to the control group in the right panel. The lines represent the "population average" trends estimated from linear mixed effects models.
DETAILED DESCRIPTION OF THE INVENTION
Definitions
As used herein, the term "classifier" refers to a computational method that takes as input features and yields as output a state marker. Such a state marker may be binary (for example, "Alzheimer's disease" or "control"), have multiple discrete states (for example, the integers from 1 to 100 inclusive), or be drawn from a continuous range (for example, any number between 0 and 1).
As used herein, the term "feature" refers to the mass spectrometric signal characterized by a particular mass-to-charge ratio (m/z) or a range of such ratios and a range of retention times from a liquid chromatography column. Features may be grouped together based on evidence that they arise from a single analyte (as multiple isotopes and/or charge states). The ion intensity of a particular feature in a particular sample can be measured by a mass spectrometer. The ion intensity of a group is the sum of ion intensities for the features making up the group.
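For illustration, the definition of group ion intensity as the sum of the ion intensities of the member features can be written as the following minimal Python sketch. The function name `group_intensities`, the tuple layout and all numerical values are hypothetical and are not taken from the data described herein.

```python
from collections import defaultdict

def group_intensities(features):
    """Sum the ion intensities of features assigned to the same group.

    features: iterable of (group_id, mz, retention_time_min, ion_intensity).
    Returns {group_id: summed ion intensity}.
    """
    totals = defaultdict(float)
    for group_id, _mz, _rt, intensity in features:
        totals[group_id] += intensity
    return dict(totals)

# Hypothetical features: two isotope/charge-state signals of one analyte
# (group 2) and a single signal of another analyte (group 51).
sample_features = [
    (2, 650.32, 41.2, 1.5e6),
    (2, 650.82, 41.2, 0.9e6),
    (51, 433.90, 28.7, 2.0e5),
]
print(group_intensities(sample_features))
```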
As used herein, "features" or "groups of features" may be determined from proteins, peptides, metabolites, lipids, or other analytes in a sample being analyzed. "Features" or "groups of features" are used to define the panel of analytes that distinguish, on a statistical basis, a sample from an AD patient from that of a non-AD, control sample or to measure disease progression.
As used herein, the term "feature extraction algorithm" is a computational tool that measures the AUC of "features" present in one or more mass spectrometric files. Two examples of feature extraction algorithms used in this work include dMS (U.S. Patent 6,906,320) and PeakTeller (PCT/US06/044166 and U.S.S.N. 11/599,185 filed on November 13, 2006).
As used herein, the term "attribute" refers to a biological, chemical or physical property of a feature that is quantifiable, such as mass to charge ratio (m/z), charge state, elution time, or ion intensity.
As used herein, the term "immunodepletion" refers to a biochemical process used to isolate a sub-population of peptides, proteins, or metabolites from a biological fluid. An example of a suitable immunodepletion method for use herein is described in Example 1.
As used herein, the term "ultrafiltration" refers to a biochemical process used to isolate a sub-population of peptides, proteins or metabolites from a biological fluid. Ultrafiltration was performed by Digilab-BioVisioN, Hannover, Germany under a master fee-for-service agreement as described in Example 6.
As used herein, the term "LC-MS" refers to the tandem use of liquid chromatography ("LC") and mass spectrometry ("MS") for detecting and quantifying one or more peptides or molecules from a biological sample for potential use as a biomarker.
As used herein, the term "profiling", "LC-MS profiling" or "protein profiling" refers to the measurement of analyte abundance in one or more samples.
As used herein, the term "AUC values" refers to area under the curve or other measure of the abundance of a feature.
As used herein, the term "sensitivity" refers to the ability of an individual marker or a composite of markers to correctly identify patients with the disease, i.e. Alzheimer's disease, which is the probability that the test is positive for a patient with the disease. The current clinical criterion on patients having a probability of AD is about 85% sensitive when compared to autopsy confirmed cases.
As used herein, the term "specificity" refers to the ability of an individual marker or a composite of markers to correctly identify patients that do not have the disease, that is, the probability that the test is negative for a patient without disease. The current clinical criterion is that the test should be 75% specific.
As used herein, the term "accuracy" refers to the overall ability of an individual marker or a composite of markers to correctly identify those patients with the disease and those without the disease.
As used herein, the term "estimated effect of AD" refers to the estimated percentage change in a feature per year in the disease population. The current standard for dementia is a decrease of about 6% per year.
As used herein, the term "CERAD" refers to the Consortium to Establish a Registry for Alzheimer's Disease used in the neuropathological community.
As used herein, the term "CSF" refers to cerebrospinal fluid.
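The definitions of sensitivity, specificity and accuracy given above correspond to the standard confusion-matrix ratios, which can be sketched in Python as follows. The function name `diagnostic_metrics` and the example labels are hypothetical; the example counts are invented for illustration and are not results of the study.

```python
def diagnostic_metrics(truth, predicted, positive="AD"):
    """Return sensitivity, specificity and accuracy for paired label lists."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased case")
    return {
        # Probability that the test is positive for a patient with the disease.
        "sensitivity": tp / (tp + fn),
        # Probability that the test is negative for a patient without it.
        "specificity": tn / (tn + fp),
        # Overall fraction of patients classified correctly.
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Hypothetical example: 10 AD and 10 control samples.
truth = ["AD"] * 10 + ["control"] * 10
predicted = ["AD"] * 9 + ["control"] * 1 + ["control"] * 8 + ["AD"] * 2
print(diagnostic_metrics(truth, predicted))
```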
The invention described herein relates to the use of proteomics to profile human CSF and to identify human CSF markers for Alzheimer's disease (AD). Applicants have obtained access to a well annotated longitudinal cohort of human CSF samples, OPTIMA (Oxford Project to Investigate Memory and Ageing). In the study Applicants used archived OPTIMA CSF samples from clinically diagnosed AD patients and age-matched non-demented controls.
The object of the study herein was to determine if unbiased proteomic profiling could identify protein markers that discriminate the samples from the ante-mortem diagnosed AD patients versus those from non-AD/non-demented (control) patients. After biochemical processing and proteomic profiling of the CSF samples on a Fourier transform mass spectrometer (Finnigan LTQ-FT™, Thermo Electron Corporation, Waltham, MA) and differential mass spectrometry (dMS) platform, supervised differential mass spectrometry analysis (dMS) was performed to reveal approximately 250 distinct peptide signals that exhibit a statistically significant difference in abundance between AD and controls. Fifteen (15) peptide signals are increased in AD and 235 peptide signals are decreased in AD as compared to the control group. A selected set of the 250 peptides was targeted for protein identification. Proteins identified include prostaglandin-H2 D-isomerase, amyloid-like protein 1 precursor and ApoE. A classifier using twelve proteomics markers was created that distinguishes between the AD and control samples with high accuracy in cross validation tests. To date, nine of the twelve peptides in this classifier have been identified by tandem mass spectrometric analysis, and these are derived from four proteins, including neurosecretory protein VGF (5 peptides), secretogranin-1 (2 peptides), neuronal pentraxin receptor (1 peptide), and cadherin-13 (1 peptide).
Use of proteomics to determine biomarkers
The use of proteomics, that is the systematic analysis of all of the proteins in a tissue or cell, to find biomarkers is based upon an assumption that protein markers of disease exist in plasma or cerebrospinal fluid ("CSF"). There are numerous difficulties associated with using proteomics in plasma and CSF including, but not limited to, the need for a highly sensitive method for protein detection, the high complexity of proteins in plasma, the dilution of proteins in extracellular fluids, that proteins may be present in differing concentrations and the difficulty with sample processing, such as protease activity, post translational modifications and the like.
There are numerous approaches to conducting proteomics using mass spectrometry analysis of complex peptide mixtures. In one approach, commonly known as shotgun proteomics, the digested samples are extensively fractionated and analyzed using a tandem mass spectrometry method (MS/MS), which results in an unselected cataloging of molecules. Other more targeted or differential methods focus on the identification and quantitation of a pre-selected set of proteins or focus on molecules which significantly differ between conditions in the broad analysis of proteome (pattern identification, definitive identification, quantitation of molecules).
A pattern analysis approach typically uses two dimensions to describe a peptide ion, such as chromatographic elution time and mass. The advantage of such an approach is that all the features that are detected by mass spectrometry can be quantified. Conversely, the shotgun approach only quantifies identified peptides. To have any approach be practical it is essential that there be reproducible sample preparation methods, such as for sample collection, fractionation, and digestion, and reproducible liquid chromatography-mass spectrometry (LC-MS) analysis, which requires the use of interwoven or random analysis order, sample injection, chromatographic separation and a mass spectrometer response. For review, see, Domon and Aebersold, Science, 312: 212-217, 2006.
The differential mass spectrometry (dMS) methods used herein have been previously described, Wiener et al., Anal. Chem. 76 (20): 6085-6095, 2004 and U.S. Pat. No. 6,906,320. The key aspects of the dMS methodology that make it adaptable for use in defining biomarkers are that it maintains individual sample integrity (no need to mix or pool samples), it supports versatile experimental designs (time, dose, compound, treatment), it is sensitive and allows specific detection of protein differences, prior protein identification is not required and it is compatible with high and low resolution LC-MS data. dMS evaluates the mean and variance for each (m/z, retention time) point across all samples in two or more treatment groups. It 1) uses a statistical test to determine the significance of the intensity differences between treatment groups, 2) keeps the significant points and looks for a persistence in time, and 3) generates a ranked list of (m/z, time) pairs that are statistically different across the two groups.
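The three dMS steps enumerated above can be illustrated by the following simplified Python sketch. It is not the algorithm of Wiener et al. or of U.S. Pat. No. 6,906,320: the choice of Welch's t statistic, the fixed cutoff `t_cutoff`, the persistence length `min_run`, the function names and all intensity values are assumptions made only for illustration, since the text above does not specify the statistical test.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two lists of intensities (assumed test)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def dms_sketch(group_a, group_b, t_cutoff=4.0, min_run=3):
    """group_a, group_b: lists of samples; each sample maps an m/z value to a
    list of intensities indexed by retention-time bin (same bins everywhere).
    Returns (mz, first_bin, peak |t|) tuples ranked by peak |t|.
    """
    hits = []
    for mz in group_a[0]:
        n_bins = len(group_a[0][mz])
        # Step 1: test each (m/z, time) point across the samples of each group.
        t_values = [
            welch_t([s[mz][k] for s in group_a], [s[mz][k] for s in group_b])
            for k in range(n_bins)
        ]
        # Step 2: keep significant points that persist over consecutive bins.
        run_start = None
        for k in range(n_bins + 1):
            significant = k < n_bins and abs(t_values[k]) >= t_cutoff
            if significant and run_start is None:
                run_start = k
            elif not significant and run_start is not None:
                if k - run_start >= min_run:
                    peak = max(abs(t) for t in t_values[run_start:k])
                    hits.append((mz, run_start, peak))
                run_start = None
    # Step 3: rank the surviving (m/z, time) regions.
    return sorted(hits, key=lambda h: h[2], reverse=True)

# Hypothetical data: three samples per group, two m/z values, six time bins.
group_a = [
    {500.25: [10, 11, 10, 12, 11, 10], 612.80: [20, 21, 19, 20, 22, 20]},
    {500.25: [11, 10, 12, 11, 10, 11], 612.80: [21, 20, 20, 21, 20, 19]},
    {500.25: [9, 12, 11, 10, 12, 9], 612.80: [19, 22, 21, 19, 21, 21]},
]
group_b = [
    {500.25: [10, 11, 50, 52, 51, 10], 612.80: [20, 20, 20, 21, 21, 20]},
    {500.25: [11, 12, 52, 51, 50, 11], 612.80: [21, 21, 19, 20, 20, 21]},
    {500.25: [10, 10, 51, 50, 52, 9], 612.80: [19, 21, 21, 20, 22, 19]},
]
ranked = dms_sketch(group_a, group_b)
print(ranked)
```

In this hypothetical data set only m/z 500.25 differs between the groups, and only in time bins 2 through 4, so a single region is reported; the persistence requirement is what suppresses isolated single-bin differences.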
Differential mass spectrometry methods have been used to identify putative markers of Alzheimer's disease. Damian et al., WO 2006/099377, describes methods for screening individuals at risk of having or developing Alzheimer's disease using markers identified from urine using LC-MS analysis. Fourteen markers are described having characteristic liquid chromatography retention times and mass-to-charge ratios. These markers are endogenous metabolites extracted from urine samples from AD, MCI, and control patients. Selle et al., Combinatorial Chemistry & High Throughput Screening, 8: 801-806, 2005, describe methods to identify novel biomarker candidates by differential peptidomics. In their study, using MALDI mass spectrometry, two candidate proteins, VGF and complement C3, were identified from CSF samples taken from purported AD and non-dementia subjects. In neither Damian nor Selle was the diagnosis of AD confirmed by ante-mortem tissue analysis.
Sample selection
Recently Applicants established a collaboration with OPTIMA, the Oxford Project to Investigate Memory and Ageing, to obtain access to a well annotated set of clinical CSF samples. OPTIMA is a longitudinal cohort of human volunteers who participated in annual assessment of their memory and cognitive status and who also underwent a series of neuropsychological, radiological (CT and SPECT scans) and various biochemical tests on their blood at regular intervals. CSF samples were obtained from a subset of patients who consented specifically for this procedure. After death, an autopsy was performed and the brains were examined by a neuropathologist for clinical diagnosis. To date, the autopsy rate is 94%. Through this collaboration Applicants have obtained access to a limited volume of cerebrospinal fluid (1-3 ml) collected at each assessment with associated clinical and radiological data in approximately 100 AD cases and a similar number of controls.
CSF specimens from twenty subjects from the OPTIMA cohort, ten with clinically diagnosed AD according to the CERAD neurological criteria and ten age-matched controls, were chosen for proteomics analysis (see Table 1). Pathological data was available for all AD patients and control subjects. In the AD group, all ten subjects were considered to have CERAD definite AD by neuropathology. Although these samples were selected based on clinical and pathological data, Applicants chose to perform the experiments on less valuable CSF samples, namely those for which serial samples were not available and for which the volume of CSF was greater than 3 mL. These restriction criteria led to a gender bias between the control and AD groups, but because gender plays a minimal role in the incidence and progression of AD this was not considered significant. The demographic characteristics of the study population, at the time of CSF collection, are listed in Table 1.
Table 1
                              Control     AD
                              N = 10      N = 10
Age (Years)
  Mean ± SD                   63 ± 15     75 ± 8
  Range                       36-80       57-90
Gender
  Male                        8           2
  Female                      2           8
Family History
  Family History              2           6
  No Family History           3           3
  No Response                 5           1
Cognitive status (MMSE)
  Mean ± SD                   30 ± 0      3 ± 3
  Range                       30-30       0-9
Applicants set out to validate a proteomics profiling capability with human CSF samples. Given that the most abundant proteins in CSF (albumin, transferrin, and haptoglobin) are also highly abundant in serum, Applicants' approach was to remove these highly abundant proteins from the CSF samples to enrich for "brain-specific" proteins while reducing the overall complexity of the CSF proteome. To accomplish this goal Applicants chose to use an immunodepletion spin column (Agilent Technologies, Palo Alto, CA). These columns consist of affinity purified polyclonal antibodies that deplete the six most abundant proteins in serum (albumin, transferrin, haptoglobin, α-1 antitrypsin, IgA, and IgG), thus facilitating the rapid and simultaneous removal of about 83% of all serum protein components present in CSF prior to proteomics analysis.
Proteomic analysis
To investigate the overall reproducibility of the platform, Applicants analyzed the coefficient of variation (CV) of 7,478 molecular features identified and quantitated in five technical replicates of 400 μL immuno-depleted human CSF. The corresponding distribution of CVs is shown in Figure 1. The median and average CV were 31.5% and 36.3%, respectively, indicating good reproducibility. Applicants then continued to investigate the limits of detection and reproducibility of a single analyte that is processed through the entire immuno-depletion process. For this study a spike-recovery experiment was performed using horse myoglobin as it has no identity with the proteins of human CSF. Horse myoglobin was spiked into human CSF at a concentration of 3 μg/ml. All samples were processed as described in the Examples that follow. A dMS analysis between spiked (3 μg/ml) and non-spiked CSF yielded 54 differences at p=0.05. The false positive rate was estimated by performing a differential analysis between self versus self (unspiked CSF versus unspiked CSF) and yielded only 6 feature differences. The limits of detection for this experiment have been estimated at 1 μg/ml based on this study and previous studies on Rhesus CSF by Applicants (data not shown), although a precise titration with horse myoglobin has not as yet been carried out.
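As an illustration of the reproducibility measure described above, the following Python sketch computes a per-feature CV across technical replicates and summarizes the distribution by its average and median. The feature names and intensity values are hypothetical; only the counts 54 and 6 in the final lines are taken from the spike-recovery experiment, and dividing one by the other is offered as one simple reading of the self-versus-self control rather than as the Applicants' stated calculation.

```python
from statistics import mean, median, stdev

def coefficient_of_variation(replicates):
    """CV (%) of one feature's intensity across technical replicates."""
    return 100.0 * stdev(replicates) / mean(replicates)

# Hypothetical intensities for three features in five technical replicates.
feature_intensities = {
    "feature_1": [100.0, 120.0, 90.0, 110.0, 80.0],
    "feature_2": [5000.0, 5200.0, 4900.0, 5100.0, 4800.0],
    "feature_3": [40.0, 70.0, 30.0, 60.0, 50.0],
}
cvs = [coefficient_of_variation(v) for v in feature_intensities.values()]
print("average CV (%):", mean(cvs))
print("median CV (%):", median(cvs))

# Differences found in spiked versus unspiked CSF, and in the
# self-versus-self control, as reported in the text.
differences_found = 54
self_vs_self_differences = 6
print("estimated false fraction:", self_vs_self_differences / differences_found)
```

Under this reading, roughly one in nine of the 54 reported differences would be expected to arise by chance.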
Although the OPTIMA samples are archived samples and hence there was no opportunity to modify the collection protocol, Applicants set out to determine the amount of plasma contamination and red blood cell contamination due to hemolysis by performing ELISA measurement of alpha-2 macroglobulin, IgM, and hemoglobin. This information allowed Applicants to perform a retrospective analysis to look for samples that may be outliers and to look for any systematic bias that may be present in the archived samples due to uncontrolled pre-analytical variables. Alpha-2 macroglobulin and IgM are major proteins known to be present in plasma that are not removed by use of an immunodepletion column, and high levels of each were evidence of plasma contamination. Hemoglobin is the major protein in red blood cells, and high levels of hemoglobin were indicative of hemolysis that occurred prior to removal of any unwanted cells. A 150 μl aliquot of archived CSF from each sample was thawed on ice and the three commercially available ELISA assays were performed. Each assay was performed in duplicate with 15 μl for total protein measurement (BCA), 50 μl for IgM, 25 μl for hemoglobin, 25 μl for alpha-2 macroglobulin. Results of these assays are shown in Table 2. Hemoglobin data are expressed as the mean value of absorbance at 450 nm (OD450) and samples with an "*" exceed 10 RBC/μl.
Table 2
Sample#       Hemoglobin       A2M        IgM        Total protein
              (OD450 value)    (μg/ml)    (ng/ml)    (μg/ml)
OPTIMA 1      0.48             3.73       257.48     1023.10
OPTIMA 2      3.67*            2.60       91.03      697.71
OPTIMA 3      0.19             2.83       142.60     881.48
OPTIMA 4      0.00             2.98       53.31      640.39
OPTIMA 5      3.91*            3.11       195.49     926.25
OPTIMA 6      3.91*            3.88       231.25     1074.16
OPTIMA 7      0.06             2.22       108.92     767.08
OPTIMA 8      0.00             4.11       195.02     915.44
OPTIMA 9      3.76*            3.96       751.72     828.58
OPTIMA 10     0.03             2.59       226.49     659.95
OPTIMA 11     0.35             2.72       46.88      795.23
OPTIMA 12     0.00             2.83       278.98     937.23
OPTIMA 13     1.47*            3.24       169.64     776.81
OPTIMA 14     3.91*            5.06       1725.22    879.74
OPTIMA 15     0.29             2.55       117.57     936.88
OPTIMA 16     3.12*            2.08       222.24     437.24
OPTIMA 17     1.16*            4.13       169.13     1033.94
OPTIMA 18     2.32*            3.53       168.98     1042.34
OPTIMA 19     0.03             2.14       133.27     859.20
OPTIMA 20     0.86             3.97       307.53     842.15
In the present invention, to estimate the number of false positives that might be included in the detected differences, Applicants randomly divided the data into two groups of 10, with 5 control and 5 disease samples in each group, and performed a similar calculation. For a given confidence threshold (similar to a p-value), Applicants determined the number of differences in the original comparison and the estimated number of false positive results (from the second comparison) at that confidence threshold. Plotting the number of differences found in the original analysis against the estimated number of false positives for different values of the confidence threshold traces out an ROC curve. Perfect experiments on identical samples would never yield any false positives, yielding an ROC curve going straight up the y axis. Real experiments on real samples (even those believed to be substantially identical) will yield some differences; the quality of results can be judged, in part, by how nearly vertical the resulting ROC curves are. To examine the robustness of the error measurement, three random splits of the data were created, giving 3 ROC curves (Figure 2), each shown using a different symbol (triangle, plus sign, and circle). The fact that all three resulting ROC curves are nearly vertical shows that the dMS analysis found many results that are readily distinguishable from possible false positives. See, Zweig and Campbell, "ROC plots: a fundamental evaluation tool in clinical medicine," Clin. Chem., 39 (4): 561-577, 1993.
After biochemical processing and proteomics profiling on an LTQ-FTMS and dMS platform, dMS analysis revealed approximately 250 peptides that were quantitatively different between the AD and control samples: 15 peptides that were increased in the AD samples and 235 peptides that were decreased in the AD samples as compared to the control group. A selected subset of these 250 peptides was targeted for protein identification.
Targeted analysis was compared with the original dMS data to align and link the targeted MS/MS spectra with the dMS feature data. After determining the proper alignment, the appropriate MS/MS scans were exported to text files, corrected for mass accuracy, and submitted for Sequest protein database searching (internal database), using an approach that correlates tandem mass spectral data of peptides with known amino acid sequences in a protein database. See, Eng et al., J. Am. Soc. Mass Spectrom., 5:976-989, 1994. All Sequest results were verified manually. A total of thirteen proteins have been identified to date, including amyloid-like protein 1 and Apo E. A subset of differentially expressed peptides identified by proteomic profiling of the OPTIMA human CSF samples is shown in Figure 3.
It is contemplated that various alternative methods or protocols can be used to augment the detection of features that are useful for characterizing Alzheimer's disease as described herein. Methods that allow for the reproducible collection and processing of biological specimens may allow for the detection of features from various body fluids or tissue samples. Those skilled in the art would recognize that the various methods for processing these samples for use in the methods described herein can produce a range of analytes and determine the values of the m/z and retention time for these analytes. For example, the use of protein and peptide isolation steps can shift the m/z ratios and retention time of the observed feature. In addition, the use of alternative digestive enzymes and/or chemical reactions that modify the chemical composition of the analytes can result in different m/z and retention time values. Chemical separations and methods of chromatographic analysis can alter the m/z and retention time values of an analyte. Various methods of ionization, detection and mass analysis also impact the absolute value of a feature's m/z and retention time. Examples include matrix-assisted laser desorption ionization, electrospray ionization, desorption electrospray ionization, atmospheric pressure chemical ionization, electron impact ionization, chemical ionization, tandem mass spectrometric analysis, time-of-flight mass analysis, ion-mobility analysis, quadrupole mass analysis, and Fourier transform mass analysis. Thus, one skilled in the art would recognize that while variation of the methods utilized to derive the proteomic features that comprise the invention described herein may alter the empirical values obtained, any such features obtained could be used to derive markers that can be used to classify disease states in AD or monitor disease progression as described by the inventive methods herein.
Development of classifier
To further evaluate the ability of the proteomic data to distinguish between AD and control samples, a classifier was built as described in the Examples that follow. Classifiers were based on mass spectrometric features determined to be significantly different between AD and control samples in a single run using all samples. A feature is specified by an m/z and a time range, and for each LC-MS data set, the feature's value is the area under the curve for the feature, that is, the sum of measured intensities for that m/z in that time range. Some signals are believed to arise as multiple charge states and/or isotopes of a single underlying analyte; these signals are said to form a group. For this analysis, the area under the curve was added together for all features in a group to create "group features."
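As an illustrative sketch only, the feature value and "group feature" calculations described above can be expressed in Python as follows. The data points, the feature windows, the m/z tolerance implied by the window bounds, and the group assignment are all hypothetical and are not taken from the dMS software.

```python
from collections import defaultdict

def feature_auc(points, mz_lo, mz_hi, t_start, t_end):
    """Sum the intensities of (m/z, time, intensity) points inside a feature window."""
    return sum(
        intensity
        for mz, t, intensity in points
        if mz_lo <= mz <= mz_hi and t_start <= t <= t_end
    )

def group_features(feature_values, feature_to_group):
    """Add together the AUCs of all features assigned to the same group."""
    totals = defaultdict(float)
    for feature_id, auc in feature_values.items():
        totals[feature_to_group[feature_id]] += auc
    return dict(totals)

# Hypothetical LC-MS data points: (m/z, retention time in minutes, intensity).
points = [
    (721.97, 24.6, 10.0),
    (721.97, 25.0, 30.0),
    (721.97, 26.5, 5.0),   # outside the time window of feature "f1"
    (722.31, 25.0, 12.0),
]

# Hypothetical feature windows: (m/z low, m/z high, start time, end time).
windows = {
    "f1": (721.96, 721.98, 24.55, 26.05),
    "f2": (722.30, 722.32, 24.55, 26.15),
}
feature_values = {fid: feature_auc(points, *w) for fid, w in windows.items()}

# Assumption: both features are isotopes or charge states of one analyte (group 1).
groups = group_features(feature_values, {"f1": 1, "f2": 1})
```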
One classifier, comprising twelve protein markers, distinguished between AD and control samples with high accuracy in a cross-validation test in this study (Table 3). To date, nine of the twelve peptides in the classifier have been identified by tandem mass spectrometric analysis. A total of four proteins have been identified that are associated with the twelve peptides, including neurosecretory protein VGF (5 peptides), secretogranin-1 (2 peptides), neuronal pentraxin receptor (1 peptide), and cadherin-13 (1 peptide).
Table 3.
Confusion matrix for resampled proteomics test data (MS)
True class Predicted AD Predicted Control
AD 57 3
Control 0 60
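For illustration, the summary statistics implied by the confusion matrix of Table 3 can be computed as in the following sketch. The counts are those of Table 3; treating AD as the "positive" class and the variable names are assumptions made only for this example.

```python
# Counts from Table 3: confusion[true_class][predicted_class].
confusion = {
    "AD": {"AD": 57, "Control": 3},
    "Control": {"AD": 0, "Control": 60},
}

total = sum(sum(row.values()) for row in confusion.values())
correct = sum(confusion[label][label] for label in confusion)

accuracy = correct / total                                         # 117 of 120
sensitivity = confusion["AD"]["AD"] / sum(confusion["AD"].values())            # AD called AD
specificity = confusion["Control"]["Control"] / sum(confusion["Control"].values())  # control called control
```

The total of 120 test spectra is consistent with the ten resampling rounds of twelve test spectra each described in Example 5.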
Although mass spectrometry based proteomics does not have the sensitivity of classical ELISA or immunoassay techniques, it offers the advantage of an unbiased approach to identifying and quantifying protein changes without the need for antibody reagents. From the 250 peptides found to quantitatively distinguish AD from non-AD samples, a classifier was developed using the twelve protein markers that distinguished between AD and control samples with high accuracy in a cross-validation test in this study. The proteins identified in these studies include Apo E, amyloid-like protein 1 precursor, and neuroendocrine factors, such as VGF and chromogranin. The fact that these proteins have previously been associated with various aspects of Alzheimer's disease demonstrates that the classifier developed herein is reflective of the pathology of Alzheimer's disease and, as such, can serve as a diagnostic and prognostic tool for the development of therapies to treat AD. Table 4 below is a list of the features that Applicants have found to be statistically relevant to distinguish AD from non-AD samples. These features can be used to construct a classifier as set forth below.
Table 4
Feature ID, Group #, m/z, Charge State, Retention Start Time, Retention End Time, Intensity AUC (AD_CSF), Intensity AUC (Control_CSF)
Figures imgf000016_0001 through imgf000027_0001 (table body reproduced as images)
Construction of a classifier
A classifier uses the values of a set of features from a sample as input and transforms them into a prediction as to which group (or class) the sample belongs to (or how likely it is that the sample belongs to a group). Such a classifier is usually constructed by a "training" step, where a set of samples with known values of the input features and their group membership is given to an algorithm and the algorithm builds a mathematical relationship between the input features and the group membership by "learning" from the data. Algorithms in common use for building classifiers include, but are not limited to, random forests, support vector machines, neural networks, logistic regression, linear discriminant analysis, and the like. One skilled in the art would understand that all of the features in Table 4 could be used as input to construct a classifier. Alternatively, one can perform some form of feature selection, such as through the use of a selection algorithm, and keep only features that are sufficient to give good predictive performance of a classifier. Exemplary feature selection algorithms include principal components (The Elements of Statistical Learning, Hastie T, Tibshirani R, and Friedman J, 2001, Springer), genetic algorithm (Pattern Classification, 2nd Ed., Duda R, Hart P, Stork D, 2001, John Wiley & Sons), and the like. A preferred feature selection procedure, and the one used herein, is the method proposed in Svetnik et al., "Application of Breiman's Random Forest to Modeling Structure-Activity Relationships of Pharmaceutical Molecules," Multiple Classifier Systems, F. Roli, J. Kittler and T. Windeatt (Eds.), pp. 334-343, Springer-Verlag Berlin Heidelberg (2004). Within a 10-fold cross-validation, the feature importance ranking from a random forest classifier was generated, and then a series of random forest classifiers were constructed by successively removing less important features.
The cross-validated error rates for the series of classifiers are then examined and the classifier that gives the lowest error rate and that also contains the fewest number of features is considered a preferred classifier. Applicants herein built the final classifier using this minimally sufficient number of features and all samples.
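The selection rule stated above (lowest cross-validated error rate, with the fewest features among classifiers achieving it) can be sketched as follows. This is an illustration only; the feature counts and error rates are hypothetical, and the sketch does not reproduce the random forest training or the procedure of Svetnik et al.

```python
def pick_preferred(results):
    """Choose from (n_features, cv_error) pairs the lowest error,
    breaking ties in favor of the smaller number of features."""
    return min(results, key=lambda r: (r[1], r[0]))

# Hypothetical cross-validated error rates for a series of classifiers
# built by successively removing less important features.
cv_results = [
    (96, 0.100),
    (48, 0.050),
    (24, 0.025),
    (12, 0.025),
    (6, 0.150),
]

preferred_n, preferred_error = pick_preferred(cv_results)
```

In this hypothetical series the 24-feature and 12-feature classifiers tie on error rate, so the 12-feature classifier is preferred.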
Classification of disease state and disease status
As shown in Example 6, Applicants used a second independent cohort of clinical CSF samples to demonstrate markers of disease state and disease status. The diagnoses for the patients in this cohort were provided by Dr. Kaj Blennow (Sahlgrenska University Hospital, Göteborg, Sweden) using standard methodology and included a follow-up visit approximately two years after the initial visit. Patients were categorized into five diagnostic categories: control (CTL), Alzheimer's (AD), stable mild cognitive impairment (MCI-MCI), mild cognitive impairment converted to Alzheimer's (MCI-AD) and vascular dementia (VaD). To identify markers of disease state and disease status, Applicants processed 400 μl of CSF as described in Example 1 and then performed LC-MS profiling as described in Example 2 to generate the immunodepletion data set. Applicants also processed 1 ml of CSF by ultrafiltration as described in Example 6. From each LC-MS data set Applicants extracted the AUCs for features found to differentiate AD from control CSF via the dMS feature extraction algorithm. In the immunodepletion set, 633 dMS groups were extracted having at least two features among the top 1000 dMS groups. From the ultrafiltration set, 267 groups were extracted having at least two features among the top 1000 dMS groups. From each extracted dMS group, the feature with the minimum mean log likelihood (product of p-values) was chosen as the most representative. Applicants used the selected features (633 for immunodepletion and 267 for ultrafiltration) and built a random forest classifier for the five diagnostic categories: control (CTL), Alzheimer's (AD), stable mild cognitive impairment (MCI-MCI), mild cognitive impairment converted to Alzheimer's (MCI-AD) and vascular dementia (VaD). The random forest classifier generates an importance ranking for the features. The features having importance measures deemed to be above noise are kept and a final classifier is built from them.
The methods described in Svetnik et al., ibid., were used to determine the number of features to retain in the final random forest classifier. From the 633 features for immunodepletion, 158 features were retained. From the 267 features for the ultrafiltration set, 67 features were retained. Combining the two data sets, 225 features were retained out of the 900. A ten-fold cross-validation was used to evaluate the performance of the classifiers for the 5-class, 127-subject data sets. As shown in Table 5A, for immunodepletion, the classifier has prediction accuracies of 93.3% (CTL), 93.3% (AD), 79.2% (MCI-MCI), 76.2% (MCI-AD), and 55% (VaD). For the ultrafiltration data (Table 5B), the classifier has prediction accuracies of 93.3% (CTL), 96.7% (AD), 54.2% (MCI-MCI), 42.9% (MCI-AD), and 60% (VaD). The classifier built on the combined data has prediction accuracies of 100% (CTL), 96.7% (AD), 83.3% (MCI-MCI), 85.7% (MCI-AD), and 60% (VaD). Figures 5A and 5B show the cross-validated error rates versus the number of features for the random forest classifier of the immunodepletion data (Fig. 5A) and the combined immunodepletion and ultrafiltration data (Fig. 5B).
Table 5A shows the cross-validated confusion matrix of a random forest classifier from the cross-sectional Blennow cohort, in which the samples were processed by immunodepletion prior to LC-MS analysis. Each row represents the true diagnosis and each of the first five columns represents the diagnosis predicted by the classifier. The last column shows the prediction accuracy of the classifier for each diagnosis group.
Table 5A
Figure imgf000030_0001
Similarly, as shown in Figure 5B, the classifier has prediction accuracies of 100% for CTL, 96.7% for AD, 83.3% for MCI-MCI, 85.7% for MCI-AD and 60% for VaD. Table 5B shows the cross-validated confusion matrix of a random forest classifier from the cross-sectional Blennow cohort, based on data from the combined immunodepletion and ultrafiltration analysis. Each row represents the true diagnosis, and each of the first five columns represents the diagnosis predicted by the classifier. The last column shows the prediction accuracy of the classifier for each diagnosis group.
Table 5B
Figure imgf000030_0002
Disease Progression Markers
As shown in Example 7, Applicants used the identified features as disease status markers in a longitudinal CSF study to determine whether they were a valid measure of disease progression. The intensities (AUCs) of two of these features (SME1 and SME2) in the longitudinal CSF samples were quantified. A linear mixed effects model was fitted to the log AUC of each feature as described in Example 7. Table 6 shows the result of the mixed model analysis. The results indicate that both features (SME1 and SME2) show a statistically significant decline (-8.8% to -12.3% change per year) in the AD patients (p-values in the fourth column are all smaller than 0.05). The rate of change is statistically significantly different between the AD and control patients (p-values in the fifth column are all smaller than 0.05).
Table 6
Figure imgf000031_0001
Identification of Disease Progression Markers
To identify potential markers for use in monitoring disease progression, Applicants extracted features from the longitudinal CSF samples as described in Example 8. The features from the longitudinal samples derived from dMS underwent the selection process described in Example 8. Those features that met the selection criteria are shown in Table 7. These features show a statistically significant change in magnitude over time (as the disease progresses) for the patients in the AD group, but not for patients in the control group.
Table 7
Feature ID, m/z, Start Time, End Time, Estimated -line effect, Change of AD, P-value, P-value for Group Difference, Q-value, Q-value for Group Difference
323557175 721.97 24.55 26.05 1.46 -16.30% 1.00E-04 2.00E-04 0.0402 0.0819
323557176 722.31 24.55 26.15 1.48 -16.60% 1.00E-04 1.00E-04 0.0402 0.0601
323557178 722.66 24.8 25.8 1.52 -38.50% 0 0 0.0000 0.0000
323557179 1081.95 24.8 26.15 1.46 -20% 1.00E-04 0 0.0402 0.0000
323557180 1081.99 24.7 25.75 1.31 -30% 0 0 0.0000 0.0000
323557181 1082.46 24.55 26.2 1.53 -28.10% 0 2.00E-04 0.0000 0.0819
323557182 1082.49 24 26.3 1.42 -28.90% 0 0 0.0000 0.0000
323557183 1082.96 24.8 25.8 1.77 -38.70% 1.00E-04 1.00E-04 0.0402 0.0601
323557184 1082.98 24.8 26.05 1.38 -44.80% 0 0 0.0000 0.0000
323558021 721.66 24.6 25.95 1.31 -25.70% 0 2.00E-04 0.0000 0.0819
323558627 589.12 19.05 20.95 1.66 -43.80% 0 0 0.0000 0.0000
323558629 589.32 19.05 20.9 1.25 -27.30% 1.00E-04 3.00E-04 0.0402 0.0932
323558632 589.71 19.55 20.85 1.64 -49.20% 1.00E-04 2.00E-04 0.0402 0.0819
323558633 589.91 19.75 20.4 2.19 -57.60% 0 0 0.0000 0.0000
323559338 722.64 24.55 26.05 1.57 -33.70% 0 0 0.0000 0.0000
323559560 1118.52 34 34.6 1.23 -62.90% 0 0 0.0000 0.0000
323560769 640.63 24.65 25.65 3.10 -58.60% 0 1.00E-04 0.0000 0.0601
323561045 977.26 29.15 30.95 0.89 -36% 2.00E-04 3.00E-04 0.0688 0.0932
323562995 888.91 24.95 26.3 1.43 -42.20% 1.00E-04 3.00E-04 0.0402 0.0932
323564530 553.49 18.4 18.9 3.26 -56% 2.00E-04 1.00E-04 0.0688 0.0601
323565674 939.79 37.2 38.3 1.03 -45.60% 1.00E-04 2.00E-04 0.0402 0.0819
323565757 589.92 19.35 20.9 2.22 -54% 0 0 0.0000 0.0000
323569419 600.7 19.65 20.6 1.90 -38.40% 2.00E-04 3.00E-04 0.0688 0.0932
Uses of proteomic features for Alzheimer's disease
In addition to the uses described herein, the proteomic features of the invention can be used in other ways for Alzheimer's disease. The proteomic features described herein can be used to screen patients and characterize their disease state or for a primary diagnosis of AD. Disease state information from longitudinal studies with the features could be used for understanding how to treat the disease, for enrolling and monitoring patients in clinical trials, or to correct an errant disease diagnosis. Those skilled in the art would understand that various measurement techniques can also be envisioned that would allow the proteomic features to be measured in a variety of research, clinical and day-to-day settings. For example, the development of antibody-based tests or hand-held analytical devices can be envisioned that would allow for routine AD screening, for improved diagnostic accuracy, and for the selection of appropriate treatments to which the patient may respond.
EXAMPLE 1
Sample preparation and immunodepletion
Clinical samples of human CSF (600 μl) were thawed on ice, after which each sample was split into two aliquots: 150 μl for ELISA measurements of hemoglobin, IgM, and alpha-2-macroglobulin (ELISA measurements are described in Example 3) and 450 μl for the subject protein profiling using immunodepletion as the biochemical sample processing method. For the protein profiling study, horse myoglobin was added to 400 μl human CSF so that the final concentration of horse myoglobin was 3 μg/ml. The sample (400 μl) of CSF was applied to an immunoaffinity depletion spin column (Agilent Technologies, Palo Alto, CA), spun at 700 x g, and the flow-through was collected. An additional 400 μl of PBS was added to the spin column, which was spun at 700 x g, and the fractions were combined, desalted with a 5000 MWCO ultrafiltration filter, and reconstituted to 100 μL. The depleted proteins were reduced with 4 mM TCEP, alkylated with 10 mM iodoacetamide, and digested overnight with trypsin (1:50 w/v). The immunoaffinity column was regenerated by eluting bound proteins with 1.5 M KCl and 100 mM glycine-HCl, pH 2.8, and equilibrated with 10 column volumes of PBS. Digests were quenched with 10 μl of glacial acetic acid, desalted on a peptide trap column (Michrom Bioresources, Auburn, CA), concentrated to dryness, and dissolved in 15 μl 0.1% acetic acid prior to LC-MS analysis.
EXAMPLE 2
Differential mass spectrometry analysis
Peptides were analyzed by a Fourier transform mass spectrometer (FTMS) using a 65 minute gradient. The gradient had four distinct sections: a) 100% A [H2O 0.1 M acetic acid] at a flow rate of 3 μl/min from 0 to 4 minutes, b) binary gradient from 0% solvent B [90% acetonitrile, 0.1 M acetic acid] to 30% B from 4.01 minutes to 29 minutes at a flow rate of 1 μl/min, c) 29.01 minutes to 39 minutes to 90% B at 1 μl/min, and d) 40 minutes to 65 minutes 100% A at 1 μl/min. Acquisition consisted of a cycle of a full scan ion trap mass spectrum, a full scan FT mass spectrum and 10 MS/MS spectra recorded sequentially on the most abundant ions detected in the initial ion trap scan. Differential mass spectrometry (dMS) analysis was carried out as described previously (Wiener et al., Anal. Chem., 76: 6085-6096, 2004; U.S. Pat. 6,906,320, the disclosure of which is incorporated herein by reference as if set forth at length). Briefly, at each mass to charge ratio a t-test or non-parametric Kruskal-Wallis test is performed. This produces a single p-value for each data point. Results are further analyzed by looking for features that have consecutive statistically significant differences over a sufficiently long time range. For spike recovery studies, myoglobin was added into neat CSF at concentrations ranging from 1 μg/mL to 30 μg/mL before immunodepletion.
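The final step described above, in which point-wise p-values are scanned for consecutive statistically significant differences over a sufficiently long time range, can be sketched as follows. This is a simplified illustration and not the dMS implementation of Wiener et al.; the significance threshold (alpha), the minimum run length (min_len), and the p-values are hypothetical.

```python
def significant_runs(p_values, alpha=0.05, min_len=3):
    """Return (start, end) index pairs of runs of at least min_len
    consecutive time points whose p-value is below alpha."""
    runs = []
    start = None
    for idx, p in enumerate(p_values):
        if p < alpha:
            if start is None:
                start = idx          # a new run begins here
        else:
            if start is not None and idx - start >= min_len:
                runs.append((start, idx - 1))
            start = None             # the run, if any, has ended
    # Close a run that extends to the last time point.
    if start is not None and len(p_values) - start >= min_len:
        runs.append((start, len(p_values) - 1))
    return runs

# Hypothetical p-values along the retention time axis at a single m/z.
p_along_time = [0.5, 0.01, 0.02, 0.03, 0.4, 0.01, 0.2, 0.001, 0.002, 0.003, 0.004]
candidate_features = significant_runs(p_along_time)
```

Requiring a minimum run length discards isolated significant points, such as the single point at index 5 in this example, which are more likely to arise by chance.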
EXAMPLE 3
Qualitative and Quantitative Analysis
ELISA kits for hemoglobin and A2M were obtained from ALPCO (Windham, NH); the IgM ELISA kit was obtained from Bethyl Inc (Austin, TX). The quantitative sandwich ELISAs for hemoglobin, IgM, and A2M were based on the manufacturers' instructions with modifications to standard and sample dilution, HRP-conjugate dilution, and incubation conditions. For the hemoglobin and IgM ELISA, assay volumes of 100 μl per well were used, with the exception of 100 μl of stopping solution for IgM and 50 μL of stopping solution for hemoglobin.
Standard, CSF sample, capture antibody and HRP-conjugated antibody were diluted in a dilution buffer containing 1% BSA and then directly applied to the designated wells. For the A2M ELISA, 10 μl of pre-diluted standard and CSF samples were transferred into 200 μl per well of 0.9% NaCl solution in a pre-coated plate; the HRP-conjugated antibody was diluted in wash buffer (PBS containing 0.05% Tween 20) without BSA. Except for the aforementioned specifications, the assay procedure for all ELISAs was the same. CSF samples were thawed on ice and centrifuged at 4000 rpm for 4 minutes at 4°C prior to assay. Diluted standards, CSF and QC samples were added into the wells of a plate coated with an analyte-specific antibody. Following a one hour incubation at room temperature, the wells were washed five times and HRP-conjugated antibody was then added. After the second incubation and wash, TMB substrate was added to the wells and incubated at room temperature for either 15 minutes (IgM and A2M) or 25 minutes (hemoglobin). The enzymatic reaction was then stopped by the addition of H2SO4.
After ten minutes, the absorbance was read at 450 nm (OD450) on the Spectra MAX 340P plate reader (Molecular Devices Co., Sunnyvale, CA). The concentration of the analyte was calculated against the standard curve, which was generated by 4-parameter plotting with SOFTmax PRO (Molecular Devices Co., Sunnyvale, CA). The total protein concentration of each CSF sample was determined with the PIERCE microplate BCA protocol (Pierce Biotechnologies, Rockford, IL).
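By way of illustration only, a four-parameter logistic (4PL) standard curve of the general form commonly used for ELISA data can be evaluated and inverted as sketched below. The parameter names (a, b, c, d), their values, and the exact parameterization are assumptions made for this example; the curve fitting itself was performed in SOFTmax PRO and is not reproduced here.

```python
def four_pl(x, a, b, c, d):
    """Response (e.g., OD450) predicted at concentration x.

    a: response at zero concentration
    d: response at saturating concentration
    c: concentration at the inflection point (mid response)
    b: slope factor
    """
    return d + (a - d) / (1.0 + (x / c) ** b)

def inverse_four_pl(y, a, b, c, d):
    """Concentration giving response y; y must lie strictly between a and d."""
    lo, hi = min(a, d), max(a, d)
    if not lo < y < hi:
        raise ValueError("response outside the range of the standard curve")
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)

# Hypothetical fitted parameters for one standard curve.
params = dict(a=0.05, b=1.2, c=50.0, d=4.0)

od = four_pl(25.0, **params)           # predicted absorbance at 25 units
conc = inverse_four_pl(od, **params)   # back-calculated concentration
```

Back-calculation fails for readings outside the span of the curve, which is one reason samples are diluted into the working range of the standards.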
EXAMPLE 4
Data Analysis
To identify peptide features of interest, Applicants used dMS, a label-free LC-MS method for finding statistically significant differences in complex peptide and protein mixtures as previously described (Wiener et al., 2004). Differences were identified by performing a 10 x 10 group-wise comparison of the disease samples versus control samples. To estimate the number of false positives that might be included in the detected differences, Applicants randomly divided the data into two groups of ten, with five control and five disease samples in each group, and performed a similar calculation. For a given confidence threshold (similar to a p-value), one can determine the number of differences (statistically significant features) in the original comparison and the estimated number of false positive results (from the second comparison) at that confidence threshold. Plotting the number of actual differences versus the estimated number of false positives for different values of the confidence threshold results in an ROC (receiver operating characteristic) curve. To examine the robustness of the error measurement, three random splits of the data were created, giving three ROC curves (Figure 2). The fact that all three resulting ROC curves are nearly vertical shows that the dMS analysis found many results that are readily distinguishable from possible false positives.
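The construction of such a curve can be sketched as follows, purely for illustration. The p-value lists and thresholds are hypothetical; the list labeled "null" stands for the comparison of the two randomly mixed groups, in which any detected difference is treated as a false positive.

```python
def roc_points(real_p, null_p, thresholds):
    """For each confidence threshold, return
    (estimated false positives, differences found in the real comparison)."""
    points = []
    for t in thresholds:
        found = sum(p <= t for p in real_p)            # AD versus control
        false_positives = sum(p <= t for p in null_p)  # mixed versus mixed
        points.append((false_positives, found))
    return points

# Hypothetical feature-level p-values from the two comparisons.
real_p = [0.0001, 0.0005, 0.002, 0.01, 0.03, 0.2, 0.6]
null_p = [0.04, 0.3, 0.5, 0.7, 0.9, 0.95, 0.99]

curve = roc_points(real_p, null_p, thresholds=[0.001, 0.01, 0.05])
```

A curve that rises along the y axis before moving right, as in this hypothetical example, corresponds to the "nearly vertical" behavior described above.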
EXAMPLE 5
Development of Classifier
To construct one classifier, the data were randomly split into a training set (in this instance containing three quarters of the spectra from each condition) and a test set (in this instance consisting of the remaining one quarter of the spectra from each condition). A classifier was trained based on the features corresponding to m/z ranges where statistically significant differences had been found (the training set was always perfectly classified), and tested to see how well it classified samples from the remaining one quarter of the data (using those features found in the training set). Those skilled in the art would know that this is a standard way of avoiding over-fitting when testing a classifier; if the training data are over-fit, prediction error for the test set is likely to rise (Ripley, B., Pattern Recognition and Neural Networks, 1996, Cambridge University Press, Cambridge). Using features measured from the training sets, Applicants trained random forests (Breiman, L., Random Forests, Machine Learning, 45: 5-32, 2001) to distinguish control from AD samples. Applicants tested the classifier using re-sampling cross-validation. Seven out of ten samples from each group were chosen to train the classifier. The remaining three samples from each group were used as a test set for that classifier. In each case, both mass spectrometry runs for each sample were used, giving training and test sets with 28 and 12 sets of mass spectrometric data, respectively. The entire procedure was repeated ten times, creating a new random split of the data each time. This was only a partial cross-validation in that Applicants did not run a new dMS analysis on each new training set to find a new set of differentially expressed features.
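The re-sampling scheme described above can be sketched as follows, for illustration only. The sample identifiers, the random seed, and the function name are hypothetical. The sketch splits by sample rather than by spectrum, so that both mass spectrometry runs of a given sample fall on the same side of the split.

```python
import random

def resample_split(samples_by_group, n_train, runs_per_sample, rng):
    """Randomly assign n_train samples per group to training and the rest
    to testing; every run of a sample follows that sample."""
    train, test = [], []
    for group, samples in samples_by_group.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        for position, sample in enumerate(shuffled):
            target = train if position < n_train else test
            target.extend(
                (group, sample, run) for run in range(1, runs_per_sample + 1)
            )
    return train, test

# Ten samples per group, two LC-MS runs per sample (identifiers hypothetical).
samples = {
    "AD": [f"AD_{i}" for i in range(1, 11)],
    "Control": [f"CTL_{i}" for i in range(1, 11)],
}

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
splits = [resample_split(samples, 7, 2, rng) for _ in range(10)]
```

Each of the ten splits yields 28 training and 12 test spectra, for 120 test spectra in total across the ten repetitions.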
The random forest procedure produces a measure of each variable's importance. A variable is considered important if randomly shuffling that variable substantially decreases the random forest's ability to correctly classify. Because eliminating less important variables has been shown to improve classifier performance in some cases, Applicants reduced the number of variables by half several times, training and testing a new random forest based on the remaining variables. A random forest classifier using only twelve features (the summed features from group 2, group 19, group 51, group 7, group 28, group 18, group 52, group 36, group 3, group 24, group 42 and group 37, in order of importance as estimated by random forest) automatically selected from the available features was able to perfectly distinguish control from AD samples in the training set, and also performed well in re-sampling tests (Table 3). A random forest classifier based on the six features measured by ELISA performed similarly (data not shown). In each case, a simple linear discriminant analysis using only two features (group 2 and group 51 for the mass spectrometry data, and Aβ-40 and Aβ-42 measured by ELISA) could perfectly distinguish the two groups if given all the data for training. Notwithstanding, the cross-validation revealed that classifiers do not always generalize perfectly even on data collected at the same time; a totally separate data set would present an even greater challenge. Figures 4A and 4B show that the separation provided by two of the MS features in a linear discriminant analysis was similar to that provided by Aβ-40 and Aβ-42 measured by ELISA.
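The shuffling-based notion of importance and the halving step described above can be sketched as follows. This is an illustration under stated assumptions and not the random forest implementation used herein: a trivial threshold rule stands in for a trained classifier, importance is measured on the same hypothetical data rather than on out-of-bag samples, and all names and values are hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows for which the predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, column, rng, repeats=20):
    """Mean drop in accuracy when the values of one column are shuffled."""
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / repeats

def halve(ranked):
    """Keep the more important half of an importance-ranked list (at least one)."""
    return ranked[: max(1, len(ranked) // 2)]

def predict(row):
    # Stand-in for a trained classifier: thresholds column 0 only.
    return "AD" if row[0] > 0.5 else "Control"

# Hypothetical data: column 0 separates the classes, column 1 is uninformative.
rows = [[0.0, 5.0] for _ in range(10)] + [[1.0, 5.0] for _ in range(10)]
labels = ["Control"] * 10 + ["AD"] * 10

rng = random.Random(0)
importance = {c: permutation_importance(predict, rows, labels, c, rng) for c in (0, 1)}
ranked = sorted(importance, key=importance.get, reverse=True)
kept = halve(ranked)
```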
EXAMPLE 6
Classification of disease state
Applicants used a second independent cohort of clinical CSF samples in this example. The diagnoses for the patients in this cohort were provided by Dr. Kaj Blennow (Sahlgrenska University Hospital, Göteborg, Sweden) using standard methodology and included a follow-up visit approximately two years after the initial visit. Patients were categorized into five diagnostic categories: control (CTL), Alzheimer's (AD), stable mild cognitive impairment (MCI-MCI), mild cognitive impairment converted to Alzheimer's (MCI-AD) and vascular dementia (VaD). To identify markers of disease state and disease status, Applicants processed 400 μl of CSF as described in Example 1 and then performed LC-MS profiling as described in Example 2 to generate the immunodepletion data set. Applicants also processed 1 ml of CSF by ultrafiltration. In this method, human CSF ultrafiltrate was prepared by Digilab BioVisioN AG, Hannover, Germany (Selle et al., Combinatorial Chemistry & High Throughput Screening, 8:801-806, 2005). One ml of ultrafiltrate was resuspended in 25 μl of 0.1 M acetic acid. The samples were then vortexed and 2 μl of the suspended ultrafiltrate was analyzed by a Fourier transform mass spectrometer (FTMS) using a 65 minute gradient. The gradient of Solvent A (0.1 M acetic acid in water) and Solvent B flowing at 1 μl/min had four distinct sections: a) 0-10% B for 2 minutes, b) 10-25% B for 2 minutes, c) 25-40% B for 13 minutes and d) 40-90% B for 27 minutes. Acquisition consisted of a cycle of a full scan ion trap mass spectrum, a full scan FT mass spectrum and 3 MS/MS spectra recorded sequentially on the most abundant ions detected in the initial ion trap scan (the total cycle time is ~0.9 to 1.5 s). From each LC-MS data set Applicants extracted the AUCs for features found to differentiate AD from control CSF via the dMS feature extraction algorithm. In the immunodepletion set, 633 dMS groups were extracted having at least two features among the top 1000 dMS groups. From the ultrafiltration set, 267 groups were extracted having at least two features among the top 1000 dMS groups. From each extracted dMS group, the feature with the minimum mean log likelihood (product of p-values) was chosen as the most representative.
The selected features (633 for immunodepletion and 267 for ultrafiltration) were then used to build a random forest classifier for the five diagnostic categories: control (CTL), Alzheimer's (AD), stable mild cognitive impairment (MCI-MCI), mild cognitive impairment converted to Alzheimer's (MCI-AD) and vascular dementia (VaD). The random forest classifier generates an importance ranking for the features. The features having importance measures deemed to be above noise are kept and a final classifier is built from them. The methods described in Svetnik et al., supra, were used to decide on the number of features to retain in the final random forest classifier. From the 633 features for immunodepletion, 158 features were retained. From the 267 features for the ultrafiltration set, 67 features were retained. Combining the two data sets, 225 features were retained out of the 900.
A ten-fold cross-validation was used to evaluate the performance of the classifiers for the five-class, 127-subject data sets. As shown in Table 5A, for immunodepletion the classifier has prediction accuracies of 93.3% (CTL), 93.3% (AD), 79.2% (MCI-MCI), 76.2% (MCI-AD), and 55% (VaD). For the ultrafiltration data (Table 5B), the classifier has prediction accuracies of 93.3% (CTL), 96.7% (AD), 54.2% (MCI-MCI), 42.9% (MCI-AD), and 60% (VaD). The classifier built on the combined data has prediction accuracies of 100% (CTL), 96.7% (AD), 83.3% (MCI-MCI), 85.7% (MCI-AD), and 60% (VaD).
EXAMPLE 7
Progression of Disease State
The methods and the proteomic features described herein were used to measure the progression of Alzheimer's disease.
To demonstrate that one or more of the markers described in Table 4 provide useful measures of disease state or disease status, Applicants reviewed the features in Table 4 and chose two, SME1 and SME2. Applicants looked for features that showed clear signal to noise greater than 3 and uniform chromatographic peak shapes in the LC-MS data. Additional characteristics included, but were not limited to, mass spectra quality, charge state, and changes in ion intensity between adjacent mass spectra, as is typically assessed by one skilled in the art. SME1 was selected and is feature ID 8993803 (Table 4). SME2 was selected and is feature ID 8994035 (Table 4). A third clinical cohort, OPTIMA longitudinal, was obtained from AD patients and aged and case-matched controls. The CSF samples were collected at approximately one year intervals to establish the change of disease status over time, or longitudinally. Immunodepletion biochemical sample processing was performed on the CSF longitudinal samples using the method described in Example 1. LC-MS profiling was performed using the method described in Example 2, features were identified, and AUC values were extracted from the LC-MS data using the feature extraction algorithm described previously. A linear mixed effects model that assumed a linear relationship between log AUC and time was fitted to the data for each feature. The model allows for a "population average" relationship between log AUC and time and for trends between AD and control groups to be different (the fixed effects), as well as allowing each patient to have his/her own trend over time (random effects).
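One way to write down a model of this kind is given below. The notation is an assumption introduced here for illustration and does not appear in the specification: i indexes patients, j indexes visits, t_ij is the time of the visit, and g_i is 1 for an AD patient and 0 for a control. The patient-specific intercept b_0i is likewise an assumption; the text only states that each patient has his or her own trend over time.

```latex
\[
\log \mathrm{AUC}_{ij}
  = (\beta_0 + \beta_1 g_i + b_{0i})
  + (\beta_2 + \beta_3 g_i + b_{1i})\, t_{ij}
  + \varepsilon_{ij},
\qquad
(b_{0i}, b_{1i}) \sim \mathcal{N}(0, \Sigma),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

In this form, the fixed effects are the beta terms: beta_2 is the population average slope for controls and beta_2 + beta_3 is the slope for the AD group, so beta_3 captures the difference in trend between the groups. The random effects b_0i and b_1i let each patient deviate from the group average.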
The results of such a model for SME1 and SME2 as disease status or progression markers are shown in Figures 6A and 6B. As shown in Table 6, Applicants saw an average annual decline of 12.3% and 8.8%, respectively, for SME1 and SME2.
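Because the model is linear in log AUC, a fitted slope translates into a constant percentage change per year. The sketch below shows that conversion under the assumption that time is measured in years; the base of the logarithm is not stated in this section, so it is left as a parameter, and the slope values are back-calculated from the reported declines purely for illustration.

```python
import math

def annual_percent_change(slope_per_year, log_base=math.e):
    """Convert a slope on the log-AUC scale into a percent change per year.

    A negative result is a decline. log_base must match the logarithm
    used when the model was fitted (an assumption in this sketch).
    """
    return (log_base ** slope_per_year - 1.0) * 100.0

# Slopes that would correspond to the reported declines, assuming
# natural logarithms: 12.3% and 8.8% per year.
slope_first = math.log(1 - 0.123)
slope_second = math.log(1 - 0.088)
change_first = annual_percent_change(slope_first)
change_second = annual_percent_change(slope_second)
```

Under the natural log assumption, a 12.3% annual decline corresponds to a slope of about -0.131 per year.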
EXAMPLE 8
Identification of Disease Progression Markers
To identify disease progression markers of interest, Applicants used dMS to obtain a list of features and their corresponding AUC values in the OPTIMA longitudinal clinical data set as described in Example 7. From these features, selection was made based on those having at least 80% positive AUC values in both AD and control samples. A linear mixed effect model similar to those described in Example 7 was fitted to each of the features. Those features having Q-values for time effect in the AD group and the Q-values for the difference in time effect between the AD and control groups smaller than 0.1 are shown in Table 7.
Features such as these can be used individually or in groups of two or more as biochemical measures of disease status or disease state.
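The two selection criteria of this example can be sketched as a simple filter. The dictionary keys, feature names, and values are hypothetical. In the procedure described above, the 80% filter is applied before the mixed effects models are fitted and the Q-value cutoff afterwards; the sketch combines them in one pass for brevity and takes the Q-values as already computed.

```python
def fraction_positive(auc_values):
    """Fraction of samples in which the feature has a positive AUC."""
    return sum(1 for v in auc_values if v > 0) / len(auc_values)

def select_progression_markers(features, min_positive=0.8, q_cutoff=0.1):
    """Return IDs of features that are detected in enough samples of both
    groups and whose two Q-values are both below the cutoff."""
    selected = []
    for f in features:
        detected = (fraction_positive(f["auc_ad"]) >= min_positive
                    and fraction_positive(f["auc_ctl"]) >= min_positive)
        significant = (f["q_time_ad"] < q_cutoff
                       and f["q_time_diff"] < q_cutoff)
        if detected and significant:
            selected.append(f["id"])
    return selected

# Hypothetical features: only the first passes both criteria.
candidates = [
    {"id": "f1", "auc_ad": [1, 2, 3, 4, 0], "auc_ctl": [1, 1, 1, 1, 1],
     "q_time_ad": 0.05, "q_time_diff": 0.02},
    {"id": "f2", "auc_ad": [1, 0, 0, 1, 1], "auc_ctl": [1, 1, 1, 1, 1],
     "q_time_ad": 0.01, "q_time_diff": 0.01},
    {"id": "f3", "auc_ad": [1, 1, 1, 1, 1], "auc_ctl": [2, 2, 2, 2, 2],
     "q_time_ad": 0.05, "q_time_diff": 0.20},
]
markers = select_progression_markers(candidates)
```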

Claims

WHAT IS CLAIMED:
1. A method for classifying disease states in Alzheimer's disease ("AD") comprising: a. selecting a statistically relevant set of mass spectrometric features from human ante-mortem and healthy control fluid samples in which a plurality of features are differentially expressed to form a reference AD and control panel; b. conducting a linear discriminant analysis on the mass spectrometric feature data from step (a); c. obtaining a test fluid sample from a patient; d. gathering mass spectrometric data on the test sample including the features in the set of step (a); e. applying the results of step (d) to the linear discriminant analysis of step (b) to obtain an output; and f. determining from the output of step (e) the classification of the disease state, where the output is either AD or control.
2. The method of claim 1 wherein the set of mass spectrometric features is a plurality of features selected from the group of features listed in Table 4.
3. The method of claim 2 wherein the set of features comprises features selected from the group consisting of group 2 and group 51 of Table 4.
4. A method for predicting cognition scores for Alzheimer's disease ("AD") patients comprising: a. selecting a statistically relevant set of mass spectrometric features from human ante-mortem and healthy control fluid samples in which a plurality of features are differentially expressed to form a reference AD and control panel; b. conducting a random forest analysis on the multi-analyte data from step (a); c. obtaining a test fluid sample from a patient; d. gathering mass spectrometric data on the test sample including the features in the set of step (a); e. applying the results of step (d) to the random forest analysis of step (b) to obtain an output; and f. determining from the output of step (e), where the output is the assignment of the cognition score.
5. The method of claim 4 wherein the set of mass spectrometric features is a plurality of features selected from the group of features listed in Table 4.
6. The method of claim 5 wherein the set of features comprises biomarkers selected from the group consisting of the summed features from group 2, group 3, group 7, group 18, group 19, group 24, group 28, group 36, group 37, group 42, group 51 and group 52.
7. Disease classification biomarkers selected from the group consisting of SME1 (group 2) and SME2 (group 51).
PCT/US2007/012155 2006-05-26 2007-05-22 Methods for the diagnosis and prognosis of alzheimer's disease using csf protein profiling WO2007139777A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US80863906P 2006-05-26 2006-05-26
US60/808,639 2006-05-26

Publications (2)

Publication Number Publication Date
WO2007139777A2 true WO2007139777A2 (en) 2007-12-06
WO2007139777A3 WO2007139777A3 (en) 2008-02-28

Family

ID=38779165

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/012155 WO2007139777A2 (en) 2006-05-26 2007-05-22 Methods for the diagnosis and prognosis of alzheimer's disease using csf protein profiling

Country Status (1)

Country Link
WO (1) WO2007139777A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107202833A (en) * 2017-06-21 2017-09-26 佛山科学技术学院 The quick determination method of copper ion pollution level in a kind of water body

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6114175A (en) * 1994-07-19 2000-09-05 University Of Pittsburgh Compound for the antemortem diagnosis of Alzheimer's Disease and in vivo imaging and prevention of amyloid deposition
US6875434B1 (en) * 1997-12-02 2005-04-05 Neuralab Limited Methods of treatment of Alzheimer's disease
US20050244890A1 (en) * 2003-11-07 2005-11-03 Davies Huw A Biomarkers for Alzheimer's disease

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LUNETTA K.L. ET AL.: 'Screening Large-Scale Association Study Data: Exploiting Interactions Using Random Forests' BMC GENETICS vol. 5, December 2004, pages 1 - 3 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2287604A1 (en) * 2008-06-13 2011-02-23 Eisai R&D Management Co., Ltd. Method for examination of alzheimer s disease
EP2287604A4 (en) * 2008-06-13 2011-06-08 Eisai R&D Man Co Ltd Method for examination of alzheimer s disease
JP5106631B2 (en) * 2008-06-13 2012-12-26 エーザイ・アール・アンド・ディー・マネジメント株式会社 Testing method for Alzheimer's disease
EP2304431A1 (en) * 2008-07-25 2011-04-06 Merck & Co., Inc. Csf biomarkers for the prediction of cognitive decline in alzheimer's disease patients
EP2304431A4 (en) * 2008-07-25 2011-11-02 Merck & Co Inc Csf biomarkers for the prediction of cognitive decline in alzheimer's disease patients

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07777213

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07777213

Country of ref document: EP

Kind code of ref document: A2