Background Although acupuncture is widely used for chronic pain, there remains considerable controversy as to its value. We aimed to determine the effect size of acupuncture for 4 chronic pain conditions: back and neck pain, osteoarthritis, chronic headache, and shoulder pain.
Methods We conducted a systematic review to identify randomized controlled trials (RCTs) of acupuncture for chronic pain in which allocation concealment was determined unambiguously to be adequate. Individual patient data meta-analyses were conducted using data from 29 of 31 eligible RCTs, with a total of 17 922 patients analyzed.
Results In the primary analysis, including all eligible RCTs, acupuncture was superior to both sham and no-acupuncture control for each pain condition (P < .001 for all comparisons). After exclusion of an outlying set of RCTs that strongly favored acupuncture, the effect sizes were similar across pain conditions. Patients receiving acupuncture had less pain, with scores that were 0.23 (95% CI, 0.13-0.33), 0.16 (95% CI, 0.07-0.25), and 0.15 (95% CI, 0.07-0.24) SDs lower than sham controls for back and neck pain, osteoarthritis, and chronic headache, respectively; the effect sizes in comparison to no-acupuncture controls were 0.55 (95% CI, 0.51-0.58), 0.57 (95% CI, 0.50-0.64), and 0.42 (95% CI, 0.37-0.46) SDs. These results were robust to a variety of sensitivity analyses, including those related to publication bias.
Conclusions Acupuncture is effective for the treatment of chronic pain and is therefore a reasonable referral option. Significant differences between true and sham acupuncture indicate that acupuncture is more than a placebo. However, these differences are relatively modest, suggesting that factors in addition to the specific effects of needling are important contributors to the therapeutic effects of acupuncture.
Acupuncture is the insertion and stimulation of needles at specific points on the body to facilitate recovery of health. Although acupuncture was initially developed as part of traditional Chinese medicine, some contemporary acupuncturists, particularly those with medical qualifications, understand it in physiologic terms, without reference to premodern concepts.1
An estimated 3 million American adults receive acupuncture treatment each year,2 and chronic pain is the most common presentation.3 Acupuncture is known to have physiologic effects relevant to analgesia,4,5 but there is no accepted mechanism by which it could have persisting effects on chronic pain. This lack of biological plausibility, and its provenance in theories lying outside of biomedicine, makes acupuncture a highly controversial therapy.
A large number of randomized controlled trials (RCTs) of acupuncture for chronic pain have been conducted. Most have been of low methodologic quality, and, accordingly, meta-analyses based on these RCTs are of questionable interpretability and value.6 Herein, we present an individual patient data meta-analysis of RCTs of acupuncture for chronic pain, in which only high-quality RCTs were eligible for inclusion. Individual patient data meta-analyses are superior to meta-analyses of summary data because they enhance data quality, enable different forms of outcome to be combined, and allow use of statistical techniques of increased precision.
The full protocol of the meta-analysis has been published.6 In brief, the study was conducted in 3 phases: identification of eligible RCTs; collection, checking, and harmonization of raw data; and individual patient data meta-analysis.
DATA SOURCES AND SEARCHES
To identify articles, we searched MEDLINE, the Cochrane Collaboration Central Register of Controlled Trials, and the citation lists of systematic reviews (the full search strategy is shown in the eAppendix). There were no language restrictions. The initial search, current to November 2008, was used to identify studies for the individual patient data meta-analysis; a second search was conducted in December 2010 for summary data to use in a sensitivity analysis.
Two reviewers applied inclusion criteria for potentially eligible articles separately, with disagreements about study inclusion resolved by consensus. Randomized controlled trials were eligible for analysis if they included at least 1 group receiving acupuncture needling and 1 group receiving either sham (placebo) acupuncture or no-acupuncture control. The RCTs must have accrued patients with 1 of 4 indications—nonspecific back or neck pain, shoulder pain, chronic headache, or osteoarthritis—with the additional criterion that the current episode of pain must be of at least 4 weeks duration for musculoskeletal disorders. There was no restriction on the type of outcome measure, although we specified that the primary end point must be measured more than 4 weeks after the initial acupuncture treatment.
It has been demonstrated that unconcealed allocation is the most important source of bias in RCTs,7 and, as such, we included only those RCTs in which allocation concealment was determined unambiguously to be adequate (further details are in the review protocol6). Where necessary, we contacted authors for further information concerning the exact logistics of the randomization process. We excluded RCTs if there was any ambiguity about allocation concealment.
DATA EXTRACTION AND QUALITY ASSESSMENT
The principal investigators of eligible studies were contacted and asked to provide raw data from the RCT. To ensure data accuracy, all results reported in the RCT publication, including baseline characteristics and outcome data, were then replicated.
Reviewers assessed the quality of blinding for eligible RCTs with sham acupuncture control. The RCTs were graded as having a low likelihood of bias if either the adequacy of blinding was checked by direct questioning of patients (eg, by use of a credibility questionnaire) and no important differences were found between groups, or the blinding method (eg, the Streitberger and Kleinhenz sham device8) had previously been validated as able to maintain blinding. Randomized controlled trials with a high likelihood of bias from unblinding were excluded from the meta-analysis of acupuncture vs sham; a sensitivity analysis included only RCTs with a low risk of bias.
DATA SYNTHESIS AND ANALYSIS
Each RCT was reanalyzed by analysis of covariance with the standardized principal end point (scores divided by pooled standard deviation) as the dependent variable, and the baseline measure of the principal end point and variables used to stratify randomization as covariates. This approach has been shown to have the greatest statistical power for RCTs with baseline and follow-up measures.9,10 The effect size for acupuncture from each RCT was then entered into a meta-analysis using the metan command in Stata software (version 11; Stata Corp): the meta-analytic statistics were created by weighting each coefficient by the reciprocal of the variance, summing, and dividing by the sum of the weights. Meta-analyses were conducted separately for comparisons of acupuncture with sham and no-acupuncture control, and within each pain type. We prespecified that the hypothesis test would be based on the fixed effects analysis because this constitutes a valid test of the null hypothesis of no treatment effect.
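The weighting step described above (each coefficient weighted by the reciprocal of its variance, summed, and divided by the sum of the weights) can be illustrated with a minimal Python sketch. This is not the Stata metan code used in the study; the function name `fixed_effect_pool` and the three trial-level effect sizes and standard errors are hypothetical and serve only to show the arithmetic.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance fixed-effect pooling of per-trial effect sizes.

    estimates  : per-trial effect sizes (here, in SD units)
    std_errors : standard error of each effect size
    """
    # Weight each trial by the reciprocal of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    # Weighted sum of the estimates divided by the sum of the weights.
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight

    # The variance of the pooled estimate is the reciprocal of the total weight.
    pooled_se = math.sqrt(1.0 / total_weight)

    # Two-sided P value from the standard normal distribution.
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))

    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci, p_value


# Hypothetical trial-level inputs, not taken from the study data.
effects = [0.30, 0.10, 0.20]
ses = [0.10, 0.05, 0.10]
pooled, pooled_se, ci, p_value = fixed_effect_pool(effects, ses)
print(f"pooled = {pooled:.3f}, SE = {pooled_se:.3f}, P = {p_value:.5f}")
```

In this illustration the second trial has half the standard error of the others and therefore 4 times the weight, so the pooled estimate is pulled toward its smaller effect size.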
We identified 82 RCTs (Figure 1),11–93 of which 31 were eligible (Table 1 and eAppendix). Four of the studies were organized as part of the German Acupuncture Trials (GERAC) initiative11–14; 4 were part of the Acupuncture Randomized Trials (ART) group15–18; 4 were Acupuncture in Routine Care (ARC) studies19–22; and 3 were UK National Health Service acupuncture RCTs.23,24,98 Eleven studies were sham controlled, 10 had no-acupuncture control, and 10 were 3-armed studies, including both sham and no-acupuncture control. The second search for subsequently published studies identified an additional 4 eligible studies,94–97 with a total of 1619 patients.
An important source of clinical heterogeneity between studies concerns the control groups. In the sham RCTs, the type of sham included acupuncture needles inserted superficially,13 sham acupuncture devices with needles that retract into the handle rather than penetrate the skin,25 and nonneedle approaches, such as deactivated electrical stimulation26 or detuned laser.27 Moreover, cointerventions varied, with no additional treatment other than analgesics in some RCTs,15 whereas in other RCTs, both acupuncture and sham groups received a course of additional treatment, such as exercise led by physical therapists.24 Similarly, the no-acupuncture control groups varied among usual care, such as an RCT in which control group patients were merely advised to “avoid acupuncture”98; attention control, such as group education sessions28; and guidelined care, in which patients were given advice as to specific drugs and doses.13
DATA EXTRACTION AND QUALITY ASSESSMENT
Usable raw data were obtained from 29 of the 31 eligible RCTs, including a total of 17 922 patients from the United States, United Kingdom, Germany, Spain and Sweden. For 1 RCT, the study database had become corrupted29; in another case, the statisticians involved in the RCT failed to respond to repeated enquiries despite approval for data sharing being obtained from the principal investigator.30
The 29 RCTs comprised 18 comparisons with 14 597 patients of acupuncture with no-acupuncture group and 20 comparisons with 5230 patients of acupuncture and sham acupuncture. Patients in all RCTs had access to analgesics and other standard treatments for pain. Four sham RCTs were determined to have an intermediate likelihood of bias from unblinding13,27,31,32; the 16 remaining sham RCTs were graded as having a low risk of bias from unblinding. On average, dropout rates were low (weighted mean, 10%). Dropout rates were higher than 25% for only 4 RCTs: those by Molsberger et al30,97 (27% and 33%, respectively, but raw data were not received and neither RCT was included in the main analysis); Carlsson et al32 (46%, RCT excluded in a sensitivity analysis for blinding); and Berman et al28 (31%). The RCT by Berman et al had a high dropout rate among no-acupuncture controls (43%); dropout rates were close to 25% in the acupuncture and sham groups. The RCT by Kerr et al31 had a large difference in dropout rates between groups (acupuncture, 13%; control, 33%) but was excluded in the sensitivity analysis for blinding.
Forest plots for acupuncture against sham acupuncture and against no-acupuncture control are shown separately for each of the 4 pain conditions in Figure 2 and Figure 3. Meta-analytic statistics are shown in Table 2. Acupuncture was statistically superior to control for all analyses (P < .001). Effect sizes are larger for the comparison between acupuncture and no-acupuncture control than for the comparison between acupuncture and sham: 0.37, 0.26, and 0.15 in comparison with sham vs 0.55, 0.57, and 0.42 in comparison with no-acupuncture control for musculoskeletal pain, osteoarthritis, and chronic headache, respectively.
For 5 of the 7 analyses, the test for heterogeneity was statistically significant. In the case of comparisons with sham acupuncture, the RCTs by Vas et al37,38,41 are clear outliers. For example, the effect size of the RCTs by Vas et al for neck pain is about 5 times greater than the meta-analytic estimate. One effect of excluding these RCTs in a sensitivity analysis (Table 3 and Table 4) is that there is no significant heterogeneity in the comparisons between acupuncture and sham. Moreover, the effect size for acupuncture becomes relatively similar for the different pain conditions: 0.23, 0.16, and 0.15 against sham, and 0.55, 0.57, and 0.42 against no-acupuncture control for back and neck pain, osteoarthritis, and chronic headache, respectively (fixed effects; results similar for the random effects analysis).
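How a single outlying trial can drive a heterogeneity test may be illustrated with a short Python sketch. It computes Cochran's Q, the I² statistic, and the DerSimonian-Laird between-trial variance, which are standard heterogeneity measures; the text does not state which heterogeneity statistic the study software reported, so this choice is an assumption. The function name `heterogeneity` and all effect sizes and standard errors are hypothetical.

```python
def heterogeneity(estimates, std_errors):
    """Cochran's Q, I-squared, and DerSimonian-Laird tau-squared."""
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)

    # Fixed-effect pooled estimate (inverse-variance weighted mean).
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum_w

    # Q: weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1

    # I-squared: share of total variation attributed to heterogeneity.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird moment estimator of between-trial variance.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau_squared = max(0.0, (q - df) / c)
    return q, df, i_squared, tau_squared


# Hypothetical effect sizes (SD units); the last trial is an outlier.
effects_all = [0.15, 0.20, 0.18, 1.00]
ses_all = [0.10, 0.10, 0.10, 0.10]

q_all, df_all, i2_all, tau2_all = heterogeneity(effects_all, ses_all)
q_trim, df_trim, i2_trim, tau2_trim = heterogeneity(effects_all[:3], ses_all[:3])

print(f"all trials:       Q = {q_all:.2f} on {df_all} df, I2 = {i2_all:.2f}")
print(f"outlier excluded: Q = {q_trim:.2f} on {df_trim} df, I2 = {i2_trim:.2f}")
```

With the outlier included, Q far exceeds the 5% critical value of the chi-square distribution on 3 degrees of freedom (7.81); with the outlier removed, Q falls below its degrees of freedom and both I² and the between-trial variance are estimated as zero.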
To give an example of what these effect sizes mean in real terms, a baseline pain score on a 0 to 100 scale for a typical RCT might be 60. Given a standard deviation of 25, follow-up scores might be 43 in a no-acupuncture group, 35 in a sham acupuncture group, and 30 in patients receiving true acupuncture. If response were defined in terms of a pain reduction of 50% or more, response rates would be approximately 30%, 42.5%, and 50%, respectively.
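The arithmetic behind this illustration can be checked with a few lines of Python. The sketch assumes that follow-up scores in each group are normally distributed with the stated standard deviation of 25 and that response means a follow-up score of 30 or less (a 50% reduction from the baseline of 60); the text does not state how the response rates were derived, so the normal model is an assumption, and the helper `normal_cdf` is introduced here for illustration only.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


baseline = 60.0   # typical baseline pain score on a 0 to 100 scale
sd = 25.0         # standard deviation given in the text
followup = {"no acupuncture": 43.0, "sham": 35.0, "acupuncture": 30.0}

# Standardized differences implied by the follow-up means.
effect_vs_none = (followup["no acupuncture"] - followup["acupuncture"]) / sd
effect_vs_sham = (followup["sham"] - followup["acupuncture"]) / sd

# Response = pain reduced by 50% or more, ie, follow-up score <= 30.
threshold = baseline * 0.5
response = {group: normal_cdf(threshold, mean, sd)
            for group, mean in followup.items()}

print(f"acupuncture vs no acupuncture: {effect_vs_none:.2f} SD")
print(f"acupuncture vs sham:           {effect_vs_sham:.2f} SD")
for group, rate in response.items():
    print(f"{group}: {100 * rate:.1f}% responders")
```

Under these assumptions the implied differences are 0.52 SD against no acupuncture and 0.20 SD against sham, and the response rates come out at roughly 30%, 42%, and 50%, close to the approximate figures quoted above.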
The comparisons with no-acupuncture control show evidence of heterogeneity. This seems largely explicable in terms of differences between the control groups used. In the case of osteoarthritis, the largest effect was in the study by Witt et al,17 in which patients in the waiting list control received only rescue pain medication, and the smallest was in the study by Foster et al,24 which involved a program of exercise and advice led by physical therapists. For the musculoskeletal analyses, heterogeneity is driven by 2 very large RCTs19,20 (n = 2565 patients and n = 3118 patients, respectively) for back and neck pain. If only back pain is considered (Table 3 and Table 4), heterogeneity is dramatically reduced and is again driven by one RCT, by Brinkhaus et al,15 with waiting list control. In the headache meta-analysis, Diener et al13 had much smaller differences between groups. This RCT involved providing drug therapy according to national guidelines in the no-acupuncture group, including initiation of β-blockers as migraine prophylaxis. There was disagreement within the collaboration about whether this constituted active control. Excluding this RCT reduced evidence of heterogeneity (P = .04) but had little effect on the effect size (0.42-0.45).
Table 3 and Table 4 show several prespecified sensitivity analyses. Neither restricting the sham RCTs to those with low likelihood of unblinding nor adjustment for missing data had any substantive effect on our main estimates. Inclusion of summary data from RCTs for which raw data were not obtained (2 RCTs) or which were published recently (4 RCTs) also had little impact on either the primary analysis (Table 3 and Table 4) or the analysis with the outlying RCTs by Vas et al37,38,41 excluded (data not shown).
To estimate the potential impact of publication bias, we entered all RCTs into a single analysis and compared the effect sizes from small and large studies.99 We saw some evidence that small studies had larger effect sizes for the comparison with sham (P = .02) but not no-acupuncture control (P = .72). However, these analyses are influenced by the outlying RCTs by Vas et al,37,38,41 which were smaller than average, and by indication, because the shoulder pain RCTs were small and had large effect sizes. Tests for asymmetry were nonsignificant when we excluded the RCTs by Vas et al37,38,41 and shoulder pain studies (n = 15; P = .07) and when small studies were also excluded (n < 100 and n = 12, respectively; P = .30). Nonetheless, we repeated our meta-analyses excluding RCTs with a sample size of less than 100. This had essentially no effect on our results. As a further test of publication bias, we considered the possible effect on our analysis if we had failed to include high-quality, unpublished studies. Only if there were 47 unpublished RCTs with n = 100 patients showing an advantage to sham of 0.25 SD would the difference between acupuncture and sham lose significance.
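The logic of the unpublished-study calculation can be sketched in Python as follows. The sketch adds hypothetical unpublished trials, each favoring sham by 0.25 SD, to a fixed-effect pooled estimate until the pooled z statistic falls below 1.96. The observed pooled estimate (0.20 SD) and its standard error (0.03) are placeholder inputs, not values from the study, and the standard error of 0.20 for a trial of 100 patients assumes 2 equal arms of 50 and an unadjusted standardized difference. Because these inputs are assumptions, the sketch is not expected to reproduce the figure of 47 trials reported above.

```python
import math


def fail_safe_n(pooled, pooled_se, unpublished_effect, unpublished_se,
                z_crit=1.96, max_k=10_000):
    """Smallest number of unpublished trials that removes significance.

    Each unpublished trial is assumed to have the same effect size and
    standard error; pooling is fixed effect with inverse-variance weights.
    Returns None if significance survives max_k added trials.
    """
    w_obs = 1.0 / pooled_se ** 2
    w_new = 1.0 / unpublished_se ** 2
    for k in range(max_k + 1):
        total_w = w_obs + k * w_new
        estimate = (w_obs * pooled + k * w_new * unpublished_effect) / total_w
        # z = estimate / SE, where SE = 1 / sqrt(total weight).
        z = estimate * math.sqrt(total_w)
        if z < z_crit:
            return k
    return None


# Placeholder inputs (assumptions for illustration only).
observed_effect = 0.20      # pooled acupuncture vs sham difference, SD units
observed_se = 0.03          # standard error of the pooled difference
unpublished_effect = -0.25  # each unpublished trial favors sham by 0.25 SD
unpublished_se = math.sqrt(1.0 / 50 + 1.0 / 50)  # n = 100, 50 per arm

k = fail_safe_n(observed_effect, observed_se,
                unpublished_effect, unpublished_se)
print(f"unpublished trials needed to lose significance: {k}")
```

The number returned depends strongly on the precision of the observed pooled estimate: the smaller its standard error, the more contrary trials are needed to overturn it.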
A final sensitivity analysis examined the effect of pooling different end points measured at different periods of follow-up. We repeated our analyses including only pain end points measured at 2 to 3 months after randomization. There was no material effect on results: effect sizes increased by 0.05 to 0.09 SD for musculoskeletal and osteoarthritis RCTs and were stable otherwise.
As an exploratory analysis, we compared sham control with no-acupuncture control. In a meta-analysis of 9 RCTs,11–13,15–18,24,28 the effect size for sham was 0.33 (95% CI, 0.27-0.40) and 0.38 (95% CI, 0.20-0.56) for fixed and random effects models, respectively (P < .001 for tests of both effect and heterogeneity).
OVERVIEW OF FINDINGS
In an analysis of patient-level data from 29 high-quality RCTs, including 17 922 patients, we found statistically significant differences between both acupuncture vs sham and acupuncture vs no-acupuncture control for all pain types studied. After excluding an outlying set of studies, meta-analytic effect sizes were similar across pain conditions.
The effect size for individual RCTs comparing acupuncture with no-acupuncture control did vary, an effect that seems at least partly explicable in terms of the type of control used. As might be expected, acupuncture had a smaller benefit in patients who received a program of ancillary care—such as physical therapist–led exercise24—than in patients who continued to be treated with usual care. Nonetheless, the average effect, as expressed in the meta-analytic estimate of approximately 0.5 SD, is of clear clinical relevance whether considered either as a standardized difference100 or when converted back to a pain scale. The difference between acupuncture and sham is of lesser magnitude, 0.15 to 0.23 SD.
Neither study quality nor sample size seems to be a problem for this meta-analysis, on the grounds that only high-quality studies were eligible and the total sample size is large. Moreover, we saw no evidence that publication bias, or failure to identify published eligible studies, could affect our conclusions.
Because the comparisons between acupuncture and no-acupuncture cannot be blinded, both performance and response bias are possible. Similarly, while we considered the risk of bias of unblinding low in most studies comparing acupuncture and sham acupuncture, health care providers obviously were aware of the treatment provided, and, as such, a certain degree of bias of our effect estimate for specific effects cannot be entirely ruled out. However, it should be kept in mind that this problem applies to almost all studies on nondrug interventions. We would argue that the risk of bias in the comparison between acupuncture and sham acupuncture is low compared with other nondrug treatments for chronic pain, such as cognitive therapies, exercise, or manipulation, which are rarely subject to placebo control.
Another possible critique is that the meta-analyses combined different end points, such as pain and function, measured at different times. However, results did not change when we restricted the analysis to pain end points measured at a specific follow-up time, 2 to 3 months after randomization.
COMPARISON WITH OTHER STUDIES
Many prior systematic reviews of acupuncture for chronic pain have had liberal eligibility criteria, accordingly included RCTs of low methodologic quality, and then came to the circular conclusion that weaknesses in the data did not allow conclusions to be drawn.101,102 Other reviews have not included meta-analyses, apparently owing to variation in study end points.103,104 We have avoided both problems by including only high-quality RCTs and obtaining raw data for individual patient data meta-analysis. Some more recent systematic reviews have published meta-analyses105–108 and reported findings that are broadly comparable with ours, with clear differences between acupuncture and no-acupuncture control and smaller differences between true and sham acupuncture. Our findings have greater precision: all prior reviews have analyzed summary data, an approach of reduced statistical precision when compared with individual patient data meta-analysis.6,109 In particular, we have demonstrated a robust difference between acupuncture and sham control that can be distinguished from bias. This is a novel finding that moves beyond the prior literature.
We believe that our findings are both clinically and scientifically important. They suggest that the total effects of acupuncture, as experienced by the patient in routine clinical practice, are clinically relevant, but that an important part of these total effects is not due to issues considered to be crucial by most acupuncturists, such as the correct location of points and depth of needling. Several lines of argument suggest that acupuncture (whether real or sham) is associated with more potent placebo or context effects than other interventions.110–113 Yet, many clinicians would feel uncomfortable in providing or referring patients to acupuncture if it were merely a potent placebo. Similarly, it is questionable whether national or private health insurance should reimburse therapies that do not have specific effects. Our finding that acupuncture has effects over and above those of sham acupuncture is therefore of major importance for clinical practice. Even though on average these effects are small, the clinical decision made by physicians and patients is not between true and sham acupuncture but between a referral to an acupuncturist or avoiding such a referral. The total effects of acupuncture, as experienced by the patient in routine practice, include the specific effects associated with correct needle insertion according to acupuncture theory, nonspecific physiologic effects of needling, and nonspecific psychological (placebo) effects related to the patient’s belief that treatment will be effective.
In conclusion, we found acupuncture to be superior to both no-acupuncture control and sham acupuncture for the treatment of chronic pain. Although the data indicate that acupuncture is more than a placebo, the differences between true and sham acupuncture are relatively modest, suggesting that factors in addition to the specific effects of needling are important contributors to therapeutic effects. Our results from individual patient data meta-analyses of nearly 18 000 randomized patients in high-quality RCTs provide the most robust evidence to date that acupuncture is a reasonable referral option for patients with chronic pain.
Correspondence: Andrew J. Vickers, DPhil, Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 307 E 63rd St, New York, NY 10065 (firstname.lastname@example.org).
Accepted for Publication: May 28, 2012.
Published Online: September 10, 2012. doi:10.1001/archinternmed.2012.3654
Author Contributions: Dr Vickers had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. All authors gave comments on early drafts and approved the final version of the manuscript. Study concept and design: Vickers, Lewith, Foster, Witt, and Linde. Acquisition of data: Vickers, Maschino, MacPherson, Foster, and Witt. Analysis and interpretation of data: Vickers, Cronin, Maschino, Lewith, MacPherson, Foster, Sherman, Witt, and Linde. Drafting of the manuscript: Vickers and Maschino. Critical revision of the manuscript for important intellectual content: Vickers, Cronin, Maschino, Lewith, MacPherson, Foster, Sherman, Witt, and Linde. Statistical analysis: Vickers, Cronin, and Maschino. Obtained funding: Vickers and Linde. Administrative, technical, and material support: Lewith. Study supervision: Vickers.
Financial Disclosure: None reported.
Funding/Support: The Acupuncture Trialists’ Collaboration is funded by an R21 (AT004189I from the National Center for Complementary and Alternative Medicine (NCCAM) at the National Institutes of Health (NIH) to Dr Vickers) and by a grant from the Samueli Institute. Dr MacPherson’s work has been supported in part by the UK National Institute for Health Research (NIHR) under its Programme Grants for Applied Research scheme (RP-PG-0707-10186). The views expressed in this publication are those of the author(s) and not necessarily those of the NCCAM, the NHS, the NIHR, or the Department of Health in England.
Role of the Sponsors: No sponsor had any role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript.
The Acupuncture Trialists’ Collaboration: Claire Allen, BS, Cochrane Collaboration Secretariat, Oxford, England; Mac Beckner, MIS, Information Technology and Data Management Center, Samueli Institute, Alexandria, Virginia; Brian Berman, MD, University of Maryland School of Medicine and Center for Integrative Medicine, College Park; Benno Brinkhaus, MD, Institute for Social Medicine, Epidemiology and Health Economics, Charité–University Medical Center, Berlin, Germany; Remy Coeytaux, MD, PhD, Department of Community and Family Medicine, Duke University, Durham, North Carolina; Angel M. Cronin, MS, Dana-Farber Cancer Institute, Boston, Massachusetts; Hans-Christoph Diener, MD, PhD, Department of Neurology, University of Duisburg-Essen, Germany; Heinz G. Endres, MD, Ruhr–University Bochum, Bochum, Germany; Nadine Foster, DPhil, BSc(Hons), Arthritis Research UK Primary Care Centre, Keele University, Newcastle-under-Lyme, Staffordshire, England; Juan Antonio Guerra de Hoyos, MD, Andalusian Integral Plan for Pain Management, and Andalusian Health Service Project for Improving Primary Care Research, Sevilla, Spain; Michael Haake, MD, PhD, Department of Orthopedics and Traumatology, SLK-Hospitals, Heilbronn, Germany; Richard Hammerschlag, PhD, Oregon College of Oriental Medicine, Portland; Dominik Irnich, MD, Interdisciplinary Pain Centre, University of Munich, Munich, Germany; Wayne B. Jonas, MD, Samueli Institute; Kai Kronfeld, PhD, Interdisciplinary Centre for Clinical Trials (IZKS Mainz), University Medical Centre Mainz, Mainz, Germany; Lixing Lao, PhD, University of Maryland and Center for Integrative Medicine, College Park; George Lewith, MD, FRCP, Complementary and Integrated Medicine Research Unit, Southampton Medical School, Southampton, England; Klaus Linde, MD, Institute of General Practice, Technische Universität München, Munich; Hugh MacPherson, PhD, Complementary Medicine Research Group, University of York, York, England; Eric Manheimer, MS, Center for Integrative Medicine, University of Maryland School of Medicine, College Park; Alexandra Maschino, BS, Memorial Sloan-Kettering Cancer Center, New York, New York; Dieter Melchart, MD, PhD, Centre for Complementary Medicine Research (Znf), Technische Universität München, Munich; Albrecht Molsberger, MD, PhD, German Acupuncture Research Group, Duesseldorf, Germany; Karen J. Sherman, PhD, MPH, Group Health Research Institute, Seattle, Washington; Hans Trampisch, PhD, Department of Medical Statistics and Epidemiology, Ruhr–University Bochum; Jorge Vas, MD, PhD, Pain Treatment Unit, Dos Hermanas Primary Care Health Center (Andalusia Public Health System), Dos Hermanas, Spain; Andrew J. Vickers (collaboration chair), DPhil, Memorial Sloan-Kettering Cancer Center; Norbert Victor, PhD (deceased), Institute of Medical Biometrics and Informatics, University of Heidelberg, Heidelberg, Germany; Peter White, PhD, School of Health Sciences, University of Southampton; Lyn Williamson, MD, MA (Oxon), MRCGP, FRCP, Great Western Hospital, Swindon, and Oxford University, Oxford, England; Stefan N. Willich, MD, MPH, MBA, Institute for Social Medicine, Epidemiology, and Health Economics, Charité University Medical Center, Berlin; Claudia M. Witt, MD, MBA, University Medical Center Charité and Institute for Social Medicine, Epidemiology and Health Economics, Berlin.