Estimating Sample Size
Computer Laboratories
Epidemiology and Biostatistics Department
Faculty of Medicine Universitas Padjadjaran
2013
Reference
• Dahlan, MS. Besar Sampel dan Cara Pengambilan Sampel dalam Penelitian Kedokteran dan Kesehatan. Edisi 3. Jakarta: Salemba Medika; 2008
• Hulley, SB et al. Designing Clinical Research. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2007
Introduction
Whom? What? Design?
↓
How many subjects to sample?
Introduction
• If the sample size is too small → the study may fail to answer its research question
• If the sample size is too large → the study will be more difficult and costly than necessary
Introduction
• Goal: to estimate an appropriate number of subjects for a given study design
• Sample size should be estimated early in the design phase, when major changes are still possible, for example when the estimate shows that:
  – The research design is not feasible
  – Different predictor or outcome variables are needed
Reasons for sampling
• Unable to perform total sampling
• Results from representative sample (appropriate number of subjects and sampling technique) can be generalized to population
• More efficient and ethical
Generalization
• Study subjects → intended sample: internal validity
• Intended sample → accessible population: external validity I
• Accessible population → study/target population: external validity II
Internal validity
• The actual sample/study subjects are representative of the intended sample
  – Same characteristics as the intended sample
  – Problems: non-response, drop-out, loss to follow-up
External validity I
• The intended sample is representative of the accessible population
  – Appropriate sample size
  – Probabilistic sampling method
External validity II
• The accessible population is representative of the target/study population
How to get appropriate sample size?
• Appropriate sample size formula
  – Can be decided from the research questions/research problems/problem identification
• Correct sample size calculation
Type of research: Specific design
• Diagnostic
  – Sensitivity, specificity, PPV, NPV, LR (+), LR (–)
• Prognostic
  – Example: What are the prognostic factors of shock in DHF patients?
• Survival analysis
  – Example: Is there a mortality rate difference between HIV patients treated with HAART starting at a CD4 count below 200 and those starting above 200?
Type of research: Non-specific design
• Descriptive
  – To estimate a population proportion
    • What is the prevalence of diarrhea in Kecamatan X?
  – To estimate a population mean
    • What is the mean FBG level among adults in Kecamatan X?
• Analytic
  – To find a relationship/association between dependent and independent variables
  – To find a (proportion, mean) difference between two or more groups
  – To find a correlation between variables
Notes
• In one study, it is possible to use more than one sample size formula, due to:
  – More than one research question
  – Different study designs
    • Cohort and nested case-control
Notes
• State the primary and secondary research questions/hypotheses in advance
• The sample size calculations are always focused on the primary research question/hypothesis
Power of the study (1 – β)
• Results may be different
• Needs to be calculated again when:
  – Actual sample/study subjects ≠ intended sample
  – r in a correlation study is different
  – Effect size (p1 – p2, x1 – x2) is different
  – Sample size is predetermined
Zα and Zβ*
Value of α or β    Zα (descriptive or two-sided)    Zα (one-sided)    Zβ
1%                 2.58                             2.33              2.33
5%                 1.96                             1.64              1.64
10%                1.64                             1.28              1.28
15%                1.44                             1.04              1.04
20%                1.28                             0.84              0.84
For a two-tailed hypothesis: Z1 – α/2
For a one-tailed hypothesis: Z1 – α
*From Dahlan, MS, 2008
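The Zα and Zβ values are quantiles of the standard normal distribution, so they can be cross-checked rather than read from a table. A minimal sketch using Python's standard library follows; the function names `z_two_sided` and `z_one_sided` are illustrative and do not come from the slides.

```python
from statistics import NormalDist

def z_two_sided(alpha):
    """Z(1 - alpha/2): descriptive studies and two-sided hypotheses."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_one_sided(error):
    """Z(1 - error): one-sided alpha, and also Z-beta."""
    return NormalDist().inv_cdf(1 - error)

# Print the quantiles for the error levels listed in the table
for level in (0.01, 0.05, 0.10, 0.15, 0.20):
    print(f"{level:.0%}  two-sided {z_two_sided(level):.2f}  "
          f"one-sided/beta {z_one_sided(level):.2f}")
```

The cases that follow use 1.96 (two-sided α = 5%), 1.64 (one-sided α = 5%) and 0.84 (β = 20%).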
Strategies for minimizing sample size and maximizing power
• Use a continuous variable (for the outcome variable)
  – Permits a smaller sample size for a given power
  – Permits greater power for a given sample size
Strategies for minimizing sample size and maximizing power
• Use paired measurements or matching
  – Comparing each subject with herself removes the baseline between-subjects part of the variability of the outcome variable
  – Example:
    • Change in weight on a diet has less variability than the final weight
    • Final weight is highly correlated with initial weight
Strategies for minimizing sample size and maximizing power
• Increase the precision
  – Standardizing the measurement methods
  – Training and certifying observers
  – Refining the instrument
  – Automating the instrument
  – Repeating the measurement
Strategies for minimizing sample size and maximizing power
• Use unequal group sizes
  – In general, the gain in power when the size of one group is increased to twice the size of the other is considerable
  – Tripling or quadrupling one of the groups provides progressively smaller gains
  – Example: in a case-control study, 1 case : 2 controls
Strategies for minimizing sample size and maximizing power
• Use a more common outcome (with caution!)
  – More frequent outcome
  – Enroll subjects at greater risk of developing that outcome
  – Extend the follow-up period
  – Loosen the definition of what constitutes an outcome
Common Errors to Avoid
• Estimating sample size late during the design of the study (the most common error)
• Percentage or rate misinterpreted as numeric
• No planning for dropouts or subjects with missing data
• Equal vs unequal sample sizes
• Two-sided alternative hypothesis or statistical analysis (Z1 – α/2), but one-sided (Z1 – α) is used during sample size determination
Literature vs Judgement*
• Determined by judgement
  – Categorical variable
    • Descriptive: probability of type I error = α; precision = d
    • Analytic: probability of type I error = α (one/two-sided); probability of type II error = β; effect size = p1 – p2
  – Numerical variable
    • Descriptive: probability of type I error = α; precision = d
    • Analytic: probability of type I error = α (one/two-sided); probability of type II error = β; effect size = x1 – x2
• Obtained from literature or a pilot study
  – Categorical variable
    • Descriptive: proportion
    • Analytic: proportion in control/non-exposed/standard group = P2
  – Numerical variable
    • Descriptive: standard deviation
    • Analytic: combined standard deviation = S; correlation coefficient = r
*From Dahlan, MS, 2008
Case I
• Students have a variety of reasons for doing research while in medical school. As part of the Jatinangor program, you are interested in reproductive health. The aim of your study is to determine the prevalence of puberty (defined by menarche or wet dreams) among primary school children in Kecamatan Jatinangor. There is no previous study on the prevalence of puberty in that community.
Answer
a. The most appropriate study design: cross-sectional study
   Outcome variable: prevalence of puberty (history of menarche or wet dreams, yes/no, nominal)
   Predictor variable: –
b. The most appropriate statistical analysis for the study: Descriptive statistics
Answer
c. The target population: all primary schools in Kecamatan Jatinangor
   The accessible population: primary schools in Kecamatan Jatinangor
   The study unit: students aged 7–12 years
d. The appropriate sampling technique for the study: Stratified random sampling, cluster sampling
Answer
e. Using a 95% confidence interval (α = 0.05) and a precision of 10% (within 10% of the true value), the sample size needed is:
• For α = 0.05, Z0.975 = 1.96
• Because there is no previous study, p = 0.5 and q = 1 – p = 0.5; d = 0.1
• $n = \frac{Z_{1-\alpha/2}^{2}\, p\, q}{d^{2}} = \frac{(1.96)^{2}(0.5)(0.5)}{(0.1)^{2}} = 96.04$, rounded up to 97
• Make sure npq ≥ 5: 97(0.5)(0.5) = 24.25 ≥ 5
• The researcher will need at least 97 students aged 7–12 years
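The arithmetic behind the 97 students can be sketched in a few lines of Python. This assumes the usual single-proportion formula n = Z²pq/d² with p = 0.5, which is consistent with the npq check shown in this case. The function name `sample_size_proportion` is illustrative only.

```python
import math

def sample_size_proportion(z, p, d):
    """n = z^2 * p * (1 - p) / d^2, rounded up to a whole subject."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

# Case I: alpha = 0.05 (Z = 1.96), p = 0.5 (no previous study), d = 0.10
n = sample_size_proportion(z=1.96, p=0.5, d=0.10)
print(n)

# Rule-of-thumb check used in the case: n * p * q should be at least 5
print(n * 0.5 * 0.5 >= 5)
```

Rounding up, not to the nearest integer, is deliberate: rounding down would give slightly less precision than requested.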
Case II
• Suppose we wish to know the random blood glucose level (mg/dl) among medical students in Faculty of Medicine X
Answer
a. The most appropriate study design: cross-sectional study
   Outcome variable: random blood glucose level (numeric)
   Predictor variable: –
b. The most appropriate statistical analysis for the study: Descriptive statistics
Answer
c. The target population: all medical students in Faculty of Medicine X
   The accessible population: all medical students in Faculty of Medicine X
   The study unit: medical student
d. The appropriate sampling technique for the study: Simple random sampling, stratified random sampling
Answer
The aspects that can be determined by the researcher from the beginning
• d (precision)
The aspects that must be obtained by the researcher from the literature or a pilot study
• s (standard deviation)
f. A pilot study measured the random blood glucose level of ten selected students. Using α = 0.05 and a precision of 2.5 mg/dl, the estimated sample size needed for the study is:
Answer
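• For α = 0.05, Z0.975 = 1.96; d = 2.5 mg/dl; s = 13.47 mg/dl
• $n = \left(\frac{Z_{1-\alpha/2}\, s}{d}\right)^{2} = \left(\frac{1.96 \times 13.47}{2.5}\right)^{2} = 111.52$, rounded up to 112
• The researcher will need at least 112 medical students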
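As a cross-check of the 112 students, here is a minimal sketch assuming the usual single-mean formula n = (Z·s/d)², which reproduces the figure above. The function name `sample_size_mean` is illustrative and not from the slides.

```python
import math

def sample_size_mean(z, s, d):
    """n = (z * s / d)^2, rounded up to a whole subject."""
    return math.ceil((z * s / d) ** 2)

# Case II: alpha = 0.05 (Z = 1.96), s = 13.47 mg/dl (pilot study), d = 2.5 mg/dl
n = sample_size_mean(z=1.96, s=13.47, d=2.5)
print(n)
```

Because s comes from a pilot study of only ten students, the result is sensitive to that estimate: under this formula, doubling s quadruples n.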
Case III
• A medical student from the 2010 batch is preparing to conduct a study (for his minor thesis) on risk factors of diarrhea. Let's say the hypothesis is that babies who were exclusively breastfed (during the first six months of life) will be less dehydrated (mild to moderate vs severe) during diarrhea at the age of 7 to 11 months. The researcher wishes to conduct the study in Hasan Sadikin Hospital, Bandung, for the period January – December 2011.
Answer
a. The most appropriate study design: case-control or cross-sectional study
   Outcome variable: dehydration during diarrhea (mild to moderate or severe, nominal)
   Predictor variable: history of exclusive breastfeeding (yes or no, nominal)
b. The most appropriate statistical analysis for the study: Chi-square test (assuming there are no confounding variables)
Answer
c. The target population: babies aged 7 to 11 months diagnosed with diarrhea and treated in the Pediatric Emergency Unit, Hasan Sadikin Hospital, Bandung, January – December 2011
   The accessible population: babies aged 7 to 11 months diagnosed with diarrhea and treated in the Pediatric Emergency Unit, Hasan Sadikin Hospital, Bandung, January – December 2011
   The study unit: medical record
d. The appropriate sampling technique for the study: Simple random sampling
Answer
The aspects that can be determined by the researcher from the beginning
• α
• β
• p1 – p2
The aspects that must be obtained by the researcher from the literature or a pilot study
• p2 (depends on the study design)
Answer
• Using α = 0.05, β = 0.2, and a difference of proportions considered clinically significant by the researcher of 0.2, the sample size needed for the study is estimated as follows
• For α = 0.05, Z0.95 = 1.64 (one-sided); for β = 0.2, Z0.8 = 0.84; p1 – p2 = 0.2
• $n_1 = n_2 = \frac{\left(Z_{1-\alpha}\sqrt{2pq} + Z_{1-\beta}\sqrt{p_1 q_1 + p_2 q_2}\right)^{2}}{(p_1 - p_2)^{2}}$
• Cross-sectional: p2 = 18/35 = 0.51; p1 = 0.2 + p2 = 0.2 + 0.51 = 0.71; q1 = 1 – p1 = 1 – 0.71 = 0.29; q2 = 1 – p2 = 1 – 0.51 = 0.49; p = (p1 + p2)/2 = (0.71 + 0.51)/2 = 0.61; q = 1 – p = 1 – 0.61 = 0.39
• Case-control: p2 = 17/32 = 0.53; p1 = 0.2 + p2 = 0.2 + 0.53 = 0.73; q1 = 1 – p1 = 1 – 0.73 = 0.27; q2 = 1 – p2 = 1 – 0.53 = 0.47; p = (p1 + p2)/2 = (0.73 + 0.53)/2 = 0.63; q = 1 – p = 1 – 0.63 = 0.37
Answer
Cross-sectional study: the researcher will need at least 73 exclusively breastfed babies and 73 non-exclusively breastfed babies diagnosed with diarrhea
Answer
Case-control study:
• For the case group, the researcher will need at least 71 babies diagnosed with diarrhea plus severe dehydration
• For the control group, the researcher will need at least 71 babies diagnosed with diarrhea plus mild to moderate dehydration
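Both results (73 per group for the cross-sectional design, 71 per group for the case-control design) can be reproduced with a short sketch. It assumes the common two-proportion formula n = (Zα√(2pq) + Zβ√(p1q1 + p2q2))² / (p1 – p2)², and it uses p2 rounded to two decimals, as the case does. The function name `sample_size_two_proportions` is illustrative only.

```python
import math

def sample_size_two_proportions(p2, diff, z_alpha, z_beta):
    """Per-group n for comparing two proportions (unpaired), rounded up."""
    p1 = p2 + diff
    q1, q2 = 1 - p1, 1 - p2
    p = (p1 + p2) / 2          # average proportion
    q = 1 - p
    numerator = (z_alpha * math.sqrt(2 * p * q)
                 + z_beta * math.sqrt(p1 * q1 + p2 * q2)) ** 2
    return math.ceil(numerator / diff ** 2)

# One-sided alpha = 0.05 (1.64), beta = 0.2 (0.84), p1 - p2 = 0.2
n_cross_sectional = sample_size_two_proportions(0.51, 0.2, 1.64, 0.84)
n_case_control = sample_size_two_proportions(0.53, 0.2, 1.64, 0.84)
print(n_cross_sectional, n_case_control)
```

Rounding matters here: with the unrounded p2 = 18/35, the same formula gives 72 rather than 73, so the inputs should be stated explicitly when reporting the calculation.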
Case IV
• The researcher wishes to compare fasting blood glucose level (mg/dl) between medical students of Faculty of Medicine X with and without family history of DM type II. The subjects were matched according to age and sex.
Answer
a. The most appropriate study design: cross-sectional study
   Outcome variable: fasting blood glucose level (numeric)
   Predictor variable: family history of DM type II (yes or no, nominal)
b. The most appropriate statistical analysis for the study: Paired t-test with Wilcoxon signed-rank test as an alternative
Answer
c. The target population: all medical students in Faculty of Medicine X
   The accessible population: all medical students in Faculty of Medicine X
   The study unit: medical student
d. The appropriate sampling technique for the study: matching technique
Answer
The aspects that can be determined by the researcher from the beginning
• α
• β
• x1 – x2
The aspects that must be obtained by the researcher from the literature or a pilot study
• S (combined standard deviation from two observations)
Answer
Based on a pilot study, six pairs of students with and without a family history of DM type II were selected
Using α = 0.05, β = 0.2, and a difference of means considered clinically significant by the researcher of 2.5 mg/dl, the sample size needed for the study is estimated as follows
Answer
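• For α = 0.05, Z0.975 = 1.96 (two-sided); for β = 0.2, Z0.8 = 0.84
• x1 – x2 = 2.5; s1 = 4.88 mg/dl, n1 = 6; s2 = 3.74 mg/dl, n2 = 6
• Combined standard deviation: $S = \sqrt{\frac{s_1^{2}(n_1 - 1) + s_2^{2}(n_2 - 1)}{n_1 + n_2 - 2}} = \sqrt{\frac{(4.88)^{2}(5) + (3.74)^{2}(5)}{10}} = 4.35$ mg/dl
• $n_1 = n_2 = \left(\frac{(Z_{1-\alpha/2} + Z_{1-\beta})\, S}{x_1 - x_2}\right)^{2} = \left(\frac{(1.96 + 0.84)(4.35)}{2.5}\right)^{2} = 23.7$, rounded up to 24
• The researcher will need at least 24 medical students with a family history of DM type II and 24 medical students without a family history of DM type II (matched according to age and sex)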
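The figure of 24 matched pairs can be reproduced with the sketch below. It makes two assumptions: S is the pooled standard deviation of the two pilot groups, and the paired formula n = ((Zα + Zβ)·S / (x1 – x2))² applies. The slides do not show the formula itself, but this combination reproduces their result. The function names are illustrative only.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Combined (pooled) standard deviation of two groups."""
    return math.sqrt((s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / (n1 + n2 - 2))

def sample_size_paired_means(z_alpha, z_beta, s, diff):
    """n = ((z_alpha + z_beta) * s / diff)^2, rounded up (number of pairs)."""
    return math.ceil(((z_alpha + z_beta) * s / diff) ** 2)

# Case IV: two-sided alpha = 0.05 (1.96), beta = 0.2 (0.84), x1 - x2 = 2.5 mg/dl
s = pooled_sd(s1=4.88, n1=6, s2=3.74, n2=6)
n_pairs = sample_size_paired_means(z_alpha=1.96, z_beta=0.84, s=s, diff=2.5)
print(round(s, 2), n_pairs)
```

An unpaired design would multiply the squared term by 2 and roughly double the requirement, which illustrates the earlier point that pairing or matching reduces sample size.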
Case V
• The investigator wants to conduct a cross-sectional study to find out whether DM has a negative effect on the treatment outcome of TB. Data will be collected from a hospital. The register shows that 50 people meet the inclusion criteria for this study. In a previous study, after 6 months of therapy, 9.6% of cultured sputum specimens from non-diabetic patients were still positive for Mycobacterium tuberculosis (RR = 2.65).
Answer
a. Outcome variable: response to treatment (yes/no, nominal)
   Predictor variable: random blood glucose level (numeric)
b. The most appropriate statistical analysis for the study: Chi-square test
Answer
c. The target population: all TB patients with DM in Hospital X
   The accessible population: adult TB patients aged 20 to 65 years diagnosed with DM and treated in Hospital X
   The study unit: medical record
d. The appropriate sampling technique for the study: simple random sampling
• What is the power of the study if all eligible subjects are taken (total sampling)? Using α = 0.05: rearrange the sample size formula and substitute the available sample size.
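One way to "put the sample size into the formula" is to solve the two-proportion formula for Zβ and convert it to power with the normal distribution. The sketch below rests on several assumptions not stated in the case: the 50 eligible patients split equally into 25 with and 25 without DM, the proportion in diabetic patients is RR × 9.6%, and a one-sided Zα = 1.64 is used. The function name `power_two_proportions` is illustrative only.

```python
import math
from statistics import NormalDist

def power_two_proportions(n_per_group, p1, p2, z_alpha):
    """Power (1 - beta) from the two-proportion formula solved for z_beta."""
    q1, q2 = 1 - p1, 1 - p2
    p = (p1 + p2) / 2
    q = 1 - p
    z_beta = ((math.sqrt(n_per_group) * abs(p1 - p2)
               - z_alpha * math.sqrt(2 * p * q))
              / math.sqrt(p1 * q1 + p2 * q2))
    return NormalDist().cdf(z_beta)

p2 = 0.096            # non-diabetic patients still culture-positive (from the case)
p1 = 2.65 * p2        # assumption: RR applied to p2 gives the DM group proportion
# Assumption: 25 patients per group (hypothetical split of the 50 eligible)
power = power_two_proportions(n_per_group=25, p1=p1, p2=p2, z_alpha=1.64)
print(f"power = {power:.2f}")
```

Under these assumptions the power falls well below the conventional 80%, so a study limited to the 50 registered patients would have a substantial chance of missing a true difference of this size.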
Case VI
• Let's say the researcher has a hypothesis that the serum 25(OH)-vitamin D level (ng/ml) is positively correlated with bone mineral density, estimated using the quantitative ultrasound index (QUI), among postmenopausal women in Kecamatan Jatinangor
Answer
a. The most appropriate study design: case-control or cross-sectional study
   Variables: serum 25(OH)-vitamin D level (numeric) and quantitative ultrasound index (numeric)
b. The most appropriate statistical analysis for the study: correlation methods (Pearson's or Spearman's rho correlation coefficient)
Answer
c. The target population: postmenopausal women in Kecamatan Jatinangor
   The accessible population: women who come to Posbindu Lansia in all villages
   The study unit: postmenopausal woman
d. The appropriate sampling technique for the study: Consecutive sampling
Answer
The aspects that can be determined by the researcher from the beginning
• α
• β
The aspects that must be obtained by the researcher from the literature or a pilot study
• r (Pearson's correlation coefficient)
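Based on a pilot study with 10 participants
• For α = 0.05, Z0.95 = 1.64 (one-sided); for β = 0.2, Z0.8 = 0.84; r = 0.78 (using SPSS or Excel)
Answer
• $n = \left(\frac{Z_{1-\alpha} + Z_{1-\beta}}{0.5 \ln\left(\frac{1 + r}{1 - r}\right)}\right)^{2} + 3 = \left(\frac{1.64 + 0.84}{0.5 \ln(1.78/0.22)}\right)^{2} + 3 = 8.63$, rounded up to 9
• The researcher will need at least 9 postmenopausal women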
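The 9 women can be reproduced with a sketch that assumes the usual correlation formula based on Fisher's z transformation, n = ((Zα + Zβ) / (0.5·ln((1 + r)/(1 – r))))² + 3. The function name `sample_size_correlation` is illustrative only.

```python
import math

def sample_size_correlation(z_alpha, z_beta, r):
    """n = ((z_alpha + z_beta) / fisher_z(r))^2 + 3, rounded up."""
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)

# Case VI: one-sided alpha = 0.05 (1.64), beta = 0.2 (0.84), r = 0.78 from the pilot
n = sample_size_correlation(z_alpha=1.64, z_beta=0.84, r=0.78)
print(n)
```

The required n is small only because the pilot correlation is strong. With a weaker expected correlation such as r = 0.3, the same formula requires 68 subjects.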
Review
• Study design
  – Non-specific or specific?
  – Observational (cross-sectional, case-control, cohort) or experimental?
• Variables
  – Predictor/independent and outcome/dependent
  – Scale of measurement
    • Categorical (nominal or ordinal)
    • Numerical
• Paired vs unpaired observation
• Hypothesis
  – Type I and type II error (α, β)
  – Power of the study (1 – β)
  – One- or two-sided alternative hypothesis
• Statistical analysis
• Sampling technique
  – Probabilistic sampling technique
  – Non-probabilistic sampling technique