
Estimating Sample Size

Computer Laboratories

Epidemiology and Biostatistics Department

Faculty of Medicine Universitas Padjadjaran

2013

Reference

• Dahlan, MS. Besar Sampel dan Cara Pengambilan Sampel dalam Penelitian Kedokteran dan Kesehatan. Edisi 3. Jakarta: Salemba Medika; 2008

• Hulley, SB et al. Designing Clinical Research. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2007

Introduction

Whom? What? Design?

↓

How many subjects to sample?

Introduction

• If the sample size is too small → the study may fail to answer its research question

• If the sample size is too large → the study is more difficult and costly than necessary

Introduction

• Goal: to estimate an appropriate number of subjects for a given study design

• Should be estimated early in the design phase, when major changes are still possible, because the estimate may show that:
– The research design is not feasible
– Different predictor or outcome variables are needed

Reasons for sampling

• Unable to perform total sampling

• Results from a representative sample (appropriate number of subjects and sampling technique) can be generalized to the population

• More efficient and ethical

Generalization

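Study subjects → Intended sample → Accessible population → Study/Target population

– Study subjects to intended sample: internal validity
– Intended sample to accessible population: external validity I
– Accessible population to study/target population: external validity II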

Internal validity

• Actual sample/study subjects are representative of the intended sample
– Same characteristics as the intended sample
– Problems: non-response, drop-out, loss to follow-up

External validity I

• Intended sample is representative of the accessible population
– Appropriate sample size
– Probabilistic sampling method

External validity II

• Accessible population is representative of the target/study population

How to get appropriate sample size?

• Appropriate sample size formula
– Can be decided from our research questions/research problems/problem identification

• Correct sample size calculation

Type of research: Specific design

• Diagnostic
– Sensitivity, specificity, PPV, NPV, LR (+), LR (-)

• Prognostic
– Example: What are the prognostic factors of shock in DHF patients?

• Survival analysis
– Example: Is there a difference in mortality rate between HIV patients who start HAART at a CD4 count below 200 and those who start above 200?

Type of research: Non-specific design

• Descriptive
– To estimate a population proportion
   • What is the prevalence of diarrhea in Kecamatan X?
– To estimate a population mean
   • What is the mean FBG level among adults in Kecamatan X?

• Analytic
– To find a relationship/association between dependent and independent variables
– To find a (proportion, mean) difference between two or more groups
– To find a correlation between variables

Notes

• In one study, it is possible to use more than one sample size formula, due to:
– More than one research question
– Different study designs
   • Cohort and nested case-control

Notes

• State the primary and secondary research questions/hypotheses in advance

• The sample size calculations are always focused on the primary research question/hypothesis

Power of the study (1 – β)

• Results may be different
• Needs to be calculated again when:
– Actual sample/study subjects ≠ intended sample
– r in the correlation study is different
– Effect size (p1 – p2, x1 – x2) is different
– Sample size is predetermined

Zα and Zβ*

Value of α or β    Zα (descriptive or two-sided)    Zα (one-sided)    Zβ
1%                 2.81                             2.57              2.57
5%                 1.96                             1.64              1.64
10%                1.64                             1.44              1.44
15%                1.44                             1.28              1.28
20%                1.28                             0.84              0.84

For a two-tailed hypothesis: Z(1 – α/2)

For a one-tailed hypothesis: Z(1 – α)

*From Dahlan, MS, 2008
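Tabulated Z values can be recomputed from the standard normal distribution, which is a useful check before they go into a formula. The sketch below uses Python's standard library; the helper name `z_value` is only an illustration and does not come from the slides. Note that several tabulated entries outside the 5% and 20% rows differ from the computed quantiles, so recomputing is worthwhile.

```python
from statistics import NormalDist


def z_value(alpha_or_beta: float, two_sided: bool = False) -> float:
    """Standard normal quantile for a given alpha (or beta).

    two_sided=True returns Z(1 - alpha/2); otherwise Z(1 - alpha).
    Beta is always treated as one-sided.
    """
    tail = alpha_or_beta / 2 if two_sided else alpha_or_beta
    return NormalDist().inv_cdf(1 - tail)


# Print quantiles for the probability levels used in the table.
for pct in (1, 5, 10, 15, 20):
    p = pct / 100
    print(f"{pct:>3}%  two-sided: {z_value(p, True):.2f}  one-sided: {z_value(p):.2f}")
```

The worked cases that follow use the 5% and 20% rows: 1.96 (two-sided α), 1.64 (one-sided α) and 0.84 (β).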

Strategies for minimizing sample size and maximizing power

• Use a continuous variable (for the outcome variable)
– Permits smaller sample size for a given power
– Permits greater power for a given sample size


• Use paired measurements or matching
– Comparing each subject with herself removes the baseline between-subjects part of the variability of the outcome variable
– Example:
   • Change in weight on a diet has less variability than the final weight
   • Final weight is highly correlated with initial weight


• Increase the precision
– Standardizing the measurement methods
– Training and certifying the observer
– Refining the instrument
– Automating the instrument
– Repeating the measurement


• Use unequal group sizes
– In general, the gain in power when the size of one group is increased to twice the size of the other is considerable
– Tripling or quadrupling one of the groups provides progressively smaller gains
– Example: in a case-control study, 1 case : 2 controls
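One way to see the diminishing gains is through the variance of a difference between two group means. Assuming equal standard deviations in both groups (an assumption made here for illustration, not stated in the slides), that variance is proportional to 1/n1 + 1/n2. The sketch below keeps one group at size n and enlarges the other to k × n; `relative_variance` is a hypothetical helper name.

```python
def relative_variance(k: float) -> float:
    """Variance of the difference in means for group sizes n and k*n,
    relative to two groups of size n (equal SD assumed in both groups)."""
    return (1 + 1 / k) / 2


# Compare allocation ratios 1:1, 1:2, 1:3 and 1:4.
for k in (1, 2, 3, 4):
    print(f"1:{k} allocation -> relative variance {relative_variance(k):.3f}")
```

Going from 1:1 to 1:2 cuts the variance by a quarter, while each further step removes less, and the ratio can never drop below one half however large the second group becomes.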


• Use a more common outcome (with caution!)
– More frequent outcome
– Enroll subjects at greater risk of developing that outcome
– Extend the follow-up period
– Loosen the definition of what constitutes an outcome

Common Errors to Avoid

• Estimating sample size late during the design of the study (most common)

• Percentage or rate misinterpreted as numeric
• No planning for dropouts or subjects with missing data
• Equal vs unequal sample sizes
• Two-sided alternative hypothesis or statistical analysis, Z(1 – α/2), but one-sided, Z(1 – α), used during sample size determination

Literature vs Judgement*

Judgement (determined by the researcher)
• Categorical variable
– Descriptive: probability of type I error = α; precision = d
– Analytic: probability of type I error = α (one/two-sided); probability of type II error = β; effect size = p1 – p2
• Numerical variable
– Descriptive: probability of type I error = α; precision = d
– Analytic: probability of type I error = α (one/two-sided); probability of type II error = β; effect size = x1 – x2

Literature or pilot study
• Categorical variable
– Descriptive: proportion
– Analytic: proportion in control/non-exposed/standard group = P2
• Numerical variable
– Descriptive: standard deviation
– Analytic: combined standard deviation = S; correlation coefficient = r

*From Dahlan, MS, 2008

Case I

• Students have a variety of reasons for doing research while in medical school. As part of the Jatinangor program you are interested in reproductive health. The aim of your study is to determine the prevalence of puberty (defined by menarche or wet dreams) among primary school children in Kecamatan Jatinangor. There is no previous study on the prevalence of puberty in that community.

Answer

a. The most appropriate study design: cross-sectional study
Outcome variable: prevalence of puberty (history of menarche or wet dreams, yes/no, nominal)
Predictor variable: -

b. The most appropriate statistical analysis for the study: Descriptive statistics

Answer

c. The target population: all primary schools in Kecamatan Jatinangor
The accessible population: primary schools in Kecamatan Jatinangor
Study unit of the study: students aged 7–12 years

d. The appropriate sampling technique for the study: Stratified random sampling, cluster sampling

Answer

e. Using a 95% confidence interval (α = 0.05) and a precision of 10% (within 10% of the true value), the sample size needed is:

• For α = 0.05, Z0.975 = 1.96
• Because there is no previous study, take p = 0.5 and q = 1 – p = 0.5
• n = (Z0.975)² × p × q / d² = (1.96)² (0.5)(0.5) / (0.1)² = 96.04, rounded up to 97
• Make sure npq ≥ 5: 97 (0.5)(0.5) = 24.25 ≥ 5
• The researcher will need at least 97 students aged 7–12 years
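The Case I arithmetic can be reproduced with a short sketch. It assumes the usual single-proportion formula n = Z² p q / d² with p = 0.5, which matches the npq check and the reported result of 97; the function name `n_for_proportion` is hypothetical.

```python
import math


def n_for_proportion(z: float, p: float, d: float) -> int:
    """Sample size to estimate a proportion p with absolute precision d."""
    q = 1 - p
    return math.ceil(z**2 * p * q / d**2)


n = n_for_proportion(z=1.96, p=0.5, d=0.10)
print(n)               # 97
print(n * 0.5 * 0.5)   # npq = 24.25, which satisfies npq >= 5
```

Choosing p = 0.5 maximizes p × q, so it gives the most conservative (largest) sample size when no prior estimate is available.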

Case II

• Suppose we wish to know the random blood glucose level (mg/dl) among medical students in Faculty of Medicine X

Answer

a. The most appropriate study design: cross-sectional study
Outcome variable: random blood glucose level (numeric)
Predictor variable: -

b. The most appropriate statistical analysis for the study: Descriptive statistics

Answer

c. The target population: all medical students in Faculty of Medicine X
The accessible population: all medical students in Faculty of Medicine X
The study unit of the study: medical student

d. The appropriate sampling technique for the study: Simple random sampling, stratified random sampling

Answer

The aspects that can be determined by the researcher from the beginning

• d (precision)

The aspects that must be searched by the researcher from literature or a pilot study

• s (standard deviation)

f. Based on a pilot study, ten students were selected and their random blood glucose levels were measured (s = 13.47 mg/dl). Using α = 0.05 and a precision of 2.5 mg/dl, the estimated sample size needed for the study is:

Answer

• For α = 0.05, Z0.975 = 1.96; d = 2.5 mg/dl; s = 13.47 mg/dl

• n = (Z0.975 × s / d)² = (1.96 × 13.47 / 2.5)² = 111.52, rounded up to 112

• The researcher will need at least 112 medical students
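The same calculation as a minimal sketch, assuming the single-mean formula n = (Z × s / d)², which reproduces the reported 112; `n_for_mean` is an illustrative name only.

```python
import math


def n_for_mean(z: float, s: float, d: float) -> int:
    """Sample size to estimate a mean with standard deviation s and precision d."""
    return math.ceil((z * s / d) ** 2)


print(n_for_mean(z=1.96, s=13.47, d=2.5))  # 112
```

Because n grows with the square of s / d, halving the precision d (for example from 2.5 to 1.25 mg/dl) roughly quadruples the required sample size.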

Case III

• One of the batch 2010 medical students is preparing to conduct a study (for his minor thesis) on risk factors of diarrhea. Let’s say that the hypothesis is that exclusively breastfed babies (first six months of life) will be less dehydrated (mild to moderate vs severe) during diarrhea at age 7 to 11 months. The researcher wishes to conduct the study in Hasan Sadikin Hospital, Bandung, for the period January – December 2011.

Answer

a. The most appropriate study design: case-control or cross-sectional study

Outcome variable: dehydration during diarrhea (mild to moderate or severe, nominal)
Predictor variable: history of exclusive breastfeeding (yes or no, nominal)

b. The most appropriate statistical analysis for the study: Chi-square test (assuming there are no confounding variables)

Answer

c. The target population: babies aged 7 to 11 months diagnosed with diarrhea and treated in the Pediatric Emergency Unit, Hasan Sadikin Hospital, Bandung, January – December 2011
The accessible population: babies aged 7 to 11 months diagnosed with diarrhea and treated in the Pediatric Emergency Unit, Hasan Sadikin Hospital, Bandung, January – December 2011
The study unit of the study: medical record

d. The appropriate sampling technique for the study: Simple random sampling


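The aspects that can be determined by the researcher from the beginning
• α
• β
• p1 – p2

The aspects that must be searched by the researcher from literature or a pilot study
• p2 (depends on the study design)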

Answer

• Using α = 0.05, β = 0.2, and a difference in proportions considered by the researcher to be clinically significant of 0.2, the estimated sample size needed for the study is:

• For α = 0.05, Z0.95 = 1.64 (one-sided), and for β = 0.2, Z0.8 = 0.84; p1 – p2 = 0.2

Cross-sectional study:
p2 = 18/35 = 0.51
p1 = 0.2 + p2 = 0.2 + 0.51 = 0.71
q1 = 1 – p1 = 1 – 0.71 = 0.29
q2 = 1 – p2 = 1 – 0.51 = 0.49
p = (p1 + p2)/2 = (0.71 + 0.51)/2 = 0.61
q = 1 – p = 1 – 0.61 = 0.39
n1 = n2 = (Zα√(2pq) + Zβ√(p1q1 + p2q2))² / (p1 – p2)² = 72.1, rounded up to 73

Case-control study:
p2 = 17/32 = 0.53
p1 = 0.2 + p2 = 0.2 + 0.53 = 0.73
q1 = 1 – p1 = 1 – 0.73 = 0.27
q2 = 1 – p2 = 1 – 0.53 = 0.47
p = (p1 + p2)/2 = (0.73 + 0.53)/2 = 0.63
q = 1 – p = 1 – 0.63 = 0.37
n1 = n2 = (Zα√(2pq) + Zβ√(p1q1 + p2q2))² / (p1 – p2)² = 70.6, rounded up to 71

Answer

Cross-sectional study

• The researcher will need at least 73 exclusively breastfed babies and 73 non-exclusively breastfed babies diagnosed with diarrhea

Answer

Case-control study

• For the case group, the researcher will need at least 71 babies diagnosed with diarrhea plus severe dehydration

• For the control group, the researcher will need at least 71 babies diagnosed with diarrhea plus mild to moderate dehydration
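The Case III results can be checked with a short sketch. It assumes the common two-proportion formula n = (Zα√(2pq) + Zβ√(p1q1 + p2q2))² / (p1 – p2)² for each group, which reproduces both reported sample sizes from the rounded proportions in the slides; `n_two_proportions` is a hypothetical name.

```python
import math


def n_two_proportions(p1: float, p2: float, z_alpha: float, z_beta: float) -> int:
    """Per-group sample size for comparing two independent proportions."""
    p = (p1 + p2) / 2          # average proportion
    q = 1 - p
    numerator = (
        z_alpha * math.sqrt(2 * p * q)
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


print(n_two_proportions(0.71, 0.51, 1.64, 0.84))  # cross-sectional: 73
print(n_two_proportions(0.73, 0.53, 1.64, 0.84))  # case-control: 71
```

The two designs differ only in where p2 comes from: the proportion in the non-exposed group for the cross-sectional design and the proportion of exposure among controls for the case-control design.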

Case IV

• The researcher wishes to compare fasting blood glucose level (mg/dl) between medical students of Faculty of Medicine X with and without family history of DM type II. The subjects were matched according to age and sex.

Answer

a. The most appropriate study design: cross-sectional study

Outcome variable: fasting blood glucose level (numeric)
Predictor variable: family history of DM type II (yes or no, nominal)

b. The most appropriate statistical analysis for the study: Paired t-test with Wilcoxon signed-rank test as an alternative

Answer

c. The target population: all medical students in Faculty of Medicine X
The accessible population: all medical students in Faculty of Medicine X
The study unit of the study: medical student

d. The appropriate sampling technique for the study: matching technique


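The aspects that can be determined by the researcher from the beginning
• α
• β
• x1 – x2

The aspects that must be searched by the researcher from literature or a pilot study
• S (combined standard deviation from two observations)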

Answer

Based on a pilot study, six pairs of students with and without a family history of DM type II were selected.

Using α = 0.05, β = 0.2, and a difference in means considered by the researcher to be clinically significant of 2.5 mg/dl, the estimated sample size needed for the study is:

Answer

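• For α = 0.05, Z0.975 = 1.96 (two-sided), and for β = 0.2, Z0.8 = 0.84

• x1 – x2 = 2.5; s1 = 4.88 mg/dl, n1 = 6; s2 = 3.74 mg/dl, n2 = 6

• Combined standard deviation: S = √[((n1 – 1)s1² + (n2 – 1)s2²) / (n1 + n2 – 2)] = 4.35 mg/dl

• n = ((Zα + Zβ) × S / (x1 – x2))² = ((1.96 + 0.84) × 4.35 / 2.5)² = 23.7, rounded up to 24

The researcher will need at least 24 medical students with a family history of DM type II and 24 medical students without a family history of DM type II (matched according to age and sex)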
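A minimal sketch of the Case IV calculation follows. It assumes the combined standard deviation is the pooled value of the two pilot groups and that the paired-means formula n = ((Zα + Zβ) × S / (x1 – x2))² applies; together these reproduce the reported 24 pairs. The function names are illustrative only.

```python
import math


def pooled_sd(s1: float, n1: int, s2: float, n2: int) -> float:
    """Combined (pooled) standard deviation of two pilot groups."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def n_paired_means(z_alpha: float, z_beta: float, sd: float, diff: float) -> int:
    """Number of pairs needed to detect a mean difference `diff`."""
    return math.ceil(((z_alpha + z_beta) * sd / diff) ** 2)


S = pooled_sd(4.88, 6, 3.74, 6)
print(round(S, 2))                          # 4.35
print(n_paired_means(1.96, 0.84, S, 2.5))   # 24
```

The unpaired version of the formula carries an extra factor of 2, so it would ask for roughly twice as many subjects per group; this is the saving that matching is meant to deliver.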

Case V

• The investigator wants to conduct a cross-sectional study to find out whether DM has a negative effect on the treatment outcome of TB. Data will be collected from the hospital. The register shows that 50 people meet the inclusion criteria for this study. In a previous study, after 6 months of therapy, 9.6% of cultured sputum specimens from non-diabetic patients were still positive for Mycobacterium tuberculosis (RR = 2.65).

Answer

a. Outcome variable: response to treatment (yes/no, nominal)
Predictor variable: random blood glucose level (numeric)

b. The most appropriate statistical analysis for the study: Chi-square test

Answer

c. The target population: all TB patients with DM in Hospital X
The accessible population: adult TB patients aged 20 to 65 years diagnosed with DM and treated in Hospital X
The study unit of the study: medical record

d. The appropriate sampling technique for the study? Simple random sampling

• What is the power of the study when the sample is taken by total sampling (using α = 0.05)? Take the sample size formula, substitute the available sample size, and solve for Zβ.
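Solving the two-proportion sample size formula for Zβ gives Zβ = (√n × |p1 – p2| – Zα√(2pq)) / √(p1q1 + p2q2), and the power is the standard normal probability below Zβ. The sketch below is only an illustration under stated assumptions: p1 is taken as RR × p2, Zα is taken as two-sided (1.96), and the 50 registered patients are assumed to split into 25 per group, which the case does not state. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def power_two_proportions(n_per_group: float, p1: float, p2: float, z_alpha: float) -> float:
    """Power of a two-proportion comparison for a fixed per-group sample size."""
    p = (p1 + p2) / 2
    q = 1 - p
    z_beta = (
        math.sqrt(n_per_group) * abs(p1 - p2) - z_alpha * math.sqrt(2 * p * q)
    ) / math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return NormalDist().cdf(z_beta)


p2 = 0.096        # proportion still culture-positive among non-diabetic patients
p1 = 2.65 * p2    # assumption: diabetic-group proportion derived from RR = 2.65
# Assumption: 25 patients per group (the actual split of the 50 is not given).
print(round(power_two_proportions(25, p1, p2, z_alpha=1.96), 2))
```

As a consistency check, feeding the Case III cross-sectional inputs (p1 = 0.71, p2 = 0.51, Zα = 1.64) with 73 subjects per group back into this function returns a power just above 0.80, as intended by that sample size calculation.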

Case VI

• Let’s say the researcher has a hypothesis that serum 25(OH)-vitamin D levels (ng/ml) are positively correlated with bone mineral density, estimated using the quantitative ultrasound index (QUI), among postmenopausal women in Kecamatan Jatinangor

Answer

a. The most appropriate study design: case-control or cross-sectional study

Variables: serum 25(OH)-vitamin D levels (numeric) and quantitative ultrasound index (numeric)

b. What is the most appropriate statistical analysis for the study? Correlation methods (Pearson or Spearman’s rho coefficient correlation)

Answer

c. The target population: postmenopausal women in Kecamatan Jatinangor
The accessible population: women who come to Posbindu Lansia in all villages
The study unit of the study: postmenopausal woman

d. The appropriate sampling technique for the study: Consecutive sampling


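The aspects that can be determined by the researcher from the beginning
• α
• β

The aspects that must be searched by the researcher from literature or a pilot study
• r (Pearson’s correlation coefficient)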

Based on a pilot study with 10 participants:

For α = 0.05, Z0.95 = 1.64 (one-sided), and for β = 0.2, Z0.8 = 0.84; r = 0.78 (using SPSS or Excel)

Answer

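• n = [(Zα + Zβ) / (0.5 × ln((1 + r)/(1 – r)))]² + 3 = [(1.64 + 0.84) / 1.045]² + 3 = 8.63, rounded up to 9

• The researcher will need at least 9 postmenopausal women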
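The Case VI figure can be reproduced with the usual correlation sample size formula based on Fisher's z transformation, n = [(Zα + Zβ) / (0.5 ln((1 + r)/(1 – r)))]² + 3. This is a sketch under the assumption that this is the formula behind the reported result; `n_for_correlation` is an illustrative name.

```python
import math


def n_for_correlation(r: float, z_alpha: float, z_beta: float) -> int:
    """Sample size to detect a correlation coefficient r."""
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))  # Fisher's z transform of r
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


print(n_for_correlation(r=0.78, z_alpha=1.64, z_beta=0.84))  # 9
```

The required n is small here because the pilot correlation is strong; with a weaker expected correlation such as r = 0.3 the same formula asks for several times as many subjects.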

Review

• Study design
– Non-specific or specific?
– Observational (cross-sectional, case-control, cohort) or experimental?

• Variables
– Predictor/independent and outcome/dependent
– Scale of measurement
   • Categorical (nominal or ordinal)
   • Numerical

• Paired vs unpaired observation

• Hypothesis
– Type I and type II error (α, β)
– Power of the study (1 – β)
– One or two-sided alternative hypothesis

• Statistical analysis

• Sampling technique
– Probabilistic sampling technique
– Non-probabilistic sampling technique
