Introduction
Serology is an important tool in monitoring vaccines and maternal
antibodies, establishing proper vaccination timing, detecting
infection, and determining disease prevalence. To maximize the
benefits of this monitoring tool, companies should make it part of a
comprehensive preventive medicine program.
Well-defined objectives and the correct interpretation of data are
necessary to obtain the real benefits of serology. In addition to
knowing the test specifications (i.e., sensitivity, specificity,
predictive value) and the historical data of the population, a
monitoring program should be established.
A monitoring program depends on sample size and frequency of
sampling. One of the main concerns in the field is the number of
samples to be collected. Determining the minimum number of samples
is vital for the validity of the results.
Due to its statistical value, a 30-sample size is widely used in
veterinary medicine as well as in other areas. This sample size was
extensively used in monitoring programs until the pressure for
decreasing costs started to be a priority for companies. As a
consequence, the number of serological samples collected was reduced.
The key questions are:
In this brief discussion, we will attempt to unify the basic
concepts of sampling, show why the 30-sample size is recommended, and
determine what the implications are when working with various sample
sizes.
Sampling Concepts
A sample is any part of a population, whereas sampling
is the process of collecting samples from a population.1
The idea behind sampling in diagnostic testing is that collecting and
analyzing data from some elements of the population should provide
relevant information applicable to the whole population. Antibodies
or antigens are investigated in just one part of the population in
order to make inferences about the whole; examining every element of
the population would instead be a census.
Sampling is based on two premises. First, the similarity among the
elements of the population is such that a certain number of them will
properly represent the characteristics of the whole population.
Second, the discrepancies between the values of the population
variables (parameters) and the values of the same variables obtained
from the sample (statistic) are minimized because some of the
measurements underestimate the value of the parameter while others
overestimate it. If the sample has been properly obtained
(representative samples), the variations in those values tend to
counterbalance and cancel one another, resulting in sample
measurements that are, in general, close to those of the population.3
Characteristics of a Good Sample
The essence of a good sample lies in establishing a means to infer, as
precisely as possible, the characteristics of a population through
measurement of the characteristics of the sample. A good sample
comprises:
- Precision: The agreement of the results obtained
  from the sample (statistic) and the corresponding results that would
  be measured in the whole population (parameters). Precision is the
  measurement of the sampling error; the smaller the sampling error,
  the higher the precision of the sample.2
- Efficiency: The comparative measurement between
  different sampling projects. It is said that a given project is more
  efficient than others in specific conditions if it gives more
  reliable and economic results with the same precision, or higher
  precision with the same cost. It is important for a sample to be
  precise and efficient (Figure 1).2
- Accuracy: The degree of absence of nonsampling
  errors in the sample. A sample is considered accurate if the
  overestimate and underestimate measurements compensate each other
  among the components of the sample.2
Figure 1.

Steps for Selecting Samples:
Step 1: Definition of the population of interest
Step 2: Determination of sample size
Step 3: Determination of specific procedure for sample selection
Step 4: Collection of sample based on the above steps
Types of Sampling
There are a wide variety of sampling types, but a distinction should
be made between probability sampling and nonprobability sampling.
Probability Sampling: Each element of the
population has a known (nonzero) chance of being selected to be part
of the sample. This is also known as random sampling.
Nonprobability Sampling: The selection of the
elements to be included in the sample is dependent upon the
investigator's judgment.
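As a minimal sketch of probability sampling, assuming the animals in a house can be listed by ID number, the following Python draws a simple random sample in which every animal has the same known chance of being selected. The function name, the ID scheme and the house size of 2000 are illustrative assumptions and do not come from the original text.

```python
import random

def simple_random_sample(animal_ids, sample_size, seed=None):
    """Draw sample_size distinct animals, each with an equal chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(animal_ids), sample_size)

# Hypothetical house of 2000 animals identified as 1..2000.
house = range(1, 2001)
selected = simple_random_sample(house, 30, seed=1)
print(sorted(selected))
```

Letting a random number generator pick the animals, rather than the person collecting the samples, is what separates this from nonprobability sampling.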
It is important to notice that in any population there are many
possible samples of any size. What should be kept in mind is that classical
statistical inference is based on what happens when different samples
of the same size are repeatedly selected in the same population.
Consider the analysis of three samplings of five samples each (n=5)
from a known population, and calculation of their means. The means
will be close, yet different. It can be imagined that there are three
different parameters in the population; however, the statistical
theory recommends not stopping at three samplings, but to keep taking
samples until certain mean values are repeated with more frequency.
Then it will be apparent that sample means closer to the population
mean will be repeated more frequently than the more distant ones. When
those values are plotted on a two-axis system, a Gaussian curve
(normal curve) will be observed. This distribution of the sample means
is known as the sampling distribution of the means, or sampling
distribution.1,2
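The repeated-sampling idea described above can be tried out with a short simulation. This is only a sketch: the population of 10,000 values with mean 100 and standard deviation 15 is invented for illustration, and the names `population` and `sample_means` are assumptions.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 10,000 measurements (mean 100, standard deviation 15).
population = [rng.gauss(100, 15) for _ in range(10_000)]

def sample_means(population, n, repeats, rng):
    """Return the mean of each of `repeats` random samples of size n."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

means = sample_means(population, n=5, repeats=2000, rng=rng)

population_mean = statistics.mean(population)
print("population mean:           ", round(population_mean, 2))
print("mean of the sample means:  ", round(statistics.mean(means), 2))
print("spread of the sample means:", round(statistics.stdev(means), 2))
```

A histogram of `means` should show the sample means clustering around the population mean, with values far from it occurring less often.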
Symmetrically positioned intervals centered on the most likely mean
are called confidence intervals of the mean. Three intervals are
commonly referenced: approximately 68%, 95% and 99.7%, corresponding
to 1, 2 and 3 standard deviations on either side of the mean.
Figure 2 illustrates the Gaussian curve
and the respective confidence intervals.
Figure 2. Area under the
normal curve for 1, 2 and 3 standard deviations from the mean.2

To understand the meaning of these intervals, consider the 68%
confidence interval. The mean of the sampling distribution is equal
to the mean of the population, but in practice, of all the possible
samples of size n, only one is taken and its mean is used as an
estimator of the population mean (which is unknown). Notice that the
population mean may or may not fall within the confidence interval
calculated from that one sample. A 68% confidence level does not mean
that there are 68 chances out of 100 that the sample mean lies in the
interval; instead, it means that if 100 different random samples were
taken from the population, and the 68% confidence interval constructed
for each, the population mean would be expected to fall within about
68 of these intervals.
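This repeated-sampling reading of a confidence interval can be checked numerically. The sketch below assumes a made-up population whose standard deviation is known and builds each interval as the sample mean plus or minus one standard error; the population values, the sample size of 30 and the function name `coverage` are all illustrative assumptions.

```python
import math
import random
import statistics

rng = random.Random(1)
population = [rng.gauss(100, 15) for _ in range(10_000)]  # hypothetical values
mu = statistics.mean(population)       # population mean (unknown in practice)
sigma = statistics.pstdev(population)  # population standard deviation

def coverage(n, repeats, z=1.0):
    """Fraction of intervals (sample mean +/- z standard errors) that contain mu."""
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repeats):
        m = statistics.mean(rng.sample(population, n))
        if m - half_width <= mu <= m + half_width:
            hits += 1
    return hits / repeats

observed = coverage(n=30, repeats=2000)
print(f"intervals containing the population mean: {observed:.1%}")
```

With z = 1 the observed fraction should land near 68%; setting z to 2 or 3 should move it toward 95% and 99.7%.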
Sample Size and Data Precision
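For samples of the same size, a higher confidence level requires a
wider margin of error, so confidence is gained at the expense of
precision. Precision increases as the number of elements in the
sample increases, but not proportionally: the error falls only with
the square root of the sample size, so halving the error requires
four times as many samples.1,2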
Tables are available that relate three components: the error (in the
range of 1% to 10%), the confidence level (68%, 95% or 99.7%) and the
number of elements in the sample (Table 1).
Table 1: Relationship between error, confidence level and number of
elements for a sample of infinite dichotomous populations (n>3000),
with P=Q=0.50.2

| Error (e) | n = PQ/e^2 (68%) | n = 4PQ/e^2 (95%) | n = 9PQ/e^2 (99.7%) |
|-----------|------------------|-------------------|---------------------|
| 0.01      | 2500             | 10000             | 22500               |
| 0.02      | 625              | 2500              | 5625                |
| 0.03      | 278              | 1112              | 2502                |
| 0.04      | 156              | 624               | 1404                |
| 0.05      | 100              | 400               | 900                 |
| 0.06      | 70               | 280               | 630                 |
| 0.07      | 51               | 204               | 459                 |
| 0.08      | 39               | 156               | 351                 |
| 0.09      | 31               | 124               | 279                 |
| 0.10      | 25               | 100               | 225                 |
This is a general table that can be used in calculations for
samples from studies of diverse natures. According to the table, for
an infinite population with 9% error and a confidence level of 68%, 31
samples should be taken in a probability sampling to determine the
presence of a particular disease and/or vaccine immune response
situation in at least 68% of the population.
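The column formulas of Table 1 can be evaluated directly. The sketch below assumes that the multipliers 1, 4 and 9 are the squares of 1, 2 and 3 standard deviations and that results are rounded to the nearest whole number; the function name is hypothetical. A few table entries differ slightly from this calculation (for example 124 rather than 123 at 9% error and 95%), apparently because they are multiples of the already rounded 68% column.

```python
def sample_size_for_error(error, z, p=0.5):
    """n = z^2 * P * Q / e^2 for a large dichotomous population (Q = 1 - P)."""
    q = 1.0 - p
    return round(z * z * p * q / error ** 2)

# z = 1, 2 and 3 standard deviations, matching the three columns of Table 1.
for error in (0.05, 0.09, 0.10):
    print(error, [sample_size_for_error(error, z) for z in (1, 2, 3)])
```

Because the error appears squared in the denominator, cutting the error from 10% to 5% multiplies the required sample by four.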
Sampling Strategy According to the Objective of the Study
Regarding the serology objective as related to the sampling type, two
situations may be considered: disease detection and disease prevalence
determination. We will focus on disease detection in
vaccinated animals, a common study at private laboratory level.5
The formula to calculate sample size, considering disease
prevalence in an infinite population (>3000), is:
n = log (error) ÷ log (1 - disease prevalence)
where the error is 1 minus the desired confidence level.
Example:
What would be the sampling strategy to detect a determined disease in
a population with 10% prevalence and 95% confidence?
n = log (0.05) ÷ log (1 - 0.10) = 28.4, rounded up to the next whole
animal
Answer:
n = 29
This means that if we take a minimum of 29 animals, we will have
the chance to detect at least one infected animal, with a confidence
level of 95%. Table 2 illustrates the sample
size necessary to detect disease given various levels of confidence
and prevalence.
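The detection formula is easy to evaluate for other prevalences and confidence levels. The sketch below is a direct transcription of that formula, rounding up to a whole animal; it assumes a very large population and a test that never misses an infected animal, and the function name is hypothetical.

```python
import math

def samples_to_detect(prevalence, confidence=0.95):
    """Smallest n giving at least `confidence` probability of one or more
    positives, assuming a very large population and a perfect test."""
    error = 1.0 - confidence
    return math.ceil(math.log(error) / math.log(1.0 - prevalence))

for prevalence in (0.05, 0.10, 0.15):
    print(prevalence, samples_to_detect(prevalence))
```

The base of the logarithm does not matter here, because only the ratio of the two logarithms is used.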
Table 2. Estimated sample sizes to detect disease in populations with
a large number of individuals (2000).2

| Confidence | 5% prevalence | 10% prevalence | 15% prevalence |
|------------|---------------|----------------|----------------|
| 99%        | 90            | 44             | 28             |
| 95%        | 58            | 28             | 18             |
| 90%        | 45            | 22             | 14             |
| 85%        | 37            | 18             | 12             |
| 80%        | 31            | 15             | 10             |
| 75%        | 27            | 13             | 9              |
The number of samples needed to ensure 95% confidence for disease
detection (that is 95% probability of disease detection), considering
a population with an infection rate of 10%, is 28. The difference
between 28 and 29 samples can be accounted for by the table's basis on
a population of 2000 individuals and the calculation having been done
on a population equal to or greater than 3000 individuals (infinite).
From Table 2 it can be seen that when the
number of samples decreases, you have to work either with a lower
percentage of confidence or wait until the disease increases in
prevalence to be detected. In preventive medicine, the
objective is to detect disease as early as possible, with the highest
index of confidence. This is why, when working with 95%
confidence to detect an infection with 10% prevalence, the recommended
sample size is 28.
In the field, sampling frequency is a common justification for
using a small sample size. Caution is well advised in this case, since
error is not corrected by repeatedly taking a small number of samples.
For example, suppose that a company has elected to use a sample size of
10 instead of 29 in the hope of detecting 15% prevalence in a herd of
2000. From Table 2 above, this decision
reduces their confidence from 99% to 80%. If 13 samples are taken from
each house every five weeks, it will result in 75% confidence to
detect 10% prevalence.
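The confidence figures in this example can be approximated by turning the detection formula around and asking what confidence a given sample size buys. This is a sketch under the same assumptions as the formula (large population, perfect test); the function name is hypothetical.

```python
def detection_confidence(n, prevalence):
    """Probability of at least one positive among n animals (large population)."""
    return 1.0 - (1.0 - prevalence) ** n

# The two reduced sample sizes discussed in the example.
for n, prevalence in ((10, 0.15), (13, 0.10)):
    print(n, prevalence, f"{detection_confidence(n, prevalence):.0%}")
```

Running the same function over a range of n shows how quickly the confidence falls as the sample shrinks.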
Another way to understand the consequences of decreasing the sample
size is revealed in the following table. Table 3
shows the number of samples needed, according to the herd size, to
detect infection in at least one animal (n=1), in a population with 5%
prevalence and 95% confidence. Compare the confidence percentage, when
the number of samples is reduced to 10.
Table 3. Number of samples, according to population size, to detect
infection with 95% confidence in a population with 5% disease
prevalence, and degree of detection confidence when the number of
samples is reduced to 10.4

| Population size | Samples needed | Confidence | Reduced samples | Confidence |
|-----------------|----------------|------------|-----------------|------------|
| 20              | 19             | 95%        | 10              | 50%        |
| 40              | 31             | 95%        | 10              | 44%        |
| 60              | 38             | 95%        | 10              | 43%        |
| 80              | 42             | 95%        | 10              | 42%        |
| 100             | 45             | 95%        | 10              | 42%        |
| 120             | 47             | 95%        | 10              | 41%        |
| 160             | 49             | 95%        | 10              | 41%        |
| 200             | 51             | 95%        | 10              | 41%        |
| 300             | 54             | 95%        | 10              | 40%        |
| 400             | 55             | 95%        | 10              | 40%        |
| 500             | 56             | 95%        | 10              | 40%        |
| 1000            | 57             | 95%        | 10              | 40%        |
| 2000            | 58             | 95%        | 10              | 40%        |
| >=3000          | 59             | 95%        | 10              | 40%        |
With only 10 samples per population, the probability of detecting
any infection decreases to 40% (assuming 5% prevalence).
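For small populations, the dependence on herd size can be sketched with an exact calculation for sampling without replacement (the hypergeometric model). Whether the source table was computed this way is an assumption, as are the function names and the choice of taking the number of infected animals as prevalence times population size; a perfect test is also assumed.

```python
from fractions import Fraction
from math import comb

def detection_probability(population, positives, n):
    """Exact chance of one or more positives among n animals drawn without replacement."""
    none_found = Fraction(comb(population - positives, n), comb(population, n))
    return 1 - none_found

def min_samples(population, positives, confidence=Fraction(95, 100)):
    """Smallest n whose detection probability reaches `confidence`."""
    for n in range(1, population + 1):
        if detection_probability(population, positives, n) >= confidence:
            return n
    return population

# 5% prevalence: 1 positive in 20 animals, 2 in 40, 3 in 60.
for population, positives in ((20, 1), (40, 2), (60, 3)):
    needed = min_samples(population, positives)
    with_ten = float(detection_probability(population, positives, 10))
    print(population, needed, f"{with_ten:.0%}")
```

Exact fractions are used so that borderline cases, such as 19 of 20 animals giving exactly 95%, are not affected by floating-point rounding.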
Table 4. Number of samples to have 95% probability to detect one (1)
or more positives in an infected population.2

Columns give the percent prevalence (%).

| Population size | 50 | 40 | 30 | 25 | 20 | 15 | 10 | 5  | 2   | 1   | 0.5 | 0.1  |
|-----------------|----|----|----|----|----|----|----|----|-----|-----|-----|------|
| 20              | 4  | 6  | 7  | 9  | 10 | 12 | 16 | 19 | 20  | 20  | 20  | 20   |
| 30              | 4  | 6  | 8  | 9  | 11 | 14 | 19 | 26 | 30  | 30  | 30  | 30   |
| 40              | 5  | 6  | 8  | 10 | 12 | 15 | 21 | 31 | 40  | 40  | 40  | 40   |
| 50              | 5  | 6  | 8  | 10 | 12 | 16 | 22 | 35 | 46  | 50  | 50  | 50   |
| 60              | 5  | 6  | 8  | 10 | 12 | 16 | 23 | 38 | 55  | 60  | 60  | 60   |
| 70              | 5  | 6  | 8  | 10 | 13 | 17 | 24 | 40 | 62  | 70  | 70  | 70   |
| 80              | 5  | 6  | 8  | 10 | 13 | 17 | 24 | 42 | 68  | 79  | 80  | 80   |
| 90              | 5  | 6  | 8  | 10 | 13 | 17 | 25 | 43 | 73  | 87  | 90  | 90   |
| 100             | 5  | 6  | 9  | 10 | 13 | 17 | 25 | 45 | 78  | 96  | 100 | 100  |
| 150             | 5  | 6  | 9  | 11 | 13 | 18 | 27 | 49 | 95  | 130 | 148 | 150  |
| 200             | 5  | 6  | 9  | 11 | 13 | 18 | 27 | 51 | 105 | 155 | 190 | 200  |
| 500             | 5  | 6  | 9  | 11 | 14 | 19 | 28 | 56 | 129 | 225 | 349 | 500  |
| 1000            | 5  | 6  | 9  | 11 | 14 | 19 | 29 | 57 | 138 | 258 | 450 | 950  |
| 5000            | 5  | 6  | 9  | 11 | 14 | 19 | 29 | 59 | 147 | 290 | 564 | 2253 |
| 10000           | 5  | 6  | 9  | 11 | 14 | 19 | 29 | 59 | 148 | 294 | 581 | 2588 |
| ∞               | 5  | 6  | 9  | 11 | 14 | 19 | 29 | 59 | 149 | 299 | 596 | 2995 |
Conclusion
The basic concepts of sampling should be known before establishing a
monitoring program. Statistical tables should always be consulted
before changing sample sizes. When a company decides to reduce costs,
it should also recognize that lowering the number of samples could
lead to a loss of information and a potential misinterpretation of
results, which actually can lead to a higher cost of production.
The advantage of serology, mainly ELISA, is early disease
detection. The use of 23 samples (between 29 and 19) for vaccinated
animals, resulting in 95% confidence (or probability) of detecting
diseases with 10% to 15% prevalence, is strongly recommended. The
number of samples for detection of diseases of slow transmission and
low prevalence (<5%) can be taken from Table 4.