Standard Errors of Proportions
Many estimates in managing a dairy involve proportions or probabilities. For example, conception risk is the probability that an inseminated cow will be diagnosed pregnant sometime in the future. Pregnancy risk is the probability that an open, eligible cow will become pregnant in a given 21-day cycle. (Interestingly, neither of these is really a rate in the epidemiologic sense, but rather a risk).
Estimating the proportion is easy: divide the positives by the total. For example, if there are 3 females in the past 10 calves, or 30 ketosis cases in the past 100 freshenings, or 300 pregnancies in the past 1000 breedings, the proportion in each case is 30 percent. However, if we want to predict what might occur in the future, we need to know how far the true proportion could plausibly be from 30%.
Certainly, we have much more confidence when we have 1000 breedings than when we have 10 calves. This confidence can be quantified by a confidence interval.
| Positives | Total | Percent | Lower (%) | Upper (%) |
| 3 | 10 | 30 | 2 | 58 |
| 30 | 100 | 30 | 21 | 39 |
| 300 | 1000 | 30 | 27 | 33 |
Note that if we recorded 3 females out of the past ten calves, we can be 95% confident only that the probability of the next calf being a female is somewhere between 2% and 58%. This means 3 out of 10 does not tell us very much: there is a swing of 28 percentage points in either direction. On the other hand, 300 pregnancies from 1000 breedings has a range of only 3 percentage points in either direction.
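One formula for the error is 1.96 * sqrt(p * (1 - p) / total), where p is the observed proportion (0.3 here, so p * (1 - p) = 0.21). Rounding 1.96 to 2:
Total 10: 2 * sqrt(0.21 / 10) = 2 * 0.14 = 0.28, so 0.30 - 0.28 = 0.02 and 0.30 + 0.28 = 0.58
Total 100: 2 * sqrt(0.21 / 100) = 2 * 0.045 = 0.09, so 0.30 - 0.09 = 0.21 and 0.30 + 0.09 = 0.39
Total 1000: 2 * sqrt(0.21 / 1000) = 2 * 0.015 = 0.03, so 0.30 - 0.03 = 0.27 and 0.30 + 0.03 = 0.33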
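The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the shortcut formula; the function name shortcut_interval is hypothetical and is not part of Dairy Comp 305 or BREDSUM.

```python
import math

def shortcut_interval(positives, total, z=1.96):
    """Shortcut 95% interval: p +/- z * sqrt(p * (1 - p) / total)."""
    p = positives / total
    margin = z * math.sqrt(p * (1 - p) / total)
    return p - margin, p + margin

# The three 30% examples from the text.
for positives, total in [(3, 10), (30, 100), (300, 1000)]:
    lower, upper = shortcut_interval(positives, total)
    print(f"{positives}/{total}: {lower:.2f} to {upper:.2f}")
# 3/10: 0.02 to 0.58
# 30/100: 0.21 to 0.39
# 300/1000: 0.27 to 0.33
```

Using 1.96 rather than the rounded value of 2 gives the same bounds once they are rounded to whole percentages.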
The practical applications include comparing AI technicians, cure rates, etc. If the error intervals do not overlap, there is likely a difference.
If technician A has 300 pregnancies from 1000 breedings and technician B has 40 from 100, technician A is at 30% (27 to 33) and technician B is at 40% (30 to 50). Because the intervals overlap, it is not possible to say with much certainty that technician B is better than technician A.
If the cure rate from mastitis treatment X is 70%, and the cure rate from mastitis treatment Y is 60%, and there were 500 mastitis cases in each treatment group, X would be (66 - 74), and Y would be (56 - 64), and we could be confident that treatment X was better than treatment Y.
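The two comparisons above can be checked with a short Python sketch. The helper names are hypothetical, and the counts of 350 and 300 cures are assumptions derived from 70% and 60% of 500 cases. Overlap of intervals is the rough rule of thumb used in the text, not a formal significance test.

```python
import math

def shortcut_interval(positives, total, z=1.96):
    """Shortcut 95% interval: p +/- z * sqrt(p * (1 - p) / total)."""
    p = positives / total
    margin = z * math.sqrt(p * (1 - p) / total)
    return p - margin, p + margin

def intervals_overlap(a, b):
    """True if the (lower, upper) intervals a and b share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Technicians: 300 of 1000 versus 40 of 100.
tech_a = shortcut_interval(300, 1000)
tech_b = shortcut_interval(40, 100)
print(intervals_overlap(tech_a, tech_b))  # True: no clear difference

# Mastitis treatments: 70% and 60% cured, 500 cases each (assumed counts).
treatment_x = shortcut_interval(350, 500)
treatment_y = shortcut_interval(300, 500)
print(intervals_overlap(treatment_x, treatment_y))  # False: likely a difference
```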
BREDSUM shows the upper confidence limit as a black line above each bar in the bar graph. As a general rule, the error above is similar to the error below, so the lower bounds were omitted for clarity.
Final caveats: Other issues can destroy these estimates. If technician B is primarily responsible for breeding virgin heifers, and technician A breeds the older cows, there are confounding issues that are likely more important than the proportions. Similar issues would arise if technician A started at the beginning of the hot summer months. In the mastitis example, we need to be sure that the treatments were randomized, and not that treatment X was used on the mild cases while treatment Y was used only on the more severe cases.
Nota Bene:
The formula described above is an adequate estimate for proportions between 30% and 70% with a large number of observations. However, it is not very good for low proportions, such as the risk of a DA, or death rates. This is clear when you find a negative lower bound, which, of course, makes no sense.
For most 95% confidence intervals, Dairy Comp 305 uses a much more robust estimate.
This is the resulting table for 30%:
| Positives | Total | Percent | Lower (%) | Upper (%) |
| 3 | 10 | 30 | 11 | 60 |
| 30 | 100 | 30 | 22 | 40 |
| 300 | 1000 | 30 | 27 | 33 |
Note that when there are 1000 observations, the estimates are the same as above.
Now look at a 10% risk using the short-cut formula:
| Positives | Total | Percent | Lower (%) | Upper (%) |
| 1 | 10 | 10 | -9 | 29 |
| 10 | 100 | 10 | 4 | 16 |
| 100 | 1000 | 10 | 8 | 12 |
And the robust one:
| Positives | Total | Percent | Lower (%) | Upper (%) |
| 1 | 10 | 10 | 2 | 40 |
| 10 | 100 | 10 | 6 | 17 |
| 100 | 1000 | 10 | 8 | 12 |
When there are few observations, or when the proportion is not very close to 50%, the simple formula is unreliable. Also, the confidence interval from the more robust estimate is not symmetric around the observed proportion.
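The text does not say which robust estimate Dairy Comp 305 uses. As an assumption for illustration only, the sketch below uses the Wilson score interval, a common alternative to the shortcut formula that never produces a negative lower bound. It happens to reproduce the robust tables above after rounding to whole percentages, but that does not confirm it is the method the program uses. The function name wilson_interval is hypothetical.

```python
import math

def wilson_interval(positives, total, z=1.96):
    """Wilson score interval (assumed here; not confirmed as the program's method)."""
    p = positives / total
    z2 = z * z
    denom = 1 + z2 / total
    # The center is pulled from p toward 50%, which makes the interval asymmetric around p.
    center = (p + z2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half, center + half

# The 30% and 10% examples from the tables.
for positives, total in [(3, 10), (30, 100), (300, 1000),
                         (1, 10), (10, 100), (100, 1000)]:
    lower, upper = wilson_interval(positives, total)
    print(f"{positives}/{total}: {round(lower * 100)} to {round(upper * 100)}")
# 3/10: 11 to 60
# 30/100: 22 to 40
# 300/1000: 27 to 33
# 1/10: 2 to 40
# 10/100: 6 to 17
# 100/1000: 8 to 12
```

For 1 positive out of 10, the interval runs from about 2% to 40%: the upper side is much wider than the lower side, which is the asymmetry noted above.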