### Abstract

The potential exposure to and infection by foodborne microorganisms depend, among other factors, on the microbial concentrations in food and on the microbial behaviour (growth, survival and transfer) along the food chain. Both factors are therefore important inputs in quantitative microbiological risk assessment (QMRA).

Since microbial concentrations vary among different samples of a food lot, probability distributions are used to describe these concentrations in QMRA. As microbial behaviour varies with food storage conditions (because it depends on intrinsic properties of food and extrinsic environmental variables), predictive models of bacterial growth and survival that account for those factors are used in QMRA to describe expected changes in bacterial concentrations.

Both probability distributions and predictive models may contribute to the imprecision of QMRA. On the one hand, there are several distribution alternatives available to describe concentrations and several methods to fit distributions to bacterial data. On the other hand, predictive models are built from controlled laboratory experiments of microbial behaviour and may not be appropriate to apply in the context of real food. Hence, these models need to be validated with independent data for conditions of real food before use in QMRA.

The overall goal of the work presented in this thesis is to study different factors related to quantitative microbial data that may have an impact on the outcome of QMRA, in order to find appropriate solutions that limit the imprecision of risk estimates. A new method of fitting a distribution to microbial data is developed that estimates both prevalence and distribution of concentrations (manuscript I). Different probability distributions are used to describe concentrations in a simple QMRA model and the risk estimates obtained are compared (manuscript II). The predictive accuracy of a microbial growth model against different literature datasets is compared in order to identify factors related to experimental data collection that have a relevant impact on the model evaluation process (manuscript III).

In manuscript I (“Fitting a distribution to microbial counts: making sense of zeroes”) it is hypothesised that when “artificial” zero microbial counts, which originate by chance from contaminated food products, are not separated from “true” zeroes originating from uncontaminated products, the estimates of prevalence and concentration may be inaccurate. Such inaccuracy may have an especially relevant impact in QMRA in situations where highly pathogenic microorganisms are involved and where growth can occur along the food pathway. Hence, a method is developed that provides accurate estimates of concentration parameters and differentiates between artificial and true zeroes, thus also accurately estimating prevalence. It is demonstrated that, depending on the original distribution of concentrations and the limit of quantification (LOQ) of microbial enumeration, it may be incorrect to treat artificial zeroes as censored below a quantification threshold. The method presented estimates the prevalence of contamination within a food lot and the parameters (mean and standard deviation) characterizing the within-lot distribution of concentrations, without assuming a LOQ and using raw plate count data as input. Counts resulting from both contaminated and uncontaminated sample units are analysed together, which allows the proportion of artificial zeroes among all zero counts to be estimated.

The method yields good estimates of mean, standard deviation and prevalence, especially at low prevalence levels and low expected standard deviation. This study shows that one of the keys to an accurate characterization of the overall microbial contamination is the correct identification and separation of true and artificial zeroes, and that estimation of prevalence and estimation of the distribution of concentrations are interrelated and therefore should be done simultaneously.
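The distinction between true and artificial zeroes can be illustrated with a minimal sketch. It is not the thesis's fitting method. It assumes a zero-inflated Poisson-lognormal model in which the log10 of the expected count per plated sample is normally distributed and the observed count is Poisson. The function names, the trapezoidal integration and the example values are illustrative assumptions.

```python
import math


def prob_zero_given_contaminated(mu_log10, sigma_log10, n_steps=2000, width=8.0):
    """P(count = 0) for a contaminated unit under a Poisson-lognormal model.

    Assumption: log10(lam) ~ Normal(mu_log10, sigma_log10), where lam is the
    expected count in the plated sample, and the count is Poisson(lam).
    Then P(0) = E[exp(-lam)], approximated here with the trapezoidal rule.
    """
    lo = mu_log10 - width * sigma_log10
    hi = mu_log10 + width * sigma_log10
    step = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        x = lo + i * step
        z = (x - mu_log10) / sigma_log10
        density = math.exp(-0.5 * z * z) / (sigma_log10 * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * density * math.exp(-(10.0 ** x)) * step
    return total


def artificial_zero_fraction(prevalence, p0_contaminated):
    """Share of all zero counts that come from contaminated units.

    Zero counts arise either from uncontaminated units (true zeroes, with
    probability 1 - prevalence) or by chance from contaminated units
    (artificial zeroes, with probability prevalence * p0_contaminated).
    """
    p_zero_total = (1.0 - prevalence) + prevalence * p0_contaminated
    if p_zero_total == 0.0:
        return 0.0  # no zero counts are expected at all
    return prevalence * p0_contaminated / p_zero_total


# Hypothetical lot: 30 % prevalence, mean log10 expected count 0, sd 1.
p0 = prob_zero_given_contaminated(0.0, 1.0)
print(p0, artificial_zero_fraction(0.3, p0))
```

In this sketch the prevalence and the concentration parameters both enter the probability of a zero count, which is one way to see why the abstract argues that the two should be estimated simultaneously.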

In manuscript II (“Impact of microbial count distributions on human health risk estimates”) the impact of fitting microbial distributions on risk estimates is investigated in two different concentration scenarios and at a range of prevalence levels. Four different parametric distributions are used to investigate the importance of accounting for the randomness in counts, the difference between treating true zeroes as such or as censored below a LOQ, and the importance of making the correct assumption about the underlying distribution of concentrations. A simulation experiment makes it possible to assess the difference between the expected risk and the risk estimated using a lognormal, a zero-inflated lognormal, a Poisson-gamma and a zero-inflated Poisson-lognormal distribution. The method developed in manuscript I is used in this study to fit the latter.

The results show that the impact on risk estimates of the choice of probability distribution to describe concentrations at retail depends on both the concentration and the prevalence levels, but that in general it is larger at high levels of microbial contamination (high prevalence and high concentration). Also, zero-inflation tends to improve the accuracy of the risk estimates.

In manuscript III (“Variability and uncertainty in the evaluation of predictive models with literature data – consequences to quantitative microbiological risk assessment”) it is assessed how different growth settings inherent to literature datasets affect the performance of a growth model compared to its performance with the data used to generate it. The effects on model performance of the number of observations, the ranges of temperature, water activity and pH under which observations were made, the presence or absence of lactic acid in the growth environment, the use of a pathogenic or non-pathogenic strain, and the type of growth environment are analysed. Model performance is measured in terms of DifAf, the difference between the accuracy factor (Af) of the model with the data used to generate it and the Af with an independent dataset. The study is performed using a square root-type model for the growth rate of Escherichia coli in response to four environmental factors, together with literature data that have previously been used to evaluate this model. It is hypothesised that the Af of the model with the data used to generate it reflects the model’s best possible performance, and hence that DifAf is smaller and less variable when the conditions of an independent dataset are closer to those of the data that originated the model. The distributions of DifAf values obtained with different datasets are compared graphically and statistically.

The results suggest that if predictive models developed under controlled experimental conditions are validated against independent datasets collected from published literature, these datasets must contain a high number of observations and be based on similar experimental growth media in order to reduce the variation in model performance. Reducing this variation also decreases the contribution of the predictive model to the sources of uncertainty and variability in QMRA, which positively affects the precision of the risk estimates.
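The DifAf measure used in manuscript III can be sketched as follows. The abstract does not give the formula for Af, so this sketch assumes the commonly used accuracy factor, 10 raised to the mean absolute log10 ratio of predicted to observed values. The sign convention (independent minus generating) and the function names are also assumptions.

```python
import math


def accuracy_factor(predicted, observed):
    """Accuracy factor Af = 10 ** mean(|log10(predicted / observed)|).

    Assumed definition (not stated in the abstract). Af equals 1 for perfect
    agreement and grows as predictions deviate from observations in either
    direction. All values must be strictly positive.
    """
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("predicted and observed must be non-empty and of equal length")
    total = sum(abs(math.log10(p / o)) for p, o in zip(predicted, observed))
    return 10.0 ** (total / len(predicted))


def dif_af(af_generating, af_independent):
    """DifAf: difference between the Af obtained with an independent dataset
    and the Af obtained with the data used to generate the model.

    The sign convention (independent minus generating) is an assumption.
    """
    return af_independent - af_generating


# Hypothetical growth rates (1/h), for illustration only.
af_gen = accuracy_factor([0.20, 0.41, 0.79], [0.21, 0.40, 0.80])
af_ind = accuracy_factor([0.20, 0.41, 0.79], [0.15, 0.50, 0.60])
print(af_gen, af_ind, dif_af(af_gen, af_ind))
```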

To conclude, this thesis contributes to the clarification of the impact that the analysis of microbial data may have in QMRA, provides a new accurate method of fitting a distribution to microbial data, and suggests guidelines for the selection of appropriate published datasets for the validation of predictive models of microbial behaviour, before their use in QMRA.

Perspectives for future work include the validation of the method developed in manuscript I with real data, and its presentation as a tool made available to the scientific community by developing, for example, a working package for the statistical software R. Also, the author expects that a standardized way of reporting microbial counts that clearly specifies the steps taken during data collection should be adopted in the future. Extending the work presented in manuscript II will allow sounder conclusions to be drawn about the general impact of different frequency distributions on risk estimates. Following manuscript III, a simulation study could help to investigate to what level QMRA-targeted development and validation of predictive models are necessary for the accurate estimation of risk. Future needs in food microbiology and QMRA include the development of appropriate statistical methods to summarize novel data obtained from different “omics” technologies, adaptation of the current structure of QMRA studies to allow them to make use of such data, and the assessment of the variability and uncertainty attending those data.

| Field | Value |
|---|---|
| Original language | English |
| Place of Publication | Søborg |
| Publisher | National Food Institute, Technical University of Denmark |
| Number of pages | 140 |
| Publication status | Published - 2013 |

### Cite this

*The interpretation of quantitative microbial data: meeting the demands of quantitative microbiological risk assessment*. Søborg: National Food Institute, Technical University of Denmark.

**The interpretation of quantitative microbial data: meeting the demands of quantitative microbiological risk assessment.** / Ribeiro Duarte, Ana Sofia.

Research output: Book/Report › Ph.D. thesis › Research

TY - BOOK

T1 - The interpretation of quantitative microbial data

T2 - meeting the demands of quantitative microbiological risk assessment

AU - Ribeiro Duarte, Ana Sofia

PY - 2013

Y1 - 2013

N2 - Foodborne diseases carry important social, health, political and economic consequences. Quantitative microbiological risk assessment (QMRA) is a science-based tool used to estimate the risk that foodborne pathogens pose to human health, i.e. it estimates the number of cases of human foodborne infection or disease due to ingestion of a specific pathogenic microorganism conveyed by specific food products; it is also used to assess the effect of different control measures. In their role as risk managers, public authorities base their policies on the outcome of risk assessment studies. Therefore, these studies need to be transparent and affected by minimum imprecision.

M3 - Ph.D. thesis

BT - The interpretation of quantitative microbial data

PB - National Food Institute, Technical University of Denmark

CY - Søborg

ER -