Estimating breed composition for pigs: A case study focused on Mangalitsa pigs and two methods

Josue Chinchilla-Vargas*, Francesca Bertolini, K. J. Stalder, J. P. Steibel, M. F. Rothschild

*Corresponding author for this work

Research output: Contribution to journal, Journal article, Research, peer-review


Breed associations and registries maintain breed purity by enforcing the conformational characteristics that define the breed and by cataloging the pedigree of every animal in the registry. Furthermore, niche markets are often built on specialized products from heritage breeds, for which breed purity must be guaranteed. Genomic technology and the progressively lower cost of genotyping can help assess breed purity by estimating breed composition. In this research, genotypes from 648 pigs from 11 breeds were used to develop marker panels to estimate breed composition, with special emphasis on Mangalitsa pigs as a heritage breed. Two sets of panels were created. The first set was based on Fst scores calculated individually for ~31,000 available markers across the pig genome; panels composed of the 10, 50, 100, 500 and 1000 markers with the highest Fst scores were generated. The second set was composed of randomly selected markers, with the same numbers of markers as the Fst-derived panels. Two statistical methods, linear regression and random forest, were then applied to the marker panels to estimate the breed composition of 107 pigs, including 47 individuals known to have Mangalitsa background. Fst-selected markers appeared to be better than random markers at identifying Mangalitsa individuals, regardless of the method used to estimate breed composition. However, random markers were more accurate at estimating breed composition for non-Mangalitsa individuals. When results were compared across methods, linear regression produced more accurate estimates of breed composition than random forest. However, both methods lacked accuracy when estimating breed composition for crossbred individuals.
It must also be noted that these methods focused on estimating the breed composition of Mangalitsa pigs. Different markers should be selected if other breeds are the focus, and the accuracy of prediction will depend on the breeds available as references for the Fst calculations. The results presented in this study allow us to conclude that: 1) Random forest was effective at classifying individuals into breeds, but less effective than linear regression at estimating breed composition. 2) Markers filtered using Fst scores are more effective at identifying Mangalitsa breed composition but less effective at identifying other breeds. 3) If Fst-filtered markers that distinguish Mangalitsa from other breeds are used to estimate breed composition for individuals of other breeds, a greater number of markers is needed.
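The abstract does not give the exact formulas used, so the following is only a minimal sketch of the general idea: rank markers by Fst, keep the top k as a panel, then regress an individual's genotype on reference-breed allele frequencies. Everything specific here is an assumption for illustration and not the authors' implementation: the Nei-style estimator Fst = (Ht - Hs) / Ht averaged equally over breeds, the clipping of negative regression coefficients followed by rescaling to sum to 1, and the hypothetical function names `fst_per_marker`, `top_fst_panel` and `estimate_composition`.

```python
import numpy as np


def fst_per_marker(freqs):
    """Nei-style Fst per marker (assumed estimator, not taken from the paper).

    freqs: array of shape (n_breeds, n_markers) holding the reference
    allele frequency of each marker in each reference breed.
    """
    p_bar = freqs.mean(axis=0)                         # mean frequency across breeds
    h_t = 2.0 * p_bar * (1.0 - p_bar)                  # total expected heterozygosity
    h_s = (2.0 * freqs * (1.0 - freqs)).mean(axis=0)   # mean within-breed heterozygosity
    with np.errstate(divide="ignore", invalid="ignore"):
        # Monomorphic markers (h_t == 0) carry no information; score them 0.
        return np.where(h_t > 0, (h_t - h_s) / h_t, 0.0)


def top_fst_panel(freqs, k):
    """Indices of the k markers with the highest Fst scores."""
    return np.argsort(fst_per_marker(freqs))[::-1][:k]


def estimate_composition(dosage, freqs):
    """Least-squares breed proportions for one animal.

    dosage: array of shape (n_markers,) with allele counts (0, 1 or 2).
    freqs:  array of shape (n_breeds, n_markers) for the same markers.
    """
    y = dosage / 2.0                                   # put dosage on the frequency scale
    coef, *_ = np.linalg.lstsq(freqs.T, y, rcond=None)
    coef = np.clip(coef, 0.0, None)                    # assumption: drop negative estimates
    total = coef.sum()
    return coef / total if total > 0 else coef         # assumption: rescale to sum to 1
```

A panel would then be applied as `estimate_composition(dosage[panel], freqs[:, panel])`, where `panel = top_fst_panel(freqs, 100)`. A random panel of the same size can be drawn with `np.random.default_rng().choice(n_markers, 100, replace=False)` for comparison.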
Original language: English
Article number: 104398
Journal: Livestock Science
Number of pages: 10
Publication status: Published - 2021


  • Mangalitsa
  • Mangalica
  • Swine
  • Breed composition
  • Random forest
  • Linear regression

