Methodology

Abstract

Closed studbook dog breeds present a population-genetics problem distinct from the one most of the field's common tools were built to address. Inbreeding coefficients, whether pedigree-based, genetic, or genomic, estimate how much of an individual's genome is identical by descent. They are valid measures when used for the purpose they were designed for. They are not, however, instruments for managing the allelic richness of a breed over the decades-long time scale on which closed gene pools actually evolve. That is a different question, and breeders asking it have been underserved by the tools available to them.

This paper describes the framework BetterBred uses for population-level diversity management in closed studbook breeds, and the reasoning behind it. The framework's origin is practical rather than theoretical: it began with the puzzle of complex autoimmune disease appearing across bloodlines that breeders believed were unrelated, and with the recognition that pedigree data, however carefully kept, could not resolve which portions of a shared ancestral genome any given dog had actually inherited. The 33-locus microsatellite panel developed at the University of California Davis Veterinary Genetics Laboratory produced the data that pedigrees could not. BetterBred was built to make that data usable by individual breeders when making breeding decisions.

Three BetterBred metrics are described: the Outlier Index (OI), which identifies dogs carrying a disproportionate share of a breed's less-common alleles; Average Genetic Relatedness (AGR), a molecular analog of mean kinship (Ballou and Lacy, 1995); and Internal Relatedness (IR), retained as a monitoring metric for individual inbreeding rather than as a breeding target. The paper explains why microsatellite markers remain the appropriate instrument for resolving population changes within modern breed time scales (approximately 150 years, 30–50 generations), why the 33-locus panel provides haplotype-level resolution at the DLA region that SNP arrays cannot reliably match, and how the framework relates to SNP-based genomic COI methods as complementary rather than competing tools.

Inbreeding avoidance does slow the loss of variation in a closed population relative to random mating, and breeders have used it toward that end for decades. But inbreeding measures ask only whether a dog's alleles are the same or different — not whether they are common or uncommon in the breed. The working hypothesis BetterBred has operated under across more than a decade of enrolled-breed data is that methods working directly on allele frequencies preserve more variation per generation than inbreeding avoidance alone. The paper is written for breeders and researchers working against the more familiar mental model, one in which the primary task of genetic management is avoiding inbreeding. That model is not wrong. It is incomplete.

1. How We Got Here

1.1 A Brief History of Genetic Management and Testing

For most of the past fifty years, genetic management in closed studbook breeds has meant finding disease genes one at a time, and usually assuming they were recessive. Clear, carrier, affected — three known states. From a carrier-to-carrier pairing, the expected distribution is one-quarter clear homozygotes, one-half carriers, and one-quarter affected homozygotes. Before DNA tests existed for any of it, most breeders reasoned that the responsible choice was to keep only the clear dogs, because clear dogs could not produce affected puppies no matter who they were bred to. That logic is sound as far as it goes, but in practice it meant retaining only the roughly twenty-five percent of a breed that was homozygous clear, even though another fifty percent were unaffected carriers who could have been bred safely to clear mates. The unintended consequence was a substantial loss of diversity — and because fewer dogs were being used, a corresponding rise in average inbreeding.

That problem eased as commercial DNA tests became available. Once a carrier could be identified before breeding, a carrier could be bred to a clear dog with no risk of affected offspring, and the breed no longer had to lose three-quarters of its potential breeding population to control one recessive. This was the first clear case in which diversity-awareness improved both individual health outcomes and the breed's genetic health together, and a great many breeders donated time, money, and samples toward making those tests available for their breeds.
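The ratios behind the carrier-clear-affected framework can be checked by enumerating the allele combinations at a single locus. The sketch below is illustrative only: the allele labels 'N' and 'm' and the function names are assumptions made for this example, and it assumes a simple autosomal recessive with full penetrance.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical labels: 'N' = normal allele, 'm' = recessive disease allele.
STATUS = {"NN": "clear", "Nm": "carrier", "mm": "affected"}


def offspring_distribution(sire, dam):
    """Expected genotype proportions from one pairing at a single locus.

    Each parent is a two-character genotype string such as "Nm".
    Every combination of one sire allele and one dam allele is equally likely.
    """
    counts = Counter("".join(sorted(pair)) for pair in product(sire, dam))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def by_status(sire, dam):
    """Same distribution, keyed by clear / carrier / affected."""
    return {STATUS[g]: p for g, p in offspring_distribution(sire, dam).items()}


carrier_x_carrier = by_status("Nm", "Nm")
carrier_x_clear = by_status("Nm", "NN")

print(carrier_x_carrier)  # one-quarter clear, one-half carrier, one-quarter affected
print(carrier_x_clear)    # half clear, half carrier, no affected offspring
```

Under these assumptions a carrier bred to a clear mate produces no affected puppies, which is why a carrier that can be identified in advance does not have to be removed from breeding.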

The recessive model did not cover everything, though, and that fact took a while to become clear. Some conditions that looked like recessives on pedigree inspection turned out not to segregate in the expected ratios. Some were far more common than recessive inheritance would explain; others were far less common. Incomplete penetrance, later-onset expression, dominant genes with variable effect, polygenic thresholds — the actual genetic architecture of breed-specific disease was more complex than the carrier-clear-affected framework could handle. Breeding strategies that had worked for straightforward recessives could not be adapted in any obvious way to conditions where multiple contributing variants had to come together for a disease to express.

Some breeders responded by stepping back — healthy dogs were still being produced, and they were not going to stop breeding the good dogs they had. Others, usually those with direct experience of producing affected puppies and the families who loved them, went looking for other answers, and some of them found their way to population genetics. Across the same period, genetic bottlenecks in closed studbook breeds became harder to ignore. Pedigree research was documenting them. Breed-specific disease patterns were doing the same. Pugs had Pug diseases. Standard Poodles had their own, and the Miniature and Toy varieties shared a different set. Dobermans had Dobermans' demons. Golden Retrievers were dying earlier and earlier of cancer. Not every dog in any of those breeds was affected. Too many were.

Lowering inbreeding made intuitive sense to breeders who had lived through the transition from pre-testing to post-testing recessive control. A great deal of funding and volunteer time followed that instinct — swabs collected, blood donated, pedigree databases built. "Genetic diversity" became a buzzword, and like most buzzwords it was absorbed at varying depths. Some breeders rejected the whole framework, reasoning that the recent wave of genetic science hadn't made their breeds healthier on balance. Some held to the older model of finding a single gene for every problem. Some began learning from the scientific sources and advocacy communities that were promoting diversity-aware breeding and, in a few cases, outcrossing. That is largely where the field still is — divided, well-intentioned, and working with tools that were built for a narrower question than the one in front of it.

1.2 The Question Has Changed

For the breeders most focused on genetic diversity, the working question has long been a version of: how inbred is this dog? The tools built to answer that question have given breeders something they did not previously have: pedigree coefficients of inbreeding, genetic COI measured from STR markers, genomic COI measured from SNP arrays, and Internal Relatedness (Amos et al., 2001). They quantify the relatedness implied by a pedigree, or the homozygosity observed in the DNA itself, and when used for the purpose they were designed for, they work. Pedigree COI was better than nothing. STR-based genetic measures are better than pedigree and, usefully, distinguish between siblings where pedigree COI cannot. SNP-based genomic measures, drawing on tens of thousands of loci, add another layer of precision for the within-individual inbreeding question. None of these measures is interchangeable with the others, but all of them answer variants of the same underlying question.

The cultural weight breeders have been taught to place on inbreeding measures comes from a specific historical concern. Inbreeding control protects against the expression of autosomal recessive disease by reducing the probability that two copies of a deleterious allele land in the same dog. In the late twentieth century, when novel recessives were being discovered from observation alone and no DNA test could detect a carrier, that concern was legitimate and the caution proportionate. The risk landscape today is materially different. The great majority of commercially consequential autosomal recessives in purebred dogs are now directly testable — breeders can identify carriers before breeding — and genuinely novel recessives are rare enough that blanket avoidance of inbreeding, or of its milder form linebreeding, is not as crucial as it once was. Within a closed pool, neither close inbreeding nor maximally outbred pairings reliably preserve allelic richness: the first elevates homozygosity, while the second can dilute remnant-line alleles by half each generation against a numerically dominant background. Section 3 returns to that mechanism in detail.

What inbreeding control does not address, even at its best, is the class of diseases that now occupies most of the attention of breed health committees: complex, polygenic conditions in which multiple contributing variants must be present together for the disease to express. Outbreeding within a closed pool offers no reliable protection against complex disease, and can in some circumstances assemble the full complement of risk variants faster than close breeding would, because the contributing variants are scattered through different bloodlines of the same breed. Breaking up a disease haplotype requires a mate who carries genuinely different alleles, not simply other common alleles — and in a breed where most dogs trace back to the same small set of ancestors, mating two unrelated-looking pedigrees often just recombines variants from the same shared background. Inbreeding coefficients do not distinguish between these two cases because they do not measure what the alleles are, only whether a dog's two copies at each locus are the same or different. A dog carrying two different but common alleles, both inherited from the bottleneck, reads as healthily heterozygous by any inbreeding measure. So does a dog carrying two different alleles from genuinely separate family lines. The inbreeding measure cannot see the difference between same-family variation and between-family variation, because that distinction is about which alleles a dog carries, not how homozygous the dog is.

The more fundamental limitation, though, is not about disease. It is about what happens to a closed population over time. When a gene pool has been closed for eighty, or a hundred and fifty, years, the question that determines the breed's long-term viability is not primarily — extreme cases aside — how inbred the individual dog in front of the breeder is. It is whether the breed, as a population, still contains the range of genetic variation it will need a century from now, and whether the dogs being bred today are contributing that variation or narrowing it. In every closed population, low-frequency alleles are lost to genetic drift at a steady rate. Some of those alleles are the oldest variants in the breed — genetics that evolved over eons, carried forward from the breed's original founders, now represented by a small number of dogs in lines that have become less known, sometimes for good reason, and sometimes for no good reason at all. The same numerical forces that make a popular sire's alleles ubiquitous make the alleles of a less-popular line harder to preserve every generation. Drift is patient and one-directional. Once an allele is gone from a closed pool, it is gone.

This is not a theoretical concern. The conservation-genetics literature has been working on it carefully for decades. Ballou and Lacy (1995) formalized mean kinship as the governing criterion for managing variation in captive zoo populations, and the follow-up work across conservation genetics has consistently reinforced the same point: in finite, closed populations, allelic richness — the range of distinct alleles retained across the genome — is the property that has to be actively managed, because heterozygosity can be maintained, or even rise, while allelic richness erodes. The two are related outcomes but they are not the same outcome, and optimizing one does not automatically serve the other.

In canine research, the same lesson has accumulated case by case. Microsatellite analyses of closed dog breeds have documented compromised diversity across many populations, often with pedigree-based and molecular-based estimates disagreeing in ways that matter for management (Leroy et al., 2009; Wijnrocx et al., 2016). A sustained body of UC Davis work using the 33-locus VGL panel has characterized specific breeds — Standard Poodles (Pedersen et al., 2015a), Italian Greyhounds (Pedersen et al., 2015b), Samoyeds (Pedersen et al., 2017) — in terms of allele frequency distributions, DLA haplotype inventories, and the concrete decisions available to breeders working with each situation. The picture is not uniform across breeds. Some still carry substantial allelic richness and can maintain it with active management. Others have lost enough that the question shifts to what can be done with what remains.

The framework this paper describes did not begin with allele frequencies. It began with a puzzle that was not solely about inbreeding and not solely about disease. In Standard Poodles, Addison's disease and sebaceous adenitis were appearing across bloodlines that breeders believed were unrelated, and neither condition behaved like a simple autosomal recessive. A considerable cooperative effort among breeders, led by experienced long-time breeders, had produced a pedigree database with verified veterinary diagnoses, and it was instructive for tracing the sources of these common diseases. At the same time, not every dog from the most bottleneck-dense North American kennels was sick. Some were notably healthy, from the same kennels and sometimes the same litters as affected dogs. The pattern of inheritance ruled out single-gene explanations and suggested that what mattered was which portions of the shared ancestral genome a given dog had actually inherited. Pedigrees could not always resolve that question. Every puppy in a litter has the same pedigree-based coefficient of inbreeding; two littermates can carry materially different inherited genetics and no pedigree will distinguish them. Old-time breeders had long understood the consequence of this: starting from a shared set of foundation dogs, different breeders develop distinctly different lines — different in appearance, in working qualities, and presumably in what they have preserved of the breed's earlier variation. The pedigree shows the founders; it does not show which of the founders' genetics still persist in any given living dog.

The second limitation of pedigree research was simpler. Even a careful pedigree only reaches as far back as the records allow, and for dogs outside the North American mainstream the records often did not reach far. Around 2005, an effort was made to locate genuinely non-bottleneck ancestry by importing three show-quality black Standard Poodle puppies from Prague, on the hypothesis that the Iron Curtain had kept Czech breeding isolated from the Wycliffe-dominated North American lines. The written pedigrees reached back to the late 1970s and early 1980s and appeared to confirm that assumption. Not long after the puppies arrived, the deeper pedigrees were filled in. The outcomes (epilepsy in one puppy, severe temperament problems in another, Addison's disease in an offspring of the third) were not a surprise once the full ancestry became visible. The three puppies were littermates, and both their parents traced back, repeatedly, to multiple Wycliffe-bred dogs.

Wycliffe Timoteo was an extremely inbred Canadian dog whelped in 1970, with a pedigree coefficient of inbreeding of 41 percent over fifteen generations and 100 percent Wycliffe ancestry. His ancestors all went back to the heart of what is called the Mid-Century Bottleneck (MCB), a group of ten Standard Poodles born between 1948 and 1952 to whom every Standard Poodle on earth today owes about half of its ancestry. Timoteo himself was not used heavily in Europe, but he appeared behind this single litter more than a dozen times. Exported from Canada to East Germany, he produced several litters there, and his descendants, together with those of a few very close cousins exported from the same Wycliffe program, populate European Standard Poodle lines today. The apparent separation between European and North American Standard Poodles was largely illusory. Timoteo was, no doubt, a tall, elegant dog like his cousins, imported at a moment when European breeders wanted their own lines to be taller and more elegant, and the same popular-sire dynamics that consolidated the mid-century North American bottleneck played out in Europe through his lineage and that of his close relatives, drowning out much of the older landrace genetics still present at the time. A small number of genuinely non-bottleneck lines did remain in the breed, traceable chiefly through some apricot and red lines and a few rare brown and silver lines, but they were far fewer than had been hoped.

These experiences framed the question that eventually became the starting point of BetterBred: whether DNA testing could tell breeders which dogs still carried some of the breed's historical genetics, if any remained, and how much they carried, and whether that information could be used to preserve and increase what was left. A related question came with it: could the same data distinguish the dogs who remained healthy despite a bottleneck-dense background from the ones who did not, based on how much of that bottleneck influence they carried? At roughly the same time, Dr. Niels Pedersen and the UC Davis Veterinary Genetics Laboratory were developing a 33-locus canine microsatellite panel and looking for populations on which to test it. The Standard Poodle genetic diversity community had dogs that needed testing and questions the panel was structured to address. BetterBred grew out of what followed — specifically, out of the problem of making the VGL results usable by individual breeders, each working toward their own goals but all concerned with the long-term health of the breed. Numbered alleles for individual dogs, even with their frequencies, are not something breeders can translate into actionable information. That interpretation was the need BetterBred was built to fulfill.

The goal of active management is not preservation of a static population. It is rebalancing — pulling low- and mid-frequency alleles upward so that the breed's older variation is not lost to drift, and reducing the numerical dominance of alleles that are already secure. A breeder who selects, from a litter of healthy, typical-quality puppies, the one that carries more of the breed's less-common VGL-identified alleles is not doing anything exotic. She is letting a puppy that represents more of the breed's less-common ancestry enter the breeding pool, rather than defaulting to the puppy that is genetically more typical of dogs already there. Deliberate negative selection against deleterious alleles continues unchanged — breeders select healthy parents, produce healthy litters, do the DNA testing and health screening available to them. What changes is which of the healthy dogs get to contribute to the next generation. That decision, accumulated across a breed community over many generations, is what determines whether the breed carries forward the breadth of variation that reached it from its founders, or a progressively narrower slice of it.

Inbreeding avoidance does slow the loss of variation in a closed population relative to random mating, and breeders have used it toward that end for decades. But inbreeding measures only ask whether a dog's two alleles at a locus are the same or different. They do not ask whether those alleles are common or uncommon in the breed. A dog homozygous for a common allele and a dog homozygous for a rare allele are treated identically by a conventional inbreeding coefficient, even though their contributions to the breed's remaining variation are not remotely the same. BetterBred's working hypothesis, developed across more than a decade of work with enrolled breeds and multiple generations of tested dogs in some of them, is that methods working directly on allele frequencies, by preferentially retaining dogs whose genetics are less represented in the current population, preserve more variation per generation than inbreeding avoidance alone. The framework described in this paper was built around that hypothesis.

The rest of the paper describes the tools this framework uses, the decisions they support, and the limits of what they can tell a breeder. One thing this paper does not try to do: it does not claim that any single metric is "the answer." A framework is not a metric. The point is that the question has changed, and the tools have to change with it.

2. What Closed Studbook Breeds Are, as Populations

A closed studbook is a specific kind of biological object. For conservation genetics purposes it sits somewhere between a wild outbred population and a laboratory inbred line — closer to a managed zoo population with restricted immigration than to either extreme, but not identical to any of them. The differences matter for which tools apply and how.

In a wild outbred population, migration between local groups continually introduces new variation, drift is counterbalanced by gene flow, and allelic richness tends to stabilize at a level set by the balance between incoming variation and losses to drift. The tools developed in that context (Wright's F-statistics, classical population-genetics models that assume migration, heterozygosity estimators calibrated for outbreeding) apply cleanly to the problem they were built for. They apply less cleanly to closed studbook breeds, in which migration is by definition absent. When the studbook closes, the pool of genetic variation in the breed is fixed at whatever the founders happened to carry, and every generation after that can only work with what remains.

In a managed zoo population, a small number of animals are bred under centralized coordination. Studbook keepers track pedigrees, calculate mean kinship, and allocate breeding opportunities to minimize the loss of founder variation over generations. Ballou and Lacy (1995) formalized this work as a set of principles that have guided zoo conservation genetics for thirty years. Dog breeds share the closed-pool structure of managed zoo populations, and the theoretical framework carries across — but dog breeds differ from zoo populations in four consequential ways.

First, dog breeds are not small. A numerically large breed may have hundreds of thousands of living individuals, which creates a false sense of genetic security: a breed can have tens of thousands of registered dogs and an effective population size in the low hundreds, because most of those dogs trace back to a small number of influential ancestors whose genetic contribution is still dominant. Population size and genetic diversity are not the same thing, and in closed breeds they routinely diverge substantially.
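One well-known contributor to the gap between census size and effective size is an uneven number of breeding males and females. The sketch below uses the standard textbook approximation Ne = 4 * Nm * Nf / (Nm + Nf); the formula is not taken from this paper, the sire and dam counts are hypothetical, and the approximation ignores other factors such as unequal family sizes and shared ancestry, which usually push the effective size lower still.

```python
def effective_size_sex_ratio(n_sires, n_dams):
    """Textbook approximation of effective population size under unequal sex ratio.

    n_sires and n_dams are the numbers of males and females that actually breed
    in a generation, not the number of registered dogs.
    """
    return 4 * n_sires * n_dams / (n_sires + n_dams)


# Hypothetical breed: 2,000 bitches bred in a generation, but only 50 sires used.
skewed = effective_size_sex_ratio(50, 2000)

# The same number of breeding dogs split evenly between the sexes.
balanced = effective_size_sex_ratio(1025, 1025)

print(round(skewed, 1), round(balanced, 1))
```

With these made-up numbers, 2,050 breeding dogs behave genetically like a population of roughly two hundred, because the small number of sires limits how many distinct paternal contributions reach the next generation.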

Second, dog breeds are under strong artificial selection for non-random traits — conformation, temperament, working qualities, color. Zoo populations are selected chiefly for health and survival; dog breeds are selected for a long list of additional phenotypes, some of which correlate with specific genetic backgrounds in ways that concentrate ancestry and accelerate loss of variation. A zoo geneticist minimizing mean kinship across a breeding program operates under fewer phenotypic constraints than a breeder trying simultaneously to preserve diversity, pass health clearances, and produce dogs that can succeed in the show ring, the field, or the home.

Third, dog breeding operates without centralized coordination. No registry assigns breeding allocations. Every breeder decides for themselves which of their dogs will contribute to the next generation and which will not, constrained only by club rules, national law where it applies, and the availability of suitable mates. This is the most consequential difference from the zoo case. Optimal Contribution Selection — the formal method by which managed populations allocate breeding contributions to minimize inbreeding accumulation — cannot be applied to a population that has no central allocator. Any framework intended for working breeders has to operate at the decision point breeders actually control: which of their own dogs to breed, and which puppies from the resulting litters to retain.

Fourth, dog breed communities are also cultural communities. Breed clubs have traditions, aesthetic preferences, rivalries between kennels, and histories that shape which lines are bred to which. These influences are not noise around the biology; they are part of the selection pressure operating on the breed. A diversity tool that treats the breed only as a genetic population, without acknowledging that breeders are people making decisions within a community, will produce recommendations that do not travel well from laboratory to kennel.

These four differences do not invalidate the Ballou and Lacy framework. They require its adaptation. The work of the rest of this paper is to describe what that adapted framework looks like — what it measures, what it does not measure, and how its outputs turn into decisions a breeder can actually make.

3. Allelic Richness and Why It Matters More Than Heterozygosity

Two measures get conflated in casual conversation about genetic diversity, and the conflation is the source of a lot of unproductive disagreement. Heterozygosity is a property of an individual or of a population at a given moment — the proportion of loci at which two different alleles are present. Allelic richness is the count of distinct alleles at each locus existing in the breed.

The two are not parallel properties. Allelic richness influences heterozygosity directly: the more distinct alleles exist at a locus in the breed, the higher the probability any individual dog will inherit two different ones. A locus with ten alleles circulating in the breed supports higher heterozygosity than a locus with three. But the reverse relationship does not hold. A locus with only two or three alleles present can still show high heterozygosity if those few remaining alleles happen to be balanced in frequency — and that is the condition the rest of this paper is most concerned with, because it is the condition that makes heterozygosity alone a misleading measure of a breed's genetic state. A breed can appear diverse by heterozygosity while actually running short on the variation that produces it.

The difference is easiest to picture with a tangible analogy. Think of a breed's gene pool as a drawer of t-shirts at each locus, with each color a different allele. Some drawers contain many colors; some contain only a few. Allelic richness is the count of colors in the drawer. The evenness of those colors matters separately: a drawer with mostly green shirts and a few red, blue, and yellow ones has four colors but behaves, when you reach in, almost as if the drawer had effectively only one and a half colors — green is what you usually pull out. A drawer with equal numbers of green, red, blue, and yellow gives you a different color most times you reach. Heterozygosity is expected to be higher when the t-shirts are at similar numbers per color, because two dogs reaching into similar drawers are more likely to combine to give a puppy two different alleles at that locus. Allelic richness reports how many alleles (colors) are available. Their distribution within each locus (drawer) will say which ones are most likely to be used.
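The drawer analogy can be put into numbers with two standard population-genetics quantities: expected heterozygosity (one minus the sum of squared allele frequencies) and the effective number of alleles (the reciprocal of that sum). The sketch below is illustrative; the frequencies are invented for the example and are not VGL data.

```python
def expected_heterozygosity(freqs):
    """Chance that two alleles drawn at random from the breed are different."""
    return 1.0 - sum(p * p for p in freqs)


def effective_alleles(freqs):
    """Number of equally common alleles that would give the same heterozygosity."""
    return 1.0 / sum(p * p for p in freqs)


# Hypothetical drawers (allele frequencies at one locus, each summing to 1).
mostly_green = [0.80, 0.07, 0.07, 0.06]   # four colors, one dominant
four_even = [0.25, 0.25, 0.25, 0.25]      # four colors, balanced
three_even = [1 / 3, 1 / 3, 1 / 3]        # one color lost, the rest balanced

for name, drawer in [("mostly green", mostly_green),
                     ("four even", four_even),
                     ("three even", three_even)]:
    print(name, len(drawer),
          round(expected_heterozygosity(drawer), 3),
          round(effective_alleles(drawer), 2))
```

In this example the mostly green drawer holds four alleles but behaves like about one and a half, while the drawer with only three balanced alleles shows higher expected heterozygosity than the four-allele skewed one. Heterozygosity alone therefore cannot reveal how many alleles remain.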

This distinction matters because the two properties respond differently to selection pressure and to drift in a closed gene pool. A breed that loses a color from the drawer entirely, because the allele has drifted out of the population, will continue to show high heterozygosity at that locus if the colors that remain are present at balanced frequencies. The measurement reads as healthy. The biological reality is that the breed now has fewer distinct options at that locus than it had before, and once a color is gone from a closed pool, it is gone. Outcrossing is the only way to restore it, and outcrossing is not available in a closed studbook without formal registry action.

The mechanism by which alleles disappear from closed populations is genetic drift. In any finite population, the allele frequencies at the next generation are a sample of the current generation's frequencies, and samples have sampling error. A common allele in one generation may appear at a slightly lower frequency in the next generation purely by chance. A rare allele may disappear entirely in a single generation, if none of the dogs that carried it are bred or if their offspring do not inherit that particular allele at that particular locus. Drift is patient. It works locus by locus, generation by generation, and it favors no particular allele — but because rare alleles have less buffer against sampling error than common alleles do, drift disproportionately removes the less-common variants over time. Over decades, in a population of a few hundred breeding dogs, the cumulative effect is substantial.
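The difference in buffer between rare and common alleles can be illustrated with a one-generation calculation. The sketch below assumes an idealized Wright-Fisher population, in which the next generation's gene copies are drawn independently and at random from the current frequencies; real breeds depart from this, and uneven use of sires generally makes losses more likely. The population size and frequencies are hypothetical.

```python
def prob_lost_in_one_generation(freq, breeding_dogs):
    """Chance an allele at frequency `freq` is absent from the next generation.

    Idealized assumption: the next generation consists of `breeding_dogs`
    diploid animals, so 2 * breeding_dogs gene copies are sampled independently.
    The allele is lost only if every one of those draws misses it.
    """
    gene_copies = 2 * breeding_dogs
    return (1.0 - freq) ** gene_copies


# Hypothetical breed with 100 dogs contributing to the next generation.
rare = prob_lost_in_one_generation(0.01, 100)     # allele at 1 percent
common = prob_lost_in_one_generation(0.30, 100)   # allele at 30 percent

print(round(rare, 3))   # about 0.13
print(common)           # vanishingly small
```

Under these assumptions an allele at 1 percent has roughly a one-in-eight chance of vanishing in a single generation, while an allele at 30 percent is effectively safe. Repeated every generation, that asymmetry is what removes the less-common variants first.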

This is where popular sires, or more generally the uneven reproductive success of lines within a breed, compound drift's ordinary effect. When one sire produces hundreds of offspring while dogs from less-represented lines produce none, the less-common alleles those unrepresented dogs carry are disproportionately at risk. They may be present in the breed at the start of a generation and absent at the end, not because anyone chose against them, but because the breeders producing the next generation happened not to work with the lines that carried them. Pedigree COI or IR will not detect this. Neither metric tracks which alleles are present in the population at all; they measure, for individual dogs, the degree of allele sharing within or between genomes. A breed can lose a locus's worth of variation entirely while its average inbreeding coefficient holds steady.

Heterozygosity-maximizing selection strategies can make this worse in a specific and predictable way. A strategy that consistently pairs the most genetically distant available mates — which will often identify pairings between dogs from dominant lines and dogs from remnant, less-represented lines — produces heterozygous offspring at the cost of diluting the less-represented parent's unusual contribution by half every generation. The unusual alleles do not disappear in one generation. They disappear over five or ten, as each subsequent pairing repeats the same dilution against a numerically larger dominant-line population. The dogs that result from this pattern are not inbred; in fact, they are often the most heterozygous dogs in the breed. What they are not is representative of the breed's full allelic range. Something has been lost that the heterozygosity measure was never built to detect.
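The halving described above compounds quickly. The sketch below is a simplified illustration: it assumes that in every generation the descendant is mated to a dominant-line dog carrying none of the remnant line's distinctive alleles, and it tracks only the expected share of ancestry, not the fate of any particular allele.

```python
from fractions import Fraction


def expected_remnant_share(generations):
    """Expected fraction of a descendant's genome tracing to the remnant-line dog.

    Generation 0 is the remnant-line dog itself; each later generation is
    assumed to be an outcross to a dog with no remnant-line ancestry.
    """
    return Fraction(1, 2) ** generations


for n in (1, 2, 5, 10):
    share = expected_remnant_share(n)
    print(n, share, f"{float(share):.2%}")
```

After five such pairings the expected share is one thirty-second, about 3 percent, and after ten it is under a tenth of a percent, which matches the five-to-ten-generation horizon over which the unusual alleles fade.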

None of this is an argument against heterozygosity as a property worth measuring or valuing. Dogs homozygous at many loci are more likely to express homozygous-recessive disease, and heterozygosity estimates are useful to individual breeders making individual decisions about individual dogs. The argument is narrower: at the level of the breed as a population managed over decades, allelic richness is the resource, and heterozygosity is a downstream reading that depends on how evenly the remaining alleles are distributed. The two measures answer different questions, and optimizing for one does not automatically serve the other.

The framework described in the rest of this paper treats allelic richness as the primary conservation target in closed studbook breeds. The tools it uses are designed to identify dogs whose genetics are less represented in the current population — dogs whose contribution to the next generation increases the breed's retained allelic range rather than narrowing it further — and to guide breeding decisions toward retaining those contributions across generations. Heterozygosity is monitored but is not the objective. The objective is keeping the number of colors in each drawer intact, and keeping them at relatively similar numbers each.

4. Two Measurements and What They Mean

Genetic management in a closed studbook breed draws on two different kinds of measurement. The first asks how much of a dog's genome is identical-by-descent — how much of what the dog inherited from its sire is the same genetic material as what it inherited from its dam. The second asks how common the genetic material a dog carries is in the breed — whether its specific alleles are found in a great many other dogs, some other dogs, or very few other dogs. The first measure is heterozygosity, and the second is allelic richness.

The word diversity is used for both of these things, and the two meanings are often treated as interchangeable. They are not. One measure tells you whether a dog's allele pairs are the same or different. The other tells you which alleles the dog carries, and how common they are in the breed. A system for managing a closed population must address both and ignore neither.

Inbreeding-based measures

Inbreeding has been estimated three different ways in dogs: from pedigrees (pedigree COI), from SNP arrays that identify runs of homozygosity across the genome, and through Internal Relatedness (IR), an algorithm developed originally for wild-population studies using microsatellites where pedigrees are unavailable (Amos et al., 2001). IR requires polymorphic loci as input; in the BetterBred software, it is derived from the VGL panel's 33-locus microsatellite genotypes combined with breed-wide allele frequencies, and assigns higher values when a dog is homozygous for rare alleles than when it is homozygous for common ones.
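As a rough illustration of how the published IR statistic behaves, the sketch below implements the formula given by Amos et al. (2001): IR = (2H - sum of f) / (2N - sum of f), where H is the number of homozygous loci, N is the number of loci typed, and the sum runs over the population frequencies of every allele in the dog's genotype. This is not BetterBred's production code. The data layout, locus names, and allele frequencies are invented for the example.

```python
from typing import Dict, Tuple

Genotype = Dict[str, Tuple[str, str]]        # locus -> (allele 1, allele 2)
Frequencies = Dict[str, Dict[str, float]]    # locus -> allele -> breed frequency


def internal_relatedness(genotype: Genotype, freqs: Frequencies) -> float:
    """Internal Relatedness as published by Amos et al. (2001)."""
    homozygous_loci = 0
    freq_sum = 0.0
    for locus, (a, b) in genotype.items():
        if a == b:
            homozygous_loci += 1
        # Add the breed-wide frequency of both alleles the dog carries.
        freq_sum += freqs[locus][a] + freqs[locus][b]
    n_loci = len(genotype)
    return (2 * homozygous_loci - freq_sum) / (2 * n_loci - freq_sum)


# Hypothetical three-locus breed: allele "A" is common, allele "B" is rare.
freqs = {locus: {"A": 0.8, "B": 0.2} for locus in ("LOC1", "LOC2", "LOC3")}

# Both dogs are homozygous at one locus and heterozygous at the other two.
hom_common = {"LOC1": ("A", "A"), "LOC2": ("A", "B"), "LOC3": ("A", "B")}
hom_rare = {"LOC1": ("B", "B"), "LOC2": ("A", "B"), "LOC3": ("A", "B")}

ir_common = internal_relatedness(hom_common, freqs)
ir_rare = internal_relatedness(hom_rare, freqs)
print(f"homozygous for the common allele: IR = {ir_common:.3f}")
print(f"homozygous for the rare allele:   IR = {ir_rare:.3f}")
```

The two example dogs have the same count of homozygous loci, yet the dog homozygous for the rare allele scores higher, which is the frequency weighting the paragraph above describes.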

Each of these methods answers the same question with different inputs and different precision. Pedigree COI underestimates true inbreeding in most closed studbook breeds because the pedigree does not reach back far enough to capture the relatedness that exists at the breed's founding. SNP-array estimates from runs of homozygosity are substantially more precise at the individual-dog level because they observe the genome directly rather than inferring from pedigree paths. IR from the VGL panel occupies a useful middle position for BetterBred's work: it uses direct genotype data, and its precision is sufficient for the decisions the platform is designed to support. Where greater precision is needed on an individual dog, SNP-array methods are a complementary option (Section 6).

All of these measurements describe the same thing: how much of what a dog inherited from its two parents is the same genetic material, versus different. That is a legitimate and useful question. It is not the only question that matters for the breed's long-term viability.

Frequency- and distance-based measures

Where inbreeding measures describe a single dog's internal state, allelic richness describes the breed's genetic range — which alleles exist in the population, and at what frequencies. It is measured at the locus level. The two standard metrics are the number of alleles observed at a locus (Na) and the effective number of alleles (Na_e), which weights Na by how evenly distributed those alleles are. A locus carrying ten alleles at roughly equal frequency has a higher Na_e than a locus carrying ten alleles where one is present in 90 percent of dogs and nine are rare. Across a panel of loci, Na and Na_e together describe how much genetic variation the breed still holds.
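The ten-allele comparison can be checked with a short sketch. It assumes the conventional population-genetics definition of the effective number of alleles, one divided by the sum of squared allele frequencies. The text describes this quantity but does not write it out, so treat the formula as an assumption about the definition in use. The function names are hypothetical, and the skewed locus assumes the nine rare alleles share the remaining 10 percent equally.

```python
from typing import Sequence


def observed_alleles(freqs: Sequence[float]) -> int:
    """Na: number of alleles present at the locus."""
    return sum(1 for p in freqs if p > 0)


def effective_alleles(freqs: Sequence[float]) -> float:
    """Na_e: 1 / sum(p_i^2), equal to Na only when all alleles are equally common."""
    return 1.0 / sum(p * p for p in freqs)


even_locus = [0.1] * 10                    # ten alleles at equal frequency
skewed_locus = [0.9] + [0.1 / 9] * 9       # one allele at 90%, nine rare alleles

for name, locus in (("even", even_locus), ("skewed", skewed_locus)):
    print(f"{name:6s}: Na = {observed_alleles(locus)}, "
          f"Na_e = {effective_alleles(locus):.2f}")
```

Both loci report Na = 10, but Na_e is 10 for the even locus and about 1.23 for the skewed one. The skewed locus behaves, for diversity purposes, much like a locus with little more than one allele.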

The measurement's power depends on markers having broad variation within and across breeds. Highly polymorphic loci are unlikely to fall inside recent selective sweeps, because if they did, variation at those loci would have collapsed. That makes their allele frequencies a reasonable way to track demographic processes — drift, bottlenecks, founder effects, popular-sire effects — rather than selection pressure at the markers themselves. The 33-locus VGL panel was selected for exactly this property (Section 5).

Allelic richness connects to heterozygosity but does not reduce to it. More effective alleles at a locus should support higher expected heterozygosity, as a matter of simple arithmetic: when more alleles circulate at meaningful frequencies, more pairings produce heterozygous offspring. The relationship runs in that direction only. Reducing inbreeding within a breeding pool does not generate new alleles or restore lost ones; it redistributes the alleles already present. A breed that has lost allelic richness cannot recover it by breeding less closely. Allelic richness is the upstream quantity, and heterozygosity follows from it.
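The arithmetic can be written out under the conventional definitions, which are assumed here rather than taken from the text. With p_i the frequency of allele i at a locus, N_a standing for Na, and N_{ae} standing for Na_e:

```latex
H_e = 1 - \sum_{i=1}^{N_a} p_i^2,
\qquad
N_{ae} = \frac{1}{\sum_{i=1}^{N_a} p_i^2}
\quad\Longrightarrow\quad
H_e = 1 - \frac{1}{N_{ae}}
```

A locus with ten effective alleles has an expected heterozygosity of 0.90, while a locus with about 1.23 effective alleles has an expected heterozygosity of about 0.19. Expected heterozygosity can rise only as far as the effective allele number allows, and mating choices alone cannot raise N_a.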

5. The 33-Locus UC Davis VGL STR Panel

Short tandem repeat (STR) markers, also called microsatellites, have been used in conservation and forensic work for decades, and they have proved useful. They are older technology than genome-wide SNP arrays, but they offer information that other methods cannot return without significant development work, and even then there is no guarantee that the newer results would be meaningfully more informative for the question of allelic richness in closed breeds. BetterBred's algorithms can use polymorphic data from any genotyping method, such as SNP microhaplotypes, should a more informative and affordable option become clear. The 33-locus VGL panel is currently the best option available, and more than ten years of work across dozens of enrolled breeds has shown it performs as intended on closed breed populations.

5.1 Marker selection and independence

The 33-locus panel is a composite. Twenty-one of its markers come from the International Society for Animal Genetics (ISAG) canine parentage panel, standardized for parentage verification and used by registries and laboratories worldwide. The other twelve markers were selected by the VGL from the broader pool of canine genomic microsatellites developed for forensic and population research. That pool includes the fifteen unlinked tetranucleotide loci documented in the DogFiler developmental validation (Wictum et al., 2013), which were screened from a starting pool of 3,113 candidate loci, validated to SWGDAM forensic criteria (the same guidelines governing human DNA profiling), and accepted in U.S. courts for individual identification. A separate set of seven STRs flanking the DLA class I and class II regions on chromosome twelve is used to identify DLA haplotypes, and is reported separately from the 33-locus diversity result.

Two properties of that selection carry through to the panel. The first is independence: the markers are on different chromosomes, or are far enough apart on the same chromosome that they are not linked. A dog's allele at any one locus gives no information about its allele at any other. The second is polymorphism. Some loci carry many alleles and provide a granular view of relatedness in the breed. Others carry few alleles even in highly diverse breeds. Both are informative; the panel was selected from loci known across canine research to carry meaningful polymorphism in most breeds.

A third property matters specifically in purebred dogs. Linkage disequilibrium in closed dog breeds extends much further than in outbred human populations — in the five breeds examined by Sutter et al. (2004), it reached across megabases on either side of any given marker, roughly a hundred times the range seen in humans. Subsequent work has confirmed this (Parker, 2012). An STR allele in a purebred dog therefore does not represent only the microsatellite repeat itself. It tags a chromosomal block that may carry hundreds of flanking genes along with it. A 33-locus panel under those conditions samples a much larger functional share of the genome than the raw marker count would suggest.

5.2 Why short tandem repeats remain appropriate to closed-breed time depths

The suitability of any marker to a population-genetics question depends on the time depth the question covers. Short tandem repeats mutate orders of magnitude faster than single-nucleotide polymorphisms (Brinkmann et al., 1998). That difference is what makes them useful for different questions. SNPs accumulate slowly and are well suited to deep-time questions — species and subspecies divergence, the early domestication of the dog. STRs change fast enough to retain information on a much shorter timescale, specifically recent and ongoing population dynamics within a closed breeding pool.

Modern closed studbook dog breeds sit squarely in the range where STR polymorphism is most informative. Most AKC-recognized breeds were formalized within the past hundred and fifty years — roughly thirty to fifty dog generations. The population changes a diversity tool needs to resolve in these breeds are founder effects, popular-sire bottlenecks, mid-century ancestry consolidations, and ongoing drift. These happened, and continue to happen, on exactly the timescale STRs are best at tracking.

There are practical reasons the technology has stayed in use. STR genotyping works on degraded DNA, which matters for archived or field-collected samples. It can be run in any laboratory with standard PCR capability, rather than requiring a proprietary chip and the fixed-cost infrastructure that goes with one. These are not small considerations for a field in which samples come from many sources and the cost of testing determines how widely it can be adopted.

5.3 The DLA region and haplotype-level resolution

The seven DLA-linked STRs reported alongside the 33-locus panel are in tight linkage with the canine DLA region: four flanking class I (DLA-88) and three associated with the class II genes DRB1, DQA1, and DQB1. Pedersen, Liu, Millon and Greer (2011) validated the principle behind this design using necrotizing meningoencephalitis in Pug dogs as a test case. Three approaches (sequencing only DQB1, using class-II-linked STR markers, and using a small targeted SNP array) each produced disease-risk assessments equivalent to full three-locus exon-2 sequencing of DRB1, DQA1, and DQB1. The strong linkage between DLA class II haplotypes and the polymorphic markers flanking them is what makes this work. STRs in tight linkage with the functional class II genes are not a proxy for something better. They are a validated substitute, at a fraction of the cost and time, suitable for routine breeder-facing testing.

The reason the panel reports DLA haplotypes and not just zygosity is that haplotype identity carries information that zygosity alone does not. Class I and class II haplotypes do not always move in lockstep. A dog reported as homozygous at the class II haplotype can still be heterozygous at class I. The VGL panel resolves both, which means it distinguishes a dog carrying 1001/2001 from a dog carrying 1002/2001 even when both look identical at class II. To our knowledge no other routine breeder-facing test resolves class I haplotypes. The information matters because class I and class II together form the extended DLA haplotype, and the diversity that remains in the region is not fully described by class II frequencies alone.

Across the breeds in our database, DLA haplotype distributions can also reveal where minority diversity in the region still exists. In the initial sample of Standard Poodles, two class I haplotypes accounted for roughly 45% of the population, and four accounted for 90% (Pedersen et al., 2015a). A single class II haplotype was present in heterozygous or homozygous form in 83% of dogs. The Doberman Pinscher shows a similar pattern at the breed level: in the VGL enrolled population (n=1,475), class I haplotype 1094 accounts for 73.9% of class I haplotypes and class II haplotype 2089 accounts for 77.1% (VGL canine genetic diversity statistics, 2026). The two are usually inherited together as a single extended haplotype, but the small difference in their frequencies reflects a handful of recombinant dogs in which 2089 pairs with a class I haplotype other than 1094 — itself a useful piece of information about the breed's DLA structure that a class-II-only report cannot show. Section 8.2 describes the limit case of this pattern in the Berger Picard, where two extended haplotypes in each DLA class are all that remain.

A methodological note: the DLA region has been the subject of decades of disease-association work in purebred dogs, and specific class I or class II haplotypes have been reported as elevating risk for various autoimmune and inflammatory conditions in particular breeds. The strength of evidence varies. Safra et al. (2011) showed that exon-2 typing in heavily inbred breeds with limited DLA diversity can produce statistically significant but spurious class II associations, and that distinguishing real associations from artifacts of bottlenecked-population structure requires larger samples and denser typing across the region. Brown et al. (2026) made the point from the other direction: in Nova Scotia Duck Tolling Retrievers, where DLA had long been implicated in juvenile-onset Addison's disease, denser study identified the actual causative variant in RESF1 on chromosome 27 — outside the DLA region entirely. The panel reports DLA haplotypes; readers should treat any specific haplotype-disease association with the appropriate caution about study design, sample size, and breed-specific population structure. Section 7.3 returns to how DLA information should and should not be weighted in pairing decisions.

6. A Complementary Tool: SNP Arrays

Single-nucleotide polymorphism arrays are the dominant marker technology in canine genetics today, and they are the technology behind most of the individual-dog genetic testing currently marketed to breeders. It is worth describing what they do well, on their own terms, because the framework this paper describes is complementary to SNP-array methods, and to other marker-based tools, rather than an alternative to any of them. A breeder using a frequency-based diversity analysis and a breeder using a SNP-array genomic COI are asking related but distinct questions, and the answers they get are most useful when understood in relation to each other.

The defining strength of dense SNP arrays is their coverage. A typical commercial canine SNP array carries between a hundred thousand and a million markers distributed across the genome. At that density, the genome is sampled finely enough that long stretches of homozygous sequence — runs of homozygosity — become directly observable. A run of homozygosity is a region where both copies of the chromosome carry the same alleles at adjacent markers for a measurable distance, which is the genomic signature of shared ancestry at that segment of the genome. Short runs reflect distant shared ancestors; long runs reflect recent ones. The total length of the genome contained within runs of homozygosity, expressed as a fraction, gives a direct estimate of an individual's inbreeding coefficient — one that does not depend on pedigree depth or pedigree accuracy, and one that distinguishes littermates from each other where pedigree COI cannot.
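The final step, turning detected runs into an inbreeding estimate, is simple arithmetic and can be sketched as follows. The sketch assumes the runs of homozygosity have already been called by separate software and are supplied as (chromosome, start, end) tuples. The minimum run length, the genome length, and the example segments are illustrative values only, not parameters used by any particular laboratory.

```python
from typing import Iterable, Tuple

Segment = Tuple[str, int, int]   # (chromosome, start_bp, end_bp)


def genomic_inbreeding(segments: Iterable[Segment],
                       genome_length_bp: int,
                       min_length_bp: int = 1_000_000) -> float:
    """Fraction of the assayed genome lying inside runs of homozygosity.

    Runs shorter than `min_length_bp` are ignored; the threshold is an
    assumption and varies between studies.
    """
    total = 0
    for _chrom, start, end in segments:
        length = end - start
        if length >= min_length_bp:
            total += length
    return total / genome_length_bp


# Hypothetical dog: two long runs and one short run below the threshold.
example_runs = [
    ("chr1", 0, 50_000_000),
    ("chr5", 10_000_000, 70_000_000),
    ("chr12", 2_000_000, 2_500_000),
]
ASSUMED_GENOME_BP = 2_200_000_000   # illustrative autosomal length

f_roh = genomic_inbreeding(example_runs, ASSUMED_GENOME_BP)
print(f"genomic inbreeding estimate: {f_roh:.1%}")
```

Because the estimate depends on the length threshold and on marker density, two values are comparable only when they were produced with the same settings.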

For the specific question of individual-level inbreeding estimation, SNP-array genomic COI is the more accurate tool. A thirty-three-locus panel cannot directly observe runs of homozygosity; the markers are too sparse. A dense array can, and the resulting estimate is a closer match to the biological reality of shared ancestry in a given dog's genome than pedigree-based or STR-based estimates typically achieve. It is important to be clear about what this estimate is, though. Any inbreeding measure — pedigree, STR, or SNP — is like a thermometer. It measures a current state: the level of homozygosity a dog was born with, or the average level across a set of dogs. It does not act on the mating choices that produced that outcome, and its reading has meaning only relative to other readings taken with the same instrument. Breed averages provide that reference — they allow a breeder to see whether a specific dog is unusually inbred or unusually outbred for its population — but the relationship between a given inbreeding level and breed-specific disease burden is not simple, and no fixed threshold separates acceptable from unacceptable in a way that holds across breeds.

On a population level, breeds with higher average inbreeding show increased mean morbidity and meaningfully more need for veterinary care than mixed-breed dogs (Bannasch et al., 2021). This is a real signal and it matters for breeders thinking about long-term breed health. But a population average is still a thermometer reading, and it does not resolve the individual case. Is a dog at 17% genomic COI healthier than one at 25%? The measure on its own cannot say. Bannasch and colleagues conclude their paper by recommending careful management of breeding populations "to avoid additional loss of existing genetic diversity," through breeder education and DNA-based monitoring of inbreeding levels. The framework described in this paper addresses that recommendation from the other side: it focuses on preserving existing allelic richness as the upstream lever, with inbreeding monitoring as one of the downstream checks on whether the lever is working.

SNP arrays are also the marker technology behind most consumer dog DNA products. A SNP array can assay a long list of already-identified variants in one swab — known monogenic disease tests, coat color and coat type loci, size-related loci, and a growing set of trait and linkage tests. Batching many assays onto one chip is what breeders get from the format, and it is what the commercial products are built around. Cheaper and faster is not the same as more accurate in every case — single-variant tests from specialist laboratories can outperform array-based calls for specific variants. SNP arrays are a product like any other. They have strengths and weaknesses.

While SNP-derived genomic COI can give a breed-level average of how inbred individual dogs are, it cannot describe which alleles the breed still carries, at what frequencies, or whether allelic richness is being retained or lost over generations. Those are population-scale questions about allele-frequency distributions, and they are a different instrument's job.

7. How Breeders Use This in Practice

The measurements described in Section 4 tell a breeder what the breed carries and how common a given dog's genetics are within it. Two decision tools built on those measurements translate that information into rankings usable at the individual-dog level: the Outlier Index (OI) and Average Genetic Relatedness (AGR). OI identifies dogs whose allelic profile is less well-represented in the breed's current population, by reading where each of a dog's alleles falls in the breed's allele frequency distribution. AGR asks how genetically similar a dog is, on average, to every other dog in the enrolled population — a molecular analog of mean kinship (Ballou and Lacy, 1995). It is computed by running a pairwise genetic relatedness calculation (GR; Wang, 2002) between each dog and every other dog in the breed, then averaging the results. GR on its own — between two specific dogs being considered as a pairing — is also useful to a breeder for keeping relatedness between mates within reasonable bounds, and BetterBred reports it directly in the Comparison tool, in Genetic Relationships, and in the Breeding tools. Both OI and AGR identify the same kind of dog: one carrying genetics that are unusual in the breed today. A breeder can work from either.
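The averaging step behind AGR can be sketched in a few lines. The pairwise function below is a deliberately simple stand-in (the fraction of alleles two dogs share at each locus). It is not the Wang (2002) estimator that BetterBred uses, and every name and genotype in the example is hypothetical. The point is only the structure: score a dog against every other dog, then average.

```python
from statistics import mean
from typing import Callable, Dict, Tuple

Genotype = Dict[str, Tuple[str, str]]   # locus -> (allele 1, allele 2)


def shared_allele_similarity(a: Genotype, b: Genotype) -> float:
    """Stand-in pairwise score: mean fraction of alleles shared per locus.
    This is NOT the Wang (2002) relatedness estimator."""
    per_locus = []
    for locus in a:
        remaining = list(b[locus])
        shared = 0
        for allele in a[locus]:
            if allele in remaining:
                remaining.remove(allele)   # each allele copy matches once
                shared += 1
        per_locus.append(shared / 2)
    return mean(per_locus)


def average_genetic_relatedness(
    dog_id: str,
    population: Dict[str, Genotype],
    pairwise: Callable[[Genotype, Genotype], float] = shared_allele_similarity,
) -> float:
    """Average of the pairwise score between one dog and every other dog."""
    scores = [pairwise(population[dog_id], genotype)
              for other_id, genotype in population.items()
              if other_id != dog_id]
    return mean(scores)


population = {
    "typical_1": {"L1": ("A", "A"), "L2": ("C", "C")},
    "typical_2": {"L1": ("A", "A"), "L2": ("C", "C")},
    "typical_3": {"L1": ("A", "A"), "L2": ("C", "D")},
    "outlier":   {"L1": ("B", "B"), "L2": ("D", "D")},
}

for dog in population:
    print(f"{dog:10s} AGR (stand-in score) = "
          f"{average_genetic_relatedness(dog, population):.3f}")
```

In this toy population the dog labelled "outlier" has the lowest average score, which is the kind of dog a low AGR is meant to flag. Swapping in a different pairwise estimator changes the numbers but not the averaging structure.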

7.1 Keeper selection from a litter

Every puppy in a litter has the same pedigree-based coefficient of inbreeding. The written pedigree is identical for all of them. What the pedigree cannot resolve is which portions of the parents' genetics each individual puppy actually inherited. Mendelian segregation distributes the parents' alleles unevenly across the litter — two full siblings can carry materially different genetics, and a panel that reads those alleles directly resolves the difference a pedigree cannot.

This is the decision point where the framework does its most visible work. A breeder evaluating a litter of healthy, typical-quality puppies for conformation, temperament, working qualities, and health clearances will normally have more than one candidate worth retaining for breeding. The panel adds a further piece of information to that decision: which of those candidates carries more of the genetics that are less well-represented in the breed. How a breeder uses that information varies. Some make their primary selection on phenotype and health and then use the test results as tie-breakers among close candidates. Breeders more deeply concerned about the breed's genetic diversity test the whole litter and pick their keepers from the high-OI puppies. Both approaches contribute less well-represented genetics to the next generation rather than defaulting to puppies whose genetics are more typical of what the breed already carries.

Health screening and phenotype selection continue unchanged. The diversity-informed keeper is chosen from among puppies that already meet the breeder's standards. What changes is which of the acceptable puppies enters the breeding pool. Collective action by breeders around the world, working from the same information at the same decision point, can minimize the loss of allelic richness in the breed as a whole.

7.2 Comparing dogs

The same panel data used to rank an individual dog's position in the breed can also compare any two dogs directly and return how similar they are. BetterBred offers this as the Comparison tool, free to anyone with an account. Similarity is expressed in family-relationship equivalents — full sibling, half sibling, grandparent-grandoffspring, first cousin, aunt or uncle — because those are the units of relatedness breeders already think in.

For a breeder who has just kept a puppy, or is considering a prospective mate, the Comparison tool maps the dog onto dogs the breeder already knows. Which grandsire is this puppy most like, genetically, at the panel level? Which cousin or half-sibling does he most resemble? Which great-aunt? Because pedigrees in a closed studbook breed are full of the same ancestors repeated many times, it is not unusual for a dog's panel result to look more like one grandparent than another, or to carry a surprising amount of an uncle's signature rather than a sire's. That information does not replace the pedigree — the pedigree still says who the parents and grandparents are. It tells the breeder something a pedigree cannot: which of those ancestors actually show up in this dog's genetics, and which do not.

7.3 Breeding tools

Once a breeder is considering two specific dogs as a potential pairing, BetterBred's Breeding tools simulate the pairing and return summary statistics for the expected litter. The tool generates a large set of virtual puppies from the combined genetics of the two parents, scores each one, and returns litter averages and litter ranges for OI and IR alongside the breed averages for comparison. A breeder evaluating two candidate pairings can compare them side by side: the litter averages tell her what the expected center of each litter looks like; the ranges tell her how much variation she would need to select from; the breed averages tell her how either litter would land relative to the breed as a whole.
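A minimal sketch of the virtual-litter idea follows. It assumes unlinked loci, so each puppy receives one randomly chosen allele per locus from each parent, and it scores puppies with a simple stand-in (fraction of homozygous loci) rather than OI or IR, whose exact calculations are not reproduced here. All names and genotypes are hypothetical, and this is not BetterBred's implementation.

```python
import random
from statistics import mean
from typing import Callable, Dict, Tuple

Genotype = Dict[str, Tuple[str, str]]   # locus -> (allele 1, allele 2)


def simulate_puppy(sire: Genotype, dam: Genotype, rng: random.Random) -> Genotype:
    """Mendelian segregation at unlinked loci: one allele from each parent."""
    return {locus: (rng.choice(sire[locus]), rng.choice(dam[locus]))
            for locus in sire}


def homozygous_fraction(genotype: Genotype) -> float:
    """Stand-in score: share of loci at which the two alleles match."""
    return mean([1.0 if a == b else 0.0 for a, b in genotype.values()])


def simulate_litter(sire: Genotype, dam: Genotype, n_puppies: int = 1000,
                    seed: int = 0,
                    score: Callable[[Genotype], float] = homozygous_fraction,
                    ) -> Dict[str, float]:
    """Score many virtual puppies and summarize the litter."""
    rng = random.Random(seed)   # seeded so the summary is reproducible
    scores = [score(simulate_puppy(sire, dam, rng)) for _ in range(n_puppies)]
    return {"mean": mean(scores), "min": min(scores), "max": max(scores)}


sire = {"L1": ("A", "B"), "L2": ("C", "C"), "L3": ("E", "F")}
dam = {"L1": ("A", "B"), "L2": ("C", "D"), "L3": ("G", "H")}

summary = simulate_litter(sire, dam)
print(summary)
```

The mean corresponds to the litter average and the min and max to the litter range described above. Comparing two candidate pairings means running the same summary for each and reading them against the breed average.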

The Breeding tools also surface the DLA haplotypes carried by both parents, alongside the other diversity information. Neither VGL nor BetterBred includes DLA in their diversity calculations, and BetterBred does not advise breeders to make breeding decisions on DLA haplotype identity alone. The reasons follow from the methodological caution discussed in Section 5.3. Specific haplotype-disease associations are unsettled, the carrier/clear/affected mental model that breeders apply to single-gene recessive disease tests does not transfer to a multi-gene region under strong linkage disequilibrium, and prioritizing one DLA haplotype over another can deepen the diversity loss the breed is already experiencing. The reason DLA shows the patterns it does in heavily bottlenecked breeds is the same reason much of the rest of the genome does: population-wide inbreeding and drift have reduced variation across the board. Restoring DLA diversity is not a separate problem from restoring breed-wide diversity; it is the same problem, observed at one particularly visible region.

The exception is straightforward. When a breed has a history of bottleneck or popular-sire effects and a single extended DLA haplotype has become so dominant that minority haplotypes are at risk of loss — as in some Doberman lines — preserving the minority haplotypes when other selection criteria are equal helps maintain what diversity remains in the region. This is a diversity-preservation argument, not a disease-association argument.

The Breeding tools do not make breeding decisions. They characterize what a pairing would likely produce, against what the breed already carries, in the metrics that matter for diversity management. The decision — whether to proceed, whether to select a different stud, whether to keep a specific puppy from the resulting litter — remains the breeder's.

7.4 Population trend monitoring

The decisions described so far in this section are made by individual breeders about individual dogs. Population trend monitoring operates at a different level. Aggregated across the dogs an enrolled breed has tested over time, the same panel data resolves how the breed as a whole is changing: whether the alleles already in the population are redistributing toward a more even frequency distribution or consolidating around the most common ones, which DLA haplotypes are gaining or losing share, and whether the dogs entering the breeding pool in any given decade are contributing less-common genetics or doubling down on what is already common. Allelic richness itself grows little after the initial sample, especially when that sample was carefully collected. What changes over time is the frequency distribution of the alleles already on the books.

For the diversity-aware breeders who do this work, the trend data is the part of our methodology that connects an individual decision to a collective outcome. A breeder who keeps a high-OI puppy is making one small contribution to a long-running aggregate. The trend data is the only place that contribution becomes visible. Over enough generations and across enough breeders, redistribution toward less-common alleles shows up in the population's effective allele number and in shifts in the haplotype frequency distribution; without that, persistent diversity-aware selection at the kennel level cannot be distinguished, in retrospect, from random mating. The trend data is what tells the community whether its collective work is showing up in the breed.

The same data is used internally at BetterBred. Aggregated trends across enrolled breeds inform which breeds need methodological attention, which warrant a closer look at DLA haplotype dynamics, and which are good candidates for the kind of generation-by-generation modeling that supports our research program.

The decision the trend data supports — at both levels — is whether the breed's current rate of change is acceptable, and whether further coordination is warranted. It is the decision that turns individual breeder choices into a breed strategy, in the population-genetics sense of strategy, rather than a sum of independent decisions.

8. Limitations

8.1 Marker coverage

The most common critique of the panel is that thirty-three loci sample roughly one percent of the dog's 2.7-gigabase genome. The version of the critique that holds up under examination is more specific. Close genetic relatedness (parent, sibling, half-sibling, grandparent, first cousin) is well resolved by the panel; this is not in dispute. What thirty-three loci resolve less cleanly is distant relatedness. A sixth cousin and an eighth cousin look more alike on the VGL panel than they would on a hundred thousand SNPs.

In the context of breeding decisions for closed populations, that resolution gap is beside the point.

In standard pedigree theory, sixth cousins produce offspring with an inbreeding coefficient of about 0.006 percent. Eighth cousins produce offspring with an inbreeding coefficient of about 0.0004 percent. The difference between the two is sixteen-fold, which sounds dramatic. In absolute terms, however, it is the difference between two numbers that are both essentially zero, in a context (full-sibling matings at 25 percent, first-cousin matings at 6.25 percent) where the differences that matter are orders of magnitude larger.
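The textbook values can be reproduced with Wright's path-counting formula. The sketch assumes the standard simplifications: each pair of cousins shares exactly two common ancestors, those ancestors are not themselves inbred, and all other founders are unrelated. It reports both the inbreeding coefficient of the cousins' offspring and the coefficient of relationship between the cousins themselves, which is twice as large. The function names are hypothetical.

```python
def offspring_inbreeding(cousin_degree: int) -> float:
    """Inbreeding coefficient of offspring from a mating between nth cousins.

    Degree 0 means full siblings. Each parent is (degree + 1) generations
    below the two shared ancestors, so each path contributes
    (1/2) ** (2 * (degree + 1) + 1).
    """
    steps_per_side = cousin_degree + 1
    return 2 * 0.5 ** (2 * steps_per_side + 1)


def relationship_coefficient(cousin_degree: int) -> float:
    """Coefficient of relationship between the two mates (twice the above)."""
    return 2 * offspring_inbreeding(cousin_degree)


labels = {0: "full siblings", 1: "first cousins",
          6: "sixth cousins", 8: "eighth cousins"}
for degree, label in labels.items():
    print(f"{label:15s} offspring F = {offspring_inbreeding(degree):.5%}   "
          f"r between mates = {relationship_coefficient(degree):.5%}")
```

Under these assumptions the offspring values are 25 percent, 6.25 percent, about 0.0061 percent, and about 0.00038 percent. Each additional degree of cousinship divides the value by four, so two degrees apart is a sixteen-fold difference.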

Moreover, each breed has different population structure, different founder effects, different opportunities and urgencies for conserving existing allelic richness. Pedigree theory assumes founders are unrelated. In purebred dog breeds they are not. The realized relatedness between two "sixth cousins" in a breed founded on twenty dogs is much higher than the textbook number. The realized relatedness between two "sixth cousins" in a breed founded on five hundred dogs is closer to it. The cousin label is the wrong unit for the work.

SNP arrays do offer real advantages for some questions. With tens of thousands of markers, they cluster individuals into subpopulations with higher confidence than microsatellites, estimate admixture proportions with narrower error bars, and can sometimes detect substructure that fewer markers miss. These are useful capabilities for research into breed history, for conservation triage between divergent subpopulations, for identifying admixture in populations of unknown origin. They are not capabilities a breeder uses when deciding which puppy in a litter to keep, or which dog in the breed to breed to next. Those are allele-frequency-based ranking decisions — which mate carries less-common alleles, which puppy contributes most to preserving what is left of the breed's allelic richness — and a breeder does not need to know whether their dog is 73 percent versus 79 percent of any particular ancestry to make them.

Preserving allelic richness is the question that matters in breeds that have already lost most of theirs, and it is the question SNP arrays cannot yet answer. Single SNPs are biallelic; they do not carry the polymorphism per locus that allele-frequency-based diversity measures depend on. Microhaplotypes — short genomic regions in which several SNPs are read together as a multi-allelic unit — recover that polymorphism, and microhaplotype panels for canine populations will be welcome when they are available. They do not yet exist for the breeds we serve. The VGL panel does, and the eleven years of cross-breed allele frequency data accumulated on it since 2015 — across dozens of breeds, with established baselines and DLA haplotype inventories — is itself a methodological asset that does not transfer trivially to a new marker system. Our methodology can move to whichever qualified panel produces the most useful allele frequencies in the breeds it works in; until that panel exists, we use the one that does.

8.2 Marker neutrality

The textbook concern about microsatellite-based diversity measures is that the markers must be neutral — that their allele frequencies must reflect drift and demographic history rather than selection acting on the markers themselves or genes linked to them. The standard worry that follows is that in a heavily bottlenecked breed, with selection pressure compressing variance at some loci toward zero, a thirty-three-locus panel will lose the resolution to characterize individual relationships at all.

The Berger Picard is the empirical answer. As of March 2026, VGL data on 155 enrolled Berger Picards shows the breed is the most depleted population analyzed using the VGL test. One of the thirty-three STR loci, AHTh171-A, is fully monomorphic — a single allele at frequency 1.000. The DLA Class I region carries only two haplotypes, with DLA1#1227 at 90 percent frequency and DLA1#1052 at 10 percent. The DLA Class II region also carries only two haplotypes, with DLA2#2067 at 90 percent and DLA2#2017 at 10 percent. The breed retains a smaller fraction of canid-wide genetic diversity than any other breed analyzed using the test.

If the textbook concern were the practical concern, the Berger Picard panel data should be incoherent — too narrow to resolve who is related to whom, too compressed to support breeding decisions. It is not. Even with one locus monomorphic and the DLA near-fixed on both classes, at least ten of the thirty-three STR loci carry more than five alleles in this breed. The panel retains substantial differentiating power, and the panel's resolution does not collapse in the worst-case breed BetterBred works with.

What that resolution actually buys the breed is concrete. Without the ability to identify which dogs carry the rare DLA haplotypes — the 10-percent DLA1#1052 and DLA2#2017 — the Berger Picard would lose those haplotypes through ordinary drift within a small number of generations. The breed has two haplotypes in each DLA class. Losing one would leave a single DLA Class I and a single DLA Class II haplotype across the entire breed. The panel is what stands between the Berger Picard and that outcome. Identifying carriers of the rare haplotypes, retaining them as breeders, and pairing them appropriately is only possible because the panel resolves the haplotype information cleanly even at this level of depletion.
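How quickly a 10-percent haplotype can disappear under drift alone can be sketched with an idealised Wright-Fisher model. The Python below is a minimal illustration under stated assumptions: random mating, non-overlapping generations, no selection, no deliberate retention of carriers, and a hypothetical effective population size of 25 chosen only for the example. It is not an estimate for the Berger Picard or for any enrolled breed, and the function name loss_probability is ours.

```python
from math import comb

def loss_probability(p0, n_e, generations):
    """Probability that a haplotype starting at frequency p0 has been lost
    by the end of each generation, under an idealised Wright-Fisher model."""
    copies = 2 * n_e                      # gene copies in a diploid population
    start = round(p0 * copies)
    # dist[k] = probability that exactly k copies of the haplotype exist
    dist = [0.0] * (copies + 1)
    dist[start] = 1.0
    # trans[j][k] = P(k copies next generation | j copies now), binomial sampling
    trans = [[comb(copies, k) * (j / copies) ** k * (1 - j / copies) ** (copies - k)
              for k in range(copies + 1)] for j in range(copies + 1)]
    lost = []
    for _ in range(generations):
        dist = [sum(dist[j] * trans[j][k] for j in range(copies + 1))
                for k in range(copies + 1)]
        lost.append(dist[0])              # zero copies is an absorbing state
    return lost

# Hypothetical effective population size; not an estimate for any real breed.
lost = loss_probability(p0=0.10, n_e=25, generations=10)
for gen, prob in enumerate(lost, start=1):
    print(f"generation {gen}: P(lost) = {prob:.3f}")
```

Because loss is irreversible in a closed population, the probability can only accumulate from one generation to the next. Identifying and retaining carriers is what removes the breed from this unmanaged trajectory.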

The structural reason the panel keeps working in this case is that every dog's OI, IR, and AGR score is calculated against the breed's own allele frequency distribution, not against an external reference. If a marker happens to be found near a gene that has run to fixation under directional selection in a particular breed, the locus becomes less variable, and therefore less informative about drift, but it does not introduce directional bias into the comparison between two dogs in the same breed. The case to flag separately is a marker mid-sweep — a selected allele that is rising in frequency but not yet fixed. There, two dogs sampled from different lines can differ at the marker for reasons other than drift, and the comparison between them carries some signal from the selection pressure as well. This is not a failure mode unique to STR panels; any frequency-based diversity tool faces it. The mitigating structural protection in our case is that the panel was selected from loci that satisfy forensic and parentage neutrality criteria across the species, so mid-sweep loci should be uncommon, and the within-breed comparison structure itself limits the influence of any single locus on the overall score. A locus losing variance under selection, once fixed, contributes less signal to a within-breed comparison, not wrong signal. The monomorphic AHTh171-A in the Berger Picard contributes nothing to differentiating one Berger Picard from another — but that is what a fixed locus should do, and the other thirty-two loci continue to do the work.
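The claim that a fixed locus contributes no signal rather than wrong signal can be checked directly for Internal Relatedness, using the published definition of Amos et al. (2001): IR = (2H - sum f_i) / (2N - sum f_i), where H is the number of homozygous loci, N the number of loci typed, and f_i the within-breed frequency of each allele in the dog's genotype. The sketch below uses invented locus names and frequencies and is not BetterBred's implementation. It computes IR for one hypothetical dog with and without a monomorphic locus.

```python
def internal_relatedness(genotype, freqs):
    """IR = (2H - sum f_i) / (2N - sum f_i), after Amos et al. (2001).

    genotype: {locus: (allele_a, allele_b)} for one dog
    freqs:    {locus: {allele: frequency within the breed}}
    """
    homozygous = sum(1 for a, b in genotype.values() if a == b)
    n_loci = len(genotype)
    freq_sum = sum(freqs[locus][a] + freqs[locus][b]
                   for locus, (a, b) in genotype.items())
    return (2 * homozygous - freq_sum) / (2 * n_loci - freq_sum)

# Hypothetical within-breed allele frequencies (illustrative values only).
freqs = {
    "locus_1": {"A": 0.5, "B": 0.3, "C": 0.2},
    "locus_2": {"A": 0.7, "B": 0.3},
    "fixed":   {"A": 1.0},   # monomorphic, like AHTh171-A in the Berger Picard
}
dog = {"locus_1": ("A", "C"), "locus_2": ("B", "B"), "fixed": ("A", "A")}
dog_without_fixed = {k: v for k, v in dog.items() if k != "fixed"}

ir_all = internal_relatedness(dog, freqs)
ir_informative = internal_relatedness(dog_without_fixed, freqs)
print(ir_all, ir_informative)
```

A locus fixed at frequency 1.000 adds 2 to the homozygosity term and 2 to the frequency sum, so it cancels out of both numerator and denominator. The two values agree, and the score rests entirely on the loci that still vary.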

A formal study of marker neutrality across the full VGL panel does not yet exist. The selection criteria for the panel's component loci, however, all push in the direction of neutrality. Some loci come from the DogFiler forensic panel developed by Wictum et al. (2013), selected and validated under SWGDAM guidelines for high polymorphism, unlinked loci, Hardy-Weinberg conformity in reference populations, and explicit avoidance of markers associated with phenotype or disease. Other loci come from ISAG-recommended parentage panels selected under similar criteria for animal identification. AHTh171-A, the monomorphic locus in the Berger Picard, is one of the ISAG loci. Forensic and parentage panels are deliberately selected against the kinds of markers that selection operates on, because those are the markers that fail forensic and parentage standards. The criteria do not guarantee neutrality in any one breed. They do consistently produce markers most likely to be neutral, and the within-breed comparison structure of the diversity measures provides a structural protection against the failure modes that any imperfect neutrality would produce. The Berger Picard data is what that combination looks like under the hardest conditions the panel has been asked to characterize.

8.3 Limits of individual disease prediction

The panel is a population-level diversity tool. It is not a clinical risk predictor for the individual dog. The metrics it produces — OI, AGR, IR, GR, DLA haplotype identity — describe how a dog's genetics relate to the breed's allele frequency distribution, not whether that dog will or will not develop any specific disease.

Two examples illustrate why disease-association information cannot be applied as a per-dog rule. In the Pug, necrotizing meningoencephalitis associates with a specific DLA class II haplotype, identified through the same exon-2-equivalent typing approach the panel uses (Pedersen et al., 2011). Whether selecting against that haplotype is a viable breeding strategy depends on its frequency in the breed and on the breed's overall diversity state, and the disease association itself tells a breeder neither. In the Samoyed, by contrast, an autosomal recessive mutation in SLC24A4 causing enamel hypoplasia was identified against the same panel's diversity backdrop, and the analysis indicated that breeders could eliminate the mutation through carrier identification and informed pairings without further loss of breed-wide diversity (Pedersen et al., 2017). With OI now available as a per-dog metric, that recommendation becomes more granular: a low-OI carrier, one whose genetics are already well-represented in the breed, can be removed from breeding without diversity cost, since the breed has those alleles abundantly elsewhere; a high-OI carrier requires more careful pairing decisions, because the less-common alleles it contributes are the ones most worth preserving. Same instrument, two breeds, opposite breeding implications. The disease association alone does not produce a safe recommendation, but the combination of the disease association and the breed's diversity state does.

For complex polygenic conditions in which DLA-region association signals do not exist or are weak — most cancers, most cardiac disease, most orthopedic disease — BetterBred is appropriately muted at the individual level. Risk prediction per dog for those conditions requires variant-level discovery work, direct phenotype data in enrolled families, and validation in independent populations.

This does not mean BetterBred is useless for polygenic conditions. If a line has many members reporting hip dysplasia, a dog with good hips but high genetic relatedness to that line shares more of the genetic background of affected relatives than a less-related dog would, and pairing it to a line with consistently good hips is the kind of family-history-aware decision OFA already encourages. Additionally, the dog in this case will have an OI value. A very low OI might discourage breeding from this dog, since its genetics are already well-represented in the breed; a high OI might encourage breeding it to a line with excellent hips, to keep the less-common alleles in circulation while improving the probability of producing sound puppies. The panel does not predict whether an individual dog will develop hip dysplasia. It does inform the kind of decision a careful breeder is already making about line history and pairing strategy.

8.4 Self-selection in the enrolled population

The dogs in BetterBred's database are not a random sample of any breed. Owners who enroll dogs in diversity testing are not a random cross-section of breeders. They are disproportionately breeders already concerned about their breed's genetic health, working with kennels that test, and willing to act on the information the panel returns. The dogs they submit are not random either; they tend to be breeding stock, dogs being considered for breeding, and offspring of dogs already in the database.

When BetterBred describes longitudinal trends in an enrolled breed — that effective allele number is rising over five generations, that a particular DLA haplotype is gaining share — those trends describe the enrolled subset, not the breed as a whole. The enrolled subset is the diversity-aware part of the breed, where breeders are making the kind of decisions our methodology recommends. The breed as a whole, including kennels that do not test, may be moving differently.

We cannot do anything about the dogs that are not enrolled. We can only help breeders who participate. In doing so we hope to help a subset of the breed that can eventually help the rest. We are under no illusion that the breed samples we have will always match a theoretical whole-breed sample we do not have.

We do encourage breed representatives to recruit the broadest possible initial sample. The breadth of allelic richness in a breed is largely described early — once a sufficient number of well-distributed dogs have been tested, the inventory of alleles present in the breed stabilizes, and additional samples mostly redistribute frequencies rather than add new alleles. Population structure becomes clearer subsequently, as more dogs from more lines are tested. The recommendation to recruit broadly at the outset is a methodological consequence of this: the foundational picture of what diversity exists in a breed is set by the early sample, while the picture of how that diversity is distributed across lines fills in over time.
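The reason the allele inventory stabilizes early follows from simple sampling arithmetic. Under the simplifying assumption that dogs are drawn at random and independently from the breed, an allele at frequency p is seen at least once among n dogs with probability 1 - (1 - p)^(2n). The Python sketch below works that out; the function names and the 95-percent target are ours, chosen for illustration.

```python
def detection_probability(p, n_dogs):
    """Chance that an allele at breed frequency p is seen at least once
    among n_dogs diploid dogs drawn at random (2 * n_dogs gene copies)."""
    return 1.0 - (1.0 - p) ** (2 * n_dogs)

def dogs_needed(p, target=0.95):
    """Smallest random sample that detects the allele with probability >= target."""
    n = 1
    while detection_probability(p, n) < target:
        n += 1
    return n

for p in (0.10, 0.05, 0.01):
    print(p, dogs_needed(p))
```

Under this idealised model, an allele at 10 percent is very likely to be found within the first 15 dogs and one at 5 percent within the first 30, while an allele at 1 percent needs roughly 150. Real enrollment is clustered by kennel and by line rather than random, so related dogs add fewer independent gene copies than the formula assumes. That is the practical argument for recruiting broadly across lines at the outset.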

Occasionally a breed's initial enrollment is collected by VGL through a non-BetterBred channel — a sample drawn only from dog shows, for example — and that kind of narrow sample tends not to describe the breed well. These cases are rare, clear when they appear, and outside BetterBred's recruitment process. We mitigate them case by case as they arise.

The honest position is therefore that BetterBred's enrolled samples describe the part of each breed that has chosen to be described. They are not unbiased estimates of whole-breed state. They are useful for the people doing the work, and the work they support — preserving allelic richness in the diversity-aware portion of the breed — can, over generations, change the breed as a whole.

9. Where the Field Is Going

The methodology described in this paper is what is possible with the instrument currently available. Several developments would extend it.

Microhaplotype panels for canine populations. SNP-based microhaplotypes recover the multi-allelic richness that single SNPs lack, and a microhaplotype panel for dogs would offer narrower confidence intervals on per-dog scores, more loci to characterize within-breed variation, and improved resolution at the depleted end of the breed range where current information is thinnest. Building such a panel is not a small undertaking. Microhaplotype loci that are informative in one population are often poorly informative in another, because the SNP combinations defining a useful haplotype depend on population-specific variation patterns; allele frequencies vary considerably across populations even within a single species (Oldoni et al., 2020), and panel optimization for additional populations is recognized as ongoing work in the human forensic literature (Kidd et al., 2021). Dog breeds are sharply distinct genetic populations (vonHoldt et al., 2010), and a microhaplotype panel useful across them is likely to require either separate panel validation per breed or per breed group, or a single cross-breed panel that loses substantial per-breed power. Either path is a substantial research investment — SWGDAM-equivalent validation work, repeated. When such a panel exists for the breeds we serve, our methodology can move to it.

Variant-level disease discovery in enrolled families. The framework described here is silent on individual risk prediction for most polygenic conditions because that work requires direct phenotype data and variant-level studies that the diversity panel was not designed to produce. As the enrolled population grows, families with documented phenotypes for specific conditions accumulate within it. That growing dataset is a resource for the kind of association work that does produce individual-level risk information.

Cross-breed comparative work. The breed-level allele frequency distributions, DLA haplotype inventories, and longitudinal records BetterBred maintains across dozens of breeds are usable for population-genetics questions beyond any single breed. How quickly do bottlenecked breeds lose alleles compared to less-bottlenecked ones, in real populations rather than simulated ones? How do DLA haplotype dynamics differ across breed groups with different working histories? These are questions any researcher could ask of the data.

These directions are not promises. They are extensions the field is likely to take, and the methodology's design (within-breed allele frequency comparison, locus-system-agnostic algorithms, accumulating longitudinal coverage) is meant to accommodate each of them.

10. Summary and Conclusions

Allelic richness is the genetic resource every breed has — genes that evolved over eons in every complex species, sampled into a closed pool when the studbook closed. Retaining what variation is left in modern breeds takes more than avoiding inbreeding. It takes identifying the full range of variation in representative markers and the frequencies at which they exist in the gene pool. It takes a method for finding the dogs that carry more of the breed's less well-represented variation, then breeding them with purpose to other dogs that also contribute to its preservation. And it takes selecting, from the resulting litters, the puppies that contribute as well.

Controlling inbreeding directly remains part of the work, but it is not the main driver of maintaining either allelic richness or heterozygosity: on its own it can raise heterozygosity, but it cannot preserve allelic richness to the degree necessary. Breeding with an eye on allelic richness does, and it keeps inbreeding at bay along the way. While all of that is happening, breeders continue to select for good health, soundness, and breed-specific type, using the phenotype and genotype screening tools appropriate to their breed.

The tools BetterBred offers identify the dogs that contribute most to retained variation and monitor inbreeding alongside. The Outlier Index identifies dogs whose profiles are less well-represented in the breed's current population. Average Genetic Relatedness reaches the same kind of dog from a different direction, as a molecular analog of mean kinship. Internal Relatedness is retained as a monitoring measure for individual inbreeding rather than a breeding target. Pairwise Genetic Relatedness — the calculation underlying AGR — helps a breeder keep relatedness between specific mates within reasonable bounds.

The 33-locus UC Davis VGL microsatellite panel and the eleven years of cross-breed allele frequency data accumulated on it are currently the right instrument for this work, available now. Breeders hold the power to determine what their breeds carry forward. Without tools, they operate blindly. That is unacceptable.

References

Amos W, Wilmer JW, Fullard K, Burg TM, Croxall JP, Bloch D, Coulson T. (2001). The influence of parental relatedness on reproductive success. Proceedings of the Royal Society of London B, 268(1480), 2021–2027. doi:10.1098/rspb.2001.1751
Ballou JD, Lacy RC. (1995). Identifying genetically important individuals for management of genetic variation in pedigreed populations. In: Ballou JD, Gilpin M, Foose TJ (eds.), Population Management for Survival and Recovery: Analytical Methods and Strategies in Small Population Conservation. Columbia University Press, New York, pp. 76–111.
Bannasch D, Famula T, Donner J, Anderson H, Honkanen L, Batcher K, Safra N, Thomasy S, Rebhun R. (2021). The effect of inbreeding, body size and morphology on health in dog breeds. Canine Medicine and Genetics, 8(1), 12. doi:10.1186/s40575-021-00111-4
Brinkmann B, Klintschar M, Neuhuber F, Hühne J, Rolf B. (1998). Mutation rate in human microsatellites: influence of the structure and length of the tandem repeat. American Journal of Human Genetics, 62(6), 1408–1415. doi:10.1086/301869
Brown E, Varney S, Young A, Wolf Z, Foreman O, Wade CM, Hughes A, Oberbauer A, Safra N, Lindblad-Toh K, Burton S, Bannasch D. (2026). A variant in RESF1 is associated with Addison's disease and multiple autoimmune syndrome in young Nova Scotia Duck Tolling Retrievers. Scientific Reports, 16, 13194. doi:10.1038/s41598-026-42994-y
Kidd KK, Pakstis AJ, Speed WC, et al. (2021). The population genetics characteristics of a 90 locus panel of microhaplotypes. Human Genetics, 140(11), 1517–1530. doi:10.1007/s00439-021-02382-0
Leroy G, Verrier E, Meriaux JC, Rognon X. (2009). Genetic diversity of dog breeds: within-breed diversity comparing genealogical and molecular data. Animal Genetics, 40(3), 323–332. doi:10.1111/j.1365-2052.2008.01842.x
Oldoni F, Yoon L, Wootton SC, Lagacé R, Kidd KK, Podini D. (2020). Population genetic data of 74 microhaplotypes in four major US population groups. Forensic Science International: Genetics, 49, 102398. doi:10.1016/j.fsigen.2020.102398
Parker HG. (2012). Genomic analyses of modern dog breeds. Mammalian Genome, 23(1–2), 19–27. doi:10.1007/s00335-011-9387-6
Pedersen NC, Liu H, Millon L, Greer K. (2011). Dog leukocyte antigen class II-associated genetic risk testing for immune disorders of dogs: simplified approaches using Pug dog necrotizing meningoencephalitis as a model. Journal of Veterinary Diagnostic Investigation, 23(1), 68–76. doi:10.1177/104063871102300110
Pedersen NC, Brucker L, Tessier NG, Liu H, Penedo MCT, Hughes S, Oberbauer A, Sacks B. (2015a). The effect of genetic bottlenecks and inbreeding on the incidence of two major autoimmune diseases in standard poodles, sebaceous adenitis and Addison's disease. Canine Genetics and Epidemiology, 2, 14. doi:10.1186/s40575-015-0026-5
Pedersen NC, Liu H, Leonard A, Griffioen L. (2015b). A search for genetic diversity among Italian Greyhounds from Continental Europe and the USA and the effect of inbreeding on susceptibility to autoimmune disease. Canine Genetics and Epidemiology, 2, 17. doi:10.1186/s40575-015-0030-9
Pedersen NC, Shope B, Liu H. (2017). An autosomal recessive mutation in SCL24A4 causing enamel hypoplasia in Samoyed and its relationship to breed-wide genetic diversity. Canine Genetics and Epidemiology, 4, 11. doi:10.1186/s40575-017-0049-1
Safra N, Pedersen NC, Wolf Z, Johnson EG, Liu HW, Hughes AM, Young A, Bannasch DL. (2011). Expanded dog leukocyte antigen (DLA) single nucleotide polymorphism (SNP) genotyping reveals spurious class II associations. Veterinary Journal, 189(2), 220–226. doi:10.1016/j.tvjl.2011.06.023
Sutter NB, Eberle MA, Parker HG, Pullar BJ, Kirkness EF, Kruglyak L, Ostrander EA. (2004). Extensive and breed-specific linkage disequilibrium in Canis familiaris. Genome Research, 14(12), 2388–2396. doi:10.1101/gr.3147604
vonHoldt BM, Pollinger JP, Lohmueller KE, Han E, Parker HG, Quignon P, Degenhardt JD, Boyko AR, et al. (2010). Genome-wide SNP and haplotype analyses reveal a rich history underlying dog domestication. Nature, 464(7290), 898–902. doi:10.1038/nature08837
VGL canine genetic diversity statistics. (2026). UC Davis Veterinary Genetics Laboratory. https://vgl.ucdavis.edu/test/canine-genetic-diversity (accessed 2026).
Wang J. (2002). An estimator for pairwise relatedness using molecular markers. Genetics, 160(3), 1203–1215. doi:10.1093/genetics/160.3.1203
Wictum E, Kun T, Lindquist C, Malvick J, Vankan D, Sacks B. (2013). Developmental validation of DogFiler, a novel multiplex for canine DNA profiling in forensic casework. Forensic Science International: Genetics, 7(1), 82–91. doi:10.1016/j.fsigen.2012.07.001
Wijnrocx K, François L, Stinckens A, Janssens S, Buys N. (2016). Half of 23 Belgian dog breeds has a compromised genetic diversity, as revealed by genealogical and molecular data analysis. Journal of Animal Breeding and Genetics, 133(5), 375–383. doi:10.1111/jbg.12203

End of paper. Updated April 26, 2026.