Admixture graph R package

Over the last couple of months I have worked on and off on an R package for modelling and testing admixture graphs.

You can download it from GitHub or install it directly in R using:

library(devtools)
devtools::install_github("mailund/admixture_graph")
library(admixturegraph)

I know, the GitHub repository has an underscore and the package name does not. R package names can’t contain an underscore, and I didn’t think about that when I made the repository, so that is how it is right now.

Building admixture graphs

I’m using the package in a couple of projects right now to fit graphs to data. The data I work with is D statistics. I don’t compute those in the package but use ADMIXTOOLS, and I use the package to extract equations for the expected values of these statistics and to fit graph parameters (edge lengths and admixture proportions) to the data.

It is similar to the qpGraph tool from ADMIXTOOLS (which I have never managed to run), except that I don’t compute error bars on parameters yet. I still have to find a good way of doing that. I have some ideas, but it is a bit more complicated than you might think.

Anyway, the code for specifying graphs is a bit crude but pretty straightforward. The code below builds and plots a graph.

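# Sampled populations (the leaves of the graph)
leaves <- c("BLK", "PB",
            "Bar", "Chi1", "Chi2", "Adm1", "Adm2",
            "Denali", "Kenai", "Sweden")
# Unsampled ancestral nodes and admixture nodes
inner_nodes <- c("R", "PBBB",
                 "Adm", "Chi", "BC", "ABC",
                 "x", "y", "z",
                 "bc_a1", "pb_a1", "abc_a2", "pb_a2")

# Each edge() gives a child and its parent; admixture_edge() gives a child
# and its two parents.
edges <- parent_edges(c(edge("BLK", "R"),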
                        edge("PB", "pb_a1"),
                        edge("pb_a1", "pb_a2"),
                        edge("pb_a2", "PBBB"),
                        
                        edge("Chi1", "Chi"),
                        edge("Chi2", "Chi"),
                        edge("Chi", "BC"),
                        edge("Bar", "BC"),
                        edge("BC", "bc_a1"),
                        
                        edge("Adm1", "Adm"),
                        edge("Adm2", "Adm"),
                        
                        admixture_edge("bc_a1", "pb_a1", "ABC"),
                        edge("Adm", "ABC"),
                        
                        edge("ABC", "abc_a2"),
                        admixture_edge("abc_a2", "pb_a2", "x"),
                        edge("Denali", "x"),
                        
                        edge("x", "y"),
                        edge("Kenai", "y"),
                        
                        edge("y", "z"),
                        edge("Sweden", "z"),
                        
                        edge("z", "PBBB"),
                        edge("PBBB", "R")))
 

admixtures <- admixture_proportions(c(
    admix_props("bc_a1", "pb_a1", "ABC", "a"),
    admix_props("abc_a2", "pb_a2", "x", "b")))
                                
bears_graph <- agraph(leaves, inner_nodes, edges, admixtures)
plot(bears_graph, show_inner_node_labels = TRUE, show_admixture_labels = TRUE)
[Figure: Admixture graph]

Fitting graphs

Given a data frame with columns W, X, Y, Z, and D (the first four are sample names and D is the D(W,X;Y,Z) statistic), you can fit a graph to the data.
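To make the expected layout concrete, here is a minimal sketch of such a data frame. The population names are borrowed from the bears graph above, and the D values are made-up placeholders rather than real statistics; the name d_stats is just for illustration.

```r
# Placeholder input: one row per D(W,X;Y,Z) statistic.
# The D values below are invented for illustration only, NOT real data.
d_stats <- data.frame(
  W = c("BLK", "BLK"),
  X = c("PB", "PB"),
  Y = c("Bar", "Chi1"),
  Z = c("Denali", "Kenai"),
  D = c(0.01, -0.02),
  stringsAsFactors = FALSE
)

# Inspect the column names and types
str(d_stats)
```

In practice the rows would come from the ADMIXTOOLS output rather than being typed in by hand.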

The interface works with magrittr or dplyr pipelines so you can write something like

data %>% fit_graph(graph) %>% plot

to fit the graph parameters and plot the fit.

 

[Figure: Fitted data]

You can also extract the fitted parameters from the result of fit_graph() using the coefficients() function, get the fitted values using the fitted() function, and in general use the usual interface for fitted models in R.

Except for confidence intervals with confint(). As I wrote above, I haven’t quite figured out how to do that yet.
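Pulling out the parameters and fitted values could look roughly like the sketch below. It assumes the admixturegraph package is installed and that you pass in a data frame and a graph like the ones described above. The helper name summarise_fit is hypothetical and not part of the package.

```r
# Hypothetical helper, not part of admixturegraph: fit a graph and collect
# the pieces that the standard model accessors return.
summarise_fit <- function(data, graph) {
  # Same as data %>% fit_graph(graph), written without the pipe
  fit <- admixturegraph::fit_graph(data, graph)
  list(
    parameters = coefficients(fit),  # fitted edge lengths and admixture proportions
    fitted     = fitted(fit)         # fitted values of the statistics
  )
}
```

Calling summarise_fit(data, bears_graph) would then return both pieces in one list, provided the fit itself succeeds.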

It is not terribly solid code yet — it is more likely to crash with a meaningless error message than a meaningful one — but I am working on improving that. If anyone can find a use for it, and give some feedback, I would much appreciate it.

Author: Thomas Mailund

My name is Thomas Mailund and I am a research associate professor at the Bioinformatics Research Center, Uni Aarhus. Before this I did a postdoc at the Dept of Statistics, Uni Oxford, and got my PhD from the Dept of Computer Science, Uni Aarhus.
