BAPS Software for Mac

For example, instead of checking the ‘BAPS News’ page every day, you will just get a short message and a link in your RSS reader with the title of the updated news item. RSS stands for ‘Really Simple Syndication’ and is a standard format used to share content on the internet.

Installation instructions for Mac OS X on PowerPC computers.

BAPS 5.2 is now available here. This version has the same functions as v5.1 but contains some minor bug fixes, so it is recommended to replace v5.1 with v5.2. If you have used version 5.1 previously, you DO NOT NEED TO RE-INSTALL the MCR for BAPS 5.2. Simply replace the old files in your v5.1 folder with the files in the v5.2 archive and launch the software with the same script file as before (for details, see below).

BAPS 5.1 for PPC Macs is available here. For installation, please refer to the instructions for version 5.1 given below, which use a simpler procedure than BAPS 4.

NB: If you have the earlier MCR component (for BAPS 4.X) installed on your computer, it must be removed before you install the new MCR version for BAPS 5.X. This means that different generations of BAPS cannot be run on the same Mac OS X computer. However, as BAPS 5.X versions contain all the functionality of versions 4.X, this restriction is relatively unimportant. Note, though, that not all new program features are available if you use old result files from versions 4.X.

A manual for parallel use of BAPS 5.2 is here. A general manual for BAPS 5.2 is here. The examples of data formatting referred to in the manual are in this zipped folder: ExamplesDataFormatting.zip.

To make BAPS 5.1 functional on Mac OS X for Intel Mac computers you need to do the following:

1. BAPS requires the X11 application to be present on your computer. Note that BAPS 5.1 requires the most recent X11 update from Apple's website to be installed!

2. Using BAPS requires that you first install a free Matlab runtime component (MRC) of about 94 MB (download it and save it to disk without decompressing the archive yet!). WARNING! Do not use the Safari browser for the download, because it automatically decompresses the archive and does so incorrectly! Use another browser instead, for instance Mozilla Firefox.

3. Note! When new upgrades of the SAME BAPS GENERATION (5.X) become available, you don't need to re-install the runtime component, only replace the software files.

4. To install the MRC on your system, launch the X11 terminal application, locate the downloaded zip file and install it using the command unzip filename. For the basic Unix commands used in the terminal to navigate among folders, see e.g. here (for instance, cd folder_name changes the current folder to the existing subfolder folder_name, and cd .. moves you to the parent folder of the current folder). The unzip command extracts the MRC package and installs on your system a folder /v76 containing the files needed by BAPS. On my system, the MRC root folder is by default:
/Users/jukkacorander/v76

5. Download the BAPS package, put it in any folder and unzip the files in the X11 terminal. Download this script file (run_baps5_imac.sh) and put it in the same folder where you extracted the BAPS 5.1 package files.

6. You can now always launch BAPS by running the script file in the X11 terminal (first change to the folder containing the file) with the following command, where you replace /Users/jukkacorander/v76 with the MRC root folder on your system:

./run_baps5_imac.sh /Users/jukkacorander/v76

If the script does not have execute permission by default, first run this command

chmod a+x run_baps5_imac.sh

and then the above command. NB: put a space between '.sh' and '/Users' in the command!
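The launch steps above can be wrapped in a small shell function that catches the two most common mistakes before BAPS starts: a wrong MRC root path and a launcher without execute permission. This is a minimal sketch under our own assumptions. The function name launch_baps5 is hypothetical and is not distributed with BAPS. The function also assumes it is called from the folder that contains run_baps5_imac.sh.

```shell
#!/bin/bash
# launch_baps5 is a hypothetical helper, not part of the BAPS distribution.
# Usage: launch_baps5 <MRC root folder>, run from the BAPS 5.x folder.
launch_baps5() {
  local mcr_root="$1"
  local launcher="./run_baps5_imac.sh"

  # The MRC root folder (e.g. /Users/jukkacorander/v76) must exist;
  # type it exactly, including upper- and lower-case letters.
  if [ ! -d "$mcr_root" ]; then
    echo "MRC root folder not found: $mcr_root" >&2
    return 1
  fi

  # The launcher script must be in the current folder.
  if [ ! -f "$launcher" ]; then
    echo "run_baps5_imac.sh not found in $(pwd)" >&2
    return 1
  fi

  # Same as running chmod a+x by hand; only done if the permission is missing.
  [ -x "$launcher" ] || chmod a+x "$launcher"

  # The MRC root is passed as a separate argument (note the space).
  "$launcher" "$mcr_root"
}
```

To use it, define the function in the X11 terminal, change to the BAPS folder and type launch_baps5 /Users/jukkacorander/v76, with the path replaced by your own MRC root folder.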

The BAPS 4 user manual is here.

To make BAPS 4 functional on Mac OS X for PowerPC computers you need to do the following:

1. BAPS requires the X11 application to be present on your computer. If X11 is not installed, install it from the Tiger installation DVD (preferred, because the DVD contains a newer version, 1.1) or download it from this Apple site. This downloadable version (1.0) is actually for the Panther (10.3) version of Mac OS X, so it is not guaranteed to work correctly in Tiger (10.4).

2. Using BAPS requires that you first install a free Matlab runtime component (MRC) of about 260 MB (download it and save it to disk without decompressing the archive yet!). WARNING! Do not use the Safari browser for the download, because it automatically decompresses the archive and does so incorrectly! Use another browser instead, for instance Mozilla Firefox.

3. Note! When new upgrades of the SAME BAPS GENERATION become available, you don't need to re-install the runtime component, only replace the software files.

4. To install the MRC on your system, launch the X11 terminal application, locate the downloaded zip file and install it using the command unzip filename. For the basic Unix commands used in the terminal to navigate among folders, see e.g. here (for instance, cd folder_name changes the current folder to the existing subfolder folder_name, and cd .. moves you to the parent folder of the current folder). The unzip command extracts the MRC package and installs on your system a folder /v75 containing the files needed by BAPS.

5. Download the BAPS software package, put it in any folder and unzip the files in the X11 terminal.

6. The final step of the installation is to tell your system where BAPS can find the dynamic libraries. To do this, you need to know the full path to the Matlab runtime component, which is in a folder /v75.

7. WARNING! Type paths exactly as they appear, including upper- and lower-case letters: Unix commands and variable names are case-sensitive, and on a case-sensitive file system /Users and /users are different folders! On my computer the full path to the /v75 folder is /Users/jukkacorander/v75, so the example of setting the library paths contains this path. You need to change this part of the commands below to match the path on your own system. To make BAPS easier to start, you should create a script file (here called runbaps) that tells the X11 shell where the libraries are and starts BAPS. This procedure is explained in the next step.

8. In the X11 window, go to the folder where you extracted the BAPS package files. Type the command pico runbaps, which opens a simple text editor. Type the following lines:

#!/bin/bash
# Path to the v75 MATLAB runtime folder; change this to match your system
RTL=/Users/jukkacorander/v75
# Tell OS X where the MATLAB runtime libraries are
export DYLD_LIBRARY_PATH=$RTL/sys/os/mac:$RTL/bin/mac:$RTL/runtime/mac
# Folder with the X11 app-defaults (resource) files
export XAPPLRESDIR=$RTL/X11/app-defaults
# Start BAPS 4 from the current folder
./baps4

Remember to replace /Users/jukkacorander/v75 with the path to the /v75 folder on your system. Type each command exactly as shown (the shell is case-sensitive), with no spaces around the = signs; the only spaces in the commands are the ones after export. The lines starting with # are comments and may be left out. The line #!/bin/bash must be the very first line of the file, but blank lines between the other commands are harmless. When you have typed everything, press Ctrl-X to quit the Pico editor and answer 'y' to save the changes.

Finally, to be able to start BAPS, give the script file execute permission by typing the following command in the X11 terminal (you need to be in the BAPS package folder): chmod a+x runbaps

The program can now be started with the command ./runbaps, and a GUI appears. The chmod command only needs to be run once. From now on, you can always start BAPS by (1) starting the X11 terminal, (2) changing to the folder where you put the BAPS files, and (3) typing the command ./runbaps
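A wrong RTL path is a likely reason for runbaps failing to find the libraries, so it can help to check the folders first. The following is a minimal sketch. The function name check_mcr_v75 is hypothetical and not part of BAPS. It only checks the four sub-folders named in the runbaps script; it does not verify the library files inside them.

```shell
#!/bin/bash
# check_mcr_v75 is a hypothetical helper, not part of BAPS.
# It checks that the folders used in the runbaps script exist under RTL.
check_mcr_v75() {
  local rtl="$1"
  local missing=0
  local dir
  # Same sub-folders as in DYLD_LIBRARY_PATH and XAPPLRESDIR in runbaps.
  for dir in sys/os/mac bin/mac runtime/mac X11/app-defaults; do
    if [ ! -d "$rtl/$dir" ]; then
      echo "missing: $rtl/$dir" >&2
      missing=1
    fi
  done
  # Exit status 0 only if every folder was found.
  return "$missing"
}
```

For example, check_mcr_v75 /Users/jukkacorander/v75 && ./runbaps starts BAPS only if all four folders exist. Replace the path with your own.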

Published online 2018 Jul 30. doi: 10.12688/wellcomeopenres.14694.1
PMID: 30345380

Peer Review Summary

Review date | Reviewer name(s) | Version reviewed | Review status
2018 Oct 5 | Sebastian Duchene | Version 1 | Approved
2018 Sep 17 | Emmanuel Paradis | Version 1 | Approved

Abstract

Identifying structure in collections of sequence data sets remains a common problem in genomics. hierBAPS, a popular algorithm for identifying population structure in haploid genomes, has previously only been available as a MATLAB binary. We provide an R implementation which is both easier to install and use, automating the entire pipeline. Additionally, we allow for the use of multiple processors, improve on the default settings of the algorithm, and provide an interface with the ggtree library to enable informative illustration of the clustering results. Our aim is that this package aids in the understanding and dissemination of the method, as well as enhancing the reproducibility of population structure analyses.

Introduction

Identifying sub-populations in collections of genetic sequences is a common problem in population genetics, molecular ecology, epidemiology and microbiology. In general, the aim of genetic clustering algorithms is to identify separate panmictic clusters within a broader, more heterogeneous population. In large sequence data sets, it is helpful to identify smaller subpopulations that can be analysed further for associations with particular phenotypes, as well as for recombination, provided that potential biases introduced by taking clusters from a larger population are taken into account.

A frequently used model assumes that each individual sequence is drawn from one of K distinct subpopulations, with each cluster having its own set of allele frequencies. The aim is then to identify which cluster each sequence originates from and the corresponding allele frequencies within that cluster.

There are a number of methods for solving this problem, including STRUCTURE, snapclust and BAPS (Bayesian Analysis of Population Structure). The BAPS algorithm [11] is distinct in that it attempts to estimate the partition of individual sequences into clusters directly, by analytically integrating over the allele frequency parameters of each subpopulation. This allows the latent number of underlying subpopulations, K, to be estimated as part of the model-fitting procedure. The hierBAPS algorithm extends this approach by enabling the investigation of a population at multiple resolutions. This is achieved by first clustering the entire dataset using the BAPS algorithm and then iteratively applying the algorithm to each of the resulting clusters.

Similar to other approaches, BAPS assumes that alleles are drawn independently from a multinomial distribution with a Dirichlet prior. However, unlike STRUCTURE, which uses Gibbs sampling to estimate the posterior distribution, BAPS attempts to find the partition S of the data that maximises the posterior probability over all other possible partitions. A partition S is defined as the allocation of each sequence to one of K possible clusters. The maximum possible value of K, $K_{\max}$, is given as input to the hierBAPS algorithm. Here $\mathcal{S}$ denotes the set of all possible partitions with up to $K_{\max}$ clusters. The hierBAPS algorithm attempts to choose S to maximise the posterior probability

$$
P(S \mid \text{data}) = \frac{P(\text{data} \mid S)\, P(S)}{\sum_{S' \in \mathcal{S}} P(\text{data} \mid S')\, P(S')}
$$

where $P(\text{data} \mid S)$ is the marginal likelihood of the data given the partition, with the allele frequency parameters analytically integrated out, which leads to

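$$
P(\text{data} \mid S) = \prod_{i=1}^{K} \prod_{j=1}^{N_L} \left( \frac{\Gamma\left(\sum_{l} \alpha_{ijl}\right)}{\Gamma\left(\sum_{l} (\alpha_{ijl} + n_{ijl})\right)} \prod_{l=1}^{N_A(j)} \frac{\Gamma(\alpha_{ijl} + n_{ijl})}{\Gamma(\alpha_{ijl})} \right)
$$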

where $N_L$ is the number of loci, $n_{ijl}$ is the count of allele $l$ at locus $j$ in cluster $i$, and $\alpha_{ijl}$ is the corresponding hyperparameter of the Dirichlet prior. The BAPS algorithm attempts to find the partition S that maximises the posterior probability using a greedy stochastic search. In hierBAPS, a discrete uniform distribution on the number of clusters K (K = 1, …, $K_{\max}$) provides the prior probability P(S) of each partition. The Dirichlet hyperparameters are set to $\alpha_{ijl} = 1/N_A(j)$, where $N_A(j)$ is the number of distinct alleles at locus $j$.
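The marginal likelihood above can be evaluated directly on a toy example, which shows why clusters with concentrated allele frequencies score higher. The sketch below is illustrative only and is not BAPS or rhierBAPS code; the function name log_marginal_likelihood is hypothetical. It makes three assumptions of its own. The input has one line per sequence of the form "<cluster label> <aligned sequence>". Only A, C, G and T count as alleles; anything else is treated as missing. $N_A(j)$ is counted over the whole data set. Because awk has no log-gamma function, the sketch uses the identity Γ(α + n)/Γ(α) = α(α + 1)…(α + n − 1) for integer n. With $\alpha_{ijl} = 1/N_A(j)$, the hyperparameters at each locus sum to 1.

```shell
#!/bin/bash
# log_marginal_likelihood is a hypothetical sketch, not BAPS or rhierBAPS code.
# Input (stdin): one line per sequence, "<cluster label> <aligned sequence>".
# Assumptions: only A, C, G and T count as alleles (other characters are
# treated as missing), and N_A(j) is counted over the whole data set.
log_marginal_likelihood() {
  awk '
    {
      lab[NR] = $1
      seq[NR] = toupper($2)
      if (length($2) > len) len = length($2)
    }
    END {
      total = 0
      for (j = 1; j <= len; j++) {
        split("", seen); split("", cnt); split("", tot)
        na = 0
        for (i = 1; i <= NR; i++) {
          b = substr(seq[i], j, 1)
          if (b !~ /^[ACGT]$/) continue
          if (!(b in seen)) { seen[b] = 1; na++ }
          cnt[lab[i], b]++
          tot[lab[i]]++
        }
        if (na == 0) continue
        alpha = 1 / na                       # alpha_ijl = 1 / N_A(j)
        for (c in tot) {
          # Gamma(sum_l alpha) / Gamma(sum_l alpha + n_ij) with sum_l alpha = 1,
          # i.e. 1 / n_ij factorial: subtract log(1 + t) for t = 0 .. n_ij - 1.
          for (t = 0; t < tot[c]; t++) total -= log(1 + t)
          # Gamma(alpha + n) / Gamma(alpha) = alpha (alpha + 1) ... (alpha + n - 1)
          for (b in seen)
            for (t = 0; t < cnt[c, b]; t++) total += log(alpha + t)
        }
      }
      printf "%.6f\n", total
    }'
}
```

Consider one locus with two alleles (α = 1/2) and a single cluster with allele counts 2 and 1, for example printf '1 A\n1 A\n1 T\n' | log_marginal_likelihood. This prints -2.772589, that is ln(1/16). For six sequences AAATTT, two pure clusters score -2.326302, while one mixed cluster scores -5.322034, so the split has the higher marginal likelihood.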

Currently, hierBAPS is only available as a MATLAB binary, which can be difficult both to install and to use, as separate runtime libraries are generally needed for different OS versions of MacOS X, Windows and Linux systems. The documentation is also lacking, making it difficult for less computationally experienced researchers to use. There is currently no clear guide on how to use the output of the MATLAB binary to produce informative plots for interpretation. Whilst there are other algorithms available to cluster genetic data in R, such as snapclust and DAPC, neither makes use of the partition approach used in BAPS. By providing an R implementation of hierBAPS, we aim to increase its usability and the reproducibility of analyses using the software.

Methods

Implementation

RhierBAPS is implemented in the R language [13]. The core program relies on the R packages ape, dplyr, gmp, purrr and ggplot2. Additional plotting functionality makes use of ggtree [15] and phytools [16]. The structure of the code closely follows the original MATLAB code, and runtimes are similar. The development version of the package can be installed using devtools.

Unlike the MATLAB version, rhierBAPS by default only considers SNP loci at which the minor allele is present in at least two sequences. This has been found to improve the results of the analysis: although singleton SNPs are important when constructing phylogenies, they introduce noise into the model used in hierBAPS, leading to poorer-quality clusterings. It is currently recommended that singleton SNPs be removed before running the MATLAB version of the software.
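For illustration, the singleton filter described above can be approximated with a short awk program. This is a sketch under our own assumptions, not the rhierBAPS code. The function name filter_singleton_snps is hypothetical. The input is one aligned sequence per line with no FASTA headers; for a FASTA file with each sequence on a single line, grep -v '^>' produces this. Only A, C, G and T count as alleles. A column is kept when its second most common base occurs in at least two sequences, which is our reading of "a minor allele in at least two sequences". Monomorphic columns are therefore dropped as well. The output is plain sequences; converting it back to the input format the MATLAB binary expects is not shown.

```shell
#!/bin/bash
# filter_singleton_snps is a hypothetical sketch, not rhierBAPS code.
# Input (stdin): one aligned sequence per line, no FASTA headers.
# Assumptions: only A, C, G and T count as alleles; a column is kept when
# its second most common base occurs in at least two sequences.
filter_singleton_snps() {
  awk '
    {
      seq[NR] = toupper($0)
      if (length($0) > len) len = length($0)
    }
    END {
      nkeep = 0
      for (j = 1; j <= len; j++) {
        split("", cnt)
        for (i = 1; i <= NR; i++) {
          b = substr(seq[i], j, 1)
          if (b ~ /^[ACGT]$/) cnt[b]++
        }
        # Find the largest and second-largest allele counts in this column.
        first = 0; second = 0
        for (b in cnt) {
          if (cnt[b] > first) { second = first; first = cnt[b] }
          else if (cnt[b] > second) second = cnt[b]
        }
        if (second >= 2) keep[++nkeep] = j
      }
      # Print each sequence restricted to the retained columns.
      for (i = 1; i <= NR; i++) {
        out = ""
        for (k = 1; k <= nkeep; k++) out = out substr(seq[i], keep[k], 1)
        print out
      }
    }'
}
```

For the four sequences ACGTA, ACGTA, TCGAA and TCGAC, only columns 1 and 4 survive. Columns 2 and 3 are monomorphic, and in column 5 the minor allele C is a singleton.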

Operation

RhierBAPS can be installed on any computer where R version 3.5 or later can be installed. The package can be run using just a few lines of R code, where the variable 'fasta.file.name' should be replaced with the location of the FASTA-formatted multiple sequence alignment of the sequences of interest.

Use cases

RhierBAPS requires a multiple sequence alignment in FASTA format. In all examples we use sequences from the Bacillus cereus Multi-Locus Sequence Typing website (https://pubmlst.org/bcereus/). The sequences used are included in the R package.
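In FASTA alignments, sequences are often wrapped over several lines. A small helper that puts each record on one line makes it easier to inspect or pre-process the alignment with standard Unix tools before loading it in R. This is an illustrative sketch, and fasta_to_lines is a hypothetical name. It keeps the text after > on each header line as the record name. It also removes stray spaces and Windows line endings.

```shell
#!/bin/bash
# fasta_to_lines is a hypothetical helper, not part of rhierBAPS.
# It prints one "<name><TAB><sequence>" line per FASTA record, joining
# sequences that are wrapped over several lines.
fasta_to_lines() {
  awk '
    { sub(/\r$/, "") }                 # tolerate Windows line endings
    /^>/ {
      if (nrec > 0) print name "\t" seq
      nrec++
      name = substr($0, 2)
      seq = ""
      next
    }
    {
      gsub(/[ \t]/, "")                # ignore stray whitespace in sequence lines
      seq = seq $0
    }
    END { if (nrec > 0) print name "\t" seq }'
}
```

For example, fasta_to_lines < seqs.fa | cut -f2 prints just the aligned sequences, one per line. Here seqs.fa stands for your own alignment file.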

The algorithm requires an initial number of clusters to be set, which should be higher than the maximum number of clusters expected in the dataset. If a dataset is likely to contain many distinct lineages, for example if there are many samples from many locations, a higher initial number of clusters should be set. Conversely, if the samples come from only a small number of sites and little variation is expected, a smaller initial number of clusters can be used. To get an idea of a good initial number of clusters, agglomerative clustering with complete linkage on pairwise SNP distances can be performed first. The number of levels over which clustering should be performed is also required as input to the algorithm.
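The complete-linkage heuristic mentioned above can be sketched in awk to get a rough count of clusters. The sketch makes several assumptions of its own. Sequences are already aligned, with one per line. Distances count only sites where both bases are A, C, G or T. The tree is cut at a user-chosen SNP distance h. The text does not say where to cut, so h is purely an illustrative parameter, and the function name count_clusters_complete_linkage is hypothetical. The naive closest-pair search is cubic in the number of sequences, so the sketch is meant for small data sets.

```shell
#!/bin/bash
# count_clusters_complete_linkage is a hypothetical sketch, not rhierBAPS code.
# Input (stdin): one aligned sequence per line. Argument: distance cut-off h.
# Prints the number of complete-linkage clusters whose members all lie
# within h SNPs of each other. Only A, C, G, T are compared.
count_clusters_complete_linkage() {
  awk -v h="$1" '
    { seq[NR] = toupper($0) }
    END {
      n = NR
      # Pairwise SNP distances (sites where both bases are A/C/G/T and differ).
      for (i = 1; i <= n; i++)
        for (j = i + 1; j <= n; j++) {
          d = 0
          for (p = 1; p <= length(seq[i]); p++) {
            a = substr(seq[i], p, 1); b = substr(seq[j], p, 1)
            if (a ~ /^[ACGT]$/ && b ~ /^[ACGT]$/ && a != b) d++
          }
          D[i, j] = d; D[j, i] = d
        }
      # Start with every sequence in its own cluster.
      for (i = 1; i <= n; i++) active[i] = 1
      k = n
      while (k > 1) {
        # Find the closest pair of active clusters.
        best = -1
        for (i = 1; i <= n; i++) if (active[i])
          for (j = i + 1; j <= n; j++) if (active[j])
            if (best < 0 || D[i, j] < best) { best = D[i, j]; bi = i; bj = j }
        if (best > h) break
        # Merge bj into bi; complete linkage keeps the larger distance.
        for (m = 1; m <= n; m++)
          if (active[m] && m != bi && m != bj) {
            v = (D[bi, m] > D[bj, m]) ? D[bi, m] : D[bj, m]
            D[bi, m] = v; D[m, bi] = v
          }
        active[bj] = 0
        k--
      }
      print k
    }'
}
```

For instance, the sequences AAAA, AAAT, TTTT and TTTA give 4 clusters at h = 0, 2 at h = 1 and 1 at h = 4. Setting the initial number of clusters somewhat above such a count is in line with the advice above.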

In the preceding example, we ran rhierBAPS with 20 initial clusters at two clustering levels. Additional parameters include the number of cores to use and whether the program should print progress information. The hierBAPS function generates a data frame indicating the assignment of sequences to clusters at each level. This data frame, along with the marginal log-likelihoods, can be saved to file.

Finally, as the program is written in R, we can take advantage of the excellent plotting capabilities available in R. Given a phylogenetic tree generated using IQTREE with model selection using the command iqtrees, we can annotate it with the BAPS clusters using ggtree [15] (Figure 1).

Figure 1.

Phylogenetic tree built using IQTREE and annotated with the top-level clusters identified using rhierBAPS.

Additionally, the plot_sub_cluster function allows the user to focus on one higher-level cluster and investigate the sub-clusters within it. Here we investigate cluster 9 (highlighted in red) at the topmost level (Figure 2).

Figure 2.

Phylogenetic tree focusing on the 9th cluster at the top level identified using rhierBAPS and plotted using the plot_sub_cluster function. The subsequent clustering at the 2nd level is indicated in the sub-tree to the right.

Summary

Clustering is an essential component of many genetic analysis pipelines. We have presented rhierBAPS, an R package that implements the hierBAPS algorithm for clustering genetic sequence data. It is both easy to install and use, whilst providing additional plotting capabilities and the ability to run using multiple cores. We believe it will aid in the reproducibility of population structure analysis.

Software availability

The package is available on CRAN: https://cran.r-project.org/web/packages/rhierbaps/index.html

Source code available from: https://github.com/gtonkinhill/rhierbaps

Archived source code as at time of publication: http://doi.org/10.5281/zenodo.131895820

License: MIT

Acknowledgments

The authors thank a number of users who highlighted small bugs in the initial version of the software.

Funding Statement

This work was supported by the Wellcome Trust [206194] and [204016; to GTH; a Wellcome Trust PhD scholarship grant]; and SDWF is supported in part by The Alan Turing Institute via an EPSRC grant EP/510129/1.


The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Chewapreecha C, Harris SR, Croucher NJ, et al.: Dense genomic sampling identifies highways of pneumococcal recombination. Nat Genet. 2014;46(3):305–309. doi:10.1038/ng.2895
2. Marttinen P, Hanage WP, Croucher NJ, et al.: Detection of recombination events in bacterial genomes from large population samples. Nucleic Acids Res. 2012;40(1):e6. doi:10.1093/nar/gkr928
3. Dearlove BL, Xiang F, Frost SDW: Biased phylodynamic inferences from analysing clusters of viral sequences. Virus Evol. 2017;3(2):vex020. doi:10.1093/ve/vex020
4. Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics. 2000;155(2):945–959.
5. Anderson EC, Thompson EA: A model-based method for identifying species hybrids using multilocus genetic data. Genetics. 2002;160(3):1217–1229.
6. Beugin MP, Gayet T, Pontier D, et al.: A fast likelihood solution to the genetic clustering problem. Methods Ecol Evol. 2018;9(4):1006–1016. doi:10.1111/2041-210X.12968
7. Corander J, Waldmann P, Sillanpää MJ: Bayesian analysis of genetic differentiation between populations. Genetics. 2003;163(1):367–374.
8. Corander J, Waldmann P, Marttinen P, et al.: BAPS 2: enhanced possibilities for the analysis of genetic population structure. Bioinformatics. 2004;20(15):2363–2369. doi:10.1093/bioinformatics/bth250
9. Corander J, Marttinen P: Bayesian identification of admixture events using multilocus molecular markers. Mol Ecol. 2006;15(10):2833–2843. doi:10.1111/j.1365-294X.2006.02994.x
10. Cheng L, Connor TR, Sirén J, et al.: Hierarchical and spatially explicit clustering of DNA sequences with BAPS software. Mol Biol Evol. 2013;30(5):1224–1228. doi:10.1093/molbev/mst028
11. Corander J, Marttinen P, Mäntyniemi S: A Bayesian method for identification of stock mixtures from molecular marker data. Fish Bull. 2006;104(4):550–558.
12. Jombart T, Devillard S, Balloux F: Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genet. 2010;11:94. doi:10.1186/1471-2156-11-94
13. R Core Team: R: A language and environment for statistical computing. 2013.

The authors report an implementation of hierBAPS for R, RhierBAPS, for determining the optimal number of clusters in population DNA sequence data. The program does not extend the methods in hierBAPS, but I appreciate that R is the default language in many bioinformatics pipelines.

I have a few suggestions:

- The program is easy to use, but I think it is worth including a few data sets that are readily available when the package is loaded into R. This would make the examples easier to run and understand.

- I think that it would be valuable to see some results of choosing different minimum values of K to illustrate the sensitivity of the clustering to this parameter.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The identification of groups (or clusters) with genetic or genomic data is one of the basic questions asked by population geneticists. Several methods exist, with various software implementations. The software described in this note is a valuable addition to the set of R packages for population genetics and related fields. I greatly appreciate having this method implemented in R. The analytical integration is a really great feature, and it makes this method attractive.

I tried the package and the examples ran very smoothly, as expected. I also tried the main function with a small DNA alignment (the 'woodmouse' data in ape) and the results made sense. The graphical tools provided with the package are helpful, and the results, output as a list in R, make it easy to build a custom graphical display. For instance, I was able to make my own display with functions in ape in just four lines of code.

I have a few comments or suggestions that the Authors may find useful for future versions of their article and/or package.

At present, it seems that the package handles SNP data. Does this mean that only biallelic DNA loci can be analysed? Can other types of biallelic genetic data be handled? What if more than two alleles are observed at a DNA site?

One suggestion for future developments would be to integrate better with other data classes, particularly those from ape and adegenet, which are more and more widely used. Also, ape now has efficient links with the data classes used in BioConductor, which makes it possible to integrate a wide range of approaches (bioinformatics, genomics, phylogenetics, population genetics) within R.

It seems that the present package does not calculate the individual relative probabilities of assignment to the different clusters, as done in Corander et al. This might be a valuable addition for future versions, and it would help to compare the results from different methods, for instance using the nice compoplot function in adegenet.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

References