about.

This is a collection of applications for genomics data processing, primarily for high-throughput next-generation sequencing. The focus is on processing data in Wiggle format, since many other tools are already available for SAM/BAM, Bed, Fastq, etc. The Wiggle and BigWig formats provide a compact way to store numerical data resulting from ChIP-seq, MNase-seq, FAIRE-seq, and DNase-seq experiments. This toolkit provides applications for adding, subtracting, dividing, multiplying, log-transforming, averaging, Z-scoring, and smoothing Wig files. There are also tools for analyzing MNase-seq (nucleosome mapping) data, creating heatmaps, and averaging the values from aligned loci.
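To make the Wiggle format concrete, here is a minimal sketch of reading the fixedStep variant, in which a header line declares the chromosome, start position, and step, and each following line holds one value. The class `FixedStepWig` and its `Point` type are hypothetical names invented for this illustration; they are not part of the toolkit or of java-genomics-io, and the sketch ignores variableStep sections and the `span` parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a tiny fixedStep Wiggle reader, not the toolkit's parser. */
public class FixedStepWig {

    /** One data point: chromosome, 1-based position, and value. */
    public static final class Point {
        public final String chrom;
        public final int pos;
        public final double value;

        Point(String chrom, int pos, double value) {
            this.chrom = chrom;
            this.pos = pos;
            this.value = value;
        }
    }

    /** Parses fixedStep sections; variableStep data is not handled here. */
    public static List<Point> parse(List<String> lines) {
        List<Point> points = new ArrayList<Point>();
        String chrom = null;
        int pos = 0;
        int step = 1;
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blanks, comments, and track/browser lines.
            if (line.isEmpty() || line.startsWith("#")
                    || line.startsWith("track") || line.startsWith("browser")) {
                continue;
            }
            if (line.startsWith("fixedStep")) {
                step = 1; // step defaults to 1 when omitted
                for (String token : line.split("\\s+")) {
                    int eq = token.indexOf('=');
                    if (eq < 0) {
                        continue;
                    }
                    String key = token.substring(0, eq);
                    String val = token.substring(eq + 1);
                    if (key.equals("chrom")) {
                        chrom = val;
                    } else if (key.equals("start")) {
                        pos = Integer.parseInt(val);
                    } else if (key.equals("step")) {
                        step = Integer.parseInt(val);
                    }
                }
                continue;
            }
            if (chrom == null) {
                throw new IllegalArgumentException("Data line before fixedStep header");
            }
            // Each data line is a single value at the current position.
            points.add(new Point(chrom, pos, Double.parseDouble(line)));
            pos += step;
        }
        return points;
    }

    public static void main(String[] args) {
        List<String> wig = Arrays.asList(
                "track type=wiggle_0",
                "fixedStep chrom=chr1 start=100 step=10",
                "1.5",
                "2.5");
        for (Point p : parse(wig)) {
            System.out.println(p.chrom + "\t" + p.pos + "\t" + p.value);
        }
    }
}
```

Because positions are implied by the header rather than stored with each value, a fixedStep file needs only one number per line, which is where the compactness comes from.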

Tools may be run from the terminal or from Galaxy.

All tools are designed to process data in chunks so that memory requirements never exceed ~1 GB, regardless of genome size. Tools are intended to be modular, so that multiple tools can easily be strung together into ad hoc pipelines or workflows in Galaxy. For example, a common pipeline for our ChIP-seq experiments is: 1) map reads with Bowtie, 2) calculate the coverage of sequencing reads, 3) normalize by subtracting input, 4) Z-score the normalized coverage, 5) correlate replicates, 6) average multiple replicates, and 7) make a heatmap of the final result.
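As an illustration of chunked processing applied to the Z-scoring step, the sketch below makes two passes over the data: the first accumulates a running count, sum, and sum of squares one chunk at a time, and the second rescales each value. The class `ChunkedZScore` is a hypothetical name made up for this example and is not the toolkit's implementation. It assumes a population standard deviation and uses an in-memory array as a stand-in for a file that a real tool would read chunk by chunk.

```java
import java.util.Arrays;

/** Illustrative only: two-pass Z-scoring that touches the data one chunk at a time. */
public class ChunkedZScore {

    /** Running statistics that can be updated chunk by chunk. */
    static final class Stats {
        long n;
        double sum;
        double sumSq;

        /** Adds values[start..end) to the running totals, skipping NaN (missing) values. */
        void add(double[] values, int start, int end) {
            for (int i = start; i < end; i++) {
                double v = values[i];
                if (Double.isNaN(v)) {
                    continue;
                }
                n++;
                sum += v;
                sumSq += v * v;
            }
        }

        double mean() {
            return sum / n;
        }

        /** Population standard deviation (an assumption of this sketch). */
        double stdev() {
            double m = mean();
            return Math.sqrt(Math.max(0.0, sumSq / n - m * m));
        }
    }

    /**
     * Returns (x - mean) / stdev for every value. NaN inputs stay NaN.
     * If all values are equal, stdev is 0 and the results are not finite.
     */
    public static double[] zscore(double[] values, int chunkSize) {
        // Pass 1: accumulate statistics, one chunk at a time.
        Stats stats = new Stats();
        for (int start = 0; start < values.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, values.length);
            stats.add(values, start, end);
        }
        double mean = stats.mean();
        double sd = stats.stdev();

        // Pass 2: rescale, again one chunk at a time.
        double[] out = new double[values.length];
        for (int start = 0; start < values.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, values.length);
            for (int i = start; i < end; i++) {
                out[i] = (values[i] - mean) / sd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] coverage = {2, 4, 4, 4, 5, 5, 7, 9};
        // prints [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
        System.out.println(Arrays.toString(zscore(coverage, 3)));
    }
}
```

Only the three running totals have to persist between chunks, so peak memory depends on the chunk size rather than on the length of the genome.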

tools.

For an up-to-date list of available tools, search for java-genomics-toolkit in the Galaxy Tool Shed.

converters.

dna.

ngs.

nucleosomes.

visualization.

wigmath.

usage.

galaxy.

One-click installation is available for your local Galaxy instance through the Galaxy Tool Shed.

If you run a production Galaxy server, configuration files are provided for loading the applications into Galaxy manually. Unzip or check out the java-genomics-toolkit distribution into Galaxy's "tools" folder, and add the supplied tool_conf entries to your tool_conf.xml file.

shell.

Tools can also be run from the terminal, and helper scripts are provided for convenience. For more information and usage examples, see the GitHub page.

requirements.

java-genomics-toolkit requires Java 7, available at oracle.com.

download.

The recommended way to obtain the toolkit is to check out the source code from GitHub and build it using the provided Ant build script (simply call "ant").

In addition, precompiled, ready-to-use packages that bundle JRE 7 are available for Linux in 32-bit and 64-bit flavors. If you just want to try out the toolkit, this may be the quickest option.

to.do.

java-genomics-io.

Those wishing to write their own scripts may be interested in java-genomics-io, the library upon which these applications are built. This library supports iterating or querying for data from Bed, BedGraph, GeneTrack, GFF, SAM, BAM, Wiggle, BigWig, and BigBed files with a consistent interface. ASCII files are indexed with Tabix as needed to perform queries efficiently. Writers are also available for writing Bed, BedGraph, GFF, SAM, BAM, and Wig files.

licensing.info.

java-genomics-toolkit is distributed under the GNU General Public License v3. See the included license.txt for more details.

java-genomics-toolkit was created by Timothy Palpant for work in the Lieb laboratory at UNC Chapel Hill.

java-genomics-toolkit utilizes multiple external libraries including:

contact.me.

Please contact me at tim [at] palpant.us with bugs, questions, comments, or suggestions.