HMR GALA: HUMAN, MOUSE, RAT GENOME ALIGNMENT AND ANNOTATION DATABASE(S)

Belinda Giardine1,4, Yi Zhang1,3, Cathy Riemer1,2, Laura Elnitski1,3, Prachi Shah1,4, Scott Schwartz1,2, Matt Weirauch1,2, James Taylor1,2, Webb Miller1,2,4, and Ross C. Hardison1,3

Center for Comparative Genomics and Bioinformatics1, Departments of Computer Science and Engineering2, Biochemistry and Molecular Biology3 and Biology4 at Penn State University

GALA is a set of relational databases containing alignments and annotations of sequenced genomes, currently human, mouse, and rat, with plans to add chimp and chicken. Each database currently serves as a repository for one species of interest; the databases are linked to one another by orthologous genes and alignment positions.

The GALA database(s) provide a user-friendly query page that lets a user query any annotation track, in any order or combination, without writing SQL. Available tracks include genes, promoters, CpG islands, transcription factor binding sites, restriction sites, and many more. Flexibility and ease of use are two prominent features of the database(s). For instance, Blastz alignments provided for pairwise and multi-species comparisons can be combined with other annotation tracks to build a complex query that incorporates information about evolutionary divergence. These alignments are the source for computational analyses that yielded tracks such as regulatory potential score (PSU), phyloHMM (UCSC Human Genome Browser), and conserved transcription factor binding sites (PSU and UCSC), all available in GALA.
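
Combining annotation tracks in a query amounts, at its simplest, to intersecting sets of genomic intervals. The sketch below illustrates that idea only; it is not GALA's implementation. The function name `intersect_tracks`, the tuple layout (chromosome, start, end) with half-open coordinates, and the sample coordinates are all assumptions made for illustration.

```python
from collections import defaultdict


def intersect_tracks(track_a, track_b):
    """Return the regions covered by both tracks.

    Each track is a list of (chrom, start, end) tuples with half-open
    coordinates. This layout is hypothetical, not GALA's schema.
    """
    # Group the second track by chromosome so that only intervals on
    # the same chromosome are compared.
    b_by_chrom = defaultdict(list)
    for chrom, start, end in track_b:
        b_by_chrom[chrom].append((start, end))

    overlaps = []
    for chrom, a_start, a_end in track_a:
        for b_start, b_end in b_by_chrom.get(chrom, []):
            start = max(a_start, b_start)
            end = min(a_end, b_end)
            if start < end:  # keep non-empty overlaps only
                overlaps.append((chrom, start, end))
    return sorted(overlaps)


# Made-up intervals standing in for a gene track and an alignment track.
genes = [("chr1", 100, 500), ("chr2", 50, 80)]
alignments = [("chr1", 400, 700), ("chr1", 50, 150)]
aligned_gene_regions = intersect_tracks(genes, alignments)
```

A relational database would normally express the same operation as a join on chromosome with a range-overlap condition; the nested loop here is chosen for readability, not speed.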

To tie the multi-species databases together, the results of a query in human GALA can be converted to the orthologous positions in the mouse sequence, stored, and then used as a query in the mouse database through the history page. Furthermore, we are linking GALA and our other databases (HbVar and dbERGE) by allowing queries to be stored in the GALA history page. The original query results are incorporated as a "custom" track and can be further refined through additional complex queries.
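
Converting a human interval to its orthologous mouse position can be pictured as a lookup against ungapped alignment blocks. The following is a minimal sketch under stated assumptions, not the method GALA uses: the `Block` record, the `map_to_mouse` function, and the coordinates are hypothetical, and every block is assumed to align on the forward strand of both genomes.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """One ungapped, forward-strand alignment block (hypothetical layout)."""
    human_chrom: str
    human_start: int
    mouse_chrom: str
    mouse_start: int
    length: int


def map_to_mouse(chrom: str, start: int, end: int,
                 blocks: List[Block]) -> List[Tuple[str, int, int]]:
    """Map the half-open human interval [start, end) onto mouse coordinates.

    Only the parts of the interval that fall inside an alignment block
    are mapped; unaligned parts are dropped.
    """
    mapped = []
    for b in blocks:
        if b.human_chrom != chrom:
            continue
        # Clip the query to the portion covered by this block.
        s = max(start, b.human_start)
        e = min(end, b.human_start + b.length)
        if s < e:
            offset = s - b.human_start
            mouse_s = b.mouse_start + offset
            mapped.append((b.mouse_chrom, mouse_s, mouse_s + (e - s)))
    return mapped


# Made-up blocks: two aligned stretches separated by an unaligned gap.
blocks = [
    Block("chr1", 1000, "chr4", 5000, 200),
    Block("chr1", 1300, "chr4", 5250, 100),
]
mouse_regions = map_to_mouse("chr1", 1150, 1350, blocks)
```

The mapped intervals could then play the role of a stored query in the mouse database, in the spirit of the history page described above.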

Output options are likewise numerous and flexible. For example, GALA has its own text display, can generate machine-readable lists, and can port data to other programs for graphical display, such as bar graphs, alignment views in Laj (a versatile, interactive alignment viewer implemented in Java), and custom tracks for the UCSC Genome Browser. We plan to add a DAS interface so that results can be viewed at other databases such as Ensembl. The database is available online at http://bx.cse.psu.edu/.
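
As an illustration of the custom-track output, query results can be written in the BED format that the UCSC Genome Browser accepts for custom tracks: a `track` line followed by tab-separated chromosome, start, end, and name fields. This sketch is not GALA's own exporter; the function name `to_bed_custom_track`, the default track name and description, and the `regionN` labels are assumptions.

```python
def to_bed_custom_track(regions, name="GALA_query",
                        description="Example query result"):
    """Format (chrom, start, end) tuples as a BED custom track.

    BED coordinates are zero-based and half-open. The track name,
    description, and region labels are placeholders.
    """
    lines = [f'track name="{name}" description="{description}"']
    for i, (chrom, start, end) in enumerate(regions, start=1):
        lines.append(f"{chrom}\t{start}\t{end}\tregion{i}")
    return "\n".join(lines) + "\n"


# Made-up regions standing in for the results of a query.
bed_text = to_bed_custom_track([("chr1", 100, 150), ("chr1", 400, 500)])
```

The resulting text can be pasted into or uploaded through the browser's custom track page.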