ezGeno: An Automatic Model Selection Package for Genomic Data Analysis

2020 
To facilitate the process of tailoring a deep neural network for exploring the dynamics of genomic DNA, we have developed a hands-on package called ezGeno that automates the search over network parameters and network structure. ezGeno considers three search spaces: the number of filters, the dilation factors, and the connectivity between layers. ezGeno can be applied to any kind of 1D genomic input, such as genomic sequences, histone modifications, or DNase feature data, and to combinations of these 1D features. Specifically, for the task of predicting TF binding using genomic sequences as the input, ezGeno consistently returns the best-performing set of parameters and network structure, and highlights the important segments within the original sequences. For the task of predicting tissue-specific enhancer activity using both sequence and DNase feature data as the input, ezGeno also regularly outperforms the hand-designed models. In this study, we demonstrate that ezGeno is superior in efficiency and accuracy when compared to AutoKeras, a general open-source AutoML package. The average AUC of ezGeno is also consistently higher than that of a one-layer DeepBind model. Given the flexibility of ezGeno, we expect that this package can provide future researchers not only with support for model design in their genomic analyses but also with more insight into the regulatory landscape.
Availability: The ezGeno package can be freely accessed at https://github.com/ailabstw/ezGeno.
Contact: Dr. Chien-Yu Chen, chienyuchen@ntu.edu.tw
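As a rough illustration of the three search spaces named in the abstract (number of filters, dilation factors, and connectivity between layers), the sketch below samples one candidate architecture and counts how many candidates exist. It is not ezGeno's code or API: the candidate values, the function names `sample_architecture` and `search_space_size`, and the rule that each layer reads from exactly one earlier layer (or the raw input) are all assumptions made for this example.

```python
import random

# Hypothetical candidate values; the abstract does not list the actual choices.
FILTER_CHOICES = [32, 64, 128]
DILATION_CHOICES = [1, 2, 4]


def sample_architecture(num_layers, rng):
    """Draw one candidate: per-layer filter count, dilation, and input source."""
    layers = []
    for i in range(num_layers):
        layers.append({
            "filters": rng.choice(FILTER_CHOICES),
            "dilation": rng.choice(DILATION_CHOICES),
            # Connectivity (assumed form): layer i reads from the raw input (-1)
            # or from any earlier layer 0..i-1.
            "input_from": rng.choice(range(-1, i)),
        })
    return layers


def search_space_size(num_layers):
    """Count the distinct candidates under the assumptions above."""
    size = 1
    for i in range(num_layers):
        # Layer i has (i + 1) possible input sources: the raw input plus i earlier layers.
        size *= len(FILTER_CHOICES) * len(DILATION_CHOICES) * (i + 1)
    return size


if __name__ == "__main__":
    rng = random.Random(0)
    for layer in sample_architecture(3, rng):
        print(layer)
    print("candidates for 3 layers:", search_space_size(3))
```

Even with these small, made-up choice lists the number of candidates grows quickly with depth, which is the practical motivation for automating the search rather than enumerating designs by hand.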