Chromatin accessibility in developing Drosophila melanogaster embryos at single-cell resolution

Study Design

Refine cell clustering to identify cell-types within germ layers
Compare chromatin of germ layers through differential accessibility
Order cells along developmental trajectories

Refine cell clustering to identify cell-types within germ layers

Our analysis used the similarity of chromatin accessibility landscapes in individual cells to identify how cell types become epigenomically encoded in the developing fly embryo. At 6-8 hours after egg laying, we identified 18 cell clusters consistent with individual cell types. You can use the code we provide here to see how we clustered cells and explore how various features are distributed across these clusters.

Compare chromatin of germ layers through differential accessibility

Having identified clusters of cells with distinct patterns of chromatin accessibility, the obvious next question is which sites are open in one germ layer but not in the others. To address this question, we used a logistic regression framework to identify sites that were significantly more accessible in one germ layer than in the others. We present some of the code used for this in this section of the tutorial.
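
For intuition, here is a minimal Python sketch of this kind of test for a single site. It is not the code from the tutorial: it assumes binarized accessibility, a logistic model with only an intercept and a germ-layer indicator (no covariates such as read depth), and a likelihood-ratio test against an intercept-only model. With a single binary predictor, the fitted probabilities are simply the observed open fractions in each group, so no iterative fitting is needed. The function name `site_lrt` and the toy counts are hypothetical.

```python
import math


def _bernoulli_loglik(k, n):
    """Log-likelihood of k open cells out of n, evaluated at the MLE p = k/n."""
    if k == 0 or k == n:
        return 0.0  # the 0 * log(0) terms vanish at the boundary
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)


def site_lrt(open_in, n_in, open_out, n_out):
    """Likelihood-ratio test for one site: target germ layer vs. all other cells.

    open_in / n_in:   cells with the site open / total cells in the germ layer
    open_out / n_out: the same counts for cells outside the germ layer
    Returns (LRT statistic, p-value, True if the site is more open in the layer).
    """
    if n_in <= 0 or n_out <= 0:
        raise ValueError("both groups need at least one cell")
    # Full model (intercept + layer indicator): one open fraction per group.
    ll_full = _bernoulli_loglik(open_in, n_in) + _bernoulli_loglik(open_out, n_out)
    # Null model (intercept only): one shared open fraction.
    ll_null = _bernoulli_loglik(open_in + open_out, n_in + n_out)
    stat = max(2.0 * (ll_full - ll_null), 0.0)  # guard against float round-off
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    more_open_in_layer = open_in / n_in > open_out / n_out
    return stat, p_value, more_open_in_layer


# Toy counts (made up): site open in 40 of 50 layer cells and 10 of 50 other cells.
stat, p_value, more_open = site_lrt(40, 50, 10, 50)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.2e}, more open in layer: {more_open}")
```

In a genome-wide analysis the test would be repeated for every site, so the resulting p-values would need a multiple-testing correction before calling sites significant.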

Order cells along developmental trajectories

Finally, we show in the paper that we can arrange these cells along a developmental trajectory to learn about the timing of individual regulatory elements opening and closing in development. For this analysis, we focused on the embryos from the earliest time point in our data set (2-4 hours after egg laying). The code for this kind of analysis is presented in the fourth section of the tutorial.
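
Once cells have an ordering, the timing of a regulatory element can be summarized by how often it is open at successive stages of that ordering. The Python sketch below illustrates the idea only; it is not the trajectory method used in the paper. It assumes that a pseudotime value per cell has already been computed elsewhere and that accessibility is binarized. The function name `accessibility_along_pseudotime` and the toy values are hypothetical.

```python
def accessibility_along_pseudotime(pseudotime, is_open, n_bins=5):
    """Fraction of cells with a site open in consecutive pseudotime bins.

    pseudotime: one value per cell (assumed to be precomputed)
    is_open:    0/1 per cell, whether the site is accessible in that cell
    Cells are sorted by pseudotime and split into n_bins groups of
    (nearly) equal size; a bin with no cells yields None.
    """
    if len(pseudotime) != len(is_open):
        raise ValueError("pseudotime and is_open must have the same length")
    order = sorted(range(len(pseudotime)), key=lambda i: pseudotime[i])
    n = len(order)
    fractions = []
    for b in range(n_bins):
        cells = order[b * n // n_bins:(b + 1) * n // n_bins]
        if cells:
            fractions.append(sum(is_open[i] for i in cells) / len(cells))
        else:
            fractions.append(None)
    return fractions


# Toy data (made up): a site that is closed early and open late.
pseudotime = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4]
is_open = [1, 0, 1, 0, 1, 0, 1, 0]
print(accessibility_along_pseudotime(pseudotime, is_open, n_bins=2))
```

A site whose open fraction rises across bins is a candidate for opening during development, and one whose fraction falls is a candidate for closing.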

Download

Accessibility Matrices

Here we make peak-by-cell accessibility matrices available for all three time points. Each matrix is stored as a sparse matrix in .rds format; opening the files requires the Matrix package in R.

t-SNE Coordinates

Here we make t-SNE coordinates available for each time point. See the manuscript for how t-SNE coordinates were calculated. Each file is a gzipped .tsv with three columns: cell ID, x-axis coordinate, and y-axis coordinate.
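
As a rough illustration of how such a file could be read outside R, the Python sketch below parses the three columns using only the standard library. It is a sketch under assumptions: whether the files carry a header row is not stated here, so rows whose coordinates are not numeric are skipped, and the toy data and the function name `read_tsne` are hypothetical.

```python
import csv
import gzip
import io


def read_tsne(handle):
    """Parse (cell ID, x, y) rows from an open text handle into a dict."""
    coords = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # ignore blank or malformed lines
        cell_id, x, y = row
        try:
            coords[cell_id] = (float(x), float(y))
        except ValueError:
            continue  # non-numeric coordinates, e.g. a header row if present
    return coords


# Toy gzipped data built in memory so the example runs without a download.
toy = gzip.compress(b"cellA\t1.5\t-2.0\ncellB\t0.25\t3.0\n")
with gzip.open(io.BytesIO(toy), "rt", newline="") as handle:
    coords = read_tsne(handle)
print(coords)

# For a downloaded file, pass its path to gzip.open in the same way:
#   with gzip.open("path/to/downloaded_file.tsv.gz", "rt", newline="") as handle:
#       coords = read_tsne(handle)
```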

Cluster Assignments

Here we make the density peak-based cluster assignments (derived from the t-SNE coordinates) available for each time point. See the manuscript for how t-SNE coordinates were calculated and how density peak clusters were identified. Each file is a gzipped .tsv with two columns: cell ID and cluster assignment.
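
A quick first check on a cluster file is to count the cells in each cluster. The Python sketch below does this using only the standard library. It is a sketch under assumptions: cluster labels are kept as strings because their exact format is not described here, the `has_header` flag exists because it is not stated whether the files have a header row, and the toy data and the function name `read_clusters` are hypothetical.

```python
import csv
import gzip
import io
from collections import Counter


def read_clusters(handle, has_header=False):
    """Parse (cell ID, cluster) rows from an open text handle into a dict."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # drop the first row
    clusters = {}
    for row in reader:
        if len(row) != 2:
            continue  # ignore blank or malformed lines
        cell_id, cluster = row
        clusters[cell_id] = cluster  # labels are kept as strings
    return clusters


# Toy gzipped data built in memory so the example runs without a download.
toy = gzip.compress(b"cellA\t1\ncellB\t2\ncellC\t1\n")
with gzip.open(io.BytesIO(toy), "rt", newline="") as handle:
    clusters = read_clusters(handle)

# Number of cells assigned to each cluster.
sizes = Counter(clusters.values())
print(sizes)
```

Because both file types are keyed by cell ID, the resulting dictionary can be joined to the t-SNE coordinates to color a scatter plot by cluster.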