Genome assembly of the mind-controlling fly pathogen E. muscae
It started out as an ordinary dishpan of rotting fruit, used to collect wild fruit flies for an experiment. However, during clean-up, Carolyn Elya noticed some dead flies at the bottom of the dishpan. With evidence of fungal growth and their wings curiously raised at a 90° angle, Carolyn knew something nefarious was going on—these were zombie flies, infected with the fungal pathogen Entomophthora muscae.
A graduate student studying how microbes influence the behavior of animals at UC Berkeley, Carolyn immediately recognized the opportunity to use the E. muscae infection in lab flies as a system to study how a fungal pathogen can induce bizarre animal behavior on a molecular level.
How does E. muscae infect fruit flies and affect behavior?
- Flies become infected with E. muscae when a spore lands on them, drills through the cuticle into the hemolymph and begins to grow, first in the brain and central nervous system and then in the head, abdomen and thorax.
- Once the fungus runs out of nutrients, it induces stereotyped behaviors in the fly. The fly climbs up to a high point, raises its wings to 90° and dies. The fungus induces the flies to die in this manner in order to position them for dissemination of spores from their dorsal abdomen.
- The fungus fills up the fly’s body and emerges, forming structures to launch the spores into the environment at an impressive ~20 mph (yes, that’s right), aiming to infect additional flies.
As a first step toward looking more closely at the molecular mechanisms behind this process, Carolyn and her team set out to sequence the E. muscae genome and perform de novo assembly in hopes of building a reference genome. This did not turn out to be as straightforward as she had hoped. She encountered several challenges: issues preparing high-molecular-weight (HMW) DNA samples, a genome about 4x as big as hypothesized, and approximately 83% repeat content.
One of the first hurdles was finding a sequencing approach that would meet their needs. Illumina® TruSeq® libraries and other short-read prep methods failed to give assemblies. PacBio™ data from 2 SMRT cells was good, but the coverage was too low due to the large size of the genome (~1.2 Gb estimated from TruSeq library data). When the team turned to UC Davis to investigate the possibility of using mate-pair libraries, it was suggested that they consider 10x Linked-Reads. Impressed with the low input requirement (~1 ng) as well as the ability to resolve haplotypes (they had no idea about the ploidy), Carolyn decided to move forward with 10x Linked-Read libraries.
Despite some issues with sequencing quality and coverage, Carolyn and her team were able to create an assembly indicating a repeat-rich, diploid genome. Improvements are still needed, and they are planning on using low-coverage PacBio™ data (~4.3x) to scaffold and correct mis-assemblies. They are also working on annotating the genome with the help of transcriptome data.
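To see why genome size matters so much here, it helps to remember that average coverage is simply the number of sequenced bases divided by the genome length. The short Python sketch below works through that arithmetic using the two figures quoted in this post (~1.2 Gb genome estimate, ~4.3x PacBio coverage). It assumes the 4.3x figure was calculated against the ~1.2 Gb estimate and that the "4x as big" comparison refers to that same estimate; neither is confirmed here, and the function names are just illustrative.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Average depth of coverage: sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def bases_needed(target_coverage: float, genome_size: float) -> float:
    """Total sequenced bases required to reach a target average coverage."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return target_coverage * genome_size


# Figures quoted in the post.
GENOME_SIZE_BP = 1.2e9   # ~1.2 Gb estimate from TruSeq library data
PACBIO_COVERAGE = 4.3    # ~4.3x low-coverage PacBio data

# Assumption: the 4.3x figure was computed against the ~1.2 Gb estimate.
implied_yield_bp = bases_needed(PACBIO_COVERAGE, GENOME_SIZE_BP)

# Assumption: "4x as big as hypothesized" refers to the ~1.2 Gb estimate,
# which would put the originally hypothesized size near 0.3 Gb.
hypothesized_size_bp = GENOME_SIZE_BP / 4

print(f"Implied PacBio yield: {implied_yield_bp / 1e9:.2f} Gb")
print(
    "Same yield on the hypothesized genome: "
    f"{coverage(implied_yield_bp, hypothesized_size_bp):.1f}x"
)
```

Under those assumptions, the same amount of sequence would have covered a genome of the originally hypothesized size four times as deeply, which illustrates why the coverage fell short.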
Watch Carolyn’s 10x Bay Area User Group Meeting presentation "Genome assembly of the mind-controlling fly pathogen E. muscae via 10x Genomics" for detailed assembly information and some really cool zombie fly videos!
Additional Resources
- See the Chromium™ de novo Assembly Solution
- Learn more about Linked-Reads
- Read the Chromium™ de novo Assembly Application Note
- Learn more about the Supernova™ Assembler
Check out our other de novo assembly blog posts!