Rayan Chikhi
rayan.chikhi@pasteur.fr
@RayanChikhi

I am a researcher and new group leader at Institut Pasteur, also advising group members at University of Lille, France.

In non-technical terms, my work consists of analyzing genomes using computers. Scientists can read the DNA of humans, plants, animals, using sequencing instruments. This has transformed biology in the last decade, e.g. to identify mutations in genes, including those that are linked to diseases; to study evolution; and so much more. We would like to have a complete and precise understanding of genomes, but this is not straightforward: sequencing data is challenging to process. So, people like me develop methods to do the analysis.

In technical terms, my interests range from fundamental data structures and algorithms, to their implementation and execution in the context of DNA and RNA sequencing. Part of my expertise is on the de novo assembly of genomes. Recently, I contributed to the assembly of the giraffe genome and the gorilla Y-chromosome.

Short bio

I studied Computer Science at ENS Rennes and obtained a PhD in 2012 under the supervision of Dominique Lavenier. After a postdoc at Penn State in Paul Medvedev's lab, CNRS hired me as a junior researcher in 2014 and I was part of the Bonsai bioinformatics team. I still supervise researchers there. In 2019 I started a "Sequence Bioinformatics" research group at the Center of Bioinformatics, Biostatistics and Integrative Biology of Institut Pasteur, partly funded by the Inception program.

Research topics

Genome analysis
Algorithms and data structures
De novo assembly

Group members

In Lille
Pierre Marijon (PhD student)
Mael Kerbiriou (engineer)
Camille Marchet (postdoc)

In Paris
Yoann Dufresne (research scientist)
You? Positions will open in 2019 for postdocs and Master's students with a strong computational background.

Software

Minia assembler
Whole genome de novo assembler with very low memory usage, described in [11].

Kmergenie
Automatic detection of the k-mer size for de novo assembly, described in [14].

DSK
K-mer counting software, low-memory, low disk usage, supports large values of k, described in [13].

BCALM 2
Very scalable de Bruijn graph compaction, described in [24].

GATB Library
C++ library for the development of reference-free Illumina data analysis software, described in [17].

Publications

[34] V. Crawford, A. Kuhnle, C. Boucher, R. Chikhi, T. Gagie, Practical Dynamic de Bruijn Graphs, Bioinformatics (2018) [PDF]
[33] R. Chikhi, V. Jovicic, S. Kratsch, P. Medvedev, M. Milanic, S. Raskhodnikova, N. Varma, Bipartite Graphs of Small Readability, COCOON (2018) [PDF]
[32] R. Chikhi, A. Schönhuth, Dualities in Tree Representations, CPM (2018) [PDF]
[31] A. Kuosmanen et al., Using Minimum Path Cover to Boost Dynamic Programming on DAGs: Co-Linear Chaining Extended, RECOMB (2018) [Conference PDF], TALG (2019) [Journal PDF]
[30] J. Audoux et al., DE-kupl: exhaustive capture of biological variation in RNA-seq data through k-mer decomposition, Genome Biology (2017) [Open-access]
[29] S. Rangavittal et al., RecoverY: k-mer-based read classification for Y-chromosome-specific sequencing and assembly, Bioinformatics (2017) [PDF]
[28] A. Sczyrba et al., Critical Assessment of Metagenome Interpretation-A Benchmark of Metagenomics Software, Nature Methods (2017) [PDF]
[27] A. Limasset, G. Rizk, R. Chikhi, P. Peterlongo, Fast and scalable minimal perfect hashing for massive key sets, SEA (2017) [PDF]
[26] C. Sun, R. S. Harris, R. Chikhi, P. Medvedev, AllSome Sequence Bloom Trees, RECOMB (2017) [PDF]
[25] The Computational Pan-Genomics Consortium, Computational pan-genomics: status, promises and challenges, Briefings in Bioinformatics (2016) [PDF]
[24] R. Chikhi, A. Limasset, P. Medvedev, Compacting de Bruijn graphs from sequencing data quickly and in low memory, ISMB (2016) [PDF]
[23] M. Agaba et al., Giraffe genome sequence reveals clues to its unique morphology and physiology, Nature Communications (2016) [PDF]
[22] M. Tomaszkiewicz et al., A time- and cost-effective strategy to sequence mammalian Y Chromosomes: an application to the de novo assembly of gorilla Y, Genome Research (2016) [PDF]
[21] K. Sahlin, R. Chikhi, L. Arvestad, Genome scaffolding with PE-contaminated mate-pair libraries, WABI (2015) [Open-access]
[20] R. Chikhi, P. Medvedev, M. Milanic, S. Raskhodnikova, On the readability of overlap digraphs, CPM (2015) and Discrete Applied Mathematics (2016) [Open-access]
[19] R. Uricaru et al., Reference-free detection of isolated SNPs, Nucleic Acids Research (2014) [Open-access] [Webpage]
[18] G. Rizk, A. Gouin, R. Chikhi, C. Lemaitre, MindTheGap: integrated detection and assembly of short and long insertions, Bioinformatics (2014) [Open-access] [Webpage]
[17] E. Drezen et al., GATB: Genome Assembly & Analysis Tool Box, Bioinformatics (2014) [Open-access] [Webpage]
[16] R. Chikhi, A. Limasset, S. Jackman, J. Simpson, P. Medvedev, On the representation of de Bruijn graphs, RECOMB (2014) [PDF]
[15] K. R. Bradnam et al., Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species, GigaScience (2013) [PDF]
[14] R. Chikhi, P. Medvedev, Informed and Automated k-Mer Size Selection for Genome Assembly, Bioinformatics (2013), HiTSeq (2013) Best Paper Award [PDF] [Webpage]
[13] G. Rizk, D. Lavenier, R. Chikhi, DSK: k-mer counting with very low memory usage, Bioinformatics (2013) [PDF] [Webpage]
[12] N. Maillet, C. Lemaitre, R. Chikhi, D. Lavenier, P. Peterlongo, Compareads: comparing huge metagenomic experiments, RECOMB Comparative Genomics (2012) [PDF] [Webpage]
[11] R. Chikhi, G. Rizk, Space-efficient and exact de Bruijn graph representation based on a Bloom filter, WABI (2012) [PDF] [Webpage]
[10] P. Peterlongo, R. Chikhi, Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer, BMC Bioinformatics (2012) [PDF] [Webpage]
[9] G. Sacomoto et al., KisSplice: de novo calling alternative splicing events from RNA-seq data, RECOMB-seq, BMC Bioinformatics (2012) [PDF] [Webpage]
[8] D. A. Earl et al., Assemblathon 1: A competitive assessment of de novo short read assembly methods, Genome Research (2011) [PDF]
[7] G. Chapuis, R. Chikhi, D. Lavenier, Parallel and memory-efficient reads indexing for genome assembly, PPAM Parallel Bio-Computing Workshop (2011) [PDF]
[6] R. Chikhi, D. Lavenier, Localized genome assembly from reads to scaffolds: practical traversal of the paired string graph, WABI (2011) [PDF]
[5] R. Chikhi, L. Sael, D. Kihara, Protein binding ligand prediction using moment-based methods, Protein function prediction for omics era, D. Kihara ed., Springer (2011) [PDF]
[4] D. Kihara, L. Sael, R. Chikhi, J. Esquivel-Rodriguez, Molecular surface representation using 3D Zernike descriptors for protein shape comparison and docking, Curr. Protein and Peptide Science (2010) [PDF]
[3] R. Chikhi, L. Sael, D. Kihara, Real-time ligand binding pocket database search using local surface descriptors, Proteins: Structure, Function, and Bioinformatics (2010) [PDF]
[2] R. Chikhi, D. Lavenier, Paired-end read length lower bounds for genome re-sequencing (Meeting Abstract), BMC Bioinformatics (2009) [PDF]
[1] R. Chikhi, S. Derrien, A. Noumsi, P. Quinton, Combining flash memory and FPGAs to efficiently implement a massively parallel algorithm for content-based image retrieval, International Journal of Electronics (2008) [PDF]

Talks

Evomics Workshop on Genomics, 2019, de novo assembly & reference-free analysis [PDF] [Lab]
BiG seminar, 2018, Large genome assembly [YouTube] [PDF]
CGSI, 2018, k-mer data structures [YouTube] [PDF]
CGSI, 2018, Metagenome assembly methods [YouTube] [PDF]
CPM, 2018, Dualities in tree representations [PDF]
Mosaic Webinar, 2018, Minia's entry at Mosaic Strains1 assembly challenge [PDF]
Evomics Workshop on Genomics, 2018, de novo assembly & k-mers [PDF] [Lab]
RNA-Seq Nanopore @ Evry, 2017, A review of RNA-seq nanopore read correction [PDF]
BiATA, 2017, Ingredients for de novo (meta)genome assembly [PDF]
Colib'Read Workshop, 2016, Graph representations of reference-free sequencing data [PDF]
ISMB, 2016, Compacting de Bruijn graphs from sequencing data quickly and in low memory [PDF]
ALEA, 2016, On the representation of de Bruijn graphs (focusing on navigational data structures) [PDF]
SMPGD keynote, 2016, de Bruijn graphs of sequencing data [PDF]
Evomics Workshop on Genomics, 2016, de novo assembly [PDF] [Lab]
RECOMB, 2014, On the representation of de Bruijn graphs [PDF]
Evomics Workshop on Genomics, 2014, de novo assembly [PDF] [Blog post] [Lab]
ISMB/HiTSeq, 2013, Informed and Automated k-Mer Size Selection for Genome Assembly [PDF]
Evomics Workshop on Genomics, 2013, de novo assembly (introduction) [PDF]
WABI, 2012, Space-efficient and exact de Bruijn graph representation based on a Bloom filter [PDF]
Thesis slides, 2012, Computational methods for de novo assembly of NGS data [PDF]
WABI, 2011, Localized genome assembly from reads to scaffolds: practical traversal of the paired string graph [PDF]
IBL, 2011, de novo assembly tools, Monument, Mapsembler [PDF]
ISCBSC, 2009, Paired-end read length lower bounds for genome re-sequencing [PDF]

Reports

R. Chikhi, Computational Methods for de novo Assembly of Next-Generation Genome Sequencing Data, PhD Thesis, 2008-2012 [PDF]
Summary: We discuss computational methods (theoretical models and algorithms) to perform the reconstruction (de novo assembly) of DNA sequences produced by high-throughput sequencers. This thesis introduces the following contributions:
- quantification of the maximum theoretical genome coverage achievable by recent sequencing data (Chapter 2)
- theoretical models for paired-end assembly (Chapter 3)
- two concepts for practical assembly: localized assembly and memory-efficient paired reads indexing (Chapter 4)
- implementation details of a de novo assembly software, the Monument assembler (Chapter 5)
- an algorithm that enumerates variants in sequencing data, implemented in the Mapsembler software (Chapter 6)

R. Chikhi, Study of Unentanglement in Quantum Computing, Manuscript, research internship at MIT, Spring 2008 [PDF]
Summary: We investigate the conjecture that one cannot simulate QMA(2) protocols in QMA using a quantum operation called a disentangler. Our results show that, when exponential precision is required, this conjecture holds unless P = NP. Moreover, also in the exponential precision case, we show that one only needs a stronger hypothesis to prove the conjecture.

R. Chikhi, Protein surface descriptors for binding sites comparison and ligand prediction, Manuscript, research internship at Purdue University, Summer 2007 [PDF]
Summary: We present a model for two-dimensional ligand binding pockets representation and we apply it to pocket-pocket matching and binding ligand prediction.

Retired software

Mapsembler
Targeted assembly on a desktop computer, see reference [10].

Paired reads repetitions
Software package for computing the ratio of single and paired (as in paired NGS reads) exact repetitions within a genome. Useful for obtaining re-sequencing lower bounds inspired by [Whiteford 05]. See [2] and the corresponding talk for sample results and details.

Monument
Whole genome de novo assembler, described in [6], [7] and [PhD Thesis]. (recommended instead: Minia)

de Bruijn graph construction
Hash table-free implementation of the de Bruijn graph for a set of reads. Also includes a tool that computes the union of two de Bruijn graphs and the Cartesian product of abundances, useful for constructing a multi-dataset de Bruijn graph. (recommended instead: BCALM 2)

Pocket-Surfer
Protein ligand binding pocket type prediction using a database of known binding sites. See [3] for more details. (recommended instead: 3D-Surfer)