A Dataset of Sequences with Manually Curated V(D)J Designations

Quality control for bioinformatics analysis

The dataset contains sequences with manually curated V(D)J designations. These designations were checked by hand, possibly with the help of bioinformatics tools. Test cases range from very easy ones with unambiguous V(D)J designations to borderline or difficult ones, including incomplete or unusual recombinations and translocations.

This collection of sequences, distributed as open-source data, was initially conceived to test the robustness of the Vidjil software. It may now help to check the robustness of any software doing immune repertoire sequencing (RepSeq) analysis, and to compare human and computed annotations.
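
As an illustration of comparing human and computed annotations, here is a minimal Python sketch. It does not rely on the dataset's actual file format or on any tool's output format: the mappings from sequence identifier to (V, D, J) gene names, the function names, and the example identifiers are all hypothetical. It also assumes that a comparison at the gene level is wanted, so the allele suffix after `*` is ignored.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (V, D, J) gene names, None when a gene is absent.
Designation = Tuple[Optional[str], Optional[str], Optional[str]]


def strip_allele(gene: Optional[str]) -> Optional[str]:
    """Drop the allele suffix, e.g. 'IGHV3-23*01' -> 'IGHV3-23'."""
    return None if gene is None else gene.split("*", 1)[0]


def compare(curated: Dict[str, Designation],
            computed: Dict[str, Designation]) -> Dict[str, List[str]]:
    """Sort curated sequence ids into match / mismatch / missing."""
    report: Dict[str, List[str]] = {"match": [], "mismatch": [], "missing": []}
    for seq_id, expected in curated.items():
        if seq_id not in computed:
            # The software produced no designation for this sequence.
            report["missing"].append(seq_id)
            continue
        got = computed[seq_id]
        same = (tuple(strip_allele(g) for g in expected)
                == tuple(strip_allele(g) for g in got))
        report["match" if same else "mismatch"].append(seq_id)
    return report


# Made-up example values, not taken from the dataset.
curated = {
    "seq1": ("IGHV3-23*01", "IGHD3-10*01", "IGHJ4*02"),
    "seq2": ("IGHV1-69*01", None, "IGHJ6*02"),
    "seq3": ("TRGV9*01", None, "TRGJP*01"),
}
computed = {
    "seq1": ("IGHV3-23*04", "IGHD3-10*01", "IGHJ4*02"),  # allele differs only
    "seq2": ("IGHV1-2*02", None, "IGHJ6*02"),            # different V gene
}

report = compare(curated, computed)
print(report)
```

A stricter comparison could keep the allele suffix, and a more lenient one could accept any of several curated alternatives for ambiguous cases. Which rule is appropriate depends on how the curated designations express ambiguity.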

As of January 2021, the dataset contains 500+ sequences.

Documentation and reference: M. Salson et al., A Dataset of Sequences with Manually Curated V(D)J Designations, RepSeq 2016 workshop at ECCB 2016

Updates can be sent to contact@vidjil.org. The dataset and additional documentation will soon be available.

January 2021