What is a vkey?

A Variant Key (vkey) is a one-to-one mapping between a genetic variant and a character string. Technically, a vkey string is a compressed version of the string "genomic build + chromosome + start position + end position + alternate allele." It encodes that string using the 64-character set:

0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
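As a rough illustration of how a number can be written with this character set, here is a minimal sketch. It assumes that each character is a base-64 digit whose value is its index in the set, most significant digit first, and that positions use a fixed width of 5 characters. These assumptions are consistent with the substring 03KzH in the example vkeys in this section, but they are not taken from the DIVAS implementation. The names encode_int and decode_int are hypothetical.

```python
# The 64-character set from the text; a character's value is its index here (assumed).
ALPHABET = "0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"


def encode_int(value: int, width: int) -> str:
    """Write a non-negative integer as a fixed-width base-64 string."""
    if value < 0 or value >= 64 ** width:
        raise ValueError(f"{value} does not fit in {width} base-64 characters")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 64)
        digits.append(ALPHABET[remainder])
    # Digits were produced least significant first, so reverse them.
    return "".join(reversed(digits))


def decode_int(text: str) -> int:
    """Read a base-64 string back into an integer."""
    value = 0
    for char in text:
        value = value * 64 + ALPHABET.index(char)
    return value


print(encode_int(876498, 5))  # 03KzH
print(decode_int("03KzH"))    # 876498
print(encode_int(3, 2))       # 03
```

Under these assumptions, five characters cover positions up to 64^5 - 1 = 1,073,741,823, which is larger than any human chromosome.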

For example, we can encode the variant chr1:876498-876498:G>A as the 14-character vkey _103KzH03KzH01.

Similarly, for the insertion chr1:876498-876498:G>GTC, the vkey is _103KzH03KzH03q. Its last three characters are 03 (alternate allele length = 3) and q (GTC = 110110 in base 2 = 54, which is 'q' in the character set).
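The allele arithmetic above can be sketched in a few lines. This is an illustration under stated assumptions, not the DIVAS code: the 2-bit codes G = 11, T = 01, C = 10 are read off GTC = 110110 with bases taken left to right, A = 00 is inferred as the only remaining code, and the name pack_allele is hypothetical. The text does not say how alleles whose length is not a multiple of three are padded, so the sketch rejects them instead of guessing.

```python
# The 64-character set from the text; a character's value is its index here (assumed).
ALPHABET = "0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"

# Assumed 2-bit codes. G, T and C follow from GTC = 110110; A = 00 is inferred.
BASE_BITS = {"A": 0b00, "T": 0b01, "C": 0b10, "G": 0b11}


def pack_allele(allele: str) -> str:
    """Pack an allele into base-64 characters, three bases (6 bits) per character."""
    if len(allele) % 3 != 0:
        # Padding for a partial triplet is not described in the text.
        raise ValueError("allele length must be a multiple of 3 in this sketch")
    chars = []
    for start in range(0, len(allele), 3):
        value = 0
        for base in allele[start:start + 3]:
            # Shift the previous bases left and append this base's two bits.
            value = (value << 2) | BASE_BITS[base]
        chars.append(ALPHABET[value])
    return "".join(chars)


print(pack_allele("GTC"))  # q
```

Packing three bases into each character is what keeps long insertions compact: the allele part of the string grows by one character per three bases.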

Using this compressed encoding, one can encode any possible SNV as a 14-character string, and insertions of up to ~2,900 bases as a 1000-character string. In this version of DIVAS, we only allow vkeys of length < 1000 characters. Note that all imported variants are normalized by left-alignment.

A Python package that converts variant coordinates to vkeys, and vkeys back to coordinates, is available here.