## Release 1.11 (22nd September 2020)


Changes affecting the whole of bcftools, or multiple commands:

* Filtering -i/-e expressions

    - Breaking change in -i/-e expressions on the FILTER column.  Originally
      it was possible to query only a subset of filters, but not an exact match.
      The new behavior is:

        FILTER="A"          .. exact match, for example "A;B" does not pass
        FILTER!="A"         .. exact match, for example "A;B" does pass
        FILTER~"A"          .. both "A" and "A;B" pass
        FILTER!~"A"         .. neither "A" nor "A;B" pass

    - Fix in commutative comparison operators, in some cases reversing sides
      would produce incorrect results (#1224; #1266)

    - Better support for filtering on sample subsests

    - Add SMPL_*/S* family of functions that evaluate within rather than across
      all samples. (#1180)

* Improvements in the build system


Changes affecting specific commands:

* bcftools annotate:

    - Previously it was not possible to use `--columns =TAG` with INFO tags
      and the `--merge-logic` feature was restricted to tab files with BEG,END
      columns, now extended to work also with REF,ALT.

    - Make `annotate -TAG/+TAG` work also with FORMAT fields. (#1259)

    - ID and FILTER can be transferred to INFO and ID can be populated from
      INFO.  However, the FILTER column still cannot be populated from an INFO
      tag because all possible FILTER values must be known at the time of
      writing the header (#947; #1187)

* bcftools consensus:

    - Fix in handling symbolic deletions and overlapping variants.
      (#1149; #1155; #1295)

    - Fix `--iupac-codes` crash on REF-only positions with `ALT="."`. (#1273)

    - Fix `--chain` crash. (#1245)

    - Preserve the case of the genome reference. (#1150)

    - Add new `-a, --absent` option which allows to set positions with no
      supporting evidence to "N" (or any other character). (#848; #940)

* bcftools convert:

    - The option `--vcf-ids` now works also with `-haplegendsample2vcf`. (#1217)

    - New option `--keep-duplicates`

* bcftools csq:

    - Add `misc/gff2gff.py` script for conversion between various flavors of
      GFF files. The initial commit supports only one type and was contributed
      by @flashton2003. (#530)

    - Add missing consequence types. (PR #1203; #1292)

    - Allow overlapping CDS to support ribosomal slippage. (#1208)

* bcftools +fill-tags:

    - Added new annotations: INFO/END, TYPE, F_MISSING.

* bcftools filter:

    - Make `--SnpGap` optionally filter also SNPs close to other variant types.
      (#1126)

* bcftools gtcheck:

    - Complete revamp of the command. The new version is faster and allows
      N:M sample comparisons, not just 1:N or NxN comparisons.
      Some functionality was lost (plotting and clustering) but may be added
      back on popular demand.

* bcftools +mendelian:

    - Revamp of user options, output VCFs with mendelian errors annotation,
      read PED files (thanks to Giulio Genovese).

* bcftools merge:

    - Update headers when appropriate with the '--info-rules *:join' INFO rule.
      (#1282)

    - Local alleles merging that produce LAA and LPL when requested, a draft
      implementation of https://github.com/samtools/hts-specs/pull/434 (#1138)

    - New `--no-index` which allows to merge unindexed files. Requires the input
      files to have chromosomes in th same order and consistent with the order
      of sequences in the header. (PR #1253; samtools/htslib#1089)

    - Fixes in gVCF merging. (#1127; #1164)

* bcftools norm:

    - Fixes in `--check-ref s` reference setting features with non-ACGT bases.
      (#473; #1300)

    - New `--keep-sum` switch to keep vector sum constant when splitting
      multiallelics. (#360)

* bcftools +prune:

    - Extend to allow annotating with various LD metrics: r^2,
      Lewontin's D' (PMID:19433632), or Ragsdale's D (PMID:31697386).

* bcftools query:

    - New `%N_PASS()` formatting expression to output the number of samples
      that pass the filtering expression.

* bcftools reheader:

    - Improved error reporting to prevent user mistakes. (#1288)

* bcftools roh:

    - Several fixes and improvements
        - the `--AF-file` description incorrectly suggested "REF\tALT" instead
          of the correct "REF,ALT". (#1142)
        - RG lines could have negative length. (#1144)
        - new `--include-noalt` option to allow also ALT=. records. (#1137)

* bcftools scatter:

    - New plugin intended as a convenient inverse to `concat`
      (thanks to Giulio Genovese, PR #1249)

* bcftools +split:

    - New `--groups-file` option for more flexibility of defining desired
      output. (#1240)

    - New `--hts-opts` option to reduce required memory by reusing one
      output header and allow overriding the default hFile's block size
      with `--hts-opts block_size=XXX`. On some file systems (lustre) the
      default size can be 4M which becomes a problem when splitting files
      with 10+ samples.

    - Add support for multisample output and sample renaming

* bcftools +split-vep:

    - Add default types (Integer, Float, String) for VEP subfields and make
      `--columns -` extract all subfields into INFO tags in one go.


## Release 1.10.2 (19th December 2019)

This is a release fix that corrects minor inconsistencies discovered in
previous deliverables.


## Release 1.10 (6th December 2019)


* Numerous bug fixes, usability improvements and sanity checks were added
  to prevent common user errors.

* The -r, --regions (and -R, --regions-file) option should never create
  unsorted VCFs or duplicates records again. This also fixes rare cases where
  a spanning deletion makes a subsequent record invisible to `bcftools isec`
  and other commands.

* Additions to filtering and formatting expressions

    - support for the spanning deletion alternate allele (ALT=*)

    - new ILEN filtering expression to be able to filter by indel length

    - new MEAN, MEDIAN, MODE, STDEV, phred filtering functions

    - new formatting expression %PBINOM (phred-scaled binomial probability),
      %INFO (the whole INFO column), %FORMAT (the whole FORMAT column),
      %END (end position of the REF allele), %END0 (0-based end position
      of the REF allele), %MASK (with multiple files indicates the presence
      of the site in other files)

* New plugins

    - `+gvcfz`: compress gVCF file by resizing gVCF blocks according to
      specified criteria

    - `+indel-stats`: collect various indel-specific statistics

    - `+parental-origin`: determine parental origin of a CNV region

    - `+remove-overlaps`: remove overlapping variants.

    - `+split-vep`: query structured annotations such INFO/CSQ created by
      bcftools/csq or VEP

    - `+trio-dnm`: screen variants for possible de-novo mutations in trios

* `annotate`

    - new -l, --merge-logic option for combining multiple overlapping regions

* `call`

    - new `bcftools call -G, --group-samples` option which allows grouping
      samples into populations and applying the HWE assumption within but
      not across the groups.

* `csq`

    - significant reduction of memory usage in the local -l mode for VCFs
      with thousands of samples and 20% reduction in the non-local
      haplotype-aware mode.

    - fixes a small memory leak and formatting issue in FORMAT/BCSQ at
      sites with many consequences

    - do not print protein sequence of start_lost events

    - support for "start_retained" consequence

    - support for symbolic insertions (ALT="<INS...>"), "feature_elongation"
      consequence

    - new -b, --brief-predictions option to output abbreviated protein
      predictions.

* `concat`

    - the `--naive` command now checks header compatibility when concatenating
      multiple files.

* `consensus`

    - add a new `-H, --haplotype 1pIu/2pIu` feature to output first/second
      allele for phased genotypes and the IUPAC code for unphased genotypes

    - new -p, --prefix option to add a prefix to sequence names on output

* `+contrast`

    - added support for Fisher's test probability and other annotations

* `+fill-from-fasta`

    - new -N, --replace-non-ACGTN option

* `+dosage`

    - fix some serious bugs in dosage calculation

* `+fill-tags`

    - extended to perform simple on-the-fly calculations such as calculating
      INFO/DP from FORMAT/DP.

* `merge`

    - add support for merging FORMAT strings

    - bug fixed in gVCF merging

* `mpileup`

    - a new optional SCR annotation for the number of soft-clipped reads

* `reheader`

    - new -f, --fai option for updating contig lines in the VCF header

* `+trio-stats`

    - extend output to include DNM homs and recurrent DNMs

* VariantKey support



## Release 1.9 (18th July 2018)

* `annotate`

    - REF and ALT columns can be now transferred from the annotation file.

    - fixed bug when setting vector_end values.

* `consensus`

    - new -M option to control output at missing genotypes

    - variants immediately following insersions should not be skipped.  Note
      however, that the current fix requires normalized VCF and may still
      falsely skip variants adjacent to multiallelic indels.

    - bug fixed in -H selection handling

* `convert`

    - the --tsv2vcf option now makes the missing genotypes diploid, "./."
      instead of "."

    - the behavior of -i/-e with --gvcf2vcf changed. Previously only sites with
      FILTER set to "PASS" or "." were expanded and the -i/-e options dropped
      sites completely. The new behavior is to let the -i/-e options control
      which records will be expanded. In order to drop records completely,
      one can stream through "bcftools view" first.

* `csq`

    - since the real consequence of start/splice events are not known,
      the amino acid positions at subsequent variants should stay unchanged

    - add `--force` option to skip malformatted transcripts in GFFs with
      out-of-phase CDS exons.

* `+dosage`: output all alleles and all their dosages at multiallelic sites

* `+fixref`: fix serious bug in -m top conversion

* `-i/-e` filtering expressions:

    - add two-tailed binomial test

    - add functions N_PASS() and F_PASS()

    - add support for lists of samples in filtering expressions, with many
      samples it was impractical to list them all on the command line. Samples
      can be now in a file as, e.g., GT[@samples.txt]="het"

    - allow multiple perl functions in the expressions and some bug fixes

    - fix a parsing problem, '@' was not removed from '@filename' expressions

* `mpileup`: fixed bug where, if samples were renamed using the `-G`
  (`--read-groups`) option, some samples could be omitted from the output file.

* `norm`: update INFO/END when normalizing indels

* `+split`: new -S option to subset samples and to use custom file names
  instead of the defaults

* `+smpl-stats`: new plugin

* `+trio-stats`: new plugin

* Fixed build problems with non-functional configure script produced on
  some platforms


## Release 1.8 (April 2018)

* `-i, -e` filtering: Support for custom perl scripts

* `+contrast`: New plugin to annotate genotype differences between groups
  of samples

* `+fixploidy`: New options for simpler ploidy usage

* `+setGT`: Target genotypes can be set to phased by giving `--new-gt p`

* `run-roh.pl`: Allow to pass options directly to `bcftools roh`

* Number of bug fixes


## Release 1.7 (February 2018)

* `-i, -e` filtering: Major revamp, improved filtering by FORMAT fields
  and missing values. New GT=ref,alt,mis etc keywords, check the documentation
  for details.

* `query`: Only matching expression are printed when both the -f and -i/-e
  expressions contain genotype fields. Note that this changes the original
  behavior. Previously all samples were output when one matching sample was
  found. This functionality can be achieved by pre-filtering with view and then
  streaming to query. Compare
        bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]' -i'GT="alt"' file.bcf
  and
        bcftools view -i'GT="alt"' file.bcf -Ou | bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]'

* `annotate`: New -k, --keep-sites option

* `consensus`: Fix --iupac-codes output

* `csq`: Homs always considered phased and other fixes

* `norm`: Make `-c none` work and remove `query -c`

* `roh`: Fix errors in the RG output

* `stats`: Allow IUPAC ambiguity codes in the reference file; report the number of missing genotypes

* `+fill-tags`: Add ExcHet annotation

* `+setGt`: Fix bug in binom.test calculation, previously it worked only for nAlt<nRef!

* `+split`: New plugin to split a multi-sample file into single-sample files in one go

* Improve python3 compatibility in plotting scripts



## Release 1.6 (September 2017)

* New `sort` command.

* New options added to the `consensus` command. Note that the `-i, --iupac`
  option has been renamed to `-I, --iupac`, in favor of the standard
  `-i, --include`.

* Filtering expressions (`-i/-e`): support for `GT=<type>` expressions and
  for lists and ranges (#639) - see the man page for details.

* `csq`: relax some GFF3 parsing restrictions to enable using Ensembl
  GFF3 files for plants (#667)

* `stats`: add further documentation to output stats files (#316) and
  include haploid counts in per-sample output (#671).

* `plot-vcfstats`: further fixes for Python3 (@nsoranzo, #645, #666).

* `query` bugfix (#632)

* `+setGT` plugin: new option to set genotypes based on a two-tailed binomial
  distribution test. Also, allow combining `-i/-e` with `-t q`.

* `mpileup`: fix typo (#636)

* `convert --gvcf2vcf` bugfix (#641)

* `+mendelian`: recognize some mendelian inconsistencies that were
  being missed (@oronnavon, #660), also add support for multiallelic
  sites and sex chromosomes.


## Release 1.5 (June 2017)

* Added autoconf support to bcftools. See `INSTALL` for more details.

* `norm`: Make norm case insensitive (#601). Trim the reference allele (#602).

* `mpileup`: fix for misreported indel depths for reads containing adjacent
  indels (3c1205c1).

* `plot-vcfstats`: Open stats file in text mode, not binary (#618).

* `fixref` plugin: Allow multiallelic sites in the `-i, --use-id reference`.
  Also flip genotypes, not just REF/ALT!

* `merge`: fix gVCF merge bug when last record on a chromosome opened a
  gVCF block (#616)

* New options added to the ROH plotting script.

* `consensus`: Properly flush chain info (#606, thanks to @krooijers).

* New `+prune` plugin for pruning sites by LD (R2) or maximum number of
  records within a window.

* New N_MISSING, F_MISSING (number and fraction missing) filtering
  expressions.

* Fix HMM initialization in `roh` when snapshots are used in multiple
  chromosome VCF.

* Fix buffer overflow (#607) in `filter`.


## Release 1.4.1 (8 May 2017)

* `roh`: Fixed malfunctioning options `-m, --genetic-map` and `-M, --rec-rate`,
  and newly allowed their combination. Added a convenience wrapper `misc/run-roh.pl`
  and an interactive script for visualizing the calls `misc/plot-roh.py`.

* `csq`: More control over warning messages (#585).

* Portability improvements (#587). Still work to be done on this front.

* Add support for breakends to `view`, `norm`, `query` and filtering (#592).

* `plot-vcfstats`: Fix for python 2/3 compatibility (#593).

* New `-l, --list` option for `+af-dist` plugin.

* New `-i, --use-id` option for `+fix-ref` plugin.

* Add `--include/--exclude` options to `+guess-ploidy` plugin.

* New `+check-sparsity` plugin.

* Miscellaneous bugfixes for #575, #584, #588, #599, #535.


## Release 1.4 (13 March 2017)

Two new commands - `mpileup` and `csq`:

* The `mpileup` command has been imported from samtools to bcftools. The
  reasoning behind this is that bcftools calling is intimately tied to mpileup
  and any changes to one, often requires changes to the other. Only the
  genotype likelihood (BCF output) part of mpileup has moved to bcftools,
  while the textual pileup output remains in samtools. The BCF output option
  in `samtools mpileup` will likely be removed in a release or two or when
  changes to `bcftools call` are incompatible with the old mpileup output.

  The basic mpileup functionality remains unchanged as do most of the command
  line options, but there are some differences and new features that one
  should be aware of:

  - The option `samtools mpileup -t, --output-tags` changed to `bcftools
    mpileup -a, --annotate` to avoid conflict with the `-t, --targets`
    option common across other bcftools commands.

  - `-O, --output-BP` and `-s, --output-MQ` are no longer used as they are
    only for textual pipelup output, which is not included in `bcftools
    mpileup`. `-O` short option reassigned to `--output-type` and `-s`
    reassigned to `--samples` for consistency with other bcftools commands.

  - `-g, --BCF`, `-v, --VCF`, and ` -u, --uncompressed` options from
    `samtools mpileup` are no longer used, being replaced by the
    `-O, --output-type` option common to other bcftools commands.

  - The `-f, --fasta-ref` option is now required by default to help avoid user
    errors. Can be disabled using `--no-reference`.

  - The option `-d, --depth .. max per-file depth` now behaves as expected
    and according to the documentation, and prints a meaningful diagnostics.

  - The `-S, --samples-file` can be used to rename samples on the fly. See man
    page for details.

  - The `-G, --read-groups` functionality has been extended to allow
    reassignment, grouping and exclusion of readgroups. See man page for
    details.

  - The `-l, --positions` replaced by the `-t, --targets` and
    `-T, --targets-file` options to be consistent with other bcftools
    commands.

  - gVCF output is supported. Per-sample gVCFs created by mpileup can be
    merged using `bcftools merge --gvcf`.

  - Can generate mpileup output on multiple (indexed) regions using the
    `-r, --regions` and `-R, --regions-file` options. In samtools, one
    was restricted to a single region with the `-r, --region` option.

  - Several speedups thanks to @jkbonfield (cf3a55a).

* `csq`: New command for haplotype-aware variant consequence calling.
  See man page and [paper](https://www.ncbi.nlm.nih.gov/pubmed/28205675).


Updates, improvements and bugfixes for many other commands:

* `annotate`: `--collapse` option added. `--mark-sites` now works with
  VCF files rather than just tab-delimited files. Now possible to annotate
  a subset of samples from tab file, not just VCF file (#469). Bugfixes (#428).

* `call`: New option `-F, --prior-freqs` to take advantage of prior knowledge
    of population allele frequencies. Improved calculation of the QUAL score
    particularly for REF sites (#449, 7c56870). `PLs>=256` allowed in
    `call -m`. Bugfixes (#436).

* `concat --naive` now works with vcf.gz in addition to bcf files.

* `consensus`: handle variants overlapping region boundaries (#400).

* `convert`: gvcf2vcf support for mpileup and GATK. new `--sex` option to
  assign sex to be used in certain output types (#500). Large speedup of
  `--hapsample` and `--haplegendsample` (e8e369b) especially with `--threads`
  option enabled. Bugfixes (#460).

* `cnv`: improvements to output (be8b378).

* `filter`: bugfixes (#406).

* `gtcheck`: improved cross-check mode (#441).

* `index` can now specify the path to the output index file. Also, gains the
   `--threads` option.

* `merge`: Large overhaul of `merge` command including support for merging
  gVCF files created by `bcftools mpileup --gvcf` with the new `-g, --gvcf`
  option. New options `-F` to control filter logic and `-0` to set missing
  data to REF. Resolved a number of longstanding issues (#296, #361, #401,
  #408, #412).

* `norm`: Bugfixes (#385,#452,#439), more informative error messages (#364).

* `query`: `%END` plus `%POS0`, `%END0` (0-indexed) support - allows easy BED
  format output (#479). `%TBCSQ` for use with the new `csq` command. Bugfixes
  (#488,#489).

* `plugin`: A number of new plugins:

  - `GTsubset` (thanks to @dlaehnemann)
  - `ad-bias`
  - `af-dist`
  - `fill-from-fasta`
  - `fixref`
  - `guess-ploidy` (deprecates `vcf2sex` plugin)
  - `isecGT`
  - `trio-switch-rate`

  and changes to existing plugins:

  - `tag2tag`: Added `gp-to-gt`, `pl-to-gl` and `--threshold` options and
    bugfixes (#475).
  - `ad-bias`: New `-d` option for minimum depth.
  - `impute-info`: Bugfix (49a9eaf).
  - `fill-tags`:  Added ability to aggregate tags for sample subgroups, thanks
    to @mh11. (#503). HWE tag added as an option.
  - `mendelian`: Bugfix (#566).

* `reheader`: allow muiltispace delimiters in `--samples` option.

* `roh`: Now possible to process multiple samples at once. This allows
  considerable speedups for files with thousands of samples where the cost of
  HMM is neglibible compared to I/O and decompressing. In order to fit tens of
  thousands samples in memory, a sliding HMM can be used (new `--buffer-size`
  option). Viterbi training now uses Baum-Welch algorithm, and works much
  better. Support for gVCFs or FORMAT/PL tags. Added `-o, output` and
  `-O, --output-type` options to control output of sites or regions
  (compression optional). Many bugs fixed - do not segfault on missing PL
  values anymore, a typo in genetic map calculation resulted in a slowdown and
  incorrect results.

* `stats`: Bugfixes (16414e6), new options `-af-bins` and `-af-tags` to control
  allele frequency binning of output. Per-sample genotype concordance tables
  added (#477).

* `view -a, --trim-alt-alleles` various bugfixes for missing data and more
  informative errors should now be given on failure to pinpoint problems.


General changes:

* Timestamps are now added to header lines summarising the command (#467).

* Use of the `--threads` options should be faster across the board thanks to
  changes in HTSlib meaning meaning threads are now shared by the compression
  and decompression calls.

* Changes to genotype filtering with `-i, --include` and `-e, --exclude` (#454).


## Noteworthy changes in release 1.3.1 (22 April 2016)

* The `concat` command has a new `--naive` option for faster operations on
  large BCFs (PR #359).
* `GTisec`: new plugin courtesy of David Laehnemann (@dlaehnemann) to count
  genotype intersections across all possible sample subsets in a VCF file.
* Numerous VCF parsing fixes.
* Build fix: _peakfit.c_ now builds correctly with GSL v2 (#378).
* Various bug fixes and improvements to the `annotate` (#365), `call` (#366),
  `index` (#367), `norm` (#368, #385), `reheader` (#356), and `roh` (#328)
  commands, and to the `fill-tags` (#345) and `tag2tag` (#394) plugins.
* Clarified documentation of `view` filter options, and of the
  `--regions-file` and `--targets-file` options (#357, #411).


## Noteworthy changes in release 1.3 (15 December 2016)

* `bcftools call` has new options `--ploidy` and `--ploidy-file` to make
  handling sample ploidy easier. See man page for details.
* `stats`: `-i`/`-e` short options changed to `-I`/`-E` to be consistent with
  the filtering `-i`/`-e` (`--include`/`--exclude`) options used in other
  tools.
* general `--threads` option to control the number of output compression
  threads used when outputting compressed VCF or BCF.
* `cnv` and `polysomy`: new commands for detecting CNVs, aneuploidy, and
  contamination from SNP genotyping data.
* various new options, plugins, and bug fixes, including #84, #201, #204,
  #205, #208, #211, #222, #225, #242, #243, #249, #282, #285, #289, #302,
  #311, #318, #336, and #338.


## Noteworthy changes in release 1.2 (2 February 2016)

* new `bcftools consensus` command
* new `bcftools annotate` plugins: fixploidy, vcf2sex, tag2tag
* more features in `bcftools convert` command, amongst others new
  `--hapsample` function (thanks to Warren Kretzschmar @wkretzsch)
* support for complements in `bcftools annotate --remove`
* support for `-i`/`-e` filtering expressions in `bcftools isec`
* improved error reporting
* `bcftools call`
  - the default prior increased from `-P 1e-3` to `-P 1.1e-3`, some clear
    calls were missed with default settings previously
  - support for the new symbolic allele `<*>`
  - support for `-f GQ`
  - bug fixes, such as: proper trimming of DPR tag with `-c`; the `-A` switch
    does not add back records removed by `-v` and the behaviour has been made
    consistent with `-c` and `-m`
* many bug fixes and improvements, such as
  - bug in filtering, FMT & INFO vs INFO & FMT
  - fixes in `bcftools merge`
  - filter update AN/AC with `-S`
  - isec outputs matching records for both VCFs in the Venn mode
  - annotate considers alleles when working with `Number=A,R` tags
  - new `--set-id` feature for annotate
  - `convert` can be used similarly to `view`
