[Experimental] The cluster_tcrs() function aggregates all of the paired TCRs found in the data, calculates pairwise similarity using the va, vb, cdr3a, and cdr3b regions (via TCRdist), and clusters the results using the Leiden algorithm.

Usage

cluster_tcrs(
  data,
  tcrdist_cutoff = 90,
  resolution = 0.1,
  with_db = TRUE,
  db = TIRTLtools::vdj_db,
  allow_self_edges = TRUE,
  remove_MAIT = TRUE
)

Arguments

data

a list of TIRTLseq TCR data for samples created with load_tirtlseq()

tcrdist_cutoff

only TCRdist values less than or equal to this cutoff are recorded by the TCRdist() function. Default is 90. Note: higher cutoffs return more data, up to N x N pairs, where N is the number of unique TCRs.

resolution

the "resolution" parameter for the Leiden algorithm. A lower value produces fewer, larger clusters and a higher value produces more, smaller clusters. Typical values range from 0.1 to 2.0. A higher value may be better for densely connected data, while a lower value may be better for moderately connected data. Default is 0.1.

with_db

if TRUE, observed clones will be compared and clustered together with a data frame of annotated clones (see db). By default, a data frame with VDJ-db annotations is used.

db

a data frame with annotated TCRs. The default is the VDJ-db database.

allow_self_edges

if FALSE, only calculate TCRdist between input data TCRs and the TCR annotation database (db). If TRUE, calculate pairwise TCRdist for all of the data including the input and the annotated TCRs.

remove_MAIT

if TRUE, remove MAIT TCRs before clustering. Default is TRUE.

Value

Returns a list with the following elements:

$df - a data frame with all unique TCRs along with cluster annotations

$dist_df - a data frame with distances (TCRdist) between TCR pairs in long format

$sparse_adj_mat - an adjacency matrix (in sparse format) marking TCR pairs with TCRdist <= tcrdist_cutoff

$graph_adj - an igraph object created from the adjacency matrix

$tcrdist_cutoff - the cutoff used for TCRdist

$resolution - the resolution parameter used for the Leiden algorithm
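
As a rough illustration of how a long-format distance table relates to a thresholded sparse adjacency matrix, the sketch below builds one from the other on toy data. It is not the package's implementation: the column names i, j, and dist, the toy distances, and the choice to leave the diagonal empty are all assumptions made for this example.

```r
library(Matrix)

# Toy long-format distance table. The column names (i, j, dist) are
# hypothetical; the real $dist_df may use different names.
dist_df <- data.frame(
  i    = c(1, 1, 2, 3, 4),
  j    = c(2, 3, 3, 4, 5),
  dist = c(12, 90, 150, 45, 91)
)
n_tcrs <- 5
tcrdist_cutoff <- 90

# Keep only pairs with distance <= cutoff (the cutoff itself is included).
keep <- dist_df[dist_df$dist <= tcrdist_cutoff, ]

# Build a symmetric sparse adjacency matrix; every kept pair has i < j,
# so all entries lie in the upper triangle as symmetric = TRUE requires.
adj <- sparseMatrix(
  i = keep$i, j = keep$j, x = rep(1, nrow(keep)),
  dims = c(n_tcrs, n_tcrs), symmetric = TRUE
)

print(adj)
```

A matrix like this could then be converted to a graph, for example with igraph::graph_from_adjacency_matrix(), before community detection.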

Details

The function also filters the dataset to TCRs that are valid for TCRdist.

The following TCRs are removed:

  • TCRs that contain stop codons (*) or frame shifts (_) in their cdr3a or cdr3b regions

  • TCRs that contain a cdr3 region with 5 or fewer amino acids

  • TCRs that contain a V-segment allele not found in our parameter table

V-segments that do not specify an allele (e.g. "TRAV1-2" instead of "TRAV1-2*01") will be assigned to the "*01" allele.
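
The filtering rules above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the function's actual code: the toy sequences are made up, known_alleles is a hypothetical stand-in for the package's parameter table, and the helper add_default_allele() is introduced only for this example. The sketch applies all filters in one step, whereas the function reports them one at a time.

```r
# Toy paired-TCR table; column names va, vb, cdr3a, cdr3b follow the text.
tcrs <- data.frame(
  va    = c("TRAV1-2", "TRAV12-1*01", "TRAV8-4*01", "TRAV99*01", "TRAV12-1*01"),
  vb    = c("TRBV6-1*01", "TRBV7-9", "TRBV19*01", "TRBV19*01", "TRBV7-9*01"),
  cdr3a = c("CAVRDSNYQLIW", "CVVNRGGSNYKLTF", "CAVS*GNQFYF", "CAVNTGNQFYF", "CAVF"),
  cdr3b = c("CASSEGTGELFF", "CASSLAPGATNEKLFF", "CASSIRSSYEQYF",
            "CASSIRSAYEQYF", "CASS_GGTEAFF"),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL stand-in for the V-segment parameter table.
known_alleles <- c("TRAV1-2*01", "TRAV12-1*01", "TRAV8-4*01",
                   "TRBV6-1*01", "TRBV7-9*01", "TRBV19*01")

# Assign the "*01" allele to V-segments that do not specify one.
add_default_allele <- function(v) {
  ifelse(grepl("*", v, fixed = TRUE), v, paste0(v, "*01"))
}
tcrs$va <- add_default_allele(tcrs$va)
tcrs$vb <- add_default_allele(tcrs$vb)

# Flag stop codons (*) or frame shifts (_) in either CDR3.
has_stop_or_shift <- grepl("[*_]", tcrs$cdr3a) | grepl("[*_]", tcrs$cdr3b)
# Flag CDR3 regions with 5 or fewer amino acids.
too_short <- nchar(tcrs$cdr3a) <= 5 | nchar(tcrs$cdr3b) <= 5
# Flag V-segment alleles missing from the (hypothetical) parameter table.
unknown_v <- !(tcrs$va %in% known_alleles) | !(tcrs$vb %in% known_alleles)

valid <- tcrs[!(has_stop_or_shift | too_short | unknown_v), ]
print(valid)
```

In this toy table only the first two rows survive: the third has a stop codon, the fourth has an unknown V-segment, and the fifth has both a short cdr3a and a frame shift in cdr3b.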

Examples

folder = system.file("extdata/SJTRC_TIRTLseq_minimal",
  package = "TIRTLtools")
sjtrc = load_tirtlseq(folder,
  meta_columns = c("marker", "timepoint", "version"), sep = "_",
  chain = "paired", verbose = FALSE)
df = get_all_tcrs(sjtrc, chain="paired", remove_duplicates = TRUE)

result = cluster_tcrs(df)
#> Removed 1,583 MAIT TCRs (2.2%) from a total of 71,206 TCRs.
#> Removed 999 MAIT TCRs (3.1%) from a total of 32,164 TCRs.
#> Removed 452 TCRs with unknown V-segments (0.65%) from a total of 69,623 TCRs.
#> Removed 15 TCRs with short CDR3 segments (0.022%) from a total of 69,171 TCRs.
#> Removed 13,137 TCRs with non-functional CDR3 amino acid sequences (19%) from a total of 69,156 TCRs.
#> Filtered data frame contains 56,019 TCRs (80%) of original 69,623 TCRs.
#> Out of 56019 valid TCRs, 5721 clusters detected and 37992 singleton TCRs.
#> 130 clusters of size >= 10, 14 clusters of size >= 50, 5 clusters of size >=100.