Cluster TCRs (using the Leiden algorithm) based on their pairwise TCRdist values
cluster_tcrs.Rd
The cluster_tcrs() function aggregates all of the paired TCRs found in the data,
calculates pairwise similarity using the va, vb, cdr3a, and cdr3b regions (via TCRdist),
and clusters the results using the Leiden algorithm.
Usage
cluster_tcrs(
  data,
  tcrdist_cutoff = 90,
  resolution = 0.1,
  with_vdjdb = TRUE,
  allow_self_edges = TRUE
)
Arguments
- data
a list of TIRTLseq TCR data for samples created with load_tirtlseq()
- tcrdist_cutoff
the TCRdist() function will only record TCRdist values less than or equal to the cutoff. Default is 90. Note: Higher cutoffs will return more data, at most NxN values, where N is the number of unique TCRs.
- resolution
the "resolution" parameter for the Leiden algorithm. A lower value will produce larger clusters and a higher value will produce smaller clusters. Typical values are in the 0.1 - 2.0 range. A higher value may be better for densely connected data while a lower value may be better for moderately connected data. Default is 0.1.
- with_vdjdb
if TRUE, observed clones will be compared and clustered with annotated clones from VDJ-db. If parameter is a data frame, the supplied data frame will be used as the database.
- allow_self_edges
if FALSE, TCRdist is calculated only between members of the input data and entries from VDJ-db. Default is TRUE.
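To illustrate how a cutoff limits the number of recorded pairs, here is a minimal sketch in base R. It uses plain edit distance between CDR3 strings (utils::adist()) as a stand-in, which is not the TCRdist metric; the sequences, the cutoff value, and the column names from, to, and dist are all hypothetical.

```r
# Toy stand-in: edit distance between CDR3 strings, NOT the real TCRdist metric.
cdr3 <- c("CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSPDRGNTEAFF", "CAVRDSNYQLIW")
cutoff <- 2  # hypothetical cutoff on the toy distance scale

# Full N x N matrix of pairwise distances (N = number of unique sequences).
d <- utils::adist(cdr3)

# Keep only the pairs at or below the cutoff, as row/column index pairs.
idx <- which(d <= cutoff, arr.ind = TRUE)

# Long format: one row per retained pair (self pairs have distance 0).
pairs_long <- data.frame(
  from = cdr3[idx[, "row"]],
  to   = cdr3[idx[, "col"]],
  dist = d[idx],
  stringsAsFactors = FALSE
)

print(pairs_long)
```

Raising the cutoff in this sketch can only add rows, up to the N x N limit noted above.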
Value
Returns a list with the following elements:
$df
- a data frame with all unique TCRs along with cluster annotations
$dist_df
- a data frame with distances (TCRdist) between TCR pairs in long format
$sparse_adj_mat
- an adjacency matrix (in sparse format) marking TCR pairs with TCRdist <= tcrdist_cutoff
$graph_adj
- an igraph object created from the adjacency matrix
$tcrdist_cutoff
- the cutoff used for TCRdist
$resolution
- the resolution parameter used for the Leiden algorithm
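The relationship between a long-format distance table and an adjacency matrix can be sketched in base R as follows. This is an illustration only: the column names node1, node2, and dist and the TCR labels are hypothetical, the matrix here is dense rather than sparse, and the threshold is applied explicitly so that the toy data can include one pair above the cutoff.

```r
# Hypothetical long-format distances; column names are illustrative only.
dist_long <- data.frame(
  node1 = c("tcr1", "tcr1", "tcr2", "tcr3"),
  node2 = c("tcr2", "tcr3", "tcr3", "tcr4"),
  dist  = c(24, 88, 120, 90),
  stringsAsFactors = FALSE
)
cutoff <- 90

# One row and one column per unique TCR label.
nodes <- sort(unique(c(dist_long$node1, dist_long$node2)))
adj <- matrix(0L, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))

# Mark pairs with distance <= cutoff; a two-column character matrix
# indexes the adjacency matrix by its row and column names.
keep <- dist_long[dist_long$dist <= cutoff, ]
adj[cbind(keep$node1, keep$node2)] <- 1L
adj[cbind(keep$node2, keep$node1)] <- 1L  # mirror so the matrix is symmetric

print(adj)
```

A matrix of this shape is the kind of input that igraph::graph_from_adjacency_matrix() accepts when building a graph object.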
Details
The function also filters the dataset to TCRs that are valid for TCRdist.
The following TCRs are removed:
- TCRs that contain stop codons (*) or frame shifts (_) in their cdr3a or cdr3b regions
- TCRs that contain a cdr3 region with 5 or fewer amino acids
- TCRs that contain a v segment allele not found in our parameter table
V-segments that do not specify an allele (e.g. "TRAV1-2" instead of "TRAV1-2*01") will be assigned to the "*01" allele.
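The filtering rules above can be sketched in base R as follows. This is not the package's implementation: the example table, the known_alleles vector (standing in for the parameter table), and the helper add_default_allele() are hypothetical, and the order in which the default allele is assigned relative to the filters is an assumption.

```r
# Hypothetical paired TCR table using the va, vb, cdr3a, cdr3b columns.
tcrs <- data.frame(
  va    = c("TRAV1-2", "TRAV12-1*01", "TRAV1-2*01", "TRAV99*01"),
  vb    = c("TRBV6-1*01", "TRBV7-9", "TRBV6-1*01", "TRBV6-1*01"),
  cdr3a = c("CAVRDSNYQLIW", "CVVN*GGKLIF", "CAVM", "CAVRDSNYQLIW"),
  cdr3b = c("CASSETGGYEQYF", "CASSLGQAYEQYF", "CASSETGGYEQYF", "CASSETGGYEQYF"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the alleles listed in the parameter table.
known_alleles <- c("TRAV1-2*01", "TRAV12-1*01", "TRBV6-1*01", "TRBV7-9*01")

# Hypothetical helper: append "*01" when no allele is specified.
add_default_allele <- function(v) {
  ifelse(grepl("*", v, fixed = TRUE), v, paste0(v, "*01"))
}
tcrs$va <- add_default_allele(tcrs$va)
tcrs$vb <- add_default_allele(tcrs$vb)

# Rule 1: stop codons (*) or frame shifts (_) in either CDR3.
has_bad_char <- grepl("[*_]", tcrs$cdr3a) | grepl("[*_]", tcrs$cdr3b)
# Rule 2: a CDR3 with 5 or fewer amino acids.
too_short <- nchar(tcrs$cdr3a) <= 5 | nchar(tcrs$cdr3b) <= 5
# Rule 3: a V-segment allele missing from the parameter table.
unknown_v <- !(tcrs$va %in% known_alleles) | !(tcrs$vb %in% known_alleles)

valid <- tcrs[!(has_bad_char | too_short | unknown_v), ]
print(valid)
```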
See also
plot_clusters(), identify_non_functional_seqs(), TCRdist()
Other repertoire_analysis: TCRdist(), TCRdist_cpp(), calculate_diversity(), get_all_div_metrics(), get_all_tcrs(), get_pair_stats(), get_paired_by_read_fraction_range(), summarize_data()