pkgdown/extra.css

Skip to contents

[Experimental] This runs the MAD-HYPE and T-SHELL algorithms to find TCRalpha-beta pairs originating from the same clone.

Usage

run_pairing(
  folder_path,
  folder_out,
  prefix,
  well_filter_thres = 0.5,
  min_reads = 0,
  min_wells = 2,
  well_pos = 3,
  wellset1 = get_well_subset(1:16, 1:24),
  compute = TRUE,
  backend = c("auto", "cpu", "cupy", "mlx"),
  pval_thres_tshell = 1e-10,
  wij_thres_tshell = 2,
  verbose = TRUE,
  write_extra_files = FALSE,
  filter_before_top3 = FALSE,
  fork = NULL,
  shared = NULL,
  chunk_size = 500,
  exclude_nonfunctional = FALSE,
  select_best_madhype = FALSE,
  select_best_tshell = FALSE
)

Arguments

folder_path

the path of the folder with well-level data

folder_out

the path of the folder to write results to. The function will create the folder if it does not exist.

prefix

a prefix for the output file names

well_filter_thres

wells are removed if they have fewer unique clones than: wellfilter_thres*(Avg. # of unique clones per well). The default value is 0.5

min_reads

minimum number of reads a chain must have in a well to be considered observed (note: actual minimum is min_reads+1. Default value is 0, i.e. chain must have >= 1 read in a well)

min_wells

minimum number of wells a chain must be observed in to be paired.

well_pos

the position of the well ID (e.g. "B5") in the file names. For example, files named "<well_id>_TCRalpha.tsv" would use well_pos=3. (default is 3)

wellset1

a vector of wells to use for the pairing

compute

whether or not to run the pairing algorithms after tabulating and writing pseudobulk data (default TRUE)

backend

the computing backend to use. The function looks for a GPU and automatically chooses an appropriate backend by default.

pval_thres_tshell

the adjusted p-value threshold for T-SHELL significance (default 1e-10)

wij_thres_tshell

the threshold for the number of wells containing both chains for T-SHELL significance (default >2 wells)

verbose

whether to print out messages (default TRUE)

write_extra_files

whether to write un-necessary intermediate files (default FALSE)

filter_before_top3

whether to filter by loss fraction before extracting top 3 correlation values for T-SHELL (default FALSE)

fork

whether to "fork" the python process for basilisk (default is NULL, which automatically chooses an appropriate option)

shared

whether to use a "shared" python process for basilisk (default is NULL, which automatically chooses an appropriate option)

chunk_size

batch size for calculations in pairing scripts

exclude_nonfunctional

whether to exclude non-functional chains before pairing (default is FALSE)

select_best_madhype

whether to use a secondary algorithm on the pairs from the MAD-HYPE algorithm to select the best pairs for each clone (default is FALSE)

select_best_tshell

whether to use a secondary algorithm on the pairs from the T-SHELL algorithm to select the best pairs for each clone (default is FALSE)

Value

A data frame with the TCR-alpha/TCR-beta pairs.

The function also writes three files to "folder_out":

  • A data frame ("_pseudobulk_TRA.tsv") of pseudobulk counts for TCRalpha chains

  • A data frame ("_pseudobulk_TRB.tsv") of pseudobulk counts for TCRbeta chains

  • A data frame ("_TIRTLoutput.tsv") of TCR-alpha/TCR-beta pairs.

These files can be loaded using the load_tirtlseq() function.

If write_extra_files is TRUE, the function also writes sparse matrices of per-well read counts (well x chain) for TCR-alpha and beta to "_alpha_mat.rds" and "_beta_mat.rds". Metadata for the chains in these matrices are written to "_alpha_meta.parquet" and "_beta_meta.parquet" and metadata for the wells is written to "_well_meta.parquet".

These files can be loaded using the load_well_counts_binary() function.