Load paired-TCR and pseudo-bulk data from TIRTLseq experiments into a TIRTLseqData object

load_tirtlseq() loads TIRTLseq data from a given directory. It can also automatically assemble metadata from the filenames.

Usage

load_tirtlseq(
  directory,
  chain = c("all", "paired", "alpha", "beta"),
  sep = "_",
  meta_columns = NULL,
  samples = NULL,
  nThread = data.table::getDTthreads(),
  verbose = TRUE,
  n_max = Inf
)

Arguments

directory: the directory to look in for ".tsv" or ".tsv.gz" files
chain: which chain data to load: one of "all", "paired", "alpha", or "beta". The default, "all", loads the alpha, beta, and paired data.
sep: (optional) separator in the filename for metadata information ("_" by default)
meta_columns: (optional) a character vector naming the metadata fields contained in the filenames, for example c("cell_type", "timepoint", "donor") for files named like "cd8_timepoint2_donor1_TIRTLoutput.tsv".
samples: (optional) specific sample ids (the part of the filename before "_pseudobulk" or "_TIRTLoutput") to load. Default is NULL (loads all samples in the directory).
nThread: the number of threads to use when loading data (default is data.table::getDTthreads(), i.e. the number of threads data.table is currently configured to use).
verbose: whether to print the name of each file loaded (default is TRUE).
n_max: the maximum number of files to read in – used mostly for testing purposes (default is Inf, i.e. no maximum).

Value

The function returns a list with two objects:

$meta - a metadata table (data frame)

$data - a list with one entry for each sample. Each entry is a list with entries $alpha, $beta, and $paired, which are data frames for the alpha- and beta-chain pseudo-bulk data and the paired data respectively.
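
To illustrate the shape of the returned value, the sketch below builds a small mock object by hand and pulls a summary out of it. The object is not produced by load_tirtlseq(); the sample names ("s1", "s2"), the column name "x", and the assumption that the entries of $data are named by sample id are all placeholders chosen for illustration.

```r
# Mock object with the documented shape: $meta plus one $data entry per sample.
# Sample names and the column "x" are hypothetical placeholders.
mock <- list(
  meta = data.frame(sample_id = c("s1", "s2"), stringsAsFactors = FALSE),
  data = list(
    s1 = list(
      alpha  = data.frame(x = 1:3),
      beta   = data.frame(x = 1:2),
      paired = data.frame(x = 1:4)
    ),
    s2 = list(
      alpha  = data.frame(x = 1:5),
      beta   = data.frame(x = 1:6),
      paired = data.frame(x = 1L)
    )
  )
)

# Count the rows of the paired table for each sample
n_paired <- vapply(mock$data, function(s) nrow(s$paired), integer(1))
print(n_paired)

# Access one chain of one sample by position (works whether or not $data is named)
first_alpha <- mock$data[[1]]$alpha
```

The same access pattern (result$data[[i]]$alpha, $beta, or $paired) should carry over to a real object, provided the structure matches the description above.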

Details

The function expects ".tsv" (or ".tsv.gz") files. It looks for files ending in "_pseudobulk_TRA.tsv" (alpha-chain pseudo-bulk), "_pseudobulk_TRB.tsv" (beta-chain pseudo-bulk), and "_TIRTLoutput.tsv" (paired alpha and beta chains).
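
The filename convention above can be sketched as a small pattern match. This is only an illustration of the naming scheme, not the package's actual implementation; find_sample_ids() is a hypothetical helper, and the file list is made up (in practice it might come from list.files(directory)).

```r
# Hypothetical helper (not part of the package): derive sample ids from
# filenames that end in one of the three documented suffixes.
find_sample_ids <- function(files) {
  # The documented suffixes, with an optional ".gz" extension
  pattern <- "_(pseudobulk_TRA|pseudobulk_TRB|TIRTLoutput)\\.tsv(\\.gz)?$"
  base <- basename(files)
  matched <- grepl(pattern, base)
  # Strip the suffix and keep each sample id once
  unique(sub(pattern, "", base[matched]))
}

# Made-up file listing for illustration
files <- c(
  "cd8_timepoint2_donor1_TIRTLoutput.tsv",
  "cd8_timepoint2_donor1_pseudobulk_TRA.tsv",
  "cd8_timepoint2_donor1_pseudobulk_TRB.tsv.gz",
  "notes.txt"
)
sample_ids <- find_sample_ids(files)
print(sample_ids)
```

Files that do not end in one of the three suffixes (such as "notes.txt" here) are ignored, and the three files for one sample collapse to a single sample id.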

By default, the function constructs a metadata table with one row per sample, based on the unique strings at the beginning of the filenames (before "_TIRTLoutput.tsv" or similar). If the filenames encode sample metadata, the function can split that information into separate columns of the metadata table. For example, if a typical file is named "cd8_timepoint2_donor1_TIRTLoutput.tsv" and the user supplies c("cell_type", "timepoint", "donor") for meta_columns and "_" for sep, then the metadata table will look something like this:


  sample_id             cell_type timepoint  donor  label
  <chr>                 <chr>     <chr>      <chr>  <chr>
1 cd8_timepoint2_donor1 cd8       timepoint2 donor1 cell_type: cd8 | timepoint: timepoint2 | donor: donor1
2 ...
3 cd4_timepoint1_donor3 cd4       timepoint1 donor3 cell_type: cd4 | timepoint: timepoint1 | donor: donor3
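
The splitting step described above can be sketched in base R as follows. This is a minimal re-creation under stated assumptions, not the package's own code: build_meta() is a hypothetical function, and it assumes every sample id contains at least as many sep-separated fields as there are entries in meta_columns.

```r
# Hypothetical sketch (not the package's implementation): build a metadata
# table by splitting sample ids on a separator.
build_meta <- function(sample_ids, meta_columns, sep = "_") {
  # fixed = TRUE treats sep as a literal string rather than a regex
  parts <- strsplit(sample_ids, sep, fixed = TRUE)
  n <- length(meta_columns)
  # One row per sample, one column per metadata field
  fields <- do.call(rbind, lapply(parts, function(p) p[seq_len(n)]))
  fields <- as.data.frame(fields, stringsAsFactors = FALSE)
  names(fields) <- meta_columns
  meta <- data.frame(sample_id = sample_ids, fields, stringsAsFactors = FALSE)
  # Human-readable label such as "cell_type: cd8 | timepoint: timepoint2"
  meta$label <- apply(fields, 1, function(v) {
    paste(meta_columns, v, sep = ": ", collapse = " | ")
  })
  meta
}

meta <- build_meta(
  c("cd8_timepoint2_donor1", "cd4_timepoint1_donor3"),
  meta_columns = c("cell_type", "timepoint", "donor")
)
print(meta)
```

Each field is kept as a character column, matching the <chr> column types shown in the table above.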

Examples

# example code
# paired = load_tirtlseq("path_to/your_directory", sep = "_", meta_columns = c("cell_type", "timepoint"))