Load paired-TCR and pseudo-bulk data from TIRTLseq experiments into a TIRTLseqData object
load_tirtlseq() loads TIRTLseq data from a given directory. It can also automatically assemble metadata from the filenames.
Usage
load_tirtlseq(
  directory,
  chain = c("all", "paired", "alpha", "beta"),
  sep = "_",
  meta_columns = NULL,
  samples = NULL,
  nThread = data.table::getDTthreads(),
  verbose = TRUE,
  n_max = Inf
)
Arguments
- directory
the directory to look in for ".tsv" or ".tsv.gz" files
- chain
which chain data to load: one of "all", "paired", "alpha", or "beta". The default, "all", loads the alpha, beta, and paired data.
- sep
(optional) separator in the filename for metadata information ("_" by default)
- meta_columns
(optional) a vector naming the metadata fields contained in the filenames, for example c("cell_type", "timepoint", "donor") for files named similar to "cd8_timepoint2_donor1_TIRTLoutput.tsv".
- samples
(optional) specific sample ids (the part of the filename before "_pseudobulk" or "_TIRTLoutput") to load. Default is NULL (loads all samples in the directory).
- verbose
whether to print the name of each file loaded (default is TRUE).
- n_max
the maximum number of files to read in – used mostly for testing purposes (default is Inf, i.e. no maximum).
Value
The function returns a list with two objects:
$meta
- a metadata table (data frame)
$data
- a list with one entry for each sample. Each entry is a list with entries $alpha, $beta, and $paired, which are data frames for the alpha- and beta-chain pseudo-bulk data and the paired data, respectively.
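The nested structure described above can be explored with ordinary list indexing. The sketch below builds a small mock object of the same shape rather than calling load_tirtlseq(). The sample names ("s1", "s2"), the column names, and the assumption that entries of $data are named by sample are all hypothetical placeholders, not the package's actual output.

```r
# Mock object shaped like the documented return value (hypothetical contents).
mock <- list(
  meta = data.frame(sample_id = c("s1", "s2"), stringsAsFactors = FALSE),
  data = list(
    s1 = list(
      alpha  = data.frame(clone = c("a1", "a2"), count = c(10, 5)),
      beta   = data.frame(clone = c("b1", "b2", "b3"), count = c(8, 3, 1)),
      paired = data.frame(alpha = "a1", beta = "b1")
    ),
    s2 = list(
      alpha  = data.frame(clone = "a3", count = 7),
      beta   = data.frame(clone = "b4", count = 2),
      paired = data.frame(alpha = "a3", beta = "b4")
    )
  )
)

# Paired-chain data frame for the first sample.
first_paired <- mock$data[[1]]$paired

# Number of beta-chain rows in each sample.
n_beta <- vapply(mock$data, function(s) nrow(s$beta), integer(1))
```

Indexing by position, as in mock$data[[1]], does not depend on how the entries are named.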
Details
The function expects ".tsv" (or ".tsv.gz") files. It looks for files ending in "_pseudobulk_TRA.tsv" (alpha-chain pseudo-bulk), "_pseudobulk_TRB.tsv" (beta-chain pseudo-bulk), and "_TIRTLoutput.tsv" (paired alpha and beta chains).
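To see how a directory listing maps onto the three file types, the filename suffixes above can be matched with regular expressions. This is a minimal base-R sketch, not the package's own implementation. The helper classify() and the example filenames are hypothetical, and the sketch assumes the optional ".gz" extension may follow any of the three suffixes.

```r
# Example filenames (hypothetical); the last one should not match anything.
files <- c(
  "cd8_timepoint2_donor1_pseudobulk_TRA.tsv",
  "cd8_timepoint2_donor1_pseudobulk_TRB.tsv.gz",
  "cd8_timepoint2_donor1_TIRTLoutput.tsv",
  "notes.txt"
)

# One regular expression per chain type, anchored at the end of the name.
suffixes <- c(
  alpha  = "_pseudobulk_TRA\\.tsv(\\.gz)?$",
  beta   = "_pseudobulk_TRB\\.tsv(\\.gz)?$",
  paired = "_TIRTLoutput\\.tsv(\\.gz)?$"
)

# Return the chain type for one filename, or NA if no suffix matches.
classify <- function(f) {
  hit <- names(suffixes)[vapply(suffixes, grepl, logical(1), x = f)]
  if (length(hit) == 1) hit else NA_character_
}
chain_type <- vapply(files, classify, character(1), USE.NAMES = FALSE)

# The sample id is whatever precedes the matched suffix.
matched <- !is.na(chain_type)
sample_id <- sub(
  "_(pseudobulk_TR[AB]|TIRTLoutput)\\.tsv(\\.gz)?$", "", files[matched]
)
```

Here all three matching files share a single sample id, so they would correspond to one row of the metadata table.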
By default, the function will construct a metadata table with a row for each sample, based
on unique strings at the beginning of filenames (before "_TIRTLoutput.tsv" or similar).
If the filename contains sample metadata, then it can add multiple columns to the metadata
table with this information. For example, if a typical file is named "cd8_timepoint2_donor1_TIRTLoutput.tsv"
and the user supplies c("cell_type", "timepoint", "donor") for meta_columns and "_" for sep,
then the metadata table will look something like this:
  sample_id             cell_type timepoint  donor  label
  <chr>                 <chr>     <chr>      <chr>  <chr>
1 cd8_timepoint2_donor1 cd8       timepoint2 donor1 cell_type: cd8 | timepoint: timepoint2 | donor: donor1
2 ...
3 cd4_timepoint1_donor3 cd4       timepoint1 donor3 cell_type: cd4 | timepoint: timepoint1 | donor: donor3
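The filename-to-metadata mapping can be reproduced in a few lines of base R. The following is an illustrative sketch, not the code load_tirtlseq() actually runs. It assumes every sample id splits into exactly as many fields as there are entries in meta_columns, and it builds the label column in the format shown in the table.

```r
# Sample ids as they would appear before "_TIRTLoutput.tsv" or similar.
sample_ids   <- c("cd8_timepoint2_donor1", "cd4_timepoint1_donor3")
meta_columns <- c("cell_type", "timepoint", "donor")
sep          <- "_"

# Split each sample id on the separator (treated as a literal string).
parts <- strsplit(sample_ids, sep, fixed = TRUE)

# Assumption: one field per metadata column, otherwise stop early.
stopifnot(all(lengths(parts) == length(meta_columns)))

# Stack the fields into a character matrix, one row per sample.
fields <- do.call(rbind, parts)
colnames(fields) <- meta_columns

meta <- data.frame(sample_id = sample_ids, fields, stringsAsFactors = FALSE)

# Build a "column: value | column: value" label for each sample.
meta$label <- apply(fields, 1, function(x) {
  paste(paste0(meta_columns, ": ", x), collapse = " | ")
})
```

Passing fixed = TRUE to strsplit() keeps separators such as "." from being interpreted as regular expressions.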
See also
Other data_wrangling: add_metadata(), filter_dataset(), reorder_samples()