Read and process single-cell paired-chain TCR-seq data
read_external_paired.Rd
This function reads and processes paired TCR-sequencing data from a non-TIRTLseq assay.
Currently 10X and Parse Biosciences data are supported.
Usage
read_external_paired(
  path,
  format = c("auto", "10X", "ParseBio"),
  id_cols = make_tcr_schema(features = c("v", "j", "cdr3_aa", "cdr3_nt"), second_alpha = FALSE),
  multi = FALSE,
  separate_rows = TRUE,
  productive_only = TRUE,
  verbose = TRUE
)
Arguments
- path
The path to the data file.
- format
The format of the data: "10X", "ParseBio", or "auto". If "auto" (default), the function tries to detect which technology produced the data file.
- id_cols
A vector of column names used to define a clone (e.g. c("va", "vb", "ja", "jb", "alpha_nuc", "beta_nuc", "cdr3a", "cdr3b")). This can be produced automatically using the make_tcr_schema() function.
- multi
If FALSE (default), select only the best two alpha chains for each beta chain when processing the data and creating a data frame of paired TCRs. If TRUE, keep all alpha chains.
- separate_rows
If TRUE, when multiple alpha chains are paired with one beta chain, put each pair in a separate row of the output data frame. If FALSE, add the second alpha chain in extra columns on the same row.
- productive_only
If TRUE, keep only "productive" chains.
- verbose
If TRUE, print messages.
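The difference between the two separate_rows layouts can be sketched in base R. This is only an illustration on a toy table: the column names (barcode, cdr3a, cdr3b, cdr3a_2) and the sequences are made up for the example and are not the columns the function actually returns.

```r
# Toy long layout: one row per alpha/beta pair (analogous to separate_rows = TRUE).
# Column names and sequences are illustrative assumptions, not package output.
pairs_long <- data.frame(
  barcode = c("cell1", "cell1", "cell2"),
  cdr3b   = c("CASSLGQGYEQYF", "CASSLGQGYEQYF", "CASSPDRGNTEAFF"),
  cdr3a   = c("CAVRDSNYQLIW", "CAASGGSYIPTF", "CALSEGGSNYKLTF"),
  stringsAsFactors = FALSE
)

# Number the alpha chains within each cell (1 = first alpha, 2 = second alpha).
pairs_long$alpha_rank <- ave(
  seq_along(pairs_long$barcode), pairs_long$barcode, FUN = seq_along
)

# Wide layout (analogous to separate_rows = FALSE): the second alpha chain
# moves into an extra column on the same row as the first.
first  <- pairs_long[pairs_long$alpha_rank == 1, c("barcode", "cdr3b", "cdr3a")]
second <- pairs_long[pairs_long$alpha_rank == 2, c("barcode", "cdr3a")]
names(second)[2] <- "cdr3a_2"
pairs_wide <- merge(first, second, by = "barcode", all.x = TRUE)

print(pairs_wide)
```

In this sketch, cells with a single alpha chain simply get NA in the extra column.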
Value
A list containing the following slots:
df_pairs_complete - (data frame) paired receptors, one row for each receptor (excluding those missing an alpha or beta chain)
df_pairs - (data frame) paired receptors, one row for each receptor (including those missing an alpha or beta chain)
df_pairs_long - (data frame) paired receptors, one row for each cell (including those missing an alpha or beta chain)
df_pairs_long_complete - (data frame) paired receptors, one row for each cell (excluding those missing an alpha or beta chain)
df_raw - (data frame) un-edited input data
chain_df - (data frame) summary of total number of each chain in input data
barcode_df - (data frame) summary of number of chains found in each cell
id_cols - (character vector) columns used as IDs to uniquely define receptor pairs
n_cells_total - (integer) total number of cells
n_cells_complete - (integer) number of cells with both chains
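As a rough illustration of how the "complete" and non-"complete" slots relate, the sketch below builds a mock list with the same slot names and derives the complete table by dropping rows that lack a chain. The column names (cdr3a, cdr3b) and the use of NA to mark a missing chain are assumptions made for the example; the real output may encode missing chains differently.

```r
# Mock stand-in for the returned list. Only df_pairs is imitated here;
# column names and the NA-for-missing convention are assumptions.
res <- list(
  df_pairs = data.frame(
    cdr3a = c("CAVRDSNYQLIW", NA, "CALSEGGSNYKLTF"),
    cdr3b = c("CASSLGQGYEQYF", "CASSPDRGNTEAFF", NA),
    stringsAsFactors = FALSE
  )
)

# Keep only receptors where both the alpha and the beta chain are present.
has_both <- complete.cases(res$df_pairs[, c("cdr3a", "cdr3b")])
res$df_pairs_complete <- res$df_pairs[has_both, ]

n_receptors_total    <- nrow(res$df_pairs)
n_receptors_complete <- nrow(res$df_pairs_complete)
print(c(total = n_receptors_total, complete = n_receptors_complete))
```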
Details
Supported data types:
"10X" - "filtered_contig_annotations.csv" or "all_contig_annotations.csv" outputs from Cell Ranger (https://www.10xgenomics.com/support/software/cell-ranger/7.2/analysis/outputs/cr-5p-outputs-overview-vdj)
"ParseBio" - "tcr_annotation_airr.tsv" output from Trailmaker
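When the input files keep their default names, the format can be chosen explicitly rather than relying on "auto". The helper below, guess_format(), is hypothetical and is not part of the package; it only matches the file names listed above and does not reproduce whatever detection logic format = "auto" uses internally.

```r
# Hypothetical helper (not part of the package): map a default output
# file name to the corresponding value of the `format` argument.
guess_format <- function(path) {
  fname <- basename(path)
  if (grepl("contig_annotations\\.csv$", fname)) {
    "10X"        # filtered_ or all_contig_annotations.csv from Cell Ranger
  } else if (grepl("tcr_annotation_airr\\.tsv$", fname)) {
    "ParseBio"   # tcr_annotation_airr.tsv from Trailmaker
  } else {
    NA_character_  # unrecognized name: leave the choice to the user
  }
}

# The directory names below are placeholders for illustration.
fmt_10x   <- guess_format("run1/filtered_contig_annotations.csv")
fmt_parse <- guess_format("run2/tcr_annotation_airr.tsv")
fmt_other <- guess_format("run3/results.txt")
print(c(fmt_10x, fmt_parse, fmt_other))
```

A renamed file would return NA here, in which case the format should be passed explicitly.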
See also
Other data_processing:
TIRTL_process(),
add_single_chain_data(),
clean_pairs(),
combine_bulk_and_paired_data(),
filter_duplicate_tcrs(),
filter_mait(),
filter_nonfunctional_TCRs(),
filter_short_cdr3s(),
filter_v_alleles(),
identify_non_functional_seqs(),
identify_paired(),
make_tcr_schema(),
prep_for_tcrdist(),
read_external_bulk(),
remove_duplicates()