Parameters
A full list of parameters can be found in the table at the bottom of this page. However, in practice, only a few parameters will be relevant for most users of EUKulele. These are the required ones:
mets_or_mags: Whether the user intends to run the analysis for metatranscriptomic samples (“mets”) or metagenomic samples (“mags”)
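
As a rough illustration of what "required" means here, the sketch below checks a dictionary of configuration entries before a run. The helper `check_required` is hypothetical and is not part of EUKulele. It only assumes that the configuration has been loaded into a Python dictionary keyed by the entry names `mets_or_mags` and `samples`, both of which the parameter table on this page describes as required.

```python
# Hypothetical helper, not part of EUKulele: it checks the two entries that
# the parameter table describes as required.
VALID_SAMPLE_TYPES = ("mets", "mags")


def check_required(config):
    """Return a list of problems found in a dict of configuration entries."""
    problems = []

    # mets_or_mags must be exactly one of the two documented values.
    sample_type = config.get("mets_or_mags")
    if sample_type not in VALID_SAMPLE_TYPES:
        problems.append(
            "mets_or_mags must be 'mets' or 'mags', got %r" % (sample_type,)
        )

    # samples must be present and non-empty; this sketch does not check
    # that the location actually exists on disk.
    if not config.get("samples"):
        problems.append("samples must indicate where the input samples are located")

    return problems
```

An empty list means both required entries are present. Collecting every problem before reporting, rather than stopping at the first one, lets a user fix a configuration file in a single pass.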
| Flag | Configuration File Entry | Meaning |
|---|---|---|
| | N/A | The path to the configuration file which should be used to retrieve the equivalent of command-line arguments. |
| | | A required flag to indicate whether metatranscriptomic (“mets”) or metagenomic (“mags”) samples are being used as input. |
| | samples | A required flag to indicate where the samples (metagenomic or metatranscriptomic, depending on the “mets_or_mags” flag) are located. |
| | output | The path to the directory where output will be stored. Defaults to a folder called |
| | reference | A flag to indicate where the reference FASTA is stored, or a keyword argument for the dataset to be downloaded and used. Only used if not downloading automatically. |
| | ref_fasta | The name of the reference FASTA file in |
| | database | An optional additional argument for specifying the database name. If the database specified is one of the supported databases (currently “mmetsp”, “eukprot”, or “phylodb”), it will be downloaded automatically. Otherwise, MMETSP is used as the default. |
| | run_transdecoder (set to 0 or 1) | An argument for the user to specify whether or not TransDecoder should be used to translate input nucleotide sequences, prior to |
| | nucleotide_extension | The file extension for samples in nucleotide format (metatranscriptomes). Defaults to .fasta. |
| | protein_extension | The file extension for samples in protein format (metatranscriptomes). Defaults to .faa. |
| | force_rerun | If included in a command line argument or set to 1 in a configuration file, this argument forces all steps to be re-run, regardless of whether output is already present. |
| | use_salmon_counts | If included in a command line argument or set to 1 in a configuration file, this argument causes classifications to be made based on both the number of classified transcripts and counts. |
| | salmon_dir | If |
| | names_to_reads | A file that creates a correspondence between each transcript name and the number of |
| | transdecoder_orfsize | The minimum cutoff size for an open reading frame (ORF) detected by |
| | alignment_choice | A choice of aligner to use, currently |
| | cutoff_file | A |
| | filter_metric | Either evalue, pid, or bitscore (default evalue): the metric to be used to filter hits based on their quality prior to taxonomic estimation. |
| | consensus_cutoff | The value to be used to decide whether enough of the taxonomic matches are identical to overlook a discrepancy in classification based on hits associated with a contig. Defaults to 0.75 (75%). |
| | busco_file | Overrides specific organism and taxonomy parameters (next two entries below) in favor of a tab-separated file containing each organism/group of interest and the taxonomic level of the query. |
| | organisms | A list of organisms/groups to test the BUSCO completeness of matching contigs for. |
| | taxonomy_organisms | The taxonomic level of the groupings indicated in the list of |
| | individual_or_summary | Defaults to summary. Whether BUSCO assessment should just be performed for the top organism matches, or whether the list of organisms + their taxonomies or BUSCO file (above parameters) should be used (individual). When |
| | busco_threshold | The threshold for BUSCO completeness for a set of contigs to be considered reasonably BUSCO-complete. |
| | tax_table | The name of the formatted taxonomy table; defaults to “tax-table.txt”. If this file is not found, it can be generated from the reference FASTA and original taxonomy file using the provided script |
| | protein_map | The name of the JSON file containing protein correspondences; defaults to “protein-map.json”. If this file is not found, it can be generated from the reference FASTA and original taxonomy file using the provided script |
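
To make the consensus_cutoff entry in the table above more concrete, here is a minimal sketch of one way a cutoff of this kind can be applied. It is an illustration under stated assumptions, not EUKulele's actual implementation: the function name `consensus_label` is hypothetical, the hits for a contig are assumed to be available as a plain list of taxonomic labels at a single level, and the comparison is assumed to be "greater than or equal to".

```python
from collections import Counter


def consensus_label(labels, consensus_cutoff=0.75):
    """Return the most common label if it reaches the cutoff, else None.

    Hypothetical illustration only. `labels` holds one taxonomic label per
    hit for a single contig; the default of 0.75 mirrors the documented
    default for consensus_cutoff.
    """
    if not labels:
        return None

    # Find the label shared by the largest number of hits.
    label, count = Counter(labels).most_common(1)[0]

    # Accept it only if its share of all hits reaches the cutoff
    # (assumption: the boundary value itself counts as agreement).
    if count / len(labels) >= consensus_cutoff:
        return label
    return None
```

With four hits of which three agree, the agreeing label holds 75% of the hits and is returned under the default cutoff. With an even two-and-two split, no label reaches the cutoff and the function returns `None`, which a caller could treat as a signal to fall back to a broader taxonomic level.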