Input/output and general control ================================ Arguments --------- ``-i / --in_file`` *INPUT_FILE* Input fasta file containing the protein sequences to align. Any gaps present in the input sequences are ignored. learnMSA uses the alphabet ARNDCQEGHILKMFPSTWYVXUO. Special characters B, Z, J are mapped to X. The sequences must not contain any other non-standard characters. ``-o / --out_file`` *OUTPUT_FILE* Output file path for the resulting multiple sequence alignment. Use ``-f`` to change the output file type. ``-f / --format`` *FORMAT* Format of the output alignment file. Per default, learnMSA outputs alignments in a2m format. This format is closely related to fasta and usually compatible with fasta parsers. In addition to fasta, a2m uses lower case letters to indicate insertions with respect to the profile HMM and uses dots (.) to represent an insertion in other sequences at the same position. It uses upper case letters for match states and dashes (-) for deletions. The format can be set to "fasta", which uses only standard dashes and upper case letters. Use ``--convert`` to quickly convert between different formats. The a2m and fasta options use a maximum line length of 80 characters. Furthermore, this option be set to any valid Biopython SeqIO format, in which case learnMSA will write a fasta file and automatically converts it. This is not recommended for large alignments, as output files can be very large and the file contents can not be streamed. Default: a2m (fasta). ``--convert`` *MSA_FILE* With this option, learnMSA does not perform any alignment, but only converts the input MSA to the format specified with ``-f``. ``-s / --silent`` Suppresses all standard output messages. ``-d / --cuda_visible_devices`` Controls the GPU devices visible to learnMSA as a comma- separated list of device IDs. The value -1 forces learnMSA to run on CPU. Per default, learnMSA attempts to use all available GPUs. Use ``-d i`` to use a specific GPU, where i is the GPU ID starting from 0. ``--work_dir`` *WORK_DIR* Directory where any secondary files are stored. Default: ./tmp ``--save_model`` *MODEL_FILE* If set, the trained model parameters will be saved to the specified file. The file format is meant to be read with the ``--load_model`` option only. ``--load_model`` *MODEL_FILE* If set, learnMSA will load the model parameters from the specified file and use them as initialization for training. Use the ``--skip_training`` option to directly align the input sequences without further training. Practical tips and example commands ----------------------------------- Standard MSA in a2m format (``--use_language_model`` is recommended but not required): .. code-block:: bash learnMSA -i INPUT_FILE -o OUTPUT_FILE --use_language_model Enforce fasta output if a2m leads to compatibility issues: .. code-block:: bash learnMSA -i INPUT_FILE -o OUTPUT_FILE --use_language_model -f fasta To control where learnMSA writes temporary files, use the ``--work_dir`` option. In particular, this is useful when aligning the same input file multiple times in parallel, to avoid conflicts between different runs: .. code-block:: bash learnMSA -i INPUT_FILE -o OUTPUT_FILE --use_language_model --work_dir ./my_temp_dir To save a trained model for later reuse, use the ``--save_model`` option: .. code-block:: bash learnMSA -i INPUT_FILE -o OUTPUT_FILE --use_language_model --save_model my_model This can be useful to reproduce alignments later on or to resume a training. To load a previously saved model, use the ``--load_model`` option. You may combine it with ``--skip_training`` to directly align the input sequences without further training: .. code-block:: bash learnMSA -i INPUT_FILE -o OUTPUT_FILE --load_model my_model --skip_training