#########
# Day 2 #
#########

E1A: Using ‘cat’, concatenate the two files P_aeruginosa_TOprJ3-positive_part1.fasta and P_aeruginosa_TOprJ3-positive_part2.fasta into a new file called P_aeruginosa_TOprJ3-positive.fasta
cat P_aeruginosa_TOprJ3-positive_part1.fasta P_aeruginosa_TOprJ3-positive_part2.fasta > P_aeruginosa_TOprJ3-positive.fasta

E1B: Using cat, append the file P_aeruginosa_TOprJ3-positive_part3.fasta to P_aeruginosa_TOprJ3-positive.fasta
cat P_aeruginosa_TOprJ3-positive_part3.fasta >> P_aeruginosa_TOprJ3-positive.fasta

---------------------------------------------------------------------------------

E2A: Get the line with the string (word) terA from P_aeruginosa_TOprJ3-positive.fasta
grep "terA" P_aeruginosa_TOprJ3-positive.fasta
You can also use single quotes:
grep 'terA' P_aeruginosa_TOprJ3-positive.fasta
Or even no quotes at all, though this is NOT recommended:
grep terA P_aeruginosa_TOprJ3-positive.fasta

E2B: Make grep ignore case and use tera when searching for the same line as before
grep -i "tera" P_aeruginosa_TOprJ3-positive.fasta

E2C: Get the line with terA and the following 5 lines. This is to check that the sequence is there and not only the header.
grep -A 5 "terA" P_aeruginosa_TOprJ3-positive.fasta

E2D: Find out which line number terA is in
392
grep -n "terA" P_aeruginosa_TOprJ3-positive.fasta

E2E: Count all the sequences in P_aeruginosa_TOprJ3-positive.fasta. Remember that fasta headers start with ‘>’, so there is one per sequence.
106
grep -c ">" P_aeruginosa_TOprJ3-positive.fasta

Extra 1A: Use grep to save the header names in a file called "header_names.txt".
grep ">" P_aeruginosa_TOprJ3-positive.fasta > header_names.txt

Extra 1B: Let's discover what happens when you don't use proper quotation marks with grep. Run the following command and then display the content of header_names.txt:
grep > header_names.txt
With the command above, you save an empty grep result into the file "header_names.txt".
Because > is not inside quotation marks, the shell reads it as "save the result and overwrite header_names.txt" instead of passing '>' to grep as the pattern to look for. Remember that a single > overwrites, while >> appends. This is the reason why it is recommended to always use quotation marks. If it's a habit, you won't accidentally empty a file.
Remember to delete the file after use:
rm header_names.txt

Extra Exercise: Find all the sequences that contain ‘ter’. What difference does it make if you add the -o option?
grep "ter" P_aeruginosa_TOprJ3-positive.fasta
grep -o "ter" P_aeruginosa_TOprJ3-positive.fasta
Why did we include this task? It shows that grep can print only the matching text instead of the whole line. That might come in handy later.

---------------------------------------------------------------------------------

E3A: First things first, start out by counting how many lines there are in ‘P_aeruginosa_TOprJ3-positive.fasta’.
1674
wc -l P_aeruginosa_TOprJ3-positive.fasta

E3B: Let’s see how many words there are in the README.md file
131
wc -w README.md

E3C: Combining ls and wc through the pipe command, count how many files there are in P_aeruginosa/assemblies
7
ls | wc -l
(Run this from inside P_aeruginosa/assemblies; a plain ls lists the current directory.)

Extra Exercise: Find out how often ‘TTTTTT’ occurs in P_aeruginosa_TOprJ3-positive.fasta
11
grep -o 'TTTTTT' P_aeruginosa_TOprJ3-positive.fasta | wc -l

---------------------------------------------------------------------------------

E4A: Download the new files using the command below, and unzip them using ‘tar’.
tar -zxvf SorryForAddingTheseLate.tar.gz

E4B: Create a directory called ‘other’, and within this directory, create a directory called ‘assemblies’. Move all the files to said directory.
In BacteriaData:
mkdir other
mkdir other/assemblies
mv Proteus_terrae_20Q172tw.fasta other/assemblies/
mv P_juntendi-k37.fasta other/assemblies/
mv our_first_nanopore.fasta other/assemblies/
mv K_aeogenes_pNUITM_vk5.fasta other/assemblies/

E4C: Find out in which file and in which line the mistake was made and fix it.
There are many ways of solving this. One is to use the head and tail commands, and change the header in nano.
A more complex but faster version is using grep:
grep -n "^lcl" our_first_nanopore_part*
Let's break it down:
grep -n: Reports the line number where lcl is found.
"^lcl": The quotation marks ensure grep knows what to search for. The '^' ensures it only matches lcl at the start of a line.
our_first_nanopore_part*: Because we use a wildcard, grep will search in all files starting with 'our_first_nanopore_part'. Grep will inform us which file it found the pattern in.
The result:
our_first_nanopore_part3.fasta:6:lcl|CP079827.1_gene_5723 [gene=rnpA] [locus_tag=KW568_28615] [location=complement(6289644..6290048)] [gbkey=Gene]
It found the pattern in our_first_nanopore_part3.fasta, in the 6th line. The header on that line does not start with '>', which is what needs fixing in nano.

E4D: There is no reason why the file needed to be split up for the ResFinder analysis. Combine the different parts of ‘our_first_nanopore.fasta’ into one file called ‘our_first_nanopore.fasta’.
cat our_first_nanopore_part* > our_first_nanopore.fasta
or
cat our_first_nanopore_part1.fasta our_first_nanopore_part2.fasta our_first_nanopore_part3.fasta > our_first_nanopore.fasta

E4E: Create soft links for all the assemblies from the ‘other’ directory in the ResFinder directory
Note that one could also use a wildcard instead of four separate lines.
In BacterialData:
ln -sr other/assemblies/K_aeogenes_pNUITM_vk5.fasta Resfinder/
ln -sr other/assemblies/our_first_nanopore.fasta Resfinder/
ln -sr other/assemblies/Proteus_terrae_20Q172tw.fasta Resfinder/
ln -sr other/assemblies/P_juntendi-k37.fasta Resfinder/

---------------------------------------------------------------------------------

E5A: Go to the ResFinder directory, and using ‘touch’ create a file called README.txt
cd Resfinder/
touch README.txt

E5B: Open the file using nano and add the line “In this directory are the assemblies from the Daycare outbreak.”
Just copy and paste the line! :)

E5C: Zip the ResFinder directory using tar, and call the gz file ‘ResFinder.tar.gz’
tar -czvhf ResFinder.tar.gz Resfinder/

E5D: Remove README.txt from the ResFinder directory; no need to leave this here
rm Resfinder/README.txt

E5E: Get the path for the ‘ResFinder.tar.gz’ file to send to your colleague
readlink -f ResFinder.tar.gz

ExtraA: Use chmod to modify the permissions of the "P_aeruginosa_TOprJ3-positive.fasta" file so that only you can change it, but everyone else can still read it.
chmod 744 P_aeruginosa_TOprJ3-positive.fasta
(744 also makes the file executable for you; 644 gives the same read/write result without the execute bit.)

ExtraB: Use ‘--help’ on tar, and see what the options you have used mean.
tar --help
c: create a new archive (when we extract/unzip, we use -x instead)
z: zips the created archive (gzip)
v: verbose - lists files while processing
h: follows soft links and archives the files they point to, instead of the links themselves
f: lets you specify the name of the archive

---------------------------------------------------------------------------------
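To see why the -h option matters when a directory holds soft links (as the ResFinder directory does), here is a small sketch you can run anywhere. It is only an illustration: the scratch directory and the names data, links and demo.fasta are made up for the demo and are not part of the course files.

```shell
# Work in a throwaway directory so nothing in the course data is touched.
workdir=$(mktemp -d)
cd "$workdir"

# A tiny made-up fasta file and a relative soft link pointing at it.
mkdir -p data links
printf '>seq1\nACGT\n' > data/demo.fasta
ln -s ../data/demo.fasta links/demo.fasta

# Without -h, tar stores the soft link itself.
tar -czf links_as_links.tar.gz links/
# With -h, tar follows the link and stores the file content.
tar -czhf links_as_files.tar.gz links/

# -t lists an archive without extracting it; with -v a stored link is shown with '->'.
tar -tzvf links_as_links.tar.gz
tar -tzvf links_as_files.tar.gz
```

In the first listing demo.fasta appears as a link, in the second as a regular file. An archive of bare links would be of little use to a colleague who does not have the original files.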