#########
# Day 2 #
#########

E1A: Using ‘cat’, concatenate the two files P_aeruginosa_TOprJ3-positive_part1.fasta
     and P_aeruginosa_TOprJ3-positive_part2.fasta into a new file called
     P_aeruginosa_TOprJ3-positive.fasta

	cat P_aeruginosa_TOprJ3-positive_part1.fasta P_aeruginosa_TOprJ3-positive_part2.fasta > P_aeruginosa_TOprJ3-positive.fasta

E1B: Using cat, append the file P_aeruginosa_TOprJ3-positive_part3.fasta 
     to P_aeruginosa_TOprJ3-positive.fasta

	cat P_aeruginosa_TOprJ3-positive_part3.fasta >> P_aeruginosa_TOprJ3-positive.fasta
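To see the difference between '>' and '>>' without touching the course data, here is a small sketch in a scratch directory using made-up files (demo_part1.fasta and the other demo_* names are hypothetical). '>' creates or overwrites the target file, while '>>' adds to the end of it.

```shell
# Work in a throwaway directory so no real data is touched
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Three tiny made-up FASTA files (hypothetical names)
printf '>seq1\nACGT\n' > demo_part1.fasta
printf '>seq2\nGGCC\n' > demo_part2.fasta
printf '>seq3\nTTAA\n' > demo_part3.fasta

# '>' creates (or overwrites) the combined file from parts 1 and 2
cat demo_part1.fasta demo_part2.fasta > demo_combined.fasta

# '>>' appends part 3 to the end instead of overwriting
cat demo_part3.fasta >> demo_combined.fasta

# The combined file now holds all three headers, in order
grep ">" demo_combined.fasta
```

Running the first cat command again at this point would overwrite demo_combined.fasta and silently drop seq3.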
---------------------------------------------------------------------------------

E2A: Get the line with the string (word) terA from P_aeruginosa_TOprJ3-positive.fasta
	grep "terA" P_aeruginosa_TOprJ3-positive.fasta

	Note that you can also use single quotes:
	grep 'terA' P_aeruginosa_TOprJ3-positive.fasta

	OR even without quotes, though this is NOT recommended:
	grep terA P_aeruginosa_TOprJ3-positive.fasta

E2B: Make grep ignore case and use tera when searching for the same line as before
	grep -i "tera" P_aeruginosa_TOprJ3-positive.fasta

E2C: Get the line with terA and the following 5 lines. This shows whether the sequence itself is in the file, not only the header.
	grep -A 5 "terA" P_aeruginosa_TOprJ3-positive.fasta

E2D: Find out which line number terA is in
	392
	grep -n "terA" P_aeruginosa_TOprJ3-positive.fasta 

E2E: Count all the sequences in P_aeruginosa_TOprJ3-positive.fasta. Remember that each FASTA sequence starts with a header line beginning with ‘>’.
	106
	grep -c ">" P_aeruginosa_TOprJ3-positive.fasta
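The grep options from E2 can be tried on a tiny made-up FASTA file in a scratch directory. This is only a sketch: demo.fasta and its gene names are hypothetical.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# A made-up FASTA file with three sequences (hypothetical gene names)
printf '>gene1 [gene=blaA]\nACGTACGT\nACGT\n>gene2 [gene=terA]\nTTGACCA\nGGCATTA\n>gene3 [gene=TERA_like]\nCCCC\n' > demo.fasta

grep "terA" demo.fasta        # only the line containing terA
grep -i "tera" demo.fasta     # ignore case: also matches TERA_like
grep -A 2 "terA" demo.fasta   # the match plus the 2 lines after it
grep -n "terA" demo.fasta     # prefix each match with its line number
grep -c ">" demo.fasta        # number of lines containing '>'
```

grep -c counts matching lines, not individual matches. It gives the number of sequences here because every header is exactly one line. Writing the pattern as "^>" restricts the count to lines that start with '>', which is slightly safer.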

Extra 1A: Use grep to save the header names in a file called "header_names.txt".
		grep ">" P_aeruginosa_TOprJ3-positive.fasta > header_names.txt

Extra 1B: Let's discover what happens when you don't use proper quotations with grep. 
          Run the following command and then display the content of header_names.txt:
	  grep > header_names.txt 
		
	
	The shell reads the unquoted > as a redirection, not as part of the search pattern.
	It therefore creates (or empties) "header_names.txt" before grep even runs.
	grep then receives no pattern, prints a usage message and exits, so the file is left empty.
	Remember that a single > overwrites, while >> appends.

	This is why it is recommended to always use quotation marks.
	If it is a habit, you won't accidentally empty a file.

	Remember to delete the file after use:
	rm header_names.txt  		
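A sketch of what happens, run in a scratch directory with a hypothetical file name (demo_headers.txt) so no real data is lost. grep's usage message is discarded with 2>/dev/null to keep the output short.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# A file with some content we care about
printf '>seq1\n>seq2\n' > demo_headers.txt

# The unquoted '>' is a redirection: the shell empties demo_headers.txt
# before grep starts. grep then fails because it was given no pattern.
grep > demo_headers.txt 2>/dev/null || echo "grep failed, but the file is already empty"

# Size in bytes: the original content is gone
wc -c < demo_headers.txt
```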

Extra Exercise: Find all the lines that contain ‘ter’. What difference does it make if you add the -o option?
	grep "ter" P_aeruginosa_TOprJ3-positive.fasta
	grep -o "ter" P_aeruginosa_TOprJ3-positive.fasta

	You might ask why we had you do this task.
	With -o, grep prints only the matching text, one match per line, instead of the whole line.
	Knowing that you can get such a one-word output might come in handy later.
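A sketch on a made-up file (hypothetical gene names) in a scratch directory. The last command hints at why -o can be useful: with a slightly broader pattern it pulls out just the gene names.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Two header lines with 'ter' (the second one twice) and one sequence line
printf '>gene1 terA\n>gene2 terB terC\nACGT\n' > demo.fasta

# Without -o: the whole line, once per matching line
grep "ter" demo.fasta

# With -o: only the matched text, one match per output line
grep -o "ter" demo.fasta

# 'ter' followed by one uppercase letter: extracts the gene names
grep -o "ter[[:upper:]]" demo.fasta
```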

---------------------------------------------------------------------------------

E3A: First things first, count how many lines there are in ‘P_aeruginosa_TOprJ3-positive.fasta’.
	1674
	wc -l P_aeruginosa_TOprJ3-positive.fasta

E3B: Let’s see how many words there are in the README.md file
	131
	wc -w README.md
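A small sketch on a made-up text file (demo.txt is hypothetical) showing the wc options used above. When wc reads from standard input (wc -l < file), it prints just the number without the file name, which is convenient if you want to reuse the count. Note also that wc -l counts newline characters, so a last line without a trailing newline is not counted.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1

printf 'one two three\nfour five\n' > demo.txt

wc -l demo.txt     # number of lines, followed by the file name
wc -w demo.txt     # number of words, followed by the file name
wc -l < demo.txt   # reading from stdin: prints only the number
```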

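E3C: Combining ls and wc through a pipe (|), count how many files there are in P_aeruginosa/assemblies
	7
	ls | wc -l
	(Run this from inside P_aeruginosa/assemblies. From the directory that contains P_aeruginosa, use: ls P_aeruginosa/assemblies | wc -l)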

Extra Exercise: Find out how often ‘TTTTTT’ occurs in P_aeruginosa_TOprJ3-positive.fasta
	11
	grep -o 'TTTTTT' P_aeruginosa_TOprJ3-positive.fasta | wc -l
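Two details are worth keeping in mind when reading this answer, shown here on made-up sequences in a scratch directory (demo.fasta is hypothetical). First, grep -c counts matching lines, not matches, which is why -o and wc -l are combined. Second, grep -o finds non-overlapping matches within a single line, so a run of 8 T's counts once, and a run split across a line break is not found at all.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Line 2: two separate runs; line 3: a run of 8 T's;
# lines 4-5: six T's split across a line break
printf '>s1\nTTTTTTAAATTTTTT\nTTTTTTTT\nGGGTTT\nTTTGGG\n' > demo.fasta

grep -c 'TTTTTT' demo.fasta           # matching lines: 2
grep -o 'TTTTTT' demo.fasta | wc -l   # non-overlapping matches: 3
```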
---------------------------------------------------------------------------------

E4A: Download the new files using the command below, and unzip them using ‘tar’. 
	tar -zxvf SorryForAddingTheseLate.tar.gz

E4B: Create a directory called ‘other,’ and within this directory, create a directory called ‘assemblies.’
     Move all the files to said directory. 
	
	In BacteriaData:
	mkdir other
	mkdir other/assemblies
	mv Proteus_terrae_20Q172tw.fasta other/assemblies/
	mv P_juntendi-k37.fasta other/assemblies/
	mv our_first_nanopore.fasta other/assemblies/
	mv K_aeogenes_pNUITM_vk5.fasta other/assemblies/
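The mkdir and mv lines can be shortened: mkdir -p creates a nested path in one step, and mv accepts several files at once when the last argument is a directory. A sketch in a scratch directory with hypothetical file names:

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical stand-ins for the downloaded assemblies
touch demo_a.fasta demo_b.fasta demo_c.fasta

# -p creates 'other' and 'other/assemblies' together (no error if they exist)
mkdir -p other/assemblies

# Several source files, with the target directory as the last argument
mv demo_a.fasta demo_b.fasta demo_c.fasta other/assemblies/

ls other/assemblies
```

Listing the files by name, rather than using mv *.fasta, avoids accidentally moving other FASTA files that happen to sit in the same directory.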

E4C: Find out in which file and in which line the mistake was made and fix it. 
	There are many ways of solving this. One is to use the head and tail commands to look at the files, and then fix the header in nano.
	A more advanced but faster approach is to use grep:
	grep -n "^lcl" our_first_nanopore_part* 

	Let's break it down:
	grep -n: prints the line number of each match.
	"^lcl": The quotation marks make sure the shell passes the pattern to grep as-is.
		'^' anchors the pattern to the start of the line, so grep only finds lcl there.
		Correct headers start with '>lcl', so only a header missing its '>' matches.
	our_first_nanopore_part*: Because we use a wildcard, grep searches all files whose names start with 'our_first_nanopore_part'.
		When searching several files, grep prefixes each match with the name of the file it was found in.

	The result:
	our_first_nanopore_part3.fasta:6:lcl|CP079827.1_gene_5723 [gene=rnpA] [locus_tag=KW568_28615] [location=complement(6289644..6290048)] [gbkey=Gene]

	It found the pattern in our_first_nanopore_part3.fasta, in the 6th line.
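Besides fixing the header in nano, a non-interactive option is to let sed add the missing '>' on the reported line. Here is a sketch on a made-up file (demo_part3.fasta is hypothetical). It writes to a temporary file and moves it back instead of using sed -i, whose syntax differs between GNU sed and the BSD/macOS version.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Made-up file whose second header has lost its leading '>'
printf '>lcl|gene_1\nACGT\nlcl|gene_2\nGGCC\n' > demo_part3.fasta

# Line number of the broken header (grep -n prints "N:line")
line=$(grep -n '^lcl' demo_part3.fasta | cut -d: -f1)

# On that line only, replace the leading 'lcl' with '>lcl'
sed "${line}s/^lcl/>lcl/" demo_part3.fasta > demo_part3.fixed
mv demo_part3.fixed demo_part3.fasta

# Both headers now start with '>'
grep -c '^>' demo_part3.fasta
```

This assumes exactly one broken header; with several, grep would return several line numbers and the sed command would need adjusting.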

E4D: There is no reason why the file needed to be split up for the ResFinder analysis.
     Combine the different parts of ‘our_first_nanopore.fasta’ into one file called ‘our_first_nanopore.fasta’. 
	cat our_first_nanopore_part* > our_first_nanopore.fasta
	or
	cat our_first_nanopore_part1.fasta our_first_nanopore_part2.fasta our_first_nanopore_part3.fasta > our_first_nanopore.fasta
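The wildcard version works because the shell sorts the matching file names, and part1 to part3 happen to sort in the right order. With ten or more parts this breaks, because names are compared character by character. A cheap check is to preview the expansion with echo before running cat. A sketch with hypothetical file names, with the C locale set so the sort order is predictable:

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
export LC_ALL=C   # fixed, byte-wise sort order for this demo

touch demo_part1.fasta demo_part2.fasta demo_part10.fasta

# echo shows what the wildcard expands to, without running cat
echo demo_part*
# part10 sorts before part2, so cat would join the parts out of order
```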

E4E: Create soft links for all the assemblies from the ‘other’ directory in the ResFinder directory
	In BacterialData
	Note that you could also use a wildcard instead of writing 4 lines.
	ln -sr other/assemblies/K_aeogenes_pNUITM_vk5.fasta Resfinder/
	ln -sr other/assemblies/our_first_nanopore.fasta Resfinder/
	ln -sr other/assemblies/Proteus_terrae_20Q172tw.fasta Resfinder/ 
	ln -sr other/assemblies/P_juntendi-k37.fasta Resfinder/
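Here is what the wildcard version could look like, as a sketch in a scratch directory with hypothetical file names. ln accepts several targets when the last argument is a directory. Note that -r (make the link targets relative) is a GNU coreutils option and may be missing from other ln implementations, such as the one on macOS.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical layout mirroring the exercise
mkdir -p other/assemblies Resfinder
touch other/assemblies/demo_a.fasta other/assemblies/demo_b.fasta

# -s: symbolic (soft) links, -r: relative link targets (GNU ln)
ln -sr other/assemblies/*.fasta Resfinder/

# Each entry shows where the link points
ls -l Resfinder/
```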

---------------------------------------------------------------------------------

E5A: Go to the ResFinder directory, and using ‘touch’ create a file called README.txt
	cd Resfinder/
	touch README.txt

E5B: Open the file using nano and add the line “In this directory are the assemblies from the Daycare outbreak.”
	Just copy and paste the line! :)
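In nano, paste the line, then press Ctrl+O and Enter to save and Ctrl+X to exit. If you would rather not open an editor, echo with '>>' does the same job. A sketch in a scratch directory:

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1

touch README.txt

# Append the line without opening an editor; '>>' keeps existing content
echo "In this directory are the assemblies from the Daycare outbreak." >> README.txt

cat README.txt
```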

E5C: Zip the ResFinder directory using tar, and call the gz file ‘ResFinder.tar.gz’ 
	tar -czvhf ResFinder.tar.gz Resfinder/
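The h in -czvhf matters here because the ResFinder directory contains soft links. Without it, tar stores only the links, and a colleague unpacking the archive would get links pointing to files they do not have. A sketch with hypothetical file names in a scratch directory, using tar -t (list contents) to show the difference:

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical layout: a real file plus a directory holding a soft link to it
mkdir real Resfinder_demo
printf '>seq1\nACGT\n' > real/demo.fasta
ln -s ../real/demo.fasta Resfinder_demo/demo.fasta

# Without h: the archive stores only the link itself
tar -czf links_only.tar.gz Resfinder_demo/

# With h: tar follows the link and stores the file's actual content
tar -czhf dereferenced.tar.gz Resfinder_demo/

# The first character of each listed entry is its type:
# 'l' for a link, '-' for a regular file, 'd' for a directory
tar -tvzf links_only.tar.gz
tar -tvzf dereferenced.tar.gz
```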

E5D: Remove README.txt from the ResFinder directory; no need to leave this here
	rm Resfinder/README.txt

E5E: Get the full path of the ‘ResFinder.tar.gz’ file to send to your colleague
	readlink -f ResFinder.tar.gz


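ExtraA: Use chmod to modify the permissions of the "P_aeruginosa_TOprJ3-positive.fasta" file so that only you can change it, but everyone else can still read it.
	chmod 644 P_aeruginosa_TOprJ3-positive.fasta
	(chmod 744 also works, but it additionally gives you permission to execute the file, which a FASTA file does not need.)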
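Each octal digit is the sum of r=4, w=2 and x=1, for the owner, the group and everyone else, in that order. So 644 means rw-r--r--, the smallest set of permissions that meets the requirement, while 744 (rwxr--r--) also sets the owner's execute bit. A sketch on a throwaway file (demo.fasta is hypothetical), also showing the equivalent symbolic form:

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch demo.fasta

# 6 = 4+2 (rw-) for you, 4 (r--) for the group and for everyone else
chmod 644 demo.fasta
ls -l demo.fasta

# The same permissions written symbolically
chmod u=rw,go=r demo.fasta
ls -l demo.fasta
```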

ExtraB: Use ‘--help’ on tar, and see what the options you have used mean.
	tar --help
	c: create a new archive (when we extract/unzip, we use -x instead)
	z: compress the archive with gzip (or decompress it when extracting)
	v: verbose; lists files while processing
	h: follow soft links and archive the files they point to
	f: lets you specify the name of the archive (f must come right before the file name)

---------------------------------------------------------------------------------