Uppmax Structure

Since all our ancient DNA data from UU and SU are processed through the same pipeline, we have a central project (an aDNA umbrella project) where all our files are stored.
However, in July 2020, the Human Evolution program (HE) at Uppsala University decided to request a large shared storage project on UPPMAX that contains all the program's research projects. This change in structure makes it very important that only people employed by HE use this project for downstream analysis. People not affiliated with HE are therefore welcome to read from this project, but not to work in it. (This is not to be cruel; space is limited, and we need to make sure that people at HE have room for their downstream analyses.)

Storage project:

If you are affiliated with HE, you can do your downstream analyses in /proj/snic2020-2-10/private/Analyses/. Just create a subdirectory with your name (mkdir /proj/snic2020-2-10/private/Analyses/<Name>) and work there.
If you are NOT affiliated with HE, you need to apply for your own storage project (at SUPR SNIC, Rounds --> Storage Rounds --> SNIC Small Storage) and work there.
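
For HE members, the directory creation described above can be wrapped in a small helper that first checks that the shared area is reachable. This is only a sketch: `make_workdir` is a hypothetical name, and using `$USER` as the directory name is an assumption (any name you choose works).

```shell
#!/usr/bin/env bash
# make_workdir ROOT NAME
# Creates ROOT/NAME if ROOT exists and prints the resulting path.
# (make_workdir is a hypothetical helper, not part of our shared scripts.)
make_workdir() {
    local root="$1" name="$2"
    if [ ! -d "$root" ]; then
        echo "error: $root not found (are you on Rackham and a project member?)" >&2
        return 1
    fi
    mkdir -p "$root/$name" && echo "$root/$name"
}

# Usage on Rackham ($USER as the directory name is an assumption):
# make_workdir /proj/snic2020-2-10/private/Analyses "$USER"
```

The check on the root directory gives a clearer error than a bare mkdir when you are not yet a member of the project.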

Compute project (core hours) for downstream analyses:

Everyone needs their own compute project (at SUPR SNIC, Rounds --> Compute Rounds --> SNIC Small Compute) in order to submit jobs to Slurm.
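
As an illustration of how the compute project is used, the sketch below writes a minimal Slurm batch script that charges core hours to a given project. The helper name `write_job_script`, the job name, the time limit and the `core` partition are assumptions chosen for the example; substitute the ID of your own compute project.

```shell
#!/usr/bin/env bash
# write_job_script PROJECT OUTFILE
# Writes a minimal Slurm batch script that charges core hours to PROJECT.
# (write_job_script is a hypothetical helper, not part of our shared scripts.)
write_job_script() {
    local project="$1" outfile="$2"
    cat > "$outfile" <<EOF
#!/bin/bash -l
#SBATCH -A ${project}
#SBATCH -p core
#SBATCH -n 1
#SBATCH -t 00:10:00
#SBATCH -J example_job

# Placeholder job body: replace with your own analysis commands.
echo "Job running on \$(uname -n)"
EOF
}

# Usage on Rackham ("snic20XX-X-XXX" is a placeholder, not a real project ID):
# write_job_script snic20XX-X-XXX example_job.sh
# sbatch example_job.sh
```

Note that the `-A` value is your compute project, which is separate from the storage project that holds the data.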

There is an UPPMAX introductory course held several times per year, which we recommend you sign up for. You might also find material from past workshops there.

To access our data you need to apply for an UPPMAX account, and then request access to the human_evolution project. The Dnr name changes once a year (see below), but the UPPMAX project name remains the same (snic2020-2-10).

2020-07-01 - 2021-06-30: SNIC 2020/2-10
2021-07-01 - 2022-06-30: SNIC 2021/2-17

Carolina or Mattias will then approve you and you can log onto Rackham by writing:

ssh -AX username@rackham.uppmax.uu.se

Below is information on how to access the umbrella folders and how the structure is set up.

Private

The private folder can be reached with any of the following paths:

/proj/snic2020-2-10/private/Data/Human/Ancient/ (This is the real path to the folder)
/proj/snic2020-2-10/1000AncientGenomes/ (This is the symbolic link to the folder which makes it easier to type the path)
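
To confirm where the shorter path actually points, the symbolic link can be resolved from the shell. The following is a minimal sketch, assuming GNU `readlink` (standard on Linux systems such as Rackham); the helper name `resolve_path` is made up for illustration.

```shell
#!/usr/bin/env bash
# resolve_path PATH
# Prints the canonical path with all symbolic links resolved.
# (resolve_path is a hypothetical helper name, not part of our pipeline.)
resolve_path() {
    readlink -f -- "$1"
}

# Usage on Rackham:
# resolve_path /proj/snic2020-2-10/1000AncientGenomes
```

Going by the description above, the usage line should print the real path under private/Data/Human/Ancient.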

animals:	Link to folder where animals are processed.
bin:	All shared scripts and documentation that we use are in here.
comparative_seq/comparative_seqs:	Comparative sequences in BAM format, published by other groups but downloaded and mapped with at least the same criteria as we use. (Technically located in `/proj/snic2020-2-10/private/Data/Human/comparative_seqs/`). For more details about samples, click here.
fromINBOX:	Folder where data from SNP&SEQ (UU) or NGI (SU) is downloaded from GRUS. When data has been processed, it is moved to our offload project on lutra for archiving.
hg19bams:	All sample libraries mapped to hg19/hs37d5.fa. PCR-duplicates and reads shorter than 35 bp are removed.
hg19bams/mapped:	“Raw” mapped reads. No filtering done, just all reads that mapped against hg19.
merge_bams_hg19:	Once you have merged a publication-worthy sample, always copy it here as well so that everyone can access it.
mergedfastqs:	The merged FASTQ files. For more info see aDNA Pipeline.
ref_seqs:	Our reference sequences (also contains a link to animal reference folder).
SNPref:	plink files containing modern populations.
vcfs:	VCF files should be placed here.
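
The 35 bp cutoff described for hg19bams can be spot-checked on any BAM file. The sketch below is not the pipeline's own filtering step; it only counts reads shorter than a given length in SAM text read from standard input. It assumes `samtools` is available (on UPPMAX typically after `module load bioinfo-tools samtools`), and `count_short_reads` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# count_short_reads [MIN_LEN]
# Reads SAM records on stdin and prints how many have a sequence (field 10)
# shorter than MIN_LEN (default 35). Header lines starting with "@" are skipped.
# (count_short_reads is a hypothetical helper, not part of our pipeline.)
count_short_reads() {
    local min_len="${1:-35}"
    awk -F'\t' -v min="$min_len" \
        '!/^@/ && length($10) < min { n++ } END { print n + 0 }'
}

# Usage (replace <file>.bam with a BAM file of your choice):
# samtools view <file>.bam | count_short_reads 35
```

If the filter was applied as described, the count for a file in hg19bams should be 0, whereas files in hg19bams/mapped may give a non-zero count.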

Nobackup

The nobackup folder can be reached with any of the following paths:

/proj/snic2020-2-10/nobackup/private/Data/Human/Ancient/ (This is the real path to the folder)
/proj/snic2020-2-10/private/Data/Human/Ancient/nobackup/ (This is the real path to private folder and then a symbolic link to the nobackup folder)
/proj/snic2020-2-10/1000AncientGenomes/nobackup/ (This is based on symbolic links which makes it easier to type the path)
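
Since the three paths are meant to lead to the same folder, this can be verified with the shell's `-ef` test, which compares the underlying directory after following symbolic links. A minimal sketch; `all_same_dir` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# all_same_dir PATH1 PATH2 [PATH3 ...]
# Succeeds only if every path refers to the same file or directory as PATH1
# (same device and inode, symbolic links followed).
# (all_same_dir is a hypothetical helper, not part of our shared scripts.)
all_same_dir() {
    local first="$1" p
    shift
    for p in "$@"; do
        [ "$first" -ef "$p" ] || return 1
    done
}

# Usage on Rackham:
# all_same_dir \
#     /proj/snic2020-2-10/nobackup/private/Data/Human/Ancient \
#     /proj/snic2020-2-10/private/Data/Human/Ancient/nobackup \
#     /proj/snic2020-2-10/1000AncientGenomes/nobackup \
#     && echo "all paths point to the same folder"
```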

animals:	Link to area for animal processing.
damage_plots:	Damage Pattern plots for all sample libraries processed.
hg20bams:	BAMs mapped against the hg38 version of the reference genome. Sample libraries were previously processed once a year, but nothing has been processed since 2019. If anyone uses data in this folder, please let me know.
mt_consensus:	Mt consensus generated using samtools, bcftools and vcfutils. Contains ambiguous bases (such as R, Y etc.); if you need a consensus without them, see Haplofind.
mt_consensus_strict:	Same as above, but positions covered by less than 3x are set to N.
RL_plots:	All Read Length plots.
(temp_merging:)	Area previously used for people to merge samples in. This subfolder is no longer in use and will be removed as soon as I know that it doesn't contain any important data. (I will start by making it unavailable and then later remove it completely.)

There are some other folders as well, but they are primarily for the pipeline. If you have any questions, just ask Carolina.