Getting Started With Submissions

Overview

To make your data accessible, searchable, and assessable, submit as much metadata as possible to the 4DN system along with the raw files generated in your experiments.

These pages are designed to:

show you how to find out what kind of metadata we collect for your particular type of experiment
introduce the mechanisms by which you can submit your metadata and data to the 4DN data portal.

For an overview of the metadata structure and the relationships between different items, please see the slides available on the metadata introductory page.

There are three primary ways to submit data to the 4DN data portal: the web submission forms, the Excel metadata workbooks (spreadsheet submission), and the REST API.

Notes for prospective submitters

If you would like to submit data to the portal:

You will need to create a user account.
Please skim through the metadata structure.
Check out the other pages in the Help menu for detailed information on the submission process.
Of particular note are the required metadata for the biological samples used in experiments, which are specified on this page.
We like to know about submissions beforehand, and we will need to grant your account submitter privileges. If you contact us at support@4dnucleome.org, we can set up a Zoom call to discuss the details of the submission process and the most convenient approach for your existing system.
IMPORTANT: If you are planning to submit experiments that include genomic data from human patient samples, please let us know as soon as possible. This data likely requires controlled access and dbGaP registration. If you are not sure whether the data you are generating should be considered controlled access, please direct your questions to the relevant offices at your institute, your NIH program officers, or Ian Fingerman, who coordinates controlled data issues for 4DN. Do not submit any personal health information (PHI) with your experimental metadata. Generally, any genomic data generated from human tissue or cell lines must be explicitly consented for broad sharing of genomic information and be considered controlled access data. For more information, consult the NIH Genomic Data Sharing Policy.

Web Submission

The online web submission forms are best used:

To submit one or a few experiments.
To edit one or a few fields of an already submitted but not yet released item.
As a hands-on way to gain familiarity with the 4DN data model.

Documentation on how to get started with this interface is here.

Data Submission via Spreadsheet

The Excel metadata workbooks:

Are useful for submitting metadata and data for several experiments or biosamples.
Can be used to make bulk edits of submitted but not yet released metadata.
Contain multiple sheets, where each sheet corresponds to an object type and each column to a metadata field.
Can be generated using the Submit4DN software.
Are used as input to the Submit4DN software, which validates submissions and pushes the content of the forms to our database.

Documentation of the data submission process using these forms can be found here.

REST API

For both metadata and data submission and retrieval, you can also access our database directly via the REST API.

Data objects exchanged with the server conform to the standard JavaScript Object Notation (JSON) format.
Our implementation is analogous to the one developed by the ENCODE DCC.

If you would like to interact directly with the REST API for data submission, see the documentation here.
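As a minimal sketch of what a REST API call involves, the Python below builds (without sending) an authenticated GET request for a single item. It assumes that the access key and secret described under "Getting Connection Keys for the 4DN-DCIC servers" are sent as HTTP Basic credentials and that an "Accept: application/json" header asks the server for JSON. The function name build_get_request is hypothetical, and the credentials are the placeholder values from the sample keypairs.json; check the REST API documentation before relying on either assumption.

```python
import base64
import urllib.request

# Server URL taken from the sample keypairs.json on this page.
SERVER = "https://data.4dnucleome.org/"


def build_get_request(server: str, item_id: str, key: str, secret: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request for one item.

    Assumptions: the key/secret pair is sent as HTTP Basic credentials, and
    an "Accept: application/json" header asks the server for JSON.
    """
    # Accepts a bare identifier ("4DNSRV3SKQ8M") or a type/id path ("/labs/peter-park-lab/").
    url = server.rstrip("/") + "/" + item_id.strip("/") + "/"
    token = base64.b64encode(f"{key}:{secret}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


# Placeholder credentials from the sample keypairs.json; use your own key and secret.
req = build_get_request(SERVER, "/labs/peter-park-lab/", "ABCDEFG", "abcdefabcd1ab")
print(req.full_url)  # https://data.4dnucleome.org/labs/peter-park-lab/
```

Passing req to urllib.request.urlopen and the response to json.load would then fetch the item, provided the credentials are valid. Keeping request construction separate from sending makes the credential handling easy to check offline.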

Notes on Experiments and Replicate Sets

Biological replicates

The 4DN Consortium strongly encourages that experiments be performed using at least two different preparations of the same source biomaterial, i.e. bioreplicates.
When submitting metadata, you should submit two Experiments that use the same Biosource but have different Biosamples.
In many cases the only difference between Biosamples may be the dates at which the cell culture or tissue was harvested.
The experimental techniques and parameters will be shared by all experiments of the same bioreplicate set.
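The bioreplicate rule above (same Biosource, different Biosamples) can be expressed as a small check. This is an illustrative sketch only: the flattened Experiment record, the are_bioreplicates function, and the aliases are hypothetical, and in the actual data model the Biosource is reached through the Biosample rather than stored on the Experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    # Hypothetical, simplified record: in the real data model an Experiment links
    # to a Biosample, which in turn links to a Biosource.
    alias: str
    biosource: str  # identifier of the shared source biomaterial
    biosample: str  # identifier of this particular preparation


def are_bioreplicates(experiments: list) -> bool:
    """True if at least two experiments share one Biosource but use distinct Biosamples."""
    if len(experiments) < 2:
        return False
    one_source = len({e.biosource for e in experiments}) == 1
    distinct_samples = len({e.biosample for e in experiments}) == len(experiments)
    return one_source and distinct_samples


# Two harvests of H1-hESC on different dates (aliases are made up for illustration).
rep1 = Experiment("peter-park-lab:hic-rep1", "4DNSRV3SKQ8M", "peter-park-lab:h1-harvest-day1")
rep2 = Experiment("peter-park-lab:hic-rep2", "4DNSRV3SKQ8M", "peter-park-lab:h1-harvest-day2")
print(are_bioreplicates([rep1, rep2]))  # True
```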

Technical replicates

Technical replicates are multiple sequencing runs, performed at different times, using a library prepared from the same Biosample with the same methods up until the sample is sent to the sequencer.

Submitting replicate information

The replicate information is stored and represented as a set of experiments that includes labels indicating the replicate type and replicate number of each experiment in the set.
The mechanism you use to submit your metadata determines the type of item with which you associate the replicate information:
- In Excel workbooks, bioreplicate and technical replicate numbers are entered in the Experiment sheet.
- Using the API, you directly associate the replicate information (i.e. replicate number and experiment identifier) with the ExperimentSetReplicate objects.
- Using the web submission interface, the replicate numbers and linked experiments are added from the ExperimentSetReplicate page.
In the database, the information always ends up directly associated with ExperimentSetReplicate objects.
Specific details on formatting replicate information are given on the Spreadsheet Submission page.
When submitting using the REST API, you should format your JSON according to the specifications in the schema, as described on the REST API page.
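The sketch below shows how bioreplicate and technical replicate numbers might be attached to experiments in a JSON body for an ExperimentSetReplicate submitted through the REST API. The field names are assumptions to be confirmed against experiment_set_replicate.json (listed under Schema information); the function replicate_set_payload and all aliases are hypothetical, and the duplicate check is a local sanity check rather than a documented portal rule.

```python
import json


def replicate_set_payload(set_alias, lab, award, runs):
    """Build a JSON-ready ExperimentSetReplicate body.

    runs: list of (experiment_id, bio_rep_no, tec_rep_no) tuples.
    The field names ("aliases", "lab", "award", "replicate_exps", "replicate_exp",
    "bio_rep_no", "tec_rep_no") are assumptions; confirm them in experiment_set_replicate.json.
    """
    seen = set()
    entries = []
    for exp_id, bio, tec in runs:
        # Local sanity check: each experiment gets its own (bioreplicate, technical replicate) slot.
        if (bio, tec) in seen:
            raise ValueError(f"duplicate replicate numbers bio={bio}, tec={tec}")
        seen.add((bio, tec))
        entries.append({"replicate_exp": exp_id, "bio_rep_no": bio, "tec_rep_no": tec})
    return {"aliases": [set_alias], "lab": lab, "award": award, "replicate_exps": entries}


# Two bioreplicates; the first was sequenced twice (two technical replicates).
payload = replicate_set_payload(
    "peter-park-lab:hic-set-1",
    "/labs/peter-park-lab/",
    "/awards/ODO1234567-01/",
    [
        ("peter-park-lab:hic-rep1-run1", 1, 1),
        ("peter-park-lab:hic-rep1-run2", 1, 2),
        ("peter-park-lab:hic-rep2-run1", 2, 1),
    ],
)
print(json.dumps(payload, indent=2))
```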

Referencing existing objects

Using aliases

Aliases are a convenient way for you to refer to other items that you are submitting or have submitted in the past.

An alias is a lab-specific identifier that you can assign to any item.
An alias takes the form lab:id_string, e.g. peter-park-lab:my-alias.
An alias must be unique across all items.
It is good practice to assign an alias to every item that you submit.
If you use the Online Submission Interface to create new items, designating an alias is the first required step.
Once you submit an alias for an item, that alias can be used as an identifier for that item in the current submission as well as in any subsequent submission.
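A local pre-submission check for aliases might look like the following. The regular expression is an illustrative guess at the lab:id_string shape, not the portal's official validation rule, and check_aliases is a hypothetical helper. It can only catch duplicates within the batch you pass in; uniqueness against items already in the database still has to be checked against the portal itself.

```python
import re

# Illustrative pattern for "<lab>:<id_string>". The exact characters the portal
# accepts are an assumption here, not the official rule.
ALIAS_RE = re.compile(r"[A-Za-z0-9-]+:[^\s:]+")


def check_aliases(aliases):
    """Return problems found in a batch of aliases: malformed entries and repeats."""
    problems = []
    seen = set()
    for alias in aliases:
        if not ALIAS_RE.fullmatch(alias):
            problems.append(f"malformed: {alias}")
        if alias in seen:
            problems.append(f"duplicate: {alias}")
        seen.add(alias)
    return problems


print(check_aliases(["peter-park-lab:my-alias", "my-alias", "peter-park-lab:my-alias"]))
# ['malformed: my-alias', 'duplicate: peter-park-lab:my-alias']
```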

Other ways to reference existing items

You don't need to use an alias if you are referencing an item that already exists in the database.

Any of the following can be used to reference an existing item in an Excel sheet or when using the REST API:

accession - Objects of some types (e.g. Files, Experiments, Biosamples, Biosources, Individuals) are accessioned, e.g. 4DNEX4723419.
uuid - Every item in our database is assigned a “uuid” upon its creation, e.g. “44d3cdd1-a842-408e-9a60-7afadca11575”.
type/id - In a few cases, object-specific identifying terms are also available, e.g. award number for awards or lab name for labs (see table below).

Object	Field	type/ID	ID
Lab	name	/labs/peter-park-lab/	peter-park-lab
Award	number	/awards/ODO1234567-01/	ODO1234567-01
User	email	/users/test@test.com/	test@test.com
Vendor	name	/vendors/fermentas/	fermentas
Enzyme	name	/enzymes/HindIII/	HindIII
Construct	name	/constructs/GFP-H1B/	GFP-H1B
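When preparing a sheet or JSON body, it can help to know which kind of identifier a string is. The function below is a heuristic sketch: the accession pattern (4DN, two letters, then seven letters or digits) is inferred from the two example accessions on this page rather than from a published specification, and identifier_kind is a hypothetical name.

```python
import re
import uuid

# Pattern inferred from the examples on this page (4DNEX4723419, 4DNSRV3SKQ8M);
# an assumption, not the portal's formal accession definition.
ACCESSION_RE = re.compile(r"4DN[A-Z]{2}[A-Z0-9]{7}")


def identifier_kind(ident: str) -> str:
    """Guess which kind of reference an identifier string is."""
    if ACCESSION_RE.fullmatch(ident):
        return "accession"
    try:
        uuid.UUID(ident)  # raises ValueError if the string is not a UUID
        return "uuid"
    except ValueError:
        pass
    if ident.startswith("/") and ident.endswith("/"):
        return "type/id"
    if ":" in ident:
        return "alias"
    return "other"  # e.g. a bare lab name, award number, or enzyme name


for ident in ["4DNSRV3SKQ8M", "44d3cdd1-a842-408e-9a60-7afadca11575",
              "/enzymes/HindIII/", "peter-park-lab:my-alias", "HindIII"]:
    print(ident, "->", identifier_kind(ident))
```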

Many of the objects you need for your submissions may already exist on the 4DN website.
We encourage submitters to use existing database items as much as possible.
Common reusable items include:
- Vendors
- Enzymes
- Biosources
- Protocols
For example, if there is an existing Biosource (e.g. accession 4DNSRV3SKQ8M for H1-hESC (Tier 1)) for the new Biosample you are creating, you should reference the existing one instead of creating a new one.
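Following the H1-hESC example, a new Biosample would point at the existing Biosource by its accession rather than at a newly created Biosource. In this sketch the field names and the list shape of biosource are assumptions to be checked against biosample.json, new_biosample is a hypothetical helper, and the lab and award are referenced by the name and number forms from the identifier table.

```python
import json

# Existing Biosource cited on this page: H1-hESC (Tier 1).
H1_BIOSOURCE = "4DNSRV3SKQ8M"


def new_biosample(alias, lab, award, biosource_id):
    """Sketch of a new Biosample body that links to an existing Biosource.

    Field names ("aliases", "lab", "award", "biosource") and the list shape of
    "biosource" are assumptions; check them against biosample.json.
    """
    return {
        "aliases": [alias],
        "lab": lab,
        "award": award,
        "biosource": [biosource_id],  # reuse the existing item instead of creating a new one
    }


body = new_biosample("peter-park-lab:h1-harvest-day1", "peter-park-lab", "ODO1234567-01", H1_BIOSOURCE)
print(json.dumps(body))
```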

Getting Connection Keys for the 4DN-DCIC servers

If you have been designated as a submitter for the project and plan to use either our spreadsheet-based submission system or the REST API, you need an access key and a secret key to establish a connection to the 4DN database and to fetch, upload (post), or change (patch) data. Please follow these steps to get your keys.

Log in to the 4DN website with your username (email) and password. If you have not yet created an account, see this page for instructions.
Once logged in, go to your “Profile” page by clicking Account on the upper right side of the page.
In your profile page, click the green “Add Access Key” button, and copy the “access key ID” and “secret access key” values from the pop-up page. Note that once the pop-up page disappears, you will not be able to see the secret access key value again. However, if you forget or lose your secret key, you can delete it and add a new access key from your profile page at any time.

Create a file to store this information.
- By default, the submission software looks for a file named "keypairs.json" in your home directory.
- However, you can also specify your own filename and file location as parameters to the software (see below).
- The key information is stored in JSON format and is used to establish a secure connection.
- The JSON must be formatted as shown below; replace the key and secret values with your new “Access Key ID” and “Secret Access Key”.
- You can use the same key and secret with the 4DN REST API.

Sample content for keypairs.json

{
  "default": {
    "key": "ABCDEFG",
    "secret": "abcdefabcd1ab",
    "server": "https://data.4dnucleome.org/"
  }
}

Tip: If you don’t want to use that filename or keep the file in your home directory, you can use:

the --keyfile parameter as an argument to any of the scripts to provide the path to your keypairs file.
the --key parameter to select a named key entry within that file (otherwise the entry named "default" is used).

import_data --keyfile Path/name_of_file.json --key NotDefault
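To make the --keyfile and --key behaviour concrete, here is a minimal reader for a file laid out like the sample keypairs.json. It is an illustrative sketch under that assumption, not Submit4DN's own code; load_keypair is a hypothetical name.

```python
import json
import os


def load_keypair(keyfile=None, key="default"):
    """Return (key, secret, server) from a keypairs.json-style file.

    Mirrors the documented defaults (~/keypairs.json and the "default" entry)
    and the --keyfile / --key overrides. Illustrative only; not Submit4DN's own code.
    """
    path = keyfile or os.path.join(os.path.expanduser("~"), "keypairs.json")
    with open(path, encoding="utf-8") as fh:
        entries = json.load(fh)
    if key not in entries:
        raise KeyError(f"no entry named {key!r} in {path}")
    entry = entries[key]
    return entry["key"], entry["secret"], entry["server"]


# Equivalent of: --keyfile Path/name_of_file.json --key NotDefault
# access_key, secret, server = load_keypair("Path/name_of_file.json", key="NotDefault")
```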

Schema information

Schema Filename	Worksheet Name	Collection Name(s)
analysis_step.json	AnalysisStep	analysis-steps, analysis_step
award.json	Award	award(s)
biosample.json	Biosample	biosample(s)
biosample_cell_culture.json	BiosampleCellCulture	biosample-cell-cultures, biosample_cell_culture
biosource.json	Biosource	biosource(s)
construct.json	Construct	construct(s)
document.json	Document	document(s)
enzyme.json	Enzyme	enzyme(s)
experiment_atacseq.json	ExperimentAtacseq	experiments-atacseq, experiment_atacseq
experiment_capture_c.json	ExperimentCaptureC	experiments-capture-c, experiment_capture_c
experiment_chiapet.json	ExperimentChiapet	experiments-chiapet, experiment_chiapet
experiment_hi_c.json	ExperimentHiC	experiments-hi-c, experiment_hi_c
experiment_mic.json	ExperimentMic	experiments-mic, experiment_mic
experiment_repliseq.json	ExperimentRepliseq	experiments-repliseq, experiment_repliseq
experiment_seq.json	ExperimentSeq	experiments-seq, experiment_seq
experiment_set.json	ExperimentSet	experiment-sets, experiment_set
experiment_set_replicate.json	ExperimentSetReplicate	experiment-set-replicates, experiment_set_replicate
file_calibration.json	FileCalibration	files-calibration, file_calibration
file_fasta.json	FileFasta	files-fasta, file_fasta
file_fastq.json	FileFastq	files-fastq, file_fastq
file_processed.json	FileProcessed	files-processed, file_processed
file_reference.json	FileReference	files-reference, file_reference
file_set.json	FileSet	file-sets, file_set
file_set_calibration.json	FileSetCalibration	file-set-calibrations, file_set_calibration
genomic_region.json	GenomicRegion	genomic-regions, genomic_region
image.json	Image	image(s)
imaging_path.json	ImagingPath	imaging-paths, imaging_path
individual_human.json	IndividualHuman	individuals-human, individual_human
individual_mouse.json	IndividualMouse	individuals-mouse, individual_mouse
lab.json	Lab	lab(s)
modification.json	Modification	modification(s)
ontology.json	Ontology	ontology(s)
ontology_term.json	OntologyTerm	ontology-terms, ontology_term
organism.json	Organism	organism(s)
protocol.json	Protocol	protocol(s)
publication.json	Publication	publication(s)
publication_tracking.json	PublicationTracking	publication-trackings, publication_tracking
quality_metric_bamqc.json	QualityMetricBamqc	quality-metrics-bamqc, quality_metric_bamqc
quality_metric_fastqc.json	QualityMetricFastqc	quality-metrics-fastqc, quality_metric_fastqc
quality_metric_flag.json	QualityMetricFlag	quality-metric-flags, quality_metric_flag
quality_metric_pairsqc.json	QualityMetricPairsqc	quality-metrics-pairsqc, quality_metric_pairsqc
software.json	Software	software(s)
sop_map.json	SopMap	sop-maps, sop_map
summary_statistic.json	SummaryStatistic	summary-statistics, summary_statistic
summary_statistic_hi_c.json	SummaryStatisticHiC	summary-statistics-hi-c, summary_statistic_hi_c
target.json	Target	target(s)
treatment_chemical.json	TreatmentChemical	treatments-chemical, treatment_chemical
treatment_rnai.json	TreatmentRnai	treatments-rnai, treatment_rnai
user.json	User	user(s)
vendor.json	Vendor	vendor(s)
workflow.json	Workflow	workflow(s)
workflow_mapping.json	WorkflowMapping	workflow-mappings, workflow_mapping
workflow_run.json	WorkflowRun	workflow-runs, workflow_run
workflow_run_sbg.json	WorkflowRunSbg	workflow-runs-sbg, workflow_run_sbg
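Schema file names in the table follow a regular pattern: the worksheet name converted from CamelCase to snake_case, plus .json. The helper below (hypothetical, written for this page) derives the file name that way. It deliberately does not derive collection names, whose pluralization is irregular (for example experiments-hi-c but file-set-calibrations).

```python
import re


def schema_filename(worksheet_name: str) -> str:
    """Convert a worksheet name such as "ExperimentHiC" to its schema file name.

    Inserts "_" before every capital letter except the first character, then lower-cases.
    """
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", worksheet_name).lower()
    return snake + ".json"


for name in ["ExperimentHiC", "FileSetCalibration", "WorkflowRunSbg"]:
    print(name, "->", schema_filename(name))
# ExperimentHiC -> experiment_hi_c.json
# FileSetCalibration -> file_set_calibration.json
# WorkflowRunSbg -> workflow_run_sbg.json
```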
