Dataset
class pyrefinebio.Dataset(id=None, data=None, aggregate_by=None, scale_by=None, is_processing=None, is_processed=None, is_available=None, has_email=None, email_address=None, email_ccdl_ok=None, expires_on=None, s3_bucket=None, s3_key=None, success=None, failure_reason=None, created_at=None, last_modified=None, start=None, size_in_bytes=None, sha1=None, quantile_normalize=None, quant_sf_only=None, svd_algorithm=None, download_url=None, notify_me=False)

Datasets are collections of experiments and their samples. A Dataset must be constructed and then processed before it can be downloaded. Downloading a Dataset requires an activated API token; see pyrefinebio.Token for more details.
Create and save a Dataset
>>> import pyrefinebio
>>> dataset = pyrefinebio.Dataset(email_address="example@refine.bio", data={"SRP003819": ["SRR069230"]})
>>> dataset = dataset.save()
Get a Dataset that has been saved
>>> import pyrefinebio
>>> dataset_id = "dataset id <guid>"
>>> dataset = pyrefinebio.Dataset.get(dataset_id)
Start processing a Dataset
>>> import pyrefinebio
>>> dataset = pyrefinebio.Dataset(...)
>>> dataset.process()
Check if a Dataset is finished processing
>>> import pyrefinebio
>>> dataset = pyrefinebio.Dataset(...)
>>> dataset.process()
>>> dataset.check()
Download a processed Dataset
>>> import pyrefinebio
>>> dataset = pyrefinebio.Dataset(...)
>>> dataset.process()
>>> # only a processed Dataset can be downloaded, so wait until dataset.check() returns True
>>> dataset.download("~/datasets/my_dataset.zip")
add_samples(experiment, samples=['ALL'])
Add samples to a Dataset.
- Returns:
Dataset
- Parameters:
experiment (str or Experiment) – the accession code of the Experiment related to the Samples you are adding to the Dataset, or the Experiment object itself
samples (list) – list of Sample objects or Sample accession codes for the samples you are adding to the Dataset; the default, ['ALL'], adds every sample in the experiment
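Because samples accepts either Sample objects or accession-code strings, it can help to normalize such a list when building a data dict by hand. This is a minimal sketch: to_accession_codes is a hypothetical helper, not part of pyrefinebio, and it assumes Sample objects expose an accession_code attribute. FakeSample is a stand-in used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class FakeSample:
    """Hypothetical stand-in for pyrefinebio.Sample (assumed to have accession_code)."""
    accession_code: str


def to_accession_codes(samples: Iterable[Any]) -> List[str]:
    """Return accession codes for a mix of strings and Sample-like objects.

    Strings (including the special value "ALL") pass through unchanged;
    any other item is assumed to carry an ``accession_code`` attribute.
    """
    codes = []
    for sample in samples:
        if isinstance(sample, str):
            codes.append(sample)
        else:
            codes.append(sample.accession_code)
    return codes


print(to_accession_codes(["SRR069230", FakeSample("SRR069231")]))
# ['SRR069230', 'SRR069231']
```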
check()
Check whether a Dataset has finished processing.
- Returns:
bool
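Since check() returns a bool, a caller that wants to download as soon as processing finishes can poll it. The sketch below is hypothetical: wait_until_processed is not a pyrefinebio function, and the default timeout and interval are arbitrary assumptions. It takes the check function as a parameter (for example dataset.check) so it can be exercised without network access.

```python
import time
from typing import Callable


def wait_until_processed(
    check: Callable[[], bool],
    timeout: float = 3600.0,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call ``check`` until it returns True or ``timeout`` seconds elapse.

    Returns True if processing finished and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

For example, after dataset.process(), calling wait_until_processed(dataset.check) and downloading only if it returns True avoids attempting to download an unprocessed Dataset.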
download(path, prompt=True)
Download a processed Dataset.
The path that the Dataset is downloaded to is stored in the _downloaded_path attribute.
- Returns:
Dataset
- Parameters:
path (str) – the path that the Dataset should be downloaded to
prompt (bool) – if True, prompt for confirmation before downloading files larger than 1 GB
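The prompt rule can be expressed as a small predicate over the Dataset's size_in_bytes attribute. This is a sketch of the documented behavior, not pyrefinebio's code: should_prompt is hypothetical, 1 GB is assumed to mean 10**9 bytes (the library may use 2**30), and treating an unknown size as "do not prompt" is also an assumption.

```python
from typing import Optional

ONE_GB = 1_000_000_000  # assumption: decimal gigabyte; the library may use 2**30


def should_prompt(
    size_in_bytes: Optional[int], prompt: bool = True, threshold: int = ONE_GB
) -> bool:
    """Return True when a confirmation prompt would be shown before downloading."""
    if not prompt or size_in_bytes is None:
        # Prompting disabled, or size unknown (assumed: no prompt in that case).
        return False
    # "Bigger than 1 GB" is read as strictly greater than the threshold.
    return size_in_bytes > threshold
```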
extract()
Extract a downloaded Dataset.
- Returns:
Dataset
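Assuming the downloaded file is a zip archive (as the .zip path in the download example suggests), extraction amounts to unpacking it next to the archive. The following is a standard-library sketch for illustration only, not pyrefinebio's implementation; extract_zip and its default destination naming are hypothetical.

```python
import os
import zipfile
from typing import Optional


def extract_zip(archive_path: str, dest_dir: Optional[str] = None) -> str:
    """Extract a zip archive and return the directory it was extracted into.

    By default the files go into a directory named after the archive,
    e.g. ~/datasets/my_dataset.zip -> ~/datasets/my_dataset/.
    """
    archive_path = os.path.expanduser(archive_path)
    if dest_dir is None:
        dest_dir = os.path.splitext(archive_path)[0]
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest_dir)
    return dest_dir
```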
classmethod get(id)
Retrieve a specific Dataset by its id.
- Returns:
Dataset
- Parameters:
id (str) – the guid of the Dataset you want to retrieve
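Because get() expects a guid, a malformed id (such as an unreplaced placeholder) can be caught locally before a request is made. This sketch assumes refine.bio Dataset ids are standard UUID strings; is_valid_dataset_id is a hypothetical helper built on Python's uuid module, which also accepts a few alternate spellings (no hyphens, braces, a urn:uuid: prefix).

```python
import uuid


def is_valid_dataset_id(dataset_id: str) -> bool:
    """Return True if dataset_id parses as a UUID (the guid that get() expects)."""
    try:
        uuid.UUID(dataset_id)
    except (ValueError, TypeError, AttributeError):
        return False
    return True
```

For example, guarding pyrefinebio.Dataset.get(dataset_id) with is_valid_dataset_id(dataset_id) rejects the literal placeholder "dataset id <guid>" without contacting the API.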
process()
Start processing a Dataset.
In order for a Dataset to be processed, its data and email_address attributes must be properly set.
- Returns:
None
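Before calling process(), it can be useful to confirm that data and email_address are set. The sketch below is hypothetical and deliberately loose: missing_process_fields is not part of pyrefinebio, and "properly set" is approximated as a non-empty dict for data and a string containing "@" for email_address.

```python
from typing import Any, List


def missing_process_fields(data: Any, email_address: Any) -> List[str]:
    """Return the names of attributes that still need to be set before process().

    Assumed rules: data must be a non-empty dict, and email_address must be
    a string containing "@".
    """
    missing = []
    if not isinstance(data, dict) or not data:
        missing.append("data")
    if not isinstance(email_address, str) or "@" not in email_address:
        missing.append("email_address")
    return missing
```

For example, missing_process_fields(dataset.data, dataset.email_address) returning an empty list suggests the Dataset is ready for dataset.process().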
save()
Save a Dataset.
In order for a Dataset to be saved, its data attribute must be properly set. The data attribute should be a dict with experiment accession codes as the keys and lists of sample accession codes as the values. To include all samples associated with an experiment, use ["ALL"] as its value.
Example
>>> data = {
...     "SRP003819": [
...         "SRR069230",
...         "SRR069231"
...     ],
...     "SRP003820": ["ALL"]
... }
- Returns:
Dataset
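The expected shape of the data attribute can be checked locally before calling save(). This is a minimal sketch of the documented structure only (experiment accession code mapped to a list of sample accession codes, or ["ALL"]); validate_data is hypothetical and is not the validation pyrefinebio or the refine.bio API actually performs.

```python
from typing import Any


def validate_data(data: Any) -> None:
    """Raise ValueError if data does not match the documented shape.

    Expected: experiment accession code -> non-empty list of sample
    accession codes, or ["ALL"] to request every sample in the experiment.
    """
    if not isinstance(data, dict) or not data:
        raise ValueError("data must be a non-empty dict")
    for experiment, samples in data.items():
        if not isinstance(experiment, str):
            raise ValueError(f"experiment key {experiment!r} must be a str")
        if not isinstance(samples, list) or not samples:
            raise ValueError(f"samples for {experiment} must be a non-empty list")
        if not all(isinstance(code, str) for code in samples):
            raise ValueError(f"samples for {experiment} must be accession-code strings")
```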