Dataset

class pyrefinebio.Dataset(id=None, data=None, aggregate_by=None, scale_by=None, is_processing=None, is_processed=None, is_available=None, has_email=None, email_address=None, email_ccdl_ok=None, expires_on=None, s3_bucket=None, s3_key=None, success=None, failure_reason=None, created_at=None, last_modified=None, start=None, size_in_bytes=None, sha1=None, quantile_normalize=None, quant_sf_only=None, svd_algorithm=None, download_url=None, notify_me=False)

Datasets are collections of experiments and their samples. A Dataset needs to be constructed and then processed before it can be downloaded. Downloading a Dataset requires an activated API token. See pyrefinebio.Token for more details.

Create and save a Dataset

>>> import pyrefinebio
>>> dataset = pyrefinebio.Dataset(email_address="example@refine.bio", data={"SRP003819": ["SRR069230"]})
>>> dataset = dataset.save()

Get a Dataset that has been saved

>>> import pyrefinebio
>>> id = "dataset id <guid>"
>>> dataset = pyrefinebio.Dataset.get(id)

Start processing a Dataset

>>> import pyrefinebio
>>> dataset = pyrefinebio.Dataset(...)
>>> dataset.process()

Check if a Dataset is finished processing

>>> import pyrefinebio
>>> dataset = pyrefinebio.Dataset(...)
>>> dataset.process()
>>> dataset.check()

Download a processed Dataset

>>> import pyrefinebio
>>> dataset = pyrefinebio.Dataset(...)
>>> dataset.process()
>>> dataset.download("~/datasets/my_dataset.zip")

add_samples(experiment, samples=['ALL'])

Add samples to a dataset

Returns:

Dataset

Parameters:
  • experiment (str or Experiment) – accession code for the Experiment related to the Samples you are adding to the dataset, or the Experiment object itself

  • samples (list) – list of Sample objects or Sample accession codes for the samples you are adding to the dataset
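
Because add_samples takes one experiment per call and returns a Dataset, adding several experiments means calling it repeatedly. The following is a minimal sketch of that loop, not part of pyrefinebio: add_selections is a hypothetical helper name, and it assumes only the add_samples(experiment, samples=['ALL']) signature documented above.

```python
def add_selections(dataset, selections):
    """Call add_samples once per experiment (hypothetical helper).

    `selections` maps an experiment accession code (or Experiment object)
    to a list of sample accession codes, or to None to keep the documented
    default of samples=['ALL'].
    """
    for experiment, samples in selections.items():
        if samples is None:
            # Rely on the documented default: samples=['ALL'].
            dataset = dataset.add_samples(experiment)
        else:
            dataset = dataset.add_samples(experiment, samples=list(samples))
    # add_samples is documented to return a Dataset, so return the last one.
    return dataset
```

With a real Dataset this might be called as add_selections(dataset, {"SRP003819": ["SRR069230"], "SRP003820": None}).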

check()

Check to see if a Dataset has finished processing

Returns:

bool
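
Since check() only reports the current state, waiting for a Dataset to finish usually means polling it. The sketch below shows one way to do that, assuming only that check() returns a bool as documented. The function name wait_until_processed, the interval and timeout defaults, and the injectable sleep and clock parameters are illustrative choices, not part of pyrefinebio.

```python
import time


def wait_until_processed(dataset, interval=30.0, timeout=3600.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll dataset.check() until it returns True or the timeout passes.

    Returns True if processing finished, False if the timeout was reached.
    `sleep` and `clock` are injectable so the loop can be tested without
    actually waiting.
    """
    deadline = clock() + timeout
    while True:
        if dataset.check():
            return True
        if clock() >= deadline:
            return False
        # Wait before asking again to avoid hammering the API.
        sleep(interval)
```

A caller could then download only when the function returns True, and report a timeout otherwise.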

download(path, prompt=True)

Download a processed Dataset

The path that the dataset is downloaded to is stored in _downloaded_path

Returns:

Dataset

Parameters:
  • path (str) – the path that the Dataset should be downloaded to

  • prompt (bool) – if True, prompt before downloading files larger than 1 GB

extract()

Extract a downloaded Dataset

Returns:

Dataset

classmethod get(id)

Retrieve a specific Dataset based on id

Returns:

Dataset

Parameters:

id (str) – the GUID of the Dataset you want to get

process()

Start processing a Dataset

In order for a Dataset to be processed, its data and email_address attributes must be properly set.

Returns:

void

save()

Save a Dataset

In order for a Dataset to be saved, its data attribute must be properly set. The data attribute should be a dict with experiment accession codes as the keys and lists of sample accession codes as the values. If you want all samples associated with an experiment, use the list ["ALL"] as its value.

Example

>>> data = {
...     "SRP003819": [
...         "SRR069230",
...         "SRR069231"
...     ],
...     "SRP003820": ["ALL"]
... }

Returns:

Dataset
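
The shape of the data dict described above can be checked locally before calling save(). The following sketch is one possible way to do so. build_dataset_data is a hypothetical helper, not a pyrefinebio function, and the validation rules are assumptions drawn only from the description above (string keys, lists of string accession codes, "ALL" for every sample).

```python
def build_dataset_data(selections):
    """Normalize a mapping into the dict shape described for Dataset.data.

    Each value may be a list of sample accession codes, a single accession
    code string, or "ALL" / None to request every sample in the experiment.
    """
    data = {}
    for experiment, samples in selections.items():
        if not isinstance(experiment, str) or not experiment:
            raise ValueError("experiment accession codes must be non-empty strings")
        if samples is None:
            data[experiment] = ["ALL"]
            continue
        if isinstance(samples, str):
            # A bare string is one accession code (or the "ALL" marker).
            samples = [samples]
        samples = list(samples)
        if not samples or not all(isinstance(s, str) and s for s in samples):
            raise ValueError(f"invalid sample list for {experiment!r}")
        data[experiment] = samples
    return data
```

The result can be passed as the data argument when constructing a Dataset, for example pyrefinebio.Dataset(data=build_dataset_data({"SRP003820": "ALL"})).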