Datasets

Creating a Dataset

Datasets are encrypted tables stored in ThoughtSpot and refreshed at a user-defined interval. Datasets can be created by either querying a database or uploading a Google Sheet. To create a new Dataset, select the Create button in the top right corner of your Analyst Studio home screen and choose Make a reusable Dataset.

Alternatively, go to My Work in the left navigation and click the shortcut tile for Make a reusable Dataset.

This opens the Datasets editor, where you can write a new SQL query or insert SQL snippets using Definitions.
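As a rough illustration of the kind of query a Dataset might be built on, the sketch below runs an aggregate query against an in-memory SQLite database from Python. The `orders` table, its columns, and the query itself are hypothetical. In Analyst Studio you would write only the SQL, in the Datasets editor, against your own connected database.

```python
import sqlite3

# Hypothetical source table; it stands in for a table in your connected database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EMEA", 120.0), (2, "EMEA", 80.0), (3, "APAC", 50.0)],
)

# The kind of SQL you might write in the Datasets editor:
# one row per region, with explicitly named columns that reports can reuse.
DATASET_SQL = """
SELECT region,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
FROM orders
GROUP BY region
ORDER BY region
"""

rows = conn.execute(DATASET_SQL).fetchall()
for row in rows:
    print(row)
```

Giving every output column an explicit alias, as in this sketch, tends to make the resulting fields easier to recognize when the Dataset is reused in reports.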

Adding calculated fields to the Dataset

You can add new calculated fields to the Dataset from the Fields tab. Doing so will add the calculated field to the list of available fields, and also make it available in any reports created from the Dataset.

To add a new calculated field, first select the New field button.

Then enter a name and the formula for your calculated field. To save the calculated field, select the Apply and Close button.
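The difference between a row-level calculated field and an aggregate calculated field matters later, when the Dataset is published. The following Python sketch illustrates the concept only. It does not use Analyst Studio's formula syntax, and the field names `price`, `quantity`, and `line_total` are hypothetical.

```python
# Hypothetical Dataset rows.
rows = [
    {"price": 10.0, "quantity": 3},
    {"price": 4.5, "quantity": 2},
]

# Row-level calculated field: evaluated once per row, so it behaves like a new column.
for row in rows:
    row["line_total"] = row["price"] * row["quantity"]

# Aggregate calculated field: collapses many rows into a single value,
# so it has no per-row value of its own.
total_revenue = sum(row["line_total"] for row in rows)

print([row["line_total"] for row in rows])
print(total_revenue)
```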

Viewing the source syntax

You can view the source syntax of the query that was run from the Source tab.

Adding a name and description to the Dataset

You can add a name and description to the Dataset. To do so, select the caret next to the placeholder name, “Untitled Dataset.” From the dropdown, select Rename.

Enter the desired Dataset name and description. Then select Save.

We recommend using consistent naming conventions and adding detailed descriptions to your Datasets. Doing so will help other team members find and understand how to use the Dataset.

Scheduling a Dataset

You can set a schedule for your Dataset to refresh. When a Dataset refreshes, all associated Reports built using that Dataset will receive a prompt to pull in the fresh data.

To create a new schedule, select the caret next to the Dataset name and choose Schedule.

Then, select Create New Schedule to open the scheduling options. From here, you can set the refresh frequency, as well as the specific time and timezone.
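To make the interaction between refresh time and timezone concrete, here is a conceptual sketch of how the next run of a daily schedule could be computed. It is not Analyst Studio's scheduler. The function name `next_daily_run` and its parameters are hypothetical, and only a daily frequency is modeled.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

def next_daily_run(now: datetime, run_at: time, tz: tzinfo) -> datetime:
    """Return the next daily run at `run_at` local time in `tz`, strictly after `now`.

    `now` must be timezone-aware.
    """
    local_now = now.astimezone(tz)
    candidate = datetime.combine(local_now.date(), run_at, tzinfo=tz)
    if candidate <= local_now:
        # Today's slot has already passed in the schedule's timezone.
        candidate = datetime.combine(
            local_now.date() + timedelta(days=1), run_at, tzinfo=tz
        )
    return candidate

# Fixed UTC-5 offset used here so the sketch has no external dependencies.
# For a named zone with daylight saving rules, zoneinfo.ZoneInfo("America/New_York")
# could be passed instead.
schedule_tz = timezone(timedelta(hours=-5))
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)  # 07:00 in UTC-5
next_run = next_daily_run(now, time(6, 0), schedule_tz)
print(next_run.isoformat())
```

The point of the sketch is that the same wall-clock time maps to different moments depending on the timezone chosen, so it is worth confirming the timezone when you set the refresh time.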

Publishing a Dataset to ThoughtSpot

Once you’ve created a Dataset and set a schedule, follow these steps to publish to ThoughtSpot:

  1. Click the blue Publish button in the upper right side of the top menu.

    You can only publish Datasets you created.
  2. In the Publish to the Data workspace window, enter the Dataset name and click Publish.

  3. Your Dataset is published to ThoughtSpot as a table. You can access it from the Datasets section in the Data workspace.

    Only users with the "Can manage data" privilege can access the Data workspace.
  4. Unlike items from ThoughtSpot Connections, which are live, a Dataset is an extract that runs on a schedule. Your Dataset in ThoughtSpot will refresh according to the schedule(s) you set in Analyst Studio.

Note that editing an existing, published Dataset may break the content built on it. We recommend creating a new Dataset with the desired edits and publishing it. If you delete a Dataset after publishing, you must also delete the Dataset in the ThoughtSpot Data workspace.

Dataset dependencies

When you update a published dataset in Analyst Studio, the following behaviors occur in the Data workspace.

Data run outcomes

| Update to published Dataset in Analyst Studio | Resulting behavior in the Data workspace |
| --- | --- |
| Dataset run succeeds (manual or via a schedule). | New cached results are made available automatically. |
| Google Sheet sync succeeds (manual or via a schedule). | New cached results are made available automatically. |
| Dataset run fails (manual or via a schedule). | Updates are paused. The table falls back to the last successful results until the issue is resolved. |
| Dataset run succeeds, but exceeds the 10GB result set limit. | Updates are paused. The table falls back to the last successful results until the issue is resolved. |
| Dataset run succeeds, but exceeds the total 25GB/50GB Workspace limit. | Updates are paused. The table falls back to the last successful results until the total amount of published data is reduced. |
| Google Sheet sync fails. | Updates are paused. The table falls back to the last successful results until the issue is resolved. |

Dataset structure changes

| Update to published Dataset in Analyst Studio | Resulting behavior in the Data workspace |
| --- | --- |
| Column is added. | Change is not reflected automatically. The Dataset must be republished. |
| Calculated field is added. | Change is not reflected automatically. The Dataset must be republished. |
| Aggregate calculated field is added. | Not supported in the table view. |
| Field description is added. | Not supported in the table view. |
| Column is removed. | Change is not reflected automatically, but will surface an error in the table and on any dependent content. To reflect the change, remove any dependencies that use the column and republish the Dataset. |
| Calculated field is removed. | Change is not reflected automatically, but will surface an error in the table and on any dependent content. To reflect the change, remove any dependencies that use the column and republish the Dataset. |
| Aggregate calculated field is removed. | Not supported in the table view. |
| Field description is removed. | Not supported in the table view. |
| Data type of existing column is changed. | Change is not reflected automatically, but will surface an error in the table and on any dependent content. To reflect the change, remove any dependencies that use the column and republish the Dataset. |

Dataset lifecycle changes

| Update to published Dataset in Analyst Studio | Resulting behavior in the Data workspace |
| --- | --- |
| Dataset name is changed. | The table will break unless the Dataset is republished with its original name and Collection schema. |
| Dataset is moved to a different Analyst Studio Collection. | Moving Collections updates the table's schema, which breaks the table and the content built on it unless the schema is updated manually via TML. |
| Dataset is unpublished. | The table will break unless the Dataset is republished with its original name and Collection schema. |
| Dataset is deleted in Analyst Studio. | The table will break unless the Dataset is republished with its original name and Collection schema. |

If you delete a dataset directly from the Data workspace, then you must republish the dataset from Analyst Studio for it to appear again in the Data workspace.
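The data run outcomes listed above amount to a small decision rule. The sketch below restates that rule in Python as a reading aid. It is not how Analyst Studio implements it. The names `RunResult` and `table_behavior` are hypothetical, and the Workspace limit is passed in as a parameter because the source gives it only as 25GB/50GB.

```python
from dataclasses import dataclass

RESULT_SET_LIMIT_GB = 10  # per-Dataset result set limit stated above

@dataclass
class RunResult:
    succeeded: bool            # Dataset run or Google Sheet sync succeeded
    result_size_gb: float      # size of this run's result set
    workspace_total_gb: float  # total published data, including this run

def table_behavior(run: RunResult, workspace_limit_gb: float) -> str:
    """Describe what happens to the published table after a run."""
    if not run.succeeded:
        return "paused: last successful results are used until the issue is resolved"
    if run.result_size_gb > RESULT_SET_LIMIT_GB:
        return "paused: last successful results are used until the issue is resolved"
    if run.workspace_total_gb > workspace_limit_gb:
        return "paused: last successful results are used until published data is reduced"
    return "updated: new cached results are made available automatically"

print(table_behavior(RunResult(True, 2.0, 12.0), workspace_limit_gb=25))
print(table_behavior(RunResult(True, 11.0, 12.0), workspace_limit_gb=25))
```

In every paused case the published table keeps serving the last successful results, so dependent content continues to work but may show stale data.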