With the Daily Activity Stream, you can export Dynamic Yield raw data and ingest it into your own analytics platform. This enables you to connect the Dynamic Yield data to additional data sources in your database and build customized reports that are tailored to your unique business needs.
The export consists of two datasets:
- Raw data: Includes raw interactions with Dynamic Yield variations, events, and pageviews.
- Attribution data: Includes the relationship between events and the variations they were attributed to according to your experience settings.
Data is exported daily into a secured Amazon S3 bucket, where it's stored for 30 days.
Data is exported as Apache Parquet files, a format optimized for large data sets.
Data sets
Raw data
This data set contains all interactions of the following types:
- Variation engagement: Variation clicks and impressions. This can include information such as the campaign and variation names and the recommendation strategy.
- Events: Any event that was fired, such as Purchase or Newsletter Subscription, including the properties sent in the event object.
- Pageviews: Sent upon every page load, includes information about the page context (HOMEPAGE, CATEGORY).
All interaction in the data set include some common attributes (such as DYID, session, and event timestamp). In addition, each data type includes some attributes (columns) that are unique. These do not contain data in rows of other data types, which remain empty—this does not indicate a problem or missing information.
The column listing the interaction type in the data set is eventType. The event types are UIA (pageviews), DPX (event hits), and VARIATION_ENGAGEMENT for variation impressions or clicks.
Here are some sample attributes that are unique to each type:
Data type | eventType | Additional Attributes (Examples) |
---|---|---|
Pageview | UIA | URL, page context |
Event hit | DPX | Event properties, event value |
Variation impression or click | VARIATION_ENGAGEMENT | Campaign name, variation name |
For example, if a user views 3 pages, triggers 2 events, and has 1 impression of a variation, the export contains 6 rows.
The following tables include the full list of attributes available for each eventType.
Represents a page viewed by the user.
Attribute | Description | Example |
---|---|---|
eventType String |
The type of activity. For pageview, the value is UIA (as opposed to VARIATION_ENGAGEMENT or DPX). | "UIA" |
interactionId |
The interaction’s identifier (for a specific event, for example, a specific purchase). It can be used to join this data set to the attribution data set. |
1234567890 |
contextType String |
The page type, according to the page context. | "HOMEPAGE" |
contextData String array |
Data about the page type.
|
["Women","Shoes"] |
dyId Long |
The internal identifier Dynamic Yield assigns to each visitor to the site or app, unique per device. | 123456789012345678 |
timestamp Long |
The time the activity occurred, in milliseconds, from the UNIX epoch. | 1621798861400 |
sessionId Integer |
The internal identifier Dynamic Yield assigns to a visitor's session. | 1234567890 |
url String |
The URL from which the event was fired. | "https://www.example.com/?url_params=123" |
urlClean String |
The URL from which the event was fired, after removing any URL parameters. | "https://www.example.com/" |
audiences Integer array |
The list of identifiers for audiences the user is a member of at the time of firing the event. | [1234567, 9876543] |
browser String |
The browser type from which the event was fired. | "Safari" |
device String |
The type of device that triggered the event. | "Tablet" |
operatingSystem String |
The operation system of the device that fired the event. | "Mac OS X" |
screenResolution String |
The screen resolution of the device that fired the event. | "Low (1024px and below)" |
reqTimestamp Long |
Internal request timestamp to the analytics pipeline, in milliseconds, from the UNIX epoch. | 1621798861400 |
procTimestamp Long |
Internal processing timestamp of the analytics pipeline, in milliseconds, from the UNIX epoch. | 1621798861400 |
resTimestamp Long |
Internal resolution timestamp of the analytics pipeline, in milliseconds, from the UNIX epoch. | 1621798861400 |
Represents a Dynamic Yield event triggered by the user.
Attribute | Description | Example |
---|---|---|
eventType String |
The type of activity. For an event hit, the value is DPX (as opposed to VARIATION_ENGAGEMENT or UIA). | "DPX" |
interactionId |
The interaction’s identifier (for a specific event, for example, a specific purchase). It can be used to join this data set to the attribution data set. |
1234567890 |
eventId Integer |
A unique identifier for each event explicitly fired from the site. | 12345 |
eventName String |
The event name as written in the event API. | "Purchase" |
eventProperties String (JSON) |
The event properties as written in the event API. These differ depending on the eventType value. | { "transaction_id": "ABC123456", "value": 100.0, "currency": "USD", "dyType": "purchase-v1", "Brands": "Nike", "Categories": "Sneakers", "Number_of_items": 1.0, "cart": [{ "productId": "AIR-123", "quantity": 1.0, "itemPrice": 100.0 }] } |
eventValue Long |
The total value of all items in the cart, as they appear in the eventProperties attribute. The value is in cents. |
10000 |
uniqueTransactionId String |
The transaction ID, as it appears in the eventProperties attribute for a purchase event. |
ABC123456 |
productIds |
The list of product IDs upon which an action was done. |
[“12345”] |
dyId Long |
The internal identifier Dynamic Yield assigns to each visitor to the site or app, unique per device. | 123456789012345678 |
timestamp Long |
The time of the activity has occurred, in milliseconds, from the UNIX epoch. | 1621798861400 |
sessionId Integer |
The Internal identifier Dynamic Yield assigns to a visitor's session | 1234567890 |
url String |
The URL from which the event was fired. | "https://www.example.com/?url_params=123" |
urlClean String |
The URL from which the event was fired, after removing any URL parameters. | "https://www.example.com/" |
audiences Integer array |
The list of identifiers of audiences the user is a member of at the time of firing the event. | [1234567, 9876543] |
browser String |
The browser type from which the event was fired. | "Safari" |
device String |
The type of device that triggered the event | "Tablet" |
operatingSystem String |
The operation system of the device that fired the event. | "Mac OS X" |
screenResolution String |
The screen resolution of the device that fired the event. | "Low (1024px and below)" |
reqTimestamp Long |
Internal request timestamp to the analytics pipeline, in milliseconds, from the UNIX epoch. | 1621798861400 |
procTimestamp Long |
Internal processing timestamp of the analytics pipeline, in milliseconds, from the UNIX epoch. | 1621798861400 |
resTimestamp Long |
Internal resolution timestamp of the analytics pipeline, in milliseconds, from the UNIX epoch. | 1621798861400 |
Represents an impression of a variation or a click on a variation.
Attribute | Description | Example |
---|---|---|
eventType String |
The type of activity. For variation click or impression, the value is VARIATION_ENGAGEMENT (as opposed to UIA or DPX). | "VARIATION_ENGAGEMENT" |
interactionId |
The interaction’s identifier (for a specific event, for example, a specific purchase). It can be used to join this data set to the attribution data set. |
1234567890 |
engagementType String |
The type of engagement with the variation. Possible values:
|
"IMPRESSION" |
campaignId Integer |
The ID of the campaign that this variation is part of. For engagement with Experience Email blocks, this represents the ID of the block, while parentCampaignId contains the ID of the Experience Email campaign. | 123456 |
campaignName String |
The name of the campaign that this variation is part of. For engagement with Experience Email blocks, this represents the name of the block, while parentCampaignName contains the name of the Experience Email campaign. | "Homepage Banner" |
experienceId Integer |
The ID of the experience that this variation is part of. | 123456 |
experienceName String |
The name of the experience that this variation is part of. | "Summer Promo" |
experimentId Integer |
The unique identifier of the test. | 123456 |
versionId Integer |
The unique identifier of the test version. An A/B test might have multiple versions. | 245467 |
variationIds Integer array |
The ID of the variation that the user was served with (if the type is IMPRESSION) or clicked on (if the type is CLICK). Usually, the list contains a single ID, but if the campaign type is "Dynamic Content Item List", it contains a list of variations, separated by commas. |
[1234567, 9876543] |
attributionVariationIds Integer array |
The ID of the variation in the attribution dataset. This identifier should be used to join the attribution dataset to the raw dataset, where raw.attributionVariationIds = attribution.variations. Usually, the list contains a single ID, but if the campaign type is "Dynamic Content Item List", it contains a list of variations, separated by commas. |
[1234567, 9876543] |
variationNames String array |
The name of the variation that the user was served (if the type is IMPRESSION) or clicked on (if the type is CLICK). Usually, the list contains a single ID, but if the campaign type is "Dynamic Content Item List", it contains a list of variations, separated by commas. |
["Blue Button", "Red Button"] |
sku String array |
If the event is a view of or click on a recommendation widget - i.e. engagementType is "CLICK" or "STRATEGY_IMPRESSION" -, it's the list of SKUs that were recommended or the SKU that was clicked. |
["1234", "9876"] |
strategyId Integer array |
If the event is a view of or click on a recommendation widget, it's the ID of the Strategy that was served. A single variation can include multiple widgets with multiple strategies. |
[126651,426356] |
strategyName |
If the event is a view of or click on a recommendation widget, it's the name of the Strategy that was served. A single variation can include multiple widgets with multiple strategies. |
["Most Popular","Affinity"] |
touchpointId Integer |
In touchpoints only: The ID of the touchpoint. | 245467 |
touchpointName |
In touchpoints only: The name of the touchpoint. | ["Hero Banner"] |
parentVariationId Integer |
In touchpoints only: The ID of the variation that serves this touchpoint in the multi-touch campaign. | [9876543] |
parentVariationName |
In touchpoints only: The name of the variation that serves this touchpoint in the multi-touch campaign. | "Blue Design" |
parentCampaignId |
If the engagement is with an Experience Email block, this field contains the Campaign ID of the Experience Email campaign | 123456 |
parentCampaignName |
If the engagement is with an Experience Email block, this field contains the campaign name of the Experience Email campaign | "Experience Email Campaign 1" |
dyId Long |
The internal identifier Dynamic Yield assigns to each visitor to the site or app, unique per device. | 123456789012345678 |
timestamp Long |
The time the activity occurred, in milliseconds, from the UNIX epoch. | 1621798861400 |
sessionId Integer |
The internal identifier Dynamic Yield assigns to a visitor's session | 1234567890 |
url String |
The URL from which the event was fired. | "https://www.example.com/?url_params=123" |
urlClean String |
The URL from which the event was fired, after removing any URL parameters. | "https://www.example.com/" |
audiences Integer array |
The list of identifiers of audiences the user is a member of at the time of firing the event. | [1234567, 9876543] |
reqTimestamp Long |
Internal request timestamp to the analytics pipeline, in milliseconds, from the UNIX epoch. | 1621798861400 |
procTimestamp Long |
Internal processing timestamp of the analytics pipeline, in milliseconds, from the UNIX epoch. | 1621798861400 |
resTimestamp Long |
Internal resolution timestamp of the analytics pipeline, in milliseconds, from the UNIX epoch. | 1621798861400 |
Attribution data
Learn more about conversion attribution on DynamicYield.com: Understanding conversion attribution scoping in A/B testing
This data set contains the relationship between a variation and the events attributed to it according to the experience settings.
Each data set record represents a distinct event-variation combination.
Note that there is usually a one-to-many relationship between events and variations. For example, if 1 purchase was attributed to 2 variations (control of test A, variation of test B), the data set contains 2 rows.
Field | Description |
---|---|
interactionId (int) |
The interaction’s identifier (for a specific event, for example, a specific purchase). It can be used to join this data set to the attribution data set. |
eventType (varchar) |
The type of event being attributed to a variation. Currently, only DPX events are exported. |
dyid (int) |
The identifier of the user who triggered the attributed event. |
experimentId (int) |
The Experiment ID to which the event was attributed. |
versionId (int) |
The Version ID of the Experiment to which the event was attributed. |
variations (array) |
The array of Variation IDs to which the event was attributed. This array usually contains a single value, but can contain multiple values if the experience is an Item List. This identifier corresponds to attributionVariationIds in the raw dataset. |
eventId (int) |
The ID of the DPX event attributed to the variation. For example, for a purchase, this would be the event ID of the Purchase event. |
eventValue (int) |
The value of the DPX event attributed to the variation. For example, for a purchase, this would represent revenue connected to the Purchase event. The value is in cents. |
Turning on the Daily Activity Stream
To turn on the Daily Activity Stream:
- Go to Settings › General Settings › Daily Activity Stream.
- Click Turn on daily export.
- Copy the S3 bucket path and credentials to a secure location. Note that it's only displayed once.
Lost your credentials?
Click the additional options iconand then click Generate New Credentials. Keep in mind that you can only generate new credentials once.
- That's it! The first data set will be exported after midnight, following which new data will be added daily.
Note: If a section has no traffic on a specific day, there will be no export of data or folder for that day in the AWS S3 bucket.
After activating the Daily Activity Stream (turning the toggle on), the page also displays the status of the export.
You can always disable the export in the options menu. If you disable the export and then enable it again, your S3 bucket remains the same, but you'll be given a new set of credentials.
Accessing your S3 bucket
After you activate the Daily Activity Stream, you'll be given access to an S3 bucket with the following path:
- s3://dy-raw-data-export/sectionId=1234567, for the US data center
- s3://dy-raw-data-export-eu/sectionId=1234567, for the EU data center.
Under this path, you'll find subfolders for each date, where parquet files are stored.
For example, this path contains all the data collected on 2022-01-01, according to the time zone configured in your General Settings:
s3://dy-raw-data-export-eu/sectionId=1234567/date=2022-01-01
Under this path, you'll find subfolders for each type of report
- This path includes the currently exported raw data:
s3://dy-raw-data-export/sectionId=1234567/date=2023-01-01/reportType=raw
- This path includes the new attribution data:
s3://dy-raw-data-export/sectionId=1234567/date=2023-01-01/reportType=attribution
Download the files locally via Terminal
- Create a folder on your computer to contain the downloaded files.
- Open Terminal, and navigate to the folder you created.
- Enter the command aws configure.
- When prompted, enter your access key and secret key, as generated in your feed sync.
- When prompted, enter your data center region: eu-central-1 or us-east-1, as appropriate.
- Upon the next prompt (default output), just press Enter.
-
When prompted, enter aws s3 sync followed by the relevant S3 path (see Accessing your S3 bucket) and the local folder name created in step 1.
For example:
aws s3 sync s3://dy-raw-data-export/sectionId=123456/date=2024-01-01/reportType=raw/raw_data - The files download. You can see them in Terminal, and upon completion, you'll see them in the folder you created in step 1.
Import data into your analytics platform
To integrate the Daily Activity Stream data into your systems automatically, you might want to create a job to extract the Parquet files from S3 and load them into your database of choice. Here are some links to relevant articles for popular solutions:
- Google BigQuery
- Microsoft SQL server
- Amazon Athena
- Amazon Redshift
- Snowflake
- Oracle 12c
- Databricks
- IBM Infosphere
- Informatica
SQL examples to process Daily Activity Stream data
After you've successfully imported the Daily Activity Stream data from the S3 bucket into your analytics platform, you can start processing it.
Note: the following examples might require adaptation to the syntax and supported functions of your analytics platform.
Parse the Raw data set Parquet file
-- parse parquet files in raw folder
create or replace table parsed_raw_data_daily_activity_stream
as
select
try_to_date(replace(SPLIT_PART(file_name, '/',2), 'date=','')) as date
,$1:audiences::variant as audiences
,$1:browser::string as browser
,$1:contextData::variant as context_data
,$1:contextType::string as context_type
,$1:device::string as device
,$1:dyid::int as dy_id
,$1:eventId::int as event_id
,$1:eventName::string as event_name
,$1:eventValue::string as event_value
,replace($1:eventProperties::string,'\\','') as event_properties
,$1:eventType::string as event_type
,$1:operatingSystem::string as operating_system
,$1:procTimestamp::int as proc_timestamp
,$1:productIds::variant as product_ids
,$1:reqTimestamp::int as req_timestamp
,$1:screenResolution::string as screen_resolution
,$1:sessionId::int as session_id
,$1:timestamp::int as timestamp
,$1:url::string as url
,$1:urlClean::string as url_clean
,$1:userAgent::string as user_agent
,file_date
,$1:campaignId::int as campaign_id
,$1:campaignName::string as campaign_name
,$1:engagementType::string as engagement_type
,$1:experienceId::int as experience_id
,$1:experienceName::string as experience_name
,$1:experimentId::int as experiment_id
,$1:versionId::int as experiment_version_id
,$1:variationIds::variant as variation_ids
,$1:attributionVariationIds::variant as attribution_variation_ids
,$1:variationNames::variant as variation_names
,$1:skus::variant as skus
,$1:strategyId::int as strategy_id
,$1:strategyName::string as strategy_name
,$1:interactionId::int as interaction_id
,file_name
,$1 as json_object
from daily_activity_stream
where contains(file_name, '/reportType=raw/')
;
Explore the imported raw data set
-- preview first 100 records of raw data, ordered by timestemp
select *
from parsed_raw_data_daily_activity_stream
order by timestamp
limit 100
;
-- see all event_types
select distinct
event_type
from parsed_raw_data_daily_activity_stream
limit 100
;
-- count events (type DPX)
select
event_id
,event_name
,count(*)
from parsed_raw_data_daily_activity_stream
where event_type = 'DPX'
group by all
;
-- see all purchases
select *
from parsed_raw_data_daily_activity_stream
where event_id = {value_event_id}
limit 1000
;
-- aggregate raw by strategy
select
strategy_id
,strategy_name
,count(distinct dy_id) as users
,sum(iff(engagement_type = 'STRATEGY_IMPRESSION', 1, 0)) as impressions
,sum(iff(engagement_type = 'CLICK', 1, 0)) as clicks
from parsed_raw_data_daily_activity_stream
where event_type = 'VARIATION_ENGAGEMENT'
and strategy_id is not null
group by strategy_id, strategy_name
order by impressions desc
;
-- aggregate raw by strategy and sku to get the number of impressions and clicks on each sku
select
r.strategy_id
,r.strategy_name
,trim(p.value) as sku
,sum(iff(engagement_type = 'STRATEGY_IMPRESSION', 1, 0)) as impressions
,sum(iff(engagement_type = 'CLICK', 1, 0)) as clicks
from parsed_raw_data_daily_activity_stream r
,lateral flatten (input => r.skus) p
where event_type = 'VARIATION_ENGAGEMENT'
and strategy_id is not null
group by strategy_id, strategy_name, sku
;
Parse the Attribution data set Parquet file
-- parse parquet files in attribution folder
create or replace table parsed_attribution_daily_activity_streamd
as
select
try_to_date(replace(SPLIT_PART(file_name, '/',2), 'date=','')) as date
,$1:dyid::int as dy_id
,$1:eventId::int as event_id
,$1:eventType::varchar as event_type
,$1:eventValue::int as event_value
,$1:experimentId::int as experiment_id
,$1:interactionId::int as interaction_id
,$1:variations[0]::int as variation_id
,$1:versionId::int as version_id
,file_name
from daily_activity_stream
where contains(file_name, '/reportType=attribution/')
;
Explore the imported attribution data set
-- preview first 100 records of attribution
select * from parsed_attribution_daily_activity_streamd
limit 100
;
-- preview the number of events and sum of events value per variation and event id
select
variation_id
,event_id
,count(*) as event_count
,sum(event_value) as event_value
from parsed_attribution_daily_activity_streamd
where version_id = {value_version_id}
and date between {value_from_date} and {value_to_date}
group by variation_id, event_id
;
Join the Raw and Attribution data sets together
-- aggregate raw by variation
select
date
,campaign_id
,campaign_name
,experience_name
,experiment_version_id
,attribution_variation_ids[0]::int as attribution_variation_id
,trim(variation_names[0]) as variation_name
,strategy_id
,strategy_name
,count(distinct dy_id) as users
,sum(iff(engagement_type = 'IMPRESSION', 1, 0)) as impressions
,sum(iff(engagement_type = 'CLICK', 1, 0)) as clicks
from parsed_raw_data_daily_activity_stream
where event_type = 'VARIATION_ENGAGEMENT'
and campaign_id is not null
group by date, campaign_id, campaign_name, experience_name, experiment_version_id, attribution_variation_id, variation_name, strategy_id, strategy_name
;
-- aggregate attribution by variation
select
date
,version_id
,variation_id
,event_id
,count(distinct dy_id) as unique_converters
,count(*) as event_count
,sum(event_value) as event_value
from parsed_attribution_daily_activity_streamd
group by date, version_id, variation_id, event_id
;
-- put it together
with engagement as
(
select
date
,campaign_id
,campaign_name
,experience_name
,experiment_version_id
,attribution_variation_ids[0]::int as attribution_variation_id
,trim(variation_names[0]) as attribution_variation_id
,count(distinct dy_id) as users
,sum(iff(engagement_type = 'IMPRESSION', 1, 0)) as impressions
,sum(iff(engagement_type = 'CLICK', 1, 0)) as clicks
from parsed_raw_data_daily_activity_stream
where event_type = 'VARIATION_ENGAGEMENT'
and campaign_id is not null
group by date, campaign_id, campaign_name, experience_name, experiment_version_id, attribution_variation_id, attribution_variation_id
)
,attribution as
(
select
date
,version_id
,variation_id
,event_id
,count(distinct dy_id) as unique_converters
,count(*) as event_count
,sum(event_value) as event_value
from parsed_attribution_daily_activity_streamd
group by date, version_id, variation_id, event_id
)
,event_metadata as
(
select
event_id
,event_name
,count(*) as event_count
,sum(event_value) as event_value
from parsed_raw_data_daily_activity_stream
where event_type = 'DPX'
group by event_id, event_name
)
select
e.date
,e.campaign_id
,e.campaign_name
,e.experience_name
,e.experiment_version_id
,e.attribution_variation_id
,e.variation_name
,e.users
,e.impressions
,e.clicks
,ev.event_name
,a.event_id
,a.unique_converters
,a.event_count
,a.event_value
from engagement e
left join attribution a
on a.date = e.date
and a.version_id = e.experiment_version_id
and a.variation_id = e.attribution_variation_id
left join event_metadata ev
on ev.event_id = a.event_id
order by e.campaign_id, e.experiment_version_id, a.event_id, e.attribution_variation_id
;