AEP: Play with schema and dataset
In this post I will cover a few of the basics of datasets in AEP. To start with, here is the schema I created.
This is based on a custom class with Record behavior (this class comes with one attribute, Identifier, shown as locked in the schema).
I set the Identifier field as the primary identity.
Then I created a dataset with the same name as the class.
Data Loading
1. Delimited File
I created a simple delimited file as below and set up a workflow to load it into the dataset.
Sample Data:
id,dataset_name,last_snapshot_id,process_timestamp,process_status,failure_reason
journey_step_events_27878,journey_step_events,27878,2024-09-12T19:19:50.036Z,SUCCESSFUL,Not applicable
aa_stitched_events_23451,aa_stitched_events,23451,2024-09-17T19:19:50.036Z,FAILED,There was no snapshot available
journey_step_events_27891,journey_step_events,27891,2024-09-16T19:19:50.036Z,In Process,Not applicable
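Before building the load workflow, a delimited file like the one above can be sanity-checked locally. A minimal sketch using Python's standard csv module (the embedded sample and the checks are illustrative, not part of AEP):

```python
import csv
import io

# Two data rows from the sample file above, embedded for the sketch.
SAMPLE = """\
id,dataset_name,last_snapshot_id,process_timestamp,process_status,failure_reason
journey_step_events_27878,journey_step_events,27878,2024-09-12T19:19:50.036Z,SUCCESSFUL,Not applicable
aa_stitched_events_23451,aa_stitched_events,23451,2024-09-17T19:19:50.036Z,FAILED,There was no snapshot available
"""

def check_rows(text):
    """Parse the delimited text and confirm each row has an id and a numeric snapshot id."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        assert row["id"], "id must not be empty"
        assert row["last_snapshot_id"].isdigit(), "last_snapshot_id must be numeric"
    return rows

rows = check_rows(SAMPLE)
```

This catches malformed rows before the workflow rejects the whole batch.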
2. Using JSON
Download the sample JSON from the schema page and create a JSON file using that format. For multiple records it needs to be an array, as below.
[{
  "_mytechnologys": {
    "dataset_name": "aa_stitched_events",
    "failure_reason": "Not applicable JSON",
    "last_snapshot_id": 23049,
    "process_status": "WIP",
    "process_timestamp": "2018-11-12T20:20:39+00:00"
  },
  "_id": "aa_stitched_events_23047"
}]
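The same payload can be generated programmatically instead of hand-editing JSON. A sketch with Python's json module, nesting the fields under the tenant namespace `_mytechnologys` as in the format above (the helper function name is my own):

```python
import json

TENANT = "_mytechnologys"  # tenant namespace from the downloaded schema format

def make_record(record_id, dataset_name, snapshot_id, status,
                timestamp, failure_reason="Not applicable"):
    """Build one record in the shape of the schema's sample JSON."""
    return {
        TENANT: {
            "dataset_name": dataset_name,
            "failure_reason": failure_reason,
            "last_snapshot_id": snapshot_id,
            "process_status": status,
            "process_timestamp": timestamp,
        },
        "_id": record_id,
    }

# Multiple records must go into a JSON array.
records = [make_record("aa_stitched_events_23047", "aa_stitched_events",
                       23049, "WIP", "2018-11-12T20:20:39+00:00")]
payload = json.dumps(records, indent=2)
```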
3. Using SQL
You can insert records using SQL as well. Here is the sample SQL I used for this schema:
INSERT INTO tg_checkpoint_log
SELECT
  'journey_step_events_27880' AS _id,
  struct(
    'journey_step_events' AS dataset_name,
    27880 AS last_snapshot_id,
    cast(CURRENT_TIMESTAMP AS TIMESTAMP) AS process_timestamp,
    'WIP' AS process_status
  ) AS _mytechnologys;
Note: steps 1 and 2 can also be done using the API. For file loading, Parquet is the recommended format. I didn't try loading a CSV file via the API, though.
Note that data is always inserted, never updated. Even after making the _id field the primary identity, inserting a record with the same _id does not update the existing record. If you query the dataset, you will find duplicate records.
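Because inserts never update in place, reads have to pick a single record per _id themselves. A sketch of keep-the-latest-row logic in Python (in AEP you would typically do the equivalent in Query Service, e.g. with a window function; the helper and sample rows below are illustrative):

```python
def latest_per_id(records):
    """Keep only the most recent record for each _id, by process_timestamp.

    Assumes timestamps share one ISO-8601 format, so string comparison orders them.
    """
    latest = {}
    for rec in records:
        rid = rec["_id"]
        ts = rec["_mytechnologys"]["process_timestamp"]
        if rid not in latest or ts > latest[rid]["_mytechnologys"]["process_timestamp"]:
            latest[rid] = rec
    return list(latest.values())

# Two inserts with the same _id: only the newer one survives the read.
rows = [
    {"_id": "a_1", "_mytechnologys": {"process_timestamp": "2024-09-12T00:00:00Z",
                                      "process_status": "WIP"}},
    {"_id": "a_1", "_mytechnologys": {"process_timestamp": "2024-09-13T00:00:00Z",
                                      "process_status": "SUCCESSFUL"}},
]
deduped = latest_per_id(rows)
```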