AEP: Play with schema and dataset

In this post I will walk through a few of the basics of datasets in AEP. To start with, here is the schema I created.




This is based on a custom class of the Record type (this class comes with one attribute, Identifier, shown as locked in the schema).

I set the identifier as the primary identity.

I then created a dataset with the same name as the class.

Data Loading

1. Delimited File

I created a simple delimited file as below and a workflow to load it into the dataset.
Sample Data:
id,dataset_name,last_snapshot_id,process_timestamp,process_status,failure_reason
journey_step_events_27878,journey_step_events,27878,2024-09-12T19:19:50.036Z,SUCCESSFUL,Not applicable
aa_stitched_events_23451,aa_stitched_events,23451,2024-09-17T19:19:50.036Z,FAILED,There was no snapshot available
journey_step_events_27891,journey_step_events,27891,2024-09-16T19:19:50.036Z,In Process,Not applicable

2. Using JSON

Download the sample JSON format from the schema page and create a JSON file using that format. For multiple records, it needs to be an array, as below.
[{
    "_mytechnologys": {
        "dataset_name": "aa_stitched_events",
        "failure_reason": "Not applicable JSON",
        "last_snapshot_id": 23049,
        "process_status": "WIP",
        "process_timestamp": "2018-11-12T20:20:39+00:00"
    },
    "_id": "aa_stitched_events_23047"
}]
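If the same records already exist in the delimited file from step 1, the JSON array can be generated rather than typed by hand. The following is a minimal sketch, not a definitive implementation: the helper name `csv_to_records` is hypothetical, and it assumes the `id` column of the file maps to `_id` while the other columns sit under the `_mytechnologys` tenant object, as in the sample above.

```python
import csv
import io
import json

# Tenant namespace copied from the JSON sample; yours will differ.
TENANT = "_mytechnologys"

# The delimited sample from step 1.
SAMPLE_CSV = """\
id,dataset_name,last_snapshot_id,process_timestamp,process_status,failure_reason
journey_step_events_27878,journey_step_events,27878,2024-09-12T19:19:50.036Z,SUCCESSFUL,Not applicable
aa_stitched_events_23451,aa_stitched_events,23451,2024-09-17T19:19:50.036Z,FAILED,There was no snapshot available
journey_step_events_27891,journey_step_events,27891,2024-09-16T19:19:50.036Z,In Process,Not applicable
"""


def csv_to_records(csv_text):
    """Convert delimited rows into the nested shape used by the JSON sample.

    Assumption: the `id` column maps to `_id`; all other columns go under
    the tenant object.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({
            TENANT: {
                "dataset_name": row["dataset_name"],
                "failure_reason": row["failure_reason"],
                # The JSON sample carries the snapshot id as a number.
                "last_snapshot_id": int(row["last_snapshot_id"]),
                "process_status": row["process_status"],
                "process_timestamp": row["process_timestamp"],
            },
            "_id": row["id"],
        })
    return records


if __name__ == "__main__":
    # Multiple records are written as a single JSON array.
    print(json.dumps(csv_to_records(SAMPLE_CSV), indent=4))
```

The printed array can be saved to a file and loaded the same way as the hand-written JSON above.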

3. Using SQL

You can insert records using SQL as well. Here is the sample SQL I used for this schema:
INSERT INTO tg_checkpoint_log
SELECT
  'journey_step_events_27880' AS _id,
  struct(
    'journey_step_events' AS dataset_name,
    27880 AS last_snapshot_id,
    cast(CURRENT_TIMESTAMP AS TIMESTAMP) AS process_timestamp,
    'WIP' AS process_status) AS _mytechnologys;

Note: steps 1 and 2 can also be done using the API. For file loading, a Parquet file is recommended. I didn't try loading a CSV file through the API, though.

Here the data is always inserted. Even with the _id field set as the primary identity, inserting a record with the same values does not update the existing record. If you run a SELECT, you will find duplicate records in the dataset.
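Since repeated inserts pile up, anything that reads the dataset has to decide which copy of an `_id` to trust. The sketch below shows one way to handle that on the client side, under several assumptions: the rows have already been fetched as flat dictionaries, the newest `process_timestamp` wins, and the function names and the sample rows (including their timestamps) are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten for older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def duplicate_ids(rows):
    """Return {_id: count} for every _id that appears more than once."""
    counts = Counter(row["_id"] for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def latest_per_id(rows):
    """Keep one row per _id: the one with the newest process_timestamp."""
    latest = {}
    for row in rows:
        current = latest.get(row["_id"])
        if current is None or (
            parse_ts(row["process_timestamp"]) > parse_ts(current["process_timestamp"])
        ):
            latest[row["_id"]] = row
    return list(latest.values())


# Illustrative rows only: the same _id inserted twice, plus one other record.
ROWS = [
    {"_id": "journey_step_events_27880", "process_status": "WIP",
     "process_timestamp": "2024-09-18T10:00:00.000Z"},
    {"_id": "journey_step_events_27880", "process_status": "SUCCESSFUL",
     "process_timestamp": "2024-09-18T10:05:00.000Z"},
    {"_id": "aa_stitched_events_23047", "process_status": "WIP",
     "process_timestamp": "2018-11-12T20:20:39+00:00"},
]

if __name__ == "__main__":
    print(duplicate_ids(ROWS))
    for row in latest_per_id(ROWS):
        print(row["_id"], row["process_status"])
```

This only hides the duplicates from the reader; the extra records are still stored in the dataset.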
