# 2.4 Data Ingestion from Offline Sources

In this exercise, you'll onboard external data, such as CRM data, into Adobe Experience Platform.

## Learning Objectives

* Learn how to generate test data
* Learn how to ingest CSV
* Learn how to use the web UI for data ingestion through Workflows
* Understand the data governance features of Experience Platform

## Resources

* Mockaroo UI: <https://www.mockaroo.com/>
* Experience Platform UI: <https://experience.adobe.com/platform/>

## Tasks

* Create a CSV file with demo data. Ingest the CSV file into Adobe Experience Platform using the available workflows.
* Understand the data governance options in Adobe Experience Platform.

### 2.4.1 Create your CRM Dataset through a data generator tool

For this exercise, you need 1000 sample lines of CRM data.

Open the Mockaroo Template by going to <https://www.mockaroo.com/12674210>.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-ab7d69ab398fd7b3ce03c9d05323fd51f4d5db96%2Fmockaroo.png?alt=media)

On the template, you'll notice the following fields:

* id
* first\_name
* last\_name
* email
* gender
* birthDate
* home\_latitude
* home\_longitude
* country\_code
* city
* country

All these fields have been defined to produce data that is compatible with Platform.

To generate your CSV file, click the **Download Data** button. This gives you a CSV file with 1000 lines of demo data.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-b7f92b242fd05f68f466447ea0b2ce55f15d8d7e%2Fdd.png?alt=media)
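If you can't reach Mockaroo, a comparable file could be produced locally. The following is a minimal sketch using only the Python standard library. The column names come from the field list above; the function names `generate_crm_rows` and `rows_to_csv`, the value pools, and the date range are assumptions made for illustration and do not reproduce the Mockaroo template's own generators.

```python
import csv
import io
import random
from datetime import date, timedelta

# Column names copied from the template's field list above.
FIELDS = [
    "id", "first_name", "last_name", "email", "gender", "birthDate",
    "home_latitude", "home_longitude", "country_code", "city", "country",
]

# Hypothetical value pools; the real template uses Mockaroo's own generators.
FIRST_NAMES = ["Ana", "Ben", "Chloe", "Dev", "Eva"]
LAST_NAMES = ["Jansen", "Lopez", "Meyer", "Novak", "Smith"]
PLACES = [
    ("BE", "Brussels", "Belgium"),
    ("FR", "Paris", "France"),
    ("DE", "Berlin", "Germany"),
]


def generate_crm_rows(count, seed=0):
    """Return `count` dicts keyed by FIELDS, filled with made-up CRM values."""
    rng = random.Random(seed)  # seeded so repeated runs produce the same rows
    rows = []
    for i in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        code, city, country = rng.choice(PLACES)
        birth = date(1950, 1, 1) + timedelta(days=rng.randrange(0, 50 * 365))
        rows.append({
            "id": str(i),
            "first_name": first,
            "last_name": last,
            "email": f"{first}.{last}.{i}@example.com".lower(),
            "gender": rng.choice(["male", "female"]),  # assumed value set
            "birthDate": birth.isoformat(),  # YYYY-MM-DD
            "home_latitude": f"{rng.uniform(-90, 90):.6f}",
            "home_longitude": f"{rng.uniform(-180, 180):.6f}",
            "country_code": code,
            "city": city,
            "country": country,
        })
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(generate_crm_rows(1000))
```

Writing `csv_text` to a file gives you a header line plus 1000 data lines, which is the same shape as the Mockaroo download.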

Open your CSV-file in Microsoft Excel to visualize its contents.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-b0a29bd5df1bfc884b33e42536855265a62ad86e%2Fexcel.png?alt=media)

With your CSV file ready, you can proceed with mapping it against XDM.
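Before uploading, you may want to check the file programmatically instead of scanning it by eye. The sketch below is one possible sanity check, not part of the Platform workflow. The function name `check_crm_csv` is hypothetical, and the check assumes that `birthDate` is exported as `YYYY-MM-DD`; adjust it to whatever format your download actually contains.

```python
import csv
import io
from datetime import date

# Column names from the template's field list in this exercise.
EXPECTED_HEADERS = {
    "id", "first_name", "last_name", "email", "gender", "birthDate",
    "home_latitude", "home_longitude", "country_code", "city", "country",
}


def check_crm_csv(text, expected_rows=1000):
    """Return a list of problems found in the CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = EXPECTED_HEADERS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]

    rows = list(reader)
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    # Line 1 is the header, so data starts at line 2
    # (assuming no field contains a line break).
    for line_no, row in enumerate(rows, start=2):
        try:
            date.fromisoformat(row["birthDate"] or "")
        except ValueError:
            problems.append(f"line {line_no}: birthDate is not YYYY-MM-DD")
        if "@" not in (row["email"] or ""):
            problems.append(f"line {line_no}: email has no @")
    return problems


# Tiny inline sample with made-up values, so the sketch runs without a file.
SAMPLE = (
    "id,first_name,last_name,email,gender,birthDate,"
    "home_latitude,home_longitude,country_code,city,country\n"
    "1,Ana,Jansen,ana.jansen@example.com,female,1990-05-17,"
    "50.8503,4.3517,BE,Brussels,Belgium\n"
)
print(check_crm_csv(SAMPLE, expected_rows=1))
```

To run the check on your download, read the file with `open(path, newline="", encoding="utf-8").read()` and pass the text to `check_crm_csv`.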

### 2.4.2 Verify the CRM Onboarding Dataset in Adobe Experience Platform

Open [Adobe Experience Platform](https://experience.adobe.com/platform) and go to **Datasets**.

Before you continue, you need to select a **sandbox**. The sandbox to select is named `--module2sandbox--`. You can do this by clicking the text **Production Prod** in the blue bar at the top of your screen. After you select the appropriate sandbox, the screen changes and you're in your dedicated sandbox.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-f8fe0607dc3e272ebd2416a98220bb1d82c1a166%2Fsb1.png?alt=media)

In Adobe Experience Platform, click on **Datasets** in the menu on the left side of your screen.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-582e17b4e0ea5ed30e75a09b92b9983fcac1a05e%2Fmenudatasetssb.png?alt=media)

You're going to use a shared dataset in this enablement. The shared dataset has been created already and is called **Demo System - Profile Dataset for CRM (Global v1.1)**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-6c6472af04115e7fe85270f7209f128de5169a25%2Femeacrm.png?alt=media)

Open the dataset **Demo System - Profile Dataset for CRM (Global v1.1)**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-8a1425cc551e56294da4e09588bd948865127dee%2Femeacrmoverview.png?alt=media)

On the overview screen, you can see 3 main pieces of information.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-54b42d4502c0500cfa659c73a5b242f98aa4de97%2Fdashboard.png?alt=media)

First, the Dataset Activity dashboard shows the total number of CRM records in the dataset, along with the ingested batches and their status.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-0b88ffe9a68678f34a08cff994798d6ba74807cf%2Fbatchids.png?alt=media)

Second, by scrolling down on the page, you can check when batches of data were ingested, how many records were onboarded, and whether each batch was onboarded successfully. The **Batch ID** identifies a specific batch job and is important for troubleshooting why a batch was not onboarded successfully.
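For troubleshooting outside the UI, a Batch ID can also be looked up through the Platform APIs. The sketch below only builds the request and does not send it. The endpoint path and header names reflect our understanding of the Catalog Service API and should be checked against the current API reference; the function name `build_batch_lookup` and all credential values are placeholders.

```python
from urllib.request import Request

# Assumed base URL of the Catalog Service batches endpoint; verify before use.
CATALOG_BATCHES_URL = "https://platform.adobe.io/data/foundation/catalog/batches"


def build_batch_lookup(batch_id, access_token, api_key, org_id, sandbox):
    """Build (but do not send) a GET request for a single batch."""
    return Request(
        f"{CATALOG_BATCHES_URL}/{batch_id}",
        headers={
            "Authorization": f"Bearer {access_token}",
            "x-api-key": api_key,
            "x-gw-ims-org-id": org_id,
            "x-sandbox-name": sandbox,  # the sandbox you selected in the UI
        },
        method="GET",
    )


# Placeholder values only; replace them with your own credentials and IDs.
request = build_batch_lookup(
    "YOUR_BATCH_ID", "YOUR_TOKEN", "YOUR_API_KEY", "YOUR_ORG_ID", "YOUR_SANDBOX"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it, which requires valid credentials for your organization.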

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-6717819cf89114ef235ad004e579a1917c903a7f%2Fdatasetsettings.png?alt=media)

Lastly, the Dataset Info tab shows important information like the Dataset ID (again, important from a troubleshooting perspective), the Dataset's Name and whether the dataset was enabled for Profile.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-edbb8f2054a13a6783a16be10d24c4c190555050%2Fds_ups_link.png?alt=media)

The most important setting here is the link between the dataset and the Schema. The Schema defines what data can be ingested and what that data should look like.

In this case, we're using the **Demo System - Profile Schema for CRM (Global v1.1)**, which is based on the **Profile** class and implements extensions, also called field groups.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-1b0f65c637856dac7ae5e76d40a58b51d922d59b%2Fds_schemalink.png?alt=media)

By clicking on the name of the schema, you're taken to the Schema overview, where you can see all the fields that have been activated for this schema.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-0f8bbd27c407f106ee98659bd62270898ef22844%2Fschemads.png?alt=media)

Every schema needs to have a custom, primary descriptor defined. In the case of our CRM dataset, the schema defines the field **crmId** as the primary identifier. If you want to create a schema and link it to the Real-time Customer Profile, you need to define a custom Field Group that refers to your primary descriptor.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-4c126f31255db27366fff8ef82b84032e20c1729%2Fschema_descriptor.png?alt=media)

In the above screenshot, you can see that our descriptor is located in `--aepTenantId--.identification.core.crmId`, which is set as the Primary Identifier, linked to the namespace of **Demo System - CRMID**.

Every schema, and therefore every dataset, that should be used in the Real-time Customer Profile should have one Primary identifier. This Primary identifier is the identifier used by the brand for a customer in that dataset. For a CRM dataset, it might be the email address or the CRM ID; for a Call Center dataset, it might be the customer's mobile number.
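To make the idea of a primary descriptor more concrete, the sketch below builds a descriptor as a plain Python dictionary and checks the "one Primary identifier" rule. The key names follow the identity descriptor format of the Schema Registry API as we understand it, so treat them as assumptions to verify. The function names, the schema ID, the tenant value `_mytenant` (a stand-in for `--aepTenantId--`), and the namespace code are hypothetical placeholders.

```python
import json


def build_identity_descriptor(schema_id, tenant_field, namespace_code):
    """Describe crmId as the primary identity of a schema (assumed format)."""
    return {
        "@type": "xdm:descriptorIdentity",
        "xdm:sourceSchema": schema_id,
        "xdm:sourceVersion": 1,
        # Same path as shown in the UI, written with slashes instead of dots.
        "xdm:sourceProperty": f"/{tenant_field}/identification/core/crmId",
        "xdm:namespace": namespace_code,
        "xdm:property": "xdm:code",
        "xdm:isPrimary": True,
    }


def has_single_primary(descriptors):
    """True when exactly one descriptor is flagged as the primary identity."""
    return sum(1 for d in descriptors if d.get("xdm:isPrimary")) == 1


# Placeholder values for illustration only.
descriptor = build_identity_descriptor(
    "https://ns.adobe.com/YOUR_TENANT/schemas/YOUR_SCHEMA_ID",
    "_mytenant",
    "YOUR_NAMESPACE_CODE",
)
print(json.dumps(descriptor, indent=2))
print(has_single_primary([descriptor]))
```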

It is best practice to create a separate, specific schema for every dataset and to set the descriptor for every dataset specifically to match how the current solutions used by the brand operate.

### 2.4.3 Using a workflow to map a CSV file to an XDM Schema

The goal of this exercise is to onboard CRM data in Platform. All data that is ingested in Platform must be mapped against a specific XDM Schema. You currently have a CSV file with 1000 lines on one side, and a dataset that is linked to a schema on the other side. To load that CSV file into that dataset, its columns need to be mapped to the schema's fields. To facilitate this mapping, Adobe Experience Platform provides **Workflows**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-f20a22d15d4e9c4eaf683e013e485c378dd3efff%2Fworkflows.png?alt=media)

The workflow that we'll use here is named **Map CSV to XDM Schema** and is located in the Data Ingestion menu.

Click the **Map CSV to XDM Schema** button. Click **Launch** to start the process.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-d174530ed6c1f95a9d50b1f413c09a8d9b0660de%2Fmapcsvxdm.png?alt=media)

On the next screen, you need to select a dataset to ingest your file in. You have the choice between selecting an already existing dataset or creating a new one. For this exercise, we'll reuse an existing one: please select **Demo System - Profile Dataset for CRM (Global v1.1)** as indicated below and leave the other settings set to default.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-2f643fa202dc08d4548bc7cfb36bcc7006eae52c%2Fdatasetselection.png?alt=media)

Click **Next** to go to the next step.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-67fb329a137fc51fa857fe53ca3f20e871af8923%2Fnext.png?alt=media)

Drag and drop your CSV file, or click **Browse**, navigate to the file on your computer, and select it.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-8425e429bc789ced1c987bdb9df86d58388ec604%2Fdragdrop.png?alt=media)

After you select your CSV file, it uploads immediately and you'll see a preview of your file within seconds.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-2278ba3625c07ddc3ff4ffd4804928575d36504b%2Fpreviewcsv.png?alt=media)

Click **Next** to go to the next step. It can take a few seconds for the file to be processed completely.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-67fb329a137fc51fa857fe53ca3f20e871af8923%2Fnext.png?alt=media)

You now need to map each CSV column header to an XDM property in your **Demo System - Profile Dataset for CRM**.

Adobe Experience Platform has already made some proposals for you, by trying to link the Source Attributes with the Target Schema Fields.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-2fabe3c8c6c138e67ee5a95021f959ac99414592%2Fmapschema.png?alt=media)

Not all of the proposed Schema Mappings are correct. Review each mapping against the list below, correct it where needed, and **Accept target fields** one by one.

#### birthDate

The Source Schema field **birthDate** should be linked to the target field **person.birthDate**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-fa1821dce2088476f8c8fad0106d3e3b8de563bf%2Ftfbd.png?alt=media)

#### city

The Source Schema field **city** should be linked to the target field **homeAddress.city**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-97429a3a0992c583b44e87a1e6fa3f0a50715e8a%2Ftfcity.png?alt=media)

#### country

The Source Schema field **country** should be linked to the target field **homeAddress.country**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-c63c5f2e05d66177592c809b9f1c8731b69319cb%2Ftfcountry.png?alt=media)

#### country\_code

The Source Schema field **country\_code** should be linked to the target field **homeAddress.countryCode**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-a8402c6480f12f8144addcd2f343a7c05fd9f436%2Ftfcountrycode.png?alt=media)

#### email

The Source Schema field **email** should be linked to the target field **personalEmail.address**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-25d8a0fd2e58c4b53faf81da497c7f45be76abd8%2Ftfemail.png?alt=media)

#### crmid

The Source Schema field **crmid** should be linked to the target field **`--aepTenantId--`.identification.core.crmId**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-1d68cf101f7a369e5c238d9b4618fcb9e4a8d86a%2Ftfemail1.png?alt=media)

#### first\_name

The Source Schema field **first\_name** should be linked to the target field **person.name.firstName**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-418178f739eb0c57f1e8dc994baad01319270f99%2Ftffname.png?alt=media)

#### gender

The Source Schema field **gender** should be linked to the target field **person.gender**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-029907ff9198a1503b2e8529428c272e84c03c84%2Ftfgender.png?alt=media)

#### home\_latitude

The Source Schema field **home\_latitude** should be linked to the target field **homeAddress.\_schema.latitude**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-7fcb0c009357e9760c27ab24e1b6f93d66964196%2Ftflat.png?alt=media)

#### home\_longitude

The Source Schema field **home\_longitude** should be linked to the target field **homeAddress.\_schema.longitude**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-ab2a7e691d1f8adb7efe38e0fe4190f7fda0d64d%2Ftflon.png?alt=media)

#### id

The Source Schema field **id** should be linked to the target field **\_id**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-9511fdf9ae9e165973c45f058950dc1b75396028%2Ftfid1.png?alt=media)

#### last\_name

The Source Schema field **last\_name** should be linked to the target field **person.name.lastName**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-42adeb7a9342b20192eae23d00f7ca85ca41c0d1%2Ftflname.png?alt=media)

You should now have this:

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-699e7613014c569150043a68c462db729fb6cd63%2Foverview.png?alt=media)
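The mapping you just accepted can be summarized as a lookup table from source column to dotted target path. The sketch below applies that table to one row to show the nested record the mapping describes. It is an illustration only, not how Platform performs the mapping internally. The target paths are taken from the steps above; `TENANT_FIELD`, `map_row`, and the sample row are hypothetical.

```python
# Hypothetical stand-in for the value of --aepTenantId-- in your environment.
TENANT_FIELD = "_mytenant"

# Source column -> target schema field, as accepted in the workflow above.
FIELD_MAPPING = {
    "birthDate": "person.birthDate",
    "city": "homeAddress.city",
    "country": "homeAddress.country",
    "country_code": "homeAddress.countryCode",
    "email": "personalEmail.address",
    "crmid": f"{TENANT_FIELD}.identification.core.crmId",
    "first_name": "person.name.firstName",
    "gender": "person.gender",
    "home_latitude": "homeAddress._schema.latitude",
    "home_longitude": "homeAddress._schema.longitude",
    "id": "_id",
    "last_name": "person.name.lastName",
}


def map_row(row, mapping=FIELD_MAPPING):
    """Place each source column's value at its dotted target path."""
    record = {}
    for source, target in mapping.items():
        if source not in row:
            continue  # column absent from this CSV
        *parents, leaf = target.split(".")
        node = record
        for key in parents:
            node = node.setdefault(key, {})  # create nested objects as needed
        node[leaf] = row[source]  # values stay strings; no type conversion
    return record


# Made-up sample row for illustration.
sample_row = {
    "id": "1", "crmid": "1", "first_name": "Ana", "last_name": "Jansen",
    "email": "ana.jansen@example.com", "gender": "female",
    "birthDate": "1990-05-17", "home_latitude": "50.8503",
    "home_longitude": "4.3517", "country_code": "BE",
    "city": "Brussels", "country": "Belgium",
}
record = map_row(sample_row)
print(record["person"]["name"])
```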

Click the **Finish** button to finish the workflow.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-7816623ffa15b9ae5a3fbef04c18d1f7431012cd%2Ffinish.png?alt=media)

After clicking **Finish**, you'll see the **Dataflow** overview. After a couple of minutes, refresh your screen to see whether your workflow completed successfully. Then click your **Target dataset name**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-6cdcfc77d5fea63a6649199618018a4221d5ac1c%2Fdfsuccess.png?alt=media)

You'll then see the dataset into which your data was ingested.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-45c3b690105f3fb988a065e700952048b7e93af8%2Fingestdataset.png?alt=media)

On the dataset, you'll see the batch that was just ingested, identified by its Batch ID, with 1000 records ingested and a status of **Success**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-f63b8fffd2346429175376a9bce46f90bc74b16f%2Fbatchsuccess1.png?alt=media)

Click the **Preview Dataset** button to get a quick view of a small sample of the dataset and to check that the loaded data is correct.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-024e58d34bcb84beb5212da08b159735f23b6a7f%2Fpreview.png?alt=media)

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-0f8962d0d4a888a04f5f6a640dc2a8feb98c5c05%2Fpreviewdata.png?alt=media)

Once the data is loaded, you can define the correct data governance approach for your dataset.

### 2.4.4 Adding data governance to your dataset

Now that your customer data is ingested, you need to make sure that this dataset is properly governed for usage and export control. Click on the **Data Governance** tab and observe that you can set three types of restrictions: Contractual, Identity, and Sensitive Data.

You can find more information on the different labels, and on how they will be enforced in the future through the policy framework, at this link: <https://www.adobe.io/apis/experienceplatform/home/dule/duleservices.html>

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-8a967ddde6f557586973b994da4e10dea3cdbc7c%2Fdsgovernance.png?alt=media)

Let's restrict identity data for the entire dataset. Hover over your dataset name, and click the Pencil icon to edit the settings.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-56e36274a1bf186eb23d573d10dbf54cc3c41957%2Fpencil.png?alt=media)

Go to **Identity Data** and you'll see that the **I2** option is checked - this assumes that all pieces of information in this dataset are at least indirectly identifiable to a person.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-db057e0680bd79e5546598411b4c7908dc75ab16%2Fidentity.png?alt=media)

Click **Save Changes** and observe that **I2** is now set for all data fields in the dataset.

You can also set these flags for individual data fields - for example, the **firstName** field is likely to be classified as an **I1** level for directly identifiable information.

Select the field **firstName** by checking the checkbox and click on **Edit Governance Labels** in the upper right corner of your screen.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-c389ff48af22e7cc12aba5e1f38e9abb7210de87%2Feditfirstname.png?alt=media)

Go to **Identity Data** and you'll see that the **I2** option is already checked (inherited from the dataset). The field firstName also has a field-specific configuration and is set as **I1 - Directly Identifiable Data**.

![Data Ingestion](https://858372621-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpBC8bA57il8Sj47B7QPJ%2Fuploads%2Fgit-blob-ff5c276589c6d05f74ffd2d77924234ca79d1f88%2Ffndii.png?alt=media)
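The inheritance you just observed, where a field carries the dataset-level label plus its own, can be modelled in a few lines. This is only a conceptual sketch of the behaviour described above, not how Platform stores or evaluates labels; the function name `effective_labels` and the short field names are hypothetical.

```python
def effective_labels(dataset_labels, field_labels):
    """Combine dataset-level labels with each field's own labels."""
    return {
        field: sorted(set(dataset_labels) | set(own))
        for field, own in field_labels.items()
    }


# I2 is set on the whole dataset; firstName additionally carries I1.
dataset_labels = {"I2"}
field_labels = {"firstName": {"I1"}, "city": set()}

labels = effective_labels(dataset_labels, field_labels)
print(labels)
```

In this model, `firstName` ends up with both **I1** and **I2**, while `city` only carries the inherited **I2**.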

With this, you've now successfully ingested and classified CRM Data in Adobe Experience Platform.
