4.3 Queries, queries, queries... and churn analysis
Query Service - Queries, queries, queries... and churn analysis
Objective
Write queries for data analyses
Write SQL queries combining online, call center and loyalty data available in Adobe Experience Platform
Learn about Adobe Defined Functions
Context
In this exercises you will write queries to analyze product views, product funnels, churn etc.
All queries listed in this chapter will be executed in your PSQL command-line interface. You should copy (CTRL-C) the statement blocks indicated with SQL and paste (CTRL-V)them in the PSQL command-line interface. The Query Result blocks show the pasted SQL statement and the associated query result.
4.3.1 Write basic queries for data analysis
Timestamp
Data captured in Adobe Experience Platform is time stamped. The timestamp attribute allows you to analyze data over time.
How many product views do we have on a daily basis?
SQL
selectdate_format( timestamp , 'yyyy-MM-dd') ASDay,count(*) AS productViewsfrom demo_system_event_dataset_for_website_global_v1_1where--aepTenantId--.demoEnvironment.brandName IN ('Luma Telco', 'Citi Signal')and eventType ='commerce.productViews'group byDaylimit10;
Copy the statement above and execute it in your PSQL command-line interface.
Query Result
aepenablementfy21:all=> select date_format( timestamp , 'yyyy-MM-dd') AS Day,
aepenablementfy21:all-> count(*) AS productViews
aepenablementfy21:all-> from demo_system_event_dataset_for_website_global_v1_1
aepenablementfy21:all-> where --aepTenantId--.demoEnvironment.brandName IN ('Luma Telco', 'Citi Signal')
aepenablementfy21:all-> and eventType = 'commerce.productViews'
aepenablementfy21:all-> group by Day
aepenablementfy21:all-> limit 10;
Day | productViews
------------+--------------
2020-07-31 | 2297
(1 row)
Top 5 products viewed
What are the top 5 products viewed?
SQL
select productListItems.name, count(*)from demo_system_event_dataset_for_website_global_v1_1where--aepTenantId--.demoEnvironment.brandName IN ('Luma Telco', 'Citi Signal')and eventType ='commerce.productViews'group by productListItems.nameorder by2 desclimit5;
Copy the statement above and execute it in your PSQL command-line interface.
Query Result
aepenablementfy21:all=> select productListItems.name, count(*)
aepenablementfy21:all-> from demo_system_event_dataset_for_website_global_v1_1
aepenablementfy21:all-> where --aepTenantId--.demoEnvironment.brandName IN ('Luma Telco', 'Citi Signal')
aepenablementfy21:all-> and eventType = 'commerce.productViews'
aepenablementfy21:all-> group by productListItems.name
aepenablementfy21:all-> order by 2 desc
aepenablementfy21:all-> limit 5;
name | count(1)
---------------------------------------+----------
Google Pixel XL 32GB Black Smartphone | 938
SIM Only | 482
Samsung Galaxy S8 | 456
Samsung Galaxy S7 32GB Black | 421
(4 rows)
Product Interaction funnel, from viewing to buying
SQL
select eventType, count(*)from demo_system_event_dataset_for_website_global_v1_1where--aepTenantId--.demoEnvironment.brandName IN ('Luma Telco', 'Citi Signal')and eventType is not nulland eventType <>''group by eventType;
Copy the statement above and execute it in your PSQL command-line interface.
Query Result
aepenablementfy21:all=> select eventType, count(*)
aepenablementfy21:all-> from demo_system_event_dataset_for_website_global_v1_1
aepenablementfy21:all-> where --aepTenantId--.demoEnvironment.brandName IN ('Luma Telco', 'Citi Signal')
aepenablementfy21:all-> and eventType is not null
aepenablementfy21:all-> and eventType <> ''
aepenablementfy21:all-> group by eventType;
eventType | count(1)
------------------------------+----------
commerce.productViews | 2297
commerce.productListAdds | 494
commerce.purchases | 246
(3 rows)
Identify visitors with risk to Churn (visit page => Cancel Service)
Copy the statement above and execute it in your PSQL command-line interface.
Query Result
aepenablementfy21:all=> select distinct --aepTenantId--.identification.core.ecid
aepenablementfy21:all-> from demo_system_event_dataset_for_website_global_v1_1
aepenablementfy21:all-> where --aepTenantId--.demoEnvironment.brandName IN ('Luma Telco', 'Citi Signal')
aepenablementfy21:all-> and web.webPageDetails.name = 'Cancel Service'
aepenablementfy21:all-> group by --aepTenantId--.identification.core.ecid
aepenablementfy21:all-> limit 10;
ecid
----------------------------------
67802232253493573025911610627278
27147331741697745713411940873426
19806347932758146991274525406147
06339676267512351981624626408225
23933440740775575701680766564499
11860828134020790182705892056898
04258863338643046907489131372300
90257333076958492787834714105751
66695181015407529430237951973742
19103852558440070949457567094096
(10 rows)
In the next set of queries we will extend the above query, in order to get a complete view on the customers and their behavior that have been visiting the "Cancel Service" page. You will learn how to use the Adobe Defined Function to sessionize information, identify the sequence and timing of events. You will also join datasets together to further enrich and prepare the data for analysis in Microsoft Power BI.
4.3.2 Advanced Queries
The majority of the business logic requires gathering the touch-points for a customer and ordering them by time. This support is provided by Spark SQL in the form of window functions. Window functions are part of standard SQL and are supported by many other SQL engines.
Adobe Defined Functions
Adobe has added a set of Adobe Defined Functions to the standard SQL syntax that allow you to better understand your experience data. In the next couple of queries you will learn about these ADF functions. You can find more information and the complete list in the documentation.
What do people do on the site before reaching the "Cancel Service" page as the 3rd page in a session?
With this query you will discover the first two Adobe Defined Functions SESS_TIMEOUT and NEXT
The SESS_TIMEOUT() reproduces the visit groupings found with Adobe Analytics. It performs a similar time-based grouping, but customizable parameters.
NEXT() and PREVIOUS() help you to understand how customers navigate your site.
SQL
SELECT webPage, webPage_2, webPage_3, webPage_4,count(*) journeysFROM ( SELECT webPage, NEXT(webPage, 1, true) OVER(PARTITION BY ecid, session.num ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).valueAS webPage_2, NEXT(webPage, 2, true) OVER(PARTITION BY ecid, session.num ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).valueAS webPage_3, NEXT(webPage, 3, true) OVER(PARTITION BY ecid, session.num ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).valueAS webPage_4, session.depth AS SessionPageDepth FROM (select a.--aepTenantId--.identification.core.ecid as ecid, a.timestamp, web.webPageDetails.nameas webPage, SESS_TIMEOUT(timestamp, 60*30) OVER (PARTITION BY a.--aepTenantId--.identification.core.ecid ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) ASsessionfrom demo_system_event_dataset_for_website_global_v1_1 awhere a.--aepTenantId--.identification.core.ecid in ( select b.--aepTenantId--.identification.core.ecidfrom demo_system_event_dataset_for_website_global_v1_1 bwhere b.--aepTenantId--.demoEnvironment.brandName IN ('Luma Telco', 'Citi Signal')and b.web.webPageDetails.name ='Cancel Service' ) ))WHERE SessionPageDepth=1and webpage_3 ='Cancel Service'GROUP BY webPage, webPage_2, webPage_3, webPage_4ORDER BY journeys DESCLIMIT 10;
Copy the statement above and execute it in your PSQL command-line interface.
Query Result
webPage | webPage_2 | webPage_3 | webPage_4 | journeys
---------------------------------------+---------------------------------------+----------------+------------+----------
Citi Signal Sport | Google Pixel XL 32GB Black Smartphone | Cancel Service | Call Start | 2
SIM Only | Citi Signal Shop | Cancel Service | | 2
SIM Only | Telco Home | Cancel Service | | 2
TV & Broadband Deals | Samsung Galaxy S7 32GB Black | Cancel Service | | 2
Telco Home | Citi Signal Sport | Cancel Service | Call Start | 2
Google Pixel XL 32GB Black Smartphone | Broadband Deals | Cancel Service | | 2
Broadband Deals | Samsung Galaxy S7 32GB Black | Cancel Service | | 2
Broadband Deals | Samsung Galaxy S8 | Cancel Service | | 1
Samsung Galaxy S8 | Google Pixel XL 32GB Black Smartphone | Cancel Service | | 1
SIM Only | Google Pixel XL 32GB Black Smartphone | Cancel Service | Call Start | 1
(10 rows)
How much time do we have before a visitor calls the call center after visiting the "Cancel Service" Page?
To answer this kind of query will we use the TIME_BETWEEN_NEXT_MATCH() Adobe Defined Function.
Time-between previous or next match functions provide a new dimension, which measures the time that has elapsed since a particular incident.
SQL
select*from (select--aepTenantId--.identification.core.ecid as ecid, web.webPageDetails.nameas webPage, TIME_BETWEEN_NEXT_MATCH(timestamp, web.webPageDetails.name='Call Start', 'seconds') OVER(PARTITION BY --aepTenantId--.identification.core.ecid ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)AS contact_callcenter_after_secondsfrom demo_system_event_dataset_for_website_global_v1_1where--aepTenantId--.demoEnvironment.brandName IN ('Luma Telco', 'Citi Signal')and web.webPageDetails.namein ('Cancel Service', 'Call Start')) rwhere r.webPage ='Cancel Service'limit15;
Copy the statement above and execute it in your PSQL command-line interface.
Query Result
ecid | webPage | contact_callcenter_after_seconds
----------------------------------+----------------+----------------------------------
00331886620679939148047665693117 | Cancel Service |
00626561600197295782131349716866 | Cancel Service |
00630470663554417679969244202779 | Cancel Service | -797
00720875344152796154458668700428 | Cancel Service | -519
00746064605049656090779523644276 | Cancel Service | -62
00762093837616944422322357210965 | Cancel Service |
00767875779073091876070699689209 | Cancel Service |
00798691264980137616449378075855 | Cancel Service |
00869613691740150556826953447162 | Cancel Service | -129
00943638725078228957873279219207 | Cancel Service | -750
01167540466536077846425644389346 | Cancel Service |
01412448537869549016063764484810 | Cancel Service |
01419076946514450291741574452702 | Cancel Service | -482
01533124771963987423015507880755 | Cancel Service |
01710651086750904478559809475925 | Cancel Service |
(15 rows)
And what is the outcome of that contact?
Explain that we are joining datasets together, in this case we join our demo_system_event_dataset_for_website_global_v1_1 with demo_system_event_dataset_for_call_center_global_v1_1. We do this to know the outcome of the call center interaction.
SQL
select distinct r.*, c.--aepTenantId--.interactionDetails.core.callCenterAgent.callFeeling, c.--aepTenantId--.interactionDetails.core.callCenterAgent.callTopic, c.--aepTenantId--.interactionDetails.core.callCenterAgent.callContractCancelledfrom (select--aepTenantId--.identification.core.ecid ecid, web.webPageDetails.nameas webPage, TIME_BETWEEN_NEXT_MATCH(timestamp, web.webPageDetails.name='Call Start', 'seconds') OVER(PARTITION BY --aepTenantId--.identification.core.ecid ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)AS contact_callcenter_after_secondsfrom demo_system_event_dataset_for_website_global_v1_1where--aepTenantId--.demoEnvironment.brandName IN ('Luma Telco', 'Citi Signal')and web.webPageDetails.namein ('Cancel Service', 'Call Start')) r, demo_system_event_dataset_for_call_center_global_v1_1 cwhere r.ecid = c.--aepTenantId--.identification.core.ecidand r.webPage ='Cancel Service'and c.--aepTenantId--.interactionDetails.core.callCenterAgent.callContractCancelled IN (true,false)and c.--aepTenantId--.interactionDetails.core.callCenterAgent.callTopic IN ('contract', 'invoice','complaint','wifi')limit15;
Copy the statement above and execute it in your PSQL command-line interface.
Lets include the geographical info, like longitude, attitude, city, countrycode, captured by the Adobe Experience Platform in order to get some geographical insights about churning customers.
SQL
select distinct r.ecid, r.city, r.countrycode, r.lat as latitude, r.lon as longitude, r.contact_callcenter_after_seconds as seconds_to_contact_callcenter, c.--aepTenantId--.interactionDetails.core.callCenterAgent.callFeeling, c.--aepTenantId--.interactionDetails.core.callCenterAgent.callTopic, c.--aepTenantId--.interactionDetails.core.callCenterAgent.callContractCancelled, l.--aepTenantId--.loyaltyDetails.level, l.--aepTenantId--.identification.core.loyaltyIdfrom (select--aepTenantId--.identification.core.ecid ecid, placeContext.geo._schema.latitude lat, placeContext.geo._schema.longitude lon, placeContext.geo.city, placeContext.geo.countryCode, web.webPageDetails.nameas webPage, TIME_BETWEEN_NEXT_MATCH(timestamp, web.webPageDetails.name='Call Start', 'seconds') OVER(PARTITION BY --aepTenantId--.identification.core.ecid ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)AS contact_callcenter_after_secondsfrom demo_system_event_dataset_for_website_global_v1_1where--aepTenantId--.demoEnvironment.brandName IN ('Luma Telco', 'Citi Signal')and web.webPageDetails.namein ('Cancel Service', 'Call Start') ) r , demo_system_event_dataset_for_call_center_global_v1_1 c , demo_system_profile_dataset_for_loyalty_global_v1_1 lwhere r.ecid = c.--aepTenantId--.identification.core.ecidand r.webPage ='Cancel Service'and l.--aepTenantId--.identification.core.ecid = r.ecidand c.--aepTenantId--.interactionDetails.core.callCenterAgent.callTopic IN ('contract', 'invoice','complaint','wifi','promo')limit15;
Copy the statement above and execute it in your PSQL command-line interface.
In the queries above we only looked at the visitors that ended up contacting the call center in case of service cancellation. We want to take this a bit broader and take into account all call center interaction including (wifi, promo, invoice, complaint and contract).
You will need to edit a query, so let's first open notepad or brackets.
On Windows click "search"-icon (1) in the windows toolbar, type notepad in the "search"-field (2), click (3) the "notepad" result:
On Mac
Copy the following statement to notepad/brackets:
select/* enter your name */ e.--aepTenantId--.identification.core.ecid as ecid, e.placeContext.geo.city as city, e.placeContext.geo._schema.latitude latitude, e.placeContext.geo._schema.longitude longitude, e.placeContext.geo.countryCode as countrycode, c.--aepTenantId--.interactionDetails.core.callCenterAgent.callFeeling as callFeeling, c.--aepTenantId--.interactionDetails.core.callCenterAgent.callTopic as callTopic, c.--aepTenantId--.interactionDetails.core.callCenterAgent.callContractCancelled as contractCancelled, l.--aepTenantId--.loyaltyDetails.level as loyaltystatus, l.--aepTenantId--.loyaltyDetails.points as loyaltypoints, l.--aepTenantId--.identification.core.loyaltyId as crmidfrom demo_system_event_dataset_for_website_global_v1_1 e ,demo_system_event_dataset_for_call_center_global_v1_1 c ,demo_system_profile_dataset_for_loyalty_global_v1_1 lwhere e.--aepTenantId--.demoEnvironment.brandName IN ('Luma Telco', 'Citi Signal')and e.web.webPageDetails.name in ('Cancel Service', 'Call Start')and e.--aepTenantId--.identification.core.ecid = c.--aepTenantId--.identification.core.ecidand l.--aepTenantId--.identification.core.ecid = e.--aepTenantId--.identification.core.ecid;
And replace
enter your name
Do not remove /\* and \*/. Your modified statement in notepad should look like:
Copy your modified statement from notepad into the PSQL command line window and hit enter. You should see the following result in the PSQL command line window:
In the next exercise, you will persist your query (also known as create table as select or CTAS) as a new dataset that you will use in Microsoft Power BI.