UserGuide Sensors Analytics & Personas

Documents

Product Categories

Sensors Analytics & Personas

Oct 30, 2025

=1.14 Grouping user_group_${user_group_name} user_group_abc >=1.14 Labeling user_tag_${user_tag_name} user_tag_abc The description of the specific fields in the table is as follows: Field Description ⽰例 user_id User id -9220214159525537212Field Description ⽰例 distinct_id Associated with distinct_id in the event table 3f840957485db9a9 values User Grouping/Tag Value 1 base_time Base time for user grouping/labeling calculation, new since 1.14 and later 1547015611000 Among them, base_time is stored in milliseconds, so when querying, the user can convert the date into milliseconds by using the unix_timestamp_ms function, for example, as follows. SELECT * FROM user_group_fenqun9 WHERE base_time=unix_timestamp_ms(''2019-01-17 00:00:00'') Copy SQL Items Table Field Name Description ⽰例 $item_type Type of item table apple $item_id Indicates the id of the item 123 $is_valid Whether the item is valid or not, default is true if not passed in 1 $receive_time The arrival time of the item 1575604527772 $update_time The update time of the item, default is write time if not passed 1575604527772 Data Type For the consideration of query efficiency, the self-defined query function has different processes for different data types, while certain data types have some restrictions on their uses, as explained below. Number Numeric type, does not distinguish between floating point numbers and integers, the output format will be automatically converted according to whether there are decimal places or not. StringString type. Date Note: The time field is unique and can be used directly without conversion. Date type, in the self-defined query, is expressed as milliseconds Timestamp, for example, 1442937600000. If necessary, you can use the EPOCH_TO_TIMESTAMP function to convert it to the Timestamp type, for example: SELECT EPOCH_TO_TIMESTAMP($signup_time / 1000) FROM users LIMIT 100; Copy SQL An example for the use of conditional filtering is as follows. SELECT COUNT(*) AS cnt FROM users WHERE EPOCH_TO_TIMESTAMP($signup_time / 1000) > ''2017-01-01''; Copy SQL Datetime The date and time type, like the Date type, also uses the millisecond timestamp, for example: 1442592138000. The EPOCH_TO_TIMESTAMP type can also be used for type conversion. Bool Boolean type, using 0/1 for False/True. List List type, which supports filtering operations using the CONTAINS function or LIKE function in the Where condition. For example: SELECT FavoriteFruits from users where CONTAINS(''橘子'', FavoriteFruits); Copy SQL The /*EXPLODE_LIST_COLUMN=${table.columnName}*/ annotation can also be used to break List type data into multiple rows of string type data. For example:SELECT list_property FROM events /*EXPLODE_LIST_COLUMN=events.list_property*/ Copy SQL Function Use Basic Functions SQL Query Enter the SQL to be queried in the edit area. SQL keyword association and formatting are supported. Check the Progress of Your InquirySupport Viewing SQL Query Progress. Query Results 1、Support data analysis by using the visualization tools in the query results. 2、The query results support downloading Excel and Csv files. For the consideration of performance, the maximum result displayed by the front-end is only 1k, while the maximum result downloaded by CSV is 100w. If you need to download more data, please use the Query API. Save SQL1、Supports saving the SQL for the current query, customizing the SQL name, and adding comments. 2、Double-click the SQL statement in the saved query list to view and run the SQL statement. View History Support to view history SQL query logs and support to search for history logs. Add to Reports Support for the current SQL query results, quick save to the report. Date FilteringThe date field indicates the date when the event occurred, precisely to the day, and can be used to quickly filter the data. Note in particular that the date field should always be used for filtering, not the time field, whenever possible. Due to the special qualities of the date field, there are some restrictions on the support of SQL operations and functions: • CURRENT_DATE() function, which returns the day, e.g. 2016-08-23. • CURRENT_WEEK() function, which returns the Monday of the week, e.g. 2016-08-22. • CURRENT_MONTH() function, which returns the first of the month, e.g. 2016-08-01. • INTERVAL expressions，such as CURRENT_DATE() - INTERVAL ''1'' DAY represents yesterday. The following are some specific examples: • Precise filtering of data for a particular day SELECT COUNT(*) FROM events WHERE date = ''2016-01-01'' Copy SQL • Search the data of the day SELECT COUNT(*) FROM events WHERE date = CURRENT_DATE() Copy SQL • Search the data of the last 3 days SELECT COUNT(*) FROM events WHERE date BETWEEN CURRENT_DATE() - INTERVAL ''2'' DAY AND CURRENT_DATE() Copy SQL • Search data of the last natural month SELECT COUNT(*) FROM events WHERE date BETWEEN CURRENT_MONTH() - INTERVAL ''1'' MONTH AND CURRENT_MONTH() - INTERVAL ''1'' DAY Copy SQL Since date is a special field designed for quick data filtering, most time functions are not supported. Therefore, if you wish to use other time functions, please use the time field instead, for instance:SELECT datediff(now(), trunc(time, ''DD'')), COUNT(*) FROM events WHERE date >= CURRENT_DATE() - INTERVAL ''100'' day GROUP BY 1 Copy SQL • Number of events after 2018-09-01 aggregated by month SELECT date_sub(date,dayofmonth(date)-1) the_month,count(*) event_qty FROM events WHERE date>''2018-09-01'' GROUP BY the_month ORDER BY the_month; Copy SQL • Number of events after 2018-09-01 aggregated by week SELECT date_sub(date,mod(dayofweek(date)+5,7)) the_week,count(*) event_qty FROM events WHERE date>''2018-09-01'' GROUP BY the_week ORDER BY the_week; Copy SQL Explanation of Commonly Used Functions The following functions are often used with self-defined queries. • Time and date functions • String Functions • Mathematical Functions For more Impala functions, please refer to: Impala Function Reference Documentation Time and Date Functions The fields related to the time and date function in the self-defined query are divided into the following three types: 1. The time field in the events table Time is a millisecond Timestamp type, which can directly use all the time date functions.2. The date field in the events table date is a day-level type of Timestamp, so it is more efficient to use this field if you do not need the hour, minute, and second information. date is also an index field, so you should try to use this field for date range filtering, for details please refer to "Date Filtering". Note: Before version 1.10, the date field does not support the use of self-defined functions, you can use time instead. 3. Other self-defined Date/Datetime type attributes These attributes are represented as millisecond Unix timestamps in self-defined queries, and need to be converted to Timestamp using the EPOCH_TO_TIMESTAMP function, as described in "Data Types". adddate(timestamp startdate, int days), adddate(timestamp startdate, bigint days) Usage: Add a given number of days to a TIMESTAMP value Parameter: • startdate: The starting timestamp of the timestamp type. • days: The number of days to be added, a positive number means a few days later, a negative number means a few days earlier. Return Value: The timestamp after adding the number of days, timestamp type datediff(timestamp enddate, timestamp startdate) Usage: Returns the number of days between two timestamps, for example: Parameter: • enddate: End time • startdate: Start timeReturn Value: End time minus start time in days, int type. Returns a positive number if the date of the first parameter time is later than the date of the second parameter time; conversely, returns a negative number if the date of the first parameter time is earlier than the date of the second parameter time. extract(unit FROM timestamp), extract(timestamp, string unit) Usage: Intercept numeric time fields from TIMESTAMP values, such as year, month, date, hour, minute, second/microsecond Parameter: • The values that can be selected for the unit string are: year, month, day, hour, minute, second, and millisecond. Return Value: The integer value of the time field Example: Number of all paid orders so far by year and month SELECT extract(Year from time) AS Year, extract(Month from time) AS Month, COUNT(*) FROM events WHERE event = ''payOrder'' GROUP BY Year, Month ORDER BY Year, Month Copy SQL trunc(timestamp, string unit) Usage: Intercept the time field from the given timestamp Parameter: • unit：Time Unit • SYYYY, YYYY, YEAR, SYEAR, YYY, YY, Y：Year • Q：Quarter • MONTH, MON, MM, RM: Month • WW, W: Date of the first day of the corresponding week• DDD, DD, J: Date • DAY, DY, D: Date of the first day of the corresponding week • HH, HH12, HH24: Hour • MI: Minute Return Value: Intercept the date after the time field Example: Number of events that occurred each day in the last 100 days and the number of days between the event and the current date SELECT datediff(now(), trunc(time, ''DD'')), COUNT(*) FROM events WHERE date >= CURRENT_DATE() - INTERVAL ''100'' day GROUP BY 1 Copy SQL String Functions concat(string a, string b…) Usage: Concatenate all string type parameters into one string type Parameter: • string (unlimited number): the string to be concatenated Return Value: An as-a-whole string For example: search for the address of users born after 2000, the address is the province and region splicing SELECT concat($province, $city) As Address FROM users WHERE yearofbirth > 2000 Copy SQL regexp_like(string source, string pattern[, string options])Usage: Determine if the source string contains a regular expression in pattern. Parameter: • source：String to be checked • pattern：Regular expressions • option (optional)：Option • c: Case sensitive • i: Non-case sensitive • m: Matching multiple lines, the ^ and $ operators will match for each line, rather than the beginning and end of the entire line for multiple lines. • n: New line match, dot (.) operator will match the new line. Repeat operators such as . can match multiple lines in the source string (several lines can be skipped by using . ) Return Value: Match or not, Boolean type For example, the number of users using QQ Mail for email SELECT COUNT(*) FROM users WHERE regexp_like(email, ''@qq.com$'') Copy SQL parse_url(string urlString, string partToExtract [, string keyToExtract]) Usage: Returns the intercept value by specifying a particular part of the URL Parameter: • urlString：URL • partToExtract: The part to be intercepted. The values that can be specified are ''PROTOCOL'', ''HOST'', ''PATH'', ''REF'', ''AUTHORITY'', ''FILE'', ''USERINFO'', ''QUERY'' • PROTOCOL: Protocol, such as HTTP, HTTPS, FTP, etc. • HOST：Host name • PATH: Path • REF: Anchor (“Also known as reference”), That is, the string after # in the URL • AUTHORITY：Authorization• FILE：File name • USERINFO: User Information • QUERY: Query parameters, namely the string following the URL • keyToExtract (Selectable): When partToExtract is ''QUERY'', you can specify the key in the query key-value pair and get the value of the specified parameter. Return Value: The intercept value of the specified part of the URL For example: the distribution of visits by path of the day in page view events SELECT parse_url(url, ''PATH''), COUNT(*) FROM events WHERE date = CURRENT_DATE() AND event = ''$pageview'' GROUP BY 1 Copy SQL Mathematical Functions Mathematical functions are used for some numerical operations. In particular, use the pow() function instead of the exponentiation operator ''**'' when doing depowering operations. pow(double a, double p), power(double a, double p), dpow(double a, double p), fpow(double a, double p) Usage: Exponentiation, for example: Parameter: • a：Base • b：Exponent Return Value: b power of a For example: search for the number of users whose total principal and interest amount exceeds 100,000 after expiration of financial products. SELECT count(distinct(user_id)) FROM events WHERE event = ''buyProduct'' AND (capital + capital * pow(rateofinterest,duration)) > 100000 CopySQL round(double a), round(double a, int d), round(decimal a, int_type d), dround(double a), dround(double a, int d) Usage: Returns a rounded value, for example: Parameter: • a：Value to be rounded • d (Selectable): Number of decimal places to keep, or to the integer part if this parameter is not available Return Value: Rounded values For example, check the number of users whose financial products have a yield of more than 0.45 percentage points. SELECT count(distinct(user_id)) FROM events WHERE event = ''buyProduct'' AND round((income/capital),4) * 100 > 0.45 Copy SQL truncate(double_or_decimal a[, digits_to_leave]), dtrunc(double_or_decimal a[, digits_to_leave]) Usage: The value with the decimal part removed, for example: Parameter: • a：The intercepted value • digits_to_leave (Selectable) : Number of decimal digits to be retained, or to the integer part if this parameter is not available. Return Value: The intercepted value Advanced Options• Enable the Quick Distinct algorithm, which can greatly speed up the calculation like COUNT(DISTINCT user_id) and support multiple COUNT(DISTINCT) expressions (1.17+ version, without adding this comment, can also support multiple COUNT(DISTINCT), but 1.16 and earlier versions, you must add this comment) to support multiple COUNT(DISTINCT)), the disadvantage is that you will get results that are not completely precise. For instance: SELECT COUNT(DISTINCT user_id) FROM events WHERE date = CURRENT_DATE() /*ENABLE_APPROX_DISTINCT*/ Copy SQL • Enable dimensional dictionary mapping and dimensional table association, and turn off by default. For example: SELECT $model FROM events WHERE date = CURRENT_DATE() /*ENABLE_DIMENSION_DICT_MAPPING*/ Copy SQL • If the SQL is querying the data of a specified distinct_id, you can use this option to query the query. For example: SELECT event, time FROM events WHERE date = CURRENT_DATE() AND distinct_id=''abcdef'' /*DISTINCT_ID_FILTER=abcdef*/ Copy SQL • SQL will be killed by default after 10 minutes of execution, if you want to increase the timeout, you can use the following method: SELECT * FROM events WHERE date = CURRENT_DATE() LIMIT 1000 /*MAX_QUERY_EXECUTION_TIME=1800*/ Copy SQL • For JOIN type queries, you can use Join Hint to specify the join execution mode, either SHUFFLE or BROADCAST, especially if you encounter out-of-memory errors during execution, you can consider forcing the SHUFFLE mode to be assigned: SELECT COUNT(*) AS cnt FROM events JOIN /* +SHUFFLE */ users ON events.user_id = users.idWHERE date = CURRENT_DATE() Copy SQL Common Cases Search for the specific behavior of a user on a certain day based on the distinct_id of the user Use distinct_id to search directly: SELECT * FROM events WHERE distinct_id = ''wahaha'' AND date = ''2015-09-10'' LIMIT 100 Copy SQL Search for the number of users who placed orders between 10am and 11am each day Use the standard SQL date function EXTRACT to retrieve the hourly information. SELECT date, COUNT(*) FROM events WHERE EXTRACT(HOUR FROM time) IN (10, 11) AND event = ''SubmitOrder'' GROUP BY 1 Copy SQL Search the distribution of the number of orders placed by users over a period of time The number of orders placed by each user is first calculated and then grouped using the CASE...WHEN syntax. SELECT CASE WHEN c < 10 THEN ''<10'' WHEN c < 20 THEN ''<20'' WHEN c < 100 THEN ''<100'' ELSE ''>100''END, COUNT(*) FROM ( SELECT user_id, COUNT(*) AS c FROM events WHERE date BETWEEN ''2015-09-01'' AND ''2015-09-20'' AND event = ''SubmitOrder'' GROUP BY 1 )a GROUP BY 1 Copy SQL Search for the number of users who did behavior A but not behavior B Use LEFT OUTER JOIN to calculate the difference set. SELECT a.user_id FROM ( SELECT DISTINCT user_id FROM events WHERE date=''2015-10-1'' AND event = ''BuyGold'' ) a LEFT OUTER JOIN ( SELECT DISTINCT user_id FROM events WHERE date=''2015-10-1'' AND event = ''SaleGold'' ) b ON a.user_id = b.user_id WHERE b.user_id IS NULL Copy SQL Calculate the User’s Duration of Use Using the analysis function, the cumulative usage time is estimated based on the interval between two adjacent events for each user, and is not counted if the interval between two uses exceeds 10 minutes. SELECT user_id, SUM( CASE WHEN end_time - begin_time < 600 THEN end_time - begin_time ELSE 0 END ) FROM ( SELECT user_id,EXTRACT(EPOCH FROM time) AS end_time, LAG(EXTRACT(EPOCH FROM time), 1, NULL) OVER (PARTITION BY user_id ORDER BY time asc) AS begin_time FROM events WHERE date=''2015-5-1'' ) a GROUP BY 1 Copy SQL Obtain the User’s First Behavioral Attributes Use the first_time_value(time, other attributes) aggregation function to get the relevant attributes when an action first occurs -- Example 1: Get the URL of the page where the user was when the first page view occurred SELECT user_id, first_time_value(time, $url) FROM events WHERE event = ''$pageview'' GROUP BY user_id -- Example 2: Get the amount of money spent on the first purchase by the user SELECT user_id, first_time_value(time, order_amount) first_order_amount FROM events WHERE event = ''payOrder'' GROUP BY user_id Copy SQL Version Change The query is switched to impala mode directly. Impala version is upgraded to 4.8. Currently there are a few incompatible syntax changes. • date has been changed to datetime type, which is no longer supported as number type for number type calculation, e.g. + - * /. • When using events.date field to join two tables, the query performance will be affected to different degrees depending on the number of data joined. • The alias substitution logic in GROUP BY, HAVING, ORDER BY is more consistent with standard SQL behavior, which means that aliases are only valid in top-level expressions, but not in sub-expressions. For instance: /* Syntax Supported */ SELECT NOT bool_col AS nb FROM tGROUP BY nb HAVING nb; /* Syntax No Longer Supported */ SELECT int_col / 2 AS x FROM t GROUP BY x HAVING x > 3; Copy SQL • A new series of reserved fields have been added that cannot be used directly as identifiers. If they need to be used as identifiers, they must be enclosed in backquotes, for example: /* Syntax Supported */ SELECT `position` FROM events /* Syntax No Longer Supported */ SELECT position FROM events Copy SQL • New reserved fields include: allocate、any、api_version、are、array_agg、array_max_cardinality、asensitive、asymmetric、at、atomic、authorization、begin_frame、begin_partition、blob、block_size、both、called、 cardinality、cascaded、character、clob、close_fn、collate、collect、commit、condition、connect、constraint、contains、convert、copy、corr、corresponding、covar_pop、covar_samp、cube、 current_date、current_default_transform_group、current_path、current_role、current_row、current_schema、current_time、current_transform_group_for_type、cursor、cycle、deallocate、dec、 decfloat、declare、define、deref、deterministic、disconnect、dynamic、each、element、empty、end-exec、end_frame、end_partition、equals、escape、every、except、exec、execute、fetch、 filter、finalize_fn、foreign、frame_row、free、fusion、get、global、grouping、groups、hold、indicator、init_fn、initial、inout、insensitive、intersect、intersection、json_array、json_arrayagg、 jso、n_exists、json_object、json_objectagg、json_query、json_table、json_table_primitive、json_value、large、lateral、leading、like_regex、listagg、local、localtimestamp、log10、match、 match_number、match_recognize、matches、merge、merge_fn、method、modifies、multiset、national、natural、nchar、nclob、no、none、normalize、nth_value、nth_value、occurrences_regex、 octet_length、of、off、omit、one、only、out、overlaps、overlay、pattern、per、percent、percentile_cont、percentile_disc、portion、position、position_regex、precedes、prepare、prepare_fn、 procedure、ptf、reads、recursive、ref、references、regr_avgx、regr_avgy、regr_count、regr_intercept、regr_r2、regr_slope、regr_sxx、regr_sxy、regr_syy、release、rollback、rollup、running、 savepoint、scope、scroll、search、seek、serialize_fn、similar、skip、some、specific、specifictype、sqlexception、sqlexception、sqlwarning、static、straight_join、submultiset、subset、 substring_regex、succeeds、symmetric、system_time、system_user、timezone_hour、timezone_minute、trailing、translate_regex、translation、treat、trigger、trim_array、uescape、unique、unnest、 update_fn、value_of、varbinary、varying、versioning、whenever、width_bucket、window、within、withoutImpala Reserved Field Reference FileDistribution Analysis 1. Explanation Video Explanation Video on Distribution Analysis 2. Distribution Analysis Overview In version 1.6, "Return Analysis" was upgraded to "Distribution Analysis", which not only tells you how much users rely on your products but also presents you the distribution of users for a certain event indicator. For example, you can examine the distribution of users with orders below $100, between $100 and $200, and above $200. Specify a user behavior event and then select metrics for the event. Distribution analysis can help reveal the following: • Did the number of times a user used the product per day increase before and after the notification policy was adjusted? • Do users make repeat purchases after their first purchase? • Assuming that users who use a key feature more than 3 times a day are considered core users, what is the trend in the composition of core users? View a Sample Application of the Distribution Analysis Function 3. Distribution Analysis by Time3.1. Select User Behavior Select an event as the behavior to be examined. For example, select "applet sharing." 3.2. Select Metrics Here you can select the indicator for distribution analysis, choose "hours" or "days", and set a customized grouping interval. We use "hours" for the following example. 3.3. Select Time Unit Select the statistical time unit in which the behavior occurred (choose whether to analyze by time period or number of times at Figure 3.3). Options Available: • Within a day： The distribution of user behavior during the 24-hour period 0:00-23:59. • Within a week： The distribution of user behavior from Monday to Sunday. • Within a month： The distribution of user behavior from the first day to the last day of the month.3.4. Set Filter Conditions for the Selected Events For example, view only the behavior distribution for the iOS version. 3.5. Set User Filtering Conditions Filter the appropriate objects of analysis for user attributes. For example, view only the behavior distribution of female users. 3.6. Behavior Distribution Table The first column is the time period selected by the user for the query, divided by the time unit in which the event occurred (days/weeks/months). The second column is the total number of users whose behavior occurred in the corresponding time range. The subsequent columns, respectively, are the number and percentage of users with the corresponding frequency of occurrence of the behavior. Hover your mouse over the cell to see the detailed meaning of each cell number represents. Click on the number to see the details of those users and the detailed sequence of behaviors of individual users within them.3.7. Customize Grouping Intervals If the property selected here is a number type, you can customize the grouping interval. If not set, the query engine will dynamically calculate the grouping interval. This setting takes effect only for the current query, and also in the bookmark after saving the query as a bookmark. When the mouse hovers over a cell in the table, a download link will appear and can be clicked to download the user details. 3.8. Set Up Simultaneous Display of Indicators In the distribution analysis, you can add indicators that are displayed at the same time, and also calculate the performance of the indicators for the corresponding user subgroups. For example, while viewing the distribution of the number of times spent, you can also monitor the amount spent per capita. 4. Distribution Analysis by Number of Times4.1. Select “Number of Times” The selection count is based on the actual number of occurrences, without de-duplication. Customized grouping intervals can also be set. 4.2. Behavior Distribution Table The screenshot above shows that 4542 users have done ''applet sharing'' on 08-03(Wed), and 4533 of them have done it 1 time ~ 3 times (excluding 3 times). 5. Distribution Analysis by Event MetricsFor the indicator "Total amount of orders", 1004 people submitted orders on 09-13, and 3 people paid between 10 RMB and 30 RMB (not including 30RMB ). 6. How Distribution Analysis Is Calculated There are three statistical methods for distribution analysis, analysis by time period, analysis by number of times and statistical indicators of event attributes. The intervals can be customized when selecting the metrics by number and by event. 6.1. Analysis by Time Period Calculates the number of natural time periods (hours/days) in a day/week/month when a user performs an action. Let''s take the analysis of "hours" of "clicks on ads" in a day as an example. If a user clicks on an ad 3 times between 15:00 and 16:00 and 1 time between 17:00 and 18:00, then it is counted as "at least 2 times a day". If a user clicks on an ad once or more times within a certain hour, it is counted as 1 time. For checking the number of days in a week/month that a user has triggered a certain event, we count the number of days in a week/month that a user has triggered a certain event.6.2. Analysis by Number of Times Counts the number of times a user performs a certain action in a day/week/month, and records it once if it happens. 6.3. Analysis by Statistical Indicators of Event Attributes This is the value of a statistical indicator for an attribute of an event that occurs in a day/week/month of a user. The statistical metrics for the attributes are consistent with event analysis, with sum, mean, maximum, minimum, and de-duplicated numbers. 7. FAQ 7.1. How Is Distribution Analysis Superior to Looking at Daily Activity Users (DAU) Alone? DAU only tells you the change in the number of users, while distribution analysis reveals the change in the dependency of individual users on the product. For example, if the number of active users for a product does not increase significantly between March and April, but the frequency of key user behaviors (such as placing an order or posting content) increases substantially, the value of the product to the user has increased. Conversely, if the number of daily active users is growing rapidly, but the frequency of the behavior performed is at a lower level than before, it is likely that the new active users are not really seeing the value of the product.">

To view the full page, please visit: Sensors Analytics & Personas Product Userguide

Sensors Analytics & Personas

The product can merge multi-source data, identify unique users and help enterprises build a labeling and portrait system to enable businesses to achieve refined user operations and precise marketing.

Buy now