Easy Answers - Insights Page

Quick Insight Models - Overview

Intended audience: end-users, data science, developers

AO Easy Answers: 4.4

October 2025

Overview

The Quick Insights library contains 60+ variations of Machine Learning Models, divided into the following 8 groups.

Statistical Hypothesis Testing

Generalized Correlations

Generalized Regressions

Causal Inference Analysis

Time-to-Event Analysis

Geospatial Correlation Analysis

Univariate Time Series Modeling

Multivariate Time Series Modeling

Within each group, several variations exist. For example, Causal Inference Analysis supports binary, categorical, and continuous treatments, each with either a binary or a continuous outcome.


Details for Each Quick Insight Group

Statistical Hypothesis Testing - 13 Variations

  • Normal Likelihood should be used when:

    • The data is roughly normally distributed and has light tails (i.e., not too many extreme values).

    • You have a large sample size (usually greater than 30).

  • Poisson Likelihood should be used when:

    • The data represents count outcomes (e.g., number of incidents, number of occurrences) that are non-negative integers.

    • The mean and variance of the data are approximately equal.

  • Student-T Likelihood should be used when:

    • The data is roughly normally distributed but has heavier tails (i.e., more extreme values or outliers).

    • You have a small sample size (less than 30), where the normal distribution might not be a good approximation.
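
The selection rules above can be sketched as a small helper. This is only an illustrative heuristic, not the logic used by Easy Answers: the function names, the dispersion tolerance, and the excess-kurtosis cutoff used as a stand-in for "heavy tails" are all assumptions.

```python
from statistics import fmean, pvariance


def excess_kurtosis(values):
    """Population excess kurtosis (0 for a normal distribution)."""
    m = fmean(values)
    var = pvariance(values, m)
    if var == 0:
        return 0.0
    fourth_moment = fmean((v - m) ** 4 for v in values)
    return fourth_moment / var**2 - 3.0


def suggest_likelihood(values, dispersion_tol=0.5, kurtosis_cutoff=1.0):
    """Suggest 'poisson', 'student-t', or 'normal' for a numeric sample.

    dispersion_tol and kurtosis_cutoff are hypothetical thresholds.
    """
    values = list(values)
    # Count data: non-negative integers with mean roughly equal to variance.
    is_count = all(float(v).is_integer() and v >= 0 for v in values)
    if is_count:
        m = fmean(values)
        if m > 0 and abs(pvariance(values) / m - 1.0) <= dispersion_tol:
            return "poisson"
    # Small samples or heavy tails: prefer the Student-T likelihood.
    if len(values) < 30 or excess_kurtosis(values) > kurtosis_cutoff:
        return "student-t"
    # Otherwise: large sample with light tails.
    return "normal"


print(suggest_likelihood([0, 1, 1, 2, 2, 2, 3, 3, 4, 6]))
print(suggest_likelihood([1.2, 3.4, 2.2, 5.1]))
```

A helper like this is only a first screen; inspecting a histogram of the data remains the more reliable check.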

| Model Name | Description | Data Type Requirement | Example |
| --- | --- | --- | --- |
| One Sample T Test - Normal Likelihood | Tests if a sample mean differs from a known value. | Continuous numeric data. | Testing if the average salary differs from $50,000. |
| Two Sample T Test - Normal Likelihood | Tests if two sample means are significantly different. | Continuous numeric data. | Comparing average test scores between two schools. |
| Multi Sample T Test - Normal Likelihood | Tests if means across multiple groups are significantly different. | Continuous numeric data. | Comparing average heights across three different age groups. |
| One Sample T Test - Poisson Likelihood | Tests if a sample mean count differs from a known value. | Discrete count data. | Testing if the average number of calls per day differs from 10. |
| Two Sample T Test - Poisson Likelihood | Tests if two sample mean counts are significantly different. | Discrete count data. | Comparing the number of accidents in two regions. |
| Multi Sample T Test - Poisson Likelihood | Tests if mean counts differ across multiple groups. | Discrete count data. | Comparing accident rates across different districts. |
| One Sample T Test - Student-T Likelihood | Tests if a sample mean differs from a known value with heavy-tailed data. | Continuous numeric data. | Testing if the mean income differs from $40,000 in a small sample. |
| Two Sample T Test - Student-T Likelihood | Tests if two sample means are significantly different with heavy-tailed data. | Continuous numeric data. | Comparing income between two small groups. |
| Multi Sample T Test - Student-T Likelihood | Tests if means across multiple groups differ with heavy-tailed data. | Continuous numeric data. | Comparing income across several cities with heavy variability. |
| One Sample Binomial Test | Tests if a sample proportion matches a known proportion. | Binary categorical data. | Testing if 60% of the population prefers coffee. |
| Two Sample Binomial Test | Tests if two sample proportions differ. | Binary categorical data. | Comparing the proportion of smokers in two cities. |
| Multi Sample Binomial Test | Tests if proportions across multiple groups differ. | Binary categorical data. | Comparing the proportion of defective products across different factories. |
| Multi Sample Chi2 - Multinomial Test | Tests if proportions across multiple categories differ. | Categorical data with more than two categories. | Testing if the distribution of favorite colors differs across groups. |

 

Generalized Correlations - 11 Variations

| Model Name | Description | Data Type Requirement | Example |
| --- | --- | --- | --- |
| Bayesian Correlation Test (Binary – Binary) | Measures the degree of association between two binary variables (special case of Pearson’s correlation). | Both variables must be binary (0/1). | Relationship between gender (male/female) and whether someone owns a car (yes/no). |
| Bayesian Correlation Test (Binary – Nominal) | Quantifies the strength of association between a binary and a nominal variable using chi-square statistics. | One binary and one nominal variable. | Association between smoking status (smoker/non-smoker) and favorite beverage type (coffee/tea/soda). |
| Bayesian Correlation Test (Binary – Continuous) | Measures the correlation between a binary variable and a continuous variable. Equivalent to Pearson’s correlation when one variable is dichotomous. | One binary and one continuous numeric variable. | Correlation between gender (male/female) and salary. |
| Bayesian Correlation Test (Binary – Ordinal) | Measures the association between a binary and an ordinal variable by comparing rank distributions across the binary groups. | One binary variable and one ordinal variable. | Relationship between treatment type (control/treated) and pain severity (mild/moderate/severe). |
| Bayesian Correlation Test (Nominal – Nominal) | Measures the strength of association between two nominal variables using chi-square statistics. | Both variables must be nominal (categorical without order). | Association between hair color and favorite cuisine type. |
| Bayesian Correlation Test (Nominal – Continuous) | Assesses the strength of association between a nominal and a continuous variable, based on ANOVA concepts. | One nominal categorical and one continuous numeric variable. | Relationship between occupation type and annual income. |
| Bayesian Correlation Test (Ordinal – Continuous) | Estimates the correlation between an ordinal and a continuous variable, assuming the ordinal variable arises from a latent continuous distribution. | One ordinal and one continuous variable. | Association between education level (high school, college, graduate) and test score. |
| Bayesian Correlation Test (Ordinal – Ordinal) | Nonparametric measure assessing monotonic association between two ordinal variables. | Both variables must be ordinal or converted to ranks. | Correlation between satisfaction level (low, medium, high) and perceived quality (poor, fair, good, excellent). |
| Bayesian Correlation Test (Continuous – Continuous) | Computes the correlation between two continuous variables. | Continuous numeric data. | Examining the correlation between height and weight. |
| Bayesian Correlation Test (Nominal – Ordinal) | Uses a chi-square-based association measure between a nominal and an ordinal variable. | One nominal and one ordinal variable. | Relationship between region (North, South, East, West) and education level (primary, secondary, tertiary). |
| Correlation matrix analysis | Computes the correlation matrix of a set of variables across various data types (binary, nominal, ordinal, and continuous). This can be used to assess which variables are highly correlated with a variable of interest (such as a key performance metric). | Requires tabular data of a set of variables. For practical purposes, the number of variables is restricted to 10. | Understanding which of my personal and medical features are highly correlated with my heart health. |
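
The remark that the binary-binary measure is a special case of Pearson's correlation can be checked numerically. The sketch below uses classical (non-Bayesian) formulas on made-up 0/1 data, so it illustrates the relationship only, not the Bayesian test itself; the function and variable names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Classical Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def phi_coefficient(x, y):
    """Phi coefficient computed from the 2x2 contingency table of 0/1 data."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    numerator = n11 * n00 - n10 * n01
    denominator = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return numerator / denominator


# Made-up illustrative data: two binary attributes for eight people.
group = [1, 1, 1, 0, 0, 0, 1, 0]
owns_car = [1, 1, 0, 1, 0, 0, 1, 0]

print(phi_coefficient(group, owns_car))  # 0.5
print(pearson(group, owns_car))
```

Both functions agree on 0/1 data up to floating-point rounding, which is why the binary-binary case needs no separate formula.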

 

Generalized Regressions - 14 Variations

Note: Regressions come in two variations, based on the type of relationship between inputs and output: linear models (GLM) and non-linear models (Neural GLM). In GLM, the model parameters are directly interpretable. In Neural GLM, the parameters are hard to interpret.

  • Poisson Regression

    • The response (output) variable represents counts of events occurring in a fixed time period or a fixed area

    • The mean and variance of counts are approximately equal

    • Example: Number of accidents, Number of visits

  • Beta Regression

    • The response variable is continuous and is between 0 and 1

    • Example: Proportions, Rates

  • Binomial Regression

    • The response variable represents the number of successful outcomes (or outcomes of interest to us) out of a given number of attempts or trials

    • Example: Number of defective items in a production batch

  • Gamma Regression

    • The response variable is continuous, strictly positive, and often right-skewed (heavy right tail)

    • Example: Waiting time

  • Negative Binomial Regression

    • The response variable represents count data (similar to Poisson regression)

    • Unlike Poisson regression, where the mean and variance are approximately equal, the variance here exceeds the mean (often called overdispersion)

    • Example: Number of insurance claims (many zeros, with some members making more claims than others)

  • Bernoulli (Logistic) Regression

    • The response variable is binary

    • We estimate the probability of one outcome as a function of other input predictors

    • Example: Assess whether a customer will purchase a product

  • Normal (Linear) Regression

    • The response variable is continuous, and can take any value (positive or negative)

    • Example: Estimating house prices based on size and location
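
The guidance above can be read as a decision procedure on the response variable. The following sketch encodes it as a rough heuristic; it is an assumption-laden illustration, not how Easy Answers selects a model. The function name, the `trials` argument, the overdispersion cutoff of 1.5, and the mean-above-median test for right skew are all hypothetical choices.

```python
from statistics import fmean, median, pvariance


def suggest_regression_family(y, trials=None, overdispersion_ratio=1.5):
    """Suggest a regression family from the response values.

    y: response values; trials: number of attempts per row, if the
    response counts successes out of a fixed number of trials.
    """
    y = list(y)
    if trials is not None:
        return "binomial"  # successes out of n trials
    if set(y) <= {0, 1}:
        return "bernoulli"  # binary response
    if all(float(v).is_integer() and v >= 0 for v in y):
        m = fmean(y)
        # Variance well above the mean signals overdispersion.
        if m > 0 and pvariance(y) / m > overdispersion_ratio:
            return "negative_binomial"
        return "poisson"
    if all(0 < v < 1 for v in y):
        return "beta"  # proportions or rates
    if all(v > 0 for v in y) and fmean(y) > median(y):
        return "gamma"  # positive and right-skewed
    return "normal"


print(suggest_regression_family([0, 1, 1, 0, 1]))
print(suggest_regression_family([0, 0, 0, 0, 1, 0, 12, 0, 3, 20]))
print(suggest_regression_family([1.5, 2.0, 2.2, 9.8]))
```

The order of the checks matters: binary data would also pass the count test, so it is handled first.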

| Model Name | Description | Data Type Requirement | Example |
| --- | --- | --- | --- |
| Poisson Regression (GLM) | Models count data where events occur at a constant rate. | Output Data Type: Discrete count data. | Estimating the number of events (e.g., accidents) at different locations, and modeling the number of customer complaints per day based on weather conditions. |
| Beta Regression (GLM) | Models proportions or rates that are bounded between 0 and 1. | Output Data Type: Continuous data between 0 and 1. | Estimating the proportion of land covered by vegetation in different regions, and modeling the win percentage in sports based on team statistics. |
| Binomial Regression (GLM) | Models success/failure counts out of a fixed number of trials. | Output Data Type: Count data representing successes out of n trials. | Modeling the number of students who pass an exam based on study hours. |
| Gamma Regression (GLM) | Models continuous, skewed data whose variance increases with larger values, often time or rate data. | Output Data Type: Continuous positive data. | Estimating the time it takes for an event to occur across different locations, and modeling insurance claim amounts based on policyholder age. |
| Bernoulli (Logistic) Regression (GLM) | Models the probability of a binary outcome using a logistic (logit) link. | Output Data Type: Binary categorical data. | Estimating the likelihood of disease occurrence in different regions. |
| Normal (Linear) Regression (GLM) | Models continuous outcome data assuming a normal distribution. | Output Data Type: Continuous numeric data. | Estimating the average income across different spatial locations. |
| Negative Binomial Regression (GLM) | A regression model for count data with over-dispersion (variance exceeds mean). | Output Data Type: Discrete count data. | Modeling the number of customer complaints per week based on weather conditions and staffing levels, when complaints show high variability. |
| Poisson Regression (Neural GLM) | Models count data where events occur at a constant rate. | Output Data Type: Discrete count data. | Estimating the number of events (e.g., accidents) at different locations, and modeling the number of customer complaints per day based on weather conditions. |
| Beta Regression (Neural GLM) | Models proportions or rates that are bounded between 0 and 1. | Output Data Type: Continuous data between 0 and 1. | Estimating the proportion of land covered by vegetation in different regions, and modeling the win percentage in sports based on team statistics. |
| Binomial Regression (Neural GLM) | Models success/failure counts out of a fixed number of trials. | Output Data Type: Count data representing successes out of n trials. | Modeling the number of students who pass an exam based on study hours. |
| Gamma Regression (Neural GLM) | Models continuous, skewed data whose variance increases with larger values, often time or rate data. | Output Data Type: Continuous positive data. | Estimating the time it takes for an event to occur across different locations, and modeling insurance claim amounts based on policyholder age. |
| Bernoulli (Logistic) Regression (Neural GLM) | Models the probability of a binary outcome using a logistic (logit) link. | Output Data Type: Binary categorical data. | Estimating the likelihood of disease occurrence in different regions. |
| Normal (Linear) Regression (Neural GLM) | Models continuous outcome data assuming a normal distribution. | Output Data Type: Continuous numeric data. | Estimating the average income across different spatial locations. |
| Negative Binomial Regression (Neural GLM) | A regression model for count data with over-dispersion (variance exceeds mean). | Output Data Type: Discrete count data. | Modeling the number of customer complaints per week based on weather conditions and staffing levels, when complaints show high variability. |


Causal Inference Analysis - 6 Variations: 3 Each for Binary and Continuous Outcomes

Note: Causal Inference Analysis supports binary, categorical, or continuous treatments, and the outcome can be binary or continuous. To limit the number of combinations, only the treatment variations are shown below.

| Model Name | Description | Data Type Requirement | Example |
| --- | --- | --- | --- |
| Causal Inference with Binary Treatment | Analyzes the effect of a two-level treatment on an outcome while accounting for factors that influence both treatment assignment and outcome (control variables). | Treatment: Binary (0/1). Outcome: Continuous or binary. Control Variables: Any combination of continuous, categorical, or binary. | Estimating the effect of a new drug (treatment vs. control) on blood pressure, considering age, gender, and pre-existing conditions as confounders. |
| Causal Inference with Categorical Treatment | Examines the impact of a multi-level treatment on an outcome while controlling for variables that affect both treatment selection and outcome. | Treatment: Categorical (>2 levels). Outcome: Continuous or binary. Control Variables: Any combination of continuous, categorical, or binary. | Assessing the effect of different teaching methods (lecture, group work, online) on student test scores, accounting for factors like prior academic performance, socioeconomic status, and school resources. |
| Causal Inference with Continuous Treatment | Examines the impact of a continuous treatment on an outcome while controlling for variables that affect both treatment selection and outcome. | Treatment: Continuous. Outcome: Continuous or binary. Control Variables: Any combination of continuous, categorical, or binary. | Assessing the effect of exercise time on heart health, accounting for factors like age, education, prior medical conditions, and gender. |
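
To see why control variables matter, the sketch below compares a naive treated-versus-control difference with a stratified estimate that adjusts for one categorical confounder. Stratification is a simple stand-in chosen for illustration, not the estimator used by the product, and the data, field names, and function names are invented.

```python
from collections import defaultdict
from statistics import fmean


def naive_effect(rows):
    """Difference in mean outcome between treated and control, no adjustment."""
    treated = [r["outcome"] for r in rows if r["treated"] == 1]
    control = [r["outcome"] for r in rows if r["treated"] == 0]
    return fmean(treated) - fmean(control)


def stratified_effect(rows, confounder):
    """Average the within-stratum differences, weighted by stratum size."""
    strata = defaultdict(list)
    for r in rows:
        strata[r[confounder]].append(r)
    weighted_sum, total = 0.0, 0
    for group in strata.values():
        treated = [r["outcome"] for r in group if r["treated"] == 1]
        control = [r["outcome"] for r in group if r["treated"] == 0]
        if not treated or not control:
            continue  # skip strata without both treated and control rows
        weighted_sum += len(group) * (fmean(treated) - fmean(control))
        total += len(group)
    return weighted_sum / total


# Invented blood-pressure data: older patients are treated more often
# and also have higher blood pressure regardless of treatment.
rows = (
    [{"age_group": "older", "treated": 1, "outcome": 145}] * 3
    + [{"age_group": "older", "treated": 0, "outcome": 150}]
    + [{"age_group": "younger", "treated": 1, "outcome": 115}]
    + [{"age_group": "younger", "treated": 0, "outcome": 120}] * 3
)

print(naive_effect(rows))                    # 10.0
print(stratified_effect(rows, "age_group"))  # -5.0
```

In this toy data the naive comparison suggests the drug raises blood pressure, while the adjusted estimate recovers the within-group reduction of 5 units.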


Time-to-Event Analysis - 4 Variations

| Model Name | Description | Data Type Requirement | Example |
| --- | --- | --- | --- |
| Survival Model with Survival Curve | Analyzes time until an event occurs, showing the probability of survival over time. | Time-to-event data, event indicator (0/1). | Estimating patient survival rates after cancer diagnosis over a 5-year period. |
| Survival Model + Cohort/Date Effects | Examines survival patterns across different groups or time periods. | Time-to-event data, event indicator, cohort/date information. | Comparing the survival rates of patients diagnosed with a disease in different decades. |
| Survival Model + Covariates | Assesses how various factors influence survival time. | Time-to-event data, event indicator, relevant covariates (numeric/categorical). | Analyzing how age, gender, and treatment type affect survival time after heart surgery. |
| Survival Model + Cohort/Date Effects + Covariates | Combines cohort analysis with factor influence on survival time. | Time-to-event data, event indicator, cohort/date information, covariates. | Studying how survival rates for a disease change over decades, considering factors like age, lifestyle, and treatment advances. |
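
As a rough picture of what a survival curve is, the sketch below computes a classical Kaplan-Meier estimate from time-to-event data and a 0/1 event indicator. Kaplan-Meier is used here only as a familiar illustration; the source does not say which estimator the product uses, and the function name and data are made up.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with an event.

    durations: time until the event or until censoring.
    events: 1 if the event was observed, 0 if the record is censored.
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        observed = 0
        leaving = 0
        # Gather every record that shares this time point.
        while i < len(pairs) and pairs[i][0] == t:
            observed += pairs[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        # Both events and censored records leave the risk set.
        at_risk -= leaving
    return curve


# Made-up data: five subjects, two of them censored.
for time, prob in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(time, round(prob, 3))
```

Censored records lower the number at risk without causing a drop in the curve, which is what distinguishes survival analysis from a simple event-rate calculation.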

 

Geospatial Correlation Analysis - 5 Variations

| Model Name | Description | Data Type Requirement | Example |
| --- | --- | --- | --- |
| Geospatial Gaussian Process Normal | A model that predicts continuous outcomes at locations by capturing spatial patterns, assuming a normal distribution. | Response: Continuous data with approximately normal distribution. Inputs: Geographic coordinates (latitude/longitude). Additional Inputs (Optional): Any numeric, categorical, or binary. | Predicting temperature at different locations in a region, assuming a normal distribution. |
| Geospatial Gaussian Process Student-T | A model that predicts continuous outcomes at locations by capturing spatial patterns, robust to outliers and heavy-tailed distributions. | Response: Continuous data with potential outliers. Inputs: Geographic coordinates (latitude/longitude). Additional Inputs (Optional): Any numeric, categorical, or binary. | Predicting soil contamination levels across different locations in an industrial area, where some measurements might be extreme. |
| Geospatial Gaussian Process Poisson | A model that predicts counts at locations by capturing spatial patterns, suitable for event occurrence data. | Response: Count data with variance equal to the mean. Inputs: Geographic coordinates (latitude/longitude). Additional Inputs (Optional): Any numeric, categorical, or binary. | Predicting the number of traffic accidents at different intersections in a city. |
| Geospatial Gaussian Process Binomial | A model that predicts binary outcomes at locations by capturing spatial patterns, suitable for presence/absence data. | Response: Binary data (0/1). Inputs: Geographic coordinates (latitude/longitude). Additional Inputs (Optional): Any numeric or categorical. | Predicting the presence or absence of a specific plant species across different locations in a forest. |
| Geospatial Gaussian Process Negative Binomial | A model that predicts counts at locations by capturing spatial patterns, allowing for overdispersed count data (high variability). | Response: Count data with variance greater than the mean. Inputs: Geographic coordinates (latitude/longitude). Additional Inputs (Optional): Any numeric, categorical, or binary. | Predicting the number of disease cases across different locations in a city, where cases tend to cluster spatially and show high variability. |
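
"Capturing spatial patterns" in a Gaussian process comes down to a covariance that decays with distance between locations. The sketch below builds such a covariance matrix from latitude/longitude pairs using great-circle distance and an exponential kernel. The kernel choice, the length scale, and all names are assumptions made for illustration; the source does not describe the kernel the product uses.

```python
from math import asin, cos, exp, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1 = map(radians, p)
    lat2, lon2 = map(radians, q)
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def exponential_kernel_matrix(points, variance=1.0, length_scale_km=100.0):
    """Covariance matrix with entries variance * exp(-distance / length_scale)."""
    return [
        [variance * exp(-haversine_km(p, q) / length_scale_km) for q in points]
        for p in points
    ]


# Hypothetical locations: two nearby points and one far away.
points = [(37.77, -122.42), (37.80, -122.27), (34.05, -118.24)]
K = exponential_kernel_matrix(points)
for row in K:
    print([round(v, 3) for v in row])
```

Nearby locations get a covariance close to the variance and distant ones close to zero, so observations at one site inform predictions mostly at neighbouring sites.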

 

Univariate Time Series Modeling - 6 Variations

| Model Name | Description | Data Type Requirement | Example |
| --- | --- | --- | --- |
| Univariate Time Series Model with Outlier Detection | Analyzes a single variable over time, identifying unusual data points, robust to extreme values. | Single time-ordered variable, timestamps, potential outliers. | Detecting abnormal spikes in daily stock prices. |
| Univariate Time Series Model with Change Point | Detects significant shifts in time series patterns, robust to extreme values. | Single time-ordered variable, timestamps, potential outliers. | Identifying when a company's sales pattern fundamentally changed, accounting for occasional extreme sales days. |
| Univariate Time Series Model with Seasonality/Special Days | Captures recurring patterns and specific event impacts in time series, robust to extreme values. | Single time-ordered variable, timestamps, special day indicators. | Analyzing retail sales considering both yearly seasons and holiday effects, accounting for occasional extreme shopping days. |
| Univariate Time Series Model with Trend | Identifies long-term directional movement in time series, robust to extreme values. | Single time-ordered variable, timestamps, potential outliers. | Analyzing global temperature changes over decades, accounting for occasional extreme weather events. |
| Univariate Time Series Model with Exogenous Variables | Predicts a single variable over time using external factors, robust to outliers, assuming a heavy-tailed distribution (Student-T distribution). | Time-ordered target variable, relevant external variables. | Forecasting energy consumption using temperature and economic indicators, accounting for extreme weather events. |
| Univariate Time Series Model Forecasting | Projects future values of a single variable based on its past behavior and patterns. | Time-ordered single variable data, sufficient historical data to capture patterns. | Predicting future stock prices based on historical price data. |
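
For intuition about outlier detection that is "robust to extreme values", the sketch below flags points using a modified z-score built from the median and the median absolute deviation (MAD), so a single spike cannot inflate the scale estimate. This is a simple classical technique swapped in for illustration, not the product's time series model; the function name, the 3.5 cutoff, and the data are assumptions.

```python
from statistics import median


def robust_outliers(series, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    center = median(series)
    mad = median(abs(v - center) for v in series)
    if mad == 0:
        return []  # no spread to measure deviations against
    # 0.6745 rescales MAD to be comparable with a standard deviation
    # when the data are normally distributed.
    return [
        i for i, v in enumerate(series)
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Made-up daily prices with one abnormal spike at index 5.
prices = [100, 101, 99, 102, 100, 150, 101, 98, 100, 99]
print(robust_outliers(prices))  # [5]
```

A mean-and-standard-deviation rule on the same data would be pulled toward the spike; the median-based version keeps the baseline at 100.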

 

Multivariate Time Series Modeling - 10 Variations

| Model Name | Description | Data Type Requirement | Example |
| --- | --- | --- | --- |
| MV Time Series Model with Outlier Detection - Student-T Likelihood | A model that analyzes multiple related time series, identifies unusual data points, and handles heavy-tailed distributions. | Multiple related time series, potential outliers, and non-normal distribution. | Analyzing stock prices of multiple companies in the same industry, detecting market anomalies while accounting for extreme price movements. |
| MV Time Series Model with Correlations - Student-T Likelihood | A model that captures relationships between multiple time series while accommodating heavy-tailed distributions. | Multiple related time series, potential correlations, and non-normal distribution. | Studying the interdependencies of various economic indicators across different countries, allowing for extreme events. |
| MV Time Series Model with Change Point - Student-T Likelihood | A model that detects significant shifts in multiple related time series while handling heavy-tailed distributions. | Multiple related time series, potential structural changes, and non-normal distribution. | Identifying policy changes affecting multiple economic indicators simultaneously, accounting for extreme fluctuations. |
| MV Time Series Model with Seasonality/Special Days - Student-T Likelihood | A model that accounts for recurring patterns and specific events in multiple time series while handling heavy-tailed distributions. | Multiple related time series, seasonal patterns, special events, and non-normal distribution. | Analyzing retail sales across multiple product categories, considering holidays and seasonal trends, while allowing for extreme sales days. |
| MV Time Series Model with Global and Local Trend - Student-T Likelihood | A model that captures both overall and series-specific trends in multiple time series while accommodating heavy-tailed distributions. | Multiple related time series, global and local trends, non-normal distribution, minimum of 3 years of data. | Studying global and country-specific GDP growth trends, allowing for extreme economic events. |
| MV Time Series Model with Outlier Detection - Normal Likelihood | A model that analyzes multiple related time series and identifies unusual data points, assuming normal distribution. | Multiple related time series, potential outliers, and approximately normal distribution. | Monitoring sensor readings from multiple machines in a factory, detecting anomalies that may indicate equipment failure. |
| MV Time Series Model with Correlations - Normal Likelihood | A model that captures relationships between multiple time series, assuming a normal distribution. | Multiple related time series, potential correlations, and approximately normal distribution. | Analyzing the relationships between temperature, humidity, and energy consumption in multiple buildings. |
| MV Time Series Model with Change Point - Normal Likelihood | A model that detects significant shifts in multiple related time series, assuming a normal distribution. | Multiple related time series, potential structural changes, and approximately normal distribution. | Identifying changes in consumer behavior across multiple product categories following a major marketing campaign. |
| MV Time Series Model with Seasonality/Special Days - Normal Likelihood | A model that accounts for recurring patterns and specific events in multiple time series, assuming normal distribution. | Multiple related time series, seasonal patterns, special events, and approximately normal distribution. | Forecasting electricity demand for multiple regions, considering seasonal patterns and holidays. |

 





