Documentation
1. Installation
Download the installer for your operating system from Gumroad. On macOS, open the .dmg file and drag Scarab into your Applications folder. On Windows, run the .exe installer. On Linux, make the .AppImage executable (for example with chmod +x) and run it directly.
2. Loading Data
Start a session by loading a dataset: click the "load" button, or drag and drop a .csv or Excel file into the app. Scarab runs entirely locally and reads the file into memory without any telemetry or remote calls.
3. ScarabQL Overview
ScarabQL is a plain-text query language designed for fast exploratory data analysis. Every query follows this general structure:
[action] [target] [where filters...] [|> pipe operations...]
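The minimal Python sketch below splits a query string into those four parts. It is not Scarab's parser. The Query class and parse_query function are hypothetical, and the sketch makes three simplifying assumptions: the action is the first word, the where clause starts at the first " where ", and every |> begins a new pipe stage. As a result it does not handle |> inside quoted strings or two-sided queries such as compare.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Query:
    """Hypothetical container for the parts of a ScarabQL query."""
    action: str
    target: str
    filters: Optional[str] = None
    pipes: List[str] = field(default_factory=list)


def parse_query(text: str) -> Query:
    """Split a query into action, target, optional where clause, and pipe stages."""
    # Everything before the first "|>" is the main query; each later piece is a pipe stage.
    main, *stages = [part.strip() for part in text.split("|>")]
    # Split off the optional where clause at the first " where ".
    head, _, filters = main.partition(" where ")
    # Assume the action is the first word and the target is the rest of the head.
    action, _, target = head.strip().partition(" ")
    return Query(action, target.strip(), filters.strip() or None, stages)


query = parse_query('outliers income where region = "West" |> transform log')
print(query)
# Query(action='outliers', target='income', filters='region = "West"', pipes=['transform log'])
```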
Basic Exploration
peek [n] | Preview the top n rows of the dataset. Example: peek 10
missing | Report missing (null) values in every column, sorted by severity.
describe * | Generate standard summary statistics (mean, std, quartiles, etc.) for all numeric columns.
describe [target] | Generate a detailed profile of a single column.
find [stat] [target] [by col] | Compute a statistic (mean, median, sum, std, or var) for the target column, optionally grouped by another column. Example: find mean income by department
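If you want to check results outside the app, the sketch below shows roughly what find mean income by department and missing compute. It uses plain Python over a list of row dictionaries. This is an illustration, not the code Scarab exports. The function names find_mean_by and missing_report are hypothetical, and the sketch assumes missing values are represented as None.

```python
from collections import defaultdict
from statistics import mean


def find_mean_by(rows, target, by):
    """Approximates `find mean <target> by <by>`: mean of target within each group.

    Rows whose target value is None (assumed to mean missing) are skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        if row[target] is not None:
            groups[row[by]].append(row[target])
    return {key: mean(values) for key, values in groups.items()}


def missing_report(rows):
    """Approximates `missing`: count of None values per column, most missing first."""
    counts = defaultdict(int)
    for row in rows:
        for column, value in row.items():
            counts[column] += value is None  # True counts as 1, False as 0
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


rows = [
    {"department": "Sales", "income": 50_000, "age": 31},
    {"department": "Sales", "income": 70_000, "age": None},
    {"department": "R&D", "income": 90_000, "age": 45},
    {"department": "R&D", "income": None, "age": None},
]
print(find_mean_by(rows, "income", "department"))  # {'Sales': 60000, 'R&D': 90000}
print(missing_report(rows))  # [('age', 2), ('income', 1), ('department', 0)]
```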
Advanced Analytics & Machine Learning
predict [target] from [col1, col2...] | Fit a regression model that predicts the target from the listed input columns.
explain [target] | Identify and rank the columns that most strongly predict (drive) the target column.
outliers [target] | Detect anomalies in a column using robust statistical thresholds (IQR fences and Z-scores).
compare [target] [where..] vs [where..] | Run group comparison tests (e.g. A/B tests) between two custom subsets of the data.
correlate [col] with [col] | Evaluate the Pearson correlation and dependence between two columns.
correlate * | Generate a pairwise correlation matrix for all numeric columns.
cluster by [col1, col2...] [into k groups] | Run unsupervised K-Means clustering on the specified features.
segment [target] into [n] groups | Split a numeric column into n groups using quartile-based or equal-width grouping.
trend [target] over [date_col] | Aggregate the target chronologically by a date/time column to show it as a trend series.
rank [target] [top|bottom n] | List the n highest or lowest items, sorted by the target.
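To illustrate the statistics behind outliers and correlate, the standard-library sketch below implements three rules: Tukey's IQR fences, a Z-score rule, and the Pearson correlation coefficient. The function names, the 1.5 fence multiplier, and the Z-score threshold of 3 are conventional defaults chosen for this example. They are not documented Scarab settings.

```python
from statistics import mean, quantiles, stdev


def iqr_outliers(values, k=1.5):
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Values more than `threshold` sample standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


readings = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(readings))     # [95]
print(zscore_outliers(readings))  # []
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # approximately 1.0
```

On this sample, the IQR fences flag 95, but the Z-score rule at threshold 3 flags nothing. With only seven points, a single extreme value inflates the sample standard deviation so much that no Z-score can exceed (n - 1) / sqrt(n), which is about 2.27 here. This is one reason it helps to look at more than one rule.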
Visualizations
histogram [target] | Plot a histogram of the target variable's distribution.
scatter [x] vs [y] | Plot a scatterplot of the relationship between two features.
crosstab [col1] by [col2] | Produce a frequency cross-tabulation of how the categories of two columns intersect.
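The numbers behind a histogram and a crosstab are simple counts. This sketch shows one way to compute them in Python. The function names are hypothetical, and the default of 5 equal-width bins is an assumption for the example, since Scarab's own binning rule is not documented here.

```python
from collections import Counter


def histogram_counts(values, bins=5):
    """Count values falling into `bins` equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to width 1 when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum sits exactly on the last edge, so clamp it into the final bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


def crosstab(rows, col1, col2):
    """Frequency of each (col1 value, col2 value) combination."""
    return Counter((row[col1], row[col2]) for row in rows)


plans = [
    {"region": "West", "plan": "Pro"},
    {"region": "West", "plan": "Pro"},
    {"region": "East", "plan": "Free"},
]
print(histogram_counts([1, 2, 2, 3, 4, 5, 5, 5, 6, 10], bins=3))  # [4, 5, 1]
print(crosstab(plans, "region", "plan"))  # Counter({('West', 'Pro'): 2, ('East', 'Free'): 1})
```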
Filtering Expressions
You can add a where clause after any primary action to filter the dataset before the computation runs. ScarabQL supports operators such as =, !=, >, <, >=, <=, in, is null, and is not null.
Example: find mean salary where role = "Engineer" and age > 30
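The example above filters first and then aggregates. The following minimal Python sketch shows that order of operations. It assumes a hypothetical OPERATORS table that maps each ScarabQL operator to a Python comparison, and it treats multiple conditions as a logical AND. It is not how Scarab evaluates filters internally.

```python
import operator
from statistics import mean

# Hypothetical mapping from ScarabQL operators to Python comparisons.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "is null": lambda value, _: value is None,
    "is not null": lambda value, _: value is not None,
}


def matches(row, conditions):
    """True if the row satisfies every (column, operator, operand) condition."""
    return all(OPERATORS[op](row[col], operand) for col, op, operand in conditions)


employees = [
    {"role": "Engineer", "age": 34, "salary": 120_000},
    {"role": "Engineer", "age": 28, "salary": 95_000},
    {"role": "Designer", "age": 41, "salary": 105_000},
    {"role": "Engineer", "age": 45, "salary": 140_000},
]

# find mean salary where role = "Engineer" and age > 30
conditions = [("role", "=", "Engineer"), ("age", ">", 30)]
filtered = [row for row in employees if matches(row, conditions)]  # filter first
print(mean(row["salary"] for row in filtered))  # 130000
```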
Data Transformations and Pipelines (|>)
The pipe operator (|>) passes the output of one operation to the next, so you can chain statistical tests and transformations onto a query.
|> test significance | Run an automatic hypothesis test (e.g. Student's t-test) on the groups produced by a comparison query.
|> test normality | Test whether the values are consistent with a normal distribution.
|> test equal variance | Test whether variances are equal (homogeneous) across the compared groups.
|> transform [log|sqrt|square|normalize|minmax|winsorize|boxcox] | Apply the chosen transformation to the aggregated values.
|> smooth [n] | Apply rolling-window smoothing to a trend series, using a window of size n.
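The standard-library sketch below shows what some of these pipe stages compute:

- Welch's t statistic, a t-test variant that does not assume equal variances. The documentation does not say which variant Scarab uses.
- A trailing moving average for smooth. Whether Scarab's window is trailing or centered is an assumption.
- Min-max scaling for transform minmax.

All function names are hypothetical. Converting t and the degrees of freedom into a p-value requires the t distribution, which the standard library lacks. SciPy's scipy.stats.ttest_ind with equal_var=False performs the full Welch test.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def smooth(values, n):
    """Trailing moving average; output starts once a full window of n values exists."""
    return [sum(values[i - n + 1 : i + 1]) / n for i in range(n - 1, len(values))]


def minmax(values):
    """Rescale values linearly to [0, 1]; assumes at least two distinct values."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))  # (-1.0, 8.0)
print(smooth([1, 2, 3, 4, 5], 3))                # [2.0, 3.0, 4.0]
print(minmax([2, 4, 6]))                         # [0.0, 0.5, 1.0]
```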
Code Generation: Scarab automatically generates portable, drop-in Python and R code for every operation its engine executes. Instead of writing dataframe code by hand, you can explore with ScarabQL and export the equivalent code directly into your production scripts.