ParQL
Command-Line Parquet Query Tool

A powerful command-line tool for querying and manipulating Parquet datasets directly from the terminal. Bring pandas-like operations and SQL capabilities to your command line, powered by DuckDB.

25+ Commands | 100% CLI Native | ∞ Possibilities
$ parql head data/sales.parquet -n 5
user_id  country  revenue  timestamp
1001     US       1250.50  2024-01-01
1002     UK        890.25  2024-01-01
1003     CA       2100.75  2024-01-01
$ parql agg data/sales.parquet -g country -a "sum(revenue):total"
country  total
US       45,250.50
UK       32,890.25
CA       28,100.75

Why Choose ParQL?

Lightning Fast

Columnar processing with the DuckDB engine. ParQL reads only the columns a query needs and streams large datasets efficiently.

CLI Native

Built for the command line with beautiful terminal output, interactive shell, and ASCII visualizations.

Cloud Ready

Native support for S3, GCS, Azure, HDFS, and HTTP. Works with data anywhere, from cloud object stores to distributed file systems.

Rich Analytics

25+ commands covering data exploration, aggregation, visualization, and quality checks.

Memory Efficient

Smart caching, parallel execution, and configurable memory limits let you handle datasets larger than RAM.

SQL Power

Full SQL support with DuckDB. Complex queries, window functions, and custom expressions.
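
As a sketch of what full SQL support enables, a window-function query can run through parql sql, registering the file under an alias with -p as shown in the quick reference; the column names here are illustrative:

```shell
# Rank each country's orders by revenue using a SQL window function.
# (Illustrative: assumes data/sales.parquet has country and revenue columns.)
parql sql "SELECT country, revenue,
                  rank() OVER (PARTITION BY country ORDER BY revenue DESC) AS rnk
           FROM t" \
  -p t=data/sales.parquet
```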

Quick Start

1. Install ParQL

# Install from PyPI
pip install parql

# Or install from source
git clone https://github.com/abdulrafey38/parql.git
cd parql
pip install -e .
2. Explore Your Data

# Preview data
parql head data/sales.parquet -n 10

# Check schema
parql schema data/sales.parquet

# Data profiling
parql profile data/sales.parquet
3. Analyze and Visualize

# Aggregations
parql agg data/sales.parquet -g country -a "sum(revenue):total"

# Visualizations
parql plot data/sales.parquet -c revenue --chart-type hist

# Interactive mode
parql shell

# Remote data sources
parql head hdfs://localhost/tmp/data.parquet

Command Reference

Data Exploration

parql head        Preview first N rows
parql tail        Preview last N rows
parql schema      Show column information
parql count       Count rows
parql sample      Random sampling
parql distinct    Unique values

Analytics

parql agg            Group and aggregate
parql window         Window functions
parql pivot          Pivot tables
parql corr           Correlation analysis
parql percentiles    Percentile statistics

Data Processing

parql select     Filter and select columns
parql join       Join datasets
parql sql        Custom SQL queries
parql str        String operations
parql pattern    Pattern matching
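
A typical processing sequence might join user attributes onto sales, then project a subset of columns; both flags appear in the quick reference below, while the file and column names here are illustrative:

```shell
# Join two datasets on a shared key
parql join data/users.parquet data/sales.parquet --on "user_id"

# Project only the columns of interest
parql select data/sales.parquet -c "user_id,country,revenue"
```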

Visualization

parql plot        ASCII charts
parql profile     Data profiling
parql outliers    Outlier detection
parql nulls       Null analysis
parql hist        Histograms

Data Quality

parql assert            Data validation
parql compare-schema    Schema comparison
parql infer-types       Type optimization

System

parql shell     Interactive REPL
parql config    Configuration
parql cache     Cache management
parql write     Export data

Need More Details?

Explore the complete command reference with detailed examples, options, and use cases for all 25+ ParQL commands.

Quick Command Reference

parql head data.parquet -n 10                                        # Preview first 10 rows
parql agg data.parquet -g country -a "sum(revenue):total"            # Group by country, sum revenue
parql profile data.parquet                                           # Data quality report
parql plot data.parquet -c revenue --chart-type hist                 # Create histogram
parql sql "SELECT * FROM t WHERE revenue > 1000" -p t=data.parquet   # Custom SQL query
parql join users.parquet sales.parquet --on "user_id"                # Join two datasets
parql shell                                                          # Interactive mode
parql write data.parquet output.csv --format csv                     # Export to CSV
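
Because every command can emit plain CSV, ParQL composes with standard Unix tools in a pipeline; as a sketch (column positions here are illustrative):

```shell
# Top five countries by total revenue: aggregate as CSV,
# sort numerically on the second field, keep the first five rows
parql agg data/sales.parquet -g country -a "sum(revenue):total" --format csv \
  | sort -t, -k2 -nr \
  | head -n 5
```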

Examples

Data Exploration Examples

Quick Overview

# Preview data
parql head data/sales.parquet -n 5

# Check schema
parql schema data/sales.parquet

# Count total rows
parql count data/sales.parquet

# Sample data for quick analysis
parql sample data/sales.parquet --rows 1000

Data Profiling

# Comprehensive profiling
parql profile data/sales.parquet --include-all

# Profile specific columns
parql profile data/users.parquet -c "age,country,plan"

# Check for nulls
parql nulls data/sales.parquet

Data Analysis Examples

Aggregations

# Basic grouping
parql agg data/sales.parquet -g "country" -a "sum(revenue):total"

# Multiple aggregations
parql agg data/sales.parquet -g "country,device" \
  -a "sum(revenue):total,avg(revenue):avg_rev,count():orders"

# With filtering
parql agg data/sales.parquet -g "user_id" \
  -a "sum(revenue):total" -h "total > 1000"

Window Functions

# Ranking within groups
parql window data/sales.parquet \
  --partition "country" \
  --order "revenue DESC" \
  --expr "row_number() as rank"

# Running totals
parql window data/sales.parquet \
  --partition "user_id" \
  --order "timestamp" \
  --expr "sum(revenue) over (rows unbounded preceding) as running_total"

Visualization Examples

ASCII Charts

# Histogram
parql plot data/sales.parquet -c revenue \
  --chart-type hist --bins 20 --width 60

# Bar chart
parql plot data/sales.parquet -c country \
  --chart-type bar --width 50 --limit 10

# Scatter plot
parql plot data/sales.parquet -c revenue \
  --chart-type scatter -x quantity --limit 100

Statistical Analysis

# Correlation matrix
parql corr data/sales.parquet -c "quantity,price,revenue"

# Percentiles
parql percentiles data/sales.parquet -c "revenue"

# Outlier detection
parql outliers data/sales.parquet -c revenue \
  --method zscore --threshold 3

Data Quality Examples

Validation

# Basic assertions
parql assert data/sales.parquet --rule "row_count > 1000"
parql assert data/sales.parquet --rule "no_nulls(user_id)"
parql assert data/sales.parquet --rule "unique(order_id)"

# Custom conditions
parql assert data/sales.parquet --rule "min(revenue) >= 0"
parql assert data/sales.parquet --rule "max(discount) <= 1.0"

Schema Management

# Compare schemas
parql compare-schema data/old.parquet data/new.parquet

# Type optimization
parql infer-types data/sales.parquet --suggest-types

# Schema validation
parql compare-schema data/expected.parquet \
  data/actual.parquet --fail-on-change

Remote Data Sources

ParQL supports reading from various remote data sources including cloud storage and distributed file systems.

AWS S3

export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
parql head s3://bucket/path/data.parquet

Google Cloud Storage

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
parql agg gs://bucket/data/*.parquet -g country -a "sum(revenue):total"

# Public GCS Datasets
parql head gs://anonymous@voltrondata-labs-datasets/diamonds/cut=Good/part-0.parquet

Azure Blob Storage

export AZURE_STORAGE_ACCOUNT=your_account
export AZURE_STORAGE_KEY=your_key

# Azure Data Lake Storage (Gen2)
parql head abfs://container@account.dfs.core.windows.net/path/data.parquet

# Azure Blob Storage (Hadoop-style)
parql head wasbs://container@account.blob.core.windows.net/path/data.parquet

# Public Azure files via HTTPS
parql head https://account.blob.core.windows.net/container/path/data.parquet

HDFS (Hadoop)

export HDFS_NAMENODE=localhost
export HDFS_PORT=9000
parql head hdfs://localhost/tmp/save/part-r-00000-6a3ccfae-5eb9-4a88-8ce8-b11b2644d5de.gz.parquet

HTTP/HTTPS

parql schema https://example.com/data.parquet

Local & Glob Patterns

parql head "data/2024/*.parquet" -n 10
parql agg "data/sales/year=*/month=*/*.parquet" -g year,month

Complete API Reference

Global Options

These options work with all ParQL commands:

--threads INTEGER                                Number of processing threads
--memory-limit TEXT                              Memory limit (e.g., 4GB)
--format [table|csv|tsv|json|ndjson|markdown]    Output format
--verbose                                        Verbose output with additional information
--quiet                                          Minimal output mode
--max-width INTEGER                              Maximum display width
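
Global options precede the subcommand, following the same pattern as the parql --quiet example elsewhere in this reference; a sketch combining several (the values are illustrative, not recommendations):

```shell
# Cap parallelism and memory for a large aggregation, with minimal output
parql --threads 8 --memory-limit 4GB --quiet \
  agg "data/sales/*.parquet" -g country -a "sum(revenue):total" --format csv
```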

Output Formats

All commands support multiple output formats:

# Table format (default, rich formatting)
parql head data/sales.parquet --format table

# CSV output
parql agg data/sales.parquet -g country -a "sum(revenue):total" --format csv

# JSON output
parql select data/sales.parquet -c "country,revenue" --format json

# Markdown tables
parql schema data/sales.parquet --format markdown

# Quiet mode (minimal output)
parql --quiet count data/sales.parquet
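
NDJSON output pairs naturally with jq, since each row arrives as one JSON object per line; a sketch, assuming jq is installed and the column names are illustrative:

```shell
# Emit rows as NDJSON and filter/extract fields with jq
parql select data/sales.parquet -c "country,revenue" --format ndjson \
  | jq -r 'select(.revenue > 1000) | .country'
```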