ParQL
Command-Line Parquet Query Tool

A powerful command-line tool for querying and manipulating Parquet datasets directly from the terminal. Bring pandas-like operations and SQL capabilities to your command line, powered by DuckDB.

25+ Commands | 100% CLI Native | ∞ Possibilities
$ parql head data/sales.parquet -n 5
user_id  country  revenue  timestamp
1001     US       1250.50  2024-01-01
1002     UK        890.25  2024-01-01
1003     CA       2100.75  2024-01-01
$ parql agg data/sales.parquet -g country -a "sum(revenue):total"
country  total
US       45,250.50
UK       32,890.25
CA       28,100.75

Why Choose ParQL?

Lightning Fast

Columnar processing with the DuckDB engine. ParQL reads only the columns a query needs and streams large datasets efficiently.

CLI Native

Built for the command line with beautiful terminal output, interactive shell, and ASCII visualizations.

Cloud Ready

Native support for S3, GCS, Azure, HDFS, and HTTP. Works with data anywhere, from cloud object stores to distributed file systems.

Rich Analytics

25+ commands covering data exploration, aggregation, visualization, and quality checks.

Memory Efficient

Smart caching, parallel execution, and configurable memory limits let you handle datasets larger than RAM.

SQL Power

Full SQL support with DuckDB. Complex queries, window functions, and custom expressions.
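
As a sketch of what full SQL support enables, a window-function query can run through parql sql, registering the file under an alias with -p as shown in the quick reference; the column names here are illustrative:

```shell
# Rank each country's orders by revenue using a SQL window function.
# (Illustrative: assumes data/sales.parquet has country and revenue columns.)
parql sql "SELECT country, revenue,
                  rank() OVER (PARTITION BY country ORDER BY revenue DESC) AS rnk
           FROM t" \
  -p t=data/sales.parquet
```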

Quick Start

1. Install ParQL

# Install from PyPI
pip install parql

# Or install from source
git clone https://github.com/abdulrafey38/parql.git
cd parql
pip install -e .
2. Explore Your Data

# Preview data
parql head data/sales.parquet -n 10

# Check schema
parql schema data/sales.parquet

# Data profiling
parql profile data/sales.parquet
3. Analyze and Visualize

# Aggregations
parql agg data/sales.parquet -g country -a "sum(revenue):total"

# Visualizations
parql plot data/sales.parquet -c revenue --chart-type hist

# Interactive mode
parql shell

# Remote data sources
parql head hdfs://localhost/tmp/data.parquet

Command Reference

Data Exploration

parql head        Preview first N rows
parql tail        Preview last N rows
parql schema      Show column information
parql count       Count rows
parql sample      Random sampling
parql distinct    Unique values

Analytics

parql agg            Group and aggregate
parql window         Window functions
parql pivot          Pivot tables
parql corr           Correlation analysis
parql percentiles    Percentile statistics

Data Processing

parql select     Filter and select columns
parql join       Join datasets
parql sql        Custom SQL queries
parql str        String operations
parql pattern    Pattern matching
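
A typical processing sequence might join user attributes onto sales, then project a subset of columns; both flags appear in the quick reference below, while the file and column names here are illustrative:

```shell
# Join two datasets on a shared key
parql join data/users.parquet data/sales.parquet --on "user_id"

# Project only the columns of interest
parql select data/sales.parquet -c "user_id,country,revenue"
```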

Visualization

parql plot        ASCII charts
parql profile     Data profiling
parql outliers    Outlier detection
parql nulls       Null analysis
parql hist        Histograms

Data Quality

parql assert            Data validation
parql compare-schema    Schema comparison
parql infer-types       Type optimization

System

parql shell     Interactive REPL
parql config    Configuration
parql cache     Cache management
parql write     Export data

Need More Details?

Explore the complete command reference with detailed examples, options, and use cases for all 25+ ParQL commands.

Quick Command Reference

parql head data.parquet -n 10                                        # Preview first 10 rows
parql agg data.parquet -g country -a "sum(revenue):total"            # Group by country, sum revenue
parql profile data.parquet                                           # Data quality report
parql plot data.parquet -c revenue --chart-type hist                 # Create histogram
parql sql "SELECT * FROM t WHERE revenue > 1000" -p t=data.parquet   # Custom SQL query
parql join users.parquet sales.parquet --on "user_id"                # Join two datasets
parql shell                                                          # Interactive mode
parql write data.parquet output.csv --format csv                     # Export to CSV
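
Because every command can emit plain CSV, ParQL composes with standard Unix tools in a pipeline; as a sketch (column positions here are illustrative):

```shell
# Top five countries by total revenue: aggregate as CSV,
# sort numerically on the second field, keep the first five rows
parql agg data/sales.parquet -g country -a "sum(revenue):total" --format csv \
  | sort -t, -k2 -nr \
  | head -n 5
```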

Examples

Data Exploration Examples

Quick Overview

# Preview data
parql head data/sales.parquet -n 5

# Check schema
parql schema data/sales.parquet

# Count total rows
parql count data/sales.parquet

# Sample data for quick analysis
parql sample data/sales.parquet --rows 1000

Data Profiling

# Comprehensive profiling
parql profile data/sales.parquet --include-all

# Profile specific columns
parql profile data/users.parquet -c "age,country,plan"

# Check for nulls
parql nulls data/sales.parquet

Data Analysis Examples

Aggregations

# Basic grouping
parql agg data/sales.parquet -g "country" -a "sum(revenue):total"

# Multiple aggregations
parql agg data/sales.parquet -g "country,device" \
  -a "sum(revenue):total,avg(revenue):avg_rev,count():orders"

# With filtering
parql agg data/sales.parquet -g "user_id" \
  -a "sum(revenue):total" -h "total > 1000"

Window Functions

# Ranking within groups
parql window data/sales.parquet \
  --partition "country" \
  --order "revenue DESC" \
  --expr "row_number() as rank"

# Running totals
parql window data/sales.parquet \
  --partition "user_id" \
  --order "timestamp" \
  --expr "sum(revenue) over (rows unbounded preceding) as running_total"

Visualization Examples

ASCII Charts

# Histogram
parql plot data/sales.parquet -c revenue \
  --chart-type hist --bins 20 --width 60

# Bar chart
parql plot data/sales.parquet -c country \
  --chart-type bar --width 50 --limit 10

# Scatter plot
parql plot data/sales.parquet -c revenue \
  --chart-type scatter -x quantity --limit 100

Statistical Analysis

# Correlation matrix
parql corr data/sales.parquet -c "quantity,price,revenue"

# Percentiles
parql percentiles data/sales.parquet -c "revenue"

# Outlier detection
parql outliers data/sales.parquet -c revenue \
  --method zscore --threshold 3

Data Quality Examples

Validation

# Basic assertions
parql assert data/sales.parquet --rule "row_count > 1000"
parql assert data/sales.parquet --rule "no_nulls(user_id)"
parql assert data/sales.parquet --rule "unique(order_id)"

# Custom conditions
parql assert data/sales.parquet --rule "min(revenue) >= 0"
parql assert data/sales.parquet --rule "max(discount) <= 1.0"

Schema Management

# Compare schemas
parql compare-schema data/old.parquet data/new.parquet

# Type optimization
parql infer-types data/sales.parquet --suggest-types

# Schema validation
parql compare-schema data/expected.parquet \
  data/actual.parquet --fail-on-change

Remote Data Sources

ParQL supports reading from various remote data sources including cloud storage and distributed file systems.

AWS S3

export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
parql head s3://bucket/path/data.parquet

Google Cloud Storage

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
parql agg gs://bucket/data/*.parquet -g country -a "sum(revenue):total"

# Public GCS Datasets
parql head gs://anonymous@voltrondata-labs-datasets/diamonds/cut=Good/part-0.parquet

Azure Blob Storage

export AZURE_STORAGE_ACCOUNT=your_account
export AZURE_STORAGE_KEY=your_key

# Azure Data Lake Storage (Gen2)
parql head abfs://container@account.dfs.core.windows.net/path/data.parquet

# Azure Blob Storage (Hadoop-style)
parql head wasbs://container@account.blob.core.windows.net/path/data.parquet

# Public Azure files via HTTPS
parql head https://account.blob.core.windows.net/container/path/data.parquet

HDFS (Hadoop)

export HDFS_NAMENODE=localhost
export HDFS_PORT=9000
parql head hdfs://localhost/tmp/save/part-r-00000-6a3ccfae-5eb9-4a88-8ce8-b11b2644d5de.gz.parquet

HTTP/HTTPS

parql schema https://example.com/data.parquet

Local & Glob Patterns

parql head "data/2024/*.parquet" -n 10
parql agg "data/sales/year=*/month=*/*.parquet" -g year,month

Complete API Reference

Global Options

These options work with all ParQL commands:

--threads INTEGER                                Number of processing threads
--memory-limit TEXT                              Memory limit (e.g., 4GB)
--format [table|csv|tsv|json|ndjson|markdown]    Output format
--verbose                                        Verbose output with additional information
--quiet                                          Minimal output mode
--max-width INTEGER                              Maximum display width
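
Global options precede the subcommand, following the same pattern as the parql --quiet example elsewhere in this reference; a sketch combining several (the values are illustrative, not recommendations):

```shell
# Cap parallelism and memory for a large aggregation, with minimal output
parql --threads 8 --memory-limit 4GB --quiet \
  agg "data/sales/*.parquet" -g country -a "sum(revenue):total" --format csv
```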

Output Formats

All commands support multiple output formats:

# Table format (default, rich formatting)
parql head data/sales.parquet --format table

# CSV output
parql agg data/sales.parquet -g country -a "sum(revenue):total" --format csv

# JSON output
parql select data/sales.parquet -c "country,revenue" --format json

# Markdown tables
parql schema data/sales.parquet --format markdown

# Quiet mode (minimal output)
parql --quiet count data/sales.parquet
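
NDJSON output pairs naturally with jq, since each row arrives as one JSON object per line; a sketch, assuming jq is installed and the column names are illustrative:

```shell
# Emit rows as NDJSON and filter/extract fields with jq
parql select data/sales.parquet -c "country,revenue" --format ndjson \
  | jq -r 'select(.revenue > 1000) | .country'
```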