
Overview

What it does

ClickHouse is a high-performance, column-oriented database designed for real-time analytics on massive datasets. Through Studio’s natural language interface, you can query billions of rows in milliseconds without writing SQL.

Key Features:
  • Real-time SQL query execution on large datasets
  • Millisecond response times for analytics
  • Natural language to SQL conversion
  • Column-oriented storage for fast aggregations
  • Distributed query processing
  • Time-series data optimization
  • Read-only queries for data safety
  • Database and table schema exploration

Use Cases

ClickHouse excels at OLAP analytics and business intelligence:
  • Business Dashboards: Real-time metrics and KPIs
  • Log Analysis: Monitor application and system logs
  • User Analytics: Track behavior and engagement patterns
  • E-commerce Insights: Sales, inventory, and customer analytics
  • Financial Analysis: Transaction and market data processing
  • Time-Series Data: IoT sensors, metrics, and monitoring
  • Ad-hoc Exploration: Quick data discovery and analysis
  • Funnel & Cohort Analysis: User journey and retention metrics
ClickHouse can analyze billions of rows in milliseconds, making it ideal for real-time business intelligence and interactive analytics dashboards.

Quick Start

Step 1: Set Up ClickHouse Instance

You need a ClickHouse database to connect to. Choose one option:

Option 1: ClickHouse Cloud (Recommended)
  1. Sign up at clickhouse.cloud/signup
  2. Create a new service
  3. Note your connection details from the dashboard:
    • Host: xyz.clickhouse.cloud (your unique hostname)
    • Port: 8443 (HTTPS default)
    • Database: default (or your custom database)
    • Username: default (or custom user)
    • Password: Set during service creation
Free Tier Includes:
  • Up to 25GB storage
  • Fully managed infrastructure
  • Automatic backups and scaling
  • SSL/TLS encryption by default
  • No infrastructure management
Option 2: ClickHouse SQL Playground (Demo)

Test with ClickHouse’s demo instance:
  • Host: sql-clickhouse.clickhouse.com
  • Port: 8443
  • Username: demo
  • Password: (leave empty)
  • Database: default
Perfect for trying ClickHouse before creating an account!

Option 3: Self-Hosted
  • Install ClickHouse on your infrastructure
  • Configure network access and SSL
  • Set up authentication
  • Note connection details
ClickHouse Cloud is the easiest option - fully managed, secure by default, with automatic scaling. Perfect for business users.
Step 2: Add to NimbleBrain Studio

  1. Open NimbleBrain Studio
  2. Navigate to Settings → MCP Servers
  3. Click Add Server
  4. Search for “ClickHouse” in the server registry
  5. Click Configure
  6. Enter your connection details:
    • CLICKHOUSE_HOST: Your ClickHouse hostname
    • CLICKHOUSE_PORT: 8443 (Cloud) or 8123 (self-hosted HTTP)
    • CLICKHOUSE_USER: Your username (usually default)
    • CLICKHOUSE_PASSWORD: Your password
    • CLICKHOUSE_DATABASE: Database name (optional, defaults to default)
    • CLICKHOUSE_SECURE: true (recommended for Cloud)
  7. Click Test Connection to verify
  8. Click Save & Enable
For ClickHouse Cloud, always use CLICKHOUSE_SECURE=true and port 8443. For local development without SSL, use false and port 8123.
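The secure/port pairing rule in this tip can be captured in a small helper. A hedged sketch (the function names are illustrative, not part of Studio or the MCP server):

```python
# Conventional ClickHouse HTTP ports, mirroring the
# CLICKHOUSE_SECURE / CLICKHOUSE_PORT pairing described above.
def default_port(secure: bool) -> int:
    return 8443 if secure else 8123

def check_pairing(secure: bool, port: int) -> list[str]:
    """Return warnings when the TLS flag and port disagree."""
    warnings = []
    if secure and port == 8123:
        warnings.append("CLICKHOUSE_SECURE=true usually pairs with port 8443, not 8123")
    if not secure and port == 8443:
        warnings.append("Port 8443 normally expects TLS; set CLICKHOUSE_SECURE=true")
    return warnings

check_pairing(True, 8443)   # [] - a consistent Cloud setup
check_pairing(True, 8123)   # one warning about the mismatch
```

Running a check like this before saving the configuration catches the most common connection mistake (see Troubleshooting below).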
Step 3: Test Your Connection

In your Studio chat, try this prompt:

“List all databases in my ClickHouse instance”

What happens:
  • Studio calls the list_databases tool
  • ClickHouse returns all database names
  • You see available databases
Next, explore a database:

“Show me all tables in the default database”

You’ll see table names, schemas, row counts, and column details!
If you see your databases and tables, your ClickHouse connection is working perfectly!

Available Tools

Execute SELECT queries on your ClickHouse database using natural language. Studio converts your questions into optimized SQL automatically.

Parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | SQL SELECT query to execute |
Returns:
  • columns: Array of column names
  • rows: Array of result rows (each row is an array of values)
Example Usage:

Natural language in Studio:
"Show me the top 10 products by sales from the orders table"
SQL Generated (automatic):
SELECT
    product_name,
    SUM(quantity * price) as total_sales
FROM orders
WHERE date >= today() - INTERVAL 30 DAY
GROUP BY product_name
ORDER BY total_sales DESC
LIMIT 10
Example Response:
{
  "columns": ["product_name", "total_sales"],
  "rows": [
    ["Widget Pro", 125000.50],
    ["Gadget Plus", 98500.25],
    ["Tool Kit", 87300.00]
  ]
}
All queries run with readonly=1 for safety - no accidental data modifications possible!
Query timeout is configurable (default: 30 seconds). For long-running queries, increase CLICKHOUSE_MCP_QUERY_TIMEOUT.
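The columns/rows payload shown above can be reshaped into one dictionary per row, which is often easier to work with downstream. A minimal sketch (the sample data is the example response from this page, not live output):

```python
# Reshape a {"columns": [...], "rows": [...]} query result
# into a list of per-row dictionaries.
def rows_to_dicts(result):
    cols = result["columns"]
    return [dict(zip(cols, row)) for row in result["rows"]]

example = {
    "columns": ["product_name", "total_sales"],
    "rows": [
        ["Widget Pro", 125000.50],
        ["Gadget Plus", 98500.25],
        ["Tool Kit", 87300.00],
    ],
}
records = rows_to_dicts(example)
# records[0] == {"product_name": "Widget Pro", "total_sales": 125000.5}
```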
List all databases in your ClickHouse instance. Perfect for exploring what data is available.

Parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| (none) | – | – | No parameters required |
Returns:
  • JSON array of database names
Example Usage:

Natural language in Studio:
"What databases are available in ClickHouse?"
or simply:
"List all databases"
Example Response:
[
  "default",
  "system",
  "analytics",
  "logs",
  "events"
]
Use this as your first step when connecting to explore what data is available. Then drill down into specific databases.
System databases like system contain ClickHouse metadata and monitoring information. Your business data is typically in default or custom databases.
List all tables in a database with detailed schema information including columns, data types, row counts, and storage statistics.

Parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| database | string | Yes | Name of the database to query |
| like | string | No | Filter tables by pattern (e.g., 'user%' for tables starting with “user”) |
| not_like | string | No | Exclude tables by pattern (e.g., '%_tmp' to exclude temporary tables) |
Returns:
  • Array of table objects with comprehensive metadata:
    • database: Database name
    • name: Table name
    • engine: Storage engine (e.g., MergeTree)
    • columns: Array of column details (name, type, default, comment)
    • total_rows: Number of rows in table
    • total_bytes: Storage size in bytes
    • primary_key: Primary key columns
    • sorting_key: Sort order for optimization
    • comment: Table description
    • Plus: engine details, compression stats, partition info
Example Usage:

Natural language in Studio:
"Show me all tables in the analytics database"
With filtering:
"List all user-related tables in the default database"
(Studio automatically applies a LIKE filter for “user”)

Exclude temporary tables:
"Show me production tables in analytics, excluding any temp tables"
Example Response:
[
  {
    "database": "analytics",
    "name": "events",
    "engine": "MergeTree",
    "total_rows": 1250000000,
    "total_bytes": 45000000000,
    "primary_key": "date, user_id",
    "sorting_key": "date, user_id, event_type",
    "comment": "User behavior events",
    "columns": [
      {
        "name": "date",
        "column_type": "Date",
        "comment": "Event date"
      },
      {
        "name": "user_id",
        "column_type": "UInt64",
        "comment": "User identifier"
      },
      {
        "name": "event_type",
        "column_type": "String",
        "comment": "Type of event"
      }
    ]
  }
]
Use this to understand your data structure before writing queries. Pay attention to primary_key and sorting_key for optimal query performance!
The total_rows and total_bytes fields help you understand data volume. ClickHouse can handle billions of rows efficiently!
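The like and not_like parameters use SQL LIKE syntax, where % matches any run of characters and _ matches a single character. A hedged client-side sketch of that filtering logic, useful for previewing which table names a pattern will match (the helper names are illustrative, not part of the MCP server):

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (% = any run, _ = one char) to a regex."""
    body = "".join(
        ".*" if c == "%" else "." if c == "_" else re.escape(c)
        for c in pattern
    )
    return re.compile(f"^{body}$")

def filter_tables(names, like=None, not_like=None):
    """Apply like/not_like patterns to a list of table names."""
    keep = list(names)
    if like:
        rx = like_to_regex(like)
        keep = [n for n in keep if rx.match(n)]
    if not_like:
        rx = like_to_regex(not_like)
        keep = [n for n in keep if not rx.match(n)]
    return keep

tables = ["users", "user_events", "orders", "events_tmp"]
filter_tables(tables, like="user%")      # ['users', 'user_events']
filter_tables(tables, not_like="%_tmp")  # ['users', 'user_events', 'orders']
```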
Execute SQL queries using chDB, an embedded ClickHouse engine that can query files, URLs, and databases without ETL processes. Perfect for ad-hoc data exploration from various sources.

Parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | SQL query to execute with chDB |
Returns:
  • JSON array of result objects (each row as an object with column names as keys)
What is chDB?
  • Embedded ClickHouse engine (no separate server needed)
  • Query any source: Local files (CSV, JSON, Parquet), URLs, databases
  • In-memory or file-based storage
  • No ETL required: Query data directly from source
  • Perfect for: Data exploration, prototyping, lightweight analytics
Example Usage:

Query a CSV file:
"Use chDB to analyze this CSV: SELECT * FROM file('data.csv', CSV) LIMIT 10"
Query a URL:
"Use chDB to get data from this URL: SELECT * FROM url('https://example.com/data.json', JSONEachRow)"
In-memory analytics:
"Use chDB to calculate: SELECT sum(value) FROM (SELECT 1 as value UNION ALL SELECT 2 UNION ALL SELECT 3)"
Example Response:
[
  {"id": 1, "name": "Product A", "sales": 1000},
  {"id": 2, "name": "Product B", "sales": 1500},
  {"id": 3, "name": "Product C", "sales": 2000}
]
chDB is optional and disabled by default. Enable it by setting CHDB_ENABLED=true in your Studio configuration. Requires CHDB_DATA_PATH for persistent storage or defaults to in-memory.
Use chDB for quick data exploration without loading data into ClickHouse. Great for testing queries on sample data before production deployment!

Authentication

Database Credentials Required: ClickHouse requires connection authentication for secure data access.

Connection Configuration

Configure these parameters in NimbleBrain Studio (Settings → MCP Servers → ClickHouse):

Required Parameters:
| Parameter | Description | Example |
| --- | --- | --- |
| CLICKHOUSE_HOST | Your ClickHouse server hostname | abc123.clickhouse.cloud |
| CLICKHOUSE_USER | Database username | default |
| CLICKHOUSE_PASSWORD | Database password | your_secure_password |
Optional Parameters:
| Parameter | Description | Default | Notes |
| --- | --- | --- | --- |
| CLICKHOUSE_PORT | Connection port | 8443 (secure) or 8123 (HTTP) | Auto-set based on CLICKHOUSE_SECURE |
| CLICKHOUSE_DATABASE | Default database | default | Connect to a specific database |
| CLICKHOUSE_SECURE | Enable SSL/TLS | true | Always true for ClickHouse Cloud |
| CLICKHOUSE_VERIFY | Verify SSL certificates | true | Recommended for production |
| CLICKHOUSE_CONNECT_TIMEOUT | Connection timeout (seconds) | 30 | Increase for slow networks |
| CLICKHOUSE_SEND_RECEIVE_TIMEOUT | Query timeout (seconds) | 300 | Increase for long-running queries |
For ClickHouse Cloud: Use CLICKHOUSE_SECURE=true and port 8443. SSL/TLS is enabled by default with automatic certificate handling.
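ClickHouse’s HTTP interface accepts queries as URL parameters, which is how these settings come together at connection time. A sketch that only assembles the URL from the parameters above (no network call is made; the function name is illustrative):

```python
from urllib.parse import urlencode

# Assemble a ClickHouse HTTP-interface URL from the connection
# parameters described above. Credentials are sent separately
# (e.g., HTTP basic auth), never embedded in the URL.
def clickhouse_url(host, port=8443, secure=True, database="default", query="SELECT 1"):
    scheme = "https" if secure else "http"
    params = urlencode({"database": database, "query": query})
    return f"{scheme}://{host}:{port}/?{params}"

clickhouse_url("abc123.clickhouse.cloud", query="SELECT version()")
# 'https://abc123.clickhouse.cloud:8443/?database=default&query=SELECT+version%28%29'
```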

ClickHouse Cloud vs Self-Hosted

ClickHouse Cloud (Recommended):
| Tier | Storage | Compute | Price | Best For |
| --- | --- | --- | --- | --- |
| Free | 25GB | 8GB RAM | $0/mo | Testing, small projects |
| Development | 100GB | 16GB RAM | ~$50/mo | Dev/staging environments |
| Production | 1TB+ | 32GB+ RAM | ~$300+/mo | Production workloads |
| Enterprise | Custom | Custom | Custom | Large-scale deployments |
Benefits:
  • ✅ Fully managed (no DevOps)
  • ✅ Automatic backups
  • ✅ SSL/TLS by default
  • ✅ Auto-scaling
  • ✅ High availability
Self-Hosted:
  • Free open-source software
  • You manage infrastructure costs (compute, storage, networking)
  • Full control over configuration
  • Requires DevOps expertise for:
    • Installation and upgrades
    • Backup and recovery
    • Security and SSL setup
    • Performance tuning
    • Monitoring and alerts
Self-hosted deployments require significant DevOps expertise. ClickHouse Cloud is recommended for business users who want to focus on analytics, not infrastructure management.

Security Best Practices

Create a dedicated read-only user for Studio to prevent accidental data modification:
-- Create read-only user
CREATE USER studio_readonly IDENTIFIED BY 'secure_password_here';

-- Grant SELECT permission on specific databases
GRANT SELECT ON analytics.* TO studio_readonly;
GRANT SELECT ON logs.* TO studio_readonly;

-- Allow listing databases and tables
GRANT SHOW DATABASES ON *.* TO studio_readonly;
GRANT SHOW TABLES ON *.* TO studio_readonly;
Why read-only?
  • Prevents accidental data deletion or modification
  • Safe for business users to explore data
  • ClickHouse MCP already enforces readonly=1, but database-level permissions add extra security
All Studio queries automatically run with readonly=1 setting, ensuring queries cannot modify data even if the user has write permissions.
Encrypt data in transit between Studio and ClickHouse:

ClickHouse Cloud:
  • SSL/TLS enabled by default
  • Automatic certificate management
  • Just set CLICKHOUSE_SECURE=true
Self-Hosted:
  1. Configure SSL certificates in ClickHouse server
  2. Set CLICKHOUSE_SECURE=true
  3. Set CLICKHOUSE_VERIFY=true to validate certificates
Studio Settings:
CLICKHOUSE_SECURE=true
CLICKHOUSE_VERIFY=true
CLICKHOUSE_PORT=8443
Never use CLICKHOUSE_SECURE=false with sensitive data in production. Unencrypted connections expose your data to network sniffing.
Control which IPs can connect to your ClickHouse instance:

ClickHouse Cloud:
  • Configure IP allowlists in the Cloud console
  • Add your office IPs
  • Add Studio’s IP ranges (contact NimbleBrain support)
Self-Hosted:
  • Configure firewall rules (iptables, security groups)
  • Limit access to known IPs
  • Use VPN for remote access
  • Disable public internet access if possible
For maximum security, place ClickHouse behind a VPN and only allow connections from authorized networks.
Password security best practices:

Strong Password Requirements:
  • Minimum 16 characters
  • Mix of uppercase, lowercase, numbers, symbols
  • Not reused from other services
  • Not based on dictionary words
  • Unique per environment (dev, staging, prod)
Password Rotation:
  • Rotate every 90 days
  • Rotate immediately if:
    • Employee leaves with access
    • Suspected credential compromise
    • Security audit recommendation
Password Management:
  • Store in a password manager (1Password, LastPass, etc.)
  • Never commit passwords to git repos
  • Use environment variables, not hardcoded values
Default passwords like “clickhouse” or “password123” are the #1 cause of database breaches. Always use strong, unique passwords.
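The requirements listed above are easy to satisfy programmatically. A hedged sketch using Python’s secrets module (the function name and symbol set are illustrative choices, not a Studio feature):

```python
import secrets
import string

# Generate a password meeting the requirements above:
# 16+ characters with upper, lower, digit, and symbol classes.
def generate_password(length=20):
    alphabet = string.ascii_letters + string.digits + "!@#$%^&*-_"
    while True:
        pw = "".join(secrets.choice(alphabet) for _ in range(length))
        # Retry until every character class is represented.
        if (any(c.islower() for c in pw) and any(c.isupper() for c in pw)
                and any(c.isdigit() for c in pw)
                and any(not c.isalnum() for c in pw)):
            return pw

pw = generate_password()  # e.g. a 20-character mixed-class password
```

Store the result in a password manager or environment variable, as noted above.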
Track who is querying your database and what they’re asking:

ClickHouse System Tables:
-- Recent queries
SELECT
    user,
    query,
    query_start_time,
    query_duration_ms
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY query_start_time DESC
LIMIT 100;
What to monitor:
  • Unusual query patterns
  • Long-running queries
  • Failed authentication attempts
  • Queries accessing sensitive tables
  • Unusual access times (nights, weekends)
Set up alerts for:
  • Multiple failed authentication attempts
  • Queries to sensitive tables
  • Unusual data volume exports
  • Performance degradation
Regular query monitoring helps identify both security issues and performance bottlenecks early.

Managing Credentials in Studio

Update your ClickHouse connection details anytime:
  1. Go to Settings → MCP Servers
  2. Find “ClickHouse” in your server list
  3. Click Edit Configuration
  4. Update your connection parameters
  5. Click Test Connection to verify changes
  6. Click Save
Studio securely stores your database credentials with encryption. Your credentials are never exposed in conversation logs or shared with other users.

Example Workflows

  • Sales Analytics
  • User Behavior Analysis
  • Log Analysis & Troubleshooting
  • Time-Series Metrics
  • Funnel Analysis
  • Cohort Retention Analysis
  • A/B Test Analysis
  • Real-Time Dashboard
Scenario: Analyze sales performance by region and product category

Business Goal: Identify top-performing regions and products to optimize inventory and marketing spend.

Prompt:
"Show me total sales by region for the last 30 days, including order count and average order value"
What happens:
  1. Studio converts your question to SQL
  2. Queries your sales or orders table
  3. Aggregates by region with date filtering
  4. Calculates metrics (total, count, average)
  5. Returns results in readable format
SQL Generated (automatic):
SELECT
    region,
    COUNT(*) as order_count,
    SUM(amount) as total_sales,
    AVG(amount) as avg_order_value
FROM sales
WHERE date >= today() - INTERVAL 30 DAY
GROUP BY region
ORDER BY total_sales DESC
Example Results:
| Region | Order Count | Total Sales | Avg Order Value |
| --- | --- | --- | --- |
| West | 15,234 | $1,250,000 | $82.05 |
| East | 12,890 | $987,500 | $76.61 |
| South | 10,456 | $890,000 | $85.12 |
Time: <100ms for millions of rows

Follow-up: “Now break down West region by product category”
ClickHouse’s column-oriented storage makes aggregations incredibly fast. Perfect for real-time dashboard queries!
All examples use natural language prompts in Studio - SQL is generated automatically! Focus on asking business questions, not writing queries.

Troubleshooting

Error Message:
Connection refused - unable to connect to ClickHouse
ERROR - Cannot connect to ClickHouse: [errno] Connection refused
Possible Causes:
  • Incorrect host or port
  • Firewall blocking connection
  • ClickHouse server not running (self-hosted)
  • Network connectivity issues
Solutions:
  1. Verify Connection Details:
    • Go to Settings → MCP Servers → ClickHouse → Edit
    • Check CLICKHOUSE_HOST matches your instance
    • For Cloud: Should be xyz.clickhouse.cloud
    • For Playground: Should be sql-clickhouse.clickhouse.com
  2. Check Port Configuration:
    • Cloud (HTTPS): Port 8443, CLICKHOUSE_SECURE=true
    • Self-hosted (HTTP): Port 8123, CLICKHOUSE_SECURE=false
    • Self-hosted (HTTPS): Port 8443, CLICKHOUSE_SECURE=true
  3. Test from Command Line:
    curl "https://your-host:8443?query=SELECT version()" \
      --user username:password
    
  4. Check Firewall Rules:
    • ClickHouse Cloud: Add your IP to allowlist in Cloud console
    • Self-hosted: Check firewall rules allow inbound on ClickHouse port
    • Corporate network: Verify outbound connections allowed
Most connection issues are due to port misconfiguration. Cloud always uses 8443 with SSL, local dev typically uses 8123 without SSL.
Error Message:
401 Authentication failed
Code: 516. DB::Exception: Invalid credentials
Possible Causes:
  • Wrong username or password
  • Password contains special characters not properly encoded
  • User doesn’t exist in ClickHouse
  • User doesn’t have necessary permissions
Solutions:
  1. Verify Credentials:
    • Double-check username and password in Studio config
    • Check for typos, extra spaces, or incorrect capitalization
    • ClickHouse passwords are case-sensitive
  2. Reset Password:
    • Cloud: Reset in ClickHouse Cloud dashboard
    • Self-hosted: Update via SQL or configuration file
  3. Check User Exists:
    -- As admin user, check if user exists
    SELECT name FROM system.users WHERE name = 'your_username';
    
  4. Grant Permissions:
    -- Grant necessary permissions
    GRANT SELECT ON database_name.* TO username;
    GRANT SHOW DATABASES ON *.* TO username;
    
  5. Special Characters in Password:
    • If password contains @, #, %, etc., ensure proper encoding
    • Try resetting to a simpler password temporarily for testing
The default username for ClickHouse is usually default. Verify your actual username in your ClickHouse setup.
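When a password containing @, #, or % travels inside a URL (for example via the HTTP interface or a connection string), those characters must be percent-encoded. A quick sketch with Python’s standard library (the sample password is made up for illustration):

```python
from urllib.parse import quote

# Percent-encode a password for safe use inside a URL.
# safe="" forces every reserved character to be encoded.
encoded = quote("p@ss#w%rd", safe="")
# encoded == 'p%40ss%23w%25rd'  (@ -> %40, # -> %23, % -> %25)
```

If you see encoding-related failures, this is a quick way to check what your client should actually be sending.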
Error Message:
Query timed out after 30 seconds
Timeout exceeded: elapsed 30.001 seconds
Possible Causes:
  • Query scanning too much data
  • Missing indexes or poor table design
  • Unoptimized query (no WHERE clause on large table)
  • Server overloaded
Solutions:
  1. Add Date/Time Filters:
    -- Instead of:
    SELECT * FROM events
    
    -- Do:
    SELECT * FROM events
    WHERE date >= today() - INTERVAL 7 DAY
    
  2. Use LIMIT:
    SELECT * FROM large_table
    WHERE condition
    LIMIT 10000
    
  3. Increase Timeout:
    • In Studio: Add CLICKHOUSE_MCP_QUERY_TIMEOUT=60 (seconds)
    • Or in query: Use Studio to request “with longer timeout”
  4. Optimize Query:
    -- Use PREWHERE for efficient filtering
    SELECT * FROM table
    PREWHERE date >= today() - 7
    WHERE status = 'active'
    
    -- Use sampling for exploration
    SELECT * FROM table SAMPLE 0.1 -- 10% sample
    
  5. Check Table Design:
    • Verify ORDER BY key matches your query patterns
    • Consider partitioning by date
    • Create materialized views for complex queries
ClickHouse is extremely fast, but unoptimized queries on billions of rows can still timeout. Always add date filters and LIMIT clauses.
Error Message:
DB::Exception: Table database.table_name doesn't exist
DB::Exception: Database 'database_name' doesn't exist
Possible Causes:
  • Table or database name is misspelled
  • Table doesn’t exist in the specified database
  • User doesn’t have permission to see the table
  • Wrong database specified
Solutions:
  1. List Available Databases:
    "List all databases in ClickHouse"
    
  2. List Tables in Database:
    "Show me all tables in the [database_name] database"
    
  3. Check Table Name:
    • ClickHouse table names are case-sensitive
    • Users ≠ users ≠ USERS
    • Use exact capitalization
  4. Check Database Context:
    -- Explicitly specify database
    SELECT * FROM database_name.table_name
    
  5. Verify Permissions:
    -- Check what you have access to
    SHOW GRANTS FOR CURRENT_USER
    
Use list_databases and list_tables tools before writing queries to discover what data is available and get exact table names.
Error Message:
DB::Exception: Memory limit exceeded for query
Code: 241. DB::Exception: Memory limit (for query) exceeded
Possible Causes:
  • Query returning too many rows
  • Large GROUP BY creating many groups
  • JOINing very large tables
  • Not enough memory allocated
Solutions:
  1. Add LIMIT:
    SELECT * FROM large_table
    WHERE date = today()
    LIMIT 10000  -- Limit rows returned
    
  2. Filter More Aggressively:
    -- Add WHERE clauses to reduce data scanned
    SELECT * FROM table
    WHERE date >= today() - INTERVAL 1 DAY
      AND region = 'US'
      AND status = 'active'
    
  3. Aggregate Instead of Returning Raw Data:
    -- Instead of returning millions of rows:
    SELECT * FROM events
    
    -- Return aggregated summary:
    SELECT
        date,
        event_type,
        COUNT(*) as event_count
    FROM events
    GROUP BY date, event_type
    
  4. Use Sampling for Exploration:
    SELECT * FROM huge_table
    SAMPLE 0.01  -- 1% sample
    WHERE conditions
    
  5. For ClickHouse Cloud:
    • Memory auto-scales, but very large queries may still hit limits
    • Consider upgrading to higher tier for more memory
Avoid SELECT * on large tables without filtering. Always use WHERE clauses and LIMIT to reduce data volume.
Error Message:
DB::Exception: Syntax error
Code: 62. DB::Exception: Syntax error: unexpected token
Possible Causes:
  • ClickHouse SQL syntax differs from other databases
  • Using MySQL/PostgreSQL-specific functions
  • Incorrect function names
  • Malformed query
Solutions:
  1. Use Studio’s Natural Language:
    • Let Studio generate SQL automatically
    • Ask in plain English: “Show me top 10 users by activity”
    • Studio handles ClickHouse-specific syntax
  2. Common ClickHouse Differences:
    -- Date functions
    toDate('2024-01-01')     -- not CAST('2024-01-01' AS DATE)
    today()                   -- not CURDATE() or CURRENT_DATE
    now()                     -- not CURRENT_TIMESTAMP
    
    -- Array functions
    arrayJoin(array_column)   -- for array operations
    groupArray(column)        -- aggregate into array
    
    -- String functions
    concat(str1, str2)        -- same as others
    position(haystack, needle) -- not INSTR
    
  3. Check the official ClickHouse SQL documentation for the correct syntax
  4. Test Incrementally:
    • Start with simple SELECT
    • Add complexity step by step
    • Identify what causes the error
When using Studio, natural language queries avoid syntax issues entirely. Describe what you want, and Studio handles ClickHouse-specific syntax automatically!
Issue: Queries taking longer than expected (multiple seconds)

Diagnosis: ClickHouse is extremely fast - if queries are slow, it’s usually a schema or query optimization issue.

Common Causes & Solutions:
  1. Missing Date Filters:
    -- Slow: Scans entire table
    SELECT * FROM events WHERE user_id = 123
    
    -- Fast: Filters by date first
    SELECT * FROM events
    WHERE date >= today() - 7
      AND user_id = 123
    
  2. ORDER BY Doesn’t Match Table Sort Key:
    -- If table is sorted by (date, user_id):
    
    -- Fast: Matches sort key
    SELECT * FROM events
    ORDER BY date, user_id
    
    -- Slow: Different sort order
    SELECT * FROM events
    ORDER BY timestamp DESC
    
  3. Not Using PREWHERE:
    -- PREWHERE evaluates before reading all columns
    SELECT * FROM large_table
    PREWHERE date >= today() - 7  -- Evaluated first
    WHERE status = 'active'       -- Evaluated on filtered data
    
  4. Large Result Sets:
    • Always use LIMIT
    • Return aggregated data instead of raw rows
    • Use sampling for exploration
  5. Check Query Performance:
    -- See query execution plan
    EXPLAIN SELECT ...
    
    -- See query statistics
    SELECT query_duration_ms, read_rows, read_bytes
    FROM system.query_log
    WHERE type = 'QueryFinish'
    ORDER BY query_start_time DESC
    LIMIT 10
    
Well-optimized ClickHouse queries typically run in 10-100ms even on billions of rows. If your queries are slower, schema optimization is usually the answer.
Error Message:
SSL certificate verification failed
Certificate verification error
Possible Causes:
  • Self-signed certificates
  • Expired certificates
  • Certificate chain issues
  • System trust store not configured
Solutions:
  1. For ClickHouse Cloud:
    • Certificates are valid and trusted
    • Ensure CLICKHOUSE_SECURE=true
    • Ensure CLICKHOUSE_VERIFY=true
    • Should work automatically
  2. For Self-Hosted with Valid Certs:
    • Ensure certificates are from trusted CA
    • System trust store should recognize them
    • Use CLICKHOUSE_VERIFY=true
  3. For Self-Hosted with Self-Signed Certs (DEV ONLY):
    • Not recommended for production
    • Set CLICKHOUSE_VERIFY=false
    • Disables certificate verification
  4. Update System Trust Store:
    • Ensure your OS has updated CA certificates
    • On Ubuntu: sudo apt-get install ca-certificates, then sudo update-ca-certificates
    • On macOS: Usually automatic via OS updates
Never use CLICKHOUSE_VERIFY=false in production. This disables certificate verification and makes your connection vulnerable to man-in-the-middle attacks.

Learning Resources

ClickHouse Overview:

ClickHouse is an open-source, column-oriented database management system designed for OLAP (Online Analytical Processing).

Key Characteristics:
  • Column-Oriented: Stores data by columns, not rows
    • Traditional databases: Store full rows together
    • ClickHouse: Stores each column separately
    • Result: 100-1000x faster for analytics queries
  • Massively Parallel: Distributes queries across cores and servers
  • Real-Time: Query billions of rows in milliseconds
  • Highly Compressed: 10-100x compression ratios
  • Scalable: From single server to hundreds of nodes
Best For:
  • Analytics and business intelligence
  • Real-time dashboards
  • Log processing and monitoring
  • Time-series data
  • Event tracking and user analytics
Not Ideal For:
  • Transactional workloads (OLTP)
  • Frequent updates to individual rows
  • Small datasets (less than millions of rows)
  • Document storage
How Studio Converts Your Questions to ClickHouse SQL:

You ask: “Show me revenue by product category this month”

Studio’s AI:
  1. Understands your intent (revenue analysis)
  2. Identifies entities (revenue, product, category, time period)
  3. Maps to your database schema
  4. Generates optimized ClickHouse SQL
  5. Executes and formats results
Generated SQL:
SELECT
    product_category,
    SUM(revenue) as total_revenue,
    COUNT(*) as order_count
FROM sales
WHERE date >= toStartOfMonth(now())
GROUP BY product_category
ORDER BY total_revenue DESC
You see: Formatted results in natural language with a data table

No SQL knowledge required - just ask business questions!
Be specific in your questions: mention time ranges, specific tables if known, and what metrics you want. The more context, the better the SQL!
Make Your Queries Lightning Fast:

1. Always Filter by Date/Time
-- Slow: Scans entire table
SELECT * FROM events

-- Fast: Filters first
SELECT * FROM events
WHERE date >= today() - 30
ClickHouse tables are typically partitioned by date - this reduces data scanned by 10-1000x.

2. Use LIMIT
SELECT * FROM large_table
WHERE conditions
LIMIT 1000  -- Stop after 1000 rows
Prevents accidentally returning millions of rows.

3. Aggregate When Possible
-- Instead of returning millions of raw rows:
SELECT * FROM events

-- Return summary:
SELECT date, COUNT(*) as events
FROM events
GROUP BY date
Aggregated data is much smaller and faster to return.

4. Match Your ORDER BY to the Table Sort Key
-- If table sorted by (date, user_id):
SELECT * FROM table
WHERE date = today()
ORDER BY date, user_id  -- Fast!
Matching the sort order lets ClickHouse use pre-sorted data.

5. Use PREWHERE for Filtering
SELECT expensive_columns
FROM table
PREWHERE cheap_filter = value  -- Evaluated first
WHERE expensive_filter = value  -- Evaluated after
PREWHERE reduces data before expensive operations.

6. Use Sampling for Exploration
SELECT * FROM billion_row_table
SAMPLE 0.01  -- 1% sample = 10M rows
WHERE conditions
Perfect for testing queries before running on full data.
With these optimizations, ClickHouse can query billions of rows in 10-100ms!
Designing Tables for Performance:

1. Choose the Right Engine
CREATE TABLE events (
    date Date,
    timestamp DateTime,
    user_id UInt64,
    event_type String
) ENGINE = MergeTree()  -- Most common
ORDER BY (date, user_id)
PARTITION BY toYYYYMM(date);
2. Order By Frequently Queried Columns
-- If you always filter by date and user_id:
ORDER BY (date, user_id)

-- If you query by user_id first:
ORDER BY (user_id, date)
Put the most frequently filtered columns first.

3. Partition by Date
PARTITION BY toYYYYMM(date)  -- Monthly partitions
Makes dropping old data fast and enables efficient time-range queries.

4. Use Appropriate Data Types
-- Good: Right-sized types
user_id UInt64
status Enum8('active'=1, 'inactive'=2)
date Date

-- Bad: Oversized types
user_id String  -- Numbers as strings are slow
status String   -- Use Enum instead
Smaller types = faster queries and less storage.

5. Denormalize for Query Performance
-- Instead of JOINs:
SELECT u.name, e.event
FROM events e
JOIN users u ON e.user_id = u.id

-- Denormalize:
SELECT user_name, event
FROM events_denormalized
ClickHouse favors denormalization - duplicate data for faster queries.

6. Use Materialized Views for Complex Queries
CREATE MATERIALIZED VIEW daily_stats
ENGINE = SummingMergeTree()
ORDER BY (date, user_id)
AS SELECT
    date,
    user_id,
    COUNT(*) as event_count
FROM events
GROUP BY date, user_id
Pre-aggregate frequently queried data.
Good schema design is 10x more important than query optimization in ClickHouse. Invest time in proper table structure!
Understanding Database Types:

OLAP (ClickHouse):
  • Online Analytical Processing
  • Optimized for: Reading and aggregating large datasets
  • Use for: Analytics, reports, dashboards, business intelligence
OLTP (PostgreSQL, MySQL):
  • Online Transaction Processing
  • Optimized for: Frequent small updates and inserts
  • Use for: Application databases, user data, transactions
Comparison:
| Feature | ClickHouse (OLAP) | PostgreSQL (OLTP) |
| --- | --- | --- |
| Read Speed | Extremely fast (billions of rows/sec) | Moderate |
| Write Speed | Batch inserts only | Fast individual inserts |
| Updates | Limited (asynchronous mutations) | Frequent updates OK |
| Deletes | Expensive | Fast |
| Use Case | Analytics, reporting | Application backend |
| Data Volume | Billions of rows | Millions of rows |
| Query Type | Aggregations, scans | Lookups, joins |
When to Use ClickHouse:
  • ✅ Analyzing logs (millions of log lines per day)
  • ✅ User behavior analytics (clickstream data)
  • ✅ Business intelligence dashboards
  • ✅ Time-series metrics and monitoring
  • ✅ Financial analysis (transactions, market data)
  • ✅ Ad-hoc data exploration
When NOT to Use ClickHouse:
  • ❌ User profile storage (use PostgreSQL)
  • ❌ Shopping cart data (use Redis/PostgreSQL)
  • ❌ Content management (use MongoDB/PostgreSQL)
  • ❌ Frequent updates to individual records
  • ❌ Transaction processing (orders, payments)
Hybrid Architecture (Best Practice):
Application Database (PostgreSQL)
↓ Real-time streaming
Analytics Database (ClickHouse)
Use OLTP for application, stream data to OLAP for analytics!
Most successful data architectures use both OLTP and OLAP databases. Write to OLTP, analyze in OLAP.