Introduction
DuckDB is an open-source analytical database system designed specifically for data wrangling — the process of transforming raw data into a usable format for analysis through discovery, cleaning, enrichment, and validation.
DuckDB provides comprehensive library support for Python, R, Java, and WebAssembly (WASM). In this article, we’ll focus on the standard command-line interface (CLI) application and explore its innovative browser-based user interface.
DuckDB provides robust support for importing various data formats including CSV, Excel, Parquet, and remote file protocols (e.g., S3 buckets in AWS), making it highly versatile for data integration tasks.
Installation
DuckDB CLI can be installed across multiple platforms using the following methods:
Linux / macOS
curl https://install.duckdb.org | sh
Windows (x86-64)
# Download from the official GitHub repository
https://github.com/duckdb/duckdb/releases/download/v1.2.1/duckdb_cli-windows-amd64.zip
Windows (ARM64)
# Download from the official GitHub repository
https://github.com/duckdb/duckdb/releases/download/v1.2.1/duckdb_cli-windows-arm64.zip
DuckDB CLI
DuckDB can be launched in memory mode by executing the following command:
duckdb
Note: In memory mode, all data and modifications are transient and will be lost upon exiting the session.
For persistent data storage, specify a database file path. The following example creates or opens a database named test.db
:
duckdb test.db
Data Import Operations To import data into a persistent database (from within the DuckDB CLI):
CSV File Import
Important: Remember to terminate all SQL statements with a semicolon.
CREATE TABLE input_csv AS SELECT * FROM read_csv('input.csv');
Excel File Import
CREATE TABLE input_excel AS SELECT * FROM read_xlsx('test_excel.xlsx', sheet = 'Sheet1');
For additional file formats and import options, refer to the DuckDB Documentation - File Formats.
Tip: To overwrite existing tables, use the
CREATE OR REPLACE TABLE
syntax:CREATE OR REPLACE TABLE input_csv AS SELECT * FROM read_csv('input.csv');
Once data is persisted in the local database, you can query it using standard SQL. Here’s a sample query result from a Kaggle dataset:
SELECT * FROM input_csv LIMIT 5;
┌────────────┬────────┬────────┬────────┬────────┬────────┬─────────┐
│ Date │ Open │ High │ Low │ Close │ Volume │ OpenInt │
│ date │ double │ double │ double │ double │ int64 │ int64 │
├────────────┼────────┼────────┼────────┼────────┼────────┼─────────┤
│ 2012-12-13 │ 15.0 │ 15.0 │ 15.0 │ 15.0 │ 100 │ 0 │
│ 2012-12-14 │ 19.0 │ 30.0 │ 19.0 │ 30.0 │ 144600 │ 0 │
│ 2012-12-17 │ 31.5 │ 65.0 │ 31.5 │ 65.0 │ 68600 │ 0 │
│ 2012-12-18 │ 65.0 │ 89.0 │ 65.0 │ 80.0 │ 43600 │ 0 │
│ 2012-12-19 │ 80.0 │ 84.0 │ 78.0 │ 84.0 │ 24000 │ 0 │
└────────────┴────────┴────────┴────────┴────────┴────────┴─────────┘
DuckDB is optimized for analytical query workloads (Online Analytical Processing or OLAP) and excels at processing large datasets that would typically overwhelm spreadsheet applications like Excel.
Essential CLI Commands
Several dot commands are available within the DuckDB CLI environment:
.exit # Exits from CLI mode
.table # Lists all tables in the local database
.help # Displays available help options
.open # Opens a database instance
For a comprehensive list of available dot commands, refer to the DuckDB Documentation - Dot Commands.
DuckDB Browser UI
For those who prefer graphical interfaces, DuckDB version 1.2.1 introduced a -ui
flag that launches a Jupyter-like notebook interface in the browser, offering a local SQL notebook experience.
In the following demonstration, we’ll use the Kaggle Amazon Reviews dataset (1.4 GB) to showcase the UI capabilities:
- Launch the UI mode by passing the
-ui
flag along with a database name:
duckdb -ui amazon.db
This will launch a DuckDB interface in your browser (e.g., at http://localhost:4213/
).
- Create and populate a table from the CSV data:
CREATE table amazon_reviews AS SELECT * FROM read_csv('Downloads/train.csv');
- Enhance clarity by renaming columns:
ALTER TABLE amazon_reviews RENAME column0 TO polarity;
ALTER TABLE amazon_reviews RENAME column1 TO title;
ALTER TABLE amazon_reviews RENAME column2 TO text;
- Execute standard SQL queries:
SELECT * FROM amazon_reviews LIMIT 5;
Duck DB UI interface
Note: Steps 2 and 3 are one-time operations. After initial setup, simply executing step 1 will open the database with your formatted data intact.
Advantages of the Browser UI
- Built-in SQL autocompletion
- Ability to attach additional local databases (similar to CLI)
- Support for multiple notebooks with local persistence
- Visual column explorer for query result visualization
Features Awaiting Development
- SQL notebook export functionality
- Import capabilities for SQL notebooks created by other users
Final Thoughts: DuckDB is a powerful tool for analyzing large datasets, and its intuitive UI enables anyone with SQL knowledge to use it effectively.