Introduction

DuckDB is an open-source analytical database system designed specifically for data wrangling — the process of transforming raw data into a usable format for analysis through discovery, cleaning, enrichment, and validation.

DuckDB provides comprehensive library support for Python, R, Java, and WebAssembly (WASM). In this article, we’ll focus on the standard command-line interface (CLI) application and explore its innovative browser-based user interface.

DuckDB provides robust support for importing various data formats including CSV, Excel, Parquet, and remote file protocols (e.g., S3 buckets in AWS), making it highly versatile for data integration tasks.


Installation

DuckDB CLI can be installed across multiple platforms using the following methods:

Linux / macOS

curl https://install.duckdb.org | sh

Windows (x86-64)

# Download from the official GitHub repository
https://github.com/duckdb/duckdb/releases/download/v1.2.1/duckdb_cli-windows-amd64.zip

Windows (ARM64)

# Download from the official GitHub repository
https://github.com/duckdb/duckdb/releases/download/v1.2.1/duckdb_cli-windows-arm64.zip

DuckDB CLI

DuckDB can be launched in memory mode by executing the following command:

duckdb

Note: In memory mode, all data and modifications are transient and will be lost upon exiting the session.

For persistent data storage, specify a database file path. The following example creates or opens a database named test.db:

duckdb test.db

Data Import Operations To import data into a persistent database (from within the DuckDB CLI):

CSV File Import

Important: Remember to terminate all SQL statements with a semicolon.

CREATE TABLE input_csv AS SELECT * FROM read_csv('input.csv');

Excel File Import

CREATE TABLE input_excel AS SELECT * FROM read_xlsx('test_excel.xlsx', sheet = 'Sheet1');

For additional file formats and import options, refer to the DuckDB Documentation - File Formats.

Tip: To overwrite existing tables, use the CREATE OR REPLACE TABLE syntax:

CREATE OR REPLACE TABLE input_csv AS SELECT * FROM read_csv('input.csv');

Once data is persisted in the local database, you can query it using standard SQL. Here’s a sample query result from a Kaggle dataset:

SELECT * FROM input_csv LIMIT 5;
┌────────────┬────────┬────────┬────────┬────────┬────────┬─────────┐
│    Date    │  Open  │  High  │  Low   │ Close  │ Volume │ OpenInt │
│    date    │ double │ double │ double │ double │ int64  │  int64  │
├────────────┼────────┼────────┼────────┼────────┼────────┼─────────┤
│ 2012-12-13 │   15.0 │   15.0 │   15.0 │   15.0 │    1000│ 2012-12-14 │   19.0 │   30.0 │   19.0 │   30.0 │ 1446000│ 2012-12-17 │   31.5 │   65.0 │   31.5 │   65.0 │  686000│ 2012-12-18 │   65.0 │   89.0 │   65.0 │   80.0 │  436000│ 2012-12-19 │   80.0 │   84.0 │   78.0 │   84.0 │  240000└────────────┴────────┴────────┴────────┴────────┴────────┴─────────┘

DuckDB is optimized for analytical query workloads (Online Analytical Processing or OLAP) and excels at processing large datasets that would typically overwhelm spreadsheet applications like Excel.

Essential CLI Commands

Several dot commands are available within the DuckDB CLI environment:

.exit  # Exits from CLI mode
.table # Lists all tables in the local database
.help  # Displays available help options
.open  # Opens a database instance

For a comprehensive list of available dot commands, refer to the DuckDB Documentation - Dot Commands.


DuckDB Browser UI

For those who prefer graphical interfaces, DuckDB version 1.2.1 introduced a -ui flag that launches a Jupyter-like notebook interface in the browser, offering a local SQL notebook experience.

In the following demonstration, we’ll use the Kaggle Amazon Reviews dataset (1.4 GB) to showcase the UI capabilities:

  1. Launch the UI mode by passing the -ui flag along with a database name:
duckdb -ui amazon.db

This will launch a DuckDB interface in your browser (e.g., at http://localhost:4213/).

  1. Create and populate a table from the CSV data:
CREATE table amazon_reviews AS SELECT * FROM read_csv('Downloads/train.csv');
  1. Enhance clarity by renaming columns:
ALTER TABLE amazon_reviews RENAME column0 TO polarity;
ALTER TABLE amazon_reviews RENAME column1 TO title;
ALTER TABLE amazon_reviews RENAME column2 TO text;
  1. Execute standard SQL queries:
SELECT * FROM amazon_reviews LIMIT 5;

Duck DB UI interface

Image Description

Note: Steps 2 and 3 are one-time operations. After initial setup, simply executing step 1 will open the database with your formatted data intact.


Advantages of the Browser UI

  • Built-in SQL autocompletion
  • Ability to attach additional local databases (similar to CLI)
  • Support for multiple notebooks with local persistence
  • Visual column explorer for query result visualization

Features Awaiting Development

  • SQL notebook export functionality
  • Import capabilities for SQL notebooks created by other users

Final Thoughts: DuckDB is a powerful tool for analyzing large datasets, and its intuitive UI enables anyone with SQL knowledge to use it effectively.