Lately, I've been using DuckDB to quickly analyze data in a Google Sheet.
For example, I stumbled upon this sheet collecting salary data for Data Engineers from the /r/dataengineering subreddit and was curious about some averages.
Traditionally, I would have had to export the data as a CSV to my machine and load it into some database or data warehouse where I could run some queries over it, but with DuckDB, this is so much easier!
All I have to do is run:
FROM read_csv_auto('https://docs.google.com/spreadsheets/export?format=csv&id=1GuEPkwjdICgJ31Ji3iUoarirZNDbPxQj_kf7fd4h4Ro', normalize_names=True);
The data is immediately available for number crunching! Let's say I want to know the average salary for Data Engineers in Texas.
round(avg(try_cast(base_salary as int))) AS avg_salary
FROM de_salaries -- Saved results of the previous query as a table
WHERE currency = 'USD'
AND state = 'Texas'
GROUP BY current_job_title
ORDER BY avg_salary DESC;
And I didn't even have to leave the terminal.