PRQL (Pipelined Relational Query Language, pronounced “Prequel”) is a modern language for transforming data, can be compiled to SQL.
This package provides a simple function to convert a PRQL query string to a SQL string.
For example, this is a PRQL query.
And, this is the SQL query that is compiled from the PRQL query.
To compile a PRQL string, just pass the query string to the
prql_compile()
function, like this.
library(prqlr)
"
from mtcars
filter cyl > 6
select {cyl, mpg}
derive {mpg_int = math.round 0 mpg}
" |>
prql_compile() |>
cat()
#> SELECT
#> cyl,
#> mpg,
#> ROUND(mpg, 0) AS mpg_int
#> FROM
#> mtcars
#> WHERE
#> cyl > 6
#>
#> -- Generated by PRQL compiler version:0.13.0 (https://prql-lang.org)
This output SQL query string can be used with already existing great packages that manipulate data with SQL.
Using it with the {DBI}
package, we can execute PRQL
queries against the database.
library(DBI)
# Create an ephemeral in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")
# Create a table inclueds `mtcars` data
dbWriteTable(con, "mtcars", mtcars)
# Execute a PRQL query
"
from mtcars
filter cyl > 6
select {cyl, mpg}
derive {mpg_int = math.round 0 mpg}
take 3
" |>
prql_compile("sql.sqlite") |>
dbGetQuery(con, statement = _)
#> cyl mpg mpg_int
#> 1 8 18.7 19
#> 2 8 14.3 14
#> 3 8 16.4 16
We can also use the sqldf::sqldf()
function to
automatically register Data Frames to the database.
"
from mtcars
filter cyl > 6
select {cyl, mpg}
derive {mpg_int = math.round 0 mpg}
take 3
" |>
prql_compile("sql.sqlite") |>
sqldf::sqldf()
#> cyl mpg mpg_int
#> 1 8 18.7 19
#> 2 8 14.3 14
#> 3 8 16.4 16
Since SQLite is used here via {RSQLite}
, the
target
option of prql_compile()
is set to
"sql.sqlite"
.
Available target names can be found with the
prql_get_targets()
function.
Using {prqlr}
with the {tidyquery}
package,
we can execute PRQL queries against R Data Frames via
{dplyr}
.
{dplyr}
is a very popular R package for manipulating
Data Frames, and the PRQL syntax is very similar to the
{dplyr}
syntax.
Let’s run a query that aggregates a Data Frame flights
,
contained in the {nycflights13}
package.
library(tidyquery)
library(nycflights13)
"
from flights
filter (distance | in 200..300)
filter air_time != null
group {origin, dest} (
aggregate {
num_flts = count this,
avg_delay = (average arr_delay | math.round 0)
}
)
sort {-origin, avg_delay}
take 2
" |>
prql_compile() |>
query()
#> # A tibble: 2 × 4
#> origin dest num_flts avg_delay
#> <chr> <chr> <int> <dbl>
#> 1 LGA BUF 122 -2
#> 2 LGA PWM 273 2
This query can be written with {dplyr}
’s functions as
follows.
library(dplyr, warn.conflicts = FALSE)
library(nycflights13)
flights |>
filter(
distance |> between(200, 300),
!is.na(air_time)
) |>
group_by(origin, dest) |>
summarise(
num_flts = n(),
avg_delay = mean(arr_delay, na.rm = TRUE) |> round(0),
.groups = "drop"
) |>
arrange(desc(origin), avg_delay) |>
head(2)
#> # A tibble: 2 × 4
#> origin dest num_flts avg_delay
#> <chr> <chr> <int> <dbl>
#> 1 LGA BUF 122 -2
#> 2 LGA PWM 273 2
Note that {dplyr}
queries can be generated by the
tidyquery::show_dplyr()
function!