Sometimes standard DBMS functionality is not enough, and we seek to extend it. Most database systems, including MariaDB, support some method of creating User Defined Functions (UDFs), but what is a UDF?
UDF definition and examples
A UDF is simply a user created function with custom logic for use in an SQL statement. You have most likely already used one of the many
built-in functions already available like GROUP_CONCAT() for concatenating strings, a datetime function like DATESUB(), or even an aggregate function like SUM() as part of a GROUP BY.
In the following example we use CONCAT() to merge the first and last names to be a single space separated string, of all users who created accounts before today’s date:
SELECT CONCAT(firstname, " ", lastname)
FROM users
WHERE creation_date < CURDATE();
UDFs in MariaDB
Just like Stored Procedures, UDFs can be created using SQL in MariaDB, which is very convenient and a great way to being prototyping custom logic for your application or business requirements.
Using C/C++ for UDFs in MariaDB
If however you need a more performance on computationally heavy numeric outputs, then MariaDB also has the ability to create a
UDF using C/C++.
The end result of writing, building, and testing a C/C++ UDF is a shared object binary. The binary file is copied to the MariaDB plugins directory which can then be imported for use.
For example, let’s say we have created a plugin called `my_func` which when run just returns the value 1, and compiled to `my_func.so`.
We can get the MariaDB plugin directory either from a config file or MariaDB itself, and copy the binary there:
cp my_func.so $(mariadb -NBe "select @@plugin_dir")/
Now from within MariaDB, `CREATE` the function and run it:
CREATE FUNCTION my_func RETURNS INT SONAME 'my_func.so';
SELECT my_func();
What if we need more?
So using UDFs we can create custom logic for our database. But what if we wanted to do more than that? What if we wanted to add more functionality to our functions and even add them to stored procedures to create very complex programs? The limitations of UDFs come with the problems of library linking or integration, does this extra UDF warrant adding dependencies to a database server? What if you are running MariaDB in Docker or Kubernetes? Are you ok with building your own images? What about thread safety?
Writing UDFs for MariaDB in Golang
Fortunately, all the above mentioned cases are common questions and problems which Go has already solved for us!
Go is quite unique in that it is a statically compiled language with a very extensive
standard library. This means all of this built in functionality is ready for us to exploit without having to manage any external dependencies. The output is statically linked and ultra portable. While I do think it is interesting to add HTTP and even SMTP requests directly into a UDF, my favourite feature by far, is Go’s error handling when used with existing data importing and exporting procedures. Using Go and the `os` and `gzip` libraries I have created an example of how to expand a common process; creating files with OUTFILE.
When creating an
OUTFILE, a process external to the database needs to ensure the file does not exist first, and after it has been created, handle it’s lifecycle appropriately like renaming, moving, compressing, and transporting the file elsewhere.
For security and safety reasons MariaDB does not allow you overwrite or delete a file that already exists, otherwise some rogue SQL could be write data into a system file owned by the server process and corrupt a database!
But wouldn’t it be really neat if we could just have a stored procedure which not would allow us to delete an old file, create and new one and maybe even compress it? We could even add some extra safety ourselves to ensure we don’t do something silly or dangerous. I think if we ensure all our operations only took place in /tmp/, we could mitigate a lot of potential danger.
In the
mariadb-udfs/utils repository, I have created an example of what this process might look like.
The extra functions deletefile and gzipfile will only work when the path prefix is /tmp/.
The function `bytesfree` returns the amount of free bytes available on the specified volume path. A value of -1 means the path is unreachable, this can be very useful to ascertain before starting a long running query.
fileexists will return a 0 if a file is found at the specified path, a value of 1 means that path is unreachable or already exists.
The pseudo logic for this process is very simple:
- Check there is enough free space.
- Delete any files from a previous run.
- Run the SQL to create an OUTFILE.
- Compress the file
- Write process steps to a table.
Anyone now maintaining a stored procedure using these functions have fewer places to look for problems or even expand the process.
Why not add a step 6? A new function, file_put, which uses an HTTP put request to upload the export to S3?
Conclusion
UDFs are a great way of extending a databases utility within your application stack, and using Go adds so much functionality without adding any extra system dependencies.
Richard Bensley
0 Comments