Fastest way to string escape: ' -> '', \ -> \\

I’m running a SQL query of the form

INSERT INTO blah.table (FIELDS) VALUES ( '%s', '%s', '%s', '%s')

note the strings are required to be single quote delimited.

For this, I’d like to escape all strings with ’ → ‘’ and \ → \.

Question: what is the most efficient way to do this? Currently, the best I see is

string → char list → escape char list → reverse → back to string

I’d prefer something less convoluted, as it will be run on every field of every INSERT query.

I don’t understand exactly what you’re trying to do, but it looks like you’re building SQL queries with string interpolation? This practice is considered a security issue and can lead to SQL injection attacks. You can use a prepared statement instead. It will be more secure as well as more efficient. Oh also, you won’t need to worry about escaping characters manually. Prepared statements do that for you.

1 Like

Is prepared statements possible with HTTP Interface | ClickHouse Docs ?

EDIT: I also find this response somewhat condescending / patronizing. It makes a lot of assumptions about (1) our relative programming skills and (2) what I’m working on (i.e. internet facing API or internal quick data collection system). In particular, being new to OCaml does not imply being new to programming.

Lastly, it implies there is a bug / security flaw in doing ' -> '' and \ -> \\ without actually constructing an example where it breaks.

I would find the above argument much more convincing if it started with: here is a concrete xkcd: Exploits of a Mom string.

Hi,

you may want to use a Buffer.t rather than going through lists. This way you benefit from compact storage and append. Something along the lines of:

let buffer = Buffer.create 16
let escape s =
  Buffer.clear buffer;
  for i = 0 to String.length s - 1 do
    match s.[i] with
      ('\'' | '\\') as c -> Buffer.add_char buffer c; Buffer.add_char buffer c
    | c -> Buffer.add_char buffer c
  done;
  Buffer.contents buffer

(yes this uses a global variable to avoid allocating a buffer each time. If this is not ok you can also allocate buffer locally in escape). If characters smaller than 32 occur, you may want to escape them by their corresponding \xHH escape sequence.

It’s mentioned in the docs: HTTP Interface | ClickHouse Docs

In the absence of any other information, and seeing an attempt to create a query by interpolation, risking SQL injection, I made an assumption. No harm intended.

1 Like

What are you referring to by “It” ? Reasonable choices are:

  1. SQL injection attack

  2. “predefined http endpoints”, i.e.

$ echo '(4),(5),(6)' | curl 'http://localhost:8123/?query=INSERT%20INTO%20t%20VALUES' --data-binary @-
  1. or the section on predefined_query_handler (not sure what this is)

===

I could see how #2 is useful in that it separates the query from the data

@yawaramin

I rewrote the code using

and I have to say – you’re right, it does feel cleaner / safer, in that an escape would result only in failing the insert, and not in executing arbitrary SQL. Thanks for pointing it out.

1 Like