Can't get my semicolons right, I guess (noob question alert)

Hi folks!

This is my first post, so here is a short introduction:
I’ve been programming since 1977. I’ve used many languages, but mainly Pascal, Modula-2, C++ and C (roughly in that order). Others were Fortran, assembler, Obj-C, NewtonScript and Tcl/Tk (which I still use for quick tools).
My last job was embedded hardware and software development in C.
Last year I decided to try something completely new and ended up with OCaml. I gave it up (for other reasons) and started again a few months ago. My main reference is R. Clarkson’s book and his videos.

Because I learn best with self-assigned tasks, I wrote an XML parser (it works, with limitations; it is not a full-blown validating parser with DTD support etc.).

Finally, here is what I’ve been stuck on for days:
Read the function as if a C programmer wrote it: there is (or should be) only a single point of return, and that is the last line.
I’ll provide the required records, so it should (!) compile.

type tagAttrRcrd = {
  attrib : string;
  value  : string;
}

type xmlElementRcrd = {
  tagStr   : string;
  tagType  : tagVariant;
  tagAttrs : tagAttrRcrd list;
  value    : string;
  children : xmlElementRcrd list;
}

let rec xmlDump ?(indent = 0) ?(inStr = "") element : string =
  let buff = Buffer.create 100 in
    bprintf buff "%s%s%s" inStr (String.make indent ' ') element.tagStr;
  
    if element.tagAttrs <> [] then (
      List.iter (fun attr -> (bprintf buff " %s='%s'" attr.attrib attr.value)) element.tagAttrs
    );
    bprintf buff " \"%s\"\n" element.value;

    if element.children <> [] then (
      List.iter (bprintf buff "%s" (xmlDump ~indent:(indent + 2) ~inStr:(Buffer.contents buff))) element.children
    );

  Buffer.contents buff (* return *)

Compilation fails at the second List.iter, so I assume that xmlDump does not return the string I want. But … I’m completely lost.
I also read Let* syntax, semicolon and chaining units at least two times.

Any help is welcome. Feel free to criticize me; I know I still have a lot to learn!

Thanks,
Nick

The code does not compile as-is, so I added

type tagVariant = unit
let bprintf = Printf.bprintf

at the beginning.

When you try to compile it, you get

File "./tt.ml", line 27, characters 35-95:
27 |       List.iter (bprintf buff "%s" (xmlDump ~indent:(indent + 2) ~inStr:(Buffer.contents buff))) element.children
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error: This expression has type xmlElementRcrd -> string
       but an expression was expected of type string
  Hint: This function application is partial,
  maybe some arguments are missing.

This tells you that in the second List.iter you’re not passing the element argument to xmlDump. If you add it (List.iter takes a function and applies it to each element of the list) like so

      List.iter (fun e -> bprintf buff "%s" (xmlDump ~indent:(indent + 2) ~inStr:(Buffer.contents buff) e)) element.children

then it compiles.
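The compiler's hint about a partial application is the key point. A minimal sketch with a hypothetical stand-in function `render` (not from the thread) shows the difference: supplying only the labelled argument leaves a function that is still waiting for its last argument, which is why `bprintf buff "%s"` rejects it, while the `fun e -> …` wrapper passes each list element as that missing argument.

```ocaml
(* Hypothetical stand-in for xmlDump: an optional argument, then the value. *)
let render ?(indent = 0) name = String.make indent ' ' ^ name ^ "\n"

(* Supplying only ~indent is a partial application:
   the result is still a function, waiting for [name]. *)
let render_indented : string -> string = render ~indent:2

let dump_all names =
  let buff = Buffer.create 64 in
  (* Printf.bprintf buff "%s" (render ~indent:2) would be rejected: a function, not a string.
     The lambda supplies each list element as the missing argument. *)
  List.iter (fun n -> Printf.bprintf buff "%s" (render ~indent:2 n)) names;
  Buffer.contents buff
```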

If I were to simplify, I’d go for something like this:

type tagVariant = unit

type tagAttrRcrd = {
  attrib : string;
  value  : string;
}

type xmlElementRcrd = {
  tagStr   : string;
  tagType  : tagVariant;
  tagAttrs : tagAttrRcrd list;
  value    : string;
  children : xmlElementRcrd list;
}

let rec xmlDump ?(indent = 0) ?(inStr = "") element =
  let buff = Buffer.create 100 in
  Printf.bprintf buff "%s%s%s" inStr (String.make indent ' ') element.tagStr;
  
  List.iter (fun attr ->
    Printf.bprintf buff " %s='%s'" attr.attrib attr.value)
    element.tagAttrs;
  Printf.bprintf buff " \"%s\"\n" element.value;

  List.iter (fun child ->
    Printf.bprintf buff "%s" (xmlDump ~indent:(indent + 2) ~inStr:(Buffer.contents buff) child))
    element.children;

  Buffer.contents buff

(There is no need to check whether the lists are empty: List.iter calls the function n times for a list of length n, and thus 0 times for an empty list.)
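One more thing worth checking in both versions: each child is called with `~inStr:(Buffer.contents buff)`, and xmlDump writes inStr first, so every child's result repeats everything the parent has written so far, and that result is then appended to the same buffer. For a root with a single child, the root's line appears in the output twice. A sketch that avoids this by letting the whole tree write into one shared buffer (the helper `dumpInto` is hypothetical; the tagVariant constructors are the ones the original poster gives later in the thread):

```ocaml
type tagVariant = XvTagEmpty | XvTagOpen | XvTagClose | XvNoTag

type tagAttrRcrd = {
  attrib : string;
  value  : string;
}

type xmlElementRcrd = {
  tagStr   : string;
  tagType  : tagVariant;
  tagAttrs : tagAttrRcrd list;
  value    : string;
  children : xmlElementRcrd list;
}

(* Hypothetical helper: appends [element] and all its descendants to [buff],
   one line per element, indenting each level by two more spaces. *)
let rec dumpInto buff indent (element : xmlElementRcrd) =
  Printf.bprintf buff "%s%s" (String.make indent ' ') element.tagStr;
  List.iter
    (fun (attr : tagAttrRcrd) ->
      Printf.bprintf buff " %s='%s'" attr.attrib attr.value)
    element.tagAttrs;
  Printf.bprintf buff " \"%s\"\n" element.value;
  (* Children append to the same buffer; nothing already in it is copied again. *)
  List.iter (dumpInto buff (indent + 2)) element.children

(* Same signature as the version above; inStr is written once, at the top. *)
let xmlDump ?(indent = 0) ?(inStr = "") element =
  let buff = Buffer.create 100 in
  Buffer.add_string buff inStr;
  dumpInto buff indent element;
  Buffer.contents buff
```

With this structure the recursion only ever appends, so a root with one child produces the root's line once, followed by the indented child line.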

However, I do note that your naming is somewhat unconventional. camelCase is not really common for type and function names in OCaml, despite the name and logo.


Thanks a lot for your quick help!
And sorry for not including the

type tagVariant = XvTagEmpty | XvTagOpen | XvTagClose | XvNoTag (* <x/> | <x> | </x> | no tag found *)

So I was quite close when I tried the second List.iter with a fun -> child (but that introduced new errors).
Oh well …

Yes, I know that my style is unconventional for OCaml. It comes from my C experience, and from never having liked K&R style. I’ll work on that, and also on my formatting, which needs some adapting.

If you want, you can use ocamlformat. Since I started using it, I never need to think about formatting: I just write code any way I like and let the tool sort it out. It’s pretty much a game-changer, especially with deeply nested code.

I’m aware of ocamlformat, but as it is (with the default settings, that is) I really don’t like it. I need to invest more time in it.
A line like this …

          {tagRcrd = {tagType = inTagWrapped.tagRcrd.tagType; tagStr = tagStr; tagAttrs = (assembleAttrList attrList [])}; rest = inTagWrapped.rest; failed = inTagWrapped.failed}

… would fill half the screen height. I’ll find my way through it …
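Independent of ocamlformat's settings, a line like that can be shortened with two standard record features: field punning (`tagStr` instead of `tagStr = tagStr`) and functional update (`{ r with … }`, which copies every field you don't mention). A sketch, assuming hypothetical type definitions guessed from the field names (the real ones are not shown in the thread) and assuming the wrapper record has exactly the fields tagRcrd, rest and failed; `withTag` and the body of `assembleAttrList` are made up for illustration:

```ocaml
(* Hypothetical types, guessed from the field names in the line above. *)
type tagVariant = XvTagEmpty | XvTagOpen | XvTagClose | XvNoTag
type tagAttrRcrd = { attrib : string; value : string }
type tagRcrd = { tagType : tagVariant; tagStr : string; tagAttrs : tagAttrRcrd list }
type tagWrapped = { tagRcrd : tagRcrd; rest : string; failed : bool }

(* Hypothetical stand-in for assembleAttrList: turns (name, value) pairs
   into attribute records and puts them in front of [acc]. *)
let assembleAttrList pairs acc =
  List.map (fun (attrib, value) -> { attrib; value }) pairs @ acc

let withTag inTagWrapped tagStr attrList =
  (* "with" copies rest and failed from inTagWrapped;
     "tagStr" alone is short for "tagStr = tagStr". *)
  { inTagWrapped with
    tagRcrd =
      { tagType = inTagWrapped.tagRcrd.tagType;
        tagStr;
        tagAttrs = assembleAttrList attrList [] } }
```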


Welcome, hope you have fun learning OCaml :slight_smile:

If semicolons still confuse you, you may want to take a look at this excellent book and more specifically this chapter:

https://johnwhitington.net/ocamlfromtheverybeginning/split17.html

I had a blast going through this book and I reckon you would enjoy it.

For me semicolons made the most sense when I started seeing them as a binary operator, a bit like + or &&:

val (;) : unit -> 'a -> 'a

This means “take a value with type unit, discard it and return the value on the right hand of the expression”. The way OCaml handles semicolons is slightly more involved, since it accepts them fairly generously without much effect, but that intuition is a pretty good guide to adding semicolons where they are needed and omitting them where they are not.
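A small sketch of that intuition (the names `trace`, `note` and `twice` are made up for illustration): the left-hand side of `;` is evaluated for its effect, its `()` is dropped, and the sequence takes the value of the right-hand side. `ignore` is the usual way to discard a non-unit value explicitly.

```ocaml
(* Records the order in which things happen. *)
let trace = Buffer.create 16

(* e1; e2 : e1 must have type unit; its () is discarded,
   and the whole sequence has the value and type of e2. *)
let note name value =
  Buffer.add_string trace name;  (* e1 : unit *)
  value                          (* e2 : the result *)

(* A non-unit left-hand side makes the compiler warn (warning 10);
   ignore discards such a value explicitly. *)
let twice x =
  ignore (note "a" x);
  note "b" (x * 2)
```

One difference from a real function of type `unit -> 'a -> 'a`: OCaml leaves the evaluation order of function arguments unspecified, whereas `e1; e2` always evaluates e1 first.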


“take a value with type unit, discard it and return the value on the right hand of the expression”.

This might be a bit misleading.
So,
e1; e2
returns the result of e2. The function terminates. Or, understand the semicolon as a sequencing operator.
Does that mean that
if <expression = true> then
e1
does return? No, and you can’t terminate e1 with a semicolon unless you add begin / end.
But maybe I’m wrong here.

Hello from another old-timer :slight_smile: I started around 1981 on an Acorn Atom.

I think the bit you are missing is that if isn’t a control structure: it is an expression.

if day_num > 5 then
  "Weekend :-)"
else
  "Work-day :-("

That evaluates to a string. You need both branches of the if and both need to have the same type (so the if always evaluates to the same type).
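The same idea as functions (the names `day_kind` and `describe` are illustrative): because if/else is an expression, its result can be returned directly or combined with other values in a larger expression.

```ocaml
(* The if/else evaluates to a string, so it can be the function's result... *)
let day_kind day_num =
  if day_num > 5 then "Weekend :-)" else "Work-day :-("

(* ...or be used in the middle of a larger expression. *)
let describe day_num =
  "Day " ^ string_of_int day_num ^ ": "
  ^ (if day_num > 5 then "weekend" else "work-day")
```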

AFAIK the only way to do an “early return” from a function is by raising an exception.
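A sketch of that exception-based "early return", using a hypothetical search for the first negative number in a list. The local exception (`let exception`, available since OCaml 4.04) leaves `List.iteri` as soon as a match is found; the second version shows the recursive alternative, which simply stops recursing and is often the more idiomatic choice.

```ocaml
(* Hypothetical example: position of the first negative number in a list.
   Raising the local exception Found abandons the rest of List.iteri. *)
let first_negative_index (xs : int list) : int option =
  let exception Found of int in
  try
    List.iteri (fun i x -> if x < 0 then raise (Found i)) xs;
    None
  with Found i -> Some i

(* The same search without exceptions: the recursion just stops. *)
let first_negative_index_rec (xs : int list) : int option =
  let rec go i = function
    | [] -> None
    | x :: _ when x < 0 -> Some i
    | _ :: rest -> go (i + 1) rest
  in
  go 0 xs
```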


edit: see Leonidas’ post below for a better discussion of “if”; I didn’t know you could skip the “else” if the then branch returns unit.

I think your confusion comes from the fact that OCaml has constructs that look like imperative constructs, but they don’t behave the way the corresponding constructs do in imperative languages.

I think you’re focussing too much on “flow control” (“execute this code or execute that code”) and not enough on the evaluation of values (“determine the value of this expression or the value of that expression”). Functions in OCaml don’t really “return”; they evaluate to a value. So there are no multiple returns, just as a mathematical function f(x) = x * 2 can’t have multiple returns: it doesn’t return, it evaluates to its result.

Even a function that just does something like print_endline evaluates to a value: () which is of type unit. This is different from C where void does not have a value. You can’t do void foo = ??? in C, but let foo : unit = () is a perfectly valid, albeit pointless piece of OCaml code.
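A tiny sketch of unit being an ordinary value (the names are illustrative): `()` can be bound, stored and compared like any other value, including the result of `print_endline`.

```ocaml
(* () is the one and only value of type unit. *)
let foo : unit = ()

(* print_endline : string -> unit, so its result is () as well. *)
let printed : unit = print_endline "hello"

(* Unit values can even be stored in a list (pointless, but legal). *)
let units : unit list = [ foo; printed; () ]
```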

if in OCaml is an expression; it always evaluates to a value. What makes this a bit confusing for someone used to seeing if as a control-flow statement is that the else branch is optional and defaults to evaluating to ():

let result_of_if = if expression = true then e1 in
let result_of_if_else = if expression = true then e1 else () in
result_of_if = result_of_if_else

Thus you can also do

(if expression = true then e1); e2

which will evaluate e1 (which has to evaluate to a unit value), then discard that unit and evaluate e2.
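A concrete version of the two snippets above, with e1 replaced by a buffer write (the names `buff`, `skipped` and `add_then_read` are illustrative): an if without else has type unit and evaluates to `()` when the condition is false, and a parenthesized if can be the left-hand side of a sequence.

```ocaml
let buff = Buffer.create 16

(* With no else branch, the then branch must have type unit,
   and a false condition makes the whole if evaluate to (). *)
let skipped : unit = if false then Buffer.add_string buff "never"

(* (if ...); e2 : the if is evaluated for its effect, its () is
   discarded, and the sequence evaluates to e2. *)
let add_then_read flag s =
  (if flag then Buffer.add_string buff s);
  Buffer.contents buff
```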

begin and end are simply aliases for ( and ) and group exactly like parentheses in math; they don’t do anything special. You can even do

let x = begin 1 + 2 end * 3 in
let x' = (1 + 2) * 3 in
x = x'
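Where this grouping matters in practice is the case raised earlier in the thread about terminating e1 with a semicolon: `;` binds more loosely than if/then, so without grouping only the first statement belongs to the then branch. A sketch (the names are illustrative):

```ocaml
let out = Buffer.create 16

(* ";" binds more loosely than if/then, so this body parses as
   (if flag then Buffer.add_string out "a"); Buffer.add_string out "b"
   and "b" is appended whatever flag is. *)
let ungrouped flag =
  if flag then Buffer.add_string out "a";
  Buffer.add_string out "b"

(* begin ... end (exactly like parentheses) puts both statements
   inside the then branch. *)
let grouped flag =
  if flag then begin
    Buffer.add_string out "c";
    Buffer.add_string out "d"
  end
```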

You can’t write this… I guess you meant let result_of_if = if expression = true then e1 in ?

Ah yes my apologies, you’re right, it does need the then part indeed, I’ve edited the source code above.

I know it’s friendlier to talk with a human, but if you want a quick reply, ChatGPT is often able to explain this kind of thing. It can also suggest improvements and convert camelCase to snake_case. You can also ask it questions about a programming language you want to learn, and it can help you write documentation. If you are more comfortable with another programming language, I noticed that it can convert code from one programming language to another on rosettacode.org. It is not perfect, but it can probably be of some help when starting out.

Being reminded of the irony in this always makes me chuckle :)

Thank you, Leonidas, your explanation was very helpful!

In the meantime, I went over my code, cut off all the camel backs and now have a snake farm.
I threw away an FSM that I had written in brute-force style (consisting almost entirely of refs) and rewrote it to be more OCaml-ish. I think I’ll post a code snippet here for comments (even picky ones) when I’m happy with it.
Or should I start a new topic? I think that’s better, as it no longer relates to the question.
