Can't get my semicolons right, I guess. Noob question-alert

nickm · June 3, 2024, 5:28pm

Hi folks!

This is my first posting. So a short introduction of myself:
I’ve been programming since 1977. I have used many languages, but mainly Pascal, Modula-2, C++ and C (roughly in that order). Other languages were Fortran, assembler, Obj-C, NewtonScript and Tcl/Tk (still in use for quick tools).
My last job was embedded hard- and software-development in C.
So last year, I decided to try something completely new and ended up at OCaml. I gave it up (for other reasons) and started again a few months ago. My main reference is R. Clarkson’s book and his videos.

Because I learn best with self-assigned tasks, I wrote an XML-parser (that works, with limitations. Not a full blown validating parser with DTD etc.).

Finally, here is what I’ve been stuck on for days:
Read the function as if a C-programmer wrote it. There is (should be) only a single point of return. And that is the last line.
I’ll provide the required records, so it should (!) compile.

type tagAttrRcrd = {
  attrib : string;
  value  : string;
}

type xmlElementRcrd = {
  tagStr   : string;
  tagType  : tagVariant;
  tagAttrs : tagAttrRcrd list;
  value    : string;
  children : xmlElementRcrd list;
}

let rec xmlDump ?(indent = 0) ?(inStr = "") element : string =
  let buff = Buffer.create 100 in
    bprintf buff "%s%s%s" inStr (String.make indent ' ') element.tagStr;
  
    if element.tagAttrs <> [] then (
      List.iter (fun attr -> (bprintf buff " %s='%s'" attr.attrib attr.value)) element.tagAttrs
    );
    bprintf buff " \"%s\"\n" element.value;

    if element.children <> [] then (
      List.iter (bprintf buff "%s" (xmlDump ~indent:(indent + 2) ~inStr:(Buffer.contents buff))) element.children
    );

  Buffer.contents buff (* return *)

Compilation fails at the second List.iter. So I assume that xmlDump does not return the string that I want. But … I’m completely lost.
I also read Let* syntax, semicolon and chaining units at least two times.

Any help is welcome. Feel free to criticize me, I know I still have a lot to learn!

Thanks,
Nick

Leonidas · June 3, 2024, 5:43pm

The code does not compile as-is so I added

type tagVariant = unit
let bprintf = Printf.bprintf

At the beginning.

When you try to compile it you get

File "./tt.ml", line 27, characters 35-95:
27 |       List.iter (bprintf buff "%s" (xmlDump ~indent:(indent + 2) ~inStr:(Buffer.contents buff))) element.children
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error: This expression has type xmlElementRcrd -> string
       but an expression was expected of type string
  Hint: This function application is partial,
  maybe some arguments are missing.

Which tells you that in the second List.iter you’re not passing the element argument to xmlDump. If you add it (List.iter takes a function and applies it to each element of the list) like so

      List.iter (fun e -> bprintf buff "%s" (xmlDump ~indent:(indent + 2) ~inStr:(Buffer.contents buff) e)) element.children

then it compiles.
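
To see the partial application in isolation, here is a minimal sketch. The names describe, partial and render are hypothetical and only mirror the shape of xmlDump: supplying just the optional arguments leaves a function that still waits for its positional argument, which is why the compiler reports a function type where a string was expected.

```ocaml
(* Hypothetical function with an optional argument, mirroring the shape of xmlDump. *)
let describe ?(indent = 0) name = String.make indent ' ' ^ name

(* Supplying only the optional argument leaves a function still waiting for [name]. *)
let partial : string -> string = describe ~indent:2

let render names =
  let buff = Buffer.create 16 in
  (* List.iter needs a function; the lambda hands each item to [describe]. *)
  List.iter (fun n -> Printf.bprintf buff "%s\n" (describe ~indent:2 n)) names;
  Buffer.contents buff
```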

If I were to simplify I’d go for something like this:

type tagVariant = unit

type tagAttrRcrd = {
  attrib : string;
  value  : string;
}

type xmlElementRcrd = {
  tagStr   : string;
  tagType  : tagVariant;
  tagAttrs : tagAttrRcrd list;
  value    : string;
  children : xmlElementRcrd list;
}

let rec xmlDump ?(indent = 0) ?(inStr = "") element =
  let buff = Buffer.create 100 in
  Printf.bprintf buff "%s%s%s" inStr (String.make indent ' ') element.tagStr;
  
  List.iter (fun attr ->
    Printf.bprintf buff " %s='%s'" attr.attrib attr.value)
    element.tagAttrs;
  Printf.bprintf buff " \"%s\"\n" element.value;

  List.iter (fun child ->
    Printf.bprintf buff "%s" (xmlDump ~indent:(indent + 2) ~inStr:(Buffer.contents buff) child))
    element.children;

  Buffer.contents buff

(No need to check for emptiness of the lists, as List.iter will call the function n times for a list of length n, thus 0 times for an empty list)
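
One thing to watch in the version above: each recursive call receives the parent's whole buffer contents as inStr and returns them as part of its result, which is then appended to the same buffer, so the parent's text appears to end up in the output twice. A possible alternative is to thread a single buffer through the recursion. This is only a sketch: the snake_case names (xml_element, dump_into, xml_dump) are hypothetical and tagType is omitted for brevity.

```ocaml
type tag_attr = { attrib : string; value : string }

type xml_element = {
  tag_str : string;
  tag_attrs : tag_attr list;
  value : string;
  children : xml_element list;
}

(* Write [element] and all of its descendants into one shared buffer. *)
let rec dump_into buff indent element =
  Printf.bprintf buff "%s%s" (String.make indent ' ') element.tag_str;
  (* The annotation settles which record the shared [value] field belongs to. *)
  List.iter
    (fun (attr : tag_attr) ->
      Printf.bprintf buff " %s='%s'" attr.attrib attr.value)
    element.tag_attrs;
  Printf.bprintf buff " \"%s\"\n" element.value;
  (* Partial application on purpose: the remaining argument is each child. *)
  List.iter (dump_into buff (indent + 2)) element.children

let xml_dump element =
  let buff = Buffer.create 100 in
  dump_into buff 0 element;
  Buffer.contents buff
```

Here the helper evaluates to unit and only xml_dump produces the string, so nothing is copied back into the buffer.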

However I do note that your naming is sort of unconventional. It’s not really common to use camelCase in OCaml in types and function names, despite the name and logo.

nickm · June 3, 2024, 6:03pm

Thanks a lot for your quick help!
And sorry for not including the

type tagVariant = XvTagEmpty | XvTagOpen | XvTagClose | XvNoTag (* <x/> | <x> | </x> | no tag found *)

So I was quite close, when I tried the second List.iter with a fun → child (but introduced new errors).
Oh well …

Yes, I know that my naming style is unconventional for OCaml. That comes from my C experience and from never having liked the K&R style. I’ll work on that, and also on my formatting, which needs some adapting.

Leonidas · June 3, 2024, 6:08pm

If you want you can use ocamlformat. Since using ocamlformat I never need to think about formatting, I just write it in whatever way and let the program sort it out. Pretty much a game-changer, especially when working with deeply nested code.

nickm · June 3, 2024, 6:47pm

I’m aware of ocamlformat. But as is, I really don’t like it (the default settings, that is). I need to invest more time in it.
A line like this …

          {tagRcrd = {tagType = inTagWrapped.tagRcrd.tagType; tagStr = tagStr; tagAttrs = (assembleAttrList attrList [])}; rest = inTagWrapped.rest; failed = inTagWrapped.failed}

… would fill half of the screen height. I’ll find my way through it …
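
If most of the length comes from copying fields, record update syntax ({ r with ... }) and field punning may shorten such a line before any formatter gets involved. A sketch under assumptions: the thread does not show the wrapper record, so the type names tag_rcrd and wrapped, the types of rest and failed, and the function rebuild are all hypothetical.

```ocaml
type tag_variant = XvTagEmpty | XvTagOpen | XvTagClose | XvNoTag

type tag_attr = { attrib : string; value : string }

(* Assumed shapes; only the field names come from the line quoted above. *)
type tag_rcrd = { tagType : tag_variant; tagStr : string; tagAttrs : tag_attr list }
type wrapped = { tagRcrd : tag_rcrd; rest : string; failed : bool }

(* [with] copies every field that is not listed; [tagStr] alone is short
   for [tagStr = tagStr] (field punning). *)
let rebuild inTagWrapped tagStr tagAttrs =
  { inTagWrapped with tagRcrd = { inTagWrapped.tagRcrd with tagStr; tagAttrs } }
```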

benjamin-thomas · June 3, 2024, 10:10pm

Welcome, hope you have fun learning OCaml

If semicolons still confuse you, you may want to take a look at this excellent book and more specifically this chapter:

https://johnwhitington.net/ocamlfromtheverybeginning/split17.html

I had a blast going through this book and I reckon you would enjoy it.

Leonidas · June 4, 2024, 9:21am

For me semicolons made the most sense when I started seeing them as a binary operator, a bit like + or &&:

val (;) : unit -> 'a -> 'a

Which means “take a value with type unit, discard it and return the value on the right hand of the expression”. The way OCaml handles semicolons is slightly more involved, since it allows them fairly generously without much of an effect but that intuition is a pretty good way to add semicolons where they are needed and omit them where they are not needed.
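
One way to make that intuition concrete: e1; e2 behaves like let () = e1 in e2. A small sketch with hypothetical function names:

```ocaml
(* [e1; e2]: evaluate e1 for its effect, then evaluate to the value of e2. *)
let with_semicolon log =
  Buffer.add_string log "a";
  1 + 1

(* The same shape spelled with an explicit unit pattern. *)
let with_let log =
  let () = Buffer.add_string log "b" in
  1 + 1
```

Both functions evaluate to 2; the buffer only records that the left-hand side ran first.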

nickm · June 7, 2024, 6:23pm

“take a value with type unit, discard it and return the value on the right hand of the expression”.

This might be a bit misleading.
So,
e1; e2
returns the result of e2. The function terminates. Or understand the semicolon as sequencing-operator.
Does that mean that
if <expression = true> then
e1
does return? No, and you can’t terminate e1 with a semicolon unless you add a begin / end.
But maybe I’m wrong here.

R_Huxton · June 7, 2024, 7:11pm

Hello from another old-timer. I started around 1981 on an Acorn Atom.

I think the bit you are missing is that if isn’t a control structure - it is an expression.

if day_num > 5 then
  "Weekend :-)"
else
  "Work-day :-("

That evaluates to a string. You need both branches of the if and both need to have the same type (so the if always evaluates to the same type).
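
Because the if is an expression, it can be used directly as a function body or inside a larger expression. A short sketch (the names day_kind and banner are made up for illustration):

```ocaml
let day_kind day_num =
  (* The whole if/else is an expression, so it can be the function body. *)
  if day_num > 5 then "Weekend :-)" else "Work-day :-("

(* The result is an ordinary string and can be concatenated like one. *)
let banner = "Today: " ^ day_kind 6
```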

AFAIK the only way to do an “early return” from a function is by raising an exception.
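
A minimal sketch of that exception-based exit, using a hypothetical exception Found and helper first_negative_index; a recursive function that simply stops recursing would be another way to get the same effect.

```ocaml
exception Found of int

(* Hypothetical helper: index of the first negative number, if any. *)
let first_negative_index numbers =
  try
    (* Raising leaves List.iteri immediately, like an early return would. *)
    List.iteri (fun i n -> if n < 0 then raise (Found i)) numbers;
    None
  with Found i -> Some i
```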

edit: see Leonidas’ post below for a better discussion of “if” - I didn’t know you could skip the “else” if returning unit from the then branch.

Leonidas · June 7, 2024, 7:12pm

I think your confusion comes from the fact that OCaml tries to look like an imperative language by having constructs that look like imperative constructs but in fact aren’t working like the constructs work in imperative languages.

I think you’re focussing too much on “flow control” (“execute this code or execute that code”) and not on evaluation of values (“determine the value of this expression or the value of that expression”). Functions in OCaml don’t really “return”, they evaluate to a value (thus there are no multiple returns, a bit like a function f(x) = x * 2 in mathematics can’t have multiple returns, since it doesn’t return, it evaluates to the result).

Even a function that just does something like print_endline evaluates to a value: () which is of type unit. This is different from C where void does not have a value. You can’t do void foo = ??? in C, but let foo : unit = () is a perfectly valid, albeit pointless piece of OCaml code.

if in OCaml is an expression, it always evaluates to a value. What makes this a bit confusing for someone who is used to seeing if as a control flow statement is that the else branch is optional and defaults to evaluating to ():

let result_of_if = if expression = true then e1 in
let result_of_if_else = if expression = true then e1 else () in
result_of_if = result_of_if_else

Thus you can also do

(if expression = true then e1); e2

which will evaluate e1 (which has to evaluate to a unit value), then discard that unit and evaluate e2.
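
A runnable version of the same idea, with a hypothetical report function that writes into a buffer:

```ocaml
let report buff verbose =
  (* No else branch: the then branch must have type unit, and a missing
     else evaluates to (). *)
  (if verbose then Buffer.add_string buff "details\n");
  Buffer.add_string buff "done\n";
  Buffer.contents buff
```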

begin and end are purely aliases for ( and ) and work as they do in math, they don’t do anything special. You can even do

let x = begin 1 + 2 end * 3 in
let x' = (1 + 2) * 3 in
x = x'
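
This is also where the grouping matters for the earlier question about terminating e1 with a semicolon: if binds tighter than ;, so only the first expression after then belongs to the branch unless it is grouped. A sketch with hypothetical function names:

```ocaml
let without_grouping buff flag =
  (* Parsed as (if flag then add "a"); add "b", so "b" is always added. *)
  if flag then Buffer.add_string buff "a";
  Buffer.add_string buff "b";
  Buffer.contents buff

let with_grouping buff flag =
  (* begin ... end keeps both additions inside the then branch. *)
  if flag then begin
    Buffer.add_string buff "a";
    Buffer.add_string buff "b"
  end;
  Buffer.contents buff
```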

zapashcanon · June 12, 2024, 8:33am

You can’t write this… I guess you meant let result_of_if = if expression = true then e1 in ?

Leonidas · June 12, 2024, 9:30am

Ah yes my apologies, you’re right, it does need the then part indeed, I’ve edited the source code above.

deca3 · June 12, 2024, 9:55am

I know it is friendlier to talk with a human, but in case you would like a quick reply, ChatGPT is often able to explain this kind of thing. It can also suggest improvements and convert camelCase to snake_case. You can also ask it questions about a programming language that you want to learn, and it can help you write documentation. If you are more comfortable with another programming language, I noticed that it’s able to convert code from one programming language to another on rosettacode.org. It is not perfect, but it can probably be of some help when getting started.

hyphenrf · June 15, 2024, 8:52am

being reminded of the irony in this always makes me chuckle :)

nickm · June 16, 2024, 12:35pm

Thank you Leonidas, your explanation was very helpful!

In the meantime, I went over my code and cut off all the camel backs and now have a snake farm.
I threw away an FSM that I had written in a brute-force style (consisting almost entirely of refs) and rewrote it to be more OCaml-ish. I think I’ll post a code snippet here for comments (even picky ones) when I’m happy with it.
Or should I start a new topic? I think that’s better, as it no longer relates to the question.
