September 10th 2010 | David Cooksey
Lexing and Parsing with F# – Part 1
FsLex and FsYacc are F# implementations of Ocaml’s Lex and Yacc. They are part of the F# Powerpack released for Visual Studio 2010. Used together, they take an input string and create a parse tree of typed objects. Once you have the parse tree, you can do anything with it—generate code for another language, interpret it directly, etc. If you’d like to jump right into the code, scroll to the bottom of the post. Note that the code includes a small test project using FsUnit.
FsLex is the lexer part of the lexer-parser pair. It converts an input string or stream into a series of tokens. A token is simply a string labeled so that the parser knows how to handle it. For example, ‘92’ might be within a token labeled DIGIT. Simply put, the lexer’s job is to take a meaningless string, label the parts of it relevant to the language, and hand the labeled parts, in order, to the parser.
FsYacc is the parser component. The parser’s job is to impose syntactical rules onto the stream of typed tokens generated by the lexer. In an analogy to spoken languages, the tokenizer recognizes words and the parser recognizes correct sentence structures.
Let’s start with a very simple syntax that requires a valid program to be a single assignment of an integer or string to a variable. Working with Lex and Yacc primarily requires altering three files. Lexer.fsl defines the allowed tokens (the lexer component) and Parser.fsy defines the allowed syntax (the parsing component). The third file (which we will call Statements.cs) defines the types that we will use to generate the parse tree.
In it we define allowed types for Stmt and expr. Stmt can be only a single type, called Assign, which is a tuple consisting of a string and an expr. Its dependent type, expr, can be a string (typed as Val) or an Int32 typed as Int.
Note that this single file gives a very concise picture of what kind of language we are parsing. Assignments of strings and integers are allowed, nothing else.
Next comes the lexer.
Take note of a few things here. The ‘open Parser’ statement on line 5 indicates that the lexer is dependent on the parser. This is because the tokens (ID, INT, EQUAL, etc.) used on lines 18-23 are defined in the Parser file. The syntax used for matching tokens is similar to regular expressions. Line 18 ignores whitespace characters (defined above as space, tab, carriage return, and newline) by calling tokenize on the next character in the buffer. The ID and INT tokens can contain multiple characters. Token patterns can be defined above (like digit, equal, semicolon, …) or directly within the parse rules (such as the allowed pattern for ID). Any character in the input stream that does not match one of these patterns will generate an error.
Last, the parser.
Here, it all comes together. Lines 7 and 8 indicate that the program starts with start: which is of type Stmt as defined in the Statements file. Line 16 declares the program must consist of a single StmtExp, which expands according to the remainder of the rules. Rules are defined as an allowed pattern of tokens (defined above using %token ) and the types that they generate. The $1, $2, $3 references inside the types indicate the 1-based index position within the tokens to the left. For example, the $2 on line 19 refers to the contents of the ID token. Let’s expand the syntax above for a specific example.
StmtExp -> ID EQUAL Expr SEMICOLON -> ID EQUAL INT SEMICOLON -> varname = 5;
Which becomes Assign(“varname”, Int(5))
StmtExp -> ID EQUAL Expr SEMICOLON -> ID EQUAL QUOTE ID QUOTE SEMICOLON -> varname = “string”;
Which becomes Assign(“varname”, Val(“string”))
We’ll see more complicated expansions, along with a bit more of the theory, in part 2.
One interesting expansion of the lexer/parser would be to adjust it to handle typed variable definitions.
Fslex and FsYacc Info: http://en.wikibooks.org/wiki/F_Sharp_Programming/Lexing_and_Parsing
February 11th | 2009
FsUnit – Test F# with F#
I recently spent some time putting together an F# lab with a series of unit tests designed to help teach the language. I started out with a set of Nunit tests in C#, but kept running head-first into type problems with the method signatures and return types. Another hiccup was that my VS 2008 C# code couldn’t see my F# code until I referenced the generated F# dll directly via Browsing. Adding a simple project reference didn’t work.
For example, here’s a simple F# function that returns the square of the input for any member with (*) defined and a C# assert based on it.
module SampleOne =
let inline Square (x : ^x) = x * x;
Seems simple enough, right? Except for the build error. “The Type arguments for method ‘Samples.SampleOne.Square<x,a>(x)’ cannot be inferred from the usage. Try specifying the type arguments explicitly.” What the compiler is telling us here is that we’re calling a generic function without specifying the types.
To make the compiler happy, we have to do the following instead:
Assert.AreEqual(9, Samples.SampleOne.Square<int, int>(3));
Elegant? No, not really. The syntactic magic that F# uses to allow us to infer types is not available within C#.
Here’s a simple F# function that calls ToString on all members of a list and returns it as a new list.
let ListToString x = List.map (fun x → x.ToString()) x;
Its signature in F# looks like: val WorkOnList : (`a → string list)
In C# … Microsoft.FSharp.Core.FSharpFunc<Microsoft.FSharp.Collections.FSharpList<b>, Microsoft.FSharp.Collections.FSharpList<string>> SampleOne.WorkOnList<a,b>(a x)
Confusing, at best. Luckily there’s a testing framework that avoids these issues and is designed for F#.
FsUnit to the Rescue
FsUnit is a library that defines a simple, easy-to-use testing syntax within F#. It currently works with NUnit, but is intended to work with MbUnit, xUnit, and MsTest in the future. It’s a breeze to set up, simply reference nunit.framework, FsUnit.NUnit, and include the FsUnit F# script that sets up the syntax.
Using FsUnit, we can describe our functions and write a test for them as follows:
#light namespace Tests open NUnit.Framework open FsUnit module module1 = let inline public Square x = x * x; [<TestFixture>] type ``Sample Tests`` ()= let WorkOnList x = List.map (fun (x) -> x.ToString()) x; let lame x = List.head x; [<Test>] member test. ``Test One`` ()= Square 5 |> should equal 25; Square 0 |> should equal 0; Square 1.5 |> should equal 2.25; WorkOnList [1;2;3;4;5] |> should equal ["1";"2";"3";"4";"5"]
Much more elegant and concise than the C# mess above. Spaces in test names are allowed—a feature I love. I’ve written far too many tests along the lines of ShouldCorrectlyDoOperationWhenValueAandValueBDoNotExceedValueC.
Languages typically use spaces to separate words instead of capitalization for a reason.
Keep in mind that this test syntax does not cause a compile time error if types do not match. For example, a function defined as let x = 0; does not cause a compile error when tested with x |> should equal “horse”.
In conclusion, I recommend giving FsUnit a try if you intend to test F# code. Test in the language you wrote the code in!