Real World F# Programming Part 1: Structuring Your Solution
I have several “side project” apps I work on throughout the year. One of those is Newspaper23.com. I have a problem with spending too much time online. Newspaper23 is supposed to go to all the sites I might visit, pull down the headlines and a synopsis of the article text. No ads, no votes, no comments, no email sign-ups. Just a quick overview of what’s happening. If I want more I can click through. (Although I might remove external links at a later date)
Right now newspaper23.com pulls down the headlines from the sites I visit and attempts to get text from the articles. It’s not always successful — probably gets enough text for my purposes about 70% or so. There’s some complicated stuff going on. And it doesn’t get the full text or collapse it into a synopsis yet. That’s coming down the road. But it’s a start. It goes through about 1600 articles a day from about a hundred sites. People are always saying F# isn’t a language for production, and you have to have some kind of framework/toolset to do anything useful, but that’s not true. I thought I’d show you how you can do really useful things with a very small toolset.
I’m running F# on Mono. The front-end is HTML5 and Jquery. There is no back-end. Or rather, the back end is text files. Right now it’s mostly static and supports less than a thousand users, but I plan on making it interactive and being able to scale up past 100K users. Although I have a brutally-minimal development environment, I don’t see any need in changing the stack to scale up massively. Note that this app is part-time hobby where I code for a few days just a couple of times a year. I don’t see my overall involvement changing as the system scales either. Server cost is about 40 bucks a month.
I come from an OO background so all of this is directed at all you JAVA/.Net types. You know who you are :)
(Yes, there will be code in this series)
The game here is small, composable executables. FP and small functions mean less code. Less code means less bugs. Deploy that smaller amount of code in samlller chunks and that means less maintenance. Little stand-alone things are inherently scalable. Nobody wonders whether they can deploy the “directory function” on multiple servers. Or the FTP program. Big, intertwined things are not. Start playing around sometime with sharding databases and load balancers. Ask some folks at the local BigCorp if they can actually deploy their enterprise software on a new set of servers easily.
I’ve found that you’ll write 4 or 5 good functions and you’re done with a stand-alone executable. In OO world you’ll spend forever wiring things up and testing the crap out of stuff to write the same code spread out over 8 classes. Then, because you’ve already created those classes, you’ll have a natural starting point for whatever new functionality you want. Which is exactly the opposite direction FP takes you in. In OO, the more you try to do, the more structure you have, the more structure, the more places to put new code, the more new code, the more brittle your solution. (And I’m not talking buggy, I’m talking brittle. This is not a testing or architecture issue. Composable things have composable architectures. Static graphs do not. [Yes, you can get there in OO, but many have ventured down this path. Few have arrived.])
The O/S or shell joins it all together. That’s right, you’re writing batch files. Just like a monad handles what happens between the lines of code in an imperative program, the shell is going to handle what happens between the composable programs in your project. A program runs. It creates output. The shell runs it at the appropriate time, the shell moves that output to where it can be processed as input by the next program, the shell monitors that the executable ran correctly. There is no difference between programming at the O/S level and at the application level. You work equally and in the same fashion in both.
Ever delete your production directory? You will. Having it all scripted makes this a non-event. You make this mistake once. Then you never make it again. DevOps isn’t some fashionable add-on; it’s just the natural way to create solutions. Automate early, and continue automating rigorously as you go along. DRY applies at all levels up and down the stack.
Each file has its own script file. There are config files and cron hooks it all up. This means CPU load, data size, and problem difficulty are all separate issues. Some of you might be thinking, isn’t this just bad old batch days? Are we killing transactional databases? No. In a way, the “bad old days” never left us. We just had little atomic batches of 1 that took an indeterminate amount of time because they had to traverse various object graphs across various systems. We can still have batches of 1 that run immediately — or batches of 10 that run every minute — or bathces of n that run every 5 minutes. It’s just that we’re in control. We’re simply fine-tuning things.
Note that I’m not saying don’t have a framework, database, or transactions. I’m saying be sure that you need one before you add one in. Too often the overhead from our tools outweighs the limited value we get, especially in apps, startups, and hobby projects. If you’re going to have a coder or network guy tweaking this system 3 years from now, it’s a different kind of math than if you’re writing an app on spec to put in the app store. Start simple. Add only the minimum amount you need, and only when you need it.
One of the things you’re going to need that’s non-negotiable is a progressive logging system that’s tweaked from the command line. Basically just a prinfn. I ended up creating something with about 20 LOC that works fine. Batch mode? All you need to see is a start/stop/error summary. Command line debug? You might need to see everything the program is doing. Remember, the goal: as much as possible, you should never touch the code or recompile. Every time you open the code up is a failure. If you can make major changes to the way the system works from the shell, you’re doing it right. If you can’t, you’re not.
Many of you might thing that you’re turning your code into a mess of printf statements like the bad old days. But if that’s what you’re doing, you’re doing it wrong. Each program shouldn’t have that many flags or things to kick out. Remember: you’re only doing a very small thing. For most apps, I’d shoot for less than 300LOC, and no more than 10-20 code paths. All code paths can be logged off or on from command line. To make this workable in a command-line format without creating a “War and Peace” of option flags means cyclomatic complexity must be fairly low. Which also keeps bugs down. Natch.
Of course Windows and various OO programs and frameworks have logging features too, but they run counter to the way things are done here. Usually you have to plug into a logging subsystem, throw things out, then go to some other tool, read the messages, do some analysis, and so on. In linux with command-line stuff, the logging, the analysis, and changing the code to stop the problem all happen from the same place, the command line. There’s no context-switching. Remember, most errors should be just configuration issues, not coding issues.
One of the ways I judge whether I’m on-track or not is the degree of futzing around I have to do when something goes wrong. Can I, from a cold start (after weeks of non-programming), take a look at the problem, figure out what’s happening, and fix it — all within 10 minutes or so? I should be able to. And, with only a few exceptions, that’s the way it has been working.
The way I do this is that I have the system monitor itself and report back to me whether each program is running and producing the expected output. Every 5 minutes it creates a web page (which is just another output file) that tells me how things are going. In the FP world, each program is ran several different ways with different inputs. Common error conditions have been hammered out in development. So most times, it runs fine in all but one or two cases. So before I even start I can identify the small piece of code and the type of data causing problems. The biggest part of debugging is done by simply looking at a web page while I eat my corn flakes.
System tests run all the time, against everything. It’s not just test-first development. It’s test-all-the-time deployment. Fixing a bug could easily involve writing a test in F#, writing another test at the shell level, and writing a monitor that shows me if there’s ever a problem again. Whatever you do, don’t trade off monolithic OO apps for monolithic shell apps. FP skills and thinking are really critical up and down the toolchain here.
It’s interesting that there’s no distinction between app and shell programming. This is good for both worlds. Once we started creating silos for application deployment, we started losing our way.
Instead of rock-solid apps, failure is expected, and I’m not just talking about defensive programming. Pattern-matching should make you think about all paths through a program. Failure at the app level should fit seamlessly into the DevOps ecosystem, with logging, fallbacks, reporting, resillience. Can’t open a file? Can’t write to a file? Who cares? Life goes on. There are a thousand other files. Log it and keep rolling. It could be a transient error. 99% of the time we don’t work with all-or-nothing situations.
As much as possible, you should check for File System and other common errors before you even start the work. My pattern is to load the command line parameters, load the config file (if there is one), then check to make sure I have all the external pieces I need — the actual files exist, the directories are there, and so on. This is stuff I can check early, before the “real” code, so I do it. That way I’m not in the middle of a function to do something really complex and then forgetting to check whether or not I have an input file. By the time I get to the work, I know that all of my tools are in place and ready. This allows me to structure solutions much better.
For instance I woke up this morning and looked at the stats. Looks like pulling things from medium.com isn’t working — and hasn’t been working for several days. That’s fine. It’s an issue with the WebClient request headers. So what? I’ll fix it when I feel like it. Compare this to waking up this morning with a monolithic server app and realizing the entire thing doesn’t run because of intricate (and cognitively hidden) dependencies between functions that cause the failure of WebClient in one part to prevent the writing of new data for the entire app in another part. Just getting started with the debugging from a cold start could take hours.
Note that a lot of DevOps sounds overwhelming. The temptation is to stop and plan everything out. Wrong answer. It’s an elephant. Planning is great, but it’s critical that you eat a little bit of the elephant at a time. Never go big-bang, especially in this pattern, because you really don’t know the shape of the outcome at the DevOps level. [Although there may be some corporate patterns to follow. Insert long discussion here]
Next time I’ll talk about how the actual programming works: how code evolves over time, how to keep things simple, and how not to make dumb OO mistakes in your FP code.
September 8, 2014