Sunday, August 16, 2009

Software Complexity in Practice

I had a minor epiphany last week, but have had too much to do until now to blog about it. It revolves around a question that is now quite common, at least among those who have made the effort to learn a second (or third) language properly:

Why does x take longer to implement in Java than in my-scripting-language-of-choice?

And what goes for implementation also goes for maintenance, etc.

For the last two months (OK, I had a four-week vacation in there somewhere :) I have had the opportunity (..) to work with a system that uses Maven for everything server-side, and it struck me how dangerous it is to use compiled languages at all. Listen to this:

The services (that I am working with, in Java) are generic to a fault, and use Spring applicationContext.xml files to hold specific information such as hostnames, DAOs and a host of other stuff. Now, I have nothing in principle against using config files for properties, but when the property files start to contain logic and 'mixins' for the rest of the program, things start to get fishy.

By 'fishy' I mean hard to maintain.

So we have the following:

1) Java code which is quite generic _and_ dependent on Spring to work.
2) An applicationContext.xml file which partly describes which classes get wired into the Java program.
3) A Maven pom.xml file which performs magic on the applicationContext.xml file (and other things) to set up development and test environments for the service in question.
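To make the layering concrete, here is a deliberately tiny simulation in JavaScript of those three layers. It is a sketch under stated assumptions, not how Maven or Spring actually work internally. `profiles` plays the role of per-environment build properties, `contextTemplate` plays the role of the context file, and `applyProfile` does `${...}` placeholder substitution in the style of Maven resource filtering. All names, hosts and the `JdbcUserDao` factory are hypothetical. The point is that you cannot tell what `dao.find` does without reading all three layers.

```javascript
// Layer 3 (stand-in for Maven profiles): build-time values per environment. Hypothetical.
const profiles = {
  dev: { "db.host": "localhost" },
  test: { "db.host": "test-db.example.internal" },
};

// Layer 2 (stand-in for applicationContext.xml): names the implementation to wire in,
// with ${...} placeholders left for the build step to fill. Hypothetical.
const contextTemplate = { dao: { impl: "JdbcUserDao", host: "${db.host}" } };

// Build step: recursively substitute ${key} placeholders from the chosen profile.
function applyProfile(node, props) {
  if (typeof node === "string") {
    return node.replace(/\$\{([^}]+)\}/g, (_, key) => {
      if (!(key in props)) throw new Error("Unresolved placeholder: " + key);
      return props[key];
    });
  }
  if (Array.isArray(node)) return node.map((v) => applyProfile(v, props));
  if (node && typeof node === "object") {
    const out = {};
    for (const [k, v] of Object.entries(node)) out[k] = applyProfile(v, props);
    return out;
  }
  return node;
}

// Layer 1 (the generic code): instantiates whatever the context names.
const implementations = {
  JdbcUserDao: (cfg) => ({ find: (id) => `user ${id} from ${cfg.host}` }),
};

function wire(context) {
  const def = context.dao;
  const make = implementations[def.impl];
  if (!make) throw new Error("Unknown implementation: " + def.impl);
  return make(def);
}

const dao = wire(applyProfile(contextTemplate, profiles.test));
console.log(dao.find(7)); // prints: user 7 from test-db.example.internal
```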

The basic premise is to be able to move the code, use it somewhere else, and just change the config files. The basic problem is that the config files are (as always) not documented, even though they are now completely intertwined with the logic of the code they act upon.

This means that if you are making a very minor change to part of a service, the risk is high that you need to understand how the configuration files interact with your new code. This is not very modular.

Also, I suspect that one of the reasons for doing this is to be able to use the oh-so-smart dependency injection to hot-wire new behavior into the Java classes, even after (gasp) they are compiled. This behavior is defined in various configuration files, naturally.

Duck-typed languages don't need dependency injection (magically replacing or inserting chunks of actual logic on the fly) because they already have that functionality built in.
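Here is a minimal sketch of what that looks like in JavaScript. The service still receives its collaborator, but no container, interface or XML is involved: anything with a `find(id)` method will do. An existing object can also simply be patched at runtime. `makeUserService`, `realDao` and `stubDao` are hypothetical names for illustration.

```javascript
// The service uses whatever `dao` it is given; anything with find(id) qualifies.
function makeUserService(dao) {
  return {
    displayName(id) {
      const user = dao.find(id);
      return user ? user.name : "(unknown)";
    },
  };
}

// A "production" collaborator (hypothetical).
const realDao = { find: (id) => ({ id, name: "User " + id }) };

// A test stub: no interface declaration, no container, no config file.
const stubDao = { find: () => null };

const service = makeUserService(realDao);
console.log(service.displayName(3)); // prints: User 3
console.log(makeUserService(stubDao).displayName(3)); // prints: (unknown)

// Replace behavior on the live object; the service sees it on the next call.
realDao.find = (id) => ({ id, name: "Patched " + id });
console.log(service.displayName(3)); // prints: Patched 3
```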

This means that the Java community (and others..) took 5-7 years to work around a big problem, and some years ago finally arrived at a great system for dependency injection. The reason for this workaround is that Java is a compiled language. The compiled class files were _supposed_ to be static and non-dynamic. That's the whole point. Errors and general no-nos were supposed to be caught at compile time. It's that kind of language.

So, many years later we finally have a solution to all this, and it isn't even needed if you're using Python, Ruby or JavaScript (etc.), since it's an inherent part of the language. If you want to add a new function to an object, at any time, on a lark, you just write:

foo.newfunc = function (a, b) { /* ... */ };
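A runnable version of that one-liner follows, with an invented body. It also shows the same trick applied to a constructor's prototype, so that objects created *before* the method existed pick it up as well. `foo`, `newfunc` and `Point` are illustrative names only.

```javascript
// An ordinary object created earlier in the program.
const foo = { name: "foo" };

// Add a new method at any time: no recompilation, no configuration file.
foo.newfunc = function (a, b) {
  return this.name + ": " + (a + b);
};
console.log(foo.newfunc(1, 2)); // prints: foo: 3

// Adding to a prototype affects every instance, including existing ones.
function Point(x, y) {
  this.x = x;
  this.y = y;
}
const p = new Point(3, 4);
Point.prototype.length = function () {
  return Math.sqrt(this.x * this.x + this.y * this.y);
};
console.log(p.length()); // prints: 5
```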

Now, I could have put the logic for that in a configuration file, but then I would need some kind of framework to read that file, which would make my work as a programmer more complex. And this is where my little epiphany comes back into the story, namely:

1) For every extra configuration file that links out from the file containing the code, complexity increases dramatically.

By 'links out from' I mean that the code might depend on configuration file x, which in turn might depend on configuration file y (see the Maven -> Spring -> code example above). Also, 'dramatically' is not a very specific number, but since there is next to no research in this area, the best I can do is argue that all logic you keep inside your code files leads to less complexity, even though it might also lead to less flexibility.
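One way to picture 'links out from' is to treat files as a dependency graph and collect everything reachable from the file you want to change. The sketch below is a plain depth-first walk with cycle protection. The file names and the links between them, including `db.properties`, are hypothetical. It counts only the files you have to read, not the ways they can interact with each other.

```javascript
// Hypothetical map of which files each file "links out" to.
const linksOutTo = {
  "UserService.java": ["applicationContext.xml"],
  "applicationContext.xml": ["pom.xml", "db.properties"],
  "pom.xml": ["db.properties"],
  "db.properties": [],
};

// Every file you may need to read before safely changing `start`
// (iterative depth-first search; `seen` guards against cycles).
function filesToUnderstand(start, links) {
  const seen = new Set();
  const stack = [start];
  while (stack.length > 0) {
    const file = stack.pop();
    if (seen.has(file)) continue;
    seen.add(file);
    for (const next of links[file] || []) stack.push(next);
  }
  return [...seen];
}

console.log(filesToUnderstand("UserService.java", linksOutTo).length); // prints: 4
```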

OK, hold your horses, you might say now, all this Java-bashing is making me dizzy (thanks, that was my intention :). Isn't the whole point of Java (etc.) that the additional security of checked exceptions, static typing and public/private/protected access guards you from making a lot of horrible mistakes? If we were using these flimsy little scripting languages, we would all be lost!

Well, here's another rub of mine: static typing is supposed to help me make fewer mistakes. I have coded almost exclusively in JavaScript for the last three years (and almost exclusively in Java for four years before that), and in all the JavaScript code I have written, not once have I sat down and cried silently, wishing someone would hand me enforced static typing for JavaScript. Its absence has not bogged me down or confused me. In reality it has made me more productive.

How is that? I have a creeping feeling that these 'safety features' are more useful for formal proofs than for the actual engineering of stuff.

For example, I was able to quickly write a (quite shabby, but working) REST service implementation in only a page or two of server-side JavaScript. The main reason for that, IMO, is that I didn't need to jump through hoops: creating complex class hierarchies for the data, casting myself to high heaven and back, or generating miles of try/catch code which I knew was not really needed. Sometimes the code bombed. I checked the logs, added a try/catch, and removed it later once I had fixed the problem.
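The post doesn't show that service, so what follows is only a hypothetical sketch of how small a shabby-but-working REST service can be. It uses Node.js and its built-in `http` module as a stand-in, which is an assumption since the post doesn't name the SSJS runtime. The `/items/<id>` route and the in-memory `Map` store are invented for illustration. The routing logic is a pure function, there are no class hierarchies or casts, and the only try/catch guards JSON parsing.

```javascript
// Hypothetical in-memory store standing in for a real database.
const store = new Map();

// Pure routing function: plain objects in, plain objects out.
// Paths are assumed to look like /items/<id> (no query strings handled).
function route(method, path, body) {
  const match = /^\/items\/([^/]+)$/.exec(path);
  if (!match) return { status: 404, body: { error: "not found" } };
  const id = match[1];
  switch (method) {
    case "GET":
      return store.has(id)
        ? { status: 200, body: store.get(id) }
        : { status: 404, body: { error: "no such item" } };
    case "PUT":
      store.set(id, body);
      return { status: 200, body: body };
    case "DELETE":
      store.delete(id);
      return { status: 204, body: null };
    default:
      return { status: 405, body: { error: "method not allowed" } };
  }
}

// Thin HTTP adapter: read the request body, parse JSON, delegate to route().
function handler(req, res) {
  let raw = "";
  req.on("data", (chunk) => { raw += chunk; });
  req.on("end", () => {
    let parsed = null;
    try {
      parsed = raw ? JSON.parse(raw) : null;
    } catch (e) {
      res.writeHead(400, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: "invalid JSON" }));
      return;
    }
    const result = route(req.method, req.url, parsed);
    res.writeHead(result.status, { "Content-Type": "application/json" });
    res.end(result.body === null ? "" : JSON.stringify(result.body));
  });
}

// Start the server only when asked, so route() can be exercised on its own.
function start(port) {
  const http = require("http");
  return http.createServer(handler).listen(port);
}
```

Calling `start(8080)` and then running `curl -X PUT -d '{"name":"a"}' http://localhost:8080/items/1` followed by `curl http://localhost:8080/items/1` should store the item and then read it back.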

So, as Steve Yegge has correctly pointed out at great length, JavaScript (and similar languages) leads to dramatically fewer lines of code in a program, mostly because there is no enforced 'security'. And here is a more specific thing to chew on:

1) Not all security features of a programming language confer the same amount of benefit.

This ties back to my unspecific 'dramatic' level of complexity above. There simply is no research in this area, and it is sorely needed.

My point here is that for a given security feature of a language, you receive a certain level of benefit, but it also costs you a certain amount of flexibility. These are all relative values for now. But from my personal experience, the following has become more and more glaringly obvious:

1) Leaving a security feature out of a language gives more benefit, through increased flexibility, than the security feature gave when present.

So if I use a language which already has duck typing, I don't need dependency injection, and I don't need to manage that injection. Moreover, if I need to manage the duck typing in any way, I might as well do it in the scripting language itself, since there is no compilation step, thus avoiding the complexity of linked configuration files. I can keep all the logic in the code.
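As an example of managing duck typing in the code itself, here is a small capability check: "if it has `find` and `save`, treat it as a DAO". It sits in ordinary code instead of a config file. `quacksLike` and `requireDao` are hypothetical helpers written for this sketch, not part of any library.

```javascript
// True if obj has a function for every named method ("if it quacks like a duck").
function quacksLike(obj, methodNames) {
  return obj != null && methodNames.every((m) => typeof obj[m] === "function");
}

// A guard written in plain code instead of being wired up in a configuration file.
function requireDao(candidate) {
  if (!quacksLike(candidate, ["find", "save"])) {
    throw new TypeError("Expected an object with find() and save()");
  }
  return candidate;
}

const memoryDao = { find(id) { return { id }; }, save(item) { return item; } };
requireDao(memoryDao); // passes: it has both methods
```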

All comments welcome :)

Cheers,
PS

14 comments:

Henk said...

Maybe you should take a look at Groovy and Grails. In my opinion they would suit you. They take a lot of "java complexity" away.

Peter Svensson said...

@Henk. Absolutely. Groovy is on a par with Ruby, Python, JavaScript, etc. I meant to include Groovy in my general comments about scripting languages, sorry :)

And Grails is really easy to get going with too, I'd recommend that as well, as long as you don't do any server-side templating.

Cheers,
PS

plosson said...

I couldn't agree more ... And I have been developing in Java for more than 6 years now. The only reason I see for this is that Java has been, and still is, mainly used in the "enterprise" or corporate world, which suffers from two factors: incompetence & job protection.

This is the main reason I see for inventing so many complex architectures (J2EE), APIs and software. If it's complex, you need lots of people, and these people are more likely to stay in place since they created the "beast". And usually these teams have one or two (so-called) "gurus" and a bunch of juniors who are happy to copy the guru's complexity, thinking they are learning something. And one day the juniors become seniors and the story goes on.

Complexity, configuration file explosion, etc. are really not intrinsic to any language, only to the people using it.

Peter Svensson said...

@plosson: I agree that it is also a matter of mentality of the developers, but what I wanted to point out was that many of these frameworks and architectures are created out of an honest need to make the day-to-day work less complex.

More specifically, I put forward the thesis that these frameworks are created to work around the deficiencies of compiled languages with static typing.

If they had been using another kind of language, they would have had less need for these super-complex frameworks.

Cheers,
PS

plosson said...

Indeed :-) But in my opinion, if you took the same bunch of people I mention above and locked them up in a room with a Linux box and a Python interpreter, they would reinvent static typing and recreate Struts or J2EE in Python!

:-)

justin meyer said...

Couldn't agree more. I've been trying to convince my old team at Accenture to research this exact point. Apparently ACN writes more code than IBM and Msft combined.

Bertrand Delacretaz said...

I think Java can be a great host for scripting languages.

In Apache Sling we're using java code for the infrastructure parts where we need modularity, automated tests, stability and powerful debugging tools, and to take advantage of lots of good quality libraries that are available for Java.

On top of that, we allow (pretty much) any scripting language to be used to write the application-level code, so that application developers don't have to bother with Java.

I think the combination works very well. Writing (and above all testing) a system like Sling all in javascript and getting the same level of stability would have been harder, IMHO.

Peter Svensson said...

@Bertrand Delacretaz: Well, I would argue that if you had a team with just as much experience in Java as in (for example) Server-side JavaScript, and would have chosen the SSJS route, the codebase would have been much smaller and less complex, which (if my experience can translate into a general theory) would outweigh the apparent importance of static typing in tests.

Cheers,
PS

Bertrand Delacretaz said...

@Peter you're right, such decisions obviously depend on the "shape" and experience of the team. We have some great Java guys over here, so it works for us but I see your point!

Zoom said...

I don't want to refute your point that dynamically typed languages are better than statically typed languages. To each his own :-) However, the situation with configuration files galore is rather common. I think you need to choose between three options.

Like this: assume application A executes in an environment E. Now, A needs to use services in E. Usually A does so by knowing something about E: names or URLs or what have you.
Assume that you somehow package A into some deployable unit, a file or something. This deployable unit needs to contain information about E. So how do you handle deploying to another environment, say T for testing?
There are three options that I know of:

* Hard-code knowledge about both E and T into A, so you get deployable unit A(t&e).

* Package two deployable units, A(e) and A(t). So you use the build system to encode environment information into A.

* Let E handle it: A expresses that it needs a resource with a symbolic name R. Then we encode into E what the actual value of R is. So we get it the other way around: E(a) and T(a), environments tailored to run A.
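A small JavaScript sketch contrasting the first and third options follows. The second option is a build step and is not shown. In option 3 the application only knows the symbolic name `R`, and each environment supplies the binding, which is loosely analogous to a Java EE container binding a resource reference. The hostnames and the `bind`/`makeApp` helpers are hypothetical.

```javascript
// Option 1 (hypothetical): the application hard-codes knowledge of every environment.
const knownHosts = { E: "db.prod.example.internal", T: "db.test.example.internal" };
function dbHostOption1(envName) {
  return knownHosts[envName];
}

// Option 3 (hypothetical): the application only names the resource it needs ("R");
// whoever runs it supplies a lookup function for the current environment.
function makeApp(lookup) {
  return {
    describe() {
      return "connecting to " + lookup("R");
    },
  };
}

// Each environment holds its own bindings; the app never sees this table.
const environmentE = { R: "db.prod.example.internal" };
const environmentT = { R: "db.test.example.internal" };
const bind = (env) => (name) => {
  if (!(name in env)) throw new Error("Environment has no binding for " + name);
  return env[name];
};

console.log(dbHostOption1("T")); // prints: db.test.example.internal
console.log(makeApp(bind(environmentT)).describe()); // prints: connecting to db.test.example.internal
```

The trade-off is where the knowledge lives. Option 1 has to change A for every new environment, while option 3 leaves A untouched and moves the work into each environment's bindings.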

The Java EE specification describes the third option at length and in detail. I guess that is the reason Java dudes often talk about application servers and containers.

I guess that in your case the second option was taken.

Cheers!

opsb said...

Having moved over to Ruby on Rails from Java recently, I came to exactly the same conclusion: the security features in Java just aren't worth what I now see as a massive cost. With metaprogramming becoming so pervasive in all of the languages, it seems that Java is fundamentally just not suited to this style. When Sun added annotations to the language they only did half the job. It's all well and good being able to mark code for enhancement, but if you have to step outside the language and use hacks like the subclassing stunt that cglib pulls to actually affect the behaviour of the code, then what you end up with is a huge mess. Just take a look at the stack traces for your average Spring-based app: they're littered with dynamically created classes that add a mass of noise to the system.
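For contrast with generated subclasses, here is a sketch of method interception done inside the language itself. It wraps an existing method in place with "around" behavior, similar in spirit to what an AOP proxy adds, without any generated classes showing up in stack traces. `around`, `accountService` and the begin/commit "transaction" advice are hypothetical. This does not describe how Spring or cglib work beyond the comment's point.

```javascript
// Replace obj[methodName] with a wrapper that calls `advice`, which decides when
// (and whether) to run the original via `proceed`.
function around(obj, methodName, advice) {
  const original = obj[methodName];
  if (typeof original !== "function") {
    throw new TypeError(methodName + " is not a function");
  }
  obj[methodName] = function (...args) {
    return advice.call(this, () => original.apply(this, args), args);
  };
}

const auditTrail = [];
const accountService = {
  balance: 100,
  withdraw(amount) {
    this.balance -= amount;
    return this.balance;
  },
};

// Hypothetical "transaction" advice.
around(accountService, "withdraw", function (proceed, args) {
  auditTrail.push("begin withdraw(" + args[0] + ")");
  const result = proceed();
  auditTrail.push("commit");
  return result;
});

console.log(accountService.withdraw(30)); // prints: 70
console.log(auditTrail.join(" / ")); // prints: begin withdraw(30) / commit
```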

Mikael Kindborg said...

I think the big distinction here is static typing vs dynamic typing (duck typing), rather than compiled vs non-compiled. E.g. Lisp and Smalltalk have good compilers and JIT compilers. (The Java JIT compiler was written by people who developed JIT compilers for Self and Smalltalk.) People who are not familiar with this tend to view "scripting languages" as "toy" languages that are not useful for building large systems. Lisp has had performance comparable to C for many years, and Erlang runs and scales extremely well on multi-processor architectures.

Peter Svensson said...

@Mikael Kindborg: I agree on the static/dynamic thing, but actually I would still argue that compiled languages introduce complexity regardless of the way they are typed.

A compiled language will eventually need a script (ant, make, maven, etc.) to organize the build process. The resulting object files must then be organized into larger units and deployed.

Each step introduced between writing code and running code leads to greater brittleness, because each step needs externalities (like that ant script) that also have to be maintained.

If you use an interpreted language, you don't necessarily need build scripts (Rails uses Rake, for example, but it's not necessary for using Ruby as such). This means that you have fewer steps to maintain and thus, over time, statistically greater productivity. That comes on top of the fact that scripting languages (with the exception of Perl IMO, but that's flogging a dead horse) retain readability while needing fewer lines of code than, for example, Java for a given piece of functionality.

Cheers,
PS
