Justin Carter's madfellas.com

CFML proposal: XML and CFML tag literals for cfscript (C4X)

Update: I've added Enhancement Request #86199 to the Adobe ColdFusion Bug Tracker. Please go there and vote for it if you would like to see this considered in a future version of ColdFusion!

This is a feature proposal for the CFML standard and for ColdFusion 10. It's reasonably long so you might want to grab a coffee or a cold drink before settling in for the ride. You can't say I didn't warn you :)

Almost a year ago Sean Corfield called for comments to help the CFML Advisory Committee decide on how some parts of cfscript should be implemented in future versions of ColdFusion. Full script based components were mostly under control; the only real sticking point was how to handle the script equivalent of nested CFML tags that also include body text - such as cfmail, cfquery and custom tags - which I'll call the "problem tags". A month later, after reviewing the 145 replies in the discussion, Sean made his recommendation to the committee to introduce a set of objects for the "problem tags". The use of objects was a logical, straightforward solution which would not introduce any new syntax and would therefore have no direct impact on the learning curve of the language or on the compiler(s). It was definitely the right step for bringing the capabilities of cfscript into line with the capabilities of CFML tags.

ColdFusion 9 was released in October 2009 and brought with it the much desired enhancements to cfscript, including full script based components, implicit getters and setters, many of the missing functions that developers have typically created UDF wrappers for, script equivalents of newly introduced tags, and the new "problem tag" objects which Adobe called script functions implemented as CFCs.

The CFML Advisory Committee and Adobe have finally brought cfscript up to the point where, in most cases, there is no need to chop and change between script and tag based approaches because the majority of code can now be written successfully either way. Which brings us to today...

Can cfscript be further improved?

At the moment cfscript is in good shape but I think most developers would agree that there are still things that could be done to improve our productivity and help with code readability and maintenance.

There are some CFML tags that still do not have script equivalents, such as cffeed and cfldap (as noted by Chris Peters in his post "Full script CFCs aren't yet where they need to be"), as well as cfcontent, cfheader, cfschedule and cfsetting. We could continue to write UDF and CFC wrappers for these tags, but of course it would be nicer if the language specification and vendors had clear rules about providing a consistent implementation for all tags and their equivalent functions or objects. Consistency and coverage are important.

There are also some things about the script functions implemented as CFCs which are not quite optimal. For tags like cfmail and cfquery the script equivalent tends to be slightly more verbose than the tag based approach - which in itself isn't a problem, more like a potential target for optimisation. Ben Nadel has also found a couple of issues* with the Query.cfc implementation: one is a bug in the parsing of named parameters, and the other can potentially expose a SQL injection vulnerability that wouldn't exist when using the cfquery tag. Ben does note that this is only a problem "when you are using horrible SQL to begin with" but you could say that any code which exposes a vulnerability is horrible. Ben also concluded that some of the cfscript "tag operators" are inconsistent in the way they are implemented (with/without named arguments, with/without parentheses, etc). So there may be some opportunities for improvement in this area. (*Please note: one or both of the above Query.cfc issues may have been addressed after the RC or Gold release of CF9 but I have so far been unable to confirm it).

Finally, cfscript doesn't yet attempt to tackle support for custom tags. The main reason for this, and for any tags which use body text and nested tags, is that trying to morph the syntax into something that looks like script - and yet will actually work - is difficult. Sean was the first to admit this, and it became evident after many attempts that there isn't really a nice "scripty looking" way to do it. Vince Bonfanti says that we shouldn't even try to solve the problem, and that we should ban the use of cfcomponent and cffunction tags and the writeOutput() function as soon as is feasible. Personally I couldn't disagree more. Banning tag based CFCs is pretty extreme, and there is nothing wrong with allowing developers the flexibility to build something the way they want to build it - after all that is what we've been fighting for with regard to cfscript! Let's leave the choice of syntactic optimisation up to the developers and not dictate it in the language. (As a side note, Railo supports component based custom tags which I think are also quite exciting for CFML and cfscript).

Overall these issues aren't show stoppers though and I'm sure the tag to script coverage will push further towards 100% in future releases.

However, there is one more important issue to recognise with the balance of tags versus script. You can use cfscript anywhere you like within a tag based component or page, but you cannot use CFML tags within a script based component. You might ask, "Why would you want to do that anyway?" Well, the answer, as I alluded to above, is simply: flexibility.

What is cfscript for XML (C4X)?

If you are not familiar with ECMAScript for XML have a quick read of the E4X entry on Wikipedia. There are a couple of ways that the addition of E4X-style syntax could benefit CFML so I want to describe them in incremental steps. I'll call this concept C4X meaning "cfscript for XML".

The idea of having something like C4X in cfscript is not mine. I first saw it mentioned by Rick Osborne in Sean's call for comments, where it received good support from a number of people. Rick then went on to expand on his ideas in a blog entry titled "CF9 + E4X + C4X". Even more interesting is that 3 months prior he had already suggested that cfscript should have E4X-like support (and commented that it was already too late to get it into CF9):

"There's no way we're going to see something E4X-like with XML fragments built into CF9. If we scream loud enough we might see it in CF10. Maybe." -- Rick Osborne

This is my attempt to get the conversation going again, and if enough people are interested then we can all "scream" together. So, on to the actual examples...

XML literals

The first and most obvious benefit of C4X is the ability to declare XML literals. When XML is treated as a primitive type it means you could assign a chunk of XML directly to a variable, i.e. there is no need to create a string and then use the xmlParse() function to parse it into an XML object.

Creating an XML object in cfscript without C4X:

person = xmlParse("<person>
  <firstname>Ben</firstname>
  <lastname>Forta</lastname>
</person>");

Creating an XML object in cfscript with C4X:

person = <person>
  <firstname>Ben</firstname>
  <lastname>Forta</lastname>
</person>;

This is the most basic example and as you can see it saves a dozen keystrokes and makes the code slightly cleaner and easier to read. It's not ground breaking but it's an improvement.

XML literals would also support variable / statement evaluation and could be useful for working with XML-compliant chunks of HTML. You could build fragments of HTML from a data set and then do further processing on them as XML (using the existing functions of the language) - something which would be difficult to do with strings - before finally using / rendering them.

Creating an XML-compliant HTML fragment including variable evaluations with C4X:

article = <div class="article">
  <h2>#qArticle.title#</h2>
  <p>#qArticle.teaser#</p>
</div>;
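For comparison, here is a minimal sketch of that "further processing" step as it can already be written in CF9, using a parsed string in place of a literal. The markup and the "featured" class are made up for illustration; with C4X only the first statement would change.

```cfml
<cfscript>
// Illustrative markup only; with C4X this would be an XML literal rather than a parsed string.
article = xmlParse('<div class="article"><h2>Hello</h2><p>A short teaser.</p></div>');

// Locate nodes with XPath using the existing xmlSearch() function.
headings = xmlSearch(article, "/div/h2");
title = headings[1].xmlText;

// Change an attribute on the root element before rendering.
article.xmlRoot.xmlAttributes["class"] = "article featured";

// Serialise back to a string for output (toString() also emits an XML declaration).
html = toString(article);
</cfscript>
```

Doing the same attribute change on a plain string would need string searching or regular expressions, which is the difficulty mentioned above.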

A full E4X-style implementation would also include filtering, manipulation via operators (e.g. using + for appending nodes) and a bunch of other stuff, but at this stage I am a little hesitant to suggest taking it that far (though this is up for debate). The main thing I wanted to do was explain XML literals and their benefits so that I could introduce the next C4X concept.

CFML tag literals

This is where we get to the meat of the proposal. C4X could overcome some of the issues raised above by allowing us to write declarative code where it makes the most sense: when dealing with nested CFML tags and tag body text. I'm not sure that "tag literals" is the right term but I'll run with it.

Since the cfquery and cfmail (and other) tags now have a script equivalent in CF9 we can begin to look at ways in which our cfscript code can be further enhanced. I'll demonstrate this by showing CF9's script version followed by the proposed C4X version.

Executing a query in cfscript without C4X:

qry = new Query(datasource="myDSN");
qry.setSQL("SELECT * FROM users");
qUsers = qry.execute().getResult();

Executing a query in cfscript with C4X:

qUsers = <cfquery datasource="myDSN">
  SELECT * FROM users
</cfquery>;

The first thing you'll notice is how tidy the C4X version is, at around 70% of the keystrokes of the script equivalent. The second thing is that the (usually required) name attribute is omitted from the cfquery tag because the assignment implies that the cfquery tag will assign the resultant query object directly to the qUsers variable. This implied attribute assignment could also work for all other tags that typically have a name or result attribute (note: since it's not quite consistent across all the built-in tags the actual attribute name will vary from tag to tag - I haven't done a full analysis but in most cases the implied attribute should be a single, obvious attribute).

So those query examples were pretty trivial. Let's beef it up a little with some logic and a parameterised value.

Executing a query in cfscript without C4X:

function getUsers(userID="0") { 
  var qry = new Query(datasource="myDSN");
  var sql = "SELECT * FROM users";
  if (arguments.userID neq 0) {
    sql &= " WHERE userID = :userID";
    qry.addParam(name="userID", value=arguments.userID, cfsqltype="cf_sql_integer");
  }
  qry.setSQL(sql);
  var qUsers = qry.execute().getResult();
  return qUsers;
}

Executing a query in cfscript with C4X:

function getUsers(userID="0") { 
  var qUsers = <cfquery datasource="myDSN">
    SELECT * FROM users
    <cfif arguments.userID neq 0>
      WHERE userID = <cfqueryparam value="#arguments.userID#" cfsqltype="cf_sql_integer">
    </cfif>
  </cfquery>;
  return qUsers;
}

With the slightly more complex query the C4X example is around 80% of the keystrokes of the script equivalent and a couple of lines of code shorter. I think at this point it's becoming clearer how workable this solution could be.

So which tags could be used as a CFML tag literal? Pretty much any CFML tag except for flow control tags I think. If it's a tag that "returns" a value like cfquery then definitely. If it's a tag that just does some processing and doesn't return anything then there's no reason it couldn't return "true" if no exceptions were thrown. If it's a tag that outputs something to the response stream then the output could either be assigned to the variable or we could use the cfsavecontent tag as a wrapper to capture it (the latter would probably be better).
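For reference, CF9 script can already capture generated output with the savecontent construct, which is the behaviour a cfsavecontent wrapper would provide inside a tag literal. A minimal sketch, where userName and greeting are made-up names for illustration:

```cfml
<cfscript>
userName = "Ben"; // hypothetical value for illustration

// Everything written inside the block is captured into the "greeting" variable
// instead of being sent to the response stream.
savecontent variable="greeting" {
  writeOutput("<p>Hello, #userName#!</p>");
}
</cfscript>
```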

How does a compiler handle C4X?

I don't think C4X is a difficult thing for a compiler to deal with. If the root node is a CFML tag (or a custom tag) then it's clearly a block of CFML code. Otherwise, it's assumed that it's a chunk of XML that could contain some variables or statements that need to be evaluated, and invalid XML should throw a compile time error. And, obviously, for this to work the XML or CFML literal would have to have a single root node / tag.
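The rule above can be sketched in cfscript itself. This is only an illustration of the decision, not how any engine's compiler works: classifyLiteral() and the C4X.InvalidLiteral exception type are hypothetical, and the "starts with cf" check is a naive assumption that would miss custom tags imported under another prefix.

```cfml
<cfscript>
// Hypothetical helper: decides how a C4X literal would be treated, based on its root tag.
// Naive assumption: any root tag whose name starts with "cf" is CFML (this covers cf_ custom tags,
// but not custom tags imported with a different prefix).
function classifyLiteral(required string source) {
  // Capture the name of the first tag in the literal.
  var match = reFind("^\s*<([A-Za-z_][A-Za-z0-9_:.-]*)", arguments.source, 1, true);
  if (match.pos[1] eq 0) {
    throw(type="C4X.InvalidLiteral", message="A literal must start with a root tag.");
  }
  var rootName = mid(arguments.source, match.pos[2], match.len[2]);

  // A CFML root tag means the whole literal is a block of CFML code.
  if (left(lCase(rootName), 2) eq "cf") {
    return "cfml";
  }

  // Anything else must be well-formed XML with a single root node.
  if (!isXML(arguments.source)) {
    throw(type="C4X.InvalidLiteral", message="The literal is not well-formed XML.");
  }
  return "xml";
}
</cfscript>
```

Under these assumptions, a literal rooted at cfquery is classified as CFML, a well-formed person fragment is classified as XML, and a fragment with mismatched or multiple root tags raises the error that a real compiler would report at compile time.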

Internally, CFML engines may choose to treat XML literals as a subclass of their existing XML classes if that provides some benefit during compilation or for any future additions to C4X (such as full E4X-style filtering syntax and other operations, if they were deemed feasible; again, I'm still on the fence). I don't know enough about how the engines work under the hood to make any further suggestions here though.

One more consideration is how IDEs would handle C4X. Well, if an IDE can handle E4X's XML literals then C4X support should be somewhere in the same ball park, and so I'd be hopeful that it's within the realm of possibility for the IDEs that we use today.

Final thoughts

Personally I think it only takes a glance to see that C4X could be quite nice to work with, and the beauty of it is that it's not the only way to write the code - if you want to use a purely object based scripting solution then you can, because it already exists. On top of that, this is just the same as existing code that we have always had to write in CFML, just used in a new way (with a variable name, assignment operator and a semicolon - definitely not rocket science!). There is barely any learning curve at all.

I'd be very interested to hear the thoughts of the ColdFusion community, staff/members of the CFML Advisory Committee, Adobe, Railo and OpenBD, and especially from those who thought E4X-style syntax was a bad idea 12 months ago. Now that a pure, object based solution exists, is there room for an improvement like C4X, and if not is there a better reason than "I just don't like tags inside script", even though in most cases the code could be considered cleaner and easier to read?

I think C4X could be another step in the right direction for CFML. Let us know what you think!