“How do we test software at StratEx? – Part 2”

StratEx is an application deployed both in “the Cloud” (public and private SaaS) and on-premises in the customer’s environment. A single code base is deployed onto three different systems.

To achieve this, we use code generation to shorten the development cycle, which lets us quickly implement new features and functionality. It also gives us a nicely uniform User Interface, because we kept the code generation as simple as we dared to.

Code generation gives us reliable code, which in turn reduces testing time. While this sounds easy, we spent quite some time figuring out what the best testing strategy could be and how to implement it.

When we looked at achieving software testability through Test-Driven Development, we found that it forces rock-hard principles onto the architecture, that it dramatically increases the complexity of the code and, very importantly, that it serves no purpose for the end objective of the system, namely to solve a business problem.

We moved to a Behavior-Driven Development (BDD) process, whose main advantage is that it uses Gherkin, a “Business Readable, Domain Specific Language” that lets you describe software’s behavior without detailing how that behavior is implemented. Unfortunately, its main downside is that you end up with a lot of Gherkin to fully describe a system of a reasonable size.

We conclude with the main criticism we have of most of these “methodologies”, including the Unified Modeling Language (UML): for any system that goes beyond a simple calculator, no modeling language is powerful enough to describe the full and complete system in such a way that you can understand and describe it more quickly than by looking at the screens and the code that implements these screens.

We have presented a method for efficiently testing large parts of web-based software by using elements of code generation to generate automatable tests, and by using BDD concepts to model tests for non-generated screens and non-generated business actions. Further, we have described a method for context-based unit testing that, when combined with generated code and generated tests, yields an acceptable trade-off between development efficiency and time spent on testing.

We didn’t cover areas such as performance and security tests.


In part 1 of this article, we’ve outlined a number of criticisms and roadblocks that we have encountered while looking for the best way to test our application.

We know it is easy to criticize, but this was not for the sake of being negative. We believe that our experience has led to some valuable insights, apart from the points we simply don’t like. Further, we believe our experiences are not unique. So, in this second article we want to take a look at what can be done to test our software efficiently.

What we describe here is not all implemented today. As said in the previous article, this is part of a journey, a search for the optimum way of doing things. One might wonder why?

There are a couple of reasons for this. First and foremost, as a small startup company, our resources are limited. And as Google says, “scarcity brings clarity[i]”, so we have to be careful where we spend our time and energy.

Second, when doing our research on testing methodologies and how to apply these best, there was one recurring theme: there is never enough time, money, resources to test. This can probably be translated into “management does not give me enough, because they apparently believe that what they spend on testing is already enough”.

Now here comes the big question: what if management is right? Can we honestly say that every test, every check, every test department even, is super-efficient? We may argue that test effort may reach up to x% of development effort (30% has been given as a rough guideline). Well then, if by magic, development effort is sliced down to one fifth, is it not logical to assume that the test effort should be reduced by the same factor? And how would this be achieved? We want to explore this here.
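To make the arithmetic behind this question concrete, here is a small sketch. The 30% guideline and the one-fifth reduction come from the text; the 100-day baseline is an assumed, purely illustrative figure.

```python
# Illustrative numbers only: the 100-day baseline is an assumption.
TEST_PERCENT = 30  # rough guideline quoted in the text


def proportional_test_effort(dev_days, percent=TEST_PERCENT):
    """Test effort that stays proportional to development effort."""
    return dev_days * percent / 100


hand_coded_dev = 100.0               # assumed baseline, in days
generated_dev = hand_coded_dev / 5   # development sliced down to one fifth

baseline_test = proportional_test_effort(hand_coded_dev)
target_test = proportional_test_effort(generated_dev)

# If testing stayed at the baseline, it would outweigh development itself.
unchanged_share = baseline_test / generated_dev

print(baseline_test, target_test, unchanged_share)
```

With these assumed figures, leaving the test effort untouched would make testing more expensive than the development it is supposed to support, which is exactly the imbalance we want to avoid.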

This is where we came from. We generate large parts of our code. This reduces development time dramatically.

For a small startup this is a good thing. But this also means that we must be able to reduce our testing time. That is why we took a good, hard look at current testing methods: what to test, when to test and how to test.

Testing of a Web application deployable as Public-Private Cloud and On-premises software

First, let’s frame our discussion. The application we are developing is rather standard from a testing point of view: web-based, multi-tier, with a back-end database, and running in the cloud.

The GUI is moderately sprinkled with JavaScript (jQuery[ii] and proprietary scripts from a set of Commercial off-the-shelf[iii] (COTS) UI controls like DevExpress[iv] and Aspose[v]). The main way to interact is through a series of CRUD[vi] screens.

We can safely say that there are probably thousands of applications like this, except that in our case the same code is deployable as a Private Cloud[vii] application, a Public Cloud[viii] application and On-premises software[ix]. Developers of such applications are our target audience for this article.

We’re not pretending to describe how to test military-spec applications[x], or embedded systems[xi] for example.

What do we want to achieve with our tests? In simple terms, we’re not looking for mathematical correctness proofs of our code. Nor are we looking for exhaustively tested use cases.

We want to be reasonably certain that the code we deploy is stable and behaves as expected. We are ready to accept the odd outlier case that gives an issue. We believe our bug fixing process is quick enough to address such issues in an acceptable timeframe (between 4 and 48 hours).

Let’s look a bit closer at the various types of tests we might need to devise to achieve such reasonable assurance.

Testing a CRUD application

The application presents a number of screens to the user, starting from the home screen, with a menu. The menu gives access to screens, mostly CRUD-type, while some screens are process-oriented or serve a specific purpose (a screen to run reporting, for example).

A CRUD screen has 5 distinct, physical views:

  1. The items list (index), e.g. the Contracts list
  2. The item details, e.g. the Work Page details
  3. The item edit view, e.g. edit the Activities details
  4. The item creation, e.g. create a new Type of Risk
  5. The item deletion, e.g. delete a Project and its related items


Possible actions on each screen are standardized, with the exception of the details screen, where specific, business-oriented actions can be added. You may think of an action such as closing a record, which involves a number of things such as checking validity, status change, updating log information and maybe creating some other record in the system. In essence these actions are always non-trivial.

Testing a generated vs. a hand-coded piece of software

All CRUD screens are made of fully generated code, with the exception of the business actions, which are always hand-coded.

The non-CRUD screens are not generated and always hand-coded. Needless to say we try to keep the number of these screens low.

We have observed that the generated code and screens are usually of acceptable initial quality. This is because the number of human/manual activities needed to produce such a screen is very low. The code templates[xii] used by the generator obviously took time to develop.

This was however a localized effort, because we could concentrate on one specific use case. Once it worked and was tested (manually!), we could replicate this with minimal effort to the other screens (through generation).

We knew in advance that all the features we had developed would work on the other screens as well. An interesting side-effect of this method is that if there is an error in the generated code, the probability of finding this error is actually very high: the code generation process replicates the error on all screens, so it is likely to be found very quickly.

The hand-coded screens are on the other side of the scale. They present a high likelihood of errors, and we have also found that these screens are prone to non-standard look and feel and non-standard behavior within the application.

When compared to the approach of generating highly standardized code, the reasons for this are obvious.

Testing business actions

The business actions are the third main concern for testing. These are non-trivial actions, highly dependent on context (data, user), and with a multitude of possible outcomes. We haven’t figured out yet how to test those artifacts automatically, due to the number of cases we need to take into account. Each change in our logic requires a complete refactoring of those tests, which will certainly produce most of the complaints from our beloved customers.

Testing the User Interface using BDD

A final concern is the UI testing. Even with highly standardized screens, we value this type of testing, for three reasons:

  • First, it is a way to run a partial end-to-end test[xiii] on your screens, as we partially check the content of the database after a screen is tested.
  • Second, it is what the user actually sees. If something is wrong there, there is no escape.
  • Third, we like to use such test runs to record documentation, demo and training videos using Castro[xiv], Selenium[xv] and Behave[xvi], which are mostly open source software.

We believe that this context is relatively common, with possibly the exception of massive code generation and the use of tests to document the application (and we’d recommend these as something to consider for your next project!), so it makes sense to examine how these different areas can be tested efficiently.

For the generated CRUD screens, tests should be standardized and generated. Do we need to test all the generated screens? Given the fact that the tests are generated (so there is no effort involved to create the tests), we’d say that at least you must have the possibility to test all screens. Whether you test them for every deployment is a question of judgment.
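As a minimal sketch of what “standardized and generated” tests could look like, the snippet below derives one test description per entity and per CRUD view. The `ScreenTest` structure, the naming scheme and the URL paths are hypothetical and not taken from the StratEx generator; only the five views and the entity names come from the text.

```python
from dataclasses import dataclass

# The five standard views of a CRUD screen, as listed earlier in the article.
CRUD_VIEWS = ("index", "details", "edit", "create", "delete")


@dataclass(frozen=True)
class ScreenTest:
    name: str   # generated test name (hypothetical naming scheme)
    path: str   # hypothetical URL path the test would open


def generate_screen_tests(entities):
    """Produce one standardized test description per entity and view."""
    generated = []
    for entity in entities:
        slug = entity.lower().replace(" ", "_")
        for view in CRUD_VIEWS:
            generated.append(
                ScreenTest(name=f"test_{slug}_{view}", path=f"/{slug}/{view}")
            )
    return generated


# Entity names borrowed from the examples in the text; the list is assumed.
generated = generate_screen_tests(["Contract", "Activity", "Type of Risk", "Project"])
print(len(generated))       # 4 entities x 5 views
print(generated[0].name)
```

Because the list is produced from metadata, adding a new generated screen adds its tests at no extra cost, and deciding which ones to run per deployment becomes a matter of filtering this list.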


Hand-coded screens, if you have them, probably require hand-coded tests. Yet, if you have information available that allows you to generate (parts of) your tests, do it.

It reduces the maintenance cycle of your tests, which means you improve the long-term chances of survival for your tests.

We have not found convincing evidence that such screens can be described by a set of BDD/Gherkin tests, whether fully (see Table 1) or briefly (see Table 2).

The simple fact is that it would require a large amount of BDD-like tests to fully describe such screens. One practice we’ve observed is to have one big test for a complete screen; however we found that such tests quickly become complex and difficult to maintain for many reasons:

  1. You don’t want to disclose too much technical information to the business user, e.g. usernames and passwords, the acceptable data types, or the URLs of the servers supporting the Factory Acceptance Test[xvii] (FAT), the System Acceptance Test[xviii] (SAT) and the Live application
  2. You need to multiply the number of features by the number of languages your system supports
  3. You want to keep the BDD tests independent from the code that the tester writes, as the test code depends on the software architecture (Cloud, On-premise) and the device that runs the application (desktop, mobile)
  4. A database containing acceptable data might be used by either the business user or the tester; the system testing the application can consume this data, which reduces the complexity of the BDD tests while increasing the amount of source code needed to test the application

At StratEx, after many attempts to find the right balance between writing code and writing BDD tests, we decided to write brief BDD descriptions. We chose the Python programming language (see Table 3) to write the tests, as Python[xix] is readable even by business users and can be deployed on all our diverse systems, which are made up of Linux and Windows machines.
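One way to keep technical details such as server URLs out of the Gherkin text is to resolve them in the test code. The sketch below assumes a hypothetical environment table and a hypothetical `TEST_ENV` environment variable; the URLs and account names are placeholders, not StratEx values.

```python
import os

# Hypothetical environment table: URLs and account names are placeholders.
ENVIRONMENTS = {
    "fat": {"url": "https://fat.example.com", "username": "fat_tester"},
    "sat": {"url": "https://sat.example.com", "username": "sat_tester"},
    "live": {"url": "https://live.example.com", "username": "smoke_tester"},
}


def resolve_environment(name=None):
    """Return the settings for the requested environment.

    Falls back to the TEST_ENV variable (an assumed convention), then to FAT.
    """
    key = (name or os.environ.get("TEST_ENV", "fat")).lower()
    try:
        return ENVIRONMENTS[key]
    except KeyError:
        raise ValueError(f"unknown test environment: {key!r}") from None


print(resolve_environment("sat")["url"])
```

The business user then only reads “I open StratEx”, while the tester decides in code which server and account that sentence maps to on a given run.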


Business actions are hand-coded too, but such actions are good candidates for BDD tests described in Gherkin. As we mentioned before, Gherkin is powerful for describing functional behavior, and this is exactly what the business actions implement. So there seems to be a natural fit between business actions and BDD/Gherkin. The context can usually be described in the same way as for UI tests (see above). Can such tests be generated? We believe that the effort for this might outweigh the benefits. Still, using Gherkin to describe the intended business action, and then implementing tests for it, seems like a promising approach.
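To illustrate the fit between a business action and the Given/When/Then shape, here is a sketch in plain Python. It uses the “closing a record” action mentioned earlier (validity check, status change, log update); the `Record` class and `close_record` function are hypothetical and only stand in for real application code.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    title: str
    status: str = "open"
    log: list = field(default_factory=list)


def close_record(record, user):
    """Hypothetical business action: validate, change status, update the log."""
    if not record.title:
        raise ValueError("a record without a title cannot be closed")
    if record.status == "closed":
        raise ValueError("record is already closed")
    record.status = "closed"
    record.log.append(f"closed by {user}")
    return record


def test_closing_an_open_record():
    # Given an open record
    record = Record(title="Request for offer")
    # When a user closes it
    close_record(record, user="alice")
    # Then the status and the log reflect the action
    assert record.status == "closed"
    assert record.log == ["closed by alice"]


test_closing_an_open_record()
```

Each comment in the test corresponds to one Gherkin step, so the same scenario could be written for the business user in Gherkin and bound to this code by a BDD tool such as Behave.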



This covers, in broad lines, the area of functional testing, including UI tests. The obvious question is what else needs to be tested, as it is clear that the tests we describe here are not ideal candidates for development-level (unit) tests: they would simply be too slow. The tests described above would run in a pre-deployment scenario, in various configurations with more or less coverage: run all screens, run only the hand-coded screen tests, run some actions as smoke tests, etc.
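The pre-deployment configurations mentioned above could be expressed as tag-based selections over a test catalogue. This is only a sketch: the test names, tags and suite names below are invented for illustration.

```python
# Hypothetical catalogue: each test carries tags describing what it covers.
CATALOGUE = {
    "test_contract_index": {"generated"},
    "test_contract_create": {"generated", "smoke"},
    "test_reporting_screen": {"hand_coded"},
    "test_close_record_action": {"hand_coded", "action", "smoke"},
}

# Named pre-deployment configurations; None means "run everything".
SUITES = {"full": None, "hand_coded": {"hand_coded"}, "smoke": {"smoke"}}


def select_tests(suite):
    """Return the sorted test names that belong to the given suite."""
    wanted = SUITES[suite]
    if wanted is None:
        return sorted(CATALOGUE)
    return sorted(name for name, tags in CATALOGUE.items() if tags & wanted)


print(select_tests("smoke"))
```

BDD tools generally offer a comparable mechanism through tags on features and scenarios, so the same idea carries over to Gherkin-based tests.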

We believe that the most relevant tests for the development cycle are the ones related to the work the developer does, i.e. producing code. This means that generated code can be excluded in principle (although there is nothing against generating such tests). It focuses on hand-coded screens and business action implementation.

Starting with the business action implementation, we observe that this only requires coding in the non-UI parts of the application: the Model[xx] and the database. It has been shown that it is possible to run unit-like tests against the model code and against the controller code. Unit tests against the model can be used to validate the individual functions (as in “normal” unit tests), while tests against the controller actually validate that the action, when invoked by the user from the UI, yields the expected result. In that sense such a test runs like a UI test without the overhead (and performance penalty) of a browser.

What is so special about this approach? First, these are not real unit tests because they do not isolate components. They test an architectural layer, including the layers below it. This means that when testing a model, the database will be accessed as well. This is a deliberate tradeoff between the work required to make components “unit-testable” and the impact of testing them “in context”. It means we have to consider issues such as database content, and we need to accept that these tests run slower than “real” unit tests. However, we have far fewer tests of this type: we only implement tests for the business actions, of which there are between 2 and 10 per screen, so the number of these tests will be around 100-200 for the complete application. We believe that this is a workable situation, as it allows us to develop without having to consider the intricacies of emulated data such as mocks, or other architectural artifacts that allow for out-of-context testing. In other words, we can concentrate on the business problems we need to solve.

An additional benefit here is that this allows us to test the database along with the code. Database testing is an area we have not seen covered often, for reasons that somewhat elude us.
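A minimal sketch of such an in-context test follows. It uses Python’s built-in `sqlite3` with an in-memory database as a stand-in for the real back-end database; the `contract` table and the `ContractModel` class are hypothetical and only illustrate the idea of testing the model together with the layer below it.

```python
import sqlite3


def create_schema(conn):
    conn.execute(
        "CREATE TABLE contract ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'open')"
    )


class ContractModel:
    """Hypothetical model layer that talks to the database directly."""

    def __init__(self, conn):
        self.conn = conn

    def create(self, title):
        cur = self.conn.execute(
            "INSERT INTO contract (title) VALUES (?)", (title,)
        )
        self.conn.commit()
        return cur.lastrowid

    def close(self, contract_id):
        cur = self.conn.execute(
            "UPDATE contract SET status = 'closed'"
            " WHERE id = ? AND status = 'open'",
            (contract_id,),
        )
        self.conn.commit()
        return cur.rowcount == 1

    def status(self, contract_id):
        row = self.conn.execute(
            "SELECT status FROM contract WHERE id = ?", (contract_id,)
        ).fetchone()
        return row[0] if row else None


def test_close_contract_in_context():
    conn = sqlite3.connect(":memory:")  # a real SQL engine, no mocks
    create_schema(conn)
    model = ContractModel(conn)
    contract_id = model.create("Request for offer")
    assert model.close(contract_id) is True
    assert model.status(contract_id) == "closed"
    # Closing twice must not succeed: the database state is the oracle.
    assert model.close(contract_id) is False
    conn.close()


test_close_contract_in_context()
```

The test exercises the SQL statements, the schema defaults and the model logic in one pass, which is the trade-off described above: slower than an isolated unit test, but with no mocks to write or maintain.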

In summary, we have presented here a method for efficiently testing large parts of web-based software by using elements of code generation to generate automatable tests, and by using BDD concepts to model tests for non-generated screens and non-generated business actions. Further, we have described a method for context-based unit testing that, when combined with generated code and generated tests, yields an acceptable trade-off between development efficiency and time spent on testing.

What is not covered in this article are other areas of testing, such as performance and security tests. Currently StratEx has no immediate concerns in these areas that would require us to critically examine how we validate the application in this respect.

Testing experience (PDF version)

Test-Driven Developments are Inefficient; Behavior-Driven Developments are a Beacon of Hope? The StratEx Experience (A Public-Private SaaS and On-Premises Application) – Part II




Table 1 BDD definition (full description): Create a Request for offer “11_Create_Contract_Request_for_offer.feature”

# file: ./Create_Contract_Request_for_offer.feature

Feature: Create a Request for offer
  As a registered user,
  I want to create a Request for offer for a project

  Scenario Outline:
    Given I open StratEx "<url>"
    When I sign up as "<username>"
    Then I should be signed in as "<user_first_last_name>"
    Then I click on "Contract" menu item
    Then I click on "Request for offer" menu item
    Then I click on "Create new" menu item
    Then I select "<project_name>" from the "Project" dropdown
    Then I set the "Title" box with "<project_title>"
    Then I click on "Save" button
    Then I check that the "Project" field equals "<project_name>"
    Then I check that the "Title" field equals "<project_title>"
    And I click on the link "Logout"

    Examples: staging
      | url | username | user_first_last_name | project_name | project_title |
      | https://staging.<your application>.com | a Username | a firstname, a lastname | a project name | a project title |


Table 2 BDD definition (brief description): Create a Request for offer “11_Create_Contract_Request_for_offer.feature”

# file: ./Create_Contract_Request_for_offer.feature

Feature: Create a Request for offer
  As a registered user,
  I want to create a Request for offer for a project

  Scenario Outline:
    Given I open StratEx "<url>"
    When I sign up as "<username>"
    Then I should be signed in as "<user_first_last_name>"
    Then I create one Request for offer
    And I click on the link "Logout"

    Examples: staging
      | url | username | user_first_last_name |
      | https://staging.<your application>.com | a Username | a firstname, a lastname |



Table 3 Excerpt of “11_Create_Contract_Request_for_offer.py”

from behave import then


@then(u'I create one Request for offer')
def step_impl(context):
    # Note: find_element_by_xpath is the Selenium 2/3 API;
    # Selenium 4 uses find_element(By.XPATH, ...) instead.

    # click | 'Contract' menu item

    # click | 'Request for Offer' menu item
    context.browser.find_element_by_xpath("//a[contains(text(),'Request for Offer')]").click()

    # click | 'Create New' menu item
    context.browser.find_element_by_xpath("//a[contains(text(),'Create New')]").click()

    # select | id=Project | label=StratEx Demo

    # type | id=Title | Horizon 2020 dedicated SME Instrument – Phase 2 2014



Referenced books

  • “The Cucumber Book” (Wynne and Hellesoy)
  • “Application testing with Capybara” (Robbins)
  • “Beautiful testing” (Goucher and Riley)
  • “Experiences of Test Automation” (Graham and Fewster)
  • “How Google tests software” (Whittaker, Arbon et al.)
  • “Selenium Testing Tools Cookbook” (Gundecha)


Referenced articles

  • “Model Driven Software engineering” (Brambilla et al.)
  • “Continuous Delivery” (Humble and Farley)
  • “Domain Specific Languages” (Fowler)
  • “Domain Specific Modeling” (Kelly et al)
  • “Language Implementation Patterns” (Parr)

[i] “How Google tests software” (Whittaker, Arbon et al.)

[ii] jQuery: http://jquery.com

[iii] Commercial off-the-shelf: http://en.wikipedia.org/wiki/Commercial_off-the-shelf

[iv] DevExpress: https://www.devexpress.com

[v] Aspose: http://www.aspose.com

[vi] Create, Read, Update, Delete

[vii] Private Cloud: http://en.wikipedia.org/wiki/Cloud_computing#Private_cloud

[viii] Public Cloud: http://en.wikipedia.org/wiki/Cloud_computing#Public_cloud

[ix] On-premises software: http://en.wikipedia.org/wiki/On-premises_software

[x] United States Military Standard: http://en.wikipedia.org/wiki/United_States_Military_Standard

[xi] Embedded system: http://en.wikipedia.org/wiki/Embedded_system

[xii] Web template system: http://en.wikipedia.org/wiki/Web_template_system

[xiii] End-to-end test: http://www.techopedia.com/definition/7035/end-to-end-test

[xiv] Castro is a library for recording automated screencasts via a simple API: https://pypi.python.org/pypi/castro

[xv] Automating web applications for testing purposes: http://www.seleniumhq.org

[xvi] Behave is behavior-driven development, Python style: https://pypi.python.org/pypi/behave

[xvii] Factory Acceptance Test: http://en.wikipedia.org/wiki/Acceptance_testing#User_acceptance_testing

[xviii] System Acceptance Test: https://computing.cells.es/services/controls/sat

[xix] Python Programming Language: https://www.python.org

[xx] We assume to be working with a MVC or similar architecture, or anything that separates clearly the UI from the rest of the system
