.NET and me Coding dreams since 1998!

31 Dec 2007

.NET Foundations – Memory model (Part 1)

The previous two parts of the .NET Foundations series covered the structure of a managed assembly and the .NET execution model. Originally I planned this to be one long, complete post about the stack and the heap, but it got too long, so I decided to split it into two parts:

  • The first part sets up the stage by explaining some conceptual terms which need to be understood in order to follow the stack and heap related examples. Questions covered here are:
    • what the stack and the heap are (with a little more detail than usually found in most of the posts I've seen)
    • what processes, application domains and threads are and what the relationship between them is
    • what reference and value types are and what the conceptual difference between them is
  • In the second part of the post, I will use those explanations and try to give some answers to common stack and heap based questions:
    • what memory allocation looks like in the case of reference and value types,
    • what boxing/unboxing is and why it is something we have to be careful about
    • what the differences between static and instance members are from a memory allocation perspective and which one we should prefer in our code
    • what the difference between value types and reference types is, etc. As you can tell from this list, this will be a long post, but with lots of useful information (I hope :) )

Setting the stage up

Process, Application domains and threads

As we saw in the .NET execution model post, execution of a .NET assembly starts with the Win32 process primary thread calling the shim (mscoree.dll), which loads mscorwks.dll. As a part of the CLR bootstrapping process, the shim then creates a default application domain and two additional application domains: the system and the shared application domain.

An application domain is a concept .NET uses to isolate different .NET applications running in the same process, by providing them a unique virtual address space and scoped resources. The main advantages of the CLR application domain model are that:

  • it enables multiple application domains to exist in a single process, which has a lower system cost than creating multiple processes,
  • "what happens in a domain stays in the domain", which means that domains
    • are not affected by faults or exceptions occurring in other domains
    • communicate with other domains through well-defined, decoupled mechanisms, without direct access to objects living outside the current domain
  • each domain can have its own configuration settings and security model

As mentioned, during the CLR bootstrapping process the shared and system domains are created alongside the default application domain.

The purpose of SharedDomain is to contain all the assemblies which are used from multiple application domains. The advantage shared domain hosting provides is better performance, due to a single JIT compilation being performed regardless of the number of application domains using the shared domain. Without the shared domain, we have "once per method per domain" JIT compilation. Memory consumption is also higher when the same assembly is redundantly loaded into each of the application domains. During CLR bootstrapping, Mscorlib.dll (the system library) and fundamental types from the System namespace like Object, ValueType, Array, Enum, String and Delegate are preloaded into the shared domain and shared across all the other domains. Console programs can load code into SharedDomain by annotating the app's Main method with a System.LoaderOptimizationAttribute.
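A minimal sketch of that annotation (MultiDomain is one of the LoaderOptimization enum values; the program body is just a placeholder):

using System;

class Program
{
    // Request domain-neutral loading: assemblies (and their JIT-compiled code)
    // are loaded into SharedDomain and shared across all application domains.
    [LoaderOptimization(LoaderOptimization.MultiDomain)]
    static void Main()
    {
        Console.WriteLine(AppDomain.CurrentDomain.FriendlyName);
    }
}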

The purpose of SystemDomain is to create and initialize SharedDomain and the default AppDomain, to handle string interning, etc.

(In case you are interested in more details on application domains, check out this excellent post)

The CLR bootstrap process creates only one default application domain, but if needed, multiple application domains can co-exist inside the process without impacting each other, optionally with different security models. During the original design of the .NET framework, tests showed that up to 1000 separate application domains, each containing a small application, can coexist effectively in one process (which doesn't mean that is something we should strive for :)

At the end, once the CLR is initialized and loaded, the shim calls the entry point defined in the CLR header, and from that point the managed application is running and processing IL code in the default application domain.

The whole bootstrap process is driven by the process primary thread, but if needed, multiple threads can execute inside the process in one application domain. The only constraint enforced by the .NET application model is that a thread can execute in only one application domain at a time, but it can switch application domains if needed.
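A short sketch of that rule in action, using the classic .NET Framework AppDomain APIs: the same thread executes code first in the default domain and then, for the duration of one call, inside a second domain.

using System;

class AppDomainDemo
{
    static void Main()
    {
        // The primary thread starts in the default application domain...
        Console.WriteLine("Main runs in: " + AppDomain.CurrentDomain.FriendlyName);

        // ...creates a second domain in the same process...
        AppDomain secondDomain = AppDomain.CreateDomain("SecondDomain");

        // ...and temporarily switches into it for the duration of this call.
        secondDomain.DoCallBack(PrintCurrentDomain);

        AppDomain.Unload(secondDomain);
    }

    static void PrintCurrentDomain()
    {
        Console.WriteLine("Callback runs in: " + AppDomain.CurrentDomain.FriendlyName);
    }
}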

Stack and heap

As just said, every Win32 process hosting the CLR can have multiple threads, and each one of those threads gets 1 MB of stack memory allocated during its creation. As the name implies, memory reads and writes on that stack are done in a Last-In-First-Out manner without any additional overhead. If we add to that the fact that the stack is "only" 1 MB in size, it becomes clear that stack operations are very fast and efficient.

Even though the stack is very efficient, .NET cannot rely on stack memory alone (due to its size and sequential nature), so every application domain has an allocated memory space called the heap, where type instances are created and accessed directly through references (typically held on the stack), without any constraints on the order in which read/write operations are performed, and with automatic de-allocation (GC collection) of unused heap allocations.

The heap is created by the GC (garbage collector) using the VirtualAlloc() Win32 API function as a large, contiguous memory block, and all .NET managed memory allocations are placed in the heap one after another. That improves allocation time in .NET very much, because there is no need to search for an available memory block of appropriate size.

To increase the performance of heap memory operations, the CLR segments the heap into the "regular" heap and the Large Object Heap, which together are often referred to as the "GC heaps".

The regular object heap contains the type instances which are subject to garbage collection and compaction (defragmentation). (I'll have a separate .NET fundamentals post about the GC, so no more details here.) The large object heap contains instances whose size is greater than 85 KB; they are treated as generation 2, never compacted, and collected only during a full GC collection. All of this is a GC performance enhancement for the large object case.
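A small illustration of that 85 KB threshold, assuming the .NET Framework GC behavior described above (GC.GetGeneration reports large object heap allocations as generation 2 right away):

using System;

class LargeObjectHeapDemo
{
    static void Main()
    {
        byte[] small = new byte[1000];      // small object heap, starts in generation 0
        byte[] large = new byte[100000];    // larger than ~85 KB, goes straight to the Large Object Heap

        Console.WriteLine(GC.GetGeneration(small));  // typically prints 0
        Console.WriteLine(GC.GetGeneration(large));  // typically prints 2 (LOH is collected with gen 2)
    }
}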

Besides the GC heaps holding the object instances, there are also loader heaps. While the GC heaps contain object instances, the loader heaps contain the object types, support method JIT compilation, enable the CLR to make inheritance-based decisions, etc. There are three types of loader heaps: the high frequency heap, the low frequency heap and the stub heap.

The high frequency loader heap contains, among other things, the method table with the MethodDesc table, which we briefly mentioned in the JIT compilation part of the .NET execution model post as the table containing the list of the type's methods, with a column pointing either to the JITCompiler function or to the address where the already JIT-compiled native CPU instructions are located. The method table also contains a method slot table, created from a linearized list of implementation methods laid out in the following order: inherited virtuals, introduced virtuals, instance methods and static methods. The CLR uses those values when deciding how a member should be executed.

[Figure: the HelloHelper method table in the high frequency loader heap]

In this example we see that the type HelloHelper is loaded into the high frequency loader heap; it contains one static field and 3 methods, where (a C# sketch of such a type follows the list below):

  • the GetDate method was already called once and JIT compiled, so its address points to native instructions, while the other two methods point to the address of the internal JIT compiler function
  • the GetHelloText method slot is defined as an introduced virtual, which means the method is marked with the virtual keyword, so the CLR has to check whether any derived type overrides it
  • the GetDate method is marked as static, which signals to the CLR that no instance is needed for the method invocation
  • the GetVirtualText method is marked as an instance method, which signals to the CLR that the method requires an instance context for its execution.
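Since the post doesn't show the HelloHelper source, here is a hypothetical C# reconstruction matching the description above (member signatures and bodies are my assumptions):

using System;

public class HelloHelper
{
    public static int CallCount;               // the single static field from the example

    public static DateTime GetDate()           // static: the CLR needs no instance to invoke it
    {
        return DateTime.Now;
    }

    public virtual string GetHelloText()       // introduced virtual: derived types may override it
    {
        return "Hello";
    }

    public string GetVirtualText()             // instance method: requires an instance ("this") to execute
    {
        return GetHelloText() + " " + GetDate();
    }
}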

In my posts, I'll use just the term "heap" without explicitly specifying the type of heap, and by that I'll mean the GC ("small object", instance) heap. Once again, in case you are interested in more details on heaps and the stack, check out this excellent post.

Everything in .NET is an object, but not everything behaves as an object

The System.Object type is the base type of all types in the .NET framework class library, so in that sense it is correct to say that everything in .NET is an object. But although everything being an object provides numerous benefits, the designers of the .NET framework were also aware that it has a performance disadvantage: some "objects" are so simple and frequently used that there's no sense in treating them the same way as "real, complex objects", which are allocated on the heap and garbage collected, with the performance hit those activities cause.
Therefore, the .NET architects decided to diverge from Java and introduced value types into the FCL: types inheriting from System.ValueType which are allocated either on the stack (local variables) or inline on the heap (value type members of a class). Most of the primitive types (bool, int, byte, etc.) and structures are value types.

One very important thing to mention about value types is that when we use a value type by casting it to a reference type, we pay the same performance cost as with a reference type, because an object wrapping the value from the stack has to be created on the heap and later garbage collected. That transition of a value from the stack to the heap is called boxing, and the reverse transition from the heap back to the stack is called unboxing.
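A small example of both transitions, and of a common place where boxing sneaks in unnoticed:

using System;
using System.Collections;

class BoxingDemo
{
    static void Main()
    {
        int counter = 42;            // value type (lives on the stack as a local variable)

        object boxed = counter;      // boxing: a heap object wrapping the value 42 is allocated
        int unboxed = (int)boxed;    // unboxing: the value is copied back out of the heap object

        // Hidden boxing: the non-generic ArrayList stores object,
        // so every Add of an int causes one heap allocation the GC must later collect.
        ArrayList list = new ArrayList();
        for (int i = 0; i < 1000; i++)
            list.Add(i);             // 1000 boxing operations

        Console.WriteLine(unboxed);
    }
}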

The book

I forgot to mention in the previous two posts that the best book I have read so far on C#/CLR internals is CLR via C# by Jeffrey Richter, where most of the things I am presenting in my blog posts are explained in great detail. In case you haven't read it already, do yourself a favor and buy it. That book is pure gold!

Summary

This blog post gave basic explanations of some very common .NET terms, whose proper understanding is (IMHO) very important for every .NET developer. The next blog post will use the terms explained here and dive into the .NET memory model through a couple of examples and interesting use cases.

Quote of the day:
I no longer prepare food or drink with more than one ingredient. - Cyra McFadden

27 Dec 2007

Why am I doing it?

A couple of times I've been asked why I spend my free time blogging, what's in it for me, etc.

I know this may sound naive, but the most honest answer I can give is that I like explaining things I (think I) know and helping people save some of their time on problems I have already faced.

During 2007 I held four 3+ hour sessions for the Prague .NET Group on Test Driven Development, Enterprise Library 3.1, Design patterns in the real world, Web Client Software Factory etc., with around 300 developers attending. I got a lot of great feedback from the people attending them, and most of my sessions were "sold out" very fast, a day or two after the announcement was made. To me that is a sign that all of the free time and energy spent preparing those sessions was well spent, and I intend to do more sessions in 2008 (MVC.NET and Silverlight 2.0 are the two I am currently preparing). I also intend to make sure that future sessions are recorded and available alongside the slides and source code.

I also started blogging about .NET architecture, development and test driven development during 2007.
I just did a quick "last 3 months" Google Analytics check, and I was literally shocked when I saw that I had almost 13,000 visitors and 20,000 page views.

[Figure: Google Analytics statistics for the last 3 months]

I could never have imagined so many people would visit my blog, send me comments, emails etc. Thank you all for that!
(I know: the average time spent on site is not the best, the bounce rate neither, but hey - I just started blogging :) )

What am I about to do in 2008?

First of all, I'll spend some serious time in 2008 on improving my English grammar which currently sucks. That's a promise! :) 

With the (now dead) Acropolis project I became interested in WPF, and with Silverlight 2.0 I became obsessed with WPF stuff, so expect a lot of blog posts in 2008 targeting WPF, Blend and Silverlight 2.0. WPF and Silverlight 2.0 are, for me, the things that will shake the grounds of web development in 2008, and when that happens I am going to be there fully prepared :)

Another major thing I plan for 2008 is to start learning Ruby, primarily to learn how duck typing and other DLR concepts are influencing the .NET/Java architecture principles I am currently fond of. I expect some significant upgrades to my understanding of how to build software.

I also plan to start one or two small-scale open source projects on CodePlex, because I feel that is also an important way of contributing to the community.

Last but not least, I plan to continue doing my sessions for the .NET group here, on subjects I think are cool and useful.

Summarized: I'm satisfied with my 2007 contribution to the community, which is nothing monumental but enough to give me an answer to the question of why I am doing it :)

25 Dec 2007

.NET Foundations – .NET execution model

Today's post continues the quest of answering the "How does .NET work?" question exactly where the previous post stopped: at the .NET assembly structure. In case you haven't read that post already, I suggest you do that before proceeding with this one.

General level explanation

When an assembly containing .NET code gets executed, it first (1) executes a small piece of native code inside the module whose only purpose is (2) to call into MSCorEE.dll. MSCorEE then (3) loads the appropriate version of MSCorWks.dll, a COM server contained inside a DLL which implements the core .NET functionality. Once the CLR is loaded and running, MSCorEE (4) executes the method matching the entry point token value defined in the CLR header.

[Figure: general overview of the .NET execution model]

Now that we have the L100 (introductory level) answer, we can take a more detailed look at how the .NET execution model really works.

Detailed explanation

When a managed executable starts, Windows examines the assembly PE header to determine whether to create a 32-bit or 64-bit process, and then the process primary thread calls a method inside the MSCorEE.dll ("shim") file.

To understand how that shim DLL is found and loaded, and which method inside it is called, we have to take a look at a couple of things from the dumpbin results shown in the previous post.

We can see there that the optional header values section of the PE header defines the value 0040274E as the module entry point.

 

If we now take a look at the RAW DATA section (in the same result.txt from the last post), we see that at the 0040274E entry point address we have the following set of bytes: FF 25 00 20 40 00.

[Figure: raw data bytes at the 0040274E entry point address]

These bytes roughly translate into a "jump indirect to 00402000" instruction.

To understand what that 00402000 value represents, we have to take a look at the import section shown in the previous post:

 

As we can see there, the import section has an entry for mscoree.dll, which therefore has to exist so it can be loaded into the already created unmanaged process. This import section is the standard way a PE file specifies which DLLs it depends on (managed or unmanaged).

An important thing to notice here is that the import address 402000 is the same one we saw stored in the raw data. Another thing we can see in the mscoree import section is that the mscoree.dll entry function is called _CorExeMain, so the previous interpretation of the raw data bytes, "jump indirect to 00402000", could also be represented as "jmp _CorExeMain" (as it is represented on the diagram).

MSCorEE.dll (Microsoft Component Object Runtime Execution Engine)

This DLL is essential for the functioning of all .NET applications and it is located in the %SystemRoot%\system32 directory, so in case we are asked

"How to perform simple file based test if NET framework is installed?"

we can just check whether the mscoree.dll file exists in the mentioned location, and we will know whether the .NET framework is installed on a given machine.
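A minimal sketch of that check (the output strings are mine):

using System;
using System.IO;

class FrameworkCheck
{
    static void Main()
    {
        // mscoree.dll lives in %SystemRoot%\system32 on every machine with the .NET framework installed.
        string shimPath = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.System), "mscoree.dll");

        Console.WriteLine(File.Exists(shimPath)
            ? ".NET framework is installed"
            : ".NET framework is not installed");
    }
}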

Contrary to popular belief, the CLR itself is not implemented in MSCorEE.dll. It is implemented in a COM server contained in MSCorWks.dll, and we have a separate version of that file for each installed version of the framework.

For example, if we had .NET 1.1, .NET 2.0 and .NET 3.5 installed, the core CLR functionality would be located in:

  • C:\Windows\Microsoft.NET\Framework\v1.1.4322    (.NET 1.1 / CLR 1.1)
  • C:\Windows\Microsoft.NET\Framework\v2.0.50727  (.NET 2.0/.NET 3.0/.NET 3.5/CLR 2.0)

In case you are confused by the wide scope of the second list item, you might ask yourself the following question:

"Which version of CLR are using NET 3.0 and NET 3.5 frameworks ?"

The answer is:

  • NET 3.0 = CLR 2.0 + WinFX (where WinFX=WCF + WPF+ WF)
  • NET 3.5 = CLR 2.0 + WinFX + Linq

Once mscoree determines the appropriate version of the CLR, it initializes it.

With the CLR loaded and running, mscoree's _CorExeMain method then loads the managed module data, retrieves the MethodDef token of the Main method from the module's CLR header, and calls that method.

From that moment on, the managed application is running and the CLR takes care of the application module execution.

JIT compiler

Every IL instruction generated from C# code needs to be compiled into native CPU instructions before it can be executed. Due to .NET's IL-code orientation, that is not done at compile time (although something like that can be done with the NGen tool, you would still need to deploy the IL code). Instead, .NET utilizes a just-in-time (JIT) compilation model which compiles, on the fly, the IL code that is about to execute.

Two major advantages of JIT compiling are:

  • compilation produces native code specific to the CPU of the client machine executing the application
  • the CLR can profile code execution paths and recompile the native code on the fly to increase performance

In my experience, the most common question regarding the JIT compiler is:

"Explain me how this JIT compiler works?"

To answer that question, I'll use the same C# example we used in the previous .NET Foundations post:

namespace CSharp_ILCode
{
    class Program
    {
        static void Main(string[] args)
        {
            System.Console.WriteLine("Hello world!");
            Hello2();
        }

        static void Hello2()
        {
            System.Console.WriteLine("Hello world 2x!");
        }
    }
}

 

Just before the _CorExeMain method of mscoree.dll calls the Main method, the CLR makes a list of all the types used in the Main method. (In our little code example, there is only one such type - Console.)

For each one of the detected types, the CLR creates an internal data structure, something similar to a data table, containing all the methods of the referenced type (that information is retrieved from the type metadata).

In the case of our Console type, that internal data table could look something like this:

Method name Address
Beep ^JITCFunction
...  
Write ^JITCFunction
...  
WriteLine ^JITCFunction

As we can see, in the left column we have the list of all the methods of the Console type. The right column initially contains a pointer to an undocumented, internal CLR function which I'll call JITCFunction here.

After that internal data table is created, mscoree starts executing Main, and the first line is a call to the WriteLine method.

The CLR tries to get, from the internal data table, the address where it can find the native CPU instructions, but because our internal data table contains only a pointer to JITCFunction, the CLR makes a call to JITCFunction.

JITCFunction is aware of which method caused the call, and it then performs the following steps:

  1. examine the method metadata and retrieve its IL code,
  2. verify that the IL code is safe to execute,
  3. compile the IL code into native CPU instructions,
  4. store the created native CPU instructions in a newly, dynamically allocated memory block,
  5. update the internal data table by replacing, in the WriteLine address column, the JITCFunction pointer with a pointer to the memory block from step 4.

At the end of those steps, the internal data table looks like this:

Method name Address
Beep ^JITCFunction
...  
Write ^JITCFunction
...  
WriteLine ^NativeInstructionsMemoryBlock

Once the update of the internal data table is complete, JITCFunction jumps to the address of the native CPU instruction memory block, which in our case results in the "Hello world!" text being shown on the console screen.
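To make the stub-replacement idea more tangible, here is a purely conceptual C# simulation of it. This is not how the CLR is implemented internally; it only mimics the pattern of "the first call compiles and patches the table, later calls jump straight to the compiled code":

using System;
using System.Collections.Generic;

class JitSimulation
{
    // The "internal data table": method name -> what to execute when the method is called.
    static readonly Dictionary<string, Action> MethodTable = new Dictionary<string, Action>();

    static void Main()
    {
        // Initially every slot points to the "JITCFunction" stub.
        MethodTable["WriteLine"] = () => JitCompile("WriteLine");
        MethodTable["Beep"] = () => JitCompile("Beep");

        MethodTable["WriteLine"]();   // first call: goes through the stub and gets "compiled"
        MethodTable["WriteLine"]();   // second call: runs the cached "native" code directly
    }

    static void JitCompile(string methodName)
    {
        Console.WriteLine("JIT compiling " + methodName + " ...");

        // "Compile" the method and patch the table so later calls skip this step.
        Action compiled = () => Console.WriteLine(methodName + ": executing compiled code");
        MethodTable[methodName] = compiled;

        compiled();   // execute it for this first call as well
    }
}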

After seeing all these steps happening before every method's first execution, the next question always pops up:

"Are managed applications much slower then native one due to performance hits caused by JIT compiler?"

The short answer is: no, they are not, because the performance hit caused by the JIT compiler is minimal and the JIT compiler makes up for it with its advantages (CPU-specific compilation).

Why the JIT-compiler-caused hits are minimal is best explained by continuing our code example walkthrough.

After the first line is written to the console, my little example calls the Hello2 method, which contains only one line - a call to Console.WriteLine.

The difference is that this time, when the CLR tries to find the address where the native CPU instructions of the WriteLine method are located, it succeeds, and instead of the whole set of JIT compilation steps it just executes the native CPU instructions already compiled during the first WriteLine execution.

We can see that the JIT compiler therefore "caches" native CPU instructions in dynamic memory, which means that the compiled code is available to all the code executing in the same AppDomain for as long as the application isn't terminated.

Most applications spend most of their time performing repetitive calls to the same methods, so overall application performance can in general be treated as very close to (if not the same as or better than) the performance of a native application.

Conclusion

In these two posts I tried to give a quick overview, with some crucial details, which could help answer the "How does .NET work?" question. I hope you now see how the structure of a .NET assembly and the execution model are two parts of the same experience, cooperating and supporting each other. I also tried to give a couple of side answers to some of the smaller but still interesting questions.

The next post in my .NET Foundations series will cover stack/heap related subjects: value types vs reference types, instance vs type members, why boxing is evil, etc.

So, stay tuned :)



24 Dec 2007

.NET Foundations – .NET assembly structure

Every once in a while, I'm asked by a non-.NET developer (VB6, C++ etc.) to explain "how .NET works, how the GC works, why boxing is bad etc." I usually try to find a link and save some of my time, but for some of the subjects I'm not able to find appropriate ones (they are either too broad, or too short and partial in presenting answers). Therefore, to save myself from repeating the same whiteboard session in the future, I decided to write a couple of blog posts explaining .NET foundations. That + I got bored with all these architecture posts :)

Today's post tries to answer the first half of the "how .NET works" question by explaining the .NET assembly structure only. The next post will then cover the second part of the answer, explaining the .NET assembly execution model.

General level explanation

The .NET framework gives developers the freedom of choosing the language they would like to use for .NET programming (C#, VB, C++/CLI ...). It even enables using multiple languages in the same project, where the code developed in different languages cooperates seamlessly. That is possible due to the fact that the .NET framework operates only with intermediate language (IL) code. IL code is created during compile time by the language compilers, which translate high-level code (C#, VB ...) concepts into a combination of language-agnostic IL code and its metadata. That IL code, together with the metadata and headers, makes a unit called a managed module.

One or more managed modules and zero or more resource files are linked by the language compiler or the assembly linker into a managed assembly, which we see as a .NET DLL file. Every assembly also contains an embedded manifest which describes the structure of the assembly's type member definitions, the structure of external assembly member references, etc.

[Figure: from source code files to managed modules and a managed assembly]

This diagram and general-level explanation are roughly sufficient as an L100 explanation, but my personal preference is to always complement the big-picture, approximated explanation with some concrete implementation details, so I'll do that here too by explaining the structure of a managed module in more detail. I'll try to minimize talking and maximize illustrations and pictures, so it will be shorter and more reader friendly while still having some weight.


C# code file

In this post, I'll use a very simple example: a single-class console application which writes two lines to the console.

Something like this:

namespace CSharp_ILCode
{
    class Program
    {
        static void Main(string[] args)
        {
            System.Console.WriteLine("Hello world!");
            Hello2();
        }

        static void Hello2()
        {
            System.Console.WriteLine("Hello world 2x!");
        }
    }
}

As we can see on the diagram above, during compile time that code is "translated" into IL code with the appropriate metadata definitions, and all of that then becomes one managed module and, through that, a .NET assembly.

Managed module

Any managed module, regardless of the language it was created from, consists of the following four big parts:

  1. PE32 header
  2. CLR header
  3. Metadata
  4. IL code

PE header

Every managed module contains the standard Windows PE executable header, just like unmanaged - native - applications do. The only difference in the case of managed code is that the bulk of the PE header information is simply ignored, while in the case of native code that PE header contains information about the native CPU code.

To get some information about the PE header, execute the following command in the Visual Studio command prompt:

dumpbin /all assembly_name > result.txt

That command results in a result.txt file being created, and that file contains the following PE header specific information (among much other information):

[Figure: PE header section of the dumpbin output]

We can see in this image that the PE header contains information about:

  • what type of module it is,
  • the module creation time stamp,
  • which CPU architecture the IL code is targeted at (PE32 = 32-bit/64-bit Windows, PE32+ = 64-bit Windows only),
  • the entry point representing the memory address of the jump to the _CorExeMain() function (more about this in the assembly execution part of the post)

I'll use this opportunity (while being here at the PE header part) to answer one of the common .NET questions I've heard:

"How to recognize from PE header information if a module is managed module?"

If we scroll down the optional header values in the dumpbin result text file, we see that the number of directories is 10h (16), which is a higher number than the number of directories used by native assemblies. The extra one, specific to managed code, is the "COM Descriptor Directory", and that is the entry in this "table of contents" which describes where to reach during execution for the metadata and the IL.

[Figure: the COM Descriptor Directory entry in the dumpbin output]
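As a managed-code alternative to reading the PE header by hand (my own addition, not from the whiteboard session), the reflection API can make the same distinction: AssemblyName.GetAssemblyName succeeds only for managed modules and throws BadImageFormatException for native ones.

using System;
using System.Reflection;

class ManagedModuleCheck
{
    static void Main(string[] args)
    {
        string path = args[0];   // e.g. CSharp_ILCode.exe or some native DLL
        try
        {
            // Reads the assembly metadata without loading the assembly into the AppDomain.
            AssemblyName name = AssemblyName.GetAssemblyName(path);
            Console.WriteLine(path + " is a managed module: " + name.FullName);
        }
        catch (BadImageFormatException)
        {
            Console.WriteLine(path + " is not a managed module");
        }
    }
}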

CLR header

If we scroll the result.txt file to the CLR header section, we see the following:

[Figure: the CLR header section of the dumpbin output]

Here we can see:

  • the targeted CLR version for this module is 2.05 (.NET 2.0 SP1)
  • the module consists only of managed code
  • the managed module entry point - the Main method - has the metadata token value 06000001

Metadata

While we still have the dumpbin result.txt open, let's take a quick look at something very cool: how to recognize where the metadata segment begins in a module.

If we scroll to the raw data #1 section, we see something like this:

[Figure: start of the metadata block (BSJB signature) in the raw data section]

The start of the metadata definition block is marked with the 4 bytes 42 53 4A 42 (BSJB), which are the first letters of the names of the developers who implemented the metadata part of the framework in .NET 1.0. I spent 2 hours trying to find their names, but with no success... It looks like either no one knows who they are, or no one wants to name them.

That said, we can close the dumpbin result file, because for investigating the metadata and the IL code we will be using the ILDasm.exe tool and its results.

To use the tool, we again open the Visual Studio command prompt, navigate to the folder where the resulting assembly is, and execute the following command:

ildasm CSharp_ILCode.exe

Once that executes, we see the ILDasm application window, which shows the embedded manifest as the first item of the assembly.

[Figure: ILDasm window showing the assembly manifest node]

To see what the manifest contains, I double-clicked it. The resulting window contains the definition of assembly-level data and defines the data required for external assembly binding.

In our example, the definition of the external mscorlib reference looks like this:

[Figure: the mscorlib external assembly reference in the manifest]

We already saw in the CLR header part that there is information about the metadata token of the assembly entry point, which had the value 06000001.

Knowing that tokens starting with 06 are MethodDef tokens leads us to examining the MethodDef related metadata. So, with the ILDasm window in focus, I pressed <Ctrl>+<M> and easily found the MethodDef with that given token value, which pointed to the Main method (as we already saw in the C# code definition).

 

[Figure: the MethodDef metadata entry for the Main method]

Summarized:

  • in the CLR header the entry point token value is defined
  • that token value is then used in the metadata to look up the appropriate MethodDef entry
  • that entry describes the part of the IL code which will be executed as the entry point.

The metadata binary block of data consists of several tables, which can be categorized into definition tables, reference tables and manifest tables, but the scope of this post doesn't allow a deeper explanation. (In case you are interested in more details on metadata, check out the MSDN metadata start page.)

IL Code

I then expanded the ILDasm tree and double clicked the Main method entry.

 

[Figure: IL code of the Main method (release build)]

As we can see, the static method Main is marked as .entrypoint.

IL is stack based, which means that operand values are pushed onto the execution stack and results are popped off the stack, without manipulating registers.

Therefore, at L_0000 the code pushes the "Hello world!" value onto the operand stack, where it will be used at L_0005.

IL is namespace ignorant, which means that namespaces from C# code become just a prefix of the "full" type name in IL. In IL code every member is referenced by its full type name, in the format "Namespace.Type::MemberName".

That's why we have:

  • L_0005  System.Console::WriteLine(string) (System namespace, Console type, WriteLine member)
  • L_000a CSharp_ILCode.Program::Hello2()    (CSharp_ILCode namespace, Program type, Hello2 member)

As highlighted in the IL code image, the L_0005 full type name has one more additional prefix, because the Console type is defined in an external assembly (in this case the core .NET class library - mscorlib.dll).

That said, one question is inevitable:

"How .NET knows where to find that [mscorlib]?"

The answer is very easy and was already shown while describing the manifest content, where we saw at the beginning of the manifest data the mscorlib public key token definition, which is used for accessing that assembly in the GAC.

While we are still at the IL code window, let's answer one more question:

"How .NET debugging works?"

The IL code presented in the upper screen was built optimized in release mode (I've blogged about compiler optimizations in more detail here), and we all know that we cannot debug code built in release mode. To get an answer as to why we need debug builds, let's take a quick look at how the IL code for the same C# code built in debug mode looks:

[Figure: IL code of the Main method (debug build, with NOP instructions)]

As we can see, the compiler inserted one NOP instruction before each line. When we set a breakpoint on a line in the Visual Studio IDE, the breakpoint is in fact set on the NOP instruction before that line. In release mode there are no NOP instructions created by the compiler, and therefore there's no possibility to set a breakpoint.
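If you want to reproduce the comparison yourself, the two builds roughly correspond to the following compiler invocations (treat the flag combinations as an approximation of what Visual Studio does by default for each configuration):

csc /optimize- /debug:full Program.cs      (Debug configuration: NOP instructions emitted, full PDB)
csc /optimize+ /debug:pdbonly Program.cs   (Release configuration: optimized IL, no NOPs)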

This post stops here (so it won't be too long) and will be continued tomorrow with a description of how .NET assembly code is executed during runtime.

(to be continued)

18 Dec 2007

Model View Presenter (MVP) VS Model View Controller (MVC)

As promised in MVP design pattern - Part 2, today's post covers something which has generated a lot of (very well deserved) noise in the last days - the Microsoft MVC.NET framework. After some playing with the MVC.NET framework CTP bits and after reading some MVC.NET blog posts, I found myself thinking about the following questions:

  • "What is the difference between MVP and MVC? "
  • "Having MVC .NET in place, do we need MVP?"
  • "What is easier to use for TDD?"

This blog post tries to give answers to those questions.

The role of view

If we compare the diagrams of the MVP and MVC patterns,

[Figure: MVP and MVC pattern diagrams]

we can see that in the MVP design pattern implementation the view is totally unaware of the model. The presenter is the one exclusively communicating with the model, and it either sends a DTO to the view (in the case of Supervising Controller) or performs direct manipulation of the view's UI elements (Passive View).
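A minimal C# sketch of that arrangement (type names are mine, not from any framework): the presenter depends only on a view interface and a model abstraction, and pushes results back to the view.

    public interface IUserView                  // UI-technology agnostic abstraction of the page/form
    {
        string UserName { get; }
        void ShowActiveUserCount(int count);
    }

    public interface IUserModel                 // model abstraction, e.g. a repository or service
    {
        int GetActiveUserCount(string userName);
    }

    public class UserPresenter
    {
        private readonly IUserView _view;
        private readonly IUserModel _model;

        public UserPresenter(IUserView view, IUserModel model)
        {
            _view = view;                       // the view is injected, so it can be a web form, a win form or a test stub
            _model = model;
        }

        public void Refresh()
        {
            int count = _model.GetActiveUserCount(_view.UserName);
            _view.ShowActiveUserCount(count);   // only the presenter talks to the model; the view never does
        }
    }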

On the other hand, in MVC we have the following two "read data" cases:

  • the view reads data directly from the model and performs "declarative data binding" of its controls, or
  • the controller retrieves data from the model and passes that context data to the view while loading the appropriate view; the view then binds to that context data

In most of the "update data" use cases we have the view sending updated context data to the controller, where the controller performs some validation and/or business logic and updates the model with that data.

Navigation aspects

In general, the MVP pattern is based on the standard ASP.NET page controller pattern implementation, where Page A performs Response.Redirect("PageB") and that results in page B loading, with the complete page life cycle events fired, etc. MVP adds just one additional step to that sequence: injecting the view into the presenter (where, for the presenter, the view is just a UI-technology-agnostic view interface).

[Figure: MVP navigation based on the ASP.NET page controller pattern]

Besides that "default" ASP.NET page controller based implementation, there are various ways the MVP implementation can be enhanced.

One of them is the utilization of an application controller and the Page Flow Application Block (which I presented last week in my Web Client Software Factory session), where the application controller centralizes flow and navigation concerns and behaves as a shared context container for multiple views covering the same use case.

The Page Flow Application Block is a block using the .NET 3.0 workflow engine, enhanced with an additional UI designer used for defining navigation flows. The end result is that you have a separate project with a workflow definition describing the navigation flow use cases. In a way, that is something very similar to the purpose a front controller has.
Having in mind the type of enhancement the application controller and the page flow application block bring to standard ASP.NET, I sometimes call WCSF "ASP.NET web forms on steroids".

MVP summarized: navigation is handled on a per-page basis using the page controller pattern approach.

The MVC.NET framework takes a different, more "web suitable" approach to the same concern of efficiently handling navigation. It is based on a front controller design pattern driven routing engine, which in general intercepts every HTTP request at the HTTP handler level and checks whether there is a mapped controller type responsible for handling the intercepted URL.

As we can see on the first diagram, the flow in MVC starts with the controller class and not with the web page - the view (which is the case in MVP). When the controller is invoked, it usually:

  • retrieves some data from the model,
  • performs business logic, and
  • picks the appropriate view, to which it passes the processed model data
  • loads the view - which then renders without the complete page life cycle being executed

In a way, that is something similar to what we have in the Supervising Controller MVP pattern, with the difference that in Supervising Controller the view (page) is still loaded first and the presenter class is created afterwards.

MVC summarized: navigation is handled in one centralized, configurable place for all the pages, based on the front controller pattern.
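To make the controller steps listed above concrete, here is a deliberately framework-free sketch of a controller in that style (Product, IProductRepository and IViewEngine are made-up abstractions of mine, not the MVC.NET CTP API):

    using System.Collections.Generic;
    using System.Linq;

    public class Product
    {
        public string Name;
        public bool IsActive;
    }

    public interface IProductRepository { IList<Product> GetAll(); }               // model access

    public interface IViewEngine { void Render(string viewName, object viewData); } // picks and renders a view

    public class ProductController
    {
        private readonly IProductRepository _repository;
        private readonly IViewEngine _viewEngine;

        public ProductController(IProductRepository repository, IViewEngine viewEngine)
        {
            _repository = repository;
            _viewEngine = viewEngine;
        }

        // Invoked by the routing engine for a URL such as /product/list
        public void List()
        {
            IList<Product> products = _repository.GetAll();                    // 1. retrieve data from the model
            List<Product> active = products.Where(p => p.IsActive).ToList();   // 2. perform some business logic
            _viewEngine.Render("ProductList", active);                         // 3. + 4. pick the view and load it with the data
        }
    }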

I find one thing important enough to be emphasized here: the front controller (routing engine) is a part of the Microsoft MVC.NET framework, but it is a separate part of it and a different pattern from the MVC pattern itself.

To illustrate how important the routing engine is for the MVC framework, I'll just briefly mention that I've been working on a POC which was supposed to handle, in an efficient way, the need for having various UI flavors of the same page, based on multiple business conditions.

I chose to build the POC platform based on the MVP pattern, where multiple views (representing versions of the page) share the same presenter (to remove redundancy), enhanced with a custom (Maverick.NET based) routing front controller engine. The end result of this solution based on a front controller + MVP is something which looks (to me at least) very much like what I've seen in the MVC.NET framework, just with standard web forms used.

Advantages of the MVP

One advantage of MVP I am particularly excited about is something David Hayden blogged about and Ron Jacobs talked about with the Atomic Object guys. It is the fact that in MVP all you need to develop the presenter is an interface of the view. In the Passive View version of MVP, that view interface represents the page UI elements, so in a sense it can be considered a page abstraction or even a page contract between UX (designers) and DEV (developers).

Although Passive View is very cumbersome to implement and sometimes leads to complex presenters, the advantages of its usage are much higher in a standard waterfall development process, because it enables tech leads to translate the FRD (Functional Requirement Document) requirements right at the beginning of the development phase and produce sets of view interfaces (abstracted pages).

Therefore, communication between the tech lead and the developers is concentrated at the beginning of the development phase, with a very precise set of artifacts defining requirements. Once the view interfaces are defined and verified as matching the FRD requirements, developers are able to start developing their presenters, including the unit tests verifying that the FRD functional requirements are fulfilled.

The end result of that activity is a list of unit tests representing the FRD requirements. The percentage of successfully "green" tests on that list can then be used to track realistic project progress: 5 green tests out of 100 FRD tests => 5% progress reported.
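A hedged sketch of one such FRD-derived test, reusing the hypothetical IUserView/IUserModel/UserPresenter types from the earlier sketch and the same Rhino Mocks record/replay style used elsewhere on this blog:

        [Test]
        public void Refresh_ShowsActiveUserCountFromModel()    // maps one-to-one to an FRD requirement
        {
            MockRepository mocks = new MockRepository();
            IUserView view = mocks.DynamicMock<IUserView>();
            IUserModel model = mocks.DynamicMock<IUserModel>();

            Expect.Call(view.UserName).Return("john");
            Expect.Call(model.GetActiveUserCount("john")).Return(3);
            view.ShowActiveUserCount(3);                        // the expectation that proves the requirement is met
            mocks.ReplayAll();

            new UserPresenter(view, model).Refresh();

            mocks.VerifyAll();
        }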

[Figure: FRD requirements translated into view interfaces and presenter unit tests]

What we get as the end result is:

  • totally decoupled and effective work streams for the tech lead, the developer and the designer
  • a set of artifacts useful for tracking the real project implementation progress and as guidance to developers on which requirements they have to meet.

IMHO, the second advantage MVP has is that, while the MVC pattern feels more natural in some web development scenarios, MVP pattern based development offers "crossing the boundaries" out of the box, by utilizing the fact that the presenter is not aware of the view's UI technology. Both the view interface and the presenter can even live in a separate class library project, which allows their reuse in win forms applications too.

I know, most of us think of that as "cool but YAGNI", "we are doing only web development", but once you see how easy it is to get a smart client application from an already built web application, you will start thinking about it too.

Think about another idea: win forms UI development is much faster to implement and modify, so imagine doing prototypes of your web sites in win forms just to speed up the process of verifying whether the business needs would be met by the application. Once that is verified, you would still use the same presenter you used for win forms and build a web forms UI layer on top of it, but this time with already confirmed business expectations.

MVP and MVC in Test Driven Development

As stated already in this post, MVP's biggest advantage is enabling writing presenter tests with only the view interface available.
On the other hand, the MVC.NET approach to TDD offers much more power, by providing the ability to mock the flow itself (you can mock the context, request, response) + providing controller based testing in a way similar to testing a supervising controller (setting the context DTO to desired values to trigger certain test flows).

Although at first sight that maybe looks like overkill to someone, according to Jeremy D. Miller (and he knows what he's talking about :) ) the simple nature of the MVC route -> controller -> view flow makes writing complex web form tests much easier.

I spent a decent amount of time in 2007 doing MVP tests in web forms, and although there were a lot of hurdles I faced during that time, I can't say I felt too much pain, mainly because I was mostly using either a front controller or an application controller as MVP supplemental patterns.

Right now, MVC also looks a little bit more complex to TDD to me, and I don't know if the advantages of MVP can be achieved easily in MVC too, but I realize that I don't have much experience with real-world TDD usage of MVC, and I know that I can rely on Jeremy's opinion.

The promise of an easier TDD experience with MVC in the upcoming period makes me very happy and really eager to dive deeper into MVC, so be prepared for some blog posts on that subject (if anything is left to be blogged about after all these MVC.NET posts appearing in the last days like mushrooms after the rain :)



14 Dec 2007

Web Client Software Factory – Slides from presentation

Yesterday's WCSF presentation session was something really special, because it took much longer than we expected, due to the ad hoc decision to add to the L100 material of the presentation some more advanced "real" examples regarding the usage of Windows Workflow for defining page flows, and a code walkthrough of the reference implementation with debugging enabled, explaining how all those controllers, presenters, services, ObjectBuilder etc. really work under the hood. We ended at half past 9, but I hope the attendees had as much fun and as usefully spent time as I had.

Here are a couple of links I mentioned in the session (and not included in the slide deck) that you should check out for WCSF information.

You can download the slides from yesterday's presentation here.

We will be doing a rerun of the session on January 17th, again in the Monster office; it will probably be different (read: shorter), with the same set of slides but a custom made example. Of course, that will be available and published after the session.

At the end, I just want to apologize to all of you asking whether there will be a recording of the session. We didn't record it because the person who was supposed to record it didn't come (due to a misunderstanding). I'll do my best to record a half-hour screencast in the upcoming week presenting the most important WCSF concepts and "internals" from a developer perspective. We'll also try to record the rerun session.

Both of those recordings should be available for download from my blog.

Quote of the day:
A coupla months in the laboratory can save a coupla hours in the library. - Westheimer's Discovery

2 Dec 2007

Designing for testability – a valid architecture choice? (Part 3)

This is the third and final part of the design for testability series of posts.
The first part of the series covered the initial separation of concerns between the manager and provider classes, so in case you haven't read it already, jump here and read it first.
The second part of the series focused on decoupling the manager and provider classes by separating them with the service stub pattern and a user provider interface. In case you haven't seen it, check it out here.

Redesign step 3 - IUserManager

In part 2 we applied service stub decoupling between the user manager and the provider by putting an internal provider interface between them. In this step, we will in general do the same thing, just this time putting an interface in front of the manager itself.

As you can probably guess already, we have the same problem again: UserManager is a static class with a static NumberOfUsersActiveInLast10Days method, so we have to remove the static modifiers if we want to introduce interface usage. But the difference is in the fact that we could do whatever we wanted with the provider class, because it was an internal, encapsulated class. This time we have console code using the static manager method, so by removing the static modifiers we would change that code in a significant way, by forcing the console to create an instance to access the instance method, which could cause some performance issues etc.

Today's post shows two approaches to solving that problem.

Both examples have a common starting point:

  • removal of the static modifiers from the UserManager class and the NumberOfUsersActiveInLast10Days method
  • extracting an IUserManager interface and implementing it on UserManager

[Figure: the IUserManager interface extracted from UserManager]

Singleton based refactoring

Singleton is a very simple pattern, based on the idea that if we replace instance constructors with a static factory member, we get the result of an instance class behaving like a static class.

    public class UserManager : IUserManager
    {
        private UserManager(){/*prevents instance construction*/}

        public static readonly IUserManager Instance = new UserManager();

In line 3, the default constructor is hidden to prevent creating another instance of UserManager.

In line 5, a static read-only field is defined and set to an instance of UserManager. Because this is a static field, the instantiation happens only once.

Due to the fact that NumberOfUsersActiveInLast10Days is now an instance method, the way it is invoked in CompanyManager has to change too, to reflect the new singleton nature of the class.

using System.Collections.Generic;

namespace DAL
{
    public class CompanyManager
    {
        private static IUserManager _userManager = UserManager.Instance;

        internal static IUserManager Manager
        {
            set { _userManager=value; }
        }


        public static IList<int> GetActiveUsers()
        {
            IList<int> result = new List<int>();
            result.Add(_userManager.NumberOfUsersActiveInLast10Days("A"));
            result.Add(_userManager.NumberOfUsersActiveInLast10Days("B"));
            result.Add(_userManager.NumberOfUsersActiveInLast10Days("C"));
            return result;
        }
    }
}

The changes done in CompanyManager are very similar to the ones done in UserManager in the previous post; in short, they are based on replacing the direct usage of UserManager (in the GetActiveUsers method) with a field declared at the top of the class and initialized to point to the newly created singleton Instance of the UserManager.

The internal Manager setter property just provides a way, already seen, to replace the real UserManager at run time with something else implementing IUserManager, to allow easy testability - all thanks to setter-based dependency injection.

With this code in place, testing CompanyManager is very easy and looks something like this:

        [Test]
        public void GetActiveUsers_TestCaseOfZeroUsers2()
        {
            IUserManager userManager = mockRepository.DynamicMock<IUserManager>();
            Expect.Call(userManager.NumberOfUsersActiveInLast10Days(null))
                .IgnoreArguments()
                .Return(0);
            mockRepository.ReplayAll();

            CompanyManager.Manager = userManager;
            IList<int> results = CompanyManager.GetActiveUsers();
            Assert.IsTrue(results.Count == 0);
        }

As we can see in this test, we now just mock the direct dependency - the UserManager behavior - without going into the internal details of how the manager is implemented: "UserManager would return this, and I don't care here at all how that would be retrieved."

So we got a decoupled and testable design without exposing the internals of how something is used.

What is wrong with this code?

Now that we have cleaned up the internal relationship between the user manager and the user provider and decoupled the external relationship UserManager has, the only thing left is the fact that the UserManager.NumberOfUsersActiveInLast10Days method still has crosscutting concerns (validation and caching) mixed with business logic, which breaks separation of concerns and decreases testability. A lot of people I know would say that this is not a big deal for them, so for them redesigning for testability could stop here, because with the singleton pattern implementation there is not much more that can be done efficiently.

Registry based refactoring

As I stated earlier, there is another type of solution applicable to this problem, and with it we won't make any singleton-related changes to UserManager itself. The UserManager class just loses its static modifiers and becomes a normal instance class. The desired static functionality is achieved in this approach with a new registry class which exposes unique instances of the manager classes through static properties.

Something like this:

namespace DAL
{
    public static class DALRepository
    {
        public static readonly IUserManager UserManager =
            new UserManager();

        public static readonly CompanyManager CompanyManager =
            new CompanyManager();

    }
}

As can be seen from lines 5 and 8, normal instances are created and stored in a static repository class, which is then used as a registry of the assembly functionality, removing the need for any kind of "being static" targeted development. CompanyManager is just another instance class in this example, implementing the dependency injection/service stub design we already saw in this post:

using System.Collections.Generic;

namespace DAL
{
    public class CompanyManager
    {
        private IUserManager _userManager = new UserManager();

        internal IUserManager UserManager
        {
            set { _userManager=value; }
        }


        public IList<int> GetActiveUsers()
        {
            IList<int> result = new List<int>();
            result.Add(_userManager.NumberOfUsersActiveInLast10Days("A"));
            result.Add(_userManager.NumberOfUsersActiveInLast10Days("B"));
            result.Add(_userManager.NumberOfUsersActiveInLast10Days("C"));
            return result;
        }
    }
}

Notice that in line 7 the field is assigned a new instance of UserManager, and that neither the class nor the method is static any more. The registry class takes on itself all the burden of providing static access.

Testing CompanyManager in the case of the registry pattern based solution looks like this:

        [Test]
        public void GetActiveUsers_TestCaseOfZeroUsers2()
        {
            IUserManager userManager = mockRepository.DynamicMock<IUserManager>();
            Expect.Call(userManager.NumberOfUsersActiveInLast10Days(null))
                .IgnoreArguments()
                .Return(0);
            mockRepository.ReplayAll();

            DALRepository.CompanyManager.UserManager = userManager;
            IList<int> results = DALRepository.CompanyManager.GetActiveUsers();
            Assert.IsTrue(results.Count == 0);
        }

In lines 4-7 we create a mocked UserManager which returns zero users.

In line 10, we use the DALRepository class's CompanyManager static property to set the UserManager used by the company manager to the mocked one.

In line 11, we use the DALRepository class's CompanyManager static property to call the GetActiveUsers() method.

An additional benefit of Registry pattern based solution

You're probably aware that one of the new cool things Enterprise Library 3.1 offers is the Policy Injection Application Block, which is a very cool way of removing crosscutting-concern type of code from your business logic. The problem I usually face with the PIAB is that I have to enforce, everywhere in the application, that developers use the PolicyInjection creation factory method instead of the default constructors objects offer. The registry pattern, used in the way just explained, is IMHO a perfect way to encapsulate that policy injection creation code.

We could very easily rewrite the DALRepository class to look like this:

using Microsoft.Practices.EnterpriseLibrary.PolicyInjection;

namespace DAL
{
    public static class DALRepository
    {
        public static readonly IUserManager UserManager =
            PolicyInjection.Create<UserManager, IUserManager>();

        public static readonly CompanyManager CompanyManager =
            PolicyInjection.Create<CompanyManager>();
    }
}

Code anywhere in the application would still use DALRepository.UserManager without being aware of the policy injection occurring in the background.

I won't go too deep into the PIAB implementation here (I already had a session about it, so in case you care, check it out here), but we can use it easily to perform the final redesign of the initial code and remove the validation and caching from the business logic.

I'll explain here in short how an interface-based PIAB approach to solving this would look.

We would just need to:

  1. decorate IUserManager with the following two attributes:
using Microsoft.Practices.EnterpriseLibrary.PolicyInjection.CallHandlers;

namespace DAL
{
    public interface IUserManager
    {
        [CachingCallHandler]
        [ValidationCallHandler]
        int NumberOfUsersActiveInLast10Days(string userName);
    }
}

2. Define the appropriate matching rules and active policies in the EntLib configuration file.

3. Remove the code implementing validation and caching.

The result is that the method we started with, UserManager.NumberOfUsersActiveInLast10Days, now contains only business logic, so its coherence and maintainability are the highest possible:

        public int NumberOfUsersActiveInLast10Days(string userName)
        {
            
            IList<User> userCollection = _userProvider.GetUserCollection(userName);
            int result = 0;
            foreach (User user in userCollection)
            {
                if (user.LastActivity > DateTime.Now.AddDays(-10))
                {
                    result++;
                }
            }
            return result;
        } 

What is wrong with this code?

As far as I can tell, there is nothing wrong with this code any more.

Conclusion

We redesigned the starting example for testability, and that gave us clear separation of concerns, loosely coupled component interactions, high maintainability, etc.

I do agree that with the usage of TypeMock we could test the original method too, but then we would lose all the side benefits designing for testability brings. Maybe the whole fuss is because of the wrong terminology: "design for testability" should be called "design for maintainability", for the reasons I hope this couple of blog posts showed :)

The source code presented in these examples can be found here.

2 Dec 2007

Designing for testability – a valid architecture choice? (Part 2)

This is the second part of the design for testability series of posts. The first part of the series covered the initial separation of concerns between the manager and provider classes, so in case you haven't read it already, jump here and read it first, because the story here starts where it ended there.

Redesign step 2 - IUserProvider

The redesign covered in this blog post is based on the service stub design pattern. I have already blogged about that pattern, so here I'll just summarize it as a pattern where direct interaction between two types becomes indirect communication of one class with an interface of the second class, which decouples the second class's concrete implementation from the communication.

If we take a look at line 21 of the UserManager class code from the last code sample of the previous post, we see this line

userCollection = UserProvider.GetUserCollection(userName);

This shows that UserManager is tightly coupled to the UserProvider class. Because UserProvider talks to the database, that prevents easy testing of the UserManager business logic: any test first requires DB-related setup.

Provider changes

To get rid of that DB constraint, we need to define an IUserProvider interface which exposes the members of the UserProvider class and which UserManager would reference instead of the concrete class.

But right at the start there is a problem: our provider class is currently a static internal class, and interfaces are not applicable to static type members, so to define the user provider interface we first have to remove the static modifiers from the class and method definitions.
In the end, the result is the provider class implementing the provider interface:

[image: the UserProvider class implementing the IUserProvider interface]
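Since the screenshot is all the post shows for this step, here is a minimal sketch of how it could look (my assumption, based on the GetUserCollection signature used throughout the samples; I keep both types internal, which together with InternalsVisibleTo is enough for the tests below):

using System.Collections.Generic;

namespace DAL
{
    // Interface extracted from the former static provider class
    internal interface IUserProvider
    {
        IList<User> GetUserCollection(string userName);
    }

    // The same provider as before, just without the static modifiers,
    // now implementing IUserProvider (the ADO.NET body is unchanged and elided here)
    internal class UserProvider : IUserProvider
    {
        public IList<User> GetUserCollection(string userName)
        {
            IList<User> userCollection = new List<User>();
            // ... same SqlConnection / SqlCommand / SqlDataReader code as in part 1 ...
            return userCollection;
        }
    }
}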

UserManager changes

In the UserManager class, we define a static field of the IUserProvider type, initialized by default to a UserProvider instance; the NumberOfUsersActiveInLast10Days method then uses that field to call the provider method instead of calling the UserProvider class directly.

We also expose an internal setter-only property which sets the value of the _userProvider field.

The manager code would then look like this

using System;
using System.Collections.Generic;

namespace DAL
{
    public static class UserManager
    {
        private static readonly IDictionary<string, IList<User>> _userCache 
            = new Dictionary<string, IList<User>>();
        private static IUserProvider _userProvider = new UserProvider();

        internal static IUserProvider UserProvider
        {
            set { _userProvider = value; }
        }

        public static int NumberOfUsersActiveInLast10Days(string userName)
        {
            if (string.IsNullOrEmpty(userName))
                throw new ArgumentException("User id not sent");

            IList<User> userCollection;
            if (_userCache.ContainsKey(userName))
                userCollection = _userCache[userName];
            else
            {
                userCollection = _userProvider.GetUserCollection(userName);
            }

            int result = 0;
            foreach (User user in userCollection)
            {
                if (user.LastActivity > DateTime.Now.AddDays(-10))
                {
                    result++;
                }
            }
            return result;
        }
    }
}

In line 10, there is a field of type IUserProvider, initialized to a UserProvider instance.

In lines 12-15, there is an internal, setter-only UserProvider property which can be used to replace the DB-dependent UserProvider with some mocked/stubbed type implementing IUserProvider.
All that needs to be done is to set UserManager.UserProvider = something, and the user manager will use that instead of the UserProvider class functionality.

In line 27, UserManager is now using the IUserProvider field value instead of the UserProvider class, thereby decoupling the manager and provider classes.

Making the UserProvider property internal satisfies the requirement of not exposing internals to the code using the class, but at the same time it also prevents the test fixture class from using it :(

Exposing internal members in right amount

Luckily, .NET 2.0 offers a very easy workaround for this problem: all that is needed is adding a simple assembly attribute which declares the internal members of the DAL assembly visible to the assembly containing the tests.

We need to open the AssemblyInfo.cs file.


Presuming our test assembly is DAL.Test.dll, we need to add the following assembly attribute (note that InternalsVisibleTo takes the friend assembly's name, without the .dll extension):

using System.Runtime.CompilerServices;
[assembly: InternalsVisibleTo("DAL.Test")]

With that setting applied, the UserProvider property becomes accessible in the test assembly but stays invisible to any other assembly. That's how the business logic of the UserManager.NumberOfUsersActiveInLast10Days method can be tested without any DB-related setup: we just set the internal UserProvider property of the UserManager class to some desired stubbed or mocked provider.

Doing that, we designed for testability (enabled stub/mock provider injection) and avoided exposing internals to "outer space".
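To make that concrete, here is a minimal sketch of such a test (mine, not from the sample solution; it assumes NUnit and a hand-written stub, and that the User constructor takes user id, user name and last activity date, as the provider code suggests):

using System;
using System.Collections.Generic;
using NUnit.Framework;

namespace DAL.Test
{
    [TestFixture]
    public class UserManagerTests
    {
        // Hand-written stub - no database involved
        private class StubUserProvider : IUserProvider
        {
            public IList<User> GetUserCollection(string userName)
            {
                IList<User> users = new List<User>();
                users.Add(new User("1", "Mike", DateTime.Now.AddDays(-1)));   // active yesterday
                users.Add(new User("2", "Mia", DateTime.Now.AddDays(-20)));   // inactive for 20 days
                return users;
            }
        }

        [Test]
        public void NumberOfUsersActiveInLast10Days_CountsOnlyRecentlyActiveUsers()
        {
            // internal setter, reachable from here thanks to InternalsVisibleTo
            UserManager.UserProvider = new StubUserProvider();

            Assert.AreEqual(1, UserManager.NumberOfUsersActiveInLast10Days("Mi"));
        }
    }
}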

What's wrong with this code?

Let's imagine that there is also a CompanyManager class with a GetActiveUsers method which calls UserManager.NumberOfUsersActiveInLast10Days for users whose names start with A, B and C and returns the list of the results. The CompanyManager class could in that case look like this

using System.Collections.Generic;

namespace DAL
{
    public static class CompanyManager
    {
        public static IList<int> GetActiveUsers()
        {
            IList<int> result = new List<int>();
            result.Add(UserManager.NumberOfUsersActiveInLast10Days("A"));
            result.Add(UserManager.NumberOfUsersActiveInLast10Days("B"));
            result.Add(UserManager.NumberOfUsersActiveInLast10Days("C"));
            return result;
        }
    }
}

With the current code design, since CompanyManager and UserManager live in the same DAL assembly whose internals are visible to DAL.Test.dll, a test of CompanyManager could try to use the fact that the internals of UserManager are accessible, so it could look something like this

        [Test]
        public void GetActiveUsers_TestCaseOfZeroUsers()
        {
            IUserProvider userProvider= mockRepository.DynamicMock<IUserProvider>();
            Expect.Call(userProvider.GetUserCollection(null))
                .IgnoreArguments()
                .Return(new List<User>()).Repeat.Any();

            mockRepository.ReplayAll();

            UserManager.UserProvider = userProvider;
            IList<int> results= CompanyManager.GetActiveUsers();
            Assert.AreEqual(3, results.Count);
            Assert.IsTrue(results[0] == 0 && results[1] == 0 && results[2] == 0);
        }

In line 4, we are (courtesy of RhinoMocks) creating a dummy user provider based on the IUserProvider interface (mockRepository here is a RhinoMocks MockRepository instance, created in the fixture's setup).

In lines 5-7, we are mocking the desired behavior of the user provider - to return an empty collection of users (more about mocking: here and here).

In line 11, we are setting UserManager.UserProvider to the mocked provider.

So, as we can see, it is possible to mock the user manager dependency and solve the problem, but IMHO it is an unacceptable solution, because to test the company manager business logic we are forced to know and rely on internal concepts of the user manager. Besides the fact that this doesn't look good even in our small example, in the real world there is usually more than one dependency, so to test the CompanyManager we would have to know and set up the internals of a lot of external components, which would make writing tests effectively mission impossible.

(to be continued)

Filed under: Uncategorized No Comments
1Dec/072

Designing for testability – a valid architecture choice? (Part 1)

A while back, a whole bunch of people I respect: Ayende, Roy Osherove, Udi Dahan and Eli Lopian, started a debate about whether designing for testability is overkill or a good practice.
(In case you really care about TDD, I'm sure you would enjoy reading some of these posts: Stop designing for testability, TypeMock is Freedom, Test Driven Design vs. YAGNI, Design and Testability - YAGNI, Design vs. Process, Tools vs. Design, Dependency Injection - Keep your privates to yourself, Testable Designs - Round 2: Tooling, Design Smells and Bad Analogies, The Production Value of Seams, Design & Testability – Sense & Sensibility)

Eli is the author of the TypeMock mocking framework, which is based on the profiler API; that enables TypeMock to mock anything without the need to design the code the way we usually see in TDD books and articles (design based on the service locator and dependency injection / interface style of programming).
The downside of that "design for testability" is that components expose internals which they wouldn't need to expose if there were no testing (constructors accepting interfaces etc.).
Eli's point is that we shouldn't design for testability when we have tools capable of mocking dependencies without any special design required.

To me, Eli's point shakes the foundations of the TDD philosophy, so I spent the last couple of days thinking about it and came to the conclusion that "design for testability" is still a valid architecture choice for me because:

  • TDD design enforces separation of concerns, which increases maintainability, reusability and coherence.
  • TDD design is loosely coupled design, which allows easier development and evolution of the system (change is certain for every system, application, framework...)
  • TDD design enforces test-first, which helps with better understanding of functional requirements and makes project management easier by providing realistic progress data

So there are a lot of upsides, but what about the downsides Eli points to? This series of blog posts will try to present a way to handle them in an easy and efficient way. I will start with code not designed for testability and redesign it in a series of steps, without exposing internals or changing how the methods are used. I hope that by doing so I will show that it is possible and easy to design for testability without negative side effects.

Warning: This will be a couple of long blog posts with a lot of code examples explaining the rationale behind each of the different phases in the (re)design for testability. I'll try to make them as short as possible, but the scrolling posts could look scary :)

Use case

Today's use case is based on a transaction script type of static DAL class which I'm sure we have all seen quite a few times.

For the sake of simplicity, our manager class has only one method, which loads from the aspnetdb database the users whose name starts with a given string and returns the number of those users that were active in the last 10 days.

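The User entity itself isn't shown in the post; a minimal sketch of it, assuming only the three members the samples actually use and the constructor order used by the data-reader code below, could be:

using System;

namespace DAL
{
    // Hypothetical sketch of the User entity used by the samples
    public class User
    {
        private readonly string _userId;
        private readonly string _userName;
        private readonly DateTime _lastActivity;

        public User(string userId, string userName, DateTime lastActivity)
        {
            _userId = userId;
            _userName = userName;
            _lastActivity = lastActivity;
        }

        public string UserId { get { return _userId; } }
        public string UserName { get { return _userName; } }
        public DateTime LastActivity { get { return _lastActivity; } }
    }
}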

That method is used from a console application which prints out the number of active users to the end user:

using DAL;

namespace Console
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            int activeUserCount=UserManager.NumberOfUsersActiveInLast10Days("M");
            System.Console.WriteLine(activeUserCount);
        }
    }
}

The task of today's exercise is to redesign the code so it is designed for testability, without exposing internals or complicating the way it is used.

Initial state

Let's imagine that the UserManager class initially looks like this

using System;
using System.Collections.Generic;
using System.Data.SqlClient;

namespace DAL
{
    public static class UserManager
    {
        private static readonly IDictionary<string, IList<User>> _userCache 
            = new Dictionary<string, IList<User>>();

        public static int NumberOfUsersActiveInLast10Days(string userName)
        {
            if (string.IsNullOrEmpty(userName))
                throw new ArgumentException("User id not sent");

            IList<User> userCollection;
            if (_userCache.ContainsKey(userName))
                userCollection = _userCache[userName];
            else
            {
                using (SqlConnection sqlConnection = new SqlConnection(
                    @"Server=.SQLEXPRESS;Database=aspnetdb;Trusted_Connection=yes;"))
                {
                    sqlConnection.Open();
                    const string query = 
                        @"Select * from User where UserName like @UserName + '%'";
                    using (SqlCommand sqlCommand 
                        = new SqlCommand(query, sqlConnection))
                    {
                        sqlCommand.Parameters.AddWithValue("@UserName", userName);
                        using (SqlDataReader sqlDataReader 
                            = sqlCommand.ExecuteReader())
                        {
                            if (!sqlDataReader.HasRows)
                                return 0;
                            userCollection = new List<User>();
                            while (sqlDataReader.Read())
                            {
                                userCollection.Add(
                                    new User(
                                        sqlDataReader["UserId"].ToString(),
                                        sqlDataReader["UserName"].ToString(),
                                        Convert.ToDateTime(
                                            sqlDataReader["LastActivityDate"]))
                                    );
                            }
                        }
                    }
                }
            }

            int result = 0;
            foreach (User user in userCollection)
            {
                if (user.LastActivity > DateTime.Now.AddDays(-10))
                {
                    result++;
                }
            }
            return result;
        }
    }
}

As we can see, this class does quite a lot of things. Key points:

In line 14, the method performs argument checking and throws an exception in case of a null or empty user name

In line 18, the method checks whether there is a cached instance of an already retrieved user collection satisfying the given criteria

In lines 22-51, the method contains standard ADO.NET code which reads data from the database and packs that data into a collection of User objects

In lines 54-62, the method iterates through the collection of user objects and, for each user active in the last 10 days, increments the result which is then returned (this represents the business logic)

What is wrong with this code?

This code has very low coherence and does a lot of different things: if we look at this initial solution we can see that it contains three major parts:

  1. Cross cutting concerns: argument validation and cache checking
  2. Database access related code
  3. Method "business logic" - counting active users

Mixing those 3 things in one method is:

  • decreasing maintainability
  • preventing potential reuse of the database code
    e.g. there could be a need for the same method without caching, for a web page calling it which would use an HttpContext based cache instead. With this design in place we would either need to add a bool cacheOn parameter here (and end up with spaghetti code) or write another method without caching (and end up with redundant pieces of code)

Redesign step 1 - UserProvider

The simplest solution is to follow the separation of concerns principle and create a separate internal provider class which gets all the ADO.NET code, while the "business logic" stays at the manager level.

So the new internal static provider class would look like this:


using System;
using System.Collections.Generic;
using System.Data.SqlClient;

namespace DAL
{
    internal static class UserProvider
    {
        public static IList<User> GetUserCollection(string userName)
        {
            IList<User> userCollection = new List<User>();

            using (SqlConnection sqlConnection = new SqlConnection(
                @"Server=.SQLEXPRESS;Database=aspnetdb;Trusted_Connection=yes;"))
            {
                sqlConnection.Open();
                const string query 
                    = @"Select * from User where UserName like '@UserName%'";
                using (SqlCommand sqlCommand 
                    = new SqlCommand(query, sqlConnection))
                {
                    sqlCommand.Parameters
                        .AddWithValue("@UserName", userName);
                    using (SqlDataReader sqlDataReader 
                        = sqlCommand.ExecuteReader())
                    {
                        if (sqlDataReader.HasRows)
                        {
                            while (sqlDataReader.Read())
                            {
                                userCollection.Add(
                                    new User(
                                        sqlDataReader["UserId"].ToString(),
                                        sqlDataReader["UserName"].ToString(),
                                        Convert.ToDateTime
                                            (sqlDataReader["LastActivityDate"])
                                        )
                                    );
                            }
                        }
                    }
                }
            }
            return userCollection;
        }
    }
}

As we can see, the provider method returns the collection of users whose name starts with the given string. That method can now also be called by some other method that takes a different approach to the cross-cutting concerns.

The manager class would now look like this

using System;
using System.Collections.Generic;

namespace DAL
{
    public static class UserManager
    {
        private static readonly IDictionary<string, IList<User>> 
            _userCache = new Dictionary<string, IList<User>>();

        public static int NumberOfUsersActiveInLast10Days(string userName)
        {
            if (string.IsNullOrEmpty(userName))
                throw new ArgumentException("User id not sent");

            IList<User> userCollection;
            if (_userCache.ContainsKey(userName))
                userCollection = _userCache[userName];
            else
            {
                userCollection = UserProvider.GetUserCollection(userName);
            }

            int result = 0;
            foreach (User user in userCollection)
            {
                if (user.LastActivity > DateTime.Now.AddDays(-10))
                {
                    result++;
                }
            }
            return result;
        }
    }
}

In line 21, UserManager is calling the internal UserProvider class and getting the collection of users, which is then processed in the same way as in the initial implementation.

From the point of view of how the console application uses the code, nothing changed in the discoverability of the DAL component, because IntelliSense does not show the provider class (it is an internal class).

What is wrong with this code?

If we look at the user manager code now, to test that only active users are counted we would need to insert into the database one user with an activity date within the last 10 days and one with an activity date older than 10 days, so we could check the result. The problem is that what we really want to test is the business logic (lines 21-27), not how the database code in the UserProvider class works, yet every test still drags the database along. That's why we have to redesign the code further to remove this obstacle.
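Just to illustrate that pain, here is a sketch of what such a test would have to look like right now (my sketch; NUnit assumed, and I'm assuming the User table contains just the three columns the provider code reads):

using System;
using System.Data.SqlClient;
using NUnit.Framework;

namespace DAL.Test
{
    [TestFixture]
    public class UserManagerIntegrationTests
    {
        private const string ConnectionString =
            @"Server=.\SQLEXPRESS;Database=aspnetdb;Trusted_Connection=yes;";

        // Helper that pushes a row into the User table the provider reads from
        private static void InsertUser(string id, string name, DateTime lastActivity)
        {
            using (SqlConnection connection = new SqlConnection(ConnectionString))
            using (SqlCommand command = new SqlCommand(
                "insert into [User] (UserId, UserName, LastActivityDate) " +
                "values (@UserId, @UserName, @LastActivityDate)", connection))
            {
                connection.Open();
                command.Parameters.AddWithValue("@UserId", id);
                command.Parameters.AddWithValue("@UserName", name);
                command.Parameters.AddWithValue("@LastActivityDate", lastActivity);
                command.ExecuteNonQuery();
            }
        }

        [Test]
        public void NumberOfUsersActiveInLast10Days_CountsOnlyRecentlyActiveUsers()
        {
            // The test cannot even start without a reachable aspnetdb instance
            InsertUser("1", "Mike", DateTime.Now.AddDays(-1));    // active within the last 10 days
            InsertUser("2", "Mia", DateTime.Now.AddDays(-20));    // inactive for 20 days

            Assert.AreEqual(1, UserManager.NumberOfUsersActiveInLast10Days("Mi"));

            // ...and we would still need cleanup (or a transaction rollback) to keep the
            // test repeatable - all of that just to exercise a simple counting loop.
        }
    }
}

Compare the amount of infrastructure here with how little of it actually exercises the business logic we care about.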

(To be continued)

 

Filed under: Uncategorized 2 Comments