I have a list that is cached. When I come to refresh the list I don't want to just throw it away and start again, so I use the following.
    private void UpdateChildren(IEnumerable<Guid> fromDb, List<Model> cached)
    {
        // Assume every cached item is missing until fromDb proves otherwise.
        List<Model> unseen = new List<Model>(cached);
        foreach (Guid childID in fromDb)
        {
            Model child = cached.Find(m => m.Id == childID);
            if (child == null)
                cached.Add(CreateModel(childID));
            else
                unseen.Remove(child); // seen it now
        }
        // Delete missing items
        foreach (Model child in unseen)
        {
            DisposeModel(child);
            cached.Remove(child);
        }
    }

    private void DisposeModel(Model child)
    {
        // Maybe do nothing
    }

    private Model CreateModel(Guid child)
    {
        return new Model(child);
    }
This works fine but it isn't very generic. Let's make some changes.
Creating a class
Currently this isn't really a class; it could be expressed as three static methods. By including the cached list we can start to provide a reusable class with state.
    class CachedCollection
    {
        private List<Model> _cached = new List<Model>();

        private void UpdateChildren(IEnumerable<Guid> fromDb)
        {
            List<Model> unseen = new List<Model>(_cached);
            foreach (Guid childID in fromDb)
            ...
Mapping from Guid to Model
Currently we know how to compare a Guid and a Model, since the model has a property that exposes that Guid again. We cannot be certain that this is the case every time, so we need to go for a more generic pairing. In this case I suggest moving from a List<Model> to a Dictionary<Guid, Model>.
    private Dictionary<Guid, Model> _known = new Dictionary<Guid, Model>();

    public void UpdateItems(IEnumerable<Guid> iEnumerable)
    {
        ...
        foreach (Guid item in iEnumerable)
        {
            if (_known.ContainsKey(item))
            ...
Now our paired relationship is completely external to the classes themselves. This is a common feature of code that has been made reusable: the method of relating things is suboptimal, in this case with a memory overhead. The dictionary's hashed key lookup will provide a performance boost, whether that is required for this application or not.
Applying Generic types
Not much makes code more generic than applying generic types. Here we can refactor away the fixed types, but doing so is full of issues. Let's consider converting the Guid to TActual and the Model to TDesired.
    private Model CreateModel(Guid child)
    {
        return new Model(child);
    }
If we look at the CreateModel method we note that it currently creates a new TDesired from a TActual. We can add a new constraint to the class definition, but I don't know of a way to specify that a type has a particular constructor that takes a single argument (the new() constraint only covers a parameterless constructor).
    // Just gets me a ') expected' error.
    class CachedCollection<TActual, TDesired> where TDesired : new(TActual)
I can of course make my class abstract and leave it to the derivation to handle, but that means I need a derived class.
    abstract class CachedCollection<TActual, TDesired>
    {
        private List<TDesired> _cached = new List<TDesired>();

        private void UpdateChildren(IEnumerable<TActual> fromDb)
        {
            List<TDesired> unseen = new List<TDesired>(_cached);
            foreach (TActual childID in fromDb)
            {
                int index = _cached.FindIndex(c => Equals(GetId(c), childID));
                if (index < 0)
                    _cached.Add(CreateModel(childID));
                else
                    unseen.Remove(_cached[index]); // seen it now
            }
            // Delete missing items
            foreach (TDesired child in unseen)
            {
                DisposeModel(child);
                _cached.Remove(child);
            }
        }

        // TDesired has no Id we can rely on, so the derived class supplies the mapping.
        protected abstract TActual GetId(TDesired child);
        protected abstract void DisposeModel(TDesired child);
        protected abstract TDesired CreateModel(TActual child);
    }
Of course when we consider using abstract and virtual methods, we start to consider functions as pieces of code that can be referred to. Before DotNet 2.0 the only way to do this was to use delegates. Where these functions are optional you may see them exposed as events. These days we have some alternatives.
Generic delegates
There are three great generic delegate types available since DotNet 2.0:
    // http://msdn.microsoft.com/en-us/library/kt456a2y.aspx
    public delegate TOutput Converter<TInput, TOutput>(TInput input)

    // http://msdn.microsoft.com/en-us/library/system.action.aspx
    public delegate void Action<T>(T input)

    // http://msdn.microsoft.com/en-us/library/bfcke1bz.aspx
    public delegate bool Predicate<T>(T input)
These are great building blocks and along with 3.5's lambda syntax let us provide functions almost as simply as by deriving a class. For example we can convert
    protected abstract TDesired CreateModel(TActual child);

    public void Update()
    {
        ...
        x = CreateModel(...);
        ...
    }
Into
    private Converter<TActualItem, TAlternateItem> _createAlternateItem =
        new Converter<TActualItem, TAlternateItem>(
            (TActualItem actual) => { throw new NotImplementedException(); });

    public void Update()
    {
        ...
        x = _createAlternateItem(...);
        ...
    }
Conclusion
For me, this is very powerful. It means we can take our original example and produce the following non-abstract class.
    class CollectionSync<TActualItem, TAlternateItem>
    {
        public CollectionSync(Converter<TActualItem, TAlternateItem> createAlternateItem)
        {
            _createAlternateItem = createAlternateItem;
        }

        public CollectionSync(Converter<TActualItem, TAlternateItem> createAlternateItem,
                              IEnumerable<TActualItem> list)
            : this(createAlternateItem)
        {
            foreach (TActualItem actual in list)
            {
                _known.Add(actual, _createAlternateItem(actual));
            }
        }

        private Dictionary<TActualItem, TAlternateItem> _known =
            new Dictionary<TActualItem, TAlternateItem>();
        private Converter<TActualItem, TAlternateItem> _createAlternateItem = null;
        private Action<TAlternateItem> _disposeAlternate =
            new Action<TAlternateItem>((TAlternateItem alternate) => { });
        private Action<KeyValuePair<TActualItem, TAlternateItem>> _udateSingleAlternate =
            ((KeyValuePair<TActualItem, TAlternateItem> kvp) => { });

        public void UpdateItems(IEnumerable<TActualItem> iEnumerable)
        {
            List<TActualItem> unseen = new List<TActualItem>(_known.Keys);
            foreach (TActualItem item in iEnumerable)
            {
                if (_known.ContainsKey(item))
                {
                    _udateSingleAlternate(
                        new KeyValuePair<TActualItem, TAlternateItem>(item, _known[item]));
                    unseen.Remove(item);
                }
                else
                {
                    TAlternateItem newLVI = _createAlternateItem(item);
                    _known[item] = newLVI;
                }
            }
            // Delete missing items
            foreach (TActualItem item in unseen)
            {
                _disposeAlternate(_known[item]);
                _known.Remove(item);
            }
        }
    }
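To see the delegate-based approach in use, here is a minimal sketch of a cut-down sync class and two refreshes against it. The names MiniSync and MiniSyncDemo are hypothetical and exist only for this illustration; the dispose delegate is passed through the constructor here, which is an assumption rather than something the class above does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, cut-down variant of the sync class, for illustration only.
public class MiniSync<TActual, TAlternate>
{
    private readonly Dictionary<TActual, TAlternate> _known =
        new Dictionary<TActual, TAlternate>();
    private readonly Converter<TActual, TAlternate> _create;
    private readonly Action<TAlternate> _dispose;

    public MiniSync(Converter<TActual, TAlternate> create, Action<TAlternate> dispose)
    {
        _create = create;
        _dispose = dispose;
    }

    public void UpdateItems(IEnumerable<TActual> current)
    {
        // Assume every known key is missing until the new list proves otherwise.
        var unseen = new HashSet<TActual>(_known.Keys);
        foreach (TActual item in current)
        {
            // Remove returns true when the key was already known.
            if (!unseen.Remove(item) && !_known.ContainsKey(item))
                _known[item] = _create(item); // new key: build its partner
        }
        // Whatever is still unseen has disappeared from the source.
        foreach (TActual item in unseen)
        {
            _dispose(_known[item]);
            _known.Remove(item);
        }
    }
}

public static class MiniSyncDemo
{
    // Returns a log of the create/dispose calls made across two refreshes.
    public static List<string> Run()
    {
        var log = new List<string>();
        var sync = new MiniSync<int, string>(
            id => { log.Add("create " + id); return "item" + id; },
            alt => log.Add("dispose " + alt));
        sync.UpdateItems(new[] { 1, 2, 3 });
        sync.UpdateItems(new[] { 2, 3, 4 });
        return log;
    }
}
```

The second refresh only creates the partner for 4 and disposes the partner for 1; the items for 2 and 3 survive untouched, which is the whole point of not throwing the cache away.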
Everything always starts easy. You might only have a few hundred objects in your collection that you need to process. It's easy because you can do everything in one process, one thread, and your results are always returned quickly.
A while later in your life cycle things get more complex and we start to see performance issues. Some architectures have their own ways of dealing with this to a degree.
Web Farms
Multi-threading
Grid computing
However one thing that is more difficult is when a single request needs to work through a lot of data to achieve a simple result. For example, a web farm is a great means of allowing a million users to query small pieces of data. The first problem that arrives is when one user wants a million pieces of data.
Accessing data
Databases are very good at this, and I would be surprised if you or I could produce something general purpose that could beat their retrieval strategies. They even provide caching capabilities, so if a million users all ask for the same million rows, the database will first take a while to produce that result but can then re-send it to the remaining 999,999 users.
However the problem comes as either the data or the query starts to change. Suddenly our percentage of cache misses starts to increase. At this point we actually start to ask ourselves the question that we should have asked a long time ago. Why do I need to gather a million rows?
Client caching
A performance improvement is to not store the million rows in a centralised database and keep querying it, but instead to only reload data that has changed. This leads to update schemes where updates to the centralised data need to be pushed to the clients, although it is also possible but less desirable for all clients to poll in frequently.
One idea that has come along recently is the concept of a shared cache. Products such as Microsoft's Velocity now provide a means for the in-memory cache to be shared across many machines. This is obviously targeted at applications where communication between application servers in a farm should be quicker than access to the central database farm. It would also be interesting to see if this works in the client application too. Providing peer-to-peer style updates could certainly be quicker than retrieving data back from a centralised server farm, particularly if the clients are at a client's (customer's) site (i.e. client -> server, across a WAN; client - client, across a LAN).
Performing operations
Once you have got your data the next problem occurs with using it. Performing a calculation on 100 numbers is quick; on a million it is relatively slow. This is commonly referred to as the Map-Reduce problem, and there are quite a few ways to handle it, although the solutions can vary depending on what you want to do with your calculation.
Simple Map operations
If you need to perform a simple operation on each piece of data, so that each element of a list of n elements maps to a single result and you finish with n results, this is known as mapping. In this case there is no interaction between the elements of data and the operations can be performed in parallel. If you run this within a single process, then I would recommend investigating the parallel library, currently expected to become part of core DotNet in v4.0.
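As a rough illustration of an in-process parallel map, the sketch below assumes the parallel library in question is the one that shipped in .NET 4.0 as System.Threading.Tasks.Parallel. The ParallelMap class and its Map method are hypothetical names for this example.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelMap
{
    // Applies f to every element independently; n inputs give n results.
    public static TOut[] Map<TIn, TOut>(TIn[] input, Func<TIn, TOut> f)
    {
        var result = new TOut[input.Length];
        // Each iteration writes only to its own slot, so no locking is needed.
        Parallel.For(0, input.Length, i => { result[i] = f(input[i]); });
        return result;
    }
}
```

This only works safely because the mapping function does not share state between elements, which is exactly the property that makes a map easy to parallelise.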
The problem with single processes is that they are still limited to a single machine. If you can provide a means of distributing your operation to a set of machines, then you should find that the performance improves, but unfortunately not linearly. Your problem is the overheads: the more you need to start processes, allocate memory or pre-calculate a value, the longer the process will take. I have come to know this as the 'Granularity problem'.
The smaller the granule of work, the more overheads will be incurred in processing the entire list.
This effect can be exaggerated depending on the architecture you use to run the processing operations. For example, if you have a pre-configured number of processors that already exist in the memory of the machines that will do the processing, then you will remove the overhead of process startup. They will simply need to gather the process operation package, perform it and return the result. Conversely a more dynamic system that loads up each package and then loads in the processor to complete the operation will have a much greater overhead. As a final alternative, a system that loads a package, then loads up the processor, and when complete checks for other packages that can be handled in its current state sounds a great compromise, unless you have prioritisation schemes that need to be considered as well.
Reduce
If you simply need to add up a list of numbers or perform some operation that reduces a list into a single value, then this is known as a 'reduce'. Here there are some improvements that can be made if the operation can be applied iteratively. For example, a list of a million numbers will add up to the same total whether you add them up one by one, or break them into chunks of 1000, add each chunk of 1000 values up to a total, and then add the 1000 totals to get a grand total.
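The chunked addition described above can be sketched as follows. ChunkedReduce and its method names are hypothetical; the chunks are processed sequentially here, but each partial total is independent and could be handed to a different thread or machine.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkedReduce
{
    // Adds the values one by one.
    public static long SumDirect(int[] values)
    {
        long total = 0;
        foreach (int v in values) total += v;
        return total;
    }

    // Adds each chunk up to a partial total, then adds the partial totals.
    public static long SumInChunks(int[] values, int chunkSize)
    {
        var partials = new List<long>();
        for (int start = 0; start < values.Length; start += chunkSize)
        {
            int end = Math.Min(start + chunkSize, values.Length);
            long partial = 0;
            for (int i = start; i < end; i++) partial += values[i];
            partials.Add(partial);
        }
        long total = 0;
        foreach (long p in partials) total += p;
        return total;
    }
}
```

The two totals agree because integer addition is associative. With floating point values the results can differ slightly, since the order of additions changes the rounding.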
Recently I looked at MVC and as an aside I put together a small framework to handle MVC. It was originally just a case in point, but was so useful in simplifying my code that I've popped it into a library and now reused it in my latest project. However I've just had an interesting experience as I've tried to convert a classic prototypical WinForms application (think one class, lots of code in event handlers) and turn it into something more maintainable. First however I need to explain.
MVC mark 3
Originally I wanted to develop this as a PassiveView, but when I got to the point of wanting to develop ControllerAdapters for each View, I decided to change to something more akin to SupervisingController. This had a very positive impact on the code.
First some rules, or at least some strong types.
1. What is a model? Absolutely anything.
    public class Model {}
2. What is a View? Well it's something that wants to be notified when a Model changes, or wants to make a change to the model.
    public interface IView<M> : INotifyPropertyChanged
    {
        IEnumerable<string> PropertiesViewed { get; }
        void UpdateViewProperty(string propertyName, M model);
        void UpdateModelProperty(string propertyName, M model);
    }
3. So a controller must be
    public abstract class Controller<M> where M : INotifyPropertyChanged
    {
        public Controller(M model) {}
        private M _model;
        protected M Model;
        private Observers<IView<M>> _observers = new Observers<IView<M>>();

        private void RegisterView(IView<M> view, string property)
        public void RegisterView(IView<M> view)
        public void UnregisterView(IView<M> view)
        protected void View_PropertyChanged(object sender, PropertyChangedEventArgs e)
        protected void Model_PropertyChanged(object sender, PropertyChangedEventArgs e)
        private void UpdateObservers(string propertyName)
        private void UpdateModelFromView(IView<M> view, string propertyName)
        private void UpdateViewFromModel(IView<M> view, string propertyName)
    }
What we then need to implement
Just enough to link it all together. When a view changes, the model gets updated, and all the views interested in the property that changed on the model get an update too.
I've implemented my Views as adapters between control and controller, with lots of functionality in the hierarchy. As a result, a specific view type needs little code, e.g. this view maps a read-only label to its model.
    class LastReadView : SingleControlSinglePropertyView<DownloaderModel, Label>
    {
        public LastReadView(GlassLabel control)
            : base("LastReadDate", control)
        {
        }

        public override void UpdateViewProperty(string propertyName, DownloaderModel model)
        {
            Control.Text = model.LastReadDate.ToString();
        }

        public override void UpdateModelProperty(string propertyName, DownloaderModel model)
        {
            throw new NotImplementedException();
        }
    }
It really is that simple.
Models and data longevity
The complexity comes when we look at our models. For my example, consider an application that maintains a list of URLs. That is our Data model.
However we also will be refreshing and analysing the resources attached to the other end of those URLs. That is a Progress model. Our list of URLs we would want to store between runs of our application. Our current state is transient data, we have no need to store it.
The question now is how to implement this, do we make the Progress state properties of the Data controller?
Or is it instead that the processing state is stored externally from the data controller, making the Data Controller a view on the Progress Model?
P.S. There are bonus points for anybody who can read my handwriting.
At this point you might be asking why bother?
In a single threaded environment you are right, YAGNI, however we end up back with a single object and a blocking UI.
As the number of threads increases then so does the complexity and need for synchronization. Our advantages come from,
the tight cohesion between model and controller, making it easier to develop efficient and even lock free synchronisation strategies, that views can easily participate in without needing to be tightly coupled themselves.
and the granularity of each synchronisation strategy, since each model can easily implement its own locking.
And if all this seems a bit much just to download some URLs, don't forget it can be applied to any long running operation, including financial calculations.
I've recently been looking into MVC patterns, and I'm quite surprised by their variety. The big issue I had missed before was that SmallTalk didn't follow the now classic Forms and Controls model. In its case there were objects that displayed UI and objects that handled input, or views and controllers.
Martin Fowler has of course done a great job of differentiating these GUI architectures into several patterns, including SupervisingController and PassiveView. In addition we can consider Classic MVC, Application Model MVC and at least two forms of MVP. Aviad has done a good job of describing all these so I won't do him the disservice of repeating his work.
What strikes me as interesting is that the most favoured pattern here seems to be PassiveView. This pattern comes up again and again and again.
Granularity
One thing I don't see being mentioned is the granularity of a model-view update. Should an update occur on only a single property, or when an entire set of changes is propagated? Consider the following models.
Fine updates
Word is a great example of MVC in action. Here I have highlighted the Font characteristics for the selected text being displayed in three views. I can change the properties in any number of ways and they will always show the correct information in all three locations. In this case changing the values in the toolbar results in an immediate change to the other views of those values.
Coarse updates
The alternative to updating with every property is to update en masse, and in fact we see dialogs where this appears to happen already. Here we see a Cancel button. You can make as many changes to the settings in this dialog as you like, but none of them will get propagated back to the main model unless you press OK.
Forms and controls
As already pointed out, MVC was originally developed for SmallTalk, and there are pitfalls when you migrate to a standard WinForms world with Controls.
Cascading updates
Sample code: Demo1
Consider this user interface. It has two views of the same data. Change the data in one view and the 2nd should get updated. It works as you expect by using an event on the control to trigger this process off.
Event triggered in Form, Form tells view that a change has occurred
View tells controller that a change has occurred
View records change in Model then tells each view to update its model
View updates from Model and updates the control
Unfortunately what happens is that the other view then triggers a change event as well, so we end up with the model being updated again. You can see this if you run the attached project in the debugger and watch the Output Window.
    model updated by View2
    model updated by View1
Fortunately the update is stopped from cascading indefinitely by the fact that the data doesn't really change in the control; it is just set to the same value again, so no xxxxChanged event is fired.
Missing updates
This problem gets worse with larger granularity. Given what we have seen above, consider where we have multiple views and a model with multiple properties.
The controller updates all views and sets the first property of a view, causing a changed event to occur
The control event triggers a view update, the view calls the controller
The controller updates the model, which overwrites the 2nd property
The controller updates all views
The controller sets the first property on the views, but since this value has already been passed to the view, it doesn't trigger an event
Updates complete, there are no more changes that trigger events
The 2nd property can now be updated in all views, but it will not be the value that we expected
To avoid these issues
Since all UI updates have to be done in a single thread, if we use a BeginUpdate(), EndUpdate() pair in the View that wraps the update of the control's property, we can discard the changed event from the control.
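A minimal sketch of the BeginUpdate()/EndUpdate() guard might look like the following. FakeTextBox is a hypothetical stand-in for a WinForms control so that the example runs without a UI, and GuardedView is likewise an illustrative name rather than a class from the sample project.

```csharp
using System;

// Hypothetical stand-in for a control with an xxxxChanged-style event.
public class FakeTextBox
{
    private string _text = "";
    public event EventHandler TextChanged;

    public string Text
    {
        get { return _text; }
        set
        {
            if (_text == value) return; // no event when the value is unchanged
            _text = value;
            if (TextChanged != null) TextChanged(this, EventArgs.Empty);
        }
    }
}

public class GuardedView
{
    private readonly FakeTextBox _control;
    private int _updateDepth;

    // Counts the changes that would be forwarded to the controller.
    public int ChangesReportedToController { get; private set; }

    public GuardedView(FakeTextBox control)
    {
        _control = control;
        _control.TextChanged += OnControlChanged;
    }

    public void BeginUpdate() { _updateDepth++; }
    public void EndUpdate() { _updateDepth--; }

    // Called by the controller to push a model value into the control.
    public void UpdateFromModel(string value)
    {
        BeginUpdate();
        try { _control.Text = value; }
        finally { EndUpdate(); }
    }

    private void OnControlChanged(object sender, EventArgs e)
    {
        if (_updateDepth > 0) return;  // change came from the controller: discard
        ChangesReportedToController++; // a real view would call the controller here
    }
}
```

A plain counter is enough here only because, as noted above, all UI updates happen on a single thread.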
Dynamic Granularity
Looking at WPF we can see that it uses the INotifyPropertyChanged interface to handle its updates, and that basically relies on PropertyChangedEventArgs with its single piece of data,
    namespace System.ComponentModel
    {
        // Summary:
        //     Provides data for the System.ComponentModel.INotifyPropertyChanged.PropertyChanged
        //     event.
        public class PropertyChangedEventArgs : EventArgs
        {
            // Summary:
            //     Gets the name of the property that changed.
            //
            // Returns:
            //     The name of the property that changed.
            public virtual string PropertyName { get; }
        }
    }
If we modify the controller-view registration process we can ask each view to register only for the properties of the model that it is interested in receiving updates for, and therefore update when only a subscribed property changes. This can significantly reduce the amount of updates that are propagated. In the following example only 2 of the 5 views will be notified when the bold state changes.
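Per-property registration can be sketched as a lookup from property name to interested views. The types below (IPropertyView, CountingView, PropertyRouter) are hypothetical simplifications for this example and are not the interfaces of the framework itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified view contract: a view lists the properties it cares about.
public interface IPropertyView
{
    IEnumerable<string> PropertiesViewed { get; }
    void UpdateViewProperty(string propertyName);
}

public class CountingView : IPropertyView
{
    private readonly string[] _properties;
    public int Updates { get; private set; }

    public CountingView(params string[] properties) { _properties = properties; }

    public IEnumerable<string> PropertiesViewed { get { return _properties; } }
    public void UpdateViewProperty(string propertyName) { Updates++; }
}

public class PropertyRouter
{
    private readonly Dictionary<string, List<IPropertyView>> _byProperty =
        new Dictionary<string, List<IPropertyView>>();

    // Registers the view once for each property it declares an interest in.
    public void RegisterView(IPropertyView view)
    {
        foreach (string property in view.PropertiesViewed)
        {
            List<IPropertyView> views;
            if (!_byProperty.TryGetValue(property, out views))
            {
                views = new List<IPropertyView>();
                _byProperty[property] = views;
            }
            views.Add(view);
        }
    }

    // Notifies only the subscribed views and returns how many were notified.
    public int NotifyPropertyChanged(string propertyName)
    {
        List<IPropertyView> views;
        if (!_byProperty.TryGetValue(propertyName, out views)) return 0;
        foreach (IPropertyView view in views) view.UpdateViewProperty(propertyName);
        return views.Count;
    }
}
```

With five views registered and only two of them listing "Bold", a change to "Bold" touches just those two.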
Links
http://aviadezra.blogspot.com/2007/07/twisting-mvp-triad-say-hello-to-mvpc.html
http://ssoj.wordpress.com/2008/03/24/some-thoughts-on-model-view-presenter/
http://ctrl-shift-b.blogspot.com/2007/08/interactive-application-architecture.html
Martin Fowler
http://www.martinfowler.com/eaaDev/uiArchs.html
http://martinfowler.com/eaaDev/PassiveScreen.html
http://martinfowler.com/eaaDev/SupervisingPresenter.html
ASP.Net MVC
http://quickstarts.asp.net/3-5-extensions/mvc/default.aspx
http://dotnetslackers.com/articles/aspnet/AnArchitecturalViewOfTheASPNETMVCFramework.aspx
http://haacked.com/...ModelViewPresenterFromSchematicToUnitTestsToCode.aspx
In contrast to what you've just read in the title, I actually really like Test First Design (TFD) and try to embrace it at every opportunity. Unfortunately I am about to pick one of the worst examples I possibly could to illustrate using it.
Let's quickly review the functionality we are trying to implement from FtpTask not quite right.
FTP Upload, Download and Delete
Runs as MsBuild task
WinForms GUI to create and Test MsBuild Task
Console App with command line arguments
I want to deal with Unit Testing a UI later, and for now let's discard the MSBuild Task and console applications as well, as they are only frameworks that call the functionality. That leaves the core FTP functionality, the SiteFTPEngine class, which is responsible for the upload or download of an entire site by gathering a list of files from either the file system or the FTP Server, and then transferring them to the opposite location.
Unit Testing should be isolatable
As we consider our Unit Testing we have to bear in mind that the unit tests will be run on multiple machines, at least on the developer's machine and the build server. Unfortunately we cannot guarantee that an FTP server or even a file system will be available (and writeable without admin rights) at all times, and so the best thing to do is to mock them out. That leaves us with an interesting decision of where to draw the line on the mock.
So what do we test
We can mock at too low a level. I can implement a mock at the FTPWebRequest level. This would require a lot of time and investigation into RFCs. Currently I know that 200 is the status code for a success, but how many other statuses are there?
Conversely if I mock at too high a level, I miss testing any functionality at all. Consider the functionality in an Upload.
Get a list of files from the local machine and transfer each one to the FTP server.
If I mock out the file system and the FTP server, that leaves me with
Get a list of files from a mock and transfer each one to another mock
But I know nothing about FTP...
Here's a very radical point. I believe that just because not everybody can run your unit tests does not mean that you shouldn't create them.
The ultimate benefits in Unit Testing come from the tests being repeatable and ultimately being used in continuous integration builds, such as via a cruise control server.
The benefits in Test First Design come from writing tests before writing functionality, and using those tests to prove that functionality.
I think this really bad example is actually a fantastic edge case. According to the common rules of Unit Testing there is very little I would attempt to write a test for. However I can write and test lots of functionality by writing tests that may never be run anywhere except on my development box.
Developer Tests
I personally have never used the FTPWebRequest object in anger before. I didn't, for example, have any idea that each server type can implement its own format for the ls command.
When I tackled this piece of work, I started by setting up a local FTP server on my machine. Now I could guarantee that I had an FTP server and a file system. This meant I could use TestDriven.Net to run a single test in the debugger and check exactly what format the IIS 6.0 FTP server produced directory listings in. Later I could perform the same test against a Server 2003 box and ultimately against a Server 2008 machine. I just know there's no point in trying the same on my cruise control machine.
Conclusion
You add value by providing test harnesses that can be used to test functionality. nUnit/mbUnit is one of the best harnesses around due to its simple nature for writing those tests. However those same test runners encourage all tests to be run all the time.
Sometimes you can get really useful development assistance by writing a test that isn't going to be run by the test runner, particularly by a continuous integration server. nUnit/mbUnit even give you the chance to
I think the ideas I raised in Never add just one project made so much sense that I've decided to show what I meant. So sticking with the previous example, let's develop an FTP task for MSBuild.
When implementing some new functionality, certainly my own best practice is to never add just one project when I start a new piece of work.
Why it's better to be lazy
by Jeremy Meyer
June 8, 2008
Fantastic article which made me think of my biggest mistake to date. According to fowler I didn't enable, I directed. I was being Actively Intelligent to direct an individual who I believed would be Lazy Stupid.
I hope I don't make those mistakes anymore.
My latest work still has interfaces that decouple the implementations, but now they are used to enable multiple strategies and in some cases dynamically enable new implementations with minimal testing footprints, not to hide the complexity away from those who have more to learn. I am proud of that, but I think I still have more to learn.
Developers often get accused of 'Gold plating'. They take a requirement and don't deliver on time because they are too busy adding in the additional functionality that they think they need to deliver a single requirement. It's a criticism I often have of myself.
Sometimes however that plating is there for a reason, usually because you've worked on a project before that didn't have it and felt the pain afterwards. The additional functionality not in the spec is needed, so let's refer to it as 'Nickel plating'.
For example, I've recently been working on implementing the ability to use a standard Report Definition Language in order to describe reports. We already have an internal format, which is spread over quite a few tables and loaded into a two tier object hierarchy, and we also have the same object hierarchy xmlSerialised. In this case my requirement was to patch RDL definitions into our reporting process.
Step 1 : Test driven conversion of existing object to support many sections in a report
I would like to thank Tom for his fantastic suggestion on this piece of work. I took one class, and moved it into three. I created a superclass 'MultiSectionReport' to support the 'many' relationship, with many instances of a new 'ReportSection' class. Most of the functionality moved out of the 'Report' to either the superclass or the many class. The functionality left in the Report was basically a means of pretending there was only one section in a MultiSectionReport.
All existing code touched only the original class which appeared unchanged in functionality and interface. Rework occurred until all existing unit tests passed.
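The shape of that split can be sketched roughly as below. Only the three class names come from the description above; the Title and Sections members are assumptions invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

// The 'many' class. The Title property is a hypothetical example member.
public class ReportSection
{
    public string Title { get; set; }
}

// The superclass that owns the 'many' relationship.
public class MultiSectionReport
{
    private readonly List<ReportSection> _sections = new List<ReportSection>();
    public IList<ReportSection> Sections { get { return _sections; } }
}

// The original class: existing callers still see a report with a single section.
public class Report : MultiSectionReport
{
    public Report()
    {
        Sections.Add(new ReportSection());
    }

    // Old single-section members simply forward to the first section.
    public string Title
    {
        get { return Sections[0].Title; }
        set { Sections[0].Title = value; }
    }
}
```

Because the old members forward to the first section, code written against Report keeps compiling and the existing unit tests remain a valid safety net during the rework.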
Step 2: TDD RDL as XML into C# objects
First I created a few unit tests to go from the RDL Xml into an RDL object model.
Why introduce an interim format, you've just doubled your work?
A little while ago I worked on a system at a previous employer that had to interface with another system. The format for data exchange was Xml and they controlled the Schema. I chose to implement an object hierarchy that matched their schema, and from there I would map to our object model. When the schema changed our interim object model changed, but that was contained in one dll. It was a simple release.
Another piece of work evolved where an Xml Schema was used to map from the front end GUI to the database. Every time a new control was needed on the front end, a new column was added to the database. This wasn't a simple release.
In my opinion the interim format is essential.
So the interim format is definitely out of scope, but in my experience it's required. This has to be 'Nickel plating'.
Step 3: TDD C# RDL.Report into RDL XML
At this point I had a set of secondary objectives
A requirement to convert some existing reports into RDL reports
A future need to edit the RDL and store it back in XML.
Now at this point I had to admit that my work was starting to take an uncomfortably long time with the project schedules around me.
Step 4: TDD C#RDL.Report into C# existing report model
This piece of work caused considerable frustration. My biggest problem was that my earlier work for step 1 required further work to support other features that we needed in the RDL.
Step 5: Integration
Later came the truth points. First we integrated the new reports in web mode. The full web UI was not available, but with some breakpoints and calling of the appropriate methods, Hiren simulated a tabbed environment. It Just Worked. The completely rebuilt reporting mechanism worked for legacy and new reports.
Then came the questions,
"That's great, but I want to export the RDL."
I referred to a Unit Test which did just that. This meant that Step 3 was justified.
XmlDocument doc = RDLParser.WriteRDL(rdlReport);
Then we discussed some features that aren't supported in RDL e.g. Formatting #,##0
Today my employer arranged for Ivar Jacobson to come in and give us a short talk. He was a great speaker and very entertaining. He provided some very sensible advice regarding processes, which was that most processes start as something good but then grow to incorporate all the good ideas that come out afterwards. So basically his EssUP (Essential Unified Process) is about concentrating less on the whole process and more on the good practices that it contains. In fact it enables you to drive the practices you want: just get out there and start doing some of them now.
For me personally I can relate to this with the introduction of Unit Testing for my team's code. We did it, and proved it worked. However our adoption rate outside of our team is low (and possibly only high in my team because I keep pushing for more tests all the time). Oh well, better get the CruiseControl.Net coverage reporting working.