Dan Maharry

Digging through the serializer: Migration Woes Part 4

After posting my problems with the XML serializer yesterday, Scott Hanselman blogged a possible solution for the bug and a further explanation of what to look for in the serializer. Now I'm not one to quibble over whether the .NET 2.0 serializer / xsd.exe should be intelligent enough to figure out that the generic case for serialization should appear last in a series of if statements, or whether I've been riding my luck with the .NET 1.1 serializer for the last sixteen months, and thus whether or not it counts as a bug. I was happy enough to accept the solution if it did the trick. To recall it, given the following schema declaration

<element name="epp" type="epp:eppType" />
<complexType name="eppType">
  <choice>
    <element name="hello" />
    <element name="greeting" type="epp:greetingType" />
  </choice>
</complexType>

running xsd.exe produces the following class.

[System.Xml.Serialization.XmlTypeAttribute(
    Namespace = "urn:ietf:params:xml:ns:epp-1.0", TypeName = "eppType")]
[System.Xml.Serialization.XmlRootAttribute(
    "epp", Namespace = "urn:ietf:params:xml:ns:epp-1.0", IsNullable = false)]
public class EppType
{
    private object item;

    // The <choice> between <hello> and <greeting> is held in a single member.
    [System.Xml.Serialization.XmlElementAttribute("hello", typeof(object))]
    [System.Xml.Serialization.XmlElementAttribute("greeting", typeof(GreetingType))]
    public object Item
    {
        get { return this.item; }
        set { this.item = value; }
    }
}

The issue is that the two XmlElementAttributes should be switched around like so

[System.Xml.Serialization.XmlElementAttribute("greeting", typeof(GreetingType))]
[System.Xml.Serialization.XmlElementAttribute("hello", typeof(object))]
public object Item

to produce the correct decision semantics in the runtime generated serializer code for the class. Sure enough (after misreading his post once and going nowhere in a fit of hopeful excitement) Scott's suggestion worked on my reduced demo code. Time to apply it to the actual solution code and hope it continued to work. I decided first to reflect the genuine EPP Core Schema (RFC3730) in the demo code. The full schema actually contains five possible choices for an <epp> element...

<element name="epp" type="epp:eppType" />
<complexType name="eppType">
  <choice>
    <element name="greeting" type="epp:greetingType" />
    <element name="hello" />
    <element name="command" type="epp:commandType" />
    <element name="response" type="epp:responseType" />
    <element name="extension" type="epp:extAnyType" />
  </choice>
</complexType>

... but the principle should be the same. If I switch the XmlElementAttribute for hello to the bottom of the stack like so, the serializer's if statements should follow the attribute stack and the generic object comparison should go last.

[System.Xml.Serialization.XmlElementAttribute("greeting", typeof(GreetingType))]
[System.Xml.Serialization.XmlElementAttribute("command", typeof(EppCommandType))]
[System.Xml.Serialization.XmlElementAttribute("response", typeof(ResponseType))]
[System.Xml.Serialization.XmlElementAttribute("extension", typeof(ExtAnyType))]
[System.Xml.Serialization.XmlElementAttribute("hello", typeof(object))]
public object Item
{
    get { return this.item; }
    set { this.item = value; }
}

Unfortunately not. OK, time to roll out Scott's way to debug the XML Serializer against the two debug builds with the hello attribute at the top of the stack and at the bottom of the stack. I end up getting the below comparison. The working code generated with hello on top is on the right and the failing code for the hello on the bottom of the stack is on the left.

[Figure: WinMerge comparison of the two generated serializer sources]

What’s immediately apparent is that neither set of if statements here matches the order in which the stack is written in the source code and, ironically, the if statement for hello is at the bottom of the stack in the right-hand pane, where the XmlElementAttribute for hello was on the top of the stack in the original code. To see if this was an arbitrary placing of the if statements in the serializer DLL, I moved the hello attribute up the stack one place at a time, rebuilt and viewed the temp DLL code each time. The order of the ifs never matched the attribute stack. More annoyingly, I didn't (through luck or otherwise) get the if statement for hello to appear as the last of the if statements in the DLL again, which would produce exactly the required results.

Using their initials as abbreviation, these are the results I got.

Order in the attribute stack (top first)     | Order of if statements in serializer DLLs (top first)
H-G-R-C-E (as generated by xsd.exe)          | G-E-R-C-H (which would work)
G-R-C-E-H (as suggested by Scott)            | E-C-G-H-R
G-H-R-C-E (matching EPP schema)              | G-E-H-C-R
G-R-H-C-E                                    | H-E-G-C-R
G-R-C-H-E                                    | E-H-G-C-R
G-R-C-E-H (should match earlier results)     | E-C-G-H-R (yup)


Which made me wonder about the original choice-of-two example. Switching the hello and greeting attributes around and rebuilding, however many times, probably wouldn't cause the bug to occur with hello underneath greeting in the attribute stack of two, but creating the example from scratch a few times might. My initial sample code to Scott, it seems, wasn't demonstrative enough. Apologies Scott. My bad.
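For anyone wanting to repeat the comparison, the debugging approach keeps the serializer's generated source files on disk so that two builds can be diffed. As far as I know it relies on the XmlSerialization.Compilation diagnostics switch in app.config. Treat the fragment below as a sketch to check against Scott's post; the value shown is my assumption of the usual setting.

```xml
<configuration>
  <system.diagnostics>
    <switches>
      <!-- Assumed setting: any non-zero value should keep the generated serializer source -->
      <add name="XmlSerialization.Compilation" value="1" />
    </switches>
  </system.diagnostics>
</configuration>
```

With the switch on, the generated .cs files should be left in the temp folder rather than deleted after compilation, ready to be opened in a diff tool.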

When debug build behaviours don't match release builds: Migration Woes Part 3

Scenario : I have a multi-project solution which has just been migrated to VS2005. I rebuild the whole solution successfully (plus a few warnings about obsolete methods of calling and signing assemblies) and everything continues to work as normal. I make a few changes to remove some of the obsolete methods and recompile. The solution now exhibits the serializer bug described in ‘Migration Woes Part 2’ but only in the release build. Debug build continues to work fine. I finish removing obsolete calls from the solution and the serializer bug is now present in both debug and release builds. By retracing each change I made, I realised that I could switch the bug on and off in debug build by varying the way in which I signed that project.

If I signed it in the ‘NET 1.1 style’ whereby I added the following to assemblyinfo.cs in the project, the debug build worked correctly.

[assembly: AssemblyKeyFile(@"..\..\..\Key\nameof.key")]

If I signed it in the ‘NET 2.0 style’ where I remove that line from assemblyinfo.cs, right click the project in Solution Explorer, hit the Signing tab and select the key to sign the project with from there, the debug build shows the bug. Release builds are buggy with both methods of signing.

The killer is that if I now revert all the obsolete-method corrections back to their original calls as they were immediately after migrating to VS2005, the bug still manifests itself in release build but not in debug build. It should be noted that none of the code being changed at this time had anything to do with the serialization code at all, with the exception of how the project containing the serialization code was signed.

Steps to diagnosis so far : Hard to know where to start with this one. It's almost as if VS had cached a set of release DLLs for my permanent befuddlement fund. The general approach was to replicate the serializer bug in a single project solution and work upwards.

Eventually we did replicate this bug and got it verified by Microsoft as mentioned previously. But there were differences.

  • No matter what we tried the bug either appeared in both debug and release build or neither.
  • We could only turn the bug on in our reproductions whereas our original multi-project solution seemed to provide a way to turn it off, at least in Debug build anyway.

Learning the bug had an arbitrary behaviour explains away some of the mystery, but it did bring up some interesting things to check in the meantime as we looked for reasons why the bug switched on in certain cases.

Scott Hanselman provided some insight into the actual differences between debug and release builds. mcdeeis added some further clarification in Phil Haack's link to Scott's original post. However, in our particular case, our project has no #if DEBUG statements to change its behavior in that fashion, and turning off compilation optimisation in the release build did not switch the bug off either.

Perhaps, as noted earlier, .NET was caching a copy of DLLs somewhere with the bug switched on? For strong-named assemblies the CLR checks the GAC first, then probes the calling application's own directory and its subdirectories. You can check the GAC using the .NET Framework Configuration Admin tool for the appropriate version of the framework and selecting ‘Manage The Assembly Cache’. Use Windows Explorer to check subdirectories for copied DLLs and then use VS2005’s Solution Explorer to right-click the solution and Clean Solution to make sure there are no DLLs hanging about anywhere. You can even change the version number of the solution assemblies as well if you are strong naming them. Then rebuild and see if the results are any different.
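A quicker way to see which copy of each DLL is really in play is to have the running app report it. The following is a minimal sketch rather than anything from the original solution: the class name LoadedAssemblyReport and its helpers are made up for illustration, and it uses only reflection calls that ship with the framework.

```csharp
using System;
using System.Reflection;
using System.Text;

public static class LoadedAssemblyReport
{
    // Formats a public key token as hex, or "(unsigned)" when there is no strong name.
    public static string FormatToken(byte[] token)
    {
        if (token == null || token.Length == 0) return "(unsigned)";
        StringBuilder sb = new StringBuilder();
        foreach (byte b in token) sb.Append(b.ToString("x2"));
        return sb.ToString();
    }

    // One line per assembly: name, version, signing token and the path it was loaded from.
    public static string Describe(Assembly asm)
    {
        AssemblyName name = asm.GetName();
        string location;
        try { location = asm.Location; }
        catch (NotSupportedException) { location = "(dynamic assembly)"; }
        return string.Format("{0} {1} token={2} loaded from {3}",
            name.Name, name.Version, FormatToken(name.GetPublicKeyToken()), location);
    }

    public static void Main()
    {
        foreach (Assembly asm in AppDomain.CurrentDomain.GetAssemblies())
            Console.WriteLine(Describe(asm));
    }
}
```

Calling it after the serialization code has run means the report covers everything loaded up to that point. A path under the GAC or an unexpected subdirectory would point at a stale copy, and a project reporting a different token from its neighbours would point at inconsistent signing.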

So what about the on\off switch caused by the way the project with the serialization code was signed? Did this in any way affect the version of the .NET framework running the project? Well, no. Unless the solution file states otherwise, a solution in VS2005 is compiled under .NET 2.0. What does seem to be true though is that if some projects in a multi-project solution are signed in the ‘NET 1.1 style’ and others are signed in the ‘NET 2.0 style’, the solution still works but some feathers are ruffled somewhere under the covers. Explicitly forcing the solution to run under .NET 2.0 using the following addition to app.config and rebuilding caused a runtime error to occur, stating that there was an error in the type initializer for one of the classes in a ‘NET 1.1 style’ signed project.

<configuration>
  <startup>
    <supportedRuntime version="v2.0.50727"/>
  </startup>
</configuration>

Out of curiosity, I switched the version number to that of .NET 1.1 and tried to recompile in VS2005. I got an error saying the app may not be fully signed with a private key, presumably because not all projects were signed in exactly the same way.
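To take the guesswork out of which framework version is actually running the app, it can simply report it at startup. A minimal sketch, with RuntimeCheck being a name invented for this example:

```csharp
using System;
using System.Runtime.InteropServices;

public static class RuntimeCheck
{
    // Reports the CLR version executing this code and the directory it was loaded from.
    public static string Describe()
    {
        return string.Format("Environment.Version={0}; runtime directory={1}",
            Environment.Version, RuntimeEnvironment.GetRuntimeDirectory());
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Running this with and without the supportedRuntime element in app.config should show whether the setting changes anything about the runtime that gets loaded.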

Current Conclusion : I've now brought the solution back to the stage where all warnings of obsolete methods in .NET 2.0 have been addressed and all projects are signed in the ‘NET 2.0 style’ which at least produces a consistent response from the arbitrary bug in the serializer. Bug aside, this case does emphasize the need to sign your projects in a consistent fashion across a solution lest it come back to bite you at a later stage.

NB. This is only my current conclusion. As and when the bug is resolved, I may well revisit the debug vs release build issue and see if I can replicate it again.

XML Bugs in .NET 2.0 and The MS Support Call Process : Migration Woes Part 2

Scenario: We upgraded one of the XML servers we run to .NET 2.0 recently and started noticing a problem with its serialization of response messages. We have boiled this problem down to a problem the XML Serializer has with a schema <choice> group containing an element with no content model. In our case, it looks like this.

<element name="epp" type="epp:eppType" />
<complexType name="eppType">
  <choice>
    <element name="hello" />
    <element name="greeting" type="epp:greetingType" />
  </choice>
</complexType>

The .NET 1.1 framework serializes a greeting element correctly

<?xml version="1.0" encoding="utf-8"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0">
  <greeting>
    <svID>Test</svID>
    <svDate>2006-05-04T11:01:58.1Z</svDate>
  </greeting>
</epp>

but although it seemed to be fine initially in .NET 2.0, we started getting this instead.

<?xml version="1.0" encoding="utf-8"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0">
  <hello d2p1:type="greetingType" xmlns:d2p1="http://www.w3.org/2001/XMLSchema-instance">
    <SvID>Test</SvID>
    <svDate>2006-05-04T10:55:07.9Z</svDate>
  </hello>
</epp>

Diagnosis: It turns out that the new XmlSerializer code in .NET 2.0 has a bug in it when it deals with empty elements in a <choice> group. If a struct/class member can have multiple types (multiple XmlElementAttributes in the CLR, a choice complexType in XSD), .NET 2.0 does not serialize it according to the derivation hierarchy, which causes the wrong XML output above. When the code is generated for the temporary DLL which performs the actual serialization, the order in which types are checked to choose how to serialize the member is arbitrary, so the error may or may not be reproduced.
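Why the order of those type checks matters can be shown without the serializer at all. The sketch below is only a model of the behaviour described, not the code the serializer generates: ChoiceDispatch and Pick are hypothetical names, and the two arrays stand in for the order in which the temporary DLL happens to test the candidate types.

```csharp
using System;

public class GreetingType { }

public static class ChoiceDispatch
{
    // Returns the element name of the first candidate type the item is an instance of.
    public static string Pick(object item, Type[] types, string[] names)
    {
        for (int i = 0; i < types.Length; i++)
        {
            if (types[i].IsInstanceOfType(item)) return names[i];
        }
        return null;
    }

    public static void Main()
    {
        object item = new GreetingType();

        // Generic case first: everything is an object, so greeting is never reached.
        Console.WriteLine(Pick(item,
            new Type[] { typeof(object), typeof(GreetingType) },
            new string[] { "hello", "greeting" }));   // prints "hello"

        // Most derived type first: the greeting is recognised.
        Console.WriteLine(Pick(item,
            new Type[] { typeof(GreetingType), typeof(object) },
            new string[] { "hello is last", "unused" }.Length == 2
                ? new string[] { "greeting", "hello" }
                : null));                              // prints "greeting"
    }
}
```

Because every candidate derives from System.Object, the check for hello only behaves as a fallback when it comes last. If the check order is arbitrary, whether the greeting is written as greeting or as hello is arbitrary too.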

The bug is now logged with Microsoft in their bug database and is still awaiting resolution.

For us at least, the hard part was replicating the bug with, as it turned out, ‘arbitrary behaviour’. Indeed, on a clean machine in .NET 2.0 the bug seemed not to occur unless you kickstarted it. A bit of clarification on this. We created a simple command line app that demonstrated the bug. It’s linked to in the MS Bug Report if you're interested. The problem was that

  • If we created a Command Line App project in VS2005 (Proj1) and copied in the code, the bug didn't appear when we built and ran it.
  • If we created a Command Line App project in VS2003 (Proj2), copied in the code, the bug didn't appear when we built and ran it until we opened Proj2 in VS2005 and migrated it to .NET2.0. Then we built it and ran it again and hey presto - the bug appeared.
  • If we created a Command Line App project in VS2005 again (Proj3) after migrating Proj2, and copied in the code, the bug did appear when we built and ran it.
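The command-line app itself isn't reproduced in this post. A sketch of what such a repro might look like follows, assuming a cut-down GreetingType with just the two fields seen in the XML above. The Repro class and the field layout are my own stand-ins, not the code attached to the bug report.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Assumed shape: only the two fields that appear in the sample output.
[XmlType(Namespace = "urn:ietf:params:xml:ns:epp-1.0", TypeName = "greetingType")]
public class GreetingType
{
    public string svID;
    public DateTime svDate;
}

[XmlType(Namespace = "urn:ietf:params:xml:ns:epp-1.0", TypeName = "eppType")]
[XmlRoot("epp", Namespace = "urn:ietf:params:xml:ns:epp-1.0", IsNullable = false)]
public class EppType
{
    private object item;

    // Same attribute order as xsd.exe produced: hello (object) above greeting.
    [XmlElement("hello", typeof(object))]
    [XmlElement("greeting", typeof(GreetingType))]
    public object Item
    {
        get { return this.item; }
        set { this.item = value; }
    }
}

public static class Repro
{
    // Serializes the message to a string so the element name can be inspected.
    public static string Serialize(EppType epp)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(EppType));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, epp);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        GreetingType greeting = new GreetingType();
        greeting.svID = "Test";
        greeting.svDate = DateTime.UtcNow;

        EppType epp = new EppType();
        epp.Item = greeting;

        // A <greeting> element is the correct result; <hello xsi:type="greetingType"> is the bug.
        Console.WriteLine(Serialize(epp));
    }
}
```

Which of the two outputs appears depends on the framework build and, as described here, on factors that are not under the code's control, so no expected output is given.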

But hang on, we now have three apps with identical code exhibiting different behaviours, the only difference being that one was built and run before the bug was kickstarted. Even if we ran Proj1 again after Proj3 there were still no signs of the bug. Now Microsoft note that the behaviour of the bug itself is arbitrary, but there seems to be a pretty definite on switch. Where's the off switch I wonder?

While I’m waiting for MS to get back to me with a fix, I’ve been looking at workarounds. Two spring to mind: changing the schema, or altering the generated code.

Both make sense except that in this case, the former means changing a schema which is laid out in an RFC / de facto standard, which I can’t do, and the latter (as far as I am aware) means altering code which was automatically generated by xsd.exe. Should there be a need to regenerate this code (an extension to the schema perhaps), there would also need to be several warnings and explanations on how to re-edit the new code so that it serializes correctly again. Neither is great. Ah well.

Comments: It’s ironic that the reason we moved the code affected by this problem to .NET 2.0 was a different bug in .NET 1.1 SP1 involving generated classes from schemas spread across different XSD documents. We avoided that by not installing SP1 on our Win2k boxes. As we upgraded to Win2K3, it became apparent that the version of .NET 1.1 installed by default with the OS included the bug we had previously avoided. Now we’re hit by another one in .NET 2.0. It can be worked around, but you can appreciate the irony.

This is the first time I've ever used one of the Microsoft Support Calls that come with my MSDN subscription and aside from an email going astray initially, you've got to hand it to the MS support staff. They've been pretty responsive thus far with diagnosis. Of course, I've got to wait now for one of the actual XML team to create a quick fix that I can test, but the whole process was explained nicely. For reference, it works like this.

  • A support engineer is assigned to your support call. He verifies the problem and passes it to the MS dev team.
  • The dev team may require further investigation or suggest a workaround. They may need a business case to evaluate the urgency of the case.
  • If they confirm it is a bug and can fix it, you receive a private fix for testing.
  • Once you confirm that the private fix solves the problem, MS build the official fix and release it in the KB.
  • Depending on the severity / complexity of the issue, the whole process may take several weeks, although the problem is fixed for you as soon as you have the private fix.

Before we pushed it out to Microsoft, we struggled for a few days to isolate this bug as it was partially hidden inside another migration issue (see Migration Woes part 3 for more on that) but once we realized there were two separate issues, it was interesting to learn that the temporary dll that serializes classes to XML now identifies System.Object differently between .NET 1.1 and 2.0. In .NET 1.1, the hello class is described as hello.System.Object... In .NET 2.0, it’s helloZSystem.Object, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089..

There’s not much documentation on how the XML Serializer works (or doesn’t) that I could find, but Kirk Allen Evans, Christoph Schittko, and Scott Hanselman all had useful posts on ways to approach the problem before we concluded that it was an actual .NET bug. Worth reading for future reference if you’re interested.

Where did my DB Connection go? .NET 1.1 to 2.0 Migration Woes Part 1

Scenario : I have a .NET 1.1 console application that connects to a SQL 2000 database, queries some data and spews it out to a text file. This is successfully running on both Windows 2000 Server and Windows 2003 server. I decide to migrate the project to .NET 2.0, so open the project in VS2005, use the conversion wizard and rebuild. The newly built app works fine on Windows 2003 but not Windows 2000, where I get a connection timeout error. The .NET 1.1 version of the app still works fine on both boxes. I can also create a DSN to the database from both boxes that connects successfully. If I change the connection string in the .NET 2.0 app to use the IP address of the database server rather than its friendlier computer name, I still get a connection timeout error.

Diagnosis : The .NET 2.0 app is failing to make a connection to the database through either TCP or Named Pipes.

Comments : A lot of blog posts, and indeed whole blogs, have dedicated themselves to diagnosing connection issues between .NET 2.0 apps and SQL 2005, but the same principles apply here. The fact that the .NET 1.1 app and DSNs work fine indicates it's a client-side connection issue rather than one on the server side. If you didn't know that though, you might want to check out the following issues.

  1. Firewall : Client and database should be communicating on ports 1433 (TCP) and 1434 (UDP), so make sure that these are open from the client side. If you haven't got a tool that can do this already, Microsoft make available a small port scanner called PortQry and, if you don't like command line apps, a GUI frontend called PortQryUI which automatically runs scans on common port groups for certain tasks. It's not brilliant and you could use nmap or something similar if you want, but it does the job.
  2. Server Config : Unless you've deliberately switched them off, SQL 2000 will be listening via the shared memory protocol (used only when you reference a database using (local) or (local)\instancename), named pipes and TCP. In SQL Enterprise Manager, right click the database you're trying to connect to and select properties. The General tab of the ensuing dialog will have a button named Network Configuration which you should click to bring you to the SQL Server Network Utility window, where you can check what protocols are enabled and how they are enabled. Assuming both named pipes and TCP are up and running, hit properties for each and record the name of the pipe and the TCP port being used for comms to compare against your client config.
  3. Client Config : Your first port of call should be the connection string. Make sure it is correct. If it's not that, run cliconfg.exe from the command line. This application is the client-side equivalent of the Server Network Utility. .NET 2.0 seems to need more clarity when it comes to connections than .NET 1.1, especially with pipes, so there's now a handy guide at http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q328306 to work through and see if you can debug the problem yourself. Remember to compare client settings with those you recorded from the server. You could also check your installation of MDAC for inconsistencies using the guide at http://support.microsoft.com/kb/307255/.
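The client-side checks above can also be run from code by forcing one protocol at a time and seeing which attempts time out. This is a minimal sketch rather than the app from the scenario: ConnectionProbe is a made-up name, DBSERVER and master are placeholder values, and it assumes Windows authentication and the System.Data.SqlClient provider that ships with the .NET Framework.

```csharp
using System;
using System.Data.SqlClient;

public static class ConnectionProbe
{
    // Builds a connection string whose data source prefix forces a protocol:
    // "tcp:" for TCP/IP, "np:" for named pipes, "" to let the client decide.
    public static string Build(string prefix, string server, string database)
    {
        SqlConnectionStringBuilder builder = new SqlConnectionStringBuilder();
        builder.DataSource = prefix + server;
        builder.InitialCatalog = database;
        builder.IntegratedSecurity = true;
        builder.ConnectTimeout = 5; // seconds; short so a dead protocol fails quickly
        return builder.ConnectionString;
    }

    // Attempts to open the connection and reports the outcome as text.
    public static string TryOpen(string connectionString)
    {
        try
        {
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                connection.Open();
                return "ok";
            }
        }
        catch (Exception ex)
        {
            return "failed: " + ex.Message;
        }
    }

    public static void Main(string[] args)
    {
        string server = args.Length > 0 ? args[0] : "DBSERVER";  // placeholder server name
        string database = args.Length > 1 ? args[1] : "master";  // placeholder database

        foreach (string prefix in new string[] { "tcp:", "np:", "" })
        {
            string connectionString = Build(prefix, server, database);
            Console.WriteLine("{0} -> {1}", connectionString, TryOpen(connectionString));
        }
    }
}
```

If the tcp: attempt succeeds and the np: attempt fails (or the other way round), that narrows the problem down to one protocol's client settings before opening cliconfg.exe.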

If you can't figure out the problem from here, you can also try the forums at SqlJunkies.com, the microsoft.public.sqlserver.connect newsgroup and the SqlProtocols team blog, all of which I used to trace down my own problem and are well worth a read as well.

My own solution was finally reached by building a clean Windows 2000 VPC, making sure I could run the same .NET 2.0 app in that and comparing client settings. Sure enough, the named pipe on the Win2k box was incorrect as can often be the case for access to a clustered database, and my TCP setting was also incorrect via an alias built to the database. I'm still not sure why the .NET 1.1 version of this app continued to work despite this, but there you go. Hopefully if you come across a similar problem, you'll at least have a head start on its diagnosis and solution.

Starting the scramble to the end

How quickly time flies. Christmas and new year have passed peacefully. Am now filled with resolution to actually work eight hours a day, finish book and get on with life. To which purpose, the final draft of chapter 1 is now complete and the handler chapter is in mid-flow. It would probably be finished by now if it wasn’t for the fact that I keep managing to find a new way to teach the material in this chapter and rewriting bits accordingly. Still this mini draft - about number 7 - is a lot better than number 1.

My first anniversary as a freelancer has passed and I’m still managing to rub two coins together so I haven’t done all that badly. The house wasn’t flooded like most of southern England but it is now under two inches of snow which makes for a rather cold office. I finally got my copy of VS.NET Everett in the post. All I need now is that copy of .NET server rc2 and I can install it too. If my desktop decides to work, that is.

Ooooo. Kill Bill