XML Bugs in .NET 2.0 and The MS Support Call Process : Migration Woes Part 2

by DanM 24. May 2006 15:09

Scenario: We recently upgraded one of our XML servers to .NET 2.0 and started noticing a problem with the way it serializes response messages. We have boiled it down to a problem the XML Serializer has with a schema <choice> group containing an element with no content model. In our case, it looks like this.

<element name="epp" type="epp:eppType" />
<complexType name="eppType">
<choice>
<element name="hello" />
<element name="greeting" type="epp:greetingType" />
</choice>
</complexType>

The .NET 1.1 framework serializes a greeting element correctly

<?xml version="1.0" encoding="utf-8"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0">
<greeting>
<svID>Test</svID>
<svDate>2006-05-04T11:01:58.1Z</svDate>
</greeting>
</epp>

but although it seemed to be fine initially in .NET 2.0, we started getting this instead.

<?xml version="1.0" encoding="utf-8"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0">
<hello d2p1:type="greetingType" 
xmlns:d2p1="http://www.w3.org/2001/XMLSchema-instance">
<SvID>Test</SvID>
<svDate>2006-05-04T10:55:07.9Z</svDate>
</hello>
</epp>

Diagnosis: It turns out that the new XmlSerializer code in .NET 2.0 has a bug when it deals with empty elements in a <choice> group. If a struct or class member can have multiple types (multiple XmlElementAttribute declarations in the CLR, a choice complexType in XSD), .NET 2.0 does not serialize it according to the derivation hierarchy, which causes the wrong XML output above. An element with no content model maps to System.Object, which every type derives from, so if it happens to be checked first it matches a greeting as well. When the code is generated for the temporary dll that performs the actual serialization, the order in which the types are checked to choose how to serialize the member is arbitrary, so the error may or may not be reproduced.
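To make the diagnosis concrete, here is a minimal sketch of the kind of class xsd.exe produces for the schema above. The two XmlElementAttribute lines on Item are the ones quoted in the comments below; everything else (the EppDemo helper, the field names, the fixed date) is an assumption made for illustration and is not the original sample code.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical reduction of the generated classes; only the two
// XmlElement attributes on Item come from the original sample.
[XmlType("greetingType", Namespace = "urn:ietf:params:xml:ns:epp-1.0")]
public class GreetingType
{
    [XmlElement("svID")]
    public string SvID;

    [XmlElement("svDate")]
    public DateTime SvDate;
}

[XmlType("eppType", Namespace = "urn:ietf:params:xml:ns:epp-1.0")]
[XmlRoot("epp", Namespace = "urn:ietf:params:xml:ns:epp-1.0", IsNullable = false)]
public class EppType
{
    // One member, two candidate element names. The serializer picks the
    // element by runtime type, and every GreetingType is also an object.
    [XmlElement("hello", typeof(object))]
    [XmlElement("greeting", typeof(GreetingType))]
    public object Item;
}

public static class EppDemo
{
    // Builds a greeting and serializes it to a string.
    public static string SerializeGreeting()
    {
        GreetingType greeting = new GreetingType();
        greeting.SvID = "Test";
        greeting.SvDate = new DateTime(2006, 5, 4, 11, 1, 58, DateTimeKind.Utc);

        EppType epp = new EppType();
        epp.Item = greeting;

        XmlSerializer serializer = new XmlSerializer(typeof(EppType));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, epp);
        return writer.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(SerializeGreeting());
    }
}
```

Whether Item comes out as <greeting> or as <hello xsi:type="greetingType"> depends on which of the two types the generated serializer tests first, which is exactly the part that turned out to be arbitrary.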

This is now logged with Microsoft in their bug database here and is still awaiting resolution.

For us at least, the hard part was replicating the bug with, as it turned out, ‘arbitrary behaviour’. Indeed, on a clean machine in .NET 2.0 the bug seemed not to occur unless you kickstarted it. A bit of clarification on this. We created a simple command line app that demonstrated the bug. It’s linked to in the MS Bug Report if you're interested. The problem was that

  • If we created a Command Line App project in VS2005 (Proj1) and copied in the code, the bug didn't appear when we built and ran it.
  • If we created a Command Line App project in VS2003 (Proj2) and copied in the code, the bug didn't appear when we built and ran it. Once we opened Proj2 in VS2005, migrated it to .NET 2.0, and built and ran it again, hey presto, the bug appeared.
  • If we created a Command Line App project in VS2005 again (Proj3) after migrating Proj2, and copied in the code, the bug did appear when we built and ran it.

But hang on, we now have three apps with identical code exhibiting different behaviours, the only difference being that one was built and run before the bug was kickstarted. Even if we ran Proj1 again after Proj3 there were still no signs of the bug. Now Microsoft note that the behaviour of the bug itself is arbitrary, but there seems to be a pretty definite on switch. Where's the off switch I wonder?

While I’m waiting for MS to get back to me with a fix, I’ve been looking at workarounds. Two spring to mind:

Both make sense, except that in this case the former means changing a schema which is laid out in an RFC / de facto standard, which I can’t do. The latter (as far as I am aware) means altering code which was automatically generated by xsd.exe, so should that code ever need regenerating (after an extension to the schema, perhaps), there would also need to be several warnings and explanations on how to re-edit the new code so that it serializes correctly again. Neither is great. Ah well.

Comments: It’s ironic that the reason we moved the code affected by this problem to .NET 2.0 was a different bug in .NET 1.1 SP1 involving classes generated from schemas spread across different XSD documents. We avoided that by not installing SP1 on our Win2k boxes. As we upgraded to Win2K3, it became apparent that the version of .NET 1.1 installed by default with the OS included the bug we had previously avoided. Now we’re hit by another one in .NET 2.0. It can be worked around, but you can appreciate the irony.

This is the first time I've ever used one of the Microsoft Support Calls that come with my MSDN subscription and aside from an email going astray initially, you've got to hand it to the MS support staff. They've been pretty responsive thus far with diagnosis. Of course, I've got to wait now for one of the actual XML team to create a quick fix that I can test, but the whole process was explained nicely. For reference, it works like this.

  • A support engineer is assigned to your support call. He verifies the problem and passes it to the MS dev team.
  • The dev team may require further investigation or suggest a workaround. They may need a business case to evaluate the urgency of the case.
  • If they confirm it is a bug and can fix it, you receive a private fix for testing.
  • Once you confirm that the private fix solves the problem, MS build the official fix and release it in the KB.
  • Depending on the severity / complexity of the issue, the whole process may take several weeks, although the problem is fixed for you as soon as you have the private fix.

Before we pushed it out to Microsoft, we struggled for a few days to isolate this bug, as it was partially hidden inside another migration issue (see Migration Woes part 3 for more on that). Once we realized there were two separate issues, it was interesting to learn that the temporary dll that serializes classes to XML identifies System.Object differently in .NET 1.1 and 2.0. In .NET 1.1, the hello class is described as hello.System.Object... In .NET 2.0, it’s helloZSystem.Object, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089.

There’s not much documentation on how the XML Serializer works (or doesn’t) that I could find, but Kirk Allen Evans, Christoph Schittko, and Scott Hanselman all had useful posts on ways to approach the problem before we concluded that it was an actual .NET bug. Worth reading for future reference if you’re interested.

Comments

5/24/2006 5:26:05 PM #

Scott Hanselman

You could get around it immediately by using IXmlSerializable...

Scott Hanselman United States
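For readers wondering what the IXmlSerializable route might look like, here is a rough, write-only sketch. It assumes a hand-written EppResponse class standing in for the generated EppType; the class, its fields, and the EppResponseDemo helper are all hypothetical and do not come from the original sample. The point is that the element name is written by hand, so the serializer never has to choose between hello and greeting.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical replacement for the generated class (an assumption,
// not the original code). XmlSerializer writes the <epp> root element
// and then hands control to WriteXml for the content.
[XmlRoot("epp", Namespace = EppResponse.EppNamespace)]
public class EppResponse : IXmlSerializable
{
    public const string EppNamespace = "urn:ietf:params:xml:ns:epp-1.0";

    public string SvID;
    public DateTime SvDate;

    public XmlSchema GetSchema()
    {
        return null;
    }

    public void ReadXml(XmlReader reader)
    {
        // Deserialization is deliberately left out of this sketch.
        throw new NotSupportedException("Write-only sketch.");
    }

    public void WriteXml(XmlWriter writer)
    {
        // The element name is fixed here rather than inferred from a type.
        writer.WriteStartElement("greeting", EppNamespace);
        writer.WriteElementString("svID", EppNamespace, SvID);
        writer.WriteElementString("svDate", EppNamespace,
            XmlConvert.ToString(SvDate, XmlDateTimeSerializationMode.Utc));
        writer.WriteEndElement();
    }
}

public static class EppResponseDemo
{
    public static string Serialize()
    {
        EppResponse response = new EppResponse();
        response.SvID = "Test";
        response.SvDate = new DateTime(2006, 5, 4, 11, 1, 58, DateTimeKind.Utc);

        XmlSerializer serializer = new XmlSerializer(typeof(EppResponse));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, response);
        return writer.ToString();
    }
}
```

The trade-off is the same one raised in the post about editing generated code: a hand-written class has to be kept in step with the schema manually whenever the schema is extended.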

5/24/2006 5:27:05 PM #

Scott Hanselman

Could you send me an example, schema BTW? I'd like to repro it.

Scott Hanselman United States

5/24/2006 10:50:05 PM #

Dan Maharry

Scott, you can get my sample replication code and schema from the MS bug report at lab.msdn.microsoft.com/.../viewfeedback.aspx Please note that I've added some more to the post about activating the bug etc which you should probably read before trying to replicate it.

Dan Maharry United Kingdom

5/25/2006 7:31:05 AM #

Scott Hanselman

With respect, your code is incorrect. Reverse the two lines on Item.

[System.Xml.Serialization.XmlElementAttribute("greeting", typeof(GreetingType))]
[System.Xml.Serialization.XmlElementAttribute("hello", typeof(object))]

If you have hello of type object as the FIRST stacked attribute, then ANY object will match that. Reverse them as seen here and you'll get:

<?xml version="1.0" encoding="utf-8"?><epp xmlns="urn:ietf:params:xml:ns:epp-1.0"><greeting><svID>Test</svID><svDate>2006-05-25T06:26:28.484375Z</svDate></greeting></epp>

Scott Hanselman United States

5/25/2006 7:52:05 AM #

Scott Hanselman

One other thing, if I may. Your sample has this "EppMessageSerializer" with a lot of utility methods and MemoryStream work to get the object into a string.

This would work and save you lots of code, removing both the EppMessageSerializer and the EppXml class:

EppType epp = CreateGreeting();
XmlSerializer x = new XmlSerializer(typeof(EppType));
StringWriter sw = new StringWriter();
x.Serialize(sw, epp);
Console.WriteLine(sw.ToString());

Scott Hanselman United States

5/25/2006 8:23:05 AM #

Dan Maharry

Hi Scott,

Thanks for taking an interest.
I've tried switching the ElementAttributes around - in fact I switched them around to see if there was any difference - but the problem persists for me. As I noted in the post, it is as if I have switched the bug on and can't now switch it off.

I also tried the simplified code you suggested. This actually produced a different output from any I have seen so far, but still not the correct one, as follows.

<?xml version="1.0" encoding="utf-16"?>
<epp xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns="urn:ietf:params:xml:ns:epp-1.0">
  <hello xsi:type="greetingType">
    <SvID>Test</SvID>
    <svDate>2006-05-25T07:13:07.9Z</svDate>
  </hello>
</epp>

The reason for the EppMessageSerializer and the MemoryStream is that this is a reduction from an XML registry server which would normally send an XML response back over the wire to a client rather than send it to screen. I didn't want to change the behaviour of the code if I could avoid it.

Dan Maharry United Kingdom
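A side note on why the declaration above says utf-16 and the root element gained xsi and xsd prefixes: a StringWriter always reports UTF-16 as its encoding, and XmlSerializer adds those two namespace declarations unless it is given an explicit XmlSerializerNamespaces. The sketch below contrasts the two paths. The Ping class, its namespace URI, and the EncodingDemo helper are made up for illustration and are not taken from the sample.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical message type used only for this illustration.
[XmlRoot("ping", Namespace = "urn:example:ping")]
public class Ping
{
    public string Message = "Test";
}

public static class EncodingDemo
{
    // StringWriter path: the XML declaration reports utf-16.
    public static string ToUtf16String(Ping value)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Ping));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, value);
        return writer.ToString();
    }

    // MemoryStream path: UTF-8 bytes, as you would send over the wire.
    public static byte[] ToUtf8Bytes(Ping value)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Ping));

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = new UTF8Encoding(false); // no byte order mark

        // Declaring only the default namespace suppresses xmlns:xsi / xmlns:xsd.
        XmlSerializerNamespaces namespaces = new XmlSerializerNamespaces();
        namespaces.Add("", "urn:example:ping");

        using (MemoryStream stream = new MemoryStream())
        {
            using (XmlWriter xmlWriter = XmlWriter.Create(stream, settings))
            {
                serializer.Serialize(xmlWriter, value, namespaces);
            }
            return stream.ToArray();
        }
    }
}
```

Neither path has any bearing on the hello/greeting choice itself; the difference is only in the declaration and the namespace prefixes.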

5/25/2006 1:56:05 PM #

Scott Hanselman

As far as sending content back to a server, you can serialize directly to a stream, for example directly to the Response.OutputStream like this:

EppType epp = CreateGreeting();
XmlSerializer x = new XmlSerializer(typeof(EppType));
x.Serialize(HttpContext.Current.Response.OutputStream, epp);

Scott Hanselman United States

5/25/2006 2:11:05 PM #

Scott Hanselman

Try an explicit REBUILD ALL (Not just BUILD). This worked on two machines at my home office using the original code with the one modification.

If you like, run del %temp%\*.* but that's not officially needed.

Scott Hanselman United States

