Dan Maharry

Writing about web development since 1997

DDD East Anglia in Review

And so it was that, inspired by the grand prix at Silverstone, we found ourselves pelting towards Cambridge at somewhat naughty speeds for the debut DDD East Anglia. A glorious day it was with good weather, good talks and good meets with friends and new faces alike. The rooms were perhaps a little stuffy but that small gripe aside, Phil Pursglove and crew deserve a massive vote of thanks for organising it all, the staff at the Cole Hauser forum (not named after the actor) for the food and drink, and the speakers for the sharing of their knowledge.

I was able to attend four sessions today as well as the grok talks which I presented with Dave Sussman and Richard Dutton.


DDD Southwest Session Notes 4 : Redis Is the New Black

The final session of my day at DDD Southwest was an introductory talk by Chris Hay about Redis, “an open-source advanced key-value store”. It introduced Redis, looked at some of the basic features and commands and their possible applications. There’s full documentation of all Redis commands here.

So What Is Redis then?

Released on April 10 2009 by Salvatore Sanfilippo (@antirez)
Open source, sponsored by VMware

It is an open source, in-memory key-value data store which does more than just keys and values (so NoSQL with knobs on). It also does sets, lists & counters. It was written in C by Salvatore Sanfilippo (@antirez) and is very, very fast. As a cache server, it is comparable to AppFabric & memcached. It is used by Stack Overflow, reddit & others.

Redis runs on Linux. There is a Windows build-ish, but use Linux.

Redis is a server – you use a client or a client API library to connect to it. A command-line client is installed with the server, but there are many third-party Redis clients – listed here. For .NET developers, there are two good options.

Booksleeve uses a lot of clever async stuff with Task<T> but the demos all used ServiceStack.Redis because it is easier to work with and simpler to read.

Working with KEYS

There are three redis commands to work with key value pairs – SET, GET & EXPIRE

redis-cli SET MyKey "Hello World"
redis-cli GET MyKey
redis-cli EXPIRE MyKey 10  <- MyKey expires after 10 seconds

In C#

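using ServiceStack.Redis;

// IPaddress holds the host name or IP address of the Redis server
using (var cli = new RedisClient(IPaddress))
{
  // Read the value back; the key name is a string literal
  ViewData["MyKey"] = cli.Get<string>("MyKey");
  // cli.Set<string>("MyKey", "Hello World");
  // cli.Expire("MyKey", 10);  // expire the key after 10 seconds
}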

Redis Keys are useful for Caching and Sessions. There is a Microsoft session provider written for use against redis. (SQL Server provider really sucks)
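
To make the SET / GET / EXPIRE semantics concrete, here is a minimal sketch of a toy in-memory store in Python. It is not a Redis client and does not talk to a server; the class name TinyKeyValueStore and its injectable clock are assumptions made purely for illustration.

```python
import time


class TinyKeyValueStore:
    """Toy in-memory model of SET / GET / EXPIRE. Not a Redis client."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be demonstrated
        self._values = {}        # key -> value
        self._deadlines = {}     # key -> time at which the key disappears

    def _purge_if_expired(self, key):
        deadline = self._deadlines.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._values.pop(key, None)
            self._deadlines.pop(key, None)

    def set(self, key, value):
        self._values[key] = value
        self._deadlines.pop(key, None)   # a plain SET clears any old expiry

    def expire(self, key, seconds):
        self._purge_if_expired(key)
        if key not in self._values:
            return False                 # nothing to expire
        self._deadlines[key] = self._clock() + seconds
        return True

    def get(self, key):
        self._purge_if_expired(key)
        return self._values.get(key)


# A fake clock lets us "wait" ten seconds without sleeping.
now = [0.0]
store = TinyKeyValueStore(clock=lambda: now[0])
store.set("MyKey", "Hello World")
store.expire("MyKey", 10)
print(store.get("MyKey"))   # Hello World
now[0] = 10.0
print(store.get("MyKey"))   # None
```

This is the behaviour that makes expiring keys a natural fit for caches and sessions: the entry simply stops existing once its time is up.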

Working with COUNTERS
Commands: increment, decrement, increment by, decrement by

Default increment is 1

redis-cli incr  aCounter 
redis-cli decr  aCounter 
redis-cli incrby aCounter 2 
redis-cli decrby aCounter 2

Redis runs on a single thread, so commands execute one at a time and it is completely thread-safe. Concurrent incr and decr calls won't clash or hand the same result to two different callers. Counters are useful for web page hit counters, perf counters, analytics, unique ids, sequences etc.

Working with SETS
Commands: sadd, smembers, srem

SADD MySet C   (adds C to MySet) 
SMEMBERS MySet (gets the members of MySet) 
SREM MySet C   (removes C from MySet)

Remember that sets are unordered and all members are unique. The power of sets comes in the set arithmetic.

SUNION MySet1 MySet2 
SDIFF MySet1 MySet2
SINTER MySet1 MySet2

You can find the rest of the commands here.
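
The same set arithmetic can be tried out with Python's built-in sets, which may help when reasoning about what SUNION, SDIFF and SINTER return. This is only an analogy, not Redis itself, and the question ids below are made up.

```python
# Hypothetical ids of questions carrying each tag
tagged_redis = {1, 2, 3}
tagged_csharp = {2, 3, 4}

union = tagged_redis | tagged_csharp          # like SUNION MySet1 MySet2
difference = tagged_redis - tagged_csharp     # like SDIFF MySet1 MySet2
intersection = tagged_redis & tagged_csharp   # like SINTER MySet1 MySet2

print(sorted(union))         # [1, 2, 3, 4]
print(sorted(difference))    # [1]
print(sorted(intersection))  # [2, 3]
```

Note that the difference is not symmetric: it keeps the members of the first set that are absent from the second.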

Q\A : Redis deals with strings typically, but it is generally pretty good with other types as it stores things at byte level. JSON Serialization works well but prefer to avoid it if possible.

Sets are useful for tagging (a la Stack Overflow).

SORTED SETS

Useful for indexes, object graph relationships.

Demo of ChukGraph
(or How I almost got myself shot in the head by gangsters...)

LISTS

(Queue - first in first out)
LPUSH mylist A
RPOP mylist

(Stack - last in first out)
RPUSH mylist A
RPOP mylist
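
Whether a list behaves as a queue or a stack depends only on which ends you push to and pop from. The sketch below models the four commands with Python's collections.deque; the helper function names simply mirror the Redis commands and are not a client API.

```python
from collections import deque


def lpush(items, value):
    items.appendleft(value)   # add at the left (head)


def rpush(items, value):
    items.append(value)       # add at the right (tail)


def lpop(items):
    return items.popleft()    # remove from the left


def rpop(items):
    return items.pop()        # remove from the right


# Push on one end, pop from the other: first in, first out.
queue = deque()
for job in ["A", "B", "C"]:
    lpush(queue, job)
fifo_order = [rpop(queue) for _ in range(3)]
print(fifo_order)   # ['A', 'B', 'C']

# Push and pop on the same end: last in, first out.
stack = deque()
for job in ["A", "B", "C"]:
    rpush(stack, job)
lifo_order = [rpop(stack) for _ in range(3)]
print(lifo_order)   # ['C', 'B', 'A']
```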

PIPELINING

Typically, you send a command to Redis and wait for its reply before sending the next one, so every request costs a round trip. However, it is possible to pipeline/queue several requests, send them without waiting, and then read the replies to all of them in one go.
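
On the wire, pipelining amounts to writing several commands to the connection before reading any replies. As a rough sketch, the Python below encodes commands in the Redis wire protocol (arrays of bulk strings) and joins them into a single buffer that a client could send in one write. The function names are illustrative, and no socket or server is involved.

```python
def encode_command(*parts):
    """Encode one command as a protocol array of bulk strings."""
    chunks = [b"*" + str(len(parts)).encode("ascii") + b"\r\n"]
    for part in parts:
        data = str(part).encode("utf-8")
        chunks.append(b"$" + str(len(data)).encode("ascii") + b"\r\n")
        chunks.append(data + b"\r\n")
    return b"".join(chunks)


def build_pipeline(commands):
    """Concatenate encoded commands so they can go out in a single write."""
    return b"".join(encode_command(*command) for command in commands)


payload = build_pipeline([
    ("SET", "MyKey", "Hello World"),
    ("INCR", "aCounter"),
    ("GET", "MyKey"),
])
print(payload)
```

The server still answers each command individually and in order; the saving is in round trips, not in the number of replies.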

TRANSACTIONS

Redis sort of supports transactions

Atomic : yes
Consistent : need to add a WATCH command for that
Isolated : Yup - be aware of threads though (single threaded only)
Durability : Kind of, but the more durable you want it (persisted to disk for instance), the slower redis is. It takes about 10ms to do this. (Which is longer than you want - really)

Redis also supports Lua scripting

PUB\SUB

PUBLISH channel message
SUBSCRIBE, UNSUBSCRIBE channel  

REPLICATION

You can set up a group of Redis servers in a master-slave configuration

  • Slaves are read-only
  • Why do this? Redis is single-threaded, so push intensive read queries to the slaves to stop them blocking the master, which replicates its changes out to the slaves

At some point in the future, there will be Redis clustering.

http://redis.io

DDD Southwest Session Notes 3 : Defensive Programming 101

The third session of my day at DDD Southwest was a talk by Niall Merrigan entitled Defensive Programming 101. Or, as it turned out, “Top 10 Things You Shouldn’t Forget To Do To Start Securing Your Website”. This was probably my favourite of the sessions I attended at DDD Southwest. Niall’s a funny bloke with useful things to teach so if you get a chance to see him talk, do.

Further information and resources can be found on Niall’s site here at http://www.certsandprogs.com/2012/05/dddsw-roundup-and-resources.html. You can contact him on Twitter at @nmerrigan

Writing secure code is hard and takes time. Sales people do not care and we trust each other that our code is secure. Not that many people try to screw up a site they are visiting. We write for general users.

The Top 10 "Screwables"

10. Restrict your HTTP Headers. They can tell too much to those who care, such as the web server being used, the .NET version etc. Don’t forget also to restrict easy access to sensitive files. Kill elmah.axd, trace.axd. Search for the phrase googlehacks and you’ll find the number of ways Google can help the hacker attack your site simply because you’ve allowed it to index your *.axd files, *.pst files etc.

9. Passwords. Don't use the same password for every single site. Cross-pollination hurts. Don’t save passwords in plain text connection strings. Encrypt them, change them regularly and don’t tell them to others. NB You can't encrypt your password in a connection string when using the Entity Framework.

8. Patching. Try and follow the server patch bulletins sent out by Microsoft and make sure you test your web applications with new patches applied as they can break.

7. Validation. Don't rely only on client-side scripting for validation. Turn off javascript in your browser and make sure server-side validation is working. Prefer whitelists to blacklists. Validate to RFC rules. Use a central validation source – i.e. a single place to update validation rules.

Search for e.g. RFC email - should point to ISO standard validation routine
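
A minimal sketch of the "whitelist plus a central validation source" idea, in Python. The field names and patterns are assumptions chosen for illustration only; real rules would come from the relevant standards for each field.

```python
import re

# Single place to update validation rules (hypothetical fields and patterns).
VALIDATION_RULES = {
    "username": re.compile(r"[A-Za-z0-9_]{3,20}"),
    "quantity": re.compile(r"[0-9]{1,3}"),
}


def is_valid(field, value):
    """Whitelist check: accept only values that fully match a known rule."""
    rule = VALIDATION_RULES.get(field)
    if rule is None:
        return False   # unknown fields are rejected rather than waved through
    return rule.fullmatch(value) is not None


print(is_valid("username", "dan_m"))                      # True
print(is_valid("username", "<script>alert(1)</script>"))  # False
print(is_valid("colour", "red"))                          # False
```

Because this runs on the server, it still applies when JavaScript is switched off in the browser.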

6. Email & Custom Errors. Email presents too much information to users, and not enough to developers. Error messages should only show users what they expect to see. No stacktraces etc. Very ditto for emails. Turn CustomErrors ON IN PRODUCTION with pages handling specific error codes – 404, 500 etc. Use web config transforms to kill it or set it to localonly and set up custom errors. Handle your errors correctly.

5. Database and AppPool Permissions. Don't use sa in your connection string. Don’t create a user and set it to dbo. Use two connection strings, one for reading, one for writing. Never run AppPools as *Admin user accounts. Always use the minimum permissions. Don't give access to SQL tables if sprocs and views are all that’s required. Perhaps give user only execute permissions if you only use sprocs. (Yes this should include insert perms. Oh, and don’t use dynamic sql in sprocs)

4. Directory traversal. How do I send a file to client that opens a save as dialog? Ans: Whatever you do, don’t do something like this: download.aspx?file.txt. Because download.aspx?..\web.config will work. IIS isn't handling and therefore automatically blocking the request at this point, your handler is. You could probably download the whole source code in dll files and decompile it. If you used the asp.net website template, you can get all the cs files as well. In short, don't use a file name for downloads. Use a GUID or include a hash for the file to check you're downloading the file you think it is.

Find link on blog to WAM_USER WAM_PASSWORD metabase vulnerability on Niall's site.
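
One way to follow the "use a GUID, not a file name" advice is to hand out opaque tokens and keep the token-to-file mapping on the server. The Python below is a sketch under that assumption; DownloadRegistry and its methods are hypothetical names, and a real site would persist the mapping rather than hold it in memory.

```python
import uuid


class DownloadRegistry:
    """Maps opaque tokens to server-side paths the site has chosen to expose."""

    def __init__(self):
        self._paths_by_token = {}

    def register(self, path):
        # The token reveals nothing about the file name or its location.
        token = uuid.uuid4().hex
        self._paths_by_token[token] = path
        return token

    def resolve(self, token):
        # Anything that is not a known token resolves to nothing,
        # so "..\web.config" style input never reaches the file system.
        return self._paths_by_token.get(token)


registry = DownloadRegistry()
token = registry.register("downloads/report.txt")
print(registry.resolve(token))               # downloads/report.txt
print(registry.resolve("..\\web.config"))    # None
```

The download handler then only ever opens paths that were registered deliberately.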

3. Injection attacks. Addition of scripts to your site against your will. Don't set enablePageValidation to false. Very silly. Don't use cookieless sessions. Make cookies readable on the server side only (HttpOnly).

2. SQL Injection. nuff said. Don't allow naked SQL in your code.
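
To show why "naked SQL" is dangerous, here is a small self-contained sketch using Python's built-in sqlite3 module with an in-memory database. The table, rows and function names are invented for the demonstration; the same principle applies to parameterised commands in any data access library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])


def find_user_unsafe(name):
    # DON'T: user input is concatenated straight into the SQL text.
    sql = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def find_user_safe(name):
    # DO: the value travels as a parameter and is never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


attack = "nobody' OR '1'='1"
print(len(find_user_unsafe(attack)))   # 2 (every row leaks)
print(find_user_safe(attack))          # []
```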

1. Users will never do what you think they are going to do.

DDD Southwest Session Notes 2 : Web Sockets & SignalR

The second session of my day at DDD Southwest was a talk by Chris Alcock of The Morning Brew about Web Sockets, a new web standard for real-time communications between a web server and a page on a client browser, and SignalR, an open source library that makes use of web sockets.

N.B. The section of Chris’ talk on Web Sockets and how they work is better fleshed out here on DeveloperFusion than the notes below.

Q: What do we mean by real-time web applications?
A: Time-dependent applications such as the stock market and weather.

Q: What do we mean by interactive web applications?
A: Applications such as chat and auctions (which are also real-time) which work only with users communicating with each other.

Q: How do we handle these communications?
A: Currently we use periodic refreshes from the browser client to update what a user sees but it takes place at the page's\user’s discretion which is a bit rubbish. Long polling works a bit better. Flash is also a possibility.

Web Sockets? What? When? How?

A new option in real-time web communication has appeared called Web Sockets. A Web Socket connects two computers - client and server.

  • It is a web standard – RFC6455
  • W3C defines the WebSocket API as a draft standard and part of the wider HTML5 standard.
  • Currently there is limited support for it. The RFC version (there were 12 previous versions) is only supported in a few recent browser versions. IE10, FF11, Chrome16, Opera

How do they work?

  • Over normal web requests, so firewall friendly. Port 80, 443 using ws:// and wss:// respectively.
  • Request includes the upgrade header to notify this is a web socket request.
  • Comms are two-way a la traditional sockets & can stream data.
  • Supports cross-domain requests.

After the initial handshake, the socket is left open for communication. Your proxy server will need to understand this.
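
Part of that handshake is a small proof that the server really speaks the protocol. Under RFC 6455, the client sends a Sec-WebSocket-Key header and the server replies with a Sec-WebSocket-Accept value derived from it. Those header names come from the RFC rather than from the talk notes, and the function name below is just illustrative. A minimal Python sketch of the calculation:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value for a client's key."""
    combined = (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    digest = hashlib.sha1(combined).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from the RFC's own worked example.
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))   # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A server framework normally does this for you; the sketch is only meant to show that the upgrade is an ordinary HTTP exchange with a well-defined response.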

Web Socket Client API

The Web sockets API has a straightforward set of functions

  • new WebSocket(url)
  • onopen
  • onclose
  • onerror
  • onmessage
  • close
  • send
  • readyState (enum - Connecting, Open, Closing, Closed)

Web Socket Server Support

  • Many implementations are available
  • For .net developers, web sockets are only supported in IIS8 on Windows 8 (they require lots of changes to HTTP stack). NB IIS8 Express on Windows 8 does not support it.
  • .net 4.5 adds two new important members to the HttpContext class
    • HttpContext.IsWebSocketRequest
    • HttpContext.AcceptWebSocket
  • nuGet Microsoft.WebSockets package for ease of development.

When to use Web Sockets...

  • Only modern browsers support Web Sockets as a client API. “Polyfills for ws” such as socket.io do exist however.
  • Windows 8 is required to act as a web sockets server.
  • You won’t be able to host a Web Socket server in the Azure cloud until after Windows 8 has been deployed to the cloud.
  • Performance is good though - you control the messages sent.

SignalR. What? When? How?

SignalR is an async library for .net to help build real-time, multi-user interactive web apps

  • on github, v0.5
  • It is not an official MS project. Damian Edwards and David Fowler team leads (both from ASP.NET team)
  • Available on nuGet
  • Runs on Mono
  • Requires .net 4.0+
  • Uses Dynamic typing, TPL, jQuery and more.
  • Two types of connection (persistent like web sockets and hub)
  • Supports multiple methods of connecting (long polling, websockets, forever frame (IE only), server sent events)
  • More than just one connection (seamless)

SignalR supports persistent connections with a similar API to WebSockets (see docs on github)

It also supports a more interesting Hub API

  • The Hub API is a RPC-like implementation.
  • It allows you to make method calls from server on the client and vice versa
  • It allows you to share variables between the two
  • Server implementation uses hub base class
  • Dynamic types are used for proxy
  • SignalR Clients vary how API is presented.

Demo

  • Methods are dynamic so intellisense is not available.
  • SignalR requires two scripts - jquery.signalr.js is the base library. signalr/hubs/ is where we implement the hub connections.
  • SignalR deals with calling pascal-cased methods on the server (e.g. Reflect) with camel-cased methods on the client (e.g. reflect)
  • Don't use a folder called signalr in your own app as SignalR reserves that folder for its own use and things won't work.

Notes on Implementing A Hub

  • You must wire up your client side code before connecting.
  • A new hub instance is created on each request so you can't use hub member variables to store state.
  • Transports can timeout and thus disconnect - makes for odd debugging experiences
  • Server has lots of concurrent connections so it will need optimizing. See github docs for a guide.

There are several SignalR clients - five are included with SignalR and there are several third party ones.

SignalR Hosting Options

  • Win/Mono
  • ASP.NET, Self Host, OWIN
  • It can scale out to Webfarms (Redis, Azure Queues)

WebSockets is an available transport for SignalR, but only on Win8 for reasons noted earlier. Also note that currently SignalR on Web Sockets is broken in the v0.5 build but there is a workaround available on the github site.

The slides for this presentation are available at http://www.cwa.me.uk/?page_id=68.

DDD Southwest Session Notes 1 : Performance & Scalability

The first session of my day at DDD Southwest was a talk by Marc Gravell of Stack Overflow about how they approach performance and scalability issues. As with any talk specific to a work place or specific site, your mileage may vary but the two tools he demonstrated could certainly be used anywhere.


Performance Myth #1: Adding a server will make your site faster. This is NOT TRUE. If you add a server to your web farm or cluster, your site will not get served any faster. Websites do not trouble the CPU very much. Indeed, StackOverflow runs at 10% CPU load give or take. That's for a website serving ~7 million page views a day according to Quantcast.

Profiling

When a performance-related issue is logged on a site, sometimes it is possible to use a coarse-grained event log or profiling tool to find and diagnose the issue. Most of the time, this is not possible.

Of course, the finer-grained the level of profiling on your live site, the greater the overheads and the slower the actual site. Beware net admins with guns. Especially NRA members (such as those at StackOverflow :-)

One tool written by the StackOverflow team to address this issue is called (MVC) MiniProfiler, a very lightweight, almost zero friction profiler for websites.

Install it into your site using nuGet

PM> Install-Package miniprofiler

This adds two new references and two new files to your web project. Simply uncomment the call to RenderIncludes in your layout.cshtml page to have MiniProfiler start working. On each page, MP throws in basic profile info in top left. Clicking on the MP tab will display the time for each function to complete before the rendering of the page.

MiniProfiler appears in the top left of your browser window

MiniProfiler can also monitor database requests by wrapping your DbConnection object or DbContext object if your site uses Linq2SQL or the Entity Framework.

// Wrap the real connection so MiniProfiler can time the commands sent through it
DbConnection conn = new SqlConnection(...);
conn = new ProfiledDbConnection(conn, MiniProfiler......);
// do something with profiled connection

When run, MP now also displays the number of SQL commands sent to the db and what they were (formatted nicely). It will also alert you if a page is running duplicate commands or other (n+1) scenarios which you can go back and improve. Look for the ! in the MP smart tag which will appear if this is happening. MP also has a share button (provider-based) for passing the profiling info to others.
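
The notes do not say how MiniProfiler spots duplicates, so the following Python is only a guess at the general idea rather than its implementation: record the text of every command run during a request and flag any that ran more than once. The function name and sample commands are hypothetical.

```python
from collections import Counter


def find_duplicate_commands(commands):
    """Return {command text: times run} for commands issued more than once."""
    # Collapse whitespace so formatting differences do not hide a repeat.
    normalised = (" ".join(sql.split()) for sql in commands)
    counts = Counter(normalised)
    return {sql: n for sql, n in counts.items() if n > 1}


# A typical n+1 shape: one list query, then one lookup per row.
commands_for_request = [
    "SELECT Id FROM Posts",
    "SELECT * FROM Users WHERE Id = @id",
    "SELECT *  FROM Users WHERE Id = @id",
    "SELECT * FROM Users WHERE Id = @id",
]
print(find_duplicate_commands(commands_for_request))
# {'SELECT * FROM Users WHERE Id = @id': 3}
```

Parameterised commands make this kind of check easy, because the command text stays identical while only the parameter values change.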

Access to MP can be configured in App_Start/Miniprofiler.cs. By default, MiniProfiler appears onscreen only if the page request is to localhost but you can change that to something role-based, page-based etc.

  • Q\A : MP should work with any MVC ViewEngine as it just wraps the ViewEngines much like the DB connections. See App_Start/Miniprofiler.cs for code.
  • Q\A : MP can't really account for the 'works on my machine' / 'fails on production environment' scenarios
  • Q\A : MP also works with an AJAX update. MP adds extra timings for the async call down the left hand side of the screen.
  • Q\A : You can use the DB profiling side of it without the web stuff. Ask the profiling object to get the db data and then work with it.
  • Q\A : MP would work with webservices and CMD apps but you need to think about how and when MP would show its info.
  • Q\A : MP doesn't currently support async multi-threaded stuff....

Introducing Dapper, A Read-Focused Data Helper class

StackOverflow was originally built using LINQ2SQL in .NET 3.5. When .NET v4.0 came out, LINQ2SQL appeared to stall reasonably frequently. A database request would suddenly take 400ms and not 4ms. The LINQ library wasn’t open source, so SO had to diagnose and then figure out how to work around the problem.

  • Tried their own sql generation trees
  • Tried just executing raw SQL rather than letting L2S generate it. That didn't work either.

In the end they wrote their own data access stack called Dapper, which heavily favours reads: roughly 1 in 500 commands in Stack Overflow are updates and the rest are reads. Dapper has a very similar syntax to LINQ Context queries but uses anonymous objects to pass parameters to queries. The speed increases in Dapper come from improvements in the materialiser (getting the field names etc. and pushing them into the object), which appears to have been the issue in LINQ2SQL v4.

  • Q/A : Dapper objects are not connected to the LINQ context object so calling SubmitChanges doesn't work. SO still uses L2S for writes mostly (although Dapper is starting to do writes also).
  • Q/A : Dapper wraps generic Connection objects. So it should work over any .net conn object as long as the syntax for that particular DB is covered (diff plsql syntax over t-sql etc)
  • Q/A : Dapper source includes perf tests against several other ORMs (run in release mode)
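
To give a feel for what a "materialiser" does, here is a small Python sketch using the built-in sqlite3 module: run a parameterised query, read the column names from the result, and push each row into a typed object. It is an analogy only and not Dapper's code; the Post class, the query helper and the sample table are all invented for illustration.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Post:
    id: int
    title: str


def query(conn, row_type, sql, params=None):
    """Run sql with named parameters and materialise each row as row_type."""
    cursor = conn.execute(sql, params or {})
    columns = [description[0] for description in cursor.description]
    wanted = {f.name for f in fields(row_type)}
    return [
        row_type(**{col: val for col, val in zip(columns, row) if col in wanted})
        for row in cursor
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, title TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?)", [(1, "Redis"), (2, "SignalR")])

# Parameters are passed as a plain mapping, much like an anonymous object.
posts = query(conn, Post, "SELECT id, title FROM posts WHERE id = :id", {"id": 2})
print(posts)   # [Post(id=2, title='SignalR')]
```

The objects that come back are plain data with no link to any context, which matches the first Q/A point above: there is nothing to call SubmitChanges on.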

StackOverflow uses Redis for caching. You could also use it for session state etc. Redis is a hi-perf key-value store.

In conclusion, Marc offered some simple rules for improving the performance of your data access routines:

  • Learn SQL - don't rely on LINQ
  • Keep it simple
  • Cache, cache, cache
  • Investigate other tools – other ORMs, write your own? e.g. a noSQL store for Caching, Session such as Redis
  • Check the serialization - how much is serialized? Can it be reduced etc.