Techniques and Strategies to Break Up Large Classes

Introduction

Typically, features are added to software systems as little tweaks. This usually involves adding a little code to existing methods and/or adding a few more methods to existing classes. It is very tempting to make the change in place, because at first glance adding code to an existing class seems easier than creating a new one.

This seemingly 'easier' way of making changes can lead to serious problems such as:

When you keep adding code to existing classes, you end up with very long methods and large classes.
The software turns into a swamp.
It takes a long time to understand how old features work.
It takes more time to understand how to add new features.

This article highlights some of the problems with large classes, explains why they should be avoided, and describes techniques and tactics to break up an existing large class and get it under unit tests.

Problems with Large Classes

What are the problems with big classes?

1). Large classes cause confusion.

If a class contains 50 or 60 methods, it is difficult to get a sense of what you have to change and whether it is going to affect anything else.
If a class has lots of instance variables, it's hard to know what the effects of changing a variable are.

2). Large classes cause problems with task scheduling

If a class has 20+ responsibilities, then more than likely it will have to be changed while adding features.
Several programmers may be adding different features and have to work in the same huge class with 20+ responsibilities.
If they work concurrently on the same huge class, this can lead to serious thrashing and code conflicts.

3). Big Classes are difficult to test

Classes that are too big often hide too much.
Encapsulation is good when it helps us reason about code. But when we encapsulate too much in a single class, the stuff inside rots and festers.
There is no easy way to sense effects, so developers fall back to "Edit and Pray" programming. In this case, changes end up taking too long or bug counts increase.
You pay for the lack of clarity that is typical of big classes.

How to Make Changes without Making a Big Class worse

Confronted with a big class, how can you make changes and add features without making things worse?

1). Use Sprout Class and Sprout Method techniques:

When you have to make changes, you should consider putting the code into a new class or method.
Sprout Class really keeps things from getting worse.
Sprout Method still adds code to the existing big class, but it also identifies and names another thing that the class does. Later, this makes it easier to pull those methods out into smaller classes.
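
As a rough illustration of the two techniques, here is a minimal Java sketch. The class and method names (OrderProcessor, uniqueEntries, EntrySummaryFormatter) are hypothetical and chosen only for this example. The existing method gains a single call to a sprouted method, and a second piece of new behaviour lives in its own sprouted class.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical stand-in for a big legacy class.
class OrderProcessor {
    private final List<String> posted = new ArrayList<>();

    // Existing method: the only edit is the single call to the sprouted method.
    void postEntries(List<String> entries) {
        List<String> toPost = uniqueEntries(entries); // new behaviour, written elsewhere
        posted.addAll(toPost);                        // original logic, left untouched
    }

    // Sprout Method: the new code is a separate, named method that can be tested alone.
    List<String> uniqueEntries(List<String> entries) {
        // LinkedHashSet drops duplicates while keeping first-seen order.
        return new ArrayList<>(new LinkedHashSet<>(entries));
    }

    // The big class only delegates to the sprouted class below.
    String summary() {
        return new EntrySummaryFormatter().format(posted);
    }

    List<String> posted() {
        return posted;
    }
}

// Sprout Class: new behaviour developed outside the big class.
class EntrySummaryFormatter {
    String format(List<String> entries) {
        return entries.size() + " entries: " + String.join(", ", entries);
    }
}
```

Both uniqueEntries and EntrySummaryFormatter can be unit tested without first getting the whole of the big class under test, which is the main attraction of sprouting.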

2). The key remedy for big classes is refactoring.

You can break down big classes into a set of smaller classes.
You will need to figure out what the smaller classes should look like.

3). Use Single-Responsibility Principle (SRP).

Use SRP to figure out what the smaller classes should look like.
SRP states that every class should have a single responsibility. Each class should have a single purpose in the system.
There should be only one reason to change a class.
SRP defines the "main purpose" of the class.

4). How to Find class responsibilities.

The name of a class should give an indication of its responsibility.
Look at method names and see if there is a natural way to group the names of methods.
Identify entry points to the class.
A big class usually means that the class has too many responsibilities.
In the real world of big classes, the key is to identify the different responsibilities and then find a way to move incrementally toward smaller, more focused classes, each with a single responsibility.

5). Identifying Responsibilities within a class.

Ask the questions: "Why is this method here?" and "What is it doing for the class?"
Use the "Group Methods" technique to group the methods in a list based on the answers, putting together methods that have similar reasons to be in the class.
Learning to identify responsibilities is a design skill. It takes practice.
Learn how to identify responsibilities and how to separate them well.
Legacy code offers more possibilities for the application of design skills than new features do.
You can reason about design trade-offs when you see the code that will be affected.
It is easier to see whether structure is appropriate in a given context because the context is real and there in front of you.
In identifying responsibilities in a big class, you are not inventing them; you are just discovering what is already there in the class.
All code does identifiable things - no matter how bad the code is. Sometimes it will be hard to identify what the code does, but there are techniques that can be used to help.
The more you start noticing the responsibilities inherent in the code, the more you learn about it.

Techniques for Identifying Responsibilities in Existing Code

1). Group Methods

Look for similar methods in the code.
Write down all the methods on a class including the access type.
Find methods that seem to go together and group those.
You don't have to categorize all of the methods into new classes. Instead you should just find methods that look like they are part of a common responsibility.
Wait until you have to make a change to a method you have categorized and then decide if you should extract a new class or not.
Put up posters of big classes with lots of methods and have team members see if they can group the methods over time.
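
Writing down every method with its access type can be automated. The following is a small sketch using Java reflection; MethodLister and SampleScheduler are hypothetical names invented for this example, and the grouping itself is still a judgment call that the tool does not make for you.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class standing in for a big class you want to study.
class SampleScheduler {
    public void addJob() {}
    protected void retry() {}
    private void log() {}
    void flush() {}
}

class MethodLister {
    // Returns one "access name" entry per declared method name, sorted alphabetically.
    // Overloads collapse into a single entry because only the name is recorded.
    static Set<String> list(Class<?> type) {
        Set<String> lines = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                continue; // skip compiler-generated methods
            }
            int access = m.getModifiers()
                    & (Modifier.PUBLIC | Modifier.PROTECTED | Modifier.PRIVATE);
            String label = access == 0 ? "package-private" : Modifier.toString(access);
            lines.add(label + " " + m.getName());
        }
        return lines;
    }
}
```

Calling MethodLister.list(SampleScheduler.class) produces a list you can print, pin up, and annotate with candidate groupings.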

2). Look at Hidden Methods

Pay special attention to private and protected methods.
If a class has many private and/or protected methods, it often indicates that there is another class inside it dying to get out!
Big classes can hide a lot.
In general, if you have to test a private method, the method shouldn't be private.
If making the method public bothers you, chances are that it is part of a separate responsibility and should be in another class.
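
A minimal sketch of that idea in Java follows. The names (InvoiceBefore, Invoice, CurrencyFormatter) are hypothetical. The private helper that someone wants to test becomes a non-private method on a small class of its own, and the original class delegates to it.

```java
import java.util.Locale;

// Before: the formatting logic is hidden and cannot be tested directly.
class InvoiceBefore {
    private final double amount;

    InvoiceBefore(double amount) {
        this.amount = amount;
    }

    String render() {
        return "Total: " + formatCurrency(amount);
    }

    private String formatCurrency(double value) {
        return String.format(Locale.US, "$%.2f", value);
    }
}

// After: the hidden method turned out to be a separate responsibility.
class CurrencyFormatter {
    String format(double value) {
        return String.format(Locale.US, "$%.2f", value);
    }
}

class Invoice {
    private final double amount;
    private final CurrencyFormatter formatter = new CurrencyFormatter();

    Invoice(double amount) {
        this.amount = amount;
    }

    String render() {
        return "Total: " + formatter.format(amount); // delegate instead of hiding
    }
}
```

The formatting rule can now be tested through CurrencyFormatter without widening the interface of Invoice.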

3). Look for Decisions that Can Change

Look for decisions that you have already made.
Is there some way of doing something (e.g. connecting to a database or a web service) that seems hard-coded? If so, can it be changed?
Paying attention only to the name of a method doesn't tell the full story.
Big classes tend to contain methods that do things at various levels of abstraction. These methods do more than one thing.
In such cases, you should do a little Extract Method refactoring before really settling on classes to extract.
Which methods to extract can be determined by looking at how many things are assumed in the big method's code.
Is the code calling methods from a particular API? Is the code assuming it will always be accessing the same database?

If so, then extract methods that reflect what you intend at the higher level.
If you are getting information from a database, then extract a method named after the information it is getting.
These extractions can lead to many methods, but method grouping will be easier.
Often resources are completely encapsulated behind a set of methods.
When you extract a class from them, you'll have broken some dependencies on low-level details.
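
The following Java sketch shows what such an extraction might look like. ShippingCalculator and baseRateFor are hypothetical names, and an in-memory map stands in for the hard-coded database so that the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class; the map stands in for a hard-coded data source.
class ShippingCalculator {
    private final Map<String, Integer> rateTable = new HashMap<>();

    ShippingCalculator() {
        rateTable.put("US", 5);
        rateTable.put("CA", 8);
    }

    // The higher-level method now reads as intent, with no lookup details in it.
    int shippingCost(String country, int weightKg) {
        return baseRateFor(country) * weightKg;
    }

    // Extracted method, named after the information it returns,
    // not after how that information is fetched.
    int baseRateFor(String country) {
        Integer rate = rateTable.get(country); // the low-level decision lives here only
        return rate != null ? rate : 12;       // 12 is an assumed default rate
    }
}
```

Once several such methods exist (baseRateFor and similar lookups), they tend to group naturally, and the group is a candidate for a class that hides the data source completely.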

4). Look for Internal Relationships

Look for relationships between instance variables and methods.

Ask questions like, are certain instance variables used by some methods and not others?
Very few classes have methods that all use all instance variables.
Usually there is some sort of lumping of methods and instance variables.
Two or three methods might be the only ones that use a set of three variables.
Often the method names will help you to see these relationships.

Use "feature sketch" diagrams to sketch the relationships inside a class. To create a feature sketch:

Draw a circle for each instance variable.
Draw a circle for each method.
Draw a line from each method circle to the circles of any instance variables and methods that it accesses or modifies.
Add an arrowhead to each line, pointing at the variable or method that is being used.
Skip the constructors.
Draw a big circle around each set of variables and methods with lots of connections between them. Typically, these clusters are candidates for new classes.
Before creating a new class, figure out whether the new class has a good, distinct responsibility and can be named.
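
A feature sketch is usually drawn by hand, but the clustering idea can be approximated in code. The sketch below is a simplification, not the technique itself: it treats the diagram as an undirected graph and reports groups with no connection between them, whereas on paper you would also look for loosely connected clusters. FeatureSketch and the method and variable names in the usage note are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class FeatureSketch {
    // Adjacency lists; every "uses" line is stored in both directions.
    private final Map<String, Set<String>> edges = new TreeMap<>();

    // Records that `method` accesses, modifies, or calls each item in `used`.
    void uses(String method, String... used) {
        for (String target : used) {
            edges.computeIfAbsent(method, k -> new TreeSet<>()).add(target);
            edges.computeIfAbsent(target, k -> new TreeSet<>()).add(method);
        }
    }

    // Returns the groups of methods and variables that are connected to each other.
    List<Set<String>> clusters() {
        List<Set<String>> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String start : edges.keySet()) {
            if (!seen.add(start)) {
                continue; // already placed in an earlier cluster
            }
            Set<String> cluster = new TreeSet<>();
            Deque<String> todo = new ArrayDeque<>();
            todo.add(start);
            while (!todo.isEmpty()) {
                String node = todo.remove();
                cluster.add(node);
                for (String next : edges.get(node)) {
                    if (seen.add(next)) {
                        todo.add(next);
                    }
                }
            }
            result.add(cluster);
        }
        return result;
    }
}
```

For example, recording uses("addReservation", "reservations"), uses("cancelReservation", "reservations"), uses("formatReceipt", "currencySymbol", "taxRate") and uses("printReceipt", "formatReceipt") yields two clusters: one around reservations and one around receipts. Each is a candidate class, provided it can be given a good name.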

5). Look for the Primary Responsibility

Try to describe the responsibility of the class in a single sentence.
Remember that SRP says "a class should have a single responsibility".
Therefore this single responsibility should be easy to write down in a single sentence.
Apply this technique to a big class in your current system.

As you think about what a client needs from the class, add clauses to the sentence, e.g. "The class does this, and this, and that."
Is there any one thing that seems more important than anything else? If so, then this could be the key responsibility of the class.
The other responsibilities should be factored out into other classes.

SRP can be violated at the interface and implementation levels.

SRP is violated at the interface level, when a class presents an interface that makes it appear that it is responsible for a very large number of things.
SRP is violated at the implementation level when the class itself carries out more than a single responsibility.

The focus should be more on SRP violation (avoidance) at the implementation level.

Is the class really handling that many responsibilities itself,
or is it just delegating to other classes to do the work?
If it is just delegating, then we don't have a large monolithic class; we just have a facade, a front end for several little classes, which is easier to manage.
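
The difference can be shown with a small Java sketch. The names (RuleParser, TermTokenizer, TermEvaluator) are hypothetical. RuleParser still offers more than one operation to its clients, but each responsibility is implemented in a small class behind it.

```java
// Small class with one job: splitting an expression such as "1 + 2 + 3" into terms.
class TermTokenizer {
    String[] tokens(String expression) {
        return expression.trim().split("\\s*\\+\\s*");
    }
}

// Small class with one job: adding up integer terms.
class TermEvaluator {
    int sum(String[] terms) {
        int total = 0;
        for (String term : terms) {
            total += Integer.parseInt(term);
        }
        return total;
    }
}

// Facade: the interface is unchanged for clients, but the work is delegated.
class RuleParser {
    private final TermTokenizer tokenizer = new TermTokenizer();
    private final TermEvaluator evaluator = new TermEvaluator();

    int evaluate(String expression) {
        return evaluator.sum(tokenizer.tokens(expression));
    }

    int termCount(String expression) {
        return tokenizer.tokens(expression).length;
    }
}
```

Here SRP still looks stretched at the interface level, but at the implementation level each class has a single reason to change, and each small class can be tested on its own.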

Apply the Interface Segregation Principle (ISP).

ISP is a technique of creating an interface for each grouping of methods in a big class that a particular set of clients uses.
When a class is very large, rarely do all its clients use all of its methods.
Particular clients tend to use different groupings of methods.
If we create an interface for each of these groupings, then each client can see the 'big class' through that particular interface.
This helps to hide information and also decreases dependency in the system.
The clients no longer have to recompile whenever the big class does.
When there are interfaces for particular sets of clients, code can be moved from the big class to a new class that uses the original class.
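
A minimal Java sketch of the idea, with hypothetical names throughout (ReservationSystem as the big class, FrontDesk and OccupancyReport as its clients):

```java
import java.util.ArrayList;
import java.util.List;

// One narrow interface per group of clients.
interface ReservationBooking {
    void book(String guest);
}

interface ReservationReporting {
    int bookedCount();
}

// The big class implements every interface; its clients see only their own slice.
class ReservationSystem implements ReservationBooking, ReservationReporting {
    private final List<String> guests = new ArrayList<>();

    @Override
    public void book(String guest) {
        guests.add(guest);
    }

    @Override
    public int bookedCount() {
        return guests.size();
    }

    // Not reachable through either interface.
    public void purgeAll() {
        guests.clear();
    }
}

// This client depends on booking only.
class FrontDesk {
    private final ReservationBooking booking;

    FrontDesk(ReservationBooking booking) {
        this.booking = booking;
    }

    void checkIn(String guest) {
        booking.book(guest);
    }
}

// This client depends on reporting only.
class OccupancyReport {
    private final ReservationReporting reporting;

    OccupancyReport(ReservationReporting reporting) {
        this.reporting = reporting;
    }

    String render() {
        return "Booked: " + reporting.bookedCount();
    }
}
```

Because FrontDesk and OccupancyReport know only their interfaces, the implementation behind ReservationBooking could later move into a smaller class without touching either client.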

Be careful, applying ISP refactoring is harder than it sounds.

More methods must be exposed in the public interface of the original big class, so that the new class has access to everything it needs to work.
Client code has to be changed to use the new class rather than the old big one.
To do so safely, tests must be around those clients.
However, refactoring using ISP helps to whittle away at the interface of a big class.

6). When All Else Fails, Do Some Scratch Refactoring

If you are having a lot of trouble seeing responsibilities in a class, do some "scratch refactoring".
Remember the things seen when you scratch are not necessarily the things you'll end up with when you refactor.

7). Focus on the Current Work

Pay attention to what (the new feature, bug fix, etc) you have to do right now.
If you are providing a different way of doing anything, you might have identified a responsibility that you should extract and then allow substitution for.
It is easy to become overwhelmed by the number of distinct responsibilities you can identify in a class.

The changes that you are currently making are telling you about some way that the software can change.
Often just recognizing that way of changing is enough to see the new code you write as a separate responsibility.

8). Other Techniques

Read more books about design patterns.
Look at other programmers' code. Take some time to browse and see how other people are doing things.
Look at open-source projects.
Pay attention to how classes are named and the correspondence between class names and the names of methods.
Over time you'll get better at identifying hidden responsibilities and start to see them when you browse unfamiliar code.

Moving Forward After Identifying all the Responsibilities in a Big Class

After identifying all the responsibilities in a big class, how do you move forward? Should you take a week and start to whack at the big classes in the system? Should you break them down into little bits? To answer these questions use the following strategies and tactics:

1). Strategies

A large 'refactoring binge' usually destabilizes the system for a little while (even with unit tests in place).

If it is early in the release cycle, then a refactoring binge can be fine.
But don't let the bugs it introduces dissuade you from other refactoring.

The best approach to breaking down big classes is to:

Identify the responsibilities.
Ensure all team members understand the responsibilities.
Break down the class on an as-needed basis. This helps to spread out the risk of the changes.

2). Tactics

Often when breaking up large classes, at the beginning SRP can only be applied at the implementation level.

In these cases, simply start by extracting classes from the big class and delegating to them.

Introducing SRP at the interface level requires more work.

The clients of the class must change.
Tests must be written for them.

The tactics to extract classes from a big class depend on certain factors:

How easy is it to get tests around the methods that could be affected?
Take a look at the class and list all of the instance variables and methods that you'll have to move.
That list should give you an idea of what methods you should write tests for.

If you can get tests in place, you can extract class in a very straightforward way using Extract Class refactoring.
If you are not able to get tests in place, then move forward using a more conservative approach, outlined in the following steps:

Identify a responsibility that you want to separate into another class.
Figure out whether any instance variables will have to move to the new class.

If so, move them to a separate part of the class declaration (the "Moving Section"), away from the other instance variables.

For each whole method that is to move to the new class, extract its body into a new method.

The name of each new method should be the same as the old name, but with a unique common prefix, e.g. "MOVING".
Make sure to "preserve signatures" for these methods.
Put the extracted methods in that section of the class next to the variables you are moving.

If only parts of the methods should be moved, then extract them from the original methods.

Use the prefix MOVING again for their names.
Put them in the separate "moving section" of the class.

Review the "Moving Section" of the class containing variables and methods to be moved.

Do a text search of current class and subclasses to make sure that none of the variables that you are going to move is used outside of the methods you are going to move.
DO NOT depend on the compiler, because of problems with variable shadowing.

Move all instance variables and methods in the "moving section" directly to the new class.

Create an instance of the new class in the old big class.
Rely on the compiler to find places where the moved methods have to be called on the new instance rather than the old class.

After completing the move, and once the code compiles, remove the prefix "MOVING" from the method names.

Lean on the compiler to navigate to the places where you need to change the names.
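
The steps above can be sketched in Java as follows. All names are hypothetical (ReportPrinterInProgress, ReportPrinter, PageLayout), and the example is far smaller than the tangled code this procedure is meant for. The first class shows the intermediate state with its "moving section"; the last two show the result after the move and the rename.

```java
// Intermediate state: still one class, with the moving section set apart.
class ReportPrinterInProgress {
    private String title = "Q1";

    // ---- MOVING section: everything in here is headed for the new class ----
    private int pageWidth = 20;

    // Body of center() extracted whole, signature preserved, name prefixed.
    String MOVINGcenter(String text) {
        int pad = Math.max(0, (pageWidth - text.length()) / 2);
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < pad; i++) {
            line.append(' ');
        }
        return line.append(text).toString();
    }
    // ---- end of MOVING section ----

    // The original method stays in place and delegates to the extracted body.
    String center(String text) {
        return MOVINGcenter(text);
    }

    String header() {
        return center(title);
    }
}

// After the move: the moving section has become a class and the prefix is gone.
class PageLayout {
    private int pageWidth = 20;

    String center(String text) {
        int pad = Math.max(0, (pageWidth - text.length()) / 2);
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < pad; i++) {
            line.append(' ');
        }
        return line.append(text).toString();
    }
}

class ReportPrinter {
    private String title = "Q1";
    private final PageLayout layout = new PageLayout(); // instance of the new class

    String center(String text) {
        return layout.center(text); // the call site the compiler pointed to
    }

    String header() {
        return center(title);
    }
}
```

Keeping center() in the original class with its original signature means no caller has to change during the move.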

These steps are very involved, but necessary if you are in very complex and tangled code and you want to extract classes without tests in place.
Extracting classes without tests can cause:

Subtle bugs related to inheritance (and method overriding).
If you move a method that overrides another method in a base class, the compiler will not complain. Callers of that method on the original big class will simply call the method with the same name in the base class.
A variable in a subclass can hide a variable with the same name in the superclass, so moving the variable from the subclass will make the hidden variable visible.

To get around method override problems, don't move the original method at all. Create new methods by extracting only the body, and leave the actual method in place in the big class.
To get around variable shadowing, do a manual text search for uses of the variables before you move them. Above all, be careful.
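
Both hazards can be reproduced in a few lines of Java. The classes below are hypothetical and exist only to show that every variant compiles, even though two of them silently change behaviour.

```java
class Task {
    String describe() {
        return "generic task";
    }
}

// Before the move: the big subclass overrides describe().
class BigScheduledTask extends Task {
    @Override
    String describe() {
        return "scheduled task";
    }
}

// The new class that receives the moved code.
class TaskDescriber {
    String describe() {
        return "scheduled task";
    }
}

// Careless move: the override is gone, so callers silently get Task.describe().
class BrokenScheduledTask extends Task {
    final TaskDescriber describer = new TaskDescriber();
}

// Safer: the override stays in place and only its body has moved.
class SafeScheduledTask extends Task {
    private final TaskDescriber describer = new TaskDescriber();

    @Override
    String describe() {
        return describer.describe();
    }
}

// Shadowing: the subclass field hides the superclass field of the same name.
class BaseConnection {
    protected int timeoutSeconds = 30;
}

class TunedConnection extends BaseConnection {
    protected int timeoutSeconds = 5;

    int effectiveTimeout() {
        return timeoutSeconds; // reads the subclass field
    }
}

// After moving the subclass field away, the same expression reads the base field.
class TunedConnectionAfterMove extends BaseConnection {
    int effectiveTimeout() {
        return timeoutSeconds; // still compiles, now reads BaseConnection's field
    }
}
```

Neither BrokenScheduledTask nor TunedConnectionAfterMove produces a compiler error, which is why the text search and the "leave the original method in place" rule matter when there are no tests.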

When breaking down a large class into smaller, testable classes:

Be careful not to get too over-ambitious.
Remember the existing structure in place in the big class works.

It supports the functionality.
It just might not be tuned towards moving forward and adding new features.

Sometimes the best thing that you can do is formulate a view of how a large class is going to look after refactoring and then just forget about it.

You did it to discover what is possible.

Be sensitive to what is there and move accordingly: not necessarily toward an ideal design, but at least in a better direction.

REFERENCES:

Working Effectively with Legacy Code by Michael Feathers

DRPowell Apps - Great Mind, Original Ideas

Saturday, March 15, 2014