Inheriting a
Sloppy Codebase

A Practical Guide to Wrangling Chaotic Code

Casey Kinsey

@cordiskinsey

github.com/ckinsey

linkedin.com/in/caseykinsey

heapsortjobs.com/casey

hirelofty.com

http://github.com/loftylabs

heapsortjobs.com

@heapsortapp

Get These Slides

hirelofty.com/djangocon14

Where does sloppy
code come from?

  1. Inexperience - Not our topic today.
  2. Emphasis on development speed
  3. Lack of long-term ownership

Rapid prototyping is a lie.

Or at least what we're calling "prototyping".

In the rest of the
engineering world



A prototype is a proof of concept. Prototypes never see production.

In the Software
Engineering World



Prototypes go into production all the time.

You're Gonna
Need Some Tools

A Good Editor/IDE

  • Code navigation
  • Code introspection
  • Refactoring tools



PyCharm, PyDev (Eclipse), emacs, and more

Debuggers

(Django Debug Toolbar Doesn't Count)

There will be much debugging. If you're not familiar with a Python debugger, choose one and get comfy with it.



pdb, ipdb, pudb, and many more.
Bonus Points: Have one integrated with your IDE.
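For example, a minimal pdb session might look like this (the function and its arguments are purely illustrative):


import pdb

def apply_promotion(order, promo):
    """Illustrative function; pause right before the interesting math."""
    pdb.set_trace()  # execution stops here; inspect locals, 'n' to step, 'c' to continue
    return order.total - promo.discount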

Coverage.py

Super easy to use:


$ coverage run manage.py test

Here's a basic .coveragerc for Django:


[run]
source = .
omit = *migrations*,*wsgi.py,*settings.py

Generate a report:


$ coverage report         		        

Name                                 Stmts   Miss  Cover
--------------------------------------------------------
campaign/__init__                        0      0   100%
campaign/admin                          27      5    81%
campaign/forms                          38      2    95%
campaign/middleware                      0      0   100%
campaign/models                         89      2    98%
campaign/tests                         139      0   100%
campaign/views                          99      6    94%
--------------------------------------------------------
TOTAL                                  392     15    96%

PyLint and pep8

  • PyLint helps clean up poorly structured code. Very useful during refactors.
  • pep8 will keep your code legible.
    It will also make you irreversibly anal.



Bonus Points: Have these integrated with your IDE.

Cool. You've just gotten
your first look at the sloppy codebase.

Don't. Touch. Anything.

Everything is Hot Lava

You're Gonna Do Some Prep-work

Do This First


$ git checkout -b original_project master
                        

Now, when you want to see the way the code originally worked:


$ git diff original_project master -- myapp/models.py                        
                        

And once everything is all cleaned up... Bragging rights:


$ git diff original_project..master --stat
                        

Fire It Up

The goal is to get the project running in its last known working configuration.

  • Create a virtual environment. But don't get attached to it.
  • If the project shipped with a dependency list, use it.
  • If the project shipped with tests, try to get them running.
  • Be sure to freeze your dependencies into requirements.txt if you added new ones.

You're Gonna Need
Test Coverage

We're not going to go too crazy here

  • Integration test coverage for high-level Django concepts.
  • Integration and/or unit test coverage for low-level business critical processes.

Integration Tests

For our purposes, an integration test is an end-to-end test of the Python/Django code.

Our goal is coverage, first and foremost. We want to catch landmines and see that refactors in one module do not unexpectedly affect another.

Integration Tests: An Example


class HeapsortIntegrationTests(HeapsortIntegrationTestCase):

    def test_home_anon(self):
        """
        Tests the homepage view works for anonymous users
        """

        # Unauthenticated
        self.client.logout()
        response = self.client.get(reverse('home'))

        self.assertEqual(response.status_code, 200)
        self.assertTemplateUsed(response, 'home.html')
                        

Unit Tests

Our goal is not full unit test coverage; much of it would be invalidated by the upcoming refactoring anyway.

Instead, we'll focus unit testing efforts on core components where accuracy is critical.

Unit Tests: An Example



def test_jobs_expire_after_30_days(self):
    """
    A job should no longer be returned by the manager when it expires
    """

    now = datetime.datetime.now()
    the_future = datetime.timedelta(days=30)

    datetime_mock = mock.Mock()

    with mock.patch.multiple('jobs.models', datetime=datetime_mock):

        datetime_mock.datetime.now.return_value = now

        # Create a job.  The job should be returned by the manager
        test_job = self._create_job("My Test Job")
        self.assertTrue(test_job in Job.active.all())

        # Fast-forward to... THE FUTURE
        datetime_mock.datetime.now.return_value = now + the_future
        self.assertFalse(test_job in Job.active.all())

                

Test Coverage: How much is enough?

  • Integration test coverage for all Django views and user paths, management commands, and 3rd party API integrations.
  • Unit test coverage for business critical processes and non-view logic.
    • Billing processes
    • Transactional communication
    • Asynchronous tasks
  • Coverage > 95% is ideal. No individual files left uncovered.

It's Time to
Start Hacking

There's No Telling
What You'll Find

The Patchwork Quilt of
Dependencies Pattern


    $ git diff --stat original_initial_product development -- requirements.txt
     requirements.txt | 135 +------------------------------------------------------------------
     1 file changed, 2 insertions(+), 133 deletions(-)
                        

All Dependencies Introduce Complexities

Whether to use a 3rd party module comes down to whether the ends justify the means.

  • Each dependency is a potential obstacle to upgrades.
  • Low-level Django extensions can conflict and make debugging a nightmare.
  • 3rd party modules installed from VCS make no stability or availability guarantees.

Trim the Fat

Remember that virtual environment you created?

Good. Now throw that shit away.

Install the absolute bare minimum of requirements you know you need. (Django, South, database drivers, APIs that drive core functionality)

Audit every remaining requirement one by one

Research each requirement and know exactly what it is, how it works, and why it is used. They will fall into one of three categories.

  1. Totally unnecessary, or even unused.
    • Remove these now.
  2. Potentially unnecessary, but requires refactoring to remove.
    • Make a judgement call.
  3. Necessary functionality for the project.

Test and Upgrade

Run the tests! If you missed something,
go back to the previous step.

Once your tests are passing, this is a good
time to upgrade packages.

Upgrade. Test. Repeat

The Monolithic App of Death Pattern

One app to rule them all.

Monoliths are an
organizational nightmare.

  • You cannot succinctly evaluate the
    functionality of any one component.
  • Nothing is portable.
  • Usually rife with
    from my_app import *

Refactoring Monoliths

  • Create a sane app structure.
  • Kill off 'import *' and let PyLint do its thing.
  • Migrate models to their new homes.
    • Everything else will follow.

Migrating Models Across Apps

If the project is pre-production, you can get away with automatic schema migrations.
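
If it's already in production and the data must survive, one common approach is to point the relocated model at its existing table, so the move itself is schema-neutral. A minimal sketch (the names are illustrative):


# campaigns/models.py -- the model's new home
from django.db import models

class Campaign(models.Model):
    name = models.CharField(max_length=255)

    class Meta:
        db_table = 'legacy_app_campaign'   # keep using the table the old app created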



The "Every Model is
an App" Pattern

"I like my INSTALLED_APPS to be a list of database tables."

The "Every Model is
an App" Project

This type of structure is another organizational nightmare,
for similar reasons to a monolith but due to an
opposite design pattern.

  • You cannot succinctly evaluate the functionality
    of any one component.
  • Nothing is portable.
  • You'll get very acquainted with:
    
    Traceback (most recent call last):
      ...
      File "/app1/foo.py", line 1, in <module>
        from app2 import bar
      File "/app2/bar.py", line 1, in <module>
        from app1 import foo
    ImportError: cannot import name foo   
                                

The "Every Model is
an App" Project

Follow the same steps as for a monolith. The challenges are the same: create an organized structure, migrate models into their new homes, and everything else follows.

This should be less work than a monolith,
as one of your apps will become the 'host' for others.

The "Receivers Everywhere!" Pattern

It's like a Hail Mary.

Signals/Receivers Are Useful

  • Signals decouple a model from the side effects
    it initiates.
  • They are especially useful for triggering
    side effects in 3rd party code.
  • Signals are great for maintaining data integrity
    (a minimal example follows this list).
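
A sketch of a legitimate receiver: keeping a profile row in sync with Django's User without touching the auth code. Profile and its app are illustrative, not from this project.


# profiles/models.py
from django.contrib.auth.models import User
from django.db import models
from django.db.models.signals import post_save
from django.dispatch import receiver

class Profile(models.Model):
    user = models.OneToOneField(User)

@receiver(post_save, sender=User)
def create_profile(sender, instance, created, **kwargs):
    # Data integrity: every User gets a Profile, no matter what code created the User.
    if created:
        Profile.objects.get_or_create(user=instance)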

... But Not Always

  • Signal receivers can hide important functionality
    from developers.
  • Signal receivers can become unnecessary
    abstraction between two models in the same application.
  • Signal receivers can implement over-zealous business logic.

Getting Rid of Superfluous
Signal Receivers

Be on the lookout for:

  • Signal receivers that mediate between two models
    that are already related in the database.
  • Signal receivers in which the instance simply
    operates on itself.
     
class MyModel(models.Model):
    # ...
    def save(self, *args, **kwargs):
        # Pre-save!
        self.some_property = True
        
        super(MyModel, self).save(*args, **kwargs)
        
        # Post-save!
        self.foreign_relation.some_property = True
        self.foreign_relation.save()
        
                            

Getting Rid of Superfluous
Signal Receivers

Be on the lookout for:

  • Signal receivers that implement transactional
    or business logic.

class UserReceiver(ModelReceiver):
    @staticmethod
    def pre_save(sender, **kwargs):
        instance = kwargs['instance']
        instance.notify = instance.pk is None
        
    @staticmethod
    def post_save(sender, **kwargs):
        instance = kwargs['instance']
        if instance.notify:
            NewUserEmail().send(extra_context={'user': instance,}, 
                to=constants.NOTIFY_EMAILS)
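
One way to refactor it, as a sketch: move the notification into the code path that actually creates the user, so the side effect is visible where it happens. This assumes the user is created through a form you control; NewUserEmail and constants are the project's own helpers from the receiver above.


from django import forms
from django.contrib.auth.models import User

class SignupForm(forms.ModelForm):
    class Meta:
        model = User
        fields = ['username', 'email']

    def save(self, commit=True):
        created = self.instance.pk is None
        user = super(SignupForm, self).save(commit=commit)
        if created:
            # The side effect now lives where users are created,
            # in plain sight of the next developer.
            NewUserEmail().send(extra_context={'user': user},
                                to=constants.NOTIFY_EMAILS)
        return user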
                                

The "Context Processors/Middleware Are the Coolest Thing Ever" Pattern

Taking DRY to the Max

... (reminds me of when I discovered list comprehensions in Python)


def publish_all():
    # NOT SURE WHY IM DOING THIS
    # BUT IT IS COOL AS HELL!
    [p.publish() for p in Post.objects.all()]                                    
                        

Context Processors

Context processors are widely abused and can quickly impair performance. Evaluate each context processor in the project and decide whether:

  • 1. The context processor is better suited as a templatetag.
    
    # in myapp.context_processors.py                                
    
    from myapp.utils import products_by_category
    def products(request):
        return {'products_by_category': products_by_category()}
                                    
    
    # in myapp.templatetags           
    
    from django import template
    from myapp.utils import products_by_category
    
    register = template.Library()
    
    @register.assignment_tag(takes_context=True)
    def get_products_by_category(context):
        return products_by_category()
                                    

Assignment Tags are effective here.

  • 2. The context processor logic should be refactored into relevant views.
    
    # in myapp.context_processors.py                                
    
    from myapp.utils import products_by_category
    def products(request):
        return {'products_by_category': products_by_category()}
                                    
    
    # in myapp.views
    
    from myapp.utils import products_by_category
    
    class MyView(DetailView):
    
        def get_context_data(self, **kwargs):
            context = super(MyView, self).get_context_data(**kwargs)
            context.update({'products_by_category': products_by_category()})
            
            return context
                                    

Middleware

Not as widely abused as context processors, but with potentially larger performance implications.

Do I need this logic to execute on every single request and/or response?

If not, you should engineer a way around it.
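
Often a decorator on just the views that need the behavior is enough. A minimal sketch (the names are illustrative, not from this project):


from functools import wraps
from django.shortcuts import render

def with_promo_banner(view_func):
    """Only views that opt in pay for this logic, instead of every request."""
    @wraps(view_func)
    def wrapper(request, *args, **kwargs):
        request.show_promo = True   # whatever the middleware used to compute
        return view_func(request, *args, **kwargs)
    return wrapper

@with_promo_banner
def landing_page(request):
    return render(request, 'landing.html')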

The Undocumented Black Box Full of Black Boxes

# Who put all these hashtags in my code?

The only thing worse than code with no comments...


def get_classification(self):
    if self.classification and len(self.classification) == 2:
        level = CLASSIFICATIONS.get(self.classification[0], '')
        class_level = self.classification[1]
        if class_level == 'U':
            class_level = ''
        else:
            class_level = 'class %s '%class_level
        string = '%s%s'%(class_level, level)
        return dict({'level':level, 'class': class_level, 'string':string,})
    else:
        return None
                        

... is code with just enough comments to piss you off


def get_classification(self):
    if self.classification and len(self.classification) == 2:
        level = CLASSIFICATIONS.get(self.classification[0], '')
        class_level = self.classification[1]
        if class_level == 'U':
            class_level = ''
        else:
            class_level = 'class %s '%class_level
        string = '%s%s'%(class_level, level)
        # returns dict
        return dict({'level':level, 'class': class_level, 'string':string,})
    else:
        return None
                        

There's No Easy Way Out of Undocumented Code

You could try to brute-force your way through it, but that's impractical.

Document as You Go

Make it policy: If you edit the code, you document the code. If you have to dissect code to understand it, comment what you find.

Document as You Go

  • Every module, function, class, and method
    gets a docstring. No exceptions!
    (See the sketch after this list.)
    • These are the workhorses of your
      documentation effort.
  • Break apart long, multi-step logical
    branches with comments.
  • If a statement is difficult to read or has some
    sort of ambiguity, leave a comment!
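
A sketch of the kind of docstring that pays off later (the function and its callers are illustrative):


import datetime

def is_expired(job, days=30):
    """
    Return True if ``job`` was posted more than ``days`` days ago.

    Called from the active-jobs manager and the nightly cleanup task,
    so the cutoff logic lives here and nowhere else.
    """
    cutoff = datetime.datetime.now() - datetime.timedelta(days=days)
    return job.posted < cutoff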

Other Gotchas to Watch For

  • Hard-coded URLs
    • Try to chase these down before refactoring
      (see the sketch after this list).
  • Broad Try/Except Blocks
    • try: except: pass <-- This ain't @PHP
  • VCS Requirements
    • Fork 'em! Whoever owns the project
      should own the repo.
  • Misnamed Concepts
    • Bite the bullet and rename them.
  • Premature Configuration
    • Pre-configured code overcomplicates
      deployments.
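
On hard-coded URLs, the usual fix is to lean on named URL patterns. A minimal sketch ('job_detail' is an illustrative URL name):


from django.core.urlresolvers import reverse   # moved to django.urls in later Django versions
from django.http import HttpResponseRedirect

def publish(request, pk):
    # Before: return HttpResponseRedirect('/jobs/%s/' % pk)
    return HttpResponseRedirect(reverse('job_detail', args=[pk]))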

Key Concepts

  • Tests first.
  • Dependencies introduce complexity.
  • Organization is important.
  • Watch for hidden signals and receivers.
  • Audit context processors and middleware.
  • Document as you go.

Thank You!

http://hirelofty.com/djangocon14



Casey Kinsey

casey@hirelofty.com

@cordiskinsey