HTML Tidy bindings for Python (PyTidyLib)

PyTidyLib is a Python wrapper for HTML Tidy, a tool that converts most invalid (X)HTML markup into valid markup. For example, it will correct unescaped ampersands, unclosed tags, missing elements, and missing attributes. HTML Tidy is highly configurable: it can output HTML or XHTML and perform other functions, such as converting named entities to numeric entities (named entities work only with an HTML or XHTML doctype, whereas numeric entities also work in generic XML data).

The importance of web standards and validating HTML has been covered in books such as Zeldman's Designing with Web Standards, Cederholm's Web Standards Solutions, Murphy & Persson's HTML and CSS Web Standards Solutions: A Web Standardistas' Approach (all recent editions), and others. HTML Tidy is not a replacement for web-standards knowledge. However, markup you run through it will very probably validate, and it is well suited to cleaning up input from third-party sources and, in some cases, your own code.

PyTidyLib is released under the MIT license.

Recent changes

0.2.1: Supports custom unicode subclasses, not just unicode objects. Better unit-test coverage. Thanks: Greg Phillips.

0.2.0: Now supports 32- and 64-bit Windows, with the proper DLL. (Also supported are Linux & BSD platforms including OS X, and 64-bit versions of same.) Major documentation update and minor cleanups. Thanks: Kevin A.

Installation using pip or setuptools

You will need to download HTML Tidy source or binaries for your platform either from the HTML Tidy web site or, for 32- or 64-bit Windows, from int64.org. Then:

pip install pytidylib

Or:

easy_install pytidylib
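
Because PyTidyLib depends on the HTML Tidy shared library being installed separately, it can help to check that the library is visible before importing tidylib. The following is a minimal sketch, assuming the library is registered under the short name "tidy" (as in libtidy.so or libtidy.dylib). The helper name libtidy_available is hypothetical and not part of PyTidyLib, and the lookup used here may differ from the one PyTidyLib itself performs.

```python
from ctypes.util import find_library


def libtidy_available(name="tidy"):
    """Return True if the platform's library lookup can locate `name`.

    `name` is the short library name without the "lib" prefix or a
    file extension; "tidy" is an assumption about how HTML Tidy is
    installed on this machine.
    """
    return find_library(name) is not None


if __name__ == "__main__":
    if libtidy_available():
        print("HTML Tidy shared library found")
    else:
        print("HTML Tidy shared library not found; install it before importing tidylib")
```

A negative result is only a hint: find_library searches the standard system locations, so a library installed somewhere unusual may still load.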

Usage example

from tidylib import tidy_document

# Returns the cleaned-up markup and a string of Tidy warnings and errors.
document, errors = tidy_document('''<p>f&otilde;o <img src="bar.jpg">''',
    options={'numeric-entities': 1})
print(document)
print(errors)
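
The errors value returned by tidy_document is a single string with one message per line. The sketch below shows one way to break such a report into structured entries. It assumes each line has the form "line N column M - Level: message". The helper name parse_tidy_errors is hypothetical and not part of PyTidyLib, and the sample report is hand-written for illustration rather than captured from a real run.

```python
import re

# Assumed shape of one HTML Tidy message line, for example:
# "line 1 column 1 - Warning: missing <!DOCTYPE> declaration"
_MESSAGE = re.compile(r"^line (\d+) column (\d+) - (\w+): (.*)$")


def parse_tidy_errors(errors):
    """Split a Tidy report string into (line, column, level, message) tuples.

    Lines that do not match the assumed format are skipped.
    """
    parsed = []
    for raw in errors.splitlines():
        match = _MESSAGE.match(raw.strip())
        if match:
            line, column, level, message = match.groups()
            parsed.append((int(line), int(column), level, message))
    return parsed


# Hand-written sample report (illustrative, not real Tidy output).
sample = (
    "line 1 column 1 - Warning: missing <!DOCTYPE> declaration\n"
    "line 1 column 13 - Warning: <img> lacks \"alt\" attribute\n"
)

for entry in parse_tidy_errors(sample):
    print(entry)
```

In real use, the string passed to parse_tidy_errors would be the second value returned by tidy_document.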

Feedback

Please direct all suggestions or responses to Jason Stitt at js@jasonstitt.com.

Comments

chandesh mehta: Running
from tidylib import tidy_document
gives me the error below.

OSError: Could not load libtidy using any of these names: libtidy,libtidy.so,libtidy-0.99.so.0,cygtidy-0-99-0,tidylib,libtidy.dylib,tidy

What should I do?
chandesh mehta: I got the answer.

Installing libtidy-dev fixes the error.
