Unstructured Core Library

The unstructured library is designed to help preprocess and structure unstructured text documents for use in downstream machine learning tasks. Examples of documents that can be processed using the unstructured library include PDF, XML, and HTML documents.

Library Documentation


Installation

Instructions on how to install the unstructured library on your system.

Getting Started

Check out this section to learn about basic workflows in unstructured.


Bricks

Learn more about partitioning, cleaning, and staging bricks, including advanced usage patterns.
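
To make the three brick categories concrete, here is a minimal conceptual sketch written with only the Python standard library. It is not the unstructured API: the names `partition`, `clean`, and `stage` are hypothetical stand-ins chosen for illustration, and the real bricks cover many more document types and cases. The sketch assumes simple, non-nested HTML blocks.

```python
import re
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK_TAGS = HEADING_TAGS | {"p", "li"}


class _BlockCollector(HTMLParser):
    """Collects the raw text of heading, paragraph, and list-item blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # (tag, raw_text) pairs in document order
        self._tag = None   # block tag currently open, if any
        self._parts = []   # text fragments seen inside that block

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._tag, self._parts = tag, []

    def handle_data(self, data):
        if self._tag is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            self.blocks.append((tag, "".join(self._parts)))
            self._tag = None


def partition(html):
    """Hypothetical partitioning step: split raw HTML into typed elements."""
    parser = _BlockCollector()
    parser.feed(html)
    parser.close()
    return [
        {"type": "Title" if tag in HEADING_TAGS else "Text", "text": raw}
        for tag, raw in parser.blocks
    ]


def clean(text):
    """Hypothetical cleaning step: collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def stage(elements):
    """Hypothetical staging step: emit plain records for a downstream tool."""
    return [{"type": e["type"], "text": clean(e["text"])} for e in elements]


if __name__ == "__main__":
    sample = "<h1>Quarterly  Report</h1>\n<p>Revenue   grew\n in Q3.</p>"
    for record in stage(partition(sample)):
        print(record)
```

The steps are kept as separate functions so that each can be swapped or extended independently, which mirrors the idea of composing small bricks into a pipeline.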


Examples

Examples of other types of workflows within the unstructured package.


Integrations

We make it easy for you to connect your output with other popular ML services.