For an automatic system to answer queries like 'birds with short beaks and blue wings' or 'planes with engines on the nose', one would expect it to rely on underlying representations of these categories in terms of their parts and attributes. However, building such models is challenging because exhaustively labeling these parts and attributes can be very expensive. In this talk I'll present two projects that aim to discover parts and attributes from weak annotations that can be collected effectively via crowd-sourcing. The first aims to discover parts that represent discriminative patterns from sparse landmark annotations. These parts, which we call 'poselets' and which include, for example, faces of humans or wheels of bicycles, can serve as a basis for a range of recognition tasks such as detection, segmentation, pose estimation and attribute recognition, outperforming the state of the art on some of them. I'll also describe some recent work that simplifies this annotation task even further and extends it to categories for which landmarks are hard to define. The second project aims to discover describable attributes suitable for fine-grained discrimination. We propose a novel annotation task in which annotators are asked to describe the differences between images, and we develop a structured topic model to analyze these descriptions. The output of this model is a grouping of words into clusters of parts and modifiers, together with relations between clusters that represent attributes.