Discussion about this post

Orkun:

*(children cannot be copy/pasted)* peak daveshap experience

thanks for the insights as always!

Ronald Dupuis:

Humans naturally integrate multiple types of sensory information and experiences:

- Multi-modal learning: As you noted, a child experiences dogs through sight, sound, touch, smell, and even interactions - they get a rich, multi-sensory understanding of what "dog" means.
- Context: Children see dogs in different situations - at home, in parks, on walks - learning about their behaviours and roles in human life.
- Temporal understanding: They observe how dogs move, play, and interact over time, not just static snapshots.
- Social learning: They also learn about dogs through language, stories, and how other people interact with dogs.

Current AI systems are often trained on just one type of data (like images) in isolation. While there is progress in multi-modal AI that can process both images and text, or video and audio together, these systems still don't integrate information in the rich, embodied way that humans do naturally from birth.
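For a concrete sense of what that image-and-text progress looks like, here is a minimal sketch of contrastive image-text matching, assuming the Hugging Face transformers library and the public openai/clip-vit-base-patch32 checkpoint (the image URL is just a placeholder). It scores candidate captions against a photo by embedding both modalities into a shared space - useful, but still far from the embodied, multi-sensory learning described above.

```python
# Minimal sketch: image-text matching with a CLIP-style model.
# Assumes the Hugging Face `transformers` library and the public
# "openai/clip-vit-base-patch32" checkpoint; the image URL is a placeholder.
from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder image to score captions against.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The model embeds both the text and the image into a shared space,
# then compares them to score how well each caption matches the photo.
inputs = processor(
    text=["a photo of a dog", "a photo of a cat"],
    images=image,
    return_tensors="pt",
    padding=True,
)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # one probability per caption
print(probs)
```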

This "embodied cognition" - learning through multiple senses and physical interaction with the world - might be one of those key algorithmic differences the tweet is talking about. It could help explain how humans can build such robust understanding from relatively few examples, compared to AI systems trained on millions of isolated data points.

