Rollins School of Public Health at Emory University
Instructor: Dr. Anthony G. Francis, Jr.
Lecture 13: Robotics and Vision
Robotics is the development of general-purpose machines. Unlike
industrial robotics, which focuses on manipulator design and
control theory, AI robotics focuses on sensing, thinking and
acting in mobile autonomous systems. Robotic architectures
specify how the modules that perform these tasks are organized;
architectural paradigms have evolved from deliberative through
reactive approaches into a hybrid, multilayered approach.
Vision is one of the most important robotic senses, involving image
processing and scene modeling. Robotic thinking originally relied on
deliberative means-ends analysis but now also incorporates techniques
from operations research. Robotic action has moved from planned
sequences of actions toward reactive control. Modern autonomous
robots are used for everything from scientific exploration to
cognitive experimentation to pure entertainment.
Outline
- What is robotics?
- Robotic Architecture
- Sensing: Vision
- Thinking: Planning
- Acting: Reactive
- Learning: Decision Theory
- State of the Art: Cog, Asimo, Sojourner
Readings:
- Artificial Intelligence: Chapters 6 and 10 (21, 22 optional)
- Machines Who Think: Chapters 10 and 11
Robotics
What is robotics?
- The design of general purpose machines
- Multifunction
- Programmable
- Often perform human tasks in a human-like way
- Two major branches
- Industrial robotics for manufacturing
- Autonomous robotics for space exploration
Types of Robots
- Industrial Robotics
- Programmable Arms
- Mobile Carts
- Teleoperated Systems
- Autonomous Robotics
- Research Platforms
- Planetary Rovers
- Unmanned Autonomous Vehicles (UAVs)
- Spinoffs
- Softbots: web agents, chatterbots
- Immobots: building HVAC systems, HAL
- Graphics: orcs, spectators
- Games: NPCs, opponents
Robotic Architectures
Information Processing in Robotics
- Sense - Extract information from the world
- Think - Decide what to do in the world
- Act - Make changes to the world
- Learn - Improve based on experience
Robotic Architectures
- The Hierarchical Paradigm
- Examples: Shakey
- Stages in the paradigm:
- Sense the world
- Plan your actions
- Act on your plan
- Relies on a world model representation
- Problems
- Planning is too slow for dynamic worlds
- World model is hard to keep accurate and up to date
- Major Contributions
- STRIPS operators
- World modeling
- The Reactive Paradigm
- Examples: Grey Walter's tortoises, Braitenberg vehicles, Rodney Brooks' work
- Stages in the paradigm:
- Act based on your sensations
- No explicit representation of the world!
- Problems
- No foresight - easy to get trapped
- Hard to program
- Major Contributions
- Schema-based reactive control
- Subsumption architectures
- The Hybrid Paradigm
- Examples: Most modern robots
- Stages in the paradigm:
- Active behaviors react to sensations
- Dynamic controller configures behaviors based on plan
- Planner updates plan based on changes in world
- Combines best of fast reaction and deliberative foresight
- Problems
- No universal consensus on the "right" architecture
- Learning is still being integrated into the paradigm
- Major Contributions
- Reactive Action Packages (RAPs)
- Three Level Architectures (TLA)
- The Horizon: Learning and Modeling
Sensing
Issues
- The Traditional Senses
- Vision - one of the most important robot senses
- Hearing - also important
- Touch - currently very simple
- Smell - not integrated into robotics
- Taste - little research
- Other Senses used in Robotics
- Proprioception - position of body parts
- Kinesthetics - dead reckoning based on movement
- Location senses - e.g., global positioning system (GPS)
- Ranging senses - e.g., sonar, lidar, occasionally radar
- Challenges of Sensation
- Dataflow: huge amounts of input data
- Processing: complicated algorithms
- Noise: grainy, low resolution cameras
- Errors: sonar holes, lidar spikes
- Ghosts: reflections (sonar or otherwise)
- Reality: shadows, textures, fog
Case Study: Vision
- Stages in Vision
- Image Processing
- Digitization
- Feature Extraction
- Scene Analysis
- Object Recognition
- Motion Detection
- Marr's Model of Vision
- 2D Image - the image as intensity values
- 2½D Sketch - extract the edges and regions
- 3D Model - geometric model of objects in scene
- Extracting the 2½D Sketch
- Eliminating Noise
- Images may contain lots of irrelevant details
- "Smoothing" - averaging over pixes - makes algorithms more robust
- Finding the edges
- Edges:
- Capture strong transitions in color, intensity or texture
- Do not exist in nature - must be extracted from the image
- First derivative of an image shows the peaks where things change fast
- Second derivative shows the zero crossings that represent edges
- Convolution: Implementing smoothing and edge-finding
- Apply a pixel-combining operator over a whole image
- Convolution is extremely processing intensive!
- Neuromorphic chips under development promise improvements
- Common Image Operators
- Identity - I(x,y) - the original image
- Gaussian - G(x,y) - the "blur" operator
G(x,y) = (1/(2πσ²))*e^(−(x²+y²)/(2σ²))
- Laplacian - the "edge" operator
L(x,y) = ∂²I/∂x² + ∂²I/∂y²
- Performs the second derivative in all directions
- Marks edges as zero crossings in the result
- Still sensitive to noise
- Sombrero - effective edge+blur operation
S(x,y) = ∂²G(x,y)/∂x² + ∂²G(x,y)/∂y²
- Zero-crossing - find the edges in an image
MH(x,y) = zero crossings of (S∗I)(x,y)
- AKA "Marr-Hildreth Operator"
- Combines smoothing, edge enhancement, and edge extraction
- Similar to processing that goes on in retinas!
- Thresholding: Accept values only over a certain cutoff
- Other, more sophisticated operators exist; the basic ones are sketched in code below
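- A minimal NumPy sketch of these operators (assuming a grayscale image stored as a 2-D float array; the function names and kernel sizes are illustrative, not a standard library API):

import numpy as np

def convolve2d(image, kernel):
    # Apply a pixel-combining operator over the whole image (naive convolution;
    # for symmetric kernels like these, correlation and convolution coincide).
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros_like(image, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kernel)
    return out

def gaussian_kernel(size=5, sigma=1.0):
    # G(x,y) = (1/(2πσ²))·exp(−(x²+y²)/(2σ²)), normalized so the weights sum to 1
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return g / g.sum()

# Discrete Laplacian: second derivative in all directions
laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

def marr_hildreth(image, sigma=1.0):
    # Sombrero effect: smooth first, then take the second derivative,
    # then mark sign changes (zero crossings) between neighbors as edges.
    smoothed = convolve2d(image, gaussian_kernel(sigma=sigma))
    second = convolve2d(smoothed, laplacian)
    sign = np.sign(second)
    edges = np.zeros(image.shape, dtype=bool)
    edges[:-1, :] |= sign[:-1, :] != sign[1:, :]
    edges[:, :-1] |= sign[:, :-1] != sign[:, 1:]
    return edges

# Usage: edges = marr_hildreth(np.random.rand(64, 64))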
- Detecting Nonlocal Features
- Region Detection: combine pixels that are similar into areas
- Line Detection: aggregate zero crossings into lines
- Texture Detection: look for repeated patterns
- Extracting the 3D Model
- Labeling Intersections to Model Surfaces
- Building World Models out of Generalized Cylinders
- The Object Library
- More Advanced Vision Models
- Feedback between stages improves performance
- Stereo vision combines information from multiple sources
- "Cognitive" and "4D" vision models exploit optical flow information
- Dynamic gaze with a high-resolution fovea can resolve ambiguities
Thinking
Issues
- Dynamic worlds - cannot sit and think forever
- Unreliable inputs - cannot trust your sensors
- Unreliable outputs - cannot trust your actions
- Unclear which actions are even appropriate
Case Study: Planning
- Early Planning: The General Problem Solver
- Planning Method: means-ends analysis
- Knowledge: a difference table that pairs problems with the operations that resolve them
- Type of difference resolved
- Type of operation
- Precondition of operation
- Add list for conditions added
- Delete list for differences removed
- Example (sketched in code below):
Difference | Operation | Precondition | Delete | Add |
< 100 yards | Walk | At Start, Not Raining | At Start | At Destination |
< 100 miles | Drive | At Start, Have Car | At Start | At Destination |
< 100 miles | Taxi | At Start, Have Money | At Start, Have Money | At Destination |
> 100 miles | Fly | At Start, Have Ticket | At Start, Have Car, Have Ticket | At Destination |
- Issues: Sussman anomaly was too hard to solve
- Initial state: block B on floor, block C on block A
- Goal state: block A on block B on block C
- Problem: MEA would put B on C, then have to take B back off C and C off A to clear A, undoing its own work
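- A minimal Python sketch of the difference-table lookup for the travel example above (the thresholds and fact names are illustrative, not GPS's actual data structures):

# Each row: a test on the difference (distance), the operation that resolves it,
# and the operation's preconditions.
DIFFERENCE_TABLE = [
    (lambda miles: miles < 0.06, "Walk",  {"At Start", "Not Raining"}),   # ~100 yards
    (lambda miles: miles < 100,  "Drive", {"At Start", "Have Car"}),
    (lambda miles: miles < 100,  "Taxi",  {"At Start", "Have Money"}),
    (lambda miles: miles >= 100, "Fly",   {"At Start", "Have Ticket"}),
]

def choose_operation(distance_miles, facts):
    # Means-ends analysis step: pick the first operation whose difference test
    # matches and whose preconditions already hold; otherwise GPS would recurse
    # on the unmet preconditions as new subgoals.
    for matches, operation, preconditions in DIFFERENCE_TABLE:
        if matches(distance_miles) and preconditions <= facts:
            return operation
    return None

print(choose_operation(250, {"At Start", "Have Ticket"}))  # Fly
print(choose_operation(30,  {"At Start", "Have Money"}))   # Taxi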
- Linear Planning: STRIPS
- Planning Method: search in state space
- Knowledge: STRIPS operators, which encapsulate the knowledge in the earlier difference table
- Operator name and variables
- Precondition list
- Add list for conditions added
- Delete list for differences removed
- Predicate calculus can be used in the precondition, add and delete lists
- Example (sketched in code below):
put block1 block2
pre: clear block1
clear block2
add: on block1 block2
del: clear block2
- Issues:
- Still could do a lot of extra work
- Expensive to come up with the best plan
- Made unnecessary commitments to action ordering
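- A minimal Python sketch of applying a STRIPS operator, assuming states are sets of ground facts; the operator mirrors the put example above and the helper names are illustrative:

# A STRIPS operator: preconditions, add list, delete list.
PUT = {
    "name": "put block1 block2",
    "pre": {"clear block1", "clear block2"},
    "add": {"on block1 block2"},
    "del": {"clear block2"},
}

def applicable(op, state):
    # The operator applies only when every precondition holds in the state.
    return op["pre"] <= state

def apply_operator(op, state):
    # Remove the delete list, then add the add list.
    if not applicable(op, state):
        raise ValueError(op["name"] + ": preconditions not satisfied")
    return (state - op["del"]) | op["add"]

state = {"clear block1", "clear block2", "on block2 table"}
print(apply_operator(PUT, state))
# {'clear block1', 'on block2 table', 'on block1 block2'}

- A linear planner searches the space of states reachable by such applications for one that satisfies the goal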
- Nonlinear planning: NONLIN, UCPOP
- Planning Method: search in plan space
- Knowledge: STRIPS operators
- Store partial plans as lists of dependencies based on operations
- Example: If you need to get car keys, wallet and ticket to fly to the airport, it doesn't matter which you pick up first
- Issues:
- Still uses a low-level grain of detail
- Does not address the issue of unreliable action
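- A minimal Python sketch of the partial-order idea: a plan is stored as steps plus ordering constraints, and any linearization that respects the constraints is valid (the step names are illustrative):

from itertools import permutations

steps = ["get keys", "get wallet", "get ticket", "drive to airport", "fly"]
# Ordering constraints (before, after); the three pick-ups are mutually unordered.
orderings = [
    ("get keys",   "drive to airport"),
    ("get wallet", "drive to airport"),
    ("get ticket", "drive to airport"),
    ("drive to airport", "fly"),
]

def consistent(sequence, orderings):
    # A total order is a valid linearization if it respects every constraint.
    position = {step: i for i, step in enumerate(sequence)}
    return all(position[a] < position[b] for a, b in orderings)

valid = [seq for seq in permutations(steps) if consistent(seq, orderings)]
print(len(valid))  # 6 linearizations: the pick-ups can come in any of 3! orders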
- Hierarchical planning: SIPE
- Planning Method: search in space of task networks
- Knowledge: Decomposable operators:
- Look like STRIPS operators
- High-level versions do not capture many details
- Can be decomposed into lower level operations
- Example: planning a flight at multiple levels of detail
- High level operation: Fly from start to destination
- Decomposition: Fly: get ticket, get bags, drive to airport, get on plane ...
- Decomposition: Get Ticket: go online, select start, select destination ...
- Issues:
- Current frontier in deployed planning systems
- People make money with this
- Extremely powerful formalism
- Theoretical results now just coming in
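- A minimal Python sketch of decomposable operators as a task-to-subtasks table; a real hierarchical planner such as SIPE also tracks preconditions and resolves conflicts while decomposing (the task names are illustrative):

# High-level operations and their decompositions; tasks with no entry are primitive.
decompositions = {
    "fly start destination": ["get ticket", "get bags", "drive to airport", "get on plane"],
    "get ticket": ["go online", "select start", "select destination", "pay"],
}

def expand(task):
    # Recursively expand a task network down to primitive actions.
    if task not in decompositions:
        return [task]
    plan = []
    for subtask in decompositions[task]:
        plan.extend(expand(subtask))
    return plan

print(expand("fly start destination"))
# ['go online', 'select start', 'select destination', 'pay',
#  'get bags', 'drive to airport', 'get on plane']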
- Graph Planning: GRAPHPLAN
- Planning Method: construction of planning graph
- Knowledge: STRIPS operators
- Planning graph encapsulates whole families of operations
- Issues:
- Current frontier in theoretical planning systems
- Generalizes over nonlinear planning approaches
- Now extending to handle range of existing systems
- Decision-Theoretic Planning
- Planning Method: partially observable Markov decision processes (POMDPs)
- Knowledge: probabilistic relationships
- Planning is now over belief states (probability distributions over possible world states)
- Issues:
- Current frontier in robotic planning systems
- Incorporates learning directly into framework
- Now extending to handle range of existing systems
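- A toy Python sketch of the fully observable case (an MDP solved by value iteration); a true POMDP plans over belief states, i.e. probability distributions over these states (the two-state world is illustrative, not a robotic planner):

states  = ["at start", "at goal"]
actions = ["move", "stay"]

# transitions[s][a] = list of (probability, next_state); actions are unreliable.
transitions = {
    "at start": {"move": [(0.8, "at goal"), (0.2, "at start")],
                 "stay": [(1.0, "at start")]},
    "at goal":  {"move": [(1.0, "at goal")],
                 "stay": [(1.0, "at goal")]},
}
reward = {"at start": 0.0, "at goal": 1.0}
gamma = 0.9  # discount factor: future reward matters, but less than immediate reward

def expected(a, s, value):
    return sum(p * value[s2] for p, s2 in transitions[s][a])

# Value iteration: repeatedly back up the best expected value of each state.
value = {s: 0.0 for s in states}
for _ in range(100):
    value = {s: reward[s] + gamma * max(expected(a, s, value) for a in actions)
             for s in states}

policy = {s: max(actions, key=lambda a: expected(a, s, value)) for s in states}
print(policy)  # "move" at the start, despite the 20% chance the action fails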
Acting
Issues
- Responsiveness is crucial
- Effectors are unreliable
- Thinking is not always needed
Case Study: Schema-based reactive control
- Control method: sensor-based vector computation
- Knowledge: behaviors:
- Releaser: determines when the behavior is active
- Perceptual schema: determines what to pay attention to
- Motor schema: determines what to do in the world
- Motor schema: often implemented as superimposable vector fields
- Wandering: random motion
- Approach: vectors towards target
- Avoidance: vectors away from obstacles
- Navigation: a "lane" of forward motion with wall repulsion
- Formations: compute based on vectors of opponents
- Issues:
- Can be computed quickly from sensor data
- At low level, vulnerable to box canyons and other traps
- Can be orchestrated by higher level planning systems
- Can be learned through a variety of approaches
- Neural networks
- Genetic algorithms
- Decision theory
- Case-based reasoning
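- A minimal NumPy sketch of superimposable vector fields: each active schema contributes a vector and the commanded motion is their sum (the gains, radii, and positions are illustrative):

import numpy as np

def approach(robot, target, gain=1.0):
    # Constant-magnitude vector toward the target.
    v = target - robot
    return gain * v / (np.linalg.norm(v) + 1e-9)

def avoid(robot, obstacle, radius=2.0, gain=1.5):
    # Vector away from the obstacle, strongest when close, zero outside the radius.
    v = robot - obstacle
    d = np.linalg.norm(v)
    if d > radius:
        return np.zeros(2)
    return gain * (radius - d) / radius * v / (d + 1e-9)

def wander(gain=0.3):
    # Small random vector that keeps the robot exploring.
    return gain * np.random.uniform(-1.0, 1.0, size=2)

robot    = np.array([0.0, 0.0])
target   = np.array([10.0, 0.0])
obstacle = np.array([1.5, 0.5])

# Active behaviors superimpose: the commanded motion is simply the vector sum.
command = approach(robot, target) + avoid(robot, obstacle) + wander()
print(command)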
Learning
Issues
- Dynamic worlds - what's there today can be gone tomorrow
- Don't want to learn sensor ghosts and input noise
- Vast amounts of "normal" input can swamp edge cases
- Decision theoretic planning helps but isn't the end of the story
- Hierarchical level-of-detail world models from computer games and graphics
State of the Art
Practical Robots
- Vacuum cleaners - Roomba
- Automated carts
- Kiosk robots
Planetary Rovers
- Mars Pathfinder and the Sojourner Rover
- Spirit and Opportunity Rovers
- More to come
Unmanned Autonomous Vehicles (UAVs)
- Underwater Autonomous Vehicles - successful
- Unmanned Aerial Vehicles - less so
- The DARPA Grand Challenge - making progress
Humanoid Robots
- Cog
- ASIMO
- QRIO
- HRP-2
- Sumobots
Other Robots and Spinoffs
- Mecha
- Powered Suits
- Powered Chairs
- Enryu
- Others
- Dogs
- Games
- Computer Graphics
- Harry Potter
- Lord of the Rings
Research Frontiers
- Biorobotics
- Robosoccer
- Emotional Robots