|
|
|
Gio Wiederhold |
|
November 2003 |
|
|
|
|
|
Obtaining relevant data |
|
Always incomplete |
|
Extracting relationships |
|
Imputing causality |
|
Finding applicability |
|
Determining leverage points |
|
Inventing candidate actions |
|
Assessing likely outcomes and benefits |
|
Selecting action to be taken |
|
Measuring the outcome |
|
-> Collecting data for next round |
|
|
|
|
|
Database administrators |
|
Focus on data collection, organization, currency |
|
Analysts |
|
Focus on slicing, dicing, relationships |
|
Middle managers |
|
Focus on their costs, profits |
|
MBAs |
|
Focus on business models, planning |
|
Executives |
|
Must make decisions based on diverse inputs |
|
|
|
Two choices |
|
(rare) Collect data specifically for analysis |
|
allows careful design -- |
|
model causes and effects |
|
Purchase = f(price, color, size, customer income, gender, ...) |
|
costly |
|
often small to make collection manageable |
|
imposes delays |
|
(common) Use data collected for other purposes |
|
take advantage of what is readily available |
|
low cost |
|
filtering, reformatting, integration |
|
incomplete - rarely covers all causes / effects |
|
biased -- missing categories |
|
only people with phones, cars -- shopping in supermarkets |
|
|
|
|
|
Needed when sources have inadequate coverage |
|
in distinct DBs for |
|
Prices, Number purchased |
|
Customer segments (supermarket, stores, on-line) |
|
implies some expectations |
|
append attributes where keys match: Joe |
|
include semantic match Joe = 012 34 567 |
|
append rows where key types match: customer |
|
include semantic match customer = owner |
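The two integration operations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record contents, the `SAME_ENTITY` and `SAME_TYPE` mappings, and both function names are hypothetical, standing in for whatever semantic-matching knowledge an expert would supply.

```python
# Hypothetical records from two distinct databases (illustrative data only).
prices = {"Joe": {"price": 2.50}}
purchases = {"012 34 567": {"number_purchased": 3}}

# Assumed semantic match between keys: both identifiers name the same entity.
SAME_ENTITY = {"012 34 567": "Joe"}

# Assumed semantic match between key types: "owner" rows count as "customer" rows.
SAME_TYPE = {"owner": "customer"}


def append_attributes(left, right, key_map):
    """Widen records: merge attributes of rows whose keys match,
    either literally or through the semantic key map."""
    merged = {key: dict(attrs) for key, attrs in left.items()}
    for key, attrs in right.items():
        canonical = key_map.get(key, key)
        if canonical in merged:
            merged[canonical].update(attrs)
    return merged


def append_rows(tables, type_map):
    """Lengthen a table: stack rows from tables whose key types match,
    either literally or through the semantic type map."""
    combined = {}
    for key_type, rows in tables:
        canonical = type_map.get(key_type, key_type)
        combined.setdefault(canonical, []).extend(rows)
    return combined


wide = append_attributes(prices, purchases, SAME_ENTITY)
tall = append_rows([("customer", ["Joe"]), ("owner", ["Ann"])], SAME_TYPE)
```

Keeping the semantic matches in explicit tables makes the "expectations" behind the integration visible and easy to revise.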
|
|
|
|
|
|
Find relationships |
|
already known - ignore or adjust in next round |
|
requires comparison with expert knowledge |
|
now have quantification |
|
unknown |
|
uninteresting per expert |
|
interesting per expert |
|
|
|
Already known -- Prior Model |
|
But is it complete, i.e., does it explain all effects ? |
|
Analyze relationships |
|
use expertise to decide direction |
|
often obvious |
|
"common world knowledge" |
|
sometimes ambiguous |
|
smoking -> cancer -> not-smoking |
|
often major true cause not captured in data |
|
food color 10%, |
|
food price 20%, |
|
buyer gender 2% |
|
unknown 75% |
|
guess: ethnicity, income |
|
|
|
|
1. Is a Volvo a safe car? |
|
|
|
|
To use results of data mining |
|
have to understand direction of relationships |
|
|
|
|
|
|
|
Language of analyst / Language of modeling |
|
Many causes -- independent variables |
|
A few may be controllable |
|
Some may be controlled by our competition |
|
Others are forces-of-nature |
|
Even more effects -- dependent variables |
|
A few may be desired |
|
Some may be disastrous |
|
Many are poorly understood |
|
|
|
Intermediate effects |
|
Provide a means for measuring effectiveness |
|
Allow correction of actions taken |
|
|
|
|
Analyze Alternatives |
|
Current Capabilities |
|
Future Expectations |
|
|
|
|
|
Process tasks: |
|
List resources |
|
Enumerate alternatives |
|
Prune alternatives |
|
Compare alternatives |
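The four process tasks can be sketched as a small pipeline. Everything concrete here is an assumption made for illustration: the option names, the budget constraint used for pruning, and the toy `score` function are hypothetical and not taken from the talk.

```python
from itertools import product

# Task 1: list resources (hypothetical values).
resources = {"budget": 100}

# Hypothetical controllable variables and their candidate settings.
options = {"price_cut": [0, 10], "ad_spend": [0, 50, 120]}


def enumerate_alternatives(options):
    """Task 2: every combination of settings is a candidate action."""
    names = list(options)
    return [dict(zip(names, combo))
            for combo in product(*(options[name] for name in names))]


def prune(alternatives, resources):
    """Task 3: drop alternatives that exceed the available resources."""
    return [alt for alt in alternatives if alt["ad_spend"] <= resources["budget"]]


def score(alt):
    """Assumed toy benefit model; a real one would come from the analysis."""
    return 3 * alt["price_cut"] + 0.4 * alt["ad_spend"]


def compare(alternatives):
    """Task 4: rank the remaining alternatives and return the best one."""
    return max(alternatives, key=score)


candidates = enumerate_alternatives(options)
feasible = prune(candidates, resources)
best = compare(feasible)
```

Pruning before comparison matters because the number of enumerated alternatives grows multiplicatively with each added variable.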
|
|
|
|
|
|
|
1. Back-of-the-envelope |
|
Common |
|
Adequate if model is simple |
|
Assumptions are easily forgotten after some time and not distinguished from data: "Why are we doing this?" |
|
2. Spreadsheets |
|
Most common computing tool |
|
Specialist modeler can help |
|
New, recent data can be pasted in |
|
Awkward for the tree of future alternatives |
|
3. Constructed to order |
|
Costly, powerful technology |
|
Specialist modelers required |
|
Expressive simulation languages |
|
Requires specialists to set up, run, and rerun with new data |
|
|
|
|
|
|
|
Wide variety, but common principle |
|
Inputs -> Model -> Output (time, $, place, ...) |
|
1. Spreadsheets |
|
Identify independent, controllable, and resulting values |
|
2. Execution specific to query: what-if assessment |
|
may require HPC power for adequate response |
|
3. Continuously executing: weather prediction |
|
Search for best match ( location, time ) |
|
4. Past simulation results collected for future use |
|
Typically sparse -- the dimension of the futures is too large: |
|
Tables in a design handbook: materials |
|
Perform interpolations or extrapolations to match query parameters |
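Matching a query against sparse, stored simulation results can be sketched as a table lookup with linear interpolation. The table values and the `lookup` function are hypothetical, and a linear scheme is only one possible choice; note that outside the tabulated range the same formula extrapolates, which is less trustworthy.

```python
from bisect import bisect_left

# Hypothetical handbook-style table: (parameter value, stored simulation result),
# sorted by parameter value. Illustrative numbers only.
table = [(0.0, 10.0), (10.0, 20.0), (30.0, 60.0)]


def lookup(table, x):
    """Linearly interpolate between the two nearest stored results.

    For x outside the tabulated range, the nearest segment is extended,
    i.e. the value is extrapolated.
    """
    xs = [param for param, _ in table]
    i = bisect_left(xs, x)
    # Clamp so that (i - 1, i) is always a valid segment of the table.
    i = min(max(i, 1), len(table) - 1)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


inside = lookup(table, 20.0)   # interpolated between 10.0 and 30.0
outside = lookup(table, 40.0)  # extrapolated beyond the last entry
```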
|
|
|
|
|
Still needed: Value of alternative outcomes |
|
Decision maker / owner input |
|
Benefits and Costs |
|
Potential Profit |
|
|
|
Correct for risk, and adjust to present value |
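One common way to make both corrections is to weight each outcome by its probability and discount it back to the present. The sketch below assumes that approach; the function names, the flat annual discount rate, and the sample figures are all hypothetical.

```python
def present_value(amount, years, discount_rate):
    """Discount a future amount back to today at a flat annual rate."""
    return amount / (1 + discount_rate) ** years


def risk_adjusted_value(outcomes, discount_rate):
    """Sum of probability-weighted present values.

    outcomes: list of (probability, amount, years_until_realized).
    """
    return sum(prob * present_value(amount, years, discount_rate)
               for prob, amount, years in outcomes)


# Hypothetical alternative: a 50% chance of 121 in two years, otherwise nothing.
value = risk_adjusted_value([(0.5, 121.0, 2), (0.5, 0.0, 2)], discount_rate=0.10)
# value is about 50: 121 discounted over two years at 10% is 100, weighted by 0.5.
```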
|
|
|
|
Relationships from analyses of past data |
|
Data representing the current state |
|
List of actionable alternatives |
|
Tree of subsequent alternatives |
|
Probabilities of those alternatives |
|
Values of the outcomes |
|
Ability to predict the likelihood of futures |
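Several of the ingredients listed above (actionable alternatives, a tree of subsequent alternatives, their probabilities, and outcome values) can be combined in a small expected-value calculation. This is a minimal sketch, assuming expected value is the selection criterion; the tree encoding, the function names, and the numbers are hypothetical.

```python
def expected_value(node):
    """A node is either a numeric outcome value (leaf) or a list of
    (probability, child_node) branches describing subsequent alternatives."""
    if isinstance(node, (int, float)):
        return float(node)
    return sum(prob * expected_value(child) for prob, child in node)


def best_action(actions):
    """Pick the actionable alternative with the highest expected value."""
    return max(actions, key=lambda name: expected_value(actions[name]))


# Hypothetical alternatives with their trees of possible futures.
actions = {
    "launch": [(0.6, 100), (0.4, [(0.5, -50), (0.5, 0)])],
    "wait": [(1.0, 40)],
}

choice = best_action(actions)
```

With these illustrative numbers, "launch" has an expected value of about 50 against 40 for "wait", so it is selected.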
|
|
|
|
|
|
Support of decision-making requires dealing with the futures, as well as the past |
|
Databases deal well with the past |
|
Streaming sensors supply current status |
|
Spreadsheets, simulations deal with the likely futures |
|
Future information systems should combine all these sources |
|
|
|
|
|
Build super systems |
|
Coherent, consistent |
|
Expensive |
|
Unmaintainable |
|
Too many cooks: |
|
Database folk |
|
Data miners |
|
Analysts |
|
Planners |
|
Simulation specialists |
|
Decision makers |
|
|
|
Simulation results are mapped to |
|
alternative Courses-of-actions |
|
Information system should support model driving the computation and recomputation of likelihoods |
|
Likelihoods change as NOW moves forward and eliminates earlier alternatives. |
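The recomputation of likelihoods can be illustrated with a simple renormalization. The sketch assumes that the surviving alternatives keep their relative weights once others are ruled out (plain conditioning); the talk does not prescribe a method, and the `advance_now` name and the probabilities are hypothetical.

```python
def advance_now(futures, eliminated):
    """Drop alternatives ruled out by the passage of time and rescale the
    remaining likelihoods so that they again sum to 1."""
    remaining = {name: p for name, p in futures.items() if name not in eliminated}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("no alternatives remain")
    return {name: p / total for name, p in remaining.items()}


# Hypothetical likelihoods of three futures before NOW moves forward.
futures = {"A": 0.5, "B": 0.3, "C": 0.2}

# Alternative A did not happen; B and C share the remaining likelihood.
updated = advance_now(futures, {"A"})
```

In this example B rises from 0.3 to about 0.6 and C from 0.2 to about 0.4.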
|
|
|
What human interfaces can support the decision maker? |
|
How to move seamlessly from the past to the future? |
|
What system interfaces are good now and stay adaptable? |
|
How can multiple futures be managed (indexed)? |
|
How can multiple futures be compared, selected? |
|
How should joint uncertainty be computed? |
|
How can the NOW point be moved automatically? |
|
|
|
|
How little of the model needs to be exposed? |
|
How can defaults be set rationally? |
|
How should expected execution cost be reported? |
|
How should uncertainty be reported? |
|
Are there differences among application areas that require different language structures? |
|
Are there differences among application areas that require different language features? |
|
How will the language interface support effective partitioning and distribution? |
|
|
|
|
Interfaces define service potentials |
|
Server is an independent contractor, defines service |
|
Client selects service, and specifies parameters |
|
Server’s success depends on value provided |
|
Some form of payment is due for services |
|
|
|
|
|
A new service for Decision Making: |
|
follows database paradigm |
|
(by about 25 years) |
|
coherence in prediction |
|
displacement of ad-hoc practices |
|
seamless information integration |
|
single paradigm for decision makers |
|
simulation industry infrastructure |
|
investment has a potential market |
|
should follow database industry model: |
|
Interfaces promote new industries |
|
|
|
|
|
|
|