Thursday, March 30, 2006

Replies to my post "How to speed up the code?"

I got several interesting and useful replies from various people. Some of them have experienced the same problem before. Besides improving the algorithm, they suggested various solutions, like 1) switching to a compiler that optimizes your code better, like icc; 2) optimizing cache/memory usage; 3) using MMX; 4) calling the Intel performance libraries...


I gather them as follows:

  1. Intel P4 Manual
  2. No offense, but I think these made-up test cases are a waste of time.
  3. He at least sped up his own code by 15% by trying different things, e.g., tiling/padding the for loops to improve cache performance (a small tiling sketch follows after this list).
  4. Yes, I support the original poster (LZ).
  5. These test cases are not designed correctly. For example, looping many times over a simple assignment is not right: if you compile in release mode with optimization on, it will be optimized away. In VC, just compiling in release mode usually doubles the speed. Your 15% is really nothing. At least you need to redo all the tests in release mode, which might already optimize the performance so that your modifications might not help at all. Double-precision calculation is slow, but float is quite fast; look at the benchmarks of the functions in the Intel Performance Primitives library, where many floating-point operations are faster than integer ones. Memory bandwidth is another bottleneck, so trying to improve the L1/L2 cache hit rate is important. I once optimized an algorithm and got it more than 10 times faster (of course, that included algorithm changes, MMX, and calling the Intel performance library).
  6. Well, it's good to do some experiments, but I don't think his experiments are really valuable...
  7. I don't think he has much of a clue either. But I think we should still encourage the spirit of trying.
  8. Most of your 10-fold speedup is likely due to the algorithm change; other code optimizations are not likely to improve efficiency so dramatically.
  9. For large-dataset applications, optimizing for cache/memory can speed things up 100x or more.
  10. Of course. I just want to point out that the most important thing is to optimize the algorithm and the data/work flow, and only then apply MMX intrinsics, cache tuning, and so on. He should start from the code he has instead of testing trivial things (not to mention those unrealistic test cases). Find the most time-consuming parts first and then focus on them. I have found that an approximate calculation of arctan can be even faster than a lookup table, but such optimizations depend on your algorithm and the accuracy you need. Getting a deep understanding of the code you are optimizing is the most important thing.
  11. The utmost rule of optimization is to shorten your CODING time. Only in very rare cases does code need special optimization to hit the speed requirement on modern CPUs, even for commercial programs. 90% of code has much more coding time than run time... If I estimate that a program can get results in one week on one CPU, I will not optimize it at all. I had to use SSE float instructions to optimize some of my programs because they run for more than one month on 7 CPUs. I recently gave up optimizing a program that runs 2 weeks on 6 CPUs, although I know I could speed it up 2x or more. The coding time is much longer and needs more attention.
  12. Faint, I wouldn't suggest SSE optimization for this. I still suggest looking into the algorithm or optimizing it by multi-threading, as you might have done. If you need to debug and tune parameters, I am not sure how you can avoid running it several times; then several months have passed. Are you saying you wrote the code over more than a year? Otherwise, I think it belongs to the 10% of code you categorized as having more running time than coding time and should be worth the optimization. Anyway, I prefer to write code that will be used more frequently at an acceptable speed. But for different fields, it is hard to compare.
  13. Once you turn optimization on, compilers can usually make better decisions about how to make array access fast; optimization by hand is not recommended. But then again, not every compiler optimizes well enough. GCC, for example, is not strong at that.
  14. Nod. Different compilers perform differently. For some C++ code I recently wrote, the executable optimized by g++ runs faster than the one optimized by icpc.
  15. That means you didn't turn on the right flags for icpc. It is very hard for gcc/g++ to beat icc.
  16. Probably not. I only turn on -O2 for both g++ and icpc though. What options are usually recommended for icpc?
  17. What I usually use is -ipo -fast -fp-model -unroll0
  18. I recall that -ipo requires some extra work on the programming side, is that correct?
  19. That's true for some code.
  20. If there were any algorithm optimization still possible, why would I perform SSE optimization? I use 7 CPUs of a cluster; if I were not multi-threading, I would be really mad, and maybe the most stupid person in the world. It can burn hundreds of CPUs at the same time if applicable. The SSE speeds things up around 4-5x; it's certainly worth it to save 3-4 months of run time on a 7-CPU cluster, and the optimization is only performed on less than 500 lines of C++ code within a project of more than 10k lines. And no one will run the debug version on the full dataset; there will always be a small dataset for debugging. The program should also write intermediate data to disk and restart from the last saved state if you really need to run it for months. You write the code you need, not what you prefer. You have NO choice. For more than 90% of code, execution speed is not an issue at all, even if it will run millions of times.
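
Since reply 3 mentions tiling/padding the for loops to improve cache performance, here is a minimal sketch of the tiling (blocking) idea, using a matrix transpose as the example. The matrix size, block size, and function name are made up for illustration; the block size would have to be tuned to the real code and cache.

#include <vector>

// Illustrative sizes only: N is the matrix dimension, B the block size,
// chosen so a BxB block of int fits comfortably in L1/L2 cache.
const int N = 2048;
const int B = 64;

// Walking a large matrix with a big stride misses the cache constantly.
// Tiling processes the data one BxB block at a time, so the parts of
// src and dst being touched stay cache-resident.
void transpose_tiled(const std::vector<int>& src, std::vector<int>& dst)
{
    for (int ii = 0; ii < N; ii += B)
        for (int jj = 0; jj < N; jj += B)
            for (int i = ii; i < ii + B; ++i)
                for (int j = jj; j < jj + B; ++j)
                    dst[j * N + i] = src[i * N + j];   // transpose one block
}

Here src and dst are assumed to be row-major arrays of size N*N, with N divisible by B.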

Wednesday, March 29, 2006

Tuesday, March 28, 2006

The Wisdom of Crowds

It is so interesting to hear that there is such research. It says that a joint decision by many people is better than an individual's decision in most cases. Here is an interesting story:
The opening anecdote relates Francis Galton's surprise that the crowd at a county fair accurately guessed the weight of an ox when their individual guesses were averaged (the average was closer to the ox's true weight than the estimates of most crowd members, and also closer than any of the separate estimates made by cattle experts).

As far as I know, some companies do forecasting based on the views of different people.

It's only $15 to get this book on Amazon. Unlike a Chinese book, I can read a non-math-intensive English book really fast. Now I already have a long English book list to complete, and this list gets longer every day. Mission impossible!!!

Multi-Echelon Problem: Too subtle

Every time I read multi-echelon-related literature and textbooks, I end up with different questions about definitions (like the sequence of events and echelon inventory) and proofs. That changed today, when I read the first section of the chapter by Federgruen called 'Centralized Planning Models for Multi-Echelon Inventory Systems under Uncertainty'.

Federgruen says the sequence of events commonly used now is different from the one used by Clark and Scarf. His preliminaries to the proof are much clearer than those in Axsater's book.

Sunday, March 26, 2006

Avian Flu and Y2k

Both of these may cause supply disruptions. The difference, as pointed out in today's CNN news, is quite an interesting point of view:
Y2k: You know when it would happen, but you don't know what it costs.
Avian Flu: You know what it costs, but you don't know when it will happen.

Why do we know what it costs? The news says there are always about three big avian flu outbreaks every century, so maybe people can estimate the cost from history. But that cost depends on how much effort people put into preventing or stopping it. Anyway, it is an interesting criterion for dividing different types of disruption: one-time or recurrent?

Saturday, March 25, 2006

How to speed up your code in C++: int vs double & global vs local & one-dimensional vs two-dimensional arrays

Today, my main task was to speed up one of my algorithms. I am really a novice in C++. I tried several modifications.

int vs double
For example, change unnecessarily double functions/parameters to int. If you really need a double parameter like 1.2, you can compute based on 12 instead. After finishing the major computation, you can scale the result back correspondingly. This saves time, especially when floating-point multiplication is involved.
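
A minimal sketch of this scaling trick, with a made-up scale factor of 10 and made-up variable names; the loop runs on integers only, and the division back to a double happens once at the end.

#include <cstdio>

int main()
{
    const int scale = 10;
    int price_scaled = 12;                 // stands for the double value 1.2
    long long total_scaled = 0;

    for (int i = 0; i < 1000; ++i)
        total_scaled += price_scaled * i;  // pure integer arithmetic

    // scale back only once, after the major computation is done
    double total = static_cast<double>(total_scaled) / scale;
    std::printf("total = %f\n", total);
    return 0;
}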

global vs local variable
Global variables really consume time.
For example, if you define a global variable

int c = 20;

// Then you do the loop in main():
int x, ni;
for (ni = 0; ni <= 900000000; ni++)
    x = c;

The times I got (in seconds) were 2.25, 2.4, 2.31, 2.21, 2.2, 2.27, 2.39.

If you instead define c as a local variable in main(),
then the sample of times (in seconds) is 1.83, 1.83, 1.82, 1.83, 1.82, 1.82, 1.82.

The gap is obvious. Another interesting thing is that the variance of the first experiment is also significant.
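
I did not show the full timing harness above, so the following is only a sketch of how such a comparison could be run, using clock() from <ctime>; the variable names are mine, and x is declared volatile because, as some replies point out, an optimizing compiler may otherwise remove these loops entirely.

#include <cstdio>
#include <ctime>

int c = 20;                              // global variable

int main()
{
    volatile int x = 0;                  // volatile keeps the assignments alive
    int local_c = 20;                    // local variable with the same value

    std::clock_t t0 = std::clock();
    for (int ni = 0; ni <= 900000000; ni++)
        x = c;                           // read the global
    std::clock_t t1 = std::clock();
    for (int ni = 0; ni <= 900000000; ni++)
        x = local_c;                     // read the local
    std::clock_t t2 = std::clock();

    std::printf("global: %.2f s   local: %.2f s\n",
                double(t1 - t0) / CLOCKS_PER_SEC,
                double(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}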

One-dimensional array vs two-dimensional array
I created a one-dimensional array and a two-dimensional array with the same total size.

int e[20000];
int d[40][500];

Then I ran a similar experiment (experiment 3):
for ( ni = 0; ni <=900000000; ni++)
x = e[10000];
The times (in seconds) were 1.83, 1.83, 1.82, 1.83, 1.82, 1.82, 1.82, 1.82.

Experiment 4:
for ( ni = 0; ni <=900000000; ni++)
x = d[20][499];
The times (in seconds) were 2.28, 2.27, 2.27, 2.32, 2.27, 2.21.

It seems it is better to project a two-dimensional array onto a one-dimensional array.

But in each individual case, you need to consider the time to convert a two-dimensional index into a one-dimensional one.
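
A minimal sketch of that projection, reusing the array sizes above: in row-major order d[i][j] corresponds to e[i * COLS + j], so each access through the flattened array costs one extra multiply and add, which is exactly the conversion time just mentioned. The helper name flat is mine.

const int ROWS = 40, COLS = 500;
int d[ROWS][COLS];          // two-dimensional version
int e[ROWS * COLS];         // flattened one-dimensional version (20000 ints)

// d[i][j] and e[i * COLS + j] refer to the same logical element.
inline int& flat(int i, int j)
{
    return e[i * COLS + j];
}

For example, flat(20, 0) is the same logical element as d[20][0] and lives at e[10000].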

Finally, my code sped up by 15% on average.

Can anyone share their experience with speeding up code?

Friday, March 24, 2006

Oh: Disruption may cost tens of billions of US dollars

It is reported that "GM agrees to finance Delphi buyouts".

"The agreement provides a framework that will allow them to get some
significant things done," he said. "It takes the issue of a strike out of the
picture and puts a higher level of certainty on what the costs will be for GM on
both the Delphi restructuring as well as their own
downsizing."
GM said last week a labor deal at Delphi would probably cost the company
between $5.5 billion and $12 billion. The No. 1 automaker also said expects to
increase the charge for its contingent exposure to $3.6 billion in 2005 from the
previous estimate of $2.3 billion.


We can see that the potential loss from a disruption may cost GM more than a billion dollars. But the question is: after doing so, can GM get much lower prices from Delphi in the future? We should not overestimate the loss from disruption, although I hope it will be huge because I do research on that.

Thursday, March 23, 2006

One heuristic method to solve the call-center scheduling problem

After thinking for a while, I find the following method may work:
  1. Divide a day into hourly blocks.
  2. Based on the predetermined performance measure, compute how many servers are needed in each block.
  3. Use these server requirements in each hour as demand, and then treat the whole scheduling problem as an LP (a sketch of the formulation appears at the end of this post).

This method is only suitable when the average service time is much less than an hour, so that the PSA (Pointwise Stationary Approximation) rule fits. Otherwise, the interaction among consecutive hours will destroy the whole formulation.
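
As a rough sketch of step 3 (the notation here is mine, not from a particular reference): let b_t be the number of servers required in hour t from step 2, let j index the allowed shift patterns, let a_tj = 1 if shift j covers hour t and 0 otherwise, let c_j be the cost of one agent working shift j, and let x_j be the number of agents assigned to shift j. The scheduling problem is then the standard covering formulation

minimize     sum_j  c_j * x_j
subject to   sum_j  a_tj * x_j  >=  b_t     for every hour t
             x_j >= 0  (and integer)

One can solve the LP relaxation and round up, or solve the integer version directly when the number of shift patterns is small.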

Wednesday, March 22, 2006

Using IT to boost call-center performance

Recently, McKinsey published an article about call-center performance:
Almost all large organizations have invested in software to schedule the workforce, track performance, and monitor quality. In many cases, however, these systems have proved cumbersome and have led to suboptimal outcomes. Newer systems let companies manage a call-center workforce with an approach closer to the theoretical optimum; for example, these systems make it easier to ensure that all teams, across a number of call centers, have the same capabilities.

It is clear that IT+OR will bring lots of advantages to big companies. Once top management realizes the value of OR, we can turn our intelligence into real value.

The issue for the queueing people is that optimization is extremely hard for large queueing systems. Who can guess which algorithms are involved in this problem?

Tuesday, March 21, 2006

Sorry to hear of Dantzig's passing

I heard a story about Dantzig and the simplex method, and I'm still not sure whether it is real or not.
One day, Dantzig entered the classroom late and found several problems written on the blackboard. He thought they were homework.
A few days later, he told his professor that the homework was too hard and that he could only solve one of the problems. The method he used to solve that problem is the so-called "simplex method", one of the top 10 algorithms of the 20th century.

You can edit Wikipedia

I just found out that Wikipedia uses wiki technology so that people can edit the content.
For example, you can add more content under Game theory.
But I didn't find any Operations Research entry like the one for Game theory. Maybe someone can fill this gap.

Ultimatum game (splitting-the-dollar game)

Last week, Giuseppe A. Paleologo came to our department and gave a talk on an experimental study of the wholesale price contract.

As expected, people don't behave "rationally" in this setting. In the ultimatum game, experimental studies reveal that people tend to be "fair".

The majority of research in the OR field is based on the same assumption as in economics: "rational people". The question is: what if this hypothesis fails?

Schweitzer and Cachon examine people's behavior in the very simple newsvendor game. I think it is a really good starting point.

But the problem is that most applications in the OR field may be too complex for experiments, especially when the reward is not high enough to make people behave more "rationally".

Sunday, March 19, 2006

Some useful Matlab toolboxes for optimization purposes

I will add more if I find any or if someone tells me about them.

Hybrid Toolbox


The Hybrid Toolbox is a Matlab/Simulink toolbox for modeling, simulating, and verifying hybrid dynamical systems, for designing and simulating model predictive controllers for hybrid systems subject to constraints, and for generating linear and hybrid MPC control laws in piecewise affine form that can be directly embedded as C-code in real-time applications.

YALMIP

YALMIP is a MATLAB toolbox for rapid prototyping of optimization problems. The package initially focused on semi-definite programming, but the latest release extends this scope significantly. YALMIP can now be used for:
convex linear, quadratic, second order cone and semidefinite programming.
non-convex quadratic and semidefinite programming (local & global).
mixed integer conic programming.
multi-parametric programming.
geometric programming.

More toolboxes for Matlab are in Matlab Central

A post talking about discrete optimization

In the original post, the author lists all kinds of algorithms for discrete optimization problems:
1 Local search methods
2 Ant algorithms
3 Descending search methods
4 Random search methods
5 Tabu search method
6 Adaptive memory search methods
7 Beam search methods
8 Path search methods
9 Evolutionary, genetic search methods
10 Threshold search methods
11 Scatter search method
12 Artificial intelligence methods
13 Simulated annealing method
14 Simulated jumping method
15 Expert systems methods
16 Multiagents methods
17 Neural networks methods
18 Randomized methods
19 Geometric method
20 Cultural algorithms
21 Memetic algorithms
22 Hybrid methods
23 Parallel methods

Then, based on others' comments, he modifies the list:
1 Metaheuristics
1.1 Local search methods
1.2 Simulated annealing method
1.3 Tabu search method
1.4 Evolutionary, genetic search methods
1.4.1 Cultural algorithms
1.5 Ant algorithms
1.6 Memetic algorithms

2 Descending search methods
3 Random search method (historically known as Monte Carlo method)
4 Adaptive memory search methods
5 Beam search methods
6 Path search methods
7 Threshold search methods
8 Scatter search method
9 Artificial intelligence methods
10 Simulated jumping method
11 Expert systems methods
12 Multiagents methods
13 Neural networks methods
14 Geometric method
15 Hybrid methods

We Were Warned: Tomorrow's Oil Crisis

Yesterday, I watched "CNN Presents Classroom: We Were Warned: Tomorrow's Oil Crisis". It seems an oil crisis may bring disaster to SCM because the variable cost of transportation would increase too much. Everything may become localized. Once the fixed cost is relatively low, most OR models will lose their meaning.