Must the following instructions wait for a memory load to complete?

If a memory load needs N cycles to complete, must the N instructions immediately following the
load instruction wait, even if they do not need the result of the load?

No, only the instruction that uses the result of the load (and any instructions following it) will stall waiting for the load.

What’s more, is there any cycle penalty for an instruction that immediately follows a store instruction and changes the source register of that store?

No. The only interlocks occur when an instruction tries to read or write a register that a previous long-latency instruction (load, multiply, divide) is still waiting to write.

Also, if you write to a memory address and then try to read from the same address while the store is still in the write buffer (the depth isn’t documented), there’s a 5-cycle penalty.
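
To make the register interlock rule concrete, here is a minimal sketch of a toy cycle counter in Python. It is not a model of the actual core: it assumes a single-issue, in-order pipeline where a load issued at cycle c makes its result usable at cycle c + N, and it ignores the write-buffer penalty entirely. The function name `count_stall_cycles` and the `(op, dest, sources)` tuple format are made up for illustration.

```python
def count_stall_cycles(program, load_latency):
    """Count stall cycles in a toy in-order, single-issue pipeline.

    program: list of (op, dest, sources) tuples; dest may be None (e.g. store).
    load_latency: assumed number of cycles until a load result is usable.
    """
    ready_at = {}  # register -> cycle at which a pending load writes it
    cycle = 0
    stalls = 0
    for op, dest, sources in program:
        # Interlock: stall if this instruction reads OR writes a register
        # that an earlier load is still waiting to write.
        regs = list(sources) + ([dest] if dest is not None else [])
        wait_until = max((ready_at.get(r, 0) for r in regs), default=0)
        if wait_until > cycle:
            stalls += wait_until - cycle
            cycle = wait_until  # everything behind it waits too (in-order)
        if op == "load":
            ready_at[dest] = cycle + load_latency
        cycle += 1
    return stalls


# Dependent instruction right after the load: it has to wait.
dependent = [("load", "r1", ["r9"]), ("add", "r2", ["r1", "r3"])]

# Independent instructions between the load and its use hide the latency.
scheduled = [
    ("load", "r1", ["r9"]),
    ("add", "r4", ["r5", "r6"]),
    ("add", "r7", ["r5", "r6"]),
    ("add", "r2", ["r1", "r3"]),
]

# Overwriting a store's source register right after the store: no interlock.
store_then_overwrite = [("store", None, ["r1", "r9"]), ("add", "r1", ["r5", "r6"])]

print(count_stall_cycles(dependent, 3))
print(count_stall_cycles(scheduled, 3))
print(count_stall_cycles(store_then_overwrite, 3))
```

Under these assumptions only the first sequence stalls, which matches the answer above: independent instructions after a load keep issuing, and a store does not hold a claim on its source register.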