What happened, and when?

Last weekend, Manta Atlantic stopped producing blocks at Manta block #2449972 (Polkadot block #19720321), at exactly 2024-03-02 06:03:48 (UTC). All collators got stuck importing block #19720321 from Polkadot.

Here is the error from a running collator's logs.

Mar 02 06:03:49 xx.manta.systems manta[983]: 2024-03-02 06:03:49 [Relaychain] 💔 Error importing block 0xaf540fcfaf1ace474c8a8fe96ca1d2501de72990d3f04e8639219e97f4b99436: consensus error: Chain lookup failed: No block weight for parent header.
Mar 02 06:03:49 xx.manta.systems manta[983]: 2024-03-02 06:03:49 [Parachain] 💤 Idle (16 peers), best: #2449971 (0x9c5a…6340), finalized #2449971 (0x9c5a…6340), ⬇ 2.2kiB/s ⬆ 0.1kiB/s
Mar 02 06:03:49 xx.manta.systems manta[983]: 2024-03-02 06:03:49 [Relaychain] 💤 Idle (27 peers), best: #19720317 (0xe126…1167), finalized #19720318 (0x4349…f223), ⬇ 159.1kiB/s ⬆ 107.9kiB/s
Mar 02 06:03:49 xx.manta.systems manta[983]: 2024-03-02 06:03:49 Accepting new connection 1/100
Mar 02 06:03:50 xx.manta.systems manta[983]: 2024-03-02 06:03:50 [Relaychain] 💔 Error importing block 0xaf540fcfaf1ace474c8a8fe96ca1d2501de72990d3f04e8639219e97f4b99436: consensus error: Chain lookup failed: No block weight for parent header.
Mar 02 06:03:50 xx.manta.systems manta[983]: 2024-03-02 06:03:50 [Relaychain] Ignoring request to disconnect reserved peer 12D3KooWRymbwueSWtmjxx4HkejtGRmSa4T1QXJMmTtFfVUU7AJj from SetId(1).

This is the fatal log line:

Mar 02 06:03:50 xx.manta.systems manta[983]: 2024-03-02 06:03:50 [Relaychain] 💔 Error importing block 0xaf540fcfaf1ace474c8a8fe96ca1d2501de72990d3f04e8639219e97f4b99436: consensus error: Chain lookup failed: No block weight for parent header.

This log means the client was trying to import Polkadot block #19720321, but the block had not been synced completely because the collator's local Polkadot database was corrupted. As a result, all collators failed to import this block and block production stalled.

This is a known issue: ‣. Some teams have had the same problem before.
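To check whether a collator is hitting this specific failure, one option is to count how often the fatal message appears in its logs. The following is a minimal sketch, not an official tool: the function name count_fatal_imports is made up for this example, and it matches only on the message text quoted above.

```shell
# Count relay-chain import failures that carry the fatal message.
# Reads log lines on stdin and prints the number of matching lines.
# count_fatal_imports is a hypothetical helper name for this sketch.
count_fatal_imports() {
  # grep -c prints 0 and exits non-zero when nothing matches;
  # "|| true" keeps the function's exit status at 0 in that case.
  grep -c 'Error importing block .*No block weight for parent header' || true
}
```

Usage, assuming the collator logs to the systemd journal under the syslog identifier manta (as the manta[983] prefix in the logs above suggests): journalctl -t manta --since '2024-03-02 06:00:00' | count_fatal_imports. Note that journalctl interprets --since in the host's local time zone. A count that keeps growing while the best Relaychain block number stays the same points to this stall.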

How to fix?

There are two ways to fix it.

  1. Restart the collator with the parameter --relay-chain-rpc-urls pointing to Polkadot nodes.

    This is the quickest way to address this issue.

        --relay-chain-rpc-urls 'wss://1rpc.io/dot' \
        --relay-chain-rpc-urls 'wss://polkadot-public-rpc.blockops.network/ws' \
        --relay-chain-rpc-urls 'wss://polkadot.api.onfinality.io/public-ws' \
        --relay-chain-rpc-urls 'wss://rpc.ibp.network/polkadot' \
        --relay-chain-rpc-urls 'wss://polkadot-rpc.dwellir.com' \
        --relay-chain-rpc-urls 'wss://polkadot-rpc-tn.dwellir.com' \
        --relay-chain-rpc-urls 'wss://rpc.dotters.network/polkadot' \
        --relay-chain-rpc-urls 'wss://rpc-polkadot.luckyfriday.io' \
        --relay-chain-rpc-urls 'wss://polkadot.public.curie.radiumblock.co/ws' \
        --relay-chain-rpc-urls 'wss://rockx-dot.w3node.com/polka-public-dot/ws' \
        --relay-chain-rpc-urls 'wss://dot-rpc.stakeworld.io' \
    

    Note for collators who choose to run in containers: there is a known issue with outdated CA certificates in the latest v4.6.1 Manta Docker image:

    https://github.com/Manta-Network/Manta/issues/1318

    To work around this, mount the host's /usr/share/ca-certificates and /etc/ssl/certs into the container:

    -v /usr/share/ca-certificates:/usr/share/ca-certificates:ro \
    -v /etc/ssl/certs:/etc/ssl/certs:ro \
    
  2. Re-sync the Polkadot blocks. Completing the synchronization might take days.

We used the first option to recover block production as quickly as possible last weekend.
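For operators who keep their own list of relay chain endpoints, the repeated --relay-chain-rpc-urls lines from the first option can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: relay_rpc_flags is a made-up helper name, and the ws:// / wss:// filter is this example's own sanity check, chosen because every endpoint listed above is a WebSocket URL.

```shell
# Print one "--relay-chain-rpc-urls '<url>' \" line per WebSocket endpoint,
# ready to paste into the collator's start command.
# relay_rpc_flags is a hypothetical helper name for this sketch.
relay_rpc_flags() {
  for url in "$@"; do
    case "$url" in
      ws://*|wss://*)
        # The escaped backslashes make printf emit a single trailing "\".
        printf "    --relay-chain-rpc-urls '%s' \\\\\n" "$url" ;;
      *)
        # Anything else is reported on stderr and left out.
        echo "skipping non-WebSocket URL: $url" >&2 ;;
    esac
  done
}

# Example with two of the public endpoints listed in this post.
relay_rpc_flags \
  'wss://rpc.ibp.network/polkadot' \
  'wss://polkadot-rpc.dwellir.com'
```

Listing several endpoints matches what was done during the incident. Which endpoints to trust is an operational decision this sketch does not make.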

How to avoid it in the future?

The Manta client currently lags slightly behind the latest polkadot-sdk. The Manta engineering team will do its best to keep the dependencies aligned with polkadot-sdk.

Thanks for your patience and understanding!