Thursday 10 July 2014

Java: How to remove accents

How to remove accents in strings [1]

attempt 1 (remove accents) -- with problems:
  Normalizer.normalize(str, Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
  ç gives c
  ' gives '
  ê gives e
  Ö gives O
  ü gives u
  ´ gives ´
  but Ø gives Ø

The accents are removed! But non-ASCII characters that have no decomposition into a base letter plus a combining mark, such as Ø and ´, are left unchanged.
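
Here is attempt 1 as a small runnable sketch. The class name AccentStripper and the method name stripAccents are made up for illustration; only the Normalizer call and the regex come from the snippet above. The test strings are written as Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class AccentStripper {

    // Decompose to NFD (base letter followed by combining marks),
    // then delete the combining marks in the block U+0300..U+036F.
    public static String stripAccents(String str) {
        return Normalizer.normalize(str, Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // c-cedilla, e-circumflex, O-diaeresis, u-diaeresis
        System.out.println(stripAccents("\u00e7\u00ea\u00d6\u00fc")); // prints ceOu

        // O-with-stroke has no canonical decomposition, so it comes back unchanged
        System.out.println(stripAccents("\u00d8").equals("\u00d8")); // prints true
    }
}
```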

attempt 2 (remove non-ASCII chars) -- successful!
  Normalizer.normalize(str, Form.NFD).replaceAll("[^\\p{ASCII}]", "")
  ç gives c
  ' gives '
  ê gives e
  Ö gives O
  ü gives u
  ´ gives nothing
  Ø gives nothing
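
The same idea as a small runnable sketch. The class name AsciiFolder and the method name toAscii are made up for illustration; only the Normalizer call and the regex come from the snippet above. Note that characters without a decomposition are dropped silently, so the result can be shorter than the input.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class AsciiFolder {

    // Decompose to NFD, then delete every character outside the ASCII range.
    // This removes the combining marks and also any leftover non-ASCII character.
    public static String toAscii(String str) {
        return Normalizer.normalize(str, Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // c-cedilla, e-circumflex, O-diaeresis, u-diaeresis
        System.out.println(toAscii("\u00e7\u00ea\u00d6\u00fc")); // prints ceOu

        // O-with-stroke and the spacing acute accent are removed entirely
        System.out.println(toAscii("a\u00d8b\u00b4c")); // prints abc
    }
}
```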

[1] This may be useful, e.g., if you need to create filenames on a filesystem that does not support Unicode.
